Following the well-attended seminar on ‘Managing, sharing and curating your research data in a digital environment’ on 6th March 2018, Sonia Barbosa, Manager of Data Curation at the Institute for Quantitative Social Science (IQSS) Dataverse, Harvard University, and Danny Brooke, Dataverse Development Project Manager, Harvard University, conducted two ‘Bring Your Own Data’ workshops: one for soft-science researchers and another for hard-science researchers.
The ‘Bring Your Own Data’ workshops gave researchers hands-on experience in sharing their datasets on DR-NTU (Data).
Participants began by creating a sub-dataverse for themselves under their respective school/research centre and learnt that they could apply the following customisations to make their sub-dataverses more user-friendly:
- Customising the browse/search facets in their sub-dataverses to facilitate better browsing and discovery of their datasets
- Customising a dataset template with the relevant fields to describe their research data
Sonia then explained what constitutes a high-quality dataset record and dispensed the following tips to ensure the visibility and reusability of datasets:
- The description of the dataset record should be as comprehensive as possible, providing ample context for the research data files, including whether they should be accessed in any particular order or with any particular software
- The related publication’s citation (if any) should be included in the dataset record so that other researchers can refer to the publication for more information
- Code files, in addition to final research data files, should be included to allow other researchers to reproduce the data where possible
- File names should be short but meaningful so that users know what to expect when they access the files
- Use data tags to label the data files for better organisation
- Data files should be saved in open (i.e. software-agnostic) file formats to ensure long term accessibility where possible
- Sensitive data should be sufficiently de-identified before sharing them publicly
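To make the de-identification tip concrete, here is a minimal sketch of one common step: dropping direct identifiers and substituting a salted-hash pseudonym so records from the same participant can still be linked. The column names and salt are hypothetical, and this single step is not a complete de-identification strategy (indirect identifiers such as detailed demographics may still need generalisation or suppression):

```python
import hashlib

# Hypothetical column names and salt -- adapt these to your own datasheet.
DIRECT_IDENTIFIERS = {"name", "nric", "email"}
SALT = "replace-with-a-secret-salt"

def pseudonymise(row: dict) -> dict:
    """Drop direct identifiers and add a stable, salted-hash pseudonym.

    The salted hash lets you link records belonging to the same
    participant without exposing who they are; keep the salt secret
    and separate from the shared data.
    """
    pseudonym = hashlib.sha256((SALT + row["name"]).encode()).hexdigest()[:12]
    cleaned = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["participant_id"] = pseudonym
    return cleaned

row = {"name": "Jane Tan", "nric": "S1234567A", "age": "34", "condition": "B"}
print(pseudonymise(row))
```

Note that hashing alone is reversible by brute force if the input space is small (e.g. identity-card numbers), which is why the secret salt matters and why truly sensitive fields are better removed outright.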
To wrap up the workshops, Sonia and Danny went through the list of questions (see below) which the participants had posed on Slido. We hope to incorporate as many of the questions as possible into our FAQ soon.
We hope that the workshop participants will continue to follow the best practices in sharing their datasets on DR-NTU (Data) as demonstrated by Sonia and Danny!
- How do you ensure data security? Is Dropbox a secure platform for intermediary data sharing (e.g. lab members enter data and update datasheets in Dropbox)?
- Some types of data are not stable over time (e.g. reproducibility issues when R packages become obsolete and scripts no longer run). What are some ways to circumvent these issues?
- How do we handle hardcopy data (e.g. paper questionnaires, consent and demographic forms with sensitive information)?
- Can the school repository be linked to OSF?
- Is it possible to create connections to other published studies or projects by international collaborators, or is the school repository limited to sharing between NTU researchers? How easy is it for international collaborators to use the system?
- Are there ways to make external storage devices safer? Currently my lab has a Synology hard drive where all data backup is done (i.e. we transfer data from thumbdrives or portable hard disks to the lab hard drive). It has a few layers of password protection, but I still worry about security.
- Much of the information required in the DMP was detailed in the IRB. Are they considered equivalent?
- How do we de-identify data? Are there any guidelines, especially if we have demographic info and various info about the experimental settings?
- 10-year retention for data: what about sensitive data? (fear of theft, etc.)
- Is it possible to make amendments to the DMP after submission (like how the IRB allows amendments)?
- Can we still use the repository after graduating?
- Is the repository safe from hackers? Are librarians/curators able to access all our research data?
- Can a DOI ever be eliminated, e.g. if you accidentally put up the wrong data and don't want it to remain there?
- If I have 200+ data files, is there a way to batch upload them, or is uploading them as .tar/.zip the only option?
- Can we have a dataset within a dataset?
- Is it possible for the data provider to delete the data after I have used and cited it in my paper?
- Can files be previewed inside a dataverse or a dataset, without downloading them, for integrity checks?
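On the batch-upload question above: the Dataverse native API accepts one file per request at its add-file endpoint, so a simple loop can upload many files without bundling them into a .tar/.zip. The sketch below assumes the standard `/api/datasets/:persistentId/add` endpoint, the third-party `requests` library, and a hypothetical server URL, DOI, and API token; check the limits of your own installation before scripting large uploads:

```python
import os

def add_file_url(server: str, doi: str) -> str:
    """Build the Dataverse native-API endpoint for adding a file to a dataset."""
    return f"{server}/api/datasets/:persistentId/add?persistentId={doi}"

def upload_folder(server: str, doi: str, api_token: str, folder: str) -> None:
    """Upload every regular file in `folder` to the dataset, one call per file."""
    import requests  # third-party: pip install requests
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as fh:
            resp = requests.post(
                add_file_url(server, doi),
                headers={"X-Dataverse-key": api_token},
                files={"file": (name, fh)},
            )
        resp.raise_for_status()

# Hypothetical values for illustration only:
print(add_file_url("https://demo.dataverse.org", "doi:10.5072/FK2/EXAMPLE"))
```

A call such as `upload_folder("https://demo.dataverse.org", "doi:10.5072/FK2/EXAMPLE", "your-api-token", "data/")` would then push each file in `data/` as a separate dataset file, keeping them individually downloadable and citable.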