NTU Data Management Plan Template

The NTU DMP template resides in RIMS (NTU Research Information Management System) and comes with 10 questions. A guide and a couple of samples are provided for each question.

There are minor updates to the NTU DMP template in RIMS from 15 Jan 2018 onwards. Please refer to this page for information about the earlier version as well as a comparison between the two versions.

The following is an off-RIMS compilation of the NTU DMP template questions, guides and samples:

Types and Size of data

 

a. What data will you be collecting or creating?

b. What is the estimated size of the project data?

Guide for a.:

 

  • Describe type of data e.g. quantitative, qualitative, survey data, experimental measurements, models, images, audio-visual data, samples etc.
  • Describe format of data e.g. text, numeric, audio-visual, models, computer code, discipline-specific, instrument-specific.
  • Describe data collection method e.g. observational, experimental, simulation, derived/compiled.
  • Indicate which data are of long-term value and should be shared and/or preserved.
  • Are there any existing data or methods that you can use? This could include data from earlier projects or third-party sources. Provide the title, author, date, URLs/name of these sources. Do you need to pay to reuse existing data? If purchasing or reusing existing data sources, explain how issues such as copyright and IPR have been addressed.
  • Consider how your data could complement and integrate with existing data.

Additional Information:

  • You may refer to Re3data for a list of data repositories where you might find existing relevant third-party research data.

 

Guide for b.:

  • Do you have sufficient storage or should you include costs for more?
  • Will the scale of the data pose challenges when sharing or transferring data between sites?
  • Have you consulted with a data repository to determine preservation costs?
  • Consider the implications of data volumes in terms of storage, backup and access.
  • Consider how the data volume will grow to make sure any additional storage and technical support required can be met.
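The sizing questions above come down to simple arithmetic. The following Python sketch is illustrative only: the function name, its parameters and the example figures (file counts, file sizes, number of copies) are assumptions and are not part of the NTU template.

```python
def estimate_storage_gb(files_per_month, avg_file_mb, months,
                        copies=3, monthly_growth=0.0):
    """Rough total storage in GB for a project, including backup copies.

    monthly_growth is the fractional increase in files collected each month
    (0.05 means 5% more files than in the previous month).
    """
    total_mb = 0.0
    files = float(files_per_month)
    for _ in range(months):
        total_mb += files * avg_file_mb
        files *= 1.0 + monthly_growth
    # Multiply by the number of copies kept, then convert MB to GB.
    return total_mb * copies / 1024.0


if __name__ == "__main__":
    # Hypothetical project: 200 image files a month at 25 MB each,
    # collected for 36 months and kept in 3 copies.
    print(f"{estimate_storage_gb(200, 25, 36):.1f} GB")
```

Re-running the estimate with a non-zero monthly_growth gives a quick view of how much additional storage may be needed as collection scales up.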

 

 

SAMPLE 1:

Class observation data, faculty interview data and student survey data will be collected. The data will be collected during the research period (Jan 2013 – Dec 2013). Most of the data will be in text format (notes, paper survey).

(Adapted from: Cmor, D., & Marshall, V. (2006). Librarian Class Attendance: Methods, Outcomes and Opportunities. 27th Annual IATUL Conference.)

SAMPLE 2:

Experimental and observational data in physical paper format will be collected. These are data related to production and decomposition, ecophysiological functional traits, soil extractable nutrients and mineralization rates.
As these original data in physical paper format will be used to identify outliers and possible transcription errors, the physical paper copies will be kept for at least 10 years.

(Adapted from: Cleland, E., Lipson, D., & Kim, J. The influence of plant functional types on ecosystem responses to altered rainfall. Retrieved Nov 24, 2015, from UC San Diego Sample NSF Data Management Plans website: http://libraries.ucsd.edu/services/data-curation/data-management/dmpsample/DMP-Example-Cleland.pdf)

SAMPLE 3:

Experimental lab data will be collected using a microscope. The data generated will be time- and location-stamped image files of natural resources in Delaware County, PA. The images will serve as a record of the occurrence of creatures, natural artefacts, and conditions at specific places and times during the period 2003 through 2011.
For many of the photos, taxonomic information and metadata will also be available. The occurrence data will be observational and qualitative. Metadata files shall be retained to facilitate reuse.

(Adapted from: Hampton, S. Examples of Data Management Plans. Retrieved Nov 24, 2015, from DataOne website: https://www.dataone.org/sites/all/documents/ESA11_SS3_hampton.pdf)

SAMPLE 4:

Recorded oral interviews from 30 residents will be collected at the Nnindye community located in the Mpigi district in Uganda over a period of 6 months in the form of photos and videos.

(Adapted from:  Sapp Nelson, Megan and Beavis, Katherine (2013) “History / Sustainable Development – Purdue University,” Data Curation Profiles Directory: Vol. 5, Article 1. http://dx.doi.org/10.7771/2326-6651.1032 )

SAMPLE 5:

Primarily public data from 2000 to 2015 will be acquired from the US Census Bureau. Some preliminary (non-public) Census data, and data from other sources, e.g. the US Bureau of Labor Statistics and the New York State Dept of Health, will also be purchased and gathered.

(Adapted from:  Jenkins, Keith (2012) “Sociology / Demographics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 6. http://dx.doi.org/10.5703/1288284315013)

SAMPLE 6:

Primary data consisting of audio files in the Cheyenne and English languages will be collected. Text files will be generated after the audio files are transcribed.

(Adapted from: Tancheva, Kornelia (2012) “Linguistics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 7. http://dx.doi.org/10.5703/1288284315007)

SAMPLE 7:

Sensor data, images and possibly 3rd party data (weather and road conditions) will be collected. Data will be saved as Excel spreadsheets and in an SQL database.

(Adapted from:  Carlson, Jake R. (2009) “Traffic Flow – Purdue University,” Data Curation Profiles Directory: Vol. 1, Article 4. http://dx.doi.org/10.5703/1288284315016)

SAMPLE 8:

Experimental data will be generated from pressure sensors using LabVIEW and from chromatographs. They include a variety of files, including text and video, specific to the equipment involved.

(Adapted from:  Kashyap, Nabil (2011) “Aerospace Engineering / Chemical Kinetics – University of Michigan,” Data Curation Profiles Directory: Vol. 3, Article 1. http://dx.doi.org/10.5703/1288284314989)

SAMPLE 9:

Field data from surveys and bioassays will be collected using an Excel spreadsheet. Raw data of samples from the lab will be collected using a proprietary instrument. Ancillary data will include GIS data.

(Adapted from:  Wright, Sarah J. (2012) “Environmental Science / Herbivory – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 3. http://dx.doi.org/10.5703/1288284315002)

SAMPLE 10:

Quantitative data will be collected using a motion capture system. The processed data types will include Matlab files, MS Excel files, codebook texts, and graphical files.

(Adapted from:  Cragin, Melissa; Kogan, Marina; and Collie, Aaron (2011) “Bio-Mechanics Motion Studies – University of Illinois Urbana-Champaign,” Data Curation Profiles Directory: Vol. 3, Article 6.
http://dx.doi.org/10.5703/1288284314998)

 

Collection Methods and Processing of Data

How will the data be collected and processed?

  • Describe data collection method, e.g. observational, experimental, simulation, derived/compiled.
  • Describe the methods and standards that you will adopt to ensure quality data. This may include processes such as calibration, repeat samples or measurements, standardised data capture, data entry validation, peer review of data or representation with controlled vocabularies.
  • Describe how the data will be organised in your research project, e.g. naming conventions, version control, folder structures and any community data standards that will be used.
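One common quality check of the kind the guide above mentions is to flag values that fall far outside the bulk of the data, so that they can be compared against the original records. The Python sketch below uses Tukey's interquartile-range fences; the function name, the multiplier k = 1.5 and the sample readings are assumptions chosen for illustration, not a method prescribed by the template.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    # Made-up readings; the last one looks like a possible transcription error.
    readings = [10, 11, 12, 11, 10, 95]
    print(flag_outliers(readings))
```

Flagged values are candidates for review against the source (for example, the paper record), not values to be deleted automatically.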

Additional Information:

 

SAMPLE 1:

Most datasets will be collected 1-3 times per year for a period of 3 years. Temperature, light availability and soil moisture at multiple depths in the experiment will be logged every 15 minutes. These data will be stored on local data loggers and downloaded every two weeks.

Data originally recorded on paper will be transferred into spreadsheets using .csv formats. DGVM simulation runs will be performed on a high performance parallel computing platform, a 96-node Linux cluster, maintained jointly by USFS Pacific Northwest Research Station and Oregon State University. DGVM output will be analysed and displayed with the ESRI ArcGIS software suite. To ensure data quality, data will be checked for outliers in the R statistical program, and any outliers will be checked for transcription errors.

As the data will be generated, processed and analysed by different project team members, I will recommend that team members name each data file using their initials, the date and a version number, e.g. LGH_20150801_v1.

(Adapted from: Cleland, E., Lipson, D., & Kim, J. The influence of plant functional types on ecosystem responses to altered rainfall. Retrieved Nov 24, 2015, from UC San Diego Sample NSF Data Management Plans website: http://libraries.ucsd.edu/services/data-curation/data-management/dmpsample/DMP-Example-Cleland.pdf)
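A naming convention such as the one in this sample (initials, date, version) is easier to follow consistently when names are generated rather than typed. The helper below is a minimal Python sketch; the function name and the validation rules are assumptions, and only the output pattern comes from the sample.

```python
from datetime import date


def data_file_name(initials, on, version):
    """Build a name of the form INITIALS_YYYYMMDD_vN, e.g. LGH_20150801_v1."""
    if not initials.isalpha():
        raise ValueError("initials must contain letters only")
    if version < 1:
        raise ValueError("version numbers start at 1")
    return f"{initials.upper()}_{on:%Y%m%d}_v{version}"


if __name__ == "__main__":
    print(data_file_name("LGH", date(2015, 8, 1), 1))
```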

SAMPLE 2:

Interviews will be recorded using digital recorders. The interview recordings will be transcribed and then translated. Both transcripts and translations will be saved as Microsoft Word documents. There will be two Microsoft Word documents for each interview: one in the original Luganda language and the other translated into English. The English translation will be coded using ethnographic software.

The raw data will all be stored in a folder titled “Raw data_YYYYMMDD”; the processed or analysed data will be kept in separate folders by data type, e.g. all audio recordings will be saved in one folder and video recordings will be stored in another. We will be using the following file-naming convention for each data file and folder:

      • data file name: Subject_v1 (e.g. interview_v1)
      • folder name: datatype_v1_YYYYMMDD (e.g. audiorecordings_v1_20151120)

(Adapted from: Sapp Nelson, Megan and Beavis, Katherine (2013) “History / Sustainable Development – Purdue University,” Data Curation Profiles Directory: Vol. 5, Article 1. http://dx.doi.org/10.7771/2326-6651.1032)

SAMPLE 3:

New data will be appended to existing time series in the MS SQL database. Aggregation of the data to state economic regions will be done to generate reports based on regions. Estimates/Projections will be calculated and reported. Website will be provided for users to view charts, maps, and tables that are dynamically created via an automated process that pulls data directly from the MS SQL database.

JISC has provided a guide on choosing a file name. We will name our data files based on the recommendations available on this website: http://www.jiscdigitalmedia.ac.uk/guide/choosing-a-file-name. All data files will be stored in different folders organised by researchers’ initials and date.

(Adapted from: Jenkins, Keith (2012) “Sociology / Demographics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 6. http://dx.doi.org/10.5703/1288284315013)

SAMPLE 4:

The raw audio files will be normalized and cleaned up, then transcribed using transcription software, ideally ELAN. The audio and the transcription will be synchronized. New audio recordings will be added each year throughout the project timeline (2015 – 2020).

The data will be organised and stored in different folders with the following file-naming convention: Subjectkeyword_V2_YYYYMMDD.

(Adapted from: Tancheva, Kornelia (2012) “Linguistics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 7.
http://dx.doi.org/10.5703/1288284315007)

SAMPLE 5:

The experiment will capture videos of the 200ms-long process and physical samples of the mixture at different stages of the process. Samples will be separated by chromatography machines.

The data will be analysed, which involves generating proprietary files for the processing software and convenient printable formats, for example Excel spreadsheets or PDF files, for manually examining the data. The pressure trace graphs and chromatographs will be the focus of analysis. Chromatograms will be interpreted using Clarity software. Some graphs, such as Arrhenius plots and concentration plots, will be generated using Origin software. The video from the experiment will be used primarily for verification that the experiment ran correctly. Video stills will be generated from the video files and merged with some graphs using Photoshop.

Data cleansing (e.g. removing outliers, missing data interpolation) will be performed to improve the data quality. Data quality will also be ensured by repeated samples.

We will store all the data in a shared drive and will name each file by the following file-naming convention:

      • 20140603_MAEProject_DesignDocument_Tan_v2-01.docx
      • 20140809_MAEProject_MasterData_Daniel_v1-00.xlsx
      • 20140825_MAEProject_Ex1Test1_Data_Jason_v3-03.xlsx
      • 20141023_MAEProject_ProjectMeetingNotes_Kumar_v1-00.docx
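File names that follow a convention like the one listed above can be checked automatically before files are copied to the shared drive. The Python sketch below infers a pattern from the four example names (date, project, description, person, version, extension); the regular expression and the function name are assumptions made for illustration, since the sample does not spell out formal rules.

```python
import re
from datetime import datetime

# Pattern inferred from the examples: YYYYMMDD_Project_Description_Person_vM-mm.ext
# The description may itself contain underscores (e.g. "Ex1Test1_Data").
PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<project>[A-Za-z0-9]+)_(?P<description>.+)"
    r"_(?P<person>[A-Za-z]+)_v(?P<major>\d+)-(?P<minor>\d{2})\.(?P<ext>\w+)$"
)


def parse_file_name(name):
    """Return the name's parts as a dict, or None if it breaks the convention."""
    match = PATTERN.match(name)
    if match is None:
        return None
    try:
        # Reject impossible dates such as month 13.
        datetime.strptime(match["date"], "%Y%m%d")
    except ValueError:
        return None
    return match.groupdict()


if __name__ == "__main__":
    for name in ("20140825_MAEProject_Ex1Test1_Data_Jason_v3-03.xlsx", "notes.docx"):
        print(name, "->", parse_file_name(name))
```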

(Adapted from:

SAMPLE 6:

Data will be generated by subjecting plant samples to analysis using coupled Gas Chromatography-Mass Spectrometry (GC-MS). The data will then be analysed using the instrument-specific proprietary software to measure the area underneath the peaks for specific known Volatile Organic Compounds (VOCs). The peak area data will be entered into an Excel spreadsheet along with the field survey data. Statistical analysis of the data will be performed using StatView to prepare the tables and graphs for the research.

All data columns that refer to Master Data will be checked for consistency to ensure quality. Analytical data quality will be tested using appropriate tests.

We have not decided on how the data files will be organised yet. However, we will follow the file naming conventions recommended by the Stanford University Libraries (http://library.stanford.edu/research/data-management-services/data-best-practices/best-practices-file-naming) to name our data files.

(Adapted from: Wright, Sarah J. (2012) “Environmental Science / Herbivory – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 3. http://dx.doi.org/10.5703/1288284315002)

SAMPLE 7:

Motion capture markers of the system will be attached to various parts of the body, usually the joints. The data will be moved to Excel for automated filtering to remove errors and noise that occur due to the system being sensitive to light (e.g. reflections) and motion marker occlusion. More automatic threshold-based filtering will be carried out along with visual review of the data and manual cleaning. This process will take place in Matlab and the data will eventually be converted to represent several variables (e.g. angle data, displacement velocity, or acceleration of joint segments). The data will then be aggregated across subjects and will be stored in an Excel spreadsheet.

The precise placement of markers is very important for the quality of the data and its reliability. About 40 markers on each subject will be used.

The data will be organised through a file folder system where each trial will be documented in a single spreadsheet, and all the files from a particular study will be stored in the same folder structure.

(Adapted from: Cragin, Melissa; Kogan, Marina; and Collie, Aaron (2011) “Bio-Mechanics Motion Studies – University of Illinois Urbana-Champaign,” Data Curation Profiles Directory: Vol. 3, Article 6. http://dx.doi.org/10.5703/1288284314998)

SAMPLE 8:

Traffic flow data will be collected using sensors and video cameras. The road sensors placed in each lane of traffic will record the status of the intersection (i.e. whether the light is red, yellow, or green). Data from the sensors will be sent out via FTP on an hourly basis as compressed files. Data will be processed, normalized and reformatted from the vendor’s proprietary format into .csv and then into Microsoft Excel. Video of the traffic sites will be taken for data verification purposes and to ensure quality. The video gathered will be parsed out into .gif or .jpg images at the rate of 20 frames per second.

The data files will be primarily organized by date.

(Adapted from: Carlson, Jake R. (2009) “Traffic Flow – Purdue University,” Data Curation Profiles Directory: Vol. 1, Article 4. http://dx.doi.org/10.5703/1288284315016)

File Formats & Software/Tools

 

a. Check the relevant file format(s) that you will be using (you may choose more than one):

b. What software(s) and/or tool(s) is/are needed to process/read the file(s)?

c. Where can this/these software(s) and/or tool(s) be obtained?

  • File formats affect one’s ability to use and re-use data in the future.
  • Strive to use a data format that is easy to read and easy to manipulate in a variety of commonly-used operating systems and programs.
  • Non-proprietary (‘open’) formats are also recommended to enhance accessibility.
  • For specialized data formats, provide information on the name, supplier information (if applicable) and version number to obtain the software(s) and/or tools to read your data.

Additional Information:

Confidentiality, Privacy & Security of Data

 

 

If your data is sensitive, how will you be managing and using it?

  • Sensitive data are data that can be used to identify an individual, species, object, or location that introduces a risk of discrimination, harm, or unwanted attention. (Source: Australian National Data Service)
  • If your data is sensitive, state appropriate security measures that you will be taking.
  • Consider how you will protect the identity of participants, e.g., via anonymisation or using managed access procedures.
  • Describe the process of providing security to the data and files from unauthorized access or security breaches.
  • Investigators carrying out research involving human participants should request consent to preserve and share the data. Do not just ask for permission to use the data in your study or make unnecessary promises to delete it at the end.

Additional Information:

SAMPLE 1:

I have sensitive data as it will contain personal data.

The research will include data from subjects being screened for STDs. The final dataset will include self-reported demographic and behavioural data from interviews and laboratory data from urine specimens. Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers, there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and documentation available only under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate technology; and (3) a commitment to destroying or returning the data after analyses are completed.

(Adapted from: NIH Data Sharing Policy and Implementation Guidance. (9 February 2012), from http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm#ex)

SAMPLE 2:

I have sensitive data as it is national security related.

Access to research records will be limited to primary research team members. Recorded data will have any identifying information removed and will be relabelled with study code numbers. A database which relates study code numbers to consent forms and identifying information will be stored separately on password-protected computers in a secured, locked office. To maintain the privacy of the participants, any report of individual data will only consist of performance measures without any demographic or identifying information.

(Adapted from: Collaborative Research in Computational Neuroscience (CRCNS): Innovative Approaches to Science and Engineering Research on Brain Function. Retrieved Nov 24, 2015, from UC San Diego Sample NSF Data Management Plans website: http://libraries.ucsd.edu/services/data-curation/data-management/dmpsample/DMP-Example-Psych.doc)
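The approach in this sample, relabelling records with study code numbers and keeping the linking database separately, can be sketched in Python as follows. The function name, the field names ("name", "email") and the code format ("P0001") are hypothetical; a real project would also need to consider indirect identifiers, which this sketch does not handle.

```python
def pseudonymise(records, id_fields=("name", "email")):
    """Split records into de-identified rows and a separate linking key.

    Returns (rows, key): rows carry a study code in place of identifying
    fields, and key maps each study code back to those fields. The key is
    meant to be stored separately under stricter access control.
    """
    rows, key = [], {}
    for n, record in enumerate(records, start=1):
        code = f"P{n:04d}"
        key[code] = {f: record[f] for f in id_fields if f in record}
        row = {f: v for f, v in record.items() if f not in id_fields}
        row["study_code"] = code
        rows.append(row)
    return rows, key


if __name__ == "__main__":
    # Made-up participant record for illustration.
    participants = [{"name": "A. Tan", "email": "a.tan@example.org", "score": 7}]
    rows, key = pseudonymise(participants)
    print(rows)
    print(key)
```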

Access & Usage Restrictions

 

Will there be restrictions on accessing and sharing your final research data?

  • Is the data you propose to collect (or existing data you propose to use) in the study suitable for sharing? Consider copyright ownership, consent agreement from subjects, data sharing agreements or any other agreements with external collaborators and parties, e.g. non-disclosure or proprietary use of the data. For multi-partner projects, IPR ownership should be covered in the consortium agreement.
  • If you are unable to make your final research data available to others, you must state the reasons, e.g. patentable data, etc.
  • According to the NTU Research Data Policy, the final research data from projects carried out at NTU shall be made available for sharing.

Additional Information:

  • A licence is a document that clearly sets out how the data can be used and attributed to the original data owner.
  • Without a licence, it is unclear how your data can be reused and this may discourage the potential re-user.
  • The final research data is the final version of data that exists during the last stage in the data lifecycle in which all re-workings and manipulations of the data by the researcher have ceased.
  • The final research data also refers to the recorded factual materials commonly accepted by the scientific community as necessary to document, support, and validate research findings. Final research data does not include laboratory notebooks, partial datasets, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as gels or laboratory.
  • Visit the ‘How to License Research Data’ in the Digital Curation Centre website to learn more about the why and how of research data licensing. AusGOAL (Australian Governments Open Access and Licensing Framework) also offers a good research data FAQ.
  • Types of Creative Commons licenses.
  • See EUDAT’s data and software licensing wizard.

SAMPLE 1:

I will share my final data under the CC-BY-NC Creative Commons (CC) license.

SAMPLE 2:

I will not be applying any Creative Commons license but will instead be imposing the following restriction on the sharing of my final data: no open sharing; data will be shared on a private, individual basis only.

My reasons are: There are certain terms in the agreement that I signed with a third party that do not allow me to openly share some of my data. Anyone who is interested in my data can write to me at abd@yahoo.com and I will see what I can share based on his/her needs.

SAMPLE 3:

I will not be able to share my final data.

My reasons are: Even with the removal of all identifiers, we believe that it would be difficult if not impossible to protect the identities of subjects given the physical characteristics of subjects, the type of clinical data (including imaging) that we will be collecting, and the relatively restricted area from which we are recruiting subjects. Therefore, we are not planning to share the data.

 

Metadata & Standards

What metadata and/or data standards will you be using to describe your data?

  • The term metadata is commonly defined as “data about data,” information that describes or contextualises the data.
  • Metadata helps to place your dataset in a broader context, allowing those outside your institution, discipline, or software environment to understand how to interpret your data. (Source: MANTRA)
  • You are strongly encouraged to use community standards to describe and structure data, where these are in place. The Digital Curation Centre (DCC) offers a catalogue of disciplinary metadata standards.
  • If you are using a specific metadata scheme or standard, please state what it is and provide the references.
  • If you are not using a specific metadata scheme or standard, describe the type of metadata (e.g. descriptive, structural, administrative, etc.) you will be providing, if any.

Additional Information:

  • Three broad categories of metadata are:
    • Descriptive – common fields such as title, author, abstract, keywords which help users to discover online sources through searching and browsing.
    • Administrative – preservation, rights management, and technical metadata about formats.
    • Structural – how different components of a set of associated data relate to one another, such as a schema describing relations between tables in a database, variable list, directory and file listing and taxonomy.
  • The difference between documentation (refer to DMP question 7) and metadata is that the first is meant to be read by humans and the second implies computer-processing (though metadata may also be human-readable).
  • Metadata may not be required if you are working alone on your own computer, but becomes crucial when data are shared online. Your data management plan should determine whether you need to apply metadata descriptors or tags at some point during your project.

SAMPLE 1:

I will not be using any metadata or international standard for the data collected and generated for this project. However, I will ensure that each document I create using Microsoft Word, Microsoft Excel and Microsoft PowerPoint has sufficient basic information, such as Author’s name, Title, Subject and Keywords, in the document properties. In addition, a separate readme file will be prepared to describe the details of each dataset. I will be applying the recommendations provided by Cornell University at http://data.research.cornell.edu/content/readme in the creation of the readme file(s). Key elements could include: introductory information about the data, and methodological, date-specific and sharing/access related information.

SAMPLE 2:

The clinical data collected from this project will be documented using CDASH v1.1 standards. The standard is available at CDISC website at http://www.cdisc.org/cdash.

SAMPLE 3:

Using an electronic lab notebook, we would be generating metadata along with each notebook and postings. The metadata would include Sections, Categories and Keys which would be assigned by collaborators for reuse so as to maintain consistency in the use of terminology. We would also be using the Properties Ontology (ChemAxiomProp) when describing the chemical and materials properties.

SAMPLE 4:

Metadata about timing and exposure of individual images will be automatically generated by the camera. GPS locations will subsequently be added by post-processing GPS track data based on shared time stamps. Metadata for the image dataset as a whole will be generated by the image management software (iMatch) and will include time ranges, locations, and a taxon list. Those metadata will be translated into Ecological Metadata Language (EML), created using the Morpho software tool, and will include location and taxonomic summaries.

(Adapted from: Hampton, S. Examples of Data Management Plans. Retrieved  Nov 24, 2015, from DataOne website: https://www.dataone.org/sites/all/documents/ESA11_SS3_hampton.pdf)

SAMPLE 5:

We will be using some core elements from the TEI metadata standards http://www.tei-c.org/index.xml to describe our data. We will also be adding some customised elements in the metadata to provide more details on the rights management.

SAMPLE 6:

The data will be stored in several tables in an MS SQL database, which also includes some “metatables” that describe the original source of various tables and variables. These metatables will also include configuration information for the public website, such as short and long names for variables, numeric format, colours for mapping, etc.  Several standard Census variables, ref: Office of National Statistics https://www.ons.gov.uk/ will also be used.

(Adapted from: Jenkins, Keith (2012) “Sociology / Demographics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 6. http://dx.doi.org/10.5703/1288284315013)

Data Documentation

What documentation will you be providing to facilitate a better understanding of the project data?

  • List the documentation that you would be providing to explain how the data is to be interpreted and used. Examples of documentation:
    • codebooks
    • lay summary
    • readme.txt
    • webpage
    • electronic lab notebook
  • Content of documentation could include the following:
    • Methodology and procedures used to collect the data
    • Details about codes
    • Definitions of variables
    • Variable field locations
    • Frequencies

Additional Information:

  • Visit the ‘Document your data’ by UK Data Service for more guidance on the different types of documentation.
  • Visit the ‘How to write a lay summary’ by Digital Curation Centre for more guidance on why lay summaries are important, how to write one and examples.

SAMPLE 1:

I would be providing the following accompanying documentation to facilitate a better understanding of the project data.

  • lay summary
  • readme.txt

others: I will also be writing a journal article to share the research data management aspect of my project. The paper would be made available on DR-NTU later.

Data Storing 

Where and how are you storing the data during the project?

  • List the platforms and devices that will be used to store the data, e.g. electronic lab notebook, Sharepoint, WordPress.
  • Consider institutional data security policies, e.g. NTU Information Security Policies & Recommendations at http://www.ntu.edu.sg/cits/securityregulations/Pages/default.aspx.
  • Identify the location where the data would be stored, e.g. school server. Provide the URLs of online locations.
  • Provide names of people performing data storage roles.

Additional Information:

SAMPLE 1:

I will be using a networked storage drive XXX, which provides storage for active data for all research staff and students. It is fully backed-up, secure, resilient, and has multi-site storage. It is accessible via VPN (Virtual Private Network) from outside the University. I will also be using external storage devices such as hard drives / USB flash drives (also known as memory sticks, USB keyrings or pen drives) / Compact Discs (CDs) / Digital Video Discs (DVDs). Researcher ABC will coordinate and be in overall charge of data storage.

SAMPLE 2:

The data will be stored locally on a secure password-protected data server. One set of hard drives and one set of tapes will be stored in XXX building. A second set of hard drives and a second set of tapes will be stored at XXX building. All data will be backed up on a daily basis by XXX (researcher).

SAMPLE 3:

The data (on staff computers and the web server) will be managed according to the standard practices of the college’s IT department and will be password protected. Any restricted, non-public data will be stored on CRADC (Cornell Restricted Access Data Center). All files will be backed up every day by xxx (project team member).

 

Backup & Versioning Control

 

What backup and versioning control procedures will you be undertaking?

  • Describe the backup and archiving regime you will use to back up all your data to prevent its loss, e.g. through hard disk failure, virus infection or theft.
  • Describe the method you will use to ensure that different versions of your data are identifiable and properly controlled and used.
  • How will the data be backed up, i.e. how often, to where, in how many copies, and is the process automated?
  • Provide names of the people performing data backup roles.

Additional Information:

  • Storing data on laptops, computer hard drives or external storage devices alone is very risky. The use of robust, managed storage with automatic backup, for example that provided by university IT team, is preferable.
  • See ‘Backing-up Data’ in UK Data Archive for more tips.
  • See ‘Storage and Security’ in MANTRA for more guidance.
  • 3-2-1 principle of backup (Source: BACKBLAZE)
    • Keep 3 copies of any important file
    • Store files on at least 2 different media types
    • Keep at least 1 copy offsite
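The 3-2-1 principle listed above can be expressed as a simple check over an inventory of copies. The Python sketch below is illustrative only: the Copy class, its fields and the example locations are assumptions, and the check says nothing about whether the copies are current or restorable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str   # where the copy lives, e.g. "lab workstation"
    media: str      # media type, e.g. "hard drive", "tape", "cloud"
    offsite: bool   # True if stored away from the primary site


def meets_3_2_1(copies):
    """True if there are at least 3 copies, on at least 2 media types, at least 1 offsite."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


if __name__ == "__main__":
    # Hypothetical inventory for one dataset.
    inventory = [
        Copy("lab workstation", "hard drive", False),
        Copy("school server", "hard drive", False),
        Copy("institutional cloud storage", "cloud", True),
    ]
    print(meets_3_2_1(inventory))
```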

     

SAMPLE 1:

A complete copy of materials will be generated and stored independently on primary and backup sources for both the PI and Co-PI (as data are generated) and with all members of the Expert Panel every 6 months. The project team will be adopting the Version Control guidelines (http://www.nidcr.nih.gov/Research/ToolsforResearchers/Toolkit/VersionControlGuidelines.htm) provided by the National Institute of Dental and Craniofacial Research to organise the data and ensure that different versions are identifiable and properly controlled and used.

SAMPLE 2:

We will adopt the version control standards recommended by the University of Leicester (https://www2.le.ac.uk/services/research-data/documents/UoL_VersionControlChart_d0-1.pdf) for the interview transcripts and coding, to track the changes the research team makes to the files.

SAMPLE 3:

We will be using Mercurial (https://www.mercurial-scm.org/), a free, distributed source control management tool, to manage the data so that different versions are easily identifiable and properly controlled and used.

SAMPLE 4:

All data will be backed up manually on a monthly basis by researcher xxx on a computer hard drive kept at the research team office. The computer will be password protected and only team members will be given the password and the right to access the computer. Incremental back-ups will be performed nightly and full back-ups will be performed monthly. Staff xxxx will keep versions by appending the date of the update to the file name. Versions of the file that have been revised due to errors/updates will be retained in an archive system. A revision history document will describe the revisions made.

(Adapted from: NSF General: Mauna Loa example. Retrieved from Data Management Planning website: https://www.dataone.org/sites/all/documents/DMP_MaunaLoa_Formatted.pdf)
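
The versioning approach in Sample 4 (appending the date of the update to the file name, retaining revised versions in an archive, and keeping a revision history document) could be sketched in Python as follows. The function names `archive_version` and `log_revision`, the `_YYYYMMDD` suffix and the file names are assumptions made for this example; the sample itself does not prescribe them.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def archive_version(source: Path, archive_dir: Path, when: date) -> Path:
    """Copy `source` into `archive_dir` with the update date appended to its name."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    versioned = archive_dir / f"{source.stem}_{when:%Y%m%d}{source.suffix}"
    if versioned.exists():
        # Never overwrite an archived version silently.
        raise FileExistsError(f"{versioned} is already archived")
    shutil.copy2(source, versioned)
    return versioned


def log_revision(history: Path, versioned: Path, note: str) -> None:
    """Append one line describing the revision to the history document."""
    with history.open("a", encoding="utf-8") as fh:
        fh.write(f"{versioned.name}\t{note}\n")


# Demonstration in a temporary folder (all names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    data = work / "survey.csv"
    data.write_text("id,score\n1,42\n", encoding="utf-8")
    saved = archive_version(data, work / "archive", date(2018, 1, 15))
    log_revision(work / "archive" / "REVISIONS.txt", saved, "initial release")
    print(saved.name)  # survey_20180115.csv
```

Writing the date as year, month, day keeps archived versions in chronological order when the folder is sorted by name.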

Long-term Storage & Preservation

 

 

a. NTU Research Data Policy requires you to retain your research data for a minimum of 10 years. Where will you be depositing the data after the completion of your research project? (You may choose more than one)

b. Is there any data that will not be deposited in any of the data repositories mentioned in question 10a?

Guide for 10a and b:

An open access data repository must be actively managed in order to:

    1. enable access to the dataset
    2. ensure dataset persistence
    3. ensure dataset stability
    4. enable searching and retrieval of datasets
    5. collect information about repository statistics

(Source: Callaghan, S., Tedds, J., Kunze, J., et al. (2014). Guidelines on recommending data repositories as partners in publishing research data. International Journal of Digital Curation, 9(1), 152-163. doi:10.2218/ijdc.v9i1.309)

  • The NTU open access data repository, DR-NTU (Data), will store and preserve the final version of your research data for long-term access.
  • Preservation of digital outputs is necessary in order for the research data to endure changes in the technological environment and remain potentially re-usable in the future.
  • NTU researchers should deposit their final research data in DR-NTU (Data). Where DR-NTU (Data) does not meet your requirements, you are required to submit information about the data that is deposited elsewhere to DR-NTU (Data). DR-NTU (Data) will then provide a link to the external data repository(ies).

 

Additional Information for 10a and b:

 

 

FAQ

1. What is a Data Management Plan?

A Data Management Plan (DMP) outlines the steps you intend to take in managing the data you collect or generate over the entire course of your research project. It provides information on what the data is about, how it will be processed, the security measures to be taken, where and when it will be stored and preserved, and who can access and use the data.

2. Why do I need to submit a DMP?

 

The NTU Research Data Policy requires all research proposals to include a data management plan (DMP). A DMP helps the Principal Investigator (PI) and their team start planning for the necessary resources, including an effective and efficient approach to managing their research data. Proper management of research data is necessary to ensure that the integrity of research carried out in NTU is beyond reproach.

3. What are the benefits of having a DMP?

 

A systematic approach to planning the lifecycle of your data will alert you to possible data collection and management issues so that you can prepare for them before the project begins. In addition, it will help in the continuity of your research project when staff leave or new staff start work. Importantly, adherence to data management processes will ensure that the evidence of your research is properly recorded, kept and available when there are concerns about research integrity.

4. Why do I need to share my research data?

 

 

The general expectation is that publicly funded research data is a public good and should be made openly available with as few restrictions as possible. Many research funding agencies worldwide require their grant recipients to share their research data. Examples include the National Science Foundation and National Institutes of Health in the US (see US list by University of Minnesota Libraries); the Engineering and Physical Sciences Research Council and Wellcome Trust in the UK (see UK list by Digital Curation Centre); as well as the National Health and Medical Research Council and Australian Research Council in Australia (see Australian list by Australian National Data Service).

In Singapore, the National Medical Research Council (NMRC) of the Ministry of Health has indicated a possible upcoming requirement for recipients of new grant projects of S$250,000 and above to share their research data no later than 12 months after publication of a paper that has used the dataset.

NTU researchers are required to share their research data where possible according to the NTU Research Data Policy.

5. What are the benefits of sharing my data?

There are many possible benefits of sharing data. Making your research data easily discoverable and accessible could encourage greater collaboration with people within as well as outside your discipline(s). It could also help reduce unintentional redundancies in replicating past research efforts. Sharing the data underpinning your research publications could increase visibility of your research publications and academic profile. Other researchers could cite your data so you gain credit.

 

6. I am working with sensitive data. Can I be exempted from filing a DMP?

 

No. You can still complete a DMP even when your project data is of a sensitive nature. You do not need to disclose any sensitive aspects of your project data when writing a DMP. The sample text for the relevant question in the NTU DMP template may give you some idea of what other researchers say in their DMPs about the management of sensitive data.

7. When do I need to submit a DMP?

According to the NTU Research Data Policy, you are required to submit your DMP within the first 3 months of the approval of your project grant. However, the online system will ask you to submit within 2 weeks so as not to delay the opening of the research funding account for your project.

8. When do I need to submit a new version of my DMP?

You are strongly recommended to submit a new version of your DMP whenever there is any significant change to the data specified earlier or to how the data will be managed.

9. What does the format of a completed DMP look like?

The format looks like this.

10. Who can I contact if I need help with answering the questions in the DMP or with data sharing?

Please contact a scholarly communication librarian at library@ntu.edu.sg if you need help with answering the questions in the DMP or with data sharing. We would also like to invite you to attend an NTU DMP Writing Workshop. Click here to register.

For RIMS IT support, please send your enquiry to RIMS_SUPPORT@ntu.edu.sg.
