NTU Data Management Plan Template

The NTU DMP template consists of 10 questions. Guidance and samples are provided for each question.

Types, Formats & Collection Methods of Data

What data will you be collecting and how?

  • Describe type of data e.g. quantitative, qualitative, survey data, experimental measurements, models, images, audio-visual data, samples etc.
  • Describe format of data e.g. text, numeric, audio-visual, models, computer code, discipline-specific, instrument-specific.
  • Describe data collection method e.g. observational, experimental, simulation, derived/compiled.
  • How stable is the data? Does the data ever change or grow?
  • Are there any existing data that you will be using? This could include data from earlier projects or third-party sources. Provide the title, author, date, URLs/name of these sources. Do you need to pay to reuse existing data? Are there any restrictions (copyright or licensing) on the reuse of third-party data?

Additional Information:

  • You may refer to Re3data for a list of data repositories where you might find existing relevant third-party research data.

 

SAMPLE 1:

Class observation data, faculty interview data and student survey data will be collected.  Observations will be diarised and interviews will be recorded, and then each will be coded using Ethnograph. Survey data will be captured using Qualtrics. The data will be collected during the research period (Jan 2013 – Dec 2013).

(Adapted from: Cmor, D., & Marshall, V. (2006). Librarian Class Attendance: Methods, Outcomes and Opportunities. 27th Annual IATUL Conference.)

SAMPLE 2:

Datasets to be collected are mainly experimental and observational. Most datasets will be collected 1-3 times per year (i.e. production and decomposition, ecophysiological functional traits, soil extractable nutrients and mineralization rates) for a period of 3 years. Temperature, light availability and soil moisture at multiple depths in the experiment will be logged every 15 minutes; these data will be stored on local data loggers and downloaded every two weeks.

(Adapted from: Cleland, E., Lipson, D., & Kim, J. The influence of plant functional types on ecosystem responses to altered rainfall. Retrieved Nov 24, 2015, from UC San Diego Sample NSF Data Management Plans website: http://libraries.ucsd.edu/services/data-curation/data-management/dmpsample/DMP-Example-Cleland.pdf)

SAMPLE 3:

This project will generate time- and location-stamped image files of natural resources in Delaware County, PA. The images will serve as a record of the occurrence of creatures, natural artefacts, and conditions at specific places and times during the period 2003 through 2011. For many of the photos, taxonomic information will also be available. The occurrence data will be observational and qualitative. Data will be captured with a digital camera capable of creating images with sufficient taxonomic detail to allow identification to the species level for many taxa.

(Adapted from: Hampton, S. Examples of Data Management Plans. Retrieved Nov 24, 2015, from DataOne website: https://www.dataone.org/sites/all/documents/ESA11_SS3_hampton.pdf)

SAMPLE 4:

Oral interviews with about 30 residents of the Nnindye community, located in the Mpigi district of Uganda, will be recorded over a period of 6 months. Photos and videos will also be included. The interviews will be conducted in the Luganda language and recorded using digital recorders. Photographs and video will be taken with digital cameras and digital video cameras.

(Adapted from:  Sapp Nelson, Megan and Beavis, Katherine (2013) “History / Sustainable Development – Purdue University,” Data Curation Profiles Directory: Vol. 5, Article 1. http://dx.doi.org/10.7771/2326-6651.1032 )

SAMPLE 5:

Primarily public data from 2000 to 2015 will be acquired from the US Census Bureau. Some preliminary (non-public) Census data, and data from some other sources, e.g. the US Bureau of Labor Statistics and the New York State Dept of Health, will also be purchased and gathered. All data will be processed, analysed and aggregated, and the results will be stored in an MS SQL database. Eventually, a public website will provide access to this final data in the form of charts, tables, maps and downloadable Excel spreadsheets.

(Adapted from:  Jenkins, Keith (2012) “Sociology / Demographics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 6. http://dx.doi.org/10.5703/1288284315013)

SAMPLE 6:

We will be using a portable audio recorder to collect the data. We will be conducting face-to-face interviews with a group of Cheyenne natives, who will share aspects of their culture in the form of short stories. New audio recordings will be added each year throughout the project timeline (2015 – 2020). The data will be collected by the research team members and transcribed using the transcription software ELAN. Some free online resources will also be used in the research as pronunciation guides.

(Adapted from: Tancheva, Kornelia (2012) “Linguistics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 7. http://dx.doi.org/10.5703/1288284315007)

SAMPLE 7:

We will be placing a few sensors and video cameras to monitor the traffic flow at several intersections around Indiana. A road sensor will be placed in each lane of traffic. When a car or other vehicle passes over a sensor, the frequency of the sensor will change, and it will revert when the vehicle rolls off. The sensors placed near the traffic lights will record the status of the intersection. Data from the sensors will be transferred via FTP on an hourly basis as compressed files. Video of the sites will also be taken and used for data verification purposes. The video gathered will be parsed into .gif or .jpg images at a rate of 20 frames per second. Furthermore, data on weather conditions will be collected from the “Weather Underground” website for explanatory purposes, as weather conditions may affect traffic flows. We will be collecting the data from Mar 2007 to Feb 2008.

(Adapted from:  Carlson, Jake R. (2009) “Traffic Flow – Purdue University,” Data Curation Profiles Directory: Vol. 1, Article 4. http://dx.doi.org/10.5703/1288284315016)

SAMPLE 8:

We will be conducting a few experiments in the lab in which a piston drives the volume of a 2.7 m long tube into a much smaller test volume, compressing a fuel-air mixture into 100 – 200 cc and igniting the mixture in the process. The experiments fall roughly into two categories: those which capture video of the 200 ms process (using a Phantom camera capturing 30,000 frames per second) and those which take physical samples of the mixture at different stages during the reaction process. The samples collected will be separated using four chromatography machines to generate reactants. In addition, data will also be generated from the pressure transducers in the compressed tube.

(Adapted from:  Kashyap, Nabil (2011) “Aerospace Engineering / Chemical Kinetics – University of Michigan,” Data Curation Profiles Directory: Vol. 3, Article 1. http://dx.doi.org/10.5703/1288284314989)

SAMPLE 9:

The data that we will be collecting for this research project consist of field survey data and bioassays measuring herbivore damage, pollination and fruit set under different experimental treatments (addition of herbivores, treatment with the plant hormone MeJA, removal of visual cues, etc.). Information about plant location and habitat description will also be collected as ancillary data at the outset of the experiment. Data will also be generated in the lab by subjecting plant samples to analysis using coupled Gas Chromatography-Mass Spectrometry (GC-MS).

(Adapted from:  Wright, Sarah J. (2012) “Environmental Science / Herbivory – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 3. http://dx.doi.org/10.5703/1288284315002)

SAMPLE 10:

The data for the research project will be produced with a motion capture system, which includes hardware and proprietary software. Motion capture markers will be attached to various parts of the body, usually the joints. While the study subject performs target motions, the marker coordinates will be recorded by the motion capture system. We will use about 40 markers on each subject, and the 3-D marker coordinates (x, y, z) will be captured multiple times (tx) in a session. We expect to generate a large volume of data from the motion capture system. Sampling rates and the number of data points captured per second will also be recorded.

(Adapted from: Cragin, Melissa; Kogan, Marina; and Collie, Aaron (2011) “Bio-Mechanics Motion Studies – University of Illinois Urbana-Champaign,” Data Curation Profiles Directory: Vol. 3, Article 6. http://dx.doi.org/10.5703/1288284314998)

SAMPLE 11:

Background data comprise (1) text retrieved from literature search engines (usually .txt or .xml); (2) full text of publications retrieved from PubMed Central, institutional repositories and publishers (usually .pdf or .html); and (3) categorical or numerical data extracted by human investigators from those full text publications. Foreground data comprise (1) computer code used to perform the tasks outlined in the main application (identification and retrieval of relevant publications, extraction of publication meta-data and extraction of publication outcome data); and (2) quantitative data generated from comparison of different approaches to the tasks outlined above.

(MRC Template for a Data Management Plan, v01-00, 6 Aug 2014, http://www.dcn.ed.ac.uk/camarades/files/SLiM%20datasharing.pdf )

SAMPLE 12:

The first stage of data generation will be Simulation. This is a data stage that generates a set of initial conditions, a model of the universe at an early age. Differential equations will be applied and the universe will evolve. Its state will be captured at different times. While the process is straightforward, a lot of work would be involved as it is done at a large scale. The purpose of this stage is to try to define the state of the universe. A model of the universe will be evolved so that there will be something to go back to and measure. The second stage will be Data Reduction. Scientists will use the data set to identify clusters of particles. They will analyze the clusters to generate spherical averages. From these averages, they can create .txt and binary HDF files.

(Adapted from: Kozbial, Ardys (2010) “Astrophysics – University of California San Diego,” Data Curation Profiles Directory: Vol. 2, Article 1. DOI: 10.5703/1288284314996)

SAMPLE 13:

One part of the data will consist of the various parameters, settings, and version of the Weather Research and Forecasting model (http://www.wrf-model.org/index.php), which is a community model for weather forecasting and currently is a community standard. Another major component of the data will be the output of the model, which will go through multiple transformation stages.

In the initial step, the typical weather profile of temperature, pressure, moisture and such (called “the sounding”) will be combined with the “namelist” file, which will contain the model options representing the physical processes in the atmosphere and the storms’ relative locations. The “sounding” and the “namelist” will be compiled together by the pre-processing module of Weather Research and Forecasting Model called “ideal.exe” (see Initiation for Ideal Cases in *insert relevant URL). This compilation will produce ready-to-use “idealized” input that now can be used to initiate the model.

The output of the model will be three-dimensional floating-point data, which is considered the raw data for this research area (i.e. sub-discipline). This data will be generated from the model at regular intervals of time. The output at each time step will be saved as a separate file, since appending all the output to one file makes the file too large. The raw output data will be uniform in its horizontal dimension, but not in the vertical one due to terrain differences. Thus, the raw data will be made uniform in all dimensions by interpolating it onto a regular Cartesian grid. This interpolation will be done by putting the raw data through a post-processing tool called “Read/Interpolate/Plot” or “RIP” (*insert relevant URL).

(Adapted from: Cragin, Melissa; Kogan, Marina; and Collie, Aaron (2010) “Atmospheric Modeling – University of Illinois Urbana-Champaign,” Data Curation Profiles Directory: Vol. 2, Article 2. DOI: 10.5703/1288284314997)

Processing & Transformation of Data

How will the data be used and managed in your research project?

  • Describe how the data will be used in your research project. Take into consideration the technical development process from the point of data capture or data creation through to final delivery or analysis.
  • Describe the practices and standards that you will adopt to ensure quality data is collected or generated and the processing is well documented.
  • Describe how the data will be organised in your research project e.g. naming conventions, version control, folder structures, and any community data standards that will be used.

Additional Information:

SAMPLE 1:

Data originally recorded on paper will be transferred into spreadsheets using .csv formats. Data will be checked for outliers in the R statistical program, and any outliers will be checked for transcription errors. DGVM simulation runs will be performed on a high performance parallel computing platform, a 96-node Linux cluster, maintained jointly by USFS Pacific Northwest Research Station and Oregon State University. DGVM output will be analysed and displayed with the ESRI ArcGIS software suite.

As the data will be generated, processed and analysed by different project team members, I will recommend that project team members name each data file using their initials, the date and the version, e.g. LGH_20150801_v1.

(Adapted from: Cleland, E., Lipson, D., & Kim, J. The influence of plant functional types on ecosystem responses to altered rainfall. Retrieved Nov 24, 2015, from UC San Diego Sample NSF Data Management Plans website: http://libraries.ucsd.edu/services/data-curation/data-management/dmpsample/DMP-Example-Cleland.pdf)
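
A naming convention of this kind (initials, date, version) is easier to apply consistently when file names are generated rather than typed by hand. The following is a minimal sketch in Python; the function name build_file_name and its parameters are hypothetical and are not part of the adapted plan.

```python
from datetime import date


def build_file_name(initials: str, created: date, version: int) -> str:
    """Return a name of the form INITIALS_YYYYMMDD_vN, e.g. LGH_20150801_v1."""
    if not initials.isalpha():
        raise ValueError("initials must contain letters only")
    if version < 1:
        raise ValueError("version must be 1 or greater")
    # date objects accept strftime-style format codes in f-strings
    return f"{initials.upper()}_{created:%Y%m%d}_v{version}"


print(build_file_name("LGH", date(2015, 8, 1), 1))
```

Writing the date as YYYYMMDD means that files from the same team member sort chronologically when listed by name.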

SAMPLE 2:

The audio and video recordings will be saved in common standardized formats, e.g. WMV, and the photographs will be saved in JPEG format. The interview recordings will be transcribed in Luganda and then translated into English. Both the transcriptions and the translations will be saved as Word documents. The English translations of the interviews will be coded using ethnographic software.

The raw data will all be stored in a folder titled “Raw data_YYYYMMDD”; the processed or analysed data will be kept in separate folders by data type, e.g. all audio recordings will be saved in one folder and video recordings will be stored in another. We will be using the following file-naming convention for each data file and folder:

  • data file name: Subject_v1 (e.g. interview_v1)
  • folder name: datatype_v1_YYYYMMDD (e.g. audiorecordings_v1_20151120)

(Adapted from: Sapp Nelson, Megan and Beavis, Katherine (2013) “History / Sustainable Development – Purdue University,” Data Curation Profiles Directory: Vol. 5, Article 1. http://dx.doi.org/10.7771/2326-6651.1032)

SAMPLE 3:

All original data acquired from government sources in different formats, such as .xls, .csv, .html, .shp, etc., will be loaded and appended to existing time series in the MS SQL database. After that, all data will be aggregated by economic region. Maps will be created using ArcGIS. Eventually, the data will be made available for searching and retrieving (querying and visualising) in HTML on a website. Analytical reports will also be made available for download on the website. SAS, Excel, ColdFusion (to query, retrieve and display data from the MS SQL database), Google Chart API (to visualise data in various forms, e.g. population pyramids and trendlines) and Google Maps API (to display dynamic maps of the data) will be used in the process of making the data publicly accessible on the website.

JISC has provided a guide on choosing a file name. We will name our data files based on the recommendations available on this website: http://www.jiscdigitalmedia.ac.uk/guide/choosing-a-file-name. All data files will be stored in different folders organised by researchers’ initials and date.

(Adapted from: Jenkins, Keith (2012) “Sociology / Demographics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 6. http://dx.doi.org/10.5703/1288284315013)

SAMPLE 4:

The audio files in wav format will be transcribed using the transcription software ELAN and saved as Microsoft Word documents. Both versions of the data will be cleaned up and normalised. The audio and the transcription will be synchronised. Metadata will be created using morphological glosses so that the textual files can be searched for either the native language or English or the gloss for specific linguistic features, and the corresponding audio files can then be called up. Since data will be added each year, the data will be organised in folders with the year indicated. Ultimately, the data will be ingested into a publicly accessible searchable database or repository.

The data will be organised and stored in different folders with the following file-naming convention: Subjectkeyword_V2_YYYYMMDD; Subjectkeyword_V2_YYYYMMDD…

(Adapted from: Tancheva, Kornelia (2012) “Linguistics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 7. http://dx.doi.org/10.5703/1288284315007)

SAMPLE 5:

Data analysis will involve generating proprietary files for the processing software and convenient printable formats, for example Excel spreadsheets or PDF files, for manually examining the data. The pressure trace graphs and chromatograms are the focus of analysis. Hence, pressure trace data will be analysed using Matlab, Excel and Adobe Portable Document Format (PDF). Chromatograms will be interpreted using Clarity software. Some graphs, such as Arrhenius plots and concentration plots, will be generated using Origin software. The video from the experiment will be used primarily for verification that the experiment ran correctly. Video stills will be generated from the video files and will be merged with some graphs using Photoshop.

We will store all the data in a shared drive and will name each file by the following file-naming convention:

  • 20140603_MAEProject_DesignDocument_Tan_v2-01.docx
  • 20140809_MAEProject_MasterData_Daniel_v1-00.xlsx
  • 20140825_MAEProject_Ex1Test1_Data_Jason_v3-03.xlsx
  • 20141023_MAEProject_ProjectMeetingNotes_Kumar_v1-00.docx

(Adapted from:

  1. Kashyap, Nabil (2011) “Aerospace Engineering / Chemical Kinetics – University of Michigan,” Data Curation Profiles Directory: Vol. 3, Article 1. http://dx.doi.org/10.5703/1288284314989
  2. Brandt, S. (29 July 2015). Data Management for Undergraduate Researchers: File Naming Conventions, from http://guides.lib.purdue.edu/c.php?g=353013&p=2378293)
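
Files on a shared drive can be checked against a convention like the one listed in this sample before they are accepted. The sketch below is illustrative only: it assumes the parts of each name are date, project, description, team member and a major-minor version, which is an interpretation of the four example names rather than a rule stated in the source. The pattern and the parse_file_name helper are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: YYYYMMDD_Project_Description_Person_vMAJOR-MINOR.ext
# The description may itself contain underscores (e.g. Ex1Test1_Data).
FILE_NAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<project>[^_]+)_(?P<description>.+)"
    r"_(?P<person>[^_]+)_v(?P<major>\d+)-(?P<minor>\d{2})\.(?P<extension>\w+)$"
)


def parse_file_name(name: str) -> dict:
    """Split a file name into its parts, or raise ValueError if it does not fit."""
    match = FILE_NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {name}")
    parts = match.groupdict()
    # strptime raises ValueError for impossible dates such as 20141340
    parts["date"] = datetime.strptime(parts["date"], "%Y%m%d").date()
    parts["major"] = int(parts["major"])
    parts["minor"] = int(parts["minor"])
    return parts


print(parse_file_name("20140825_MAEProject_Ex1Test1_Data_Jason_v3-03.xlsx"))
```

A check like this could be run over a folder listing to flag files that were named by hand and drifted from the convention.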

SAMPLE 6:

The Gas Chromatography-Mass Spectrometry (GC-MS) data will be analysed using the instrument-specific proprietary software to measure the area underneath the peaks for specific known Volatile Organic Compounds (VOCs). The peak area will then be entered into an Excel spreadsheet along with the field survey data. Statistical analysis of the data will be performed using StatView to prepare the tables and graphs for the research.

We have not decided on how the data files will be organised yet. However, we will follow the file naming conventions recommended by the Stanford University Libraries (http://library.stanford.edu/research/data-management-services/data-best-practices/best-practices-file-naming) to name our data files.

(Adapted from: Wright, Sarah J. (2012) “Environmental Science / Herbivory – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 3. http://dx.doi.org/10.5703/1288284315002)


SAMPLE 7:

The data will be moved to Excel for automated filtering to remove errors and noise, which occur because the system is sensitive to light (e.g. reflections) and to motion marker occlusion. More automatic threshold-based filtering will be carried out after that, along with visual review of the data and manual cleaning. This process will take place in Matlab, and the data will eventually be converted to represent several variables (e.g. angle data, displacement, velocity, or acceleration of joint segments). The data will then be aggregated across subjects and stored in an Excel spreadsheet. The data will be organised through a file folder system where each trial will be documented in a single spreadsheet, and all the files from a particular study will be stored in the same folder structure.

(Adapted from: Cragin, Melissa; Kogan, Marina; and Collie, Aaron (2011) “Bio-Mechanics Motion Studies – University of Illinois Urbana-Champaign,” Data Curation Profiles Directory: Vol. 3, Article 6. http://dx.doi.org/10.5703/1288284314998)

 

File Extensions & Software/Tools

a. Identify the relevant file extensions that you will be using.

b. What software(s) and/or tool(s) is/are needed to process/read the file(s)?

c. Where can this/these software(s) and/or tool(s) be obtained?

  • File formats affect one’s ability to use and re-use data in the future.
  • Strive to use a data format that is easy to read and easy to manipulate in a variety of commonly-used operating systems and programs.
  • Non-proprietary (‘open’) formats are also recommended to enhance accessibility.
  • For specialized data formats, provide information on the name, supplier information (if applicable) and version number to obtain the software(s) and/or tools to read your data.

Additional Information:

SAMPLE 1:

a. My research data would be in the following file formats:

.docx

.jpeg, .jpg

.mp4

.txt

.xlsx

.rtf

 

b. Software(s) needed to process/ read the file(s):

Microsoft Office applications 2010 edition, NotePad version 6.1, Windows Media Player. Digital video data files generated will be processed in MPEG-4 (.mp4) format.

c. The software(s) would be obtainable via:

No uncommon proprietary or custom-designed software will be used in this research. The software required to read the files is readily available on most current computers and is provided by my institution.

 

SAMPLE 2:

a. My research data would be in the following file formats:

.docx

Other file formats: .netCDF

b. Software(s) needed to process/ read the file(s):

Simulation output will be in netCDF format, a data format popular in climate research and readable with many free software programs. The simulation data will be generated using NetCDF software, version 4.4.1.1.

c. The software(s) would be obtainable via:

The software to read netCDF format can be downloaded at http://www.unidata.ucar.edu/software/. Microsoft Word 2013 or later would be required to read .docx files.

 

SAMPLE 3:

a. My research data would be in the following file formats:

.doc

.docx

.pdf

.txt

 

b. Software(s) needed to process/ read the file(s):

Primary data will all be created in or transcribed into standard Microsoft Office (Word, Excel, and PowerPoint) files. Files will be stored and available both in original format and as pdf documents. In the case of answers to forced-choice and open-ended questions, data will be stored both in pdf and tab-delimited formats for the purpose of subsequent statistical analyses. Microsoft Office 2013 or later and Adobe Acrobat Reader version 11.0.10 are recommended to read the generated data.

c. The software(s) would be obtainable via:

No uncommon proprietary or custom-designed software will be used in this research. The software required to read the files is readily available on most current computers. The software to read PDF can also be downloaded for free from the Internet.
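
Storing answers in a tab-delimited format, as this sample proposes, keeps them readable without proprietary software. The sketch below uses Python's standard csv module; the function name to_tab_delimited and the example column names are assumptions made for illustration.

```python
import csv
import io


def to_tab_delimited(rows, fieldnames):
    """Serialise a list of dictionaries as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical survey answers: one forced-choice and one open-ended question
answers = [
    {"respondent": "R01", "q1_choice": "agree", "q2_text": "More lab time"},
    {"respondent": "R02", "q1_choice": "disagree", "q2_text": "Nothing to add"},
]
print(to_tab_delimited(answers, ["respondent", "q1_choice", "q2_text"]))
```

The resulting text can be saved with a .txt or .tsv extension and opened in most statistical packages and spreadsheet programs.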

 

SAMPLE 4:

a. My research data would be in the following file formats:

.docx

.wav

.txt

 

b. Software(s) needed to process/ read the file(s):

Microsoft Office 365 applications, Adobe Acrobat Version 2015.009.20069, Notepad++ 6.8.6.

 

c. The software(s) would be obtainable via:

  1. Office 365 can be purchased from https://www.microsoftstore.com/store/mssg/en_SG/cat/Office/categoryID.65861600?icid=CNavSoftwareOffice. Price: SG$138.00 per year.
  2. Adobe Acrobat can be obtained from https://get.adobe.com/reader/ at $0 cost.
  3. Notepad++ can be downloaded for free from https://notepad-plus-plus.org/download/v6.8.6.html.

SAMPLE 5:

a. My research data would be in the following file formats:

.zip

.dat

.vbs

 

b. Software(s) needed to process/ read the file(s):

Zacros version 1.02 (Kinetic Monte Carlo software package written in Fortran 2003).

 

c. The software(s) would be obtainable via:

The software, Zacros, can be obtained from http://www.e-lucid.com/i/software/Zacros.html. Price: Free of charge for academic use & £5000 for Commercial Licence.

In-house VBScripts will be developed and used in this project. The VBScripts will be stored on the computer (computer name: xxx) housed at the research lab.

Confidentiality, Privacy & Security of Data

If your data is sensitive, how will you be managing and using it?

  • Sensitive data are data that can be used to identify an individual, species, object, or location in a way that introduces a risk of discrimination, harm, or unwanted attention. (Source: Australian National Data Service)
  • If your data is sensitive, state the appropriate security measures that you will be taking. Note the main risks and how these will be managed, e.g. strategies to minimize risks of unauthorized disclosure of personal identifiers.
  • Describe how the data and files will be protected from unauthorized access or security breaches.

Additional Information:

  • See ‘Sensitive data: publishing and sharing’ for more guidance on how to manage and share sensitive data. (Source: Australian National Data Service)
  • Learn how to confidentialise data in 3 steps from the ANDS guide ‘Publishing and sharing sensitive data’, pp. 12-14. (Source: Australian National Data Service)
  • See ‘Data Security’ in UK Data Service for more details on physical data security, network security, security of computer systems and files, etc.

SAMPLE 1:

I have sensitive data as it will contain personal data.

The research will include data from subjects being screened for STDs. The final dataset will include self-reported demographic and behavioural data from interviews and laboratory data from urine specimens. Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers, there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and documentation available only under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate technology; and (3) a commitment to destroying or returning the data after analyses are completed.

(Adapted from: NIH Data Sharing Policy and Implementation Guidance. (9 February 2012), from http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm#ex)

SAMPLE 2:

I have sensitive data as it is national security related.

Access to research records will be limited to primary research team members. Recorded data will have any identifying information removed and will be relabelled with study code numbers. A database which relates study code numbers to consent forms and identifying information will be stored separately on password-protected computers in a secured, locked office. To maintain the privacy of the participants, any report of individual data will only consist of performance measures without any demographic or identifying information.

(Adapted from: Collaborative Research in Computational Neuroscience (CRCNS): Innovative Approaches to Science and Engineering Research on Brain Function. Retrieved Nov 24, 2015, from UC San Diego Sample NSF Data Management Plans website: http://libraries.ucsd.edu/services/data-curation/data-management/dmpsample/DMP-Example-Psych.doc)
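
The relabelling step described in this sample, in which identifying information is replaced by study code numbers and the linking table is kept separately, might look roughly like the sketch below. It is a simplified illustration, not the procedure used in the adapted plan: the pseudonymise function, the field name "name" and the sequential code format are all assumptions, and a real project would also need to consider indirect identifiers.

```python
def pseudonymise(records, id_field="name", prefix="S"):
    """Return (de-identified rows, key table linking study codes to identities)."""
    key_table = {}    # study code -> identity; store separately under restricted access
    codes_by_id = {}  # identity -> study code, so a repeated participant keeps one code
    deidentified = []
    for record in records:
        identity = record[id_field]
        if identity not in codes_by_id:
            code = f"{prefix}{len(codes_by_id) + 1:04d}"
            codes_by_id[identity] = code
            key_table[code] = identity
        # Copy every field except the identifying one, then attach the study code
        row = {k: v for k, v in record.items() if k != id_field}
        row["study_code"] = codes_by_id[identity]
        deidentified.append(row)
    return deidentified, key_table


# Hypothetical performance measures for two participants
raw = [
    {"name": "Participant A", "score": 12},
    {"name": "Participant B", "score": 17},
    {"name": "Participant A", "score": 15},
]
rows, key = pseudonymise(raw)
print(rows)
```

Only the de-identified rows would be used for analysis and reporting; the key table corresponds to the separately stored database that relates study code numbers to identifying information.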

Access & Usage Restrictions

Will there be restrictions on accessing & sharing your final research data?

  • Is the data you propose to collect (or existing data you propose to use) in the study suitable for sharing?
  • If you are unable to make your final research data available to others, you must state the reasons.
  • The default requirement is for you to share your final research data.

Additional Information:

  • A licence is a document that clearly sets out how the data can be used and attributed to the original data owner.
  • Without a licence, it is unclear how your data can be reused and this may discourage the potential re-user.
  • The final research data is the final version of data that exists during the last stage in the data lifecycle in which all re-workings and manipulations of the data by the researcher have ceased.
  • The final research data also refers to the recorded factual materials commonly accepted by the scientific community as necessary to document, support, and validate research findings. Final research data does not include laboratory notebooks, partial datasets, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as gels or laboratory.
  • Visit the ‘How to License Research Data’ in the Digital Curation Centre website to learn more about the why and how of research data licensing. AusGOAL (Australian Governments Open Access and Licensing Framework) also offers a good research data FAQ.
  • Types of Creative Commons licenses.

 

SAMPLE 1:

I will share my final data under the CC-BY-NC Creative Commons (CC) license. This permits others to use my data for non-commercial applications only and with proper attribution to my work.

SAMPLE 2:

I will not be applying any Creative Commons license but will instead be imposing the following restriction on the sharing of my final data: the data will not be shared openly but on a private, individual basis.

My reasons are: There are certain terms in the agreement that I signed with a third party that do not allow me to openly share some of my data. Anyone who is interested in my data could write to me at email: abd@yahoo.com and I would see what I can share based on his/her needs.

SAMPLE 3:

I will not be able to share my final data.

My reasons are: Even with the removal of all identifiers, we believe that it would be difficult if not impossible to protect the identities of subjects given the physical characteristics of subjects, the type of clinical data (including imaging) that we will be collecting, and the relatively restricted area from which we are recruiting subjects. Therefore, we are not planning to share the data.

 

Metadata & Standards

What metadata and/or data standards will you be using to describe your data?

  • The term metadata is commonly defined as “data about data,” information that describes or contextualises the data.
  • Metadata helps to place your dataset in a broader context, allowing those outside your institution, discipline, or software environment to understand how to interpret your data. (Source: MANTRA)
  • You are strongly encouraged to use community standards to describe and structure data, where these are in place. The Digital Curation Centre (DCC) offers a catalogue of disciplinary metadata standards.
  • If you are using a specific metadata scheme or standard, please state what it is and provide the references.
  • If you are not using a specific metadata scheme or standard, describe the type of metadata (e.g. descriptive, structural, administrative, etc.) you will be providing, if any.

Additional Information:

  • Three broad categories of metadata are:
    • Descriptive – common fields such as title, author, abstract, keywords which help users to discover online sources through searching and browsing.
    • Administrative – preservation, rights management, and technical metadata about formats.
    • Structural – how different components of a set of associated data relate to one another, such as a schema describing relations between tables in a database, variable list, directory and file listing and taxonomy.
  • The difference between documentation (refer to DMP question 7) and metadata is that the first is meant to be read by humans and the second implies computer-processing (though metadata may also be human-readable).
  • Metadata may not be required if you are working alone on your own computer, but becomes crucial when data are shared online. Your data management plan should determine whether you need to apply metadata descriptors or tags at some point during your project.
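
As a rough illustration of the three categories above, the following Python sketch groups a few fields into a single record and prints it as JSON. The function name, field names and sample values are hypothetical and do not follow any particular metadata standard; a real project should use the community standard for its discipline where one exists.

```python
import json


def build_metadata(title, author, keywords, file_format, rights, files):
    """Group fields under the three broad metadata categories.

    The field names are illustrative assumptions, not part of any standard.
    """
    return {
        # Descriptive: helps users discover the dataset
        "descriptive": {"title": title, "author": author, "keywords": list(keywords)},
        # Administrative: rights management and technical format information
        "administrative": {"format": file_format, "rights": rights},
        # Structural: how the components of the dataset relate to one another
        "structural": {"files": list(files)},
    }


# Hypothetical sample values
record = build_metadata(
    title="Student survey responses",
    author="Researcher ABC",
    keywords=["survey", "library instruction"],
    file_format="text/csv",
    rights="CC-BY-NC",
    files=["responses.csv", "codebook.txt"],
)

print(json.dumps(record, indent=2))
```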

SAMPLE 1:

I will not be using any metadata or international standard for the data collected and generated for this project. However, I will ensure that each document I create using Microsoft Word, Microsoft Excel and Microsoft PowerPoint has sufficient basic information, such as Author’s name, Title, Subject and Keywords, in the document properties. In addition, a separate readme file will be prepared to describe the details of each dataset. I will apply the recommendations provided by Cornell University at http://data.research.cornell.edu/content/readme when creating the readme file(s). Key elements could include introductory information about the data, and methodological, date-specific and sharing/access-related information.

SAMPLE 2:

The clinical data collected from this project will be documented using the CDASH v1.1 standard. The standard is available on the CDISC website at http://www.cdisc.org/cdash.

SAMPLE 3:

Using an electronic lab notebook, we will generate metadata along with each notebook and posting. The metadata will include Sections, Categories and Keys, which will be assigned by collaborators for reuse so as to maintain consistency in the use of terminology. We will also use the Properties Ontology (ChemAxiomProp) when describing chemical and materials properties.

SAMPLE 4:

Metadata about timing and exposure of individual images will be automatically generated by the camera. GPS locations will subsequently be added by post-processing GPS track data based on shared time stamps. Metadata for the image dataset as a whole will be generated by the image management software (iMatch) and will include time ranges, locations, and a taxon list. Those metadata will be translated into Ecological Metadata Language (EML), created using the Morpho software tool, and will include location and taxonomic summaries.

 

(Adapted from: Hampton, S. Examples of Data Management Plans. Retrieved  Nov 24, 2015, from DataOne website: https://www.dataone.org/sites/all/documents/ESA11_SS3_hampton.pdf)
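
The post-processing step described in Sample 4, adding GPS locations to images by matching time stamps, could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the function name, the (time, latitude, longitude) tuple layout and the sample coordinates are hypothetical, and real tools would read the times from the image files and the GPS track log.

```python
from bisect import bisect_left
from datetime import datetime


def nearest_fix(track, when):
    """Return the (time, lat, lon) track point closest in time to `when`.

    `track` must be non-empty and sorted by time.
    """
    times = [point[0] for point in track]
    i = bisect_left(times, when)
    # Only the points just before and just after `when` can be the closest
    candidates = track[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda point: abs(point[0] - when))


# Hypothetical GPS track: one fix every five minutes
track = [
    (datetime(2015, 6, 1, 10, 0, 0), 1.3400, 103.6800),
    (datetime(2015, 6, 1, 10, 5, 0), 1.3450, 103.6850),
]

# Hypothetical image time stamp, as recorded by the camera
image_time = datetime(2015, 6, 1, 10, 4, 0)

_, lat, lon = nearest_fix(track, image_time)
print(lat, lon)  # the 10:05 fix is closer than the 10:00 fix
```

This only works if the camera clock and the GPS clock agree, so the two devices should be synchronised before data collection.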

 

 

SAMPLE 5:

We will be using some core elements from the TEI metadata standards http://www.tei-c.org/index.xml to describe our data. We will also be adding some customised elements in the metadata to provide more details on the rights management.

 

 

SAMPLE 6:

The data will be stored in several tables in an MS SQL database, which also includes some “metatables” that describe the original source of various tables and variables. These metatables will also include configuration information for the public website, such as short and long names for variables, numeric format, colours for mapping, etc. Several standard Census variables (ref: Office for National Statistics, https://www.ons.gov.uk/) will also be used.

(Adapted from: Jenkins, Keith (2012) “Sociology / Demographics – Cornell University,” Data Curation Profiles Directory: Vol. 4, Article 6. http://dx.doi.org/10.5703/1288284315013)

Data Documentation

What documentation will you be providing to facilitate a better understanding of the project data?

  • List the documentation that you would be providing to explain how the data is to be interpreted and used. Examples of documentation:
    • codebooks
    • lay summary
    • readme.txt
    • webpage
    • electronic lab notebook
  • Content of documentation could include the following:
    • Methodology and procedures used to collect the data
    • Details about codes
    • Definitions of variables
    • Variable field locations
    • Frequencies
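
As an illustration of the documentation content listed above, the Python sketch below prints a small codebook with variable definitions, field locations and frequencies. The sample data, variable names and definitions are hypothetical; a real codebook would be built from the project's own data files.

```python
import csv
import io
from collections import Counter


def frequencies(rows, variable):
    """Count how often each value of `variable` occurs in `rows`."""
    return Counter(row[variable] for row in rows)


# Hypothetical survey extract and variable definitions
SAMPLE_CSV = """respondent,year_of_study
r01,1
r02,2
r03,1
"""

DEFINITIONS = {
    "respondent": "Anonymised respondent identifier",
    "year_of_study": "Year of study reported by the respondent",
}

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)

for position, name in enumerate(reader.fieldnames, start=1):
    # Variable field location and definition
    print(f"Column {position}: {name} ({DEFINITIONS[name]})")
    # Frequencies of each recorded value
    for value, count in sorted(frequencies(rows, name).items()):
        print(f"    {value}: {count}")
```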

Additional Information:

  • See ‘Document your data’ by the UK Data Service for more guidance on the different types of documentation.
  • See ‘How to write a lay summary’ by the Digital Curation Centre for guidance on why lay summaries are important and how to write one, with examples.

SAMPLE 1:

I would be providing the following accompanying documentation to facilitate a better understanding of the project data.

  • lay summary
  • readme.txt

Others: I will also write a journal article to share the research data management aspects of my project. The paper will be made available on DR-NTU later.

Data Storing During Project

Where and how are you storing the data during the project?

  • List the platforms and devices that will be used to store the data, e.g. electronic lab notebook, SharePoint, WordPress.
  • Identify the location where the data would be stored e.g. school server. Provide the URLs of online locations.
  • Provide names of people performing data storage roles.

Additional Information:

 

 

SAMPLE 1:

I will be using a networked storage drive XXX, which provides storage for active data for all research staff and students. It is fully backed up, secure and resilient, and has multi-site storage. It is accessible via VPN (Virtual Private Network) from outside the University. I will also be using external storage devices such as hard drives / USB flash drives (also known as memory sticks, USB keyrings or pen drives) / Compact Discs (CDs) / Digital Video Discs (DVDs). Researcher ABC will coordinate and be in overall charge of data storage.

SAMPLE 2:

The data will be stored locally on a secure password-protected data server. One set of hard drives and one set of tapes will be stored in XXX building. A second set of hard drives and a second set of tapes will be stored in a separate XXX building. All data will be backed up daily by XXX (researcher).

SAMPLE 3:

The data (on staff computers and the web server) will be managed according to the standard practices of the college’s IT department and will be password protected. Any restricted, non-public data will be stored on CRADC (Cornell Restricted Access Data Center). All files will be backed up every day by xxx (project team member).

 

Backup & Versioning Control

What backup and versioning control procedures will you be undertaking?

  • Describe the backup and archiving regime you will use to back up all your data to prevent its loss, e.g. through hard disk failure, virus infection or theft.
  • Describe the method you will use to ensure that different versions of your data are identifiable and properly controlled and used.
  • Specify any community agreed or other formal data standards used (with URL references).
  • Provide names of the people performing data backup roles.

Additional Information:

  • Storing data on laptops, computer hard drives or external storage devices alone is very risky. The use of robust, managed storage with automatic backup, such as that provided by the university IT team, is preferable.
  • See ‘Backing-up Data’ in UK Data Archive for more tips.
  • See ‘Storage and Security’ in MANTRA for more guidance.
  • 3-2-1 principle of backup (Source: QSTAR Technology, BACKBLAZE)
    • Keep 3 copies of any important file
    • Store files on at least 2 different media types
    • Keep at least 1 copy offsite
  • Learn more about versioning control:
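
The 3-2-1 principle listed above can be expressed as a simple check on a backup plan. The Python sketch below is only an illustration: the class, the function name and the sample locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of an important file (hypothetical structure)."""
    location: str
    media: str
    offsite: bool


def satisfies_321(copies):
    """Check a backup plan against the 3-2-1 principle."""
    return (
        len(copies) >= 3                            # keep 3 copies
        and len({c.media for c in copies}) >= 2     # on at least 2 media types
        and any(c.offsite for c in copies)          # at least 1 copy offsite
    )


# Hypothetical backup plan
plan = [
    Copy("office workstation", "hard drive", offsite=False),
    Copy("school lab cabinet", "tape", offsite=False),
    Copy("university managed storage, second site", "hard drive", offsite=True),
]

print(satisfies_321(plan))      # True
print(satisfies_321(plan[:2]))  # False: only two copies, none offsite
```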

 

SAMPLE 1:

A complete copy of materials will be generated and stored independently on primary and backup sources for both the PI and Co-PI (as data are generated) and with all members of the Expert Panel every 6 months. The project team will adopt the Version Control guidelines (http://www.nidcr.nih.gov/Research/ToolsforResearchers/Toolkit/VersionControlGuidelines.htm) provided by the National Institute of Dental and Craniofacial Research to organise the data and ensure that different versions of the data are identifiable and properly controlled and used.

SAMPLE 2:

We will adopt the version control standards recommended by the University of Leicester (https://www2.le.ac.uk/services/research-data/documents/UoL_VersionControlChart_d0-1.pdf) to track the changes the research team makes to the interview transcripts and coding files.

SAMPLE 3:

We will be using Mercurial (https://www.mercurial-scm.org/), a free, distributed source control management tool, to manage the data so that different versions are easily identifiable and properly controlled and used.

SAMPLE 4:

All data will be backed up manually on a monthly basis by researcher xxx on a computer hard drive kept at the research team office. The computer will be password protected, and only team members will be given the password and the right to access the computer. Incremental back-ups will be performed nightly and full back-ups will be performed monthly. Staff xxxx will keep versions by appending the date of the update to the file name. Versions of the file that have been revised due to errors/updates will be retained in an archive system. A revision history document will describe the revisions made.

(Adapted from: NSF General: Mauna Loa example. Retrieved from Data Management Planning website: https://www.dataone.org/sites/all/documents/DMP_MaunaLoa_Formatted.pdf)
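
The file-naming approach in Sample 4, appending the date of the update to the file name and keeping a revision history, could be sketched in Python as follows. The function names, the file name and the tab-separated history format are hypothetical choices made for illustration.

```python
from datetime import date
from pathlib import Path


def versioned_name(filename, updated):
    """Append the update date (ISO format) to the file name, before the extension."""
    path = Path(filename)
    return f"{path.stem}_{updated.isoformat()}{path.suffix}"


def history_entry(filename, updated, note):
    """Build one tab-separated line for a revision history document."""
    return f"{updated.isoformat()}\t{versioned_name(filename, updated)}\t{note}"


# Hypothetical file and revision note
print(versioned_name("soil_moisture.csv", date(2015, 11, 24)))
# soil_moisture_2015-11-24.csv
print(history_entry("soil_moisture.csv", date(2015, 11, 24), "Corrected sensor offset"))
```

Using ISO dates (YYYY-MM-DD) means the versions sort chronologically when the files are listed by name.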

Long-term Storage & Preservation

a. What are your plans for storing* your working data (physical and digital copies) other than your final dataset after the completion of your project?

b. I will store* the final research data in: NTU Data Repository** OR another open data repository and the URL is:

* NTU policy requires a minimum of 10 years

** different from DR-NTU (which is for journal publications)

Guide for a.:

  • Describe what you intend to do with the working data you have collected and used in your research on completion of the project.

Guide for b.:

  • State where you will store your final research data.

Additional Information:

Additional information for a.:

  • Working data refers to data that are currently being examined and analysed. The working data changes over time as the data are reviewed, refined, revised, processed or added to in order to help the researcher find answers to the research questions. During the data processing period, the same set of working data may be reproduced or recreated to become a new set of data. Hence, working data will not be used in any of the research output as supporting evidence or as the basis for conclusions. Raw data is considered one component of working data.

Additional information for b.:

  • The final research data refers to the data that exists during the last stage in the data lifecycle in which all re-workings and manipulations of the data by the researcher have ceased.
  • The final research data also refers to the recorded factual materials commonly accepted by the scientific community as necessary to document, support, and validate research findings. Final research data does not include laboratory notebooks, partial datasets, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as gels or laboratory specimens.
  • Preservation of digital outputs is necessary in order for the research data to endure changes in the technological environment and remain potentially re-usable in the future.
  • NTU researchers should deposit their final research data in the NTU open access research data repository DR-NTU (Data) [https://researchdata.ntu.edu.sg]. Where DR-NTU (Data) does not meet your requirements, you must still submit information about the data deposited elsewhere to DR-NTU (Data), which will provide a link to the external data repository(ies). An international list of data repositories is available via Re3data.

[NOTE: DR-NTU (Data) will store and preserve the final version of your final research data for long-term access. If necessary, an embargo period can be activated if reasons are provided].

SAMPLE 1:

a. I will be storing my working data on NTU Electronic Lab Notebook.

b. I will store the final research data in NTU Data Repository.

SAMPLE 2:

a. I will be storing my working data on a server housed at the school lab.

b. I will store the final research data in another open data repository and the URL is: http://www. xxxx (open access)

SAMPLE 3:

a. I will be storing my working data on a few DVDs and housing them at the school lab.

b. I will store the final research data in NTU Data Repository.

FAQ

1. What is a Data Management Plan?

A Data Management Plan (DMP) outlines the steps you intend to take in managing the data you collect or generate over the entire course of your research project. It provides information on what the data is about, how it will be processed, the security measures to be taken, where and when it will be stored and preserved, and who can access and use the data.

2. Why do I need to submit a DMP?

The NTU Research Data Policy requires all research proposals to include a data management plan (DMP). A DMP helps the Principal Investigator (PI) and their team to start planning for the necessary resources, including an effective and efficient approach to managing their research data. Proper management of research data is necessary to ensure that the integrity of research carried out at NTU is beyond reproach.

3. What are the benefits of having a DMP?

A systematic approach to planning the life-cycle of your data will help to alert you to possible data collection and management issues so that you can prepare for them before the project begins. In addition, it will help with the continuity of your research project when staff leave or new staff start work. Importantly, adherence to data management processes will ensure that the evidence of your research is properly recorded, kept and available when there are concerns about research integrity.

4. Why do I need to share my research data?

The general expectation is that publicly funded research data is a public good, and should be made openly available with as few restrictions as possible. Many research funding agencies worldwide require their grant recipients to share their research data. Examples include the National Science Foundation and the National Institutes of Health in the US (see US list by University of Minnesota Libraries); the Engineering and Physical Sciences Research Council and the Wellcome Trust in the UK (see UK list by Digital Curation Centre); as well as the National Health and Medical Research Council and the Australian Research Council in Australia (see Australian list by Australian National Data Service).

In Singapore, the National Medical Research Council (NMRC) of the Ministry of Health has indicated a possible upcoming requirement for recipients of new grant projects of S$250,000 and above to share their research data no later than 12 months after publication of the paper that used the dataset.

NTU researchers are required to share their research data where possible according to the NTU Research Data Policy.

5. What are the benefits of sharing my data?

There are many possible benefits of sharing data. Making your research data easily discoverable and accessible could encourage greater collaboration with people within as well as outside your discipline(s). It could also help reduce unintentional redundancies in replicating past research efforts. Sharing the data underpinning your research publications could increase visibility of your research publications and academic profile. Other researchers could cite your data so you gain credit.

6. I am working with sensitive data. Can I be exempted from filing a DMP?

No. You can still complete a DMP even when your project data is of a sensitive nature. You do not need to disclose any sensitive aspects of your project data when writing a DMP. There is sample text for the relevant question in the NTU DMP template which might give you some idea of what other researchers say in their DMPs with regard to the management of sensitive data.

7. When do I need to submit a DMP?

According to the NTU Research Data Policy, you are required to submit your DMP within the first 3 months of the approval of your project grant. However, the online system will ask you to submit it within 2 weeks so as not to delay the opening of the research funding account for your project.

8. When do I need to submit a new version of my DMP?

You are strongly encouraged to submit a new version of your DMP whenever there is a significant change to the data specified earlier or to how the data will be managed.

9. What does the format of a completed DMP look like?

The format looks like this.

10. Who can I contact if I need help with answering the questions in the DMP or with data sharing?

Please contact a scholarly communication librarian at library@ntu.edu.sg if you need help with answering the questions in the DMP or with data sharing. We would also like to invite you to attend an NTU DMP Writing Workshop. Click here to register.

For RIMS IT support, please send your enquiry to RIMS_SUPPORT@ntu.edu.sg.
