The Data Curation Process
DataFirst is involved in the entire Data Curation Lifecycle to support the research process. Our Microdata Service Model shows how we curate data for reuse and the other services we offer. The model also shows how DataFirst supports the virtuous cycle of reuse: we work with data depositors to improve the quality of their data products, based on feedback from researchers.
1. Accepting Data Deposits
Collections Policy – DataFirst accepts deposits of unit record data from census or survey research, or administrative records.
Formats – DataFirst accepts data files in ASCII and all proprietary formats, e.g. Excel, Stata.
Documentation – Background documentation helps support data re-use. Any documentation pertaining to the research should be deposited with the data files, including questionnaires, codebooks, and reports.
Data Ownership – Depositors should ensure that they are the data owners and have the rights to deposit the data for sharing by DataFirst.
2. Assuring Data
2.1 Disclosure Control
Once a dataset has been deposited with us, we undertake disclosure control to ensure the final shared data files do not contain personal data that could be used to identify individuals. View the DataFirst Disclosure Control Flowchart.
2.2 Data Quality Checking
All datasets deposited with us undergo quality checks to confirm the accuracy and usability of the data. Anomalies in data files and documents are corrected in consultation with data depositors. Errors and corrections are recorded as Data Quality Notes in the metadata provided with each dataset.
2.3 File Versioning
Data files changed as a result of quality checks receive new version numbers. File naming and versioning follow the Data Documentation Initiative (DDI) standard. DataFirst versions at file level as well as at dataset level, so individual data files within a dataset may not share the same version number. The version number of the dataset is that of the latest data file. Notes on this are included in the metadata for new versions. The advantage is that researchers only need to download and recheck the files that have changed, not the entire dataset.
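The file-level versioning rule can be illustrated with a minimal Python sketch. It is not DataFirst's actual tooling. It assumes version labels of the form "v1" or "v1.1" (an illustrative format, not one stated here), reads "latest data file" as the file with the highest version number, and uses hypothetical file names and function names throughout.

```python
import re


def parse_version(label):
    """Turn a label such as 'v1.2' into a comparable tuple (1, 2).

    The 'v<major>.<minor>' label format is an assumption for this sketch.
    """
    match = re.fullmatch(r"v(\d+(?:\.\d+)*)", label)
    if match is None:
        raise ValueError(f"unrecognised version label: {label!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def dataset_version(file_versions):
    """Return the label of the highest file version in the dataset."""
    return max(file_versions.values(), key=parse_version)


def files_to_download(previous, current):
    """List files that are new or whose version changed since `previous`."""
    return sorted(
        name for name, label in current.items()
        if previous.get(name) != label
    )


# Hypothetical releases: only person.dta was corrected in the new release.
old_release = {"household.dta": "v1", "person.dta": "v1"}
new_release = {"household.dta": "v1", "person.dta": "v1.1"}

print(dataset_version(new_release))                  # v1.1
print(files_to_download(old_release, new_release))   # ['person.dta']
```

In this example the dataset as a whole moves to v1.1, while a researcher who already holds the earlier release would only need to fetch person.dta again.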
3. Describing the Data (Metadata Creation)
Extensive provenance and usage information is created for each dataset in our collection. This metadata is created according to the DDI data description standard using the metadata creation template available free from NESSTAR or the International Household Survey Network.
4. Archiving Data
An archival version of all iterations of each dataset is retained by DataFirst. Archival copies are securely preserved and migrated as technology changes, to ensure they are always accessible.
5. Supporting Data Discovery
Subject and country searches for data are enabled via . Datasets can be searched at study or variable level.
6. Disseminating Data
6.1 Public Access Data
Datasets are distributed online, free of charge, to all researchers. Researchers register on the site and complete an online data request form to download the data. Access to public use files is immediate. Those requesting licensed data will receive an email link to the data. If data is available free and online from another distributor, DataFirst provides metadata and documentation but links to the external site for data downloads.
6.2 Secure Data
Sensitive, potentially disclosive data is made available via our . Researchers can apply for accreditation to use this data in a secure facility in the School of Economics at the University of Cape Town.