National Income Dynamics Study 2008, Wave 1 Secure Data
Household Survey [hh]
In 2008, the South African Presidency embarked on an intensive effort to track changes in the well-being of South Africans by closely following about 28 000 people, young and old, rich and poor, over a period of years. It did so by initiating the National Income Dynamics Study (NIDS). NIDS is the first national panel study to document the dynamic structure of a sample of household members in South Africa and changes in their incomes, expenditures, assets, access to services, education, health and other dimensions of well-being. A key feature of the panel study is its ability to follow people as they move out of their original 7 305 households. In this way, the movement of household members as they leave and/or return to the household, or set up their own households, will be captured in subsequent waves of the panel.
The first “baseline” wave of NIDS was conducted by the Southern Africa Labour and Development Research Unit (SALDRU), based at the University of Cape Town's School of Economics. The first wave of fieldwork commenced in February 2008, and the data and report were released in July 2009. NIDS was designed to collect data every two years.
Elsewhere in the world, such surveys have been invaluable in promoting understanding of who is making progress in a society and who is not and, importantly, what factors are driving these dynamics. Panel data are also invaluable for evaluating and monitoring the efficacy of social policies and programmes, because they allow researchers and policy analysts to see how households and individuals are affected when they become eligible for these programmes.
Kind of Data
Sample survey data [ssd]
Unit of Analysis
The units of analysis in the NIDS 2008 survey are individuals and households.
Version 7.0.0: Edited, partially-anonymised dataset for access in our Secure Research Data Centre.
Version 4 of the NIDS Wave 1 secure data, produced 20 February 2012, had the following changes from version 3:
1. Sample size (one household dropped).
Household 109011 has been dropped from the sample. This household was interviewed twice: in phase 1 (with hhid 109009) and in phase 2 (with hhid 109011). The latter interview has been excluded from the sample. As a result, there is one fewer household in the HouseholdQ file, and the two individuals in this household (an adult and a proxy) have been dropped from the HouseholdRoster file and the individual-level files.
2. Duplicate record for polygamous relationship in HouseholdRoster file
A duplicate record has been created in the HouseholdRoster file for pid 316891, with a different hhid. This individual is resident in two households (106506 and 106744). As a result, individuals in the HouseholdRoster file are now uniquely identified by the combination of hhid and pid rather than by pid alone, which is important to remember when merging the different Stata files together.
Because the polygamist's pid appears in two different households, merges in Version 4.0 of the data should be done on both hhid and pid.
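Because pid alone no longer identifies a HouseholdRoster row uniquely, it is worth checking key uniqueness before merging. The sketch below is a minimal illustration in plain Python, not NIDS code: it assumes rows are loaded as dictionaries and uses w1_hhid, the name the household identifier takes in this version (item 5b). The pid 999999 and the helper names are hypothetical. In Stata the same idea corresponds to listing both key variables in the merge (for example `merge 1:1 w1_hhid pid using ...`).

```python
from collections import Counter

# Illustrative HouseholdRoster-style rows. pid 316891 and households 106506 and
# 106744 come from the release notes; pid 999999 is a hypothetical co-resident.
roster = [
    {"w1_hhid": 106506, "pid": 316891},
    {"w1_hhid": 106744, "pid": 316891},
    {"w1_hhid": 106506, "pid": 999999},
]


def duplicated_keys(rows, key_fields):
    """Return key tuples that occur more than once in rows."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def merge_on(left, right, key_fields):
    """Inner-join two lists of dicts; refuse if right is not unique on the key."""
    if duplicated_keys(right, key_fields):
        raise ValueError(f"right-hand rows are not unique on {key_fields}")
    index = {tuple(row[f] for f in key_fields): row for row in right}
    merged = []
    for row in left:
        match = index.get(tuple(row[f] for f in key_fields))
        if match is not None:
            merged.append({**row, **match})
    return merged


print(duplicated_keys(roster, ["pid"]))             # [(316891,)]
print(duplicated_keys(roster, ["w1_hhid", "pid"]))  # []
```

Keyed on pid alone, the polygamist shows up as a duplicate; keyed on (w1_hhid, pid), every row is unique, which is why both variables belong in the merge key.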
3. New Variables added
The following variables have been added to the public release dataset:
a. Interviewer Evaluation variables have been included in the Adult, Child, Proxy and HouseholdQ files.
b. New pid variables have been added for every question that asks for the pcode of the respondent.
c. The day of the interview date has been included in all datasets, as users have found this variable useful.
4. Variables dropped
a. The household questionnaire variable, hhqi, has been dropped from the Wave 1 dataset to avoid confusion with the content of the Wave 2 hhid variable. The hhqi variable also no longer appears in the Wave 2 dataset.
b. Hhderived weight variables have been dropped
5. Variable names updated
a. Year of interview has been included in Education variable names in the Adult, Child and Proxy files
b. The hhid variable has been renamed to w1_hhid in all datasets
6. Update of Key variables, Weights and Z-scores
In the process of undertaking Wave 2 fieldwork, key variables (dob, age, gender, education and race) have been updated where better responses were received from respondents in cases of non-response or errors in the Wave 1 data. As a result of changes in these key variables and in the sample sizes of the dataset, new weights and z-scores have been calculated for all households and individuals, where applicable.
Changes in Wave 1 data from Version 4.0 to 4.1:
Changes made to the data to accommodate a polygamist required a change to the weights.
A mistake in the coding file for z-scores found after the release of version 4 has been corrected.
Version 5 of the NIDS Wave 1 (2008) data was received on 22 August 2013. The following changes were made to Version 4.1 of the National Income Dynamics Study (NIDS) Wave 1 dataset to produce Version 5:
Data Corrections in Version 5
Discrepancies in birth history and parent vital status (mother/father alive) data were corrected with data from call-backs to households.
Duplicate households (resulting from interviews with the same respondents in more than one household) were identified during Wave 3 fieldwork and corrected. This has resulted in a change in the number of individuals and households in the dataset.
PIDs were created for non-resident household members in the Wave 1 HouseholdRoster file, and 732 non-residents were matched to respondents in Wave 2 or 3.
Documents Renamed in Version 5
The Wave 1 household questionnaire file was renamed from HouseholdQ to HHQuestionnaire for consistency with Waves 2 and 3.
New Variables in Version 5
New variables identify mothers and fathers in the NIDS panel even when they were not co-resident with their children or had died.
Renamed Variables in Version 5:
Variables have been renamed in the Child file to ensure consistency in variable names across files. Please see the User Manual for a list of renamed variables.
Dropped Variables in Version 5:
wx_r_age (use best_age in the individual derived file)
Most of the variables dropped were empty variables. Please see the User Manual for a list of the dropped variables.
New Weights in Version 5:
All weights were recalculated in Version 5. Please see the NIDS Wave 3 User Manual for an explanation of how the weights were calculated and of the relationship between the different weights.
Changes in Version 5.1
Admin data has been added to the regular wave-specific pack. Previously, this was a separate item to download via the DataFirst catalogue. We hope that this convenience will enrich users' experience of developing research from this ever-growing resource. The publicly available data match the names of schools as collected by NIDS to the Department of Basic Education's Ordinary Schools Master List. Only a limited number of variables are made publicly available, to protect the identities of NIDS respondents. A secure data facility is provided where researchers can match their own data sources, based on EMIS numbers, to the matched schools. See <http://www.nids.uct.ac.za/nids-data/secure-data> for further details.
The variable w1_pi_hhimprent was incorrectly named in the hhderived file in Version 5.0. It has been renamed back to w1_hhimprent_inc.
Birth History changes
Eight individuals listed in the Birth History (BH) section had been allocated an incorrect pcode and pid. These have since been corrected.
Two individuals had been incorrectly assigned a pcode of 44, which is invalid. This error has been fixed in the HouseholdRoster file by assigning the correct pcode to both respondents.
Users brought to our attention that the datasets were retaining settings from the Stata svyset command. We have since removed these settings from all datasets.
Changes in V5.2 (February 2014)
NIDS datasets have been reweighted to take into account the Census 2011 geographic data. The change in the weights has also slightly affected the w1_hhquint variable, as it represents the weighted household quintile.
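Why would reweighting move a quintile variable? In a weighted quintile, a household's group depends on the weight share of the households ranked below it, not only on its own welfare level. The sketch below is a minimal illustration under stated assumptions: households are ranked on a single welfare measure and cut into fifths of cumulative weight. The exact ranking variable and cut rule NIDS uses for w1_hhquint are not described here, and the function name is hypothetical.

```python
def weighted_quintiles(welfare, weights):
    """Assign quintiles 1-5 so each holds roughly a fifth of the total weight.

    Assumption: rank on one welfare measure; a household's quintile is set by
    the cumulative weight of households ranked strictly below it.
    """
    if len(welfare) != len(weights):
        raise ValueError("welfare and weights must have the same length")
    total = sum(weights)
    order = sorted(range(len(welfare)), key=lambda i: welfare[i])
    quintile = [0] * len(welfare)
    cum_before = 0.0
    for i in order:
        quintile[i] = min(5, int(5 * cum_before / total) + 1)
        cum_before += weights[i]
    return quintile


welfare = [100, 200, 300, 400, 500]
print(weighted_quintiles(welfare, [1, 1, 1, 1, 1]))  # [1, 2, 3, 4, 5]
print(weighted_quintiles(welfare, [3, 1, 1, 1, 1]))  # [1, 3, 3, 4, 5]
```

Giving the poorest household a larger weight pushes the next households into higher quintiles even though their welfare values are unchanged, which is the kind of shift reweighting can produce.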
Previous geographic variables have been given the suffix '2001' to distinguish them from the new geographic variables. The following variables were affected:
w1_hhprov became w1_hhprov2001
w1_hhgeo became w1_hhgeo2001
w1_hhdc became w1_hhdc2001
w1_hhmp became w1_hhmp2001
w1_hhea became w1_hhea2001
w1_mapped_prov* became w1_mapped_prov2001*
w1_mapped_dc* became w1_mapped_dc2001*
w1_mapped_mp* became w1_mapped_mp2001*
w1_mapped_geo* became w1_mapped_geo2001*
w1_mapped_ea* became w1_mapped_ea2001*
*Secure dataset variables
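Scripts written against versions before 5.2 may still refer to the unsuffixed geographic names. The renames above follow a single rule (append "2001"), so they can be generated rather than typed out. This is a minimal sketch that assumes variable names are handled as plain strings, for example a list of variables used in an analysis; the helper names are hypothetical.

```python
# Base names from the Version 5.2 notes; "2001" is appended to each.
PUBLIC_GEO_BASES = ["w1_hhprov", "w1_hhgeo", "w1_hhdc", "w1_hhmp", "w1_hhea"]
SECURE_GEO_BASES = [
    "w1_mapped_prov", "w1_mapped_dc", "w1_mapped_mp", "w1_mapped_geo", "w1_mapped_ea",
]


def v52_geo_renames(include_secure=False):
    """Map pre-5.2 geographic variable names to their '2001'-suffixed names."""
    bases = PUBLIC_GEO_BASES + (SECURE_GEO_BASES if include_secure else [])
    return {old: old + "2001" for old in bases}


def update_variable_list(names, mapping):
    """Rename the variables a script refers to; unlisted names pass through."""
    return [mapping.get(name, name) for name in names]


print(update_variable_list(["w1_hhprov", "w1_hhquint"], v52_geo_renames()))
# ['w1_hhprov2001', 'w1_hhquint']
```

The secure-only names are kept behind a flag because they exist only in the secure dataset.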
Census 2011 Geographic Variables have been brought into the NIDS dataset. The new variables are:
w1_hhprov2011
w1_hhgeo2011
w1_hhdc2011
w1_hhmdbdc2011
w1_hhmp2011
w1_hhea2011
w1_mapped_prov2011*
w1_mapped_dc2011*
w1_mapped_mdbdc2011*
w1_mapped_mp2011*
w1_mapped_geo2011*
w1_mapped_eatype2011*
w1_mapped_ea2011*
*Secure dataset variables
More detail about this change can be found in the document detailing the Inclusion of Census 2011 data in NIDS.
Changes in version 5.3
Version 5.3 of NIDS Wave 1 2008 had minor changes to some variable labels.
CHANGES IN VERSION 6
NIDS collects data on locations in which respondents have lived in the past (Questions b10 to b16 in the Adult questionnaire). This migration data was previously coded to district municipality (DC) level using 2001 Census data. In the latest release, the migration data are coded to both the 2001 and 2011 Census data, and the release includes both versions of the district municipality codes. New migration variables have the suffix dc_2001 or dc_2011 for descriptions coded to the 2001 and 2011 Census data respectively.
NIDS has tried to identify and match all the children across Wave 1 - Wave 4 in the Birth History (BH) section. Respondents were contacted for further information, and changes made based on new information received. An additional gain from this exercise is that each child in the BH section now has a PID to identify them.
Negative and Positive events Variables
The following “other” variables in the household questionnaire are now available in the HHQuestionnaire data file: w1_h_nego_o, w1_h_poso1_o & w1_h_poso2_o.
Employment codes for Wave 1 data were previously created using the South African Standard Classification of Occupations (SASCO) codes, whereas subsequent waves used the International Standard Classification of Occupations (ISCO) codes. In order to make Wave 1 consistent with the other waves, all Wave 1 employment descriptions have now been coded using the ISCO codes. The variable names for the codes created have been changed to indicate this. Disaggregated ISCO codes up to the 5-digit level are available in the Secure (restricted access) version of the data, while the one-digit level codes are included in the Public Release data.
Some respondents in the Adult dataset had answered that they had other self-employment activities, but no occupational codes for "other self-employment activities" existed in the data. A new variable, w1_a_emsothatc_isco_c, has been added; it holds the occupational codes of respondents with other self-employment activities.
Police District data
Police district data has now been included as part of the Admin data file. Variables include distance to the nearest police station and distance to the police station in the district in which the household is located. Only categorical distances have been included in the public release version of the data. Actual distances can be found in the secure (restricted-access) version of the data.
Substantive cleaning was done on the variables indicating which household member helped to complete the questionnaire. This resulted in a fourth person being created for the Proxy questionnaire. The variable added is w1_p_intresppid4.
An exercise to reduce inconsistencies in the parental information was carried out for all individuals across all waves. Cases with problems were identified by comparing parental information across waves. Information obtained from contacting respondents was used to correct inconsistent parental data, where possible.
The pcode variables have been dropped from Wave 1 data. This was done for the following reasons:
All non-resident individuals now have a pid, so the pcode had become a duplicate identifier
The task of cleaning the identifiers was becoming cumbersome, as every pid adjustment required a pcode adjustment
The pcode did not exist in any of the later datasets. We therefore dropped these variables in order to have consistency in the panel.
Table 2 below shows all the variables that have been renamed in the V6.0 data.
Questionnaire | Section | Old variable name | New variable name
Adult | Demographics | w1_a_movy | w1_a_moveyr
Adult | Demographics | w1_a_brndc | w1_a_brndc_2001
Adult | Demographics | w1_a_lvbfdc | w1_a_lvbfdc_2001
Adult | Demographics | w1_a_lv94dc | w1_a_lv94dc_2001
Adult | Demographics | w1_a_lv06dc | w1_a_lv06dc_2001
Adult | Labour Market Participation | w1_a_em1occ_c | w1_a_em1occ_isco_c
Adult | Labour Market Participation | w1_a_em2occ_c | w1_a_em2occ_isco_c
Adult | Labour Market Participation | w1_a_emsatc_c | w1_a_emsatc_isco_c
Adult | Labour Market Participation | w1_a_emsoth_c | w1_a_emsothatc_isco_c
Adult | Labour Market Participation | w1_a_emctype_c | w1_a_emctype_isco_c
Adult | Labour Market Participation | w1_a_emhtsk_c | w1_a_emhtsk_isco_c
Adult | Labour Market Participation | w1_a_unemtyp_c | w1_a_unemtyp_isco_c
Adult | Parents' vital status | w1_a_mthwrk_c | w1_a_mthwrk_isco_c
Adult | Parents' vital status | w1_a_fthwrk_c | w1_a_fthwrk_isco_c
Adult | Labour Market Participation | w1_a_eminc | w1_a_emcinc
Adult | Labour Market Participation | w1_a_em1inc_s | w1_a_em1inc_sh
Adult | Labour Market Participation | w1_a_eminc_sh | w1_a_emcinc_sh
Adult | Education | w1_a_ed07payr1 | w1_a_ed07paypr1
Adult | Education | w1_a_ed07payr2 | w1_a_ed07paypr2
Adult | Education | w1_a_ed07payr3 | w1_a_ed07paypr3
Child | Demographics | w1_c_movy | w1_c_moveyr
Child | Demographics | w1_c_brndc | w1_c_brndc_2001
Child | Demographics | w1_c_lv06dc | w1_c_lv06dc_2001
Child | Demographics | w1_c_lvbfdc | w1_c_lvbfdc_2001
Child | Parents' vital status | w1_c_mthwrk_c | w1_c_mthwrk_isco_c
Child | Parents' vital status | w1_c_fthwrk_c | w1_c_fthwrk_isco_c
Child | Education | w1_c_ed07wdexp | w1_c_ed07wdex
Child | Education | w1_c_fthedlev | w1_c_fthtert
Child | Education | w1_c_mthedlev | w1_c_mthtert
Child | Education | w1_c_ed07payr1 | w1_c_ed07paypr1
Child | Education | w1_c_ed07payr2 | w1_c_ed07paypr2
Child | Education | w1_c_ed07payr3 | w1_c_ed07paypr3
Child | Parents and family support | w1_c_fththa | w1_c_fthdtha
Child | Grants | w1_c_grcurecr | w1_c_grcurecrel
Child | Child's Health | w1_c_hlthdes | w1_c_hldes
Child | Parents' vital status | w1_c_mthtrt | w1_c_mthtertyn
Child | Parents' vital status | w1_c_fthtrt | w1_c_fthtertyn
hhderived | - | w1_hhcluster | w1_cluster
hhderived | - | w1_hhdc2001 | w1_dc2001
hhderived | - | w1_hhdc2011 | w1_dc2011
hhderived | - | w1_hhgeo2001 | w1_geo2001
hhderived | - | w1_hhgeo2011 | w1_geo2011
hhderived | - | w1_hhprov2001 | w1_prov2001
hhderived | - | w1_hhprov2011 | w1_prov2011
hhderived | - | w1_hhmdbdc2011 | w1_mdbdc2011
HHQ | A | w1_h_pcode_pid | w1_h_respondent
HHQ | Negative Events | w1_h_negdthfrin | w1_h_negdthfrinc
HHQ | Agriculture | w1_h_*prd | w1_h_*prdss
Proxy | Demographics | w1_p_movy | w1_p_moveyr
Proxy | Demographics | w1_p_brndc | w1_p_brndc_2001
Proxy | Demographics | w1_p_lv06dc | w1_p_lv06dc_2001
Proxy | Demographics | w1_p_lv94dc | w1_p_lv06dc_2011
Proxy | Demographics | w1_p_lvbfdc | w1_p_lv94dc_2001
Proxy | Labour Market Participation | w1_p_emp | w1_p_emactcur_u
Proxy | Labour Market Participation | w1_p_empinc | w1_p_em1inc_sh
Proxy | Labour Market Participation | w1_p_empocc_c | w1_p_em1occ_isco_c
Proxy | Labour Market Participation | w1_p_empprod_c | w1_p_em1prod_c
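Porting older analysis code to the V6.0 names is a lookup in the rename table, plus one wildcard row (w1_h_*prd becomes w1_h_*prdss). The sketch below copies only a few rows from the table; the remaining rows would be added the same way. Treating `*` as "any run of word characters" is an assumption about how the wildcard row should be read, and the function name and the example variable w1_h_maizeprd are hypothetical.

```python
import re

# A few rows from Table 2 (old name -> new name); add further rows as needed.
V6_RENAMES = {
    "w1_a_movy": "w1_a_moveyr",
    "w1_a_em1occ_c": "w1_a_em1occ_isco_c",
    "w1_hhcluster": "w1_cluster",
    "w1_h_pcode_pid": "w1_h_respondent",
}

# Wildcard row w1_h_*prd -> w1_h_*prdss. Assumption: '*' stands for any run of
# word characters, as in a Stata varlist wildcard.
WILDCARD_RENAMES = [(re.compile(r"^w1_h_(\w*)prd$"), r"w1_h_\1prdss")]


def to_v6_name(old):
    """Return the V6.0 name for a pre-V6.0 variable name (unchanged if not listed)."""
    if old in V6_RENAMES:
        return V6_RENAMES[old]
    for pattern, replacement in WILDCARD_RENAMES:
        if pattern.match(old):
            return pattern.sub(replacement, old)
    return old


print(to_v6_name("w1_a_movy"))      # w1_a_moveyr
print(to_v6_name("w1_h_maizeprd"))  # w1_h_maizeprdss (hypothetical name)
```

Exact matches are checked before the wildcard so that an explicit table row always takes precedence. Names already in V6.0 form, such as anything ending in prdss, pass through unchanged.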
CHANGES IN VERSION 6.1
Version 6.1 has changes to two weight variables: w1_pweight in the indderived data file and w1_wgt in the hhderived data file. The weight variables were changed because:
1. Panel weights were missing for some babies born to continuing sample member (CSM) mothers after Wave 1 (2008)
2. The weight for one respondent was missing
CHANGES IN VERSION 7.0.0
Version 7.0.0 of NIDS Wave 1 2008 includes changes to the number of individuals and households in each data file, driven largely by previously incorrect classification of TSM/CSM status, duplicate interviews, and additional baby CSMs not captured in a previous version of this wave. Version 7.0.0 also contains new and renamed variables, and there are changes to the survey weights. For details on these changes, please see the document Wave 1 Changes between V6.1 and V7.0.0 provided with the data.
HOUSEHOLD: Household characteristics, household roster, mortality history, living standards, expenditure, consumption, negative events, positive events, agriculture
ADULTS: Demographics, education, labour market participation, income, health, well-being, numeracy, anthropometric data
CHILDREN: Education, health, family support, grants, anthropometric data, numeracy
The Secure (Restricted access) data files contain confidential variables that are not released in the publicly available data. The secure variables include the primary sampling unit (PSU), date of birth day, and full geo-codes. Employment codes are provided up to the four digit level, and a code-list for these is available with the data. A complete list of variables available in the restricted-access data is provided with the data.
NIDS 2008 covered the whole of South Africa. The lowest level of geographic aggregation for the data is district municipality.
The lowest level of geographic aggregation covered by the NIDS Secure Data 2008 is the household. The data are provided with household GPS coordinates.
The target population for NIDS 2008 was private households in all nine provinces of South Africa, and residents in workers' hostels, convents and monasteries. The frame excludes other collective living quarters, such as student hostels, old age homes, hospitals, prisons and military barracks.
Producers and sponsors
Southern Africa Labour and Development Research Unit
University of Cape Town
Government of South Africa
A stratified, two-stage cluster sample design was employed in sampling the households to be included in the base wave. In the first stage, 400 Primary Sampling Units (PSUs) were selected from Stats SA's 2003 Master Sample of 3000 PSUs. This Master Sample was the sample used by Stats SA for its Labour Force Surveys and General Household Surveys between 2004 and 2007 and for the 2005/06 Income and Expenditure Survey. Each of these surveys was conducted on non-overlapping samples drawn within each PSU.
The sample of PSUs for NIDS is a subset of the Master Sample. The explicit strata in the Master Sample are the 53 district councils (DCs). The sample was proportionally allocated to the strata based on the Master Sample DC PSU allocation and 400 PSUs were randomly selected within strata. It should be noted that the sample was not designed to be representative at provincial level, implying that analysis of the results at province level is not recommended.
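The allocation step described above can be sketched in a few lines: each DC stratum gets a share of the 400 PSUs proportional to its Master Sample PSU count, and PSUs are then drawn at random within each stratum. The text does not say how fractional shares were rounded, so this sketch assumes the largest-remainder method; the stratum labels, counts and function names are hypothetical.

```python
import random
from math import floor


def allocate_proportionally(frame_counts, n_total):
    """Largest-remainder allocation of n_total units across strata (assumed rule)."""
    total = sum(frame_counts.values())
    quotas = {s: n_total * c / total for s, c in frame_counts.items()}
    alloc = {s: floor(q) for s, q in quotas.items()}
    shortfall = n_total - sum(alloc.values())
    # Hand the leftover units to the strata with the largest fractional parts.
    ranked = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in ranked[:shortfall]:
        alloc[s] += 1
    return alloc


def select_psus(psus_by_stratum, alloc, seed=None):
    """Simple random selection of alloc[s] PSUs within each stratum s."""
    rng = random.Random(seed)
    return {s: rng.sample(psus_by_stratum[s], alloc[s]) for s in alloc}


# Hypothetical Master Sample PSU counts for three DC strata.
print(allocate_proportionally({"DC_A": 1500, "DC_B": 900, "DC_C": 600}, 400))
# {'DC_A': 200, 'DC_B': 120, 'DC_C': 80}
```

Because strata are DCs rather than provinces, nothing in this allocation controls provincial sample sizes, which is consistent with the caution against province-level analysis.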
Sample of dwelling units
At the time that the Master Sample was compiled, 8 non-overlapping samples of dwelling units were systematically drawn within each PSU. Each of these samples is called a "cluster" by Stats SA. These clusters were then allocated to the various household surveys that were conducted by Stats SA between 2004 and 2007. However, two clusters in each PSU were never used by Stats SA and these were allocated to NIDS.
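The text does not describe exactly how Stats SA drew the eight samples. One standard way to obtain non-overlapping systematic samples from a PSU's dwelling-unit listing is to interleave them: with a sampling interval of 8, each of the eight possible starting positions defines one sample. The sketch below assumes that approach, with a random rotation of which sample gets which start. It is not the Stats SA procedure, and the function name is hypothetical.

```python
import random


def systematic_clusters(dwelling_units, n_clusters=8, seed=None):
    """Split a listing into n_clusters interleaved systematic samples.

    Each cluster takes every n_clusters-th unit in listing order, so the
    clusters never overlap and together cover the whole listing.
    """
    rng = random.Random(seed)
    start = rng.randrange(n_clusters)  # random rotation of cluster labels
    clusters = [[] for _ in range(n_clusters)]
    for position, unit in enumerate(dwelling_units):
        clusters[(position + start) % n_clusters].append(unit)
    return clusters


listing = list(range(80))  # hypothetical listing of 80 dwelling units
clusters = systematic_clusters(listing, seed=1)
print([len(c) for c in clusters])  # [10, 10, 10, 10, 10, 10, 10, 10]
```

Under this scheme, giving NIDS the two clusters Stats SA never used would simply mean taking two of the eight lists, which by construction share no dwelling units with the others.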
It was sometimes necessary to re-list a PSU when the situation on the ground had drastically changed to an extent that the information recorded on the listing books no longer reflected the situation on the ground. In these cases, the PSU was re-listed and a new sample of dwelling units selected. However, the downside of re-listing a PSU is that the chance of sample overlap with dwelling units that are in other surveys is increased. The extent of this overlap cannot be quantified as the lists are no longer comparable. There is anecdotal evidence that sample overlap might have occurred in some PSUs.
Individual respondent selection
Fieldworkers were instructed to interview all households living at the selected address/dwelling unit. If they found that the dwelling unit was vacant or the dwelling no longer existed, they were not permitted to substitute another dwelling unit; instead, they recorded this information on the household control sheet.
The household control sheet is a two page form. This form was completed for every dwelling unit that was selected in the study, regardless of whether or not a successful interview was conducted. Where more than one household resided at the selected dwelling unit, a separate household control sheet was completed for every household and they were treated in the data as separate units. In order to qualify as separate households they should not share resources or food. Lodgers and live-in domestic workers were considered separate households.
All resident household members at selected dwelling units were included in the NIDS panel, provided that at least one person in the household agreed to participate in the study. The household roster in the household questionnaire was used to identify potential participants in the study. Respondents were first asked to list all individuals who had lived under this "roof" or within the same compound/homestead for at least 15 days during the last 12 months, OR who had arrived in the last 15 days and for whom this was now their usual residence. In addition, the persons listed had to share food from a common 'pot' and share resources from a common resource pool. All those listed on the household roster are considered household members.
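The roster rule combines a residence test (15 days in the last 12 months, or a recent arrival who now lives there) with two sharing tests, and all three parts must hold. A minimal sketch of that logic follows; the field names are hypothetical summaries of the screening questions, not NIDS variable names.

```python
from dataclasses import dataclass


@dataclass
class RosterCandidate:
    """Hypothetical fields summarising the roster screening questions."""
    days_resident_last_12_months: int
    arrived_last_15_days: bool
    now_usual_residence: bool
    shares_food: bool
    shares_resources: bool


def is_household_member(person: RosterCandidate) -> bool:
    """Apply the residence test, then the common 'pot' and resource-pool tests."""
    resident_long_enough = person.days_resident_last_12_months >= 15
    recent_arrival = person.arrived_last_15_days and person.now_usual_residence
    return (
        (resident_long_enough or recent_arrival)
        and person.shares_food
        and person.shares_resources
    )


# Arrived a few days ago and now lives here: counts as a member.
print(is_household_member(RosterCandidate(3, True, True, True, True)))  # True
```

Note that the OR binds only the two residence conditions; someone who meets the residence test but does not share food and resources (for example, a lodger) is not a member of this household.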
All resident household members became NIDS sample members. In addition, non-resident members who were "out of scope" at the time of the survey also became NIDS sample members. Out-of-scope household members were those living in institutions (such as boarding school hostels, halls of residence, prisons or hospitals) that were not part of the sampling frame. These individuals had a zero probability of selection at their usual place of residence and were thus included in the NIDS sample as part of the household that had listed them as non-resident members. These two groups constitute the permanent sample members (PSMs) and should have had an individual questionnaire (adult, child or proxy) completed for them. These individuals are PSMs even if they refused to be interviewed in the base wave.
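The PSM rule reduces to two cases for anyone listed on a participating household's roster: residents always qualify, and non-residents qualify only if they live somewhere outside the sampling frame. A minimal sketch follows; the institution set holds the examples named in the text and is illustrative rather than exhaustive, and the function name is hypothetical.

```python
# Examples of out-of-frame institutions named in the text (not an exhaustive list).
OUT_OF_SCOPE_INSTITUTIONS = {
    "boarding school hostel", "hall of residence", "prison", "hospital",
}


def is_permanent_sample_member(resident, usual_location=None):
    """PSM status for a person listed on a participating household's roster.

    Interview outcome is deliberately not an input: listed people remain PSMs
    even if they refused to be interviewed in the base wave.
    """
    if resident:
        return True
    # Non-residents qualify only when they live outside the sampling frame,
    # because they then had zero chance of selection at their usual residence.
    return usual_location in OUT_OF_SCOPE_INSTITUTIONS


print(is_permanent_sample_member(False, "prison"))  # True
```

A non-resident living in another private dwelling is excluded because that dwelling was itself in the frame and could have been selected directly.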
An initial sample of 9600 dwelling units was drawn with the expectation of realizing 8000 successful interviews. However, during the initial round of fieldwork for Wave 1 we did not achieve the target number of households. Therefore we went back to the field to attempt to overturn refusals in 48 PSUs and to visit 24 new dwelling units in 32 of these areas. Stats SA drew an additional 24 dwelling units from their Master Sample in predominantly White and Asian PSUs in order to improve representation of these population groups in the data.
Response rates in phase 1 of Wave 1 of the NIDS survey were disappointing and phase 2 was embarked upon to realise a more acceptable base wave sample. A detailed analysis of household level and individual level response rates follows. Item non-response rates are not addressed here. Such non-response is flagged in the data and is appropriately discussed in the context of specific analyses in the Discussion Paper series.
Household response rates were calculated using the number of visited dwelling units as the denominator and the number of participating households as the numerator. Where response rates are given by race, the predominant race group of the PSU is assigned to all households in that PSU. This is done because, by definition, non-participating households were not interviewed, so we did not gather information about the race of their members from the questionnaires.
Every effort was made to correctly identify all resident household members at the time of the interview, but for various reasons not all resident household members were interviewed. Proxy questionnaires were completed for 1754 adults who were unavailable. For a further 1250 adults, no questionnaires were completed; for these individuals we have only the information supplied in the household roster, e.g. date of birth and education. They are, however, panel members, and we will attempt to make contact with them in the next wave.
Over the combined fieldwork periods, NIDS fieldworkers knocked on 10,642 household doors. Of these households, 7305 agreed to participate and completed the interview, which equates to a 69% response rate. The total sample for NIDS consists of 409 PSUs. Of those, 9 were replaced in phase 2 because the whole PSU was inaccessible in phase 1; they are therefore excluded from the rest of the calculations.
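The household response rate is participating households divided by visited dwelling units, and rates by race assign each dwelling the predominant race group of its PSU. The sketch below reproduces the overall figure from the counts above; the by-race helper takes a hypothetical list of (PSU, participated) pairs and a hypothetical PSU-to-race mapping, and the function names are illustrative.

```python
from collections import defaultdict


def response_rate(participating, visited):
    """Participating households divided by visited dwelling units."""
    if visited <= 0:
        raise ValueError("visited must be positive")
    return participating / visited


def response_rates_by_race(dwellings, psu_race):
    """Rates by the predominant race group of each dwelling's PSU.

    dwellings: iterable of (psu_id, participated) pairs.
    psu_race: maps psu_id to the PSU's predominant race group.
    """
    visited = defaultdict(int)
    participating = defaultdict(int)
    for psu_id, participated in dwellings:
        race = psu_race[psu_id]  # assigned to every household in the PSU
        visited[race] += 1
        participating[race] += int(participated)
    return {race: participating[race] / visited[race] for race in visited}


print(f"{response_rate(7305, 10642):.1%}")  # 68.6%
```

Rounded to a whole percentage, 68.6% is the 69% quoted above. The PSU-level race assignment is needed because non-participating households supply no race information of their own.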
Dates of Data Collection
Data Collection Mode
National Income Dynamics Study (NIDS) supervisory staff
Development Research Africa
Take Note Trading
Four questionnaires were administered for the National Income Dynamics Study 2008:
HOUSEHOLD QUESTIONNAIRE: This covered household characteristics, household roster, mortality history, living standards, expenditure, consumption, negative events, positive events, agriculture
ADULT QUESTIONNAIRE: This was administered to all people in sampled households who were aged 15 years or older on the day of the interview. The Adult Questionnaire collected data on demographics, education, labour market participation, income, health, well-being, numeracy and anthropometric measurements
CHILD QUESTIONNAIRE: This covered household members who were aged 14 years or younger, and collected data on education, health, family support, grants, numeracy and anthropometric measurements
PROXY QUESTIONNAIRE: These were completed where possible for adults who were unavailable or unable to answer their own adult questionnaire
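The choice of individual questionnaire follows from two facts about each resident household member: age on the interview day and whether an adult could answer for themselves. A minimal sketch of that routing follows (the household questionnaire is completed once per household and is not modelled). The function and parameter names are hypothetical.

```python
def questionnaire_for(age_on_interview_day, can_self_report=True):
    """Return the individual questionnaire for a resident household member."""
    if age_on_interview_day < 0:
        raise ValueError("age cannot be negative")
    if age_on_interview_day <= 14:
        return "child"
    # Adults (15+) answer their own questionnaire; for adults who were
    # unavailable or unable to answer, a proxy was completed where possible.
    return "adult" if can_self_report else "proxy"


print(questionnaire_for(15))         # adult
print(questionnaire_for(40, False))  # proxy
```

The cut-off sits between 14 and 15 completed years, matching the adult (15 or older) and child (14 or younger) definitions above.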
Initially the intention was that data capture would be done in-house. However, by early March 2008 it became evident that data capture was proceeding too slowly, and Citizen Surveys was awarded the tender for the work. All questionnaires were double captured and anomalies reconciled. Regular data dumps enabled the checking of captured data against hard copies of the questionnaires.
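Double capture means each questionnaire is keyed twice, independently, and any field where the two captures disagree is an anomaly to reconcile against the hard copy. A minimal sketch of the comparison step follows, assuming each capture of one questionnaire is held as a dictionary of field values; the field names are hypothetical.

```python
def capture_discrepancies(first, second):
    """Fields whose values differ between two independent captures of one form.

    Returns {field: (first_value, second_value)}; a field missing from one
    capture shows up as None on that side.
    """
    fields = sorted(set(first) | set(second))
    return {
        field: (first.get(field), second.get(field))
        for field in fields
        if first.get(field) != second.get(field)
    }


first = {"q1_gender": 2, "q2_birth_year": 1975}
second = {"q1_gender": 2, "q2_birth_year": 1957}
print(capture_discrepancies(first, second))  # {'q2_birth_year': (1975, 1957)}
```

Only disagreements are returned, so a clean double capture produces an empty dictionary, and transposed digits like the birth-year example are flagged for checking against the paper form.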
Registering to use the NIDS data includes agreement that the data user will not attempt to identify specific individuals from the data. The data user will not redistribute the data to other users and each user is required to register for data usage on the DataFirst website: http://www.datafirst.uct.ac.za
Secure Research Data Centre access https://www.datafirst.uct.ac.za/services/secure-data-services
Southern Africa Labour and Development Research Unit. National Income Dynamics Study Wave 1 2008, Secure Data [dataset]. Version 7.0.0. Pretoria: SA Presidency [funding agency]. Cape Town: Southern Africa Labour and Development Research Unit [implementer], 2018. Cape Town: DataFirst [distributor], 2018. https://doi.org/10.25828/jnkz-s804