The sensitivity of estimates of post-apartheid changes in South African poverty and inequality to key data imputations

Type Working Paper - CSSR Working Paper 106
Title The sensitivity of estimates of post-apartheid changes in South African poverty and inequality to key data imputations
Issue 106
Publication (Day/Month/Year) 2005
Page numbers 1-31
We begin by summarising the literature that has assessed medium-run changes in poverty and inequality in South Africa using census data. According to this literature, both poverty and inequality increased over the 1996 to 2001 period. In this paper we assess the robustness of these results to two features of the data: the large percentage of individuals and households in both censuses for whom personal income data are missing, and the fact that personal income is collected in income bands rather than as point estimates. First, we use a sequential regression multiple imputation approach to impute missing values for the 2001 census data. Relative to the existing literature, the imputation results lead to estimates of mean income and inequality (as measured by the Gini coefficient) that are higher and estimates of poverty that are lower. This is true even after accounting for the wider confidence intervals that arise from the uncertainty that the imputations bring into the estimation process. Next, we assess the influence of dubious zero values by setting them to missing and redoing the multiple imputation process. This increases the uncertainty associated with the imputation process, as reflected in wider confidence intervals on all estimates, and only the Gini coefficient is significantly different from the first set of estimated parameters. The final imputation exercise assesses the sensitivity of results to the practice of assigning band midpoints to personal incomes recorded in bands. We impute an alternative set of intra-band point incomes by replicating the intra-band empirical distribution of personal incomes from a national income and expenditure survey undertaken in the year before each census. Using the empirical distributions increases estimated inequality, although the differences are relatively small. We finish our empirical work with a discussion of provincial poverty shares as a policy-relevant illustration of the importance of dealing with missing values.
Overall, our results for 1996 and 2001 confirm the major findings from the existing literature while generating more reliable confidence intervals for the key parameters of interest than are available elsewhere.
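
The contrast between band midpoints and intra-band empirical draws described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the band boundaries, the reference survey incomes, and the function names `gini`, `midpoint_incomes`, and `empirical_incomes` are all hypothetical, and simple random draws from the survey incomes in each band stand in for whatever replication procedure the paper actually uses.

```python
import random


def gini(incomes):
    """Gini coefficient of non-negative incomes (0 means perfect equality)."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-based formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def midpoint_incomes(band_ids, bands):
    """Assign every person the midpoint of their reported income band."""
    return [(bands[b][0] + bands[b][1]) / 2.0 for b in band_ids]


def empirical_incomes(band_ids, bands, survey_incomes, rng):
    """Draw each person's income from survey point incomes in the same band.

    Bands are treated as half-open intervals [low, high). If the survey has
    no observation in a band, the band midpoint is used as a fallback.
    """
    pools = {
        b: [y for y in survey_incomes if low <= y < high]
        for b, (low, high) in enumerate(bands)
    }
    drawn = []
    for b in band_ids:
        pool = pools[b]
        if pool:
            drawn.append(rng.choice(pool))
        else:
            drawn.append((bands[b][0] + bands[b][1]) / 2.0)
    return drawn


if __name__ == "__main__":
    # Hypothetical bands and survey data, for illustration only.
    bands = [(0, 500), (500, 2000), (2000, 8000)]
    census_band_ids = [0, 0, 0, 1, 1, 2]
    survey = [50, 120, 400, 600, 900, 1800, 2500, 7000]
    rng = random.Random(0)
    print("Gini, midpoints:", gini(midpoint_incomes(census_band_ids, bands)))
    print("Gini, empirical:", gini(empirical_incomes(census_band_ids, bands, survey, rng)))
```

Because the empirical draws are random, the second Gini value depends on the seed. A fuller treatment would repeat the draws many times and combine the estimates, in the spirit of the multiple imputation approach used for the missing values.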
