Survey ID Number
World Health Survey 2003, South Africa
SAMPLING GUIDELINES FOR WHS
Surveys in the WHS program must employ a probability sampling design. This means that every single individual in the sampling frame has a known and non-zero chance of being selected into the survey sample. While a Single Stage Random Sample is ideal if feasible, it is recognized that most sites will carry out Multi-stage Cluster Sampling.
The WHS sampling frame should cover 100% of the eligible population in the surveyed country. This means that every eligible person in the country has a chance of being included in the survey sample. It also means that particular ethnic groups or geographical areas must not be excluded from the sampling frame.
The sample size of the WHS in each country is 5000 persons (exceptions are considered on a country-by-country basis). An adequate number of persons must be drawn from the sampling frame to account for the estimated amount of non-response (refusal to participate, empty houses etc.). The highest estimate of potential non-response and empty households should be used to ensure that the desired sample size is reached at the end of the survey period. This is very important because if, at the end of data collection, the required sample size of 5000 has not been reached, additional persons must be selected randomly into the survey sample from the sampling frame. This is both costly and technically complicated (if this situation occurs, consult WHO sampling experts for assistance), and is best avoided by proper planning before data collection begins.
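The allowance for non-response can be worked through with simple arithmetic. The sketch below is illustrative only: the function name and the 20% non-response figure are assumptions, not WHS values, and it assumes that non-response applies uniformly across the sample.

```python
def persons_to_draw(target_completed: int, nonresponse_pct: int) -> int:
    """Persons to draw so that the expected number of completed
    interviews reaches the target, given a non-response percentage.

    nonresponse_pct is a whole-number percentage (hypothetical input);
    it should be the highest plausible estimate, as the guideline advises.
    """
    if not 0 <= nonresponse_pct < 100:
        raise ValueError("nonresponse_pct must be in the range 0-99")
    response_pct = 100 - nonresponse_pct
    # Ceiling division in integer arithmetic avoids floating-point rounding.
    return -(-target_completed * 100 // response_pct)


# Assumed example: 5000 completed interviews wanted, 20% non-response expected.
print(persons_to_draw(5000, 20))  # 6250
```

Under these assumptions, 6250 persons would be drawn so that about 5000 interviews are completed.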
All steps of sampling, including justification for stratification, cluster sizes, probabilities of selection, weights at each stage of selection, and the computer program used for randomization, must be communicated to WHO.
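To illustrate how probabilities of selection at each stage relate to weights, the sketch below multiplies the stage probabilities and inverts the result, which is the usual construction of a base design weight. The function name and the three probabilities are hypothetical; the sketch does not include any adjustment for non-response or post-stratification.

```python
import math


def design_weight(stage_probabilities):
    """Base design weight: the inverse of the overall selection
    probability, which is the product of the stage probabilities."""
    probs = list(stage_probabilities)
    if not probs or any(not 0 < p <= 1 for p in probs):
        raise ValueError("each stage probability must be in (0, 1]")
    return 1 / math.prod(probs)


# Hypothetical respondent: PSU chosen with probability 0.2, household
# within the PSU with probability 0.1, one of four eligible members (0.25).
weight = design_weight([0.2, 0.1, 0.25])
print(round(weight))  # roughly 200 persons represented by this respondent
```

Recording the probability at every stage, as required above, is what makes this calculation possible at the analysis stage.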
Stratification is the process by which the population is divided into subgroups. Sampling will then be conducted separately in each subgroup. Strata or subgroups are chosen because evidence is available that they are related to the outcome (e.g. health, responsiveness, mortality, coverage etc.). The strata chosen will vary by country and reflect local conditions. Some examples of factors that can be stratified on are geography (e.g. North, Central, South), level of urbanization (e.g. urban, rural), socio-economic zones, provinces (especially if health administration is primarily under the jurisdiction of provincial authorities), or the presence of a health facility in the area. The strata to be used must be identified by each country and the reasons for their selection explicitly justified.
Stratification is strongly recommended at the first stage of sampling. Once the strata have been chosen and justified, all stages of selection will be conducted separately in each stratum. We recommend stratifying on 3-5 factors. It is optimum to have half as many strata. Note the difference between stratifying variables, which may be such variables as gender, socio-economic status, province/region etc., and strata, which are the combinations of variable categories. For example, "Male, High socio-economic status, Xingtao Province" would be a stratum.
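The distinction between stratifying variables and strata can be made concrete by forming every combination of variable categories. The sketch below is illustrative: the variable names and category labels are assumptions chosen to echo the examples in the text, not a prescribed WHS scheme.

```python
from itertools import product

# Hypothetical stratifying variables and their categories.
stratifying_variables = {
    "sex": ["Male", "Female"],
    "ses": ["Low", "Middle", "High"],
    "region": ["North", "Central", "South"],
}


def build_strata(variables):
    """Return one stratum (a dict) per combination of categories."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]


strata = build_strata(stratifying_variables)
print(len(strata))  # 18 strata from 3 stratifying variables (2 x 3 x 3)
print(strata[0])    # {'sex': 'Male', 'ses': 'Low', 'region': 'North'}
```

The number of strata grows multiplicatively with each added variable, which is one reason the number of stratifying factors needs to be kept small.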
Strata should be as homogeneous as possible within and as heterogeneous as possible between. This means that strata should be formulated in such a way that individuals belonging to a stratum are as similar to each other as possible with respect to key variables, and as different as possible from individuals belonging to a different stratum. This maximises the efficiency of stratification in reducing sampling variance.
MULTI-STAGE CLUSTER SELECTION
A cluster is a naturally occurring unit or grouping within the population (e.g. enumeration areas, cities, universities, provinces, hospitals etc.); it is a unit for which the administrative level has clear, non-overlapping boundaries. Cluster sampling is useful because it avoids having to compile exhaustive lists of every single person in the population. Clusters should be as heterogeneous as possible within and as homogeneous as possible between (note that this is the opposite of the criterion for strata). Clusters should be as small as possible (i.e. large administrative units such as Provinces or States are not good clusters) but not so small as to be homogeneous.
In cluster sampling, a number of clusters are randomly selected from a list of clusters. Then, either all members of each chosen cluster or a random selection from among them are included in the sample. Multistage sampling is an extension of cluster sampling in which a hierarchy of clusters is chosen, going from larger to smaller.
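The two steps described above can be sketched as follows. This is a simplified illustration in which clusters are drawn with equal probability; it is not the WHS procedure, and the function name, cluster labels and sizes are hypothetical.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Select clusters at random, then a random subset of members
    within each selected cluster.

    clusters: dict mapping a cluster name to a list of its members.
    Equal-probability selection is a simplifying assumption here.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        # Take every member if the cluster is smaller than the quota.
        k = min(n_per_cluster, len(members))
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical frame: 6 enumeration areas with 10 households each.
frame = {f"EA{i}": [f"EA{i}-hh{j}" for j in range(10)] for i in range(6)}
selected = two_stage_sample(frame, n_clusters=2, n_per_cluster=3, seed=1)
for area, households in selected.items():
    print(area, households)
```

Adding further stages repeats the same pattern: each selected unit becomes the list from which the next, smaller units are drawn.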
In order to carry out multi-stage sampling, one needs to know only the population sizes of the sampling units. For the smallest sampling unit above the elementary unit, however, a complete list of all elementary units (households) is needed: in order to be able to randomly select among all households in the tertiary sampling unit (TSU), a list of all those households is required. This information may be available from the most recent population census. If the last census was more than 3 years ago, or the information furnished by it was of poor quality or unreliable, the survey staff will have the task of enumerating all households in the smallest randomly selected sampling unit. It is very important to budget for this step if it is necessary and to ensure that all households are properly enumerated so that a representative sample is obtained.
It is always best to sample as many primary sampling unit (PSU) clusters as possible. The reason for this is that the fewer the respondents in each PSU, the lower the clustering effect, which increases sampling variance and effectively reduces our estimating power. WHO requires an absolute maximum of 50 respondents per PSU, and ideally would suggest 20-30. This means that for a sample size of 5000 respondents, 100-200 PSU clusters should be taken into the sample. Calculating that, roughly, one fifth of the total number of PSU clusters in a country will be randomly selected into the survey sample, the sampling frame should consist of 500-1000 PSU clusters.
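The arithmetic above, and the reason fewer respondents per PSU is preferable, can be sketched as follows. The design effect uses the standard approximation 1 + (m - 1) x rho for m respondents per cluster; the intra-cluster correlation of 0.05 is an assumed value for illustration only, not a WHS figure.

```python
def psus_needed(sample_size: int, respondents_per_psu: int) -> int:
    """Number of PSUs needed to reach the sample size (rounded up)."""
    return -(-sample_size // respondents_per_psu)


def design_effect(respondents_per_psu: int, icc: float) -> float:
    """Approximate variance inflation due to clustering:
    1 + (m - 1) * icc, where icc is the intra-cluster correlation."""
    return 1 + (respondents_per_psu - 1) * icc


ASSUMED_ICC = 0.05  # hypothetical intra-cluster correlation

for m in (50, 25):
    n_psu = psus_needed(5000, m)
    frame_size = 5 * n_psu  # one fifth of the frame's PSUs are sampled
    print(m, n_psu, frame_size, round(design_effect(m, ASSUMED_ICC), 2))
# 50 respondents per PSU -> 100 PSUs sampled, frame of 500 PSUs
# 25 respondents per PSU -> 200 PSUs sampled, frame of 1000 PSUs
```

Under the assumed correlation, halving the number of respondents per PSU noticeably lowers the design effect, which is the clustering effect the paragraph refers to.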
Probability sampling means that every single individual in the sampling frame has a known and non-zero chance of being selected into the survey sample. Non-probability methods of sampling, such as quota sampling, convenience sampling and the random walk, may introduce bias into the survey, will throw survey findings into question, and are not accepted by WHO.
The probability of selection into the survey sample for each cluster will be proportional to its relative size.
SYSTEMATIC SAMPLING
Systematic sampling is the ordered sampling at fixed intervals from a list, starting from a randomly chosen point. Typically, systematic sampling is not used at the first stage of sampling (selection of PSUs) because it renders the estimation of sampling error difficult.
Systematic sampling is recommended at the secondary sampling unit (SSU), TSU, and household selection stages of sampling. Systematic sampling may be linear or circular.
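The linear and circular variants can be sketched as follows. This is a minimal illustration with an integer sampling interval; the function names and the list of 23 households are hypothetical.

```python
import random


def linear_systematic(units, n, rng):
    """Linear systematic sample: random start within the first
    interval, then every k-th unit down the list."""
    k = len(units) // n  # integer sampling interval
    if k < 1:
        raise ValueError("cannot select more units than the list holds")
    start = rng.randrange(k)
    return [units[start + i * k] for i in range(n)]


def circular_systematic(units, n, rng):
    """Circular systematic sample: random start anywhere in the list,
    then every k-th unit, wrapping around to the beginning."""
    total = len(units)
    k = total // n
    if k < 1:
        raise ValueError("cannot select more units than the list holds")
    start = rng.randrange(total)
    return [units[(start + i * k) % total] for i in range(n)]


rng = random.Random(2003)          # fixed seed so the draw can be reproduced
households = list(range(1, 24))    # hypothetical list of 23 households
print(linear_systematic(households, 5, rng))
print(circular_systematic(households, 5, rng))
```

One design point worth noting: when the list length is not a multiple of the sample size, the linear version with an integer interval never reaches the last few units on the list, whereas the circular version gives every unit the same chance of selection.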
SELECTION OF HOUSEHOLDS
The household is a device for reaching the individual. The household is the sampling unit while the individual is the observational unit. While it would be preferable to randomly select from a list of all eligible persons in a country, such lists, with a few exceptions, are not available, so we must employ a final cluster, the household, to reach our observational units.
Households will be selected from lists of dwelling units. Such lists are typically available from population registries, household listings, voter lists and census lists. Non-probabilistic methods of household selection such as the random walk are not acceptable. As it is essential to include all households in the sampling frame, the lists used to select households must be assessed on the following points:
- How much has the population changed since these lists were made?
- Completeness of coverage. Are there unregistered populations (e.g. slums)?
- Population shifts
- Changes in the registry
Almost all lists will suffer from routine problems. WHO recommends that survey institutions manually enumerate all the households in the sampling units randomly selected into the survey sample. If existing lists or registries will be used, then a detailed analysis of their quality must be made and they must be updated to ensure that there is no exclusion of households from the survey sampling frame.
SELECTION OF INDIVIDUALS FROM HOUSEHOLD ROSTER
All members of each household selected into the survey sample will be enumerated on the household roster. A member of the household is defined as someone who usually stays in the household, sleeping and sharing meals there, who has that address as primary place of residence, or who spends more than 6 months a year living there. Country-specific variations in this standard definition are allowed in consultation with WHO.
The respondent for the survey will be selected from among all eligible members of the household using Kish tables. Kish tables provide a method by which each eligible person in a household has an equal probability of selection into the survey sample. It is extremely important for the representativeness of the survey sample and the integrity of the survey that the Kish tables are properly implemented. All interviews where the Kish selection method is not properly implemented will be rejected. The Kish technique allows adequate representation of all the persons in the household.
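The goal of the Kish method, an equal chance of selection for each eligible household member, can be illustrated with the sketch below. It does not reproduce the published Kish tables: a random draw stands in for the table lookup, and the roster fields, the eligibility flag and the ordering rule (males before females, oldest first) are assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    sex: str        # "M" or "F"
    age: int
    eligible: bool  # eligibility rule is survey-specific (assumed flag)


def select_respondent(roster, rng):
    """Pick one eligible member with equal probability.

    Returns (member, k), where k is the number of eligible members;
    the within-household selection probability is 1 / k.
    """
    eligible = [m for m in roster if m.eligible]
    if not eligible:
        return None, 0
    # Fixed listing order so the draw does not depend on roster order.
    ordered = sorted(eligible, key=lambda m: (m.sex != "M", -m.age))
    chosen = ordered[rng.randrange(len(ordered))]
    return chosen, len(ordered)


# Hypothetical household roster.
roster = [
    Member("A", "F", 41, True),
    Member("B", "M", 45, True),
    Member("C", "M", 19, True),
    Member("D", "F", 12, False),
]
respondent, k = select_respondent(roster, random.Random(7))
print(respondent.name, "selected from", k, "eligible members")
```

Because one person is chosen out of k eligible members, respondents from larger households have a smaller chance of selection, and k is the factor by which their weight would be increased at this stage.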