The BAIS is Botswana's national population-based household sexual behavioural survey. The baseline survey was conducted in 2001. Further surveys were conducted in 2004, 2008 and 2013. The primary objective of the BAIS III 2008 was to update current information on the sexual behavioural patterns of the population aged 10-64 years and on HIV prevalence and incidence rates among those aged 18 months and older at national, district and sub-district levels. This information will be used for continuous strategic prevention, national HIV programme planning and future HIV and AIDS research.
Kind of Data
Sample survey data [ssd]
Unit of Analysis
Households and individuals
Version 1: Edited, anonymised data available from another repository.
The survey was intended to provide:
i. National HIV prevalence and incidence estimates among the population aged 18 months and older
ii. Indicative trends in sexual and preventive behaviour among the population aged 10-64 years
iii. A comparison of HIV rates, and of the behaviour, knowledge, attitudes and cultural factors associated with the epidemic, against estimates derived from previous surveys
iv. Demographic and socio-economic data, and data on housing and household members to examine the determinants and consequences of the pandemic.
The survey had national coverage.
The survey covered all household members.
Producers and sponsors
Ministry of Finance and Development Planning
National AIDS Coordinating Agency (NACA)
For BAIS-II the sampling frame was based on the 2001 Population and Housing Census. It comprised the list of all Enumeration Areas (EAs) together with the number of households.
All districts and major urban centres formed their own strata. EAs were grouped according to ecological zones in rural districts and according to income categories in cities/towns. Geographical stratification by ecological zone and income category was undertaken to improve the accuracy of the survey data, because the variables are relatively homogeneous within each stratum.
A stratified two-stage probability sample design was used for the selection of the sample. The first stage was the selection of EAs as Primary Sampling Units (PSUs) with probability proportional to size (PPS), where the measure of size (MOS) was the number of households in the EA as defined by the 2001 Population and Housing Census. In all, 459 EAs were selected. In the second stage of sampling, households were systematically selected from a fresh list of occupied households prepared at the beginning of the survey's fieldwork (i.e. a listing of households for the selected EAs). Overall, 8,275 households were drawn systematically.
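The two selection stages described above can be sketched as follows. This is a minimal illustration only, not the procedure actually used by the survey team: the function names, the household counts and the use of a systematic PPS routine within a single stratum are all assumptions made for the example.

```python
import random


def pps_systematic(sizes, n, rng):
    """Select n units with probability proportional to size (systematic PPS).

    sizes: list of measures of size, e.g. census household counts per EA.
    Returns the indices of the selected units. A unit whose size exceeds
    the sampling interval could be selected more than once.
    """
    total = sum(sizes)
    interval = total / n
    start = rng.random() * interval          # random start in [0, interval)
    targets = [start + k * interval for k in range(n)]
    selected = []
    cumulative = 0
    next_target = 0
    for index, size in enumerate(sizes):
        cumulative += size
        # Every target that falls inside this unit's cumulative range selects it.
        while next_target < n and targets[next_target] < cumulative:
            selected.append(index)
            next_target += 1
    return selected


def systematic_sample(units, n, rng):
    """Select n units from a list at a fixed interval after a random start."""
    interval = len(units) / n
    start = rng.random() * interval
    last = len(units) - 1
    return [units[min(int(start + k * interval), last)] for k in range(n)]


# Hypothetical stratum with seven EAs (household counts are invented).
rng = random.Random(1)
ea_households = [120, 80, 200, 50, 150, 100, 300]
chosen_eas = pps_systematic(ea_households, 3, rng)

# Hypothetical fresh listing of 100 occupied households in one selected EA.
listing = list(range(100))
chosen_households = systematic_sample(listing, 10, rng)
```

Larger EAs occupy a wider share of the cumulative size range and are therefore more likely to contain a selection point, which is what gives the first stage its probability-proportional-to-size property.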
Once the data set had been cleaned, sampling weights were applied to the data. Because the design is multistage, the sample selected at each stage represents (or is assumed to represent) the corresponding population. The fundamental assumption is that units selected at each stage were similar to those not selected with respect to the characteristics of interest. In the treatment of non-response, however, the assumption that respondents were similar to non-respondents should not be taken for granted. Sampling weights are equal to the inverse of the probability of selection. The design weights were therefore calculated from the probabilities of selecting EAs at the first stage and the probabilities of selecting households at the second stage. Non-response adjustments were also taken into account when calculating the final sampling weights.
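The weight calculation can be illustrated with a short sketch. The source does not give the exact formulas, so the expressions below are the usual textbook forms for a two-stage PPS design and should be read as assumptions; all function names and numbers are hypothetical.

```python
def design_weight(n_eas_selected, ea_size, stratum_size, hh_selected, hh_listed):
    """Inverse of the overall selection probability of a household.

    Assumed first-stage probability (PPS): n_eas_selected * ea_size / stratum_size
    Assumed second-stage probability:      hh_selected / hh_listed
    """
    p_ea = n_eas_selected * ea_size / stratum_size
    p_household = hh_selected / hh_listed
    return 1.0 / (p_ea * p_household)


def nonresponse_adjusted(weight, hh_selected, hh_responding):
    """Inflate the design weight so that responders also represent non-responders."""
    return weight * hh_selected / hh_responding


# Hypothetical stratum: 10 EAs selected, 6,000 census households in the stratum;
# the EA had 120 census households, 135 listed at fieldwork, 18 selected, 16 responded.
base = design_weight(10, 120, 6000, 18, 135)   # 1 / (0.2 * 18/135) = 37.5
final = nonresponse_adjusted(base, 18, 16)     # 37.5 * 18/16 = 42.1875
```

The adjustment step relies on the assumption flagged in the text, namely that responding households resemble non-responding ones within the adjustment cell.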
Dates of Data Collection
Data Collection Mode
The 2008 BAIS III has three major data collection tools, namely the Household Questionnaire, the Individual Questionnaire and Blood Sample Collection.
1. The Household Questionnaire was used to list all members of the selected households and their demographic characteristics such as age, sex, orphanhood (0-17 years) and economic activity.
2. The Individual Questionnaire was designed to capture information regarding demographic characteristics, care and support, marriage and cohabiting partnerships, alcohol consumption and drug use, sexual history and behavior, male circumcision and sexually transmitted diseases, knowledge about HIV/AIDS and level of interventions, attitudes towards people with HIV/AIDS, childbearing and antenatal care as well as availability of social and medical services in response to the pandemic.
The third component of the survey was the collection of blood samples from members of households aged 18 months and over for testing and estimation of HIV prevalence and derivation of incidence measures.
Estimates of Sampling Error
The estimates from a sample survey are affected by two types of errors: (1) non-sampling errors and (2) sampling errors. Non-sampling errors are the result of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 2008 BAIS III to minimise these types of errors, non-sampling errors are impossible to avoid and difficult to evaluate statistically.
Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 2008 BAIS III is only one of many samples that could have been selected from the same population, using the same sample design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.
A sampling error is usually measured in terms of standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.
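As a small worked example of the interval described above, the sketch below computes an estimate plus or minus two standard errors. The function name and the input figures are hypothetical and are not survey results.

```python
def confidence_interval(estimate, standard_error, multiplier=2.0):
    """Return (lower, upper) as estimate +/- multiplier * standard_error."""
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical proportion of 0.20 with a standard error of 0.01.
lower, upper = confidence_interval(0.20, 0.01)   # approximately (0.18, 0.22)
```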
The standard error can also be used to compute the design effect (DEFT) for each estimate, which is defined as the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used. A DEFT value of 1 indicates that the sample design is as efficient as a simple random sample; a value greater than 1 indicates an increase in the sampling error due to the use of a more complex and less statistically efficient design.
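The DEFT ratio can be sketched for a proportion as follows. The simple-random-sample standard error uses the standard formula sqrt(p(1 - p)/n); the function names and figures are hypothetical. Note that in much of the survey literature DEFT denotes the square root of the variance-based design effect (DEFF), which is consistent with the ratio of standard errors defined here.

```python
import math


def srs_standard_error(p, n):
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


def deft(se_design, se_srs):
    """Ratio of the design-based standard error to the SRS standard error."""
    return se_design / se_srs


# Hypothetical proportion of 0.20 from 1,600 respondents, with a
# design-based standard error of 0.015.
se_srs = srs_standard_error(0.20, 1600)   # sqrt(0.16 / 1600) = 0.01
ratio = deft(0.015, se_srs)               # 1.5: 50 percent larger than under SRS
```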
If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulae for calculating standard errors. However, the BFHS sample is the result of a stratified two-stage design, which is considered a complex design; hence special methods and software are required to take the complexity of the design into account.
WesVar 4.3 statistical software (supported by WESTAT) was used to obtain standard errors, confidence intervals and design effects for selected indicators. It is a powerful tool for the statistical analysis of data from complex survey designs, including multi-stage, stratified and unequal-probability samples. The jackknife replication method, one of the replication options within this software, was applied. Estimating variances with the jackknife method requires forming replicates from the full sample by eliminating one sample cluster (enumeration area) from a domain or stratum at a time. A pseudo-estimate is then formed from the retained EAs, which are re-weighted to compensate for the eliminated unit. Thus, for a particular stratum containing k clusters, k replicated estimates are formed by eliminating one cluster at a time and increasing the weight of the remaining (k - 1) clusters by a factor of k / (k - 1). This process is repeated for each cluster.
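The replicate construction described above can be sketched in a few lines. This is not WesVar's implementation: the variance formula used to combine the replicates (a factor of (k - 1)/k per stratum, the usual stratified delete-one form) is an assumption, as are the record layout, the function names and the handling of single-cluster strata.

```python
import math


def weighted_mean(records):
    """records: list of (stratum, cluster, weight, value) tuples."""
    total_weight = sum(w for _, _, w, _ in records)
    return sum(w * y for _, _, w, y in records) / total_weight


def jackknife_se(records):
    """Delete-one-cluster jackknife standard error of a weighted mean."""
    full_estimate = weighted_mean(records)

    clusters_by_stratum = {}
    for stratum, cluster, _, _ in records:
        clusters_by_stratum.setdefault(stratum, set()).add(cluster)

    variance = 0.0
    for stratum, clusters in clusters_by_stratum.items():
        k = len(clusters)
        if k < 2:
            continue  # assumption: a single-cluster stratum adds no variance here
        for dropped in clusters:
            replicate = []
            for s, c, w, y in records:
                if s == stratum:
                    if c == dropped:
                        continue              # eliminate one cluster
                    w = w * k / (k - 1)       # re-weight the retained clusters
                replicate.append((s, c, w, y))
            pseudo_estimate = weighted_mean(replicate)
            variance += (k - 1) / k * (pseudo_estimate - full_estimate) ** 2
    return full_estimate, math.sqrt(variance)


# Hypothetical data: one stratum, two clusters, one respondent each.
toy = [("s1", "ea1", 1.0, 0.0), ("s1", "ea2", 1.0, 1.0)]
estimate, se = jackknife_se(toy)   # estimate 0.5, standard error 0.5
```

For the toy data the result matches the textbook standard error of a mean of two observations, which is a useful sanity check on the replicate weights.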
Government of Botswana
Public use files, available to all
Central Statistics Office. Botswana AIDS Impact Survey III 2008 [dataset]. Version 1. Gaborone: Central Statistics Office (now Statistics Botswana) [producer and distributor], 2015.