Progress in International Reading and Literacy Study 2001
Socio-Economic/Monitoring Survey [hh/sems]
PIRLS 2001 aimed to generate a database of student achievement data, together with student, parent, teacher, and school background data, for the 35 countries that participated in the study.
Kind of Data
Sample survey data
Unit of Analysis
Individuals and establishments
DataFirst downloaded a version of the PIRLS data (as prepared by IEA) on the 31st of August 2015. This dataset was originally made available as 266 separate datafiles that were defined by country and datafile type (Student Achievement Test File, Student Background File, Teacher Background File, Learning to Read (Home) Survey File, School Background File, Student-Teacher Linkage File, and Student Achievement Score Reliability File). That is, 38 areas (35 countries; Ontario; Quebec; Basque Country) and 7 separate datafile types (38 multiplied by 7 yields 266). All datafiles of the same type were combined to yield seven separate datafiles. This is the first version of such a dataset hosted by DataFirst.
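The combination step described above can be sketched as follows. This is an illustration only, not the procedure DataFirst actually ran: the file-type labels, the area names and the `combine_by_type` helper are hypothetical, and the toy input holds one single-row file per area and type.

```python
from collections import defaultdict

# Hypothetical labels; the real IEA file names and country codes are not reproduced here.
FILE_TYPES = [
    "student_achievement_test",
    "student_background",
    "teacher_background",
    "learning_to_read_home",
    "school_background",
    "student_teacher_linkage",
    "score_reliability",
]
N_AREAS = 38  # 35 countries plus Ontario, Quebec and the Basque Country


def combine_by_type(files):
    """Group (area, file_type, rows) triples into one list of rows per file type."""
    combined = defaultdict(list)
    for area, file_type, rows in files:
        for row in rows:
            # Keep the source area on every row so records stay identifiable.
            combined[file_type].append({"area": area, **row})
    return dict(combined)


# Toy input: one single-row file per area and file type.
files = [
    (f"area{a:02d}", file_type, [{"record": 1}])
    for a in range(1, N_AREAS + 1)
    for file_type in FILE_TYPES
]
combined = combine_by_type(files)
print(len(files), len(combined))
```

Carrying the area label onto every row is a design choice made for this sketch, so that country-level analyses remain possible after the files are stacked.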
A new approach to scaling the reading purposes and processes was introduced in PIRLS 2011 to enhance measurement of trends over time in these domains. This same approach was applied retrospectively to the PIRLS 2001 and 2006 reading purposes and processes so that these data correspond to the trend results presented in PIRLS 2011 International Results in Reading. All data files were updated to reflect that change. Please note that the overall reading achievement scores for PIRLS 2001 remain unchanged.
The PIRLS 2001 contains information on the following:
• Student achievement (in PIRLS designed test)
• Teacher background
• Student background
• School background
• Parent background
The survey had international coverage.
The PIRLS 2001 target populations are all children in "the upper of the two grades with the most 9-year-olds at the time of testing" (PIRLS, 1999) in each participating country. This corresponds to the fourth grade in most countries. This population was chosen because it represents an important transition point in children's development as readers. In most countries, by the end of fourth grade, children are expected to have learned how to read, and are now reading to learn.
The teachers in the PIRLS 2001 international database do not constitute representative samples of teachers in the participating countries. Rather, they are the teachers of nationally representative samples of students. Therefore, analyses with teacher data should be made with students as the units of analysis and reported in terms of students who are taught by teachers with a particular attribute. Teacher data are analyzed by linking the students to their teachers. The student-teacher linkage data files are used for this purpose. The same caveat applies to analyses of schools and parents.
Producers and sponsors
International Association for the Evaluation of Educational Achievement
International Study Centre
National Centre for Education Statistics of the U.S. Department of Education
The World Bank
To be acceptable for PIRLS 2001, national sample designs had to result in probability samples that gave accurate weighted estimates of population parameters such as means and percentages, and for which estimates of sampling variance could be computed. The PIRLS 2001 sample design is derived from the design of IEA's TIMSS (see Foy & Joncas, 2000), with minor refinements. Since sampling for PIRLS was to be implemented by the National Research Coordinator (NRC) in each participating country - often with limited resources - it was essential that the design be simple and easy to implement while yielding accurate and efficient samples of both schools and students.
The international project team provided manuals and expert advice to help NRCs adapt the PIRLS sample design to their national system, and to guide them through the phases of sampling. The School Sampling Manual (PIRLS, 1999) describes how to implement the international sample design to select the school sample; and offers advice on initial planning, adapting the design to national situations, establishing appropriate sample selection procedures, and conducting fieldwork. The Survey Operations Manual and School Coordinator Manual (PIRLS, 2001b, 2001a) provide information on sampling within schools, assigning assessment booklets and questionnaires to sampled students, and tracking respondents and non-respondents. To automate the rather complex within-school sampling procedures, NRCs were provided with sampling software jointly developed by the IEA Data Processing Center and Statistics Canada (IEA, 2001).
In IEA studies, the target population for all countries is known as the international desired target population. This is the grade or age level that each country should address in its sampling activities. The international desired target population for PIRLS 2001 was the following:
"All students enrolled in the upper of the two adjacent grades that contain the largest proportion of 9-year-olds at the time of testing."
PIRLS expected all participating countries to define their national desired population to correspond as closely as possible to its definition of the international desired population. Using its national desired population as a basis, each participating country had to define its population in operational terms for sampling purposes. This definition, known in IEA terminology as the national defined population, is essentially the sampling frame from which the first stage of sampling takes place. Ideally, the national defined population should coincide with the national desired population, although in reality there may be some school types or regions that cannot be included; consequently, the national defined population is usually a very large subset of the national desired population. All schools and students in the desired population not included in the defined population are referred to as the excluded population.
The international sample design for PIRLS is generally referred to as a two-stage stratified cluster sample design. The first stage consists of a sample of schools, which may be stratified; the second stage consists of a sample of one or more classrooms from the target grade in sampled schools.
For more information on the sampling approach, please consult Section 5 of the PIRLS 2001 User Guide.
Deviations from the Sample Design
Although countries were expected to do everything possible to maximize coverage of the population by the sampling plan, schools could be excluded, where necessary, from the sampling frame for the following reasons:
• They were in geographically remote regions.
• They were extremely small in size.
• They offered a curriculum or a school structure that was different from the mainstream educational system(s).
• They provided instruction only to students in the categories defined as “within-school exclusions.”
Within-school exclusions were limited to students who, because of some disability, were unable to take the PIRLS tests. NRCs were asked to define anticipated within-school exclusions. Because these definitions can vary internationally, they were also asked to follow certain rules adapted to their jurisdictions. In addition, they were to estimate the size of the included population so that their compliance with the 95 percent rule could be projected. The general PIRLS rules for defining within-school exclusions included the following three groups:
• Educable mentally-disabled students. These are students who were considered, in the professional opinion of the school principal or other qualified staff members, to be educable mentally disabled – or who had been so diagnosed in psychological tests. This category included students who were emotionally or mentally unable to follow even the general instructions of the PIRLS test. It did not include students who merely exhibited poor academic performance or discipline problems.
• Functionally-disabled students. These are students who were permanently physically disabled in such a way that they could not perform in the PIRLS tests. Functionally-disabled students who could perform were included in the testing.
• Non-native-language speakers. These are students who could not read or speak the language of the test, and so could not overcome the language barrier of testing. Typically, a student who had received less than one year of instruction in the language of the test was excluded, but this definition was adapted in different countries.
A major objective of PIRLS was that the effective target population, the population actually sampled by PIRLS, be as close as possible to the international desired population. Each country had to account for any exclusion of eligible students from the international desired population. This applied to school-level exclusions as well as within-school exclusions. See Appendix B of the PIRLS 2001 Technical Report for a detailed account of sample implementation in each country.
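As a rough illustration of how compliance with the 95 percent rule might be projected, the sketch below computes the share of the desired population that remains after exclusions. It assumes that the rule means the included population should cover at least 95 percent of the national desired population; that reading is not spelled out in this section. The function name and all figures are hypothetical.

```python
def projected_coverage(desired_population, school_level_excluded, within_school_excluded):
    """Return the share of the desired population left after both kinds of exclusion."""
    excluded = school_level_excluded + within_school_excluded
    return (desired_population - excluded) / desired_population


# Toy figures, not taken from any PIRLS country.
coverage = projected_coverage(100_000, 2_500, 1_500)
meets_rule = coverage >= 0.95  # assumed reading of the 95 percent rule
print(f"coverage={coverage:.3f}, meets rule: {meets_rule}")
```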
*** Weight Variables Included in the Student Data Files ***
Each student’s sampling weight is a composite of five factors: the school weighting factor, the school weighting adjustment, the class weighting factor, the student weighting factor, and the student weighting adjustment. In addition, three versions of each student’s weight are provided – the “total student” weight, the “senate” weight, and the “house” weight – each with its own particular uses.
The variables described in this section are included in the Student Background and Student Achievement files. The meaning and interpretation of the weights in each of the files is the same. The weighting factors included in the student-level data files and their adjustment factors are as follows:
WGTFAC1 School Weighting Factor
This variable corresponds to the inverse of the probability of selection for the school where the student is enrolled.
WGTADJ1 School Weighting Adjustment
This is an adjustment that is applied to WGTFAC1 to account for nonparticipating schools in the sample. Multiplying WGTFAC1 by WGTADJ1 gives the sampling weight for the school, adjusted for non-participation.
WGTFAC2 Class Weighting Factor
This is the inverse of the probability of selection of the classroom within the school. Since, in general, only one classroom was selected per grade within each school, there was no need to compute an adjustment factor for the classroom weight.
WGTFAC3 Student Weighting Factor
This is the inverse of the probability of selection for the individual student within a classroom. In cases where an intact classroom was selected, the value is set to 1 for all members of the classroom.
WGTADJ3 Student Weighting Adjustment
This is an adjustment applied to the variable WGTFAC3 to account for nonparticipating students in the selected school and/or classroom. Multiplying the variables WGTFAC2, WGTFAC3, and WGTADJ3 and adding them up within each school gives an estimate of the number of students within the sampled school.
The five variables listed above are combined to give a student’s overall sampling weight. The probability of selecting an individual student is the product of the probabilities of three events: selecting the school, selecting the classroom within that school, and selecting the student within that classroom. The student’s sampling weight is the inverse of this probability, multiplied by the adjustment factors for non-participation. Equivalently, it is the product of the three weighting factors (each the inverse of a selection probability) and the two adjustment factors.
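The arithmetic described above can be sketched as follows. The weight variable names come from the list above; the `school_id` key, the helper functions and every value are invented for illustration and do not come from the PIRLS files.

```python
from collections import defaultdict

# Toy records: four students in school 1 and two in school 2 (all values invented).
students = (
    [{"school_id": 1, "WGTFAC1": 20.0, "WGTADJ1": 1.5,
      "WGTFAC2": 2.0, "WGTFAC3": 1.0, "WGTADJ3": 1.25} for _ in range(4)]
    + [{"school_id": 2, "WGTFAC1": 40.0, "WGTADJ1": 1.0,
        "WGTFAC2": 1.0, "WGTFAC3": 1.0, "WGTADJ3": 1.5} for _ in range(2)]
)


def overall_weight(s):
    """Product of the three weighting factors and the two adjustment factors."""
    return s["WGTFAC1"] * s["WGTADJ1"] * s["WGTFAC2"] * s["WGTFAC3"] * s["WGTADJ3"]


def estimated_school_sizes(records):
    """Sum WGTFAC2 * WGTFAC3 * WGTADJ3 within each school."""
    sizes = defaultdict(float)
    for s in records:
        sizes[s["school_id"]] += s["WGTFAC2"] * s["WGTFAC3"] * s["WGTADJ3"]
    return dict(sizes)


sizes = estimated_school_sizes(students)
print(overall_weight(students[0]), sizes)
```

In this toy data, four responding students stand in for an estimated ten students in the target grade of school 1.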
Three versions of the students’ sampling weight are provided in the user database. All three give the same figures for statistics such as means and proportions, but vary for statistics such as totals and population sizes. Each one has particular advantages in certain circumstances. These three versions are as follows:
TOTWGT Total Student Weight
This is obtained by simply multiplying the variables WGTFAC1, WGTADJ1, WGTFAC2, WGTFAC3, and WGTADJ3 for the student. The sum of these weights within a sample provides an estimate of the size of the population. Although this is a commonly used sampling weight, it sometimes adds up to a very large number, and to a different number within each country. This is not always desirable. For example, if you want to compute a weighted estimate of the mean achievement in the population across all countries, using the variable TOTWGT as your weight variable will lead each country to contribute proportionally to its population size, with the large countries counting more than small countries. Although this is desirable in some circumstances (e.g., when computing the 75th percentile for reading achievement for students around the world), in general TOTWGT is not the student weight of choice for cross-country analyses, since it does not treat countries equally, and gives inflated results in significance tests when the proper adjustments are not used.
SENWGT Senate Weight
The variable SENWGT, within each country, is proportional to TOTWGT multiplied by the ratio of 500 divided by the sum of the weights over all students in the grade. These sampling weights can be used when international estimates are sought and you want to have each country contribute the same amount to the international estimate. When this variable is used as the sampling weight for international estimates, the contribution of each country is the same, regardless of the size of the population. See PIRLS 2001 User Guide for more information.
HOUWGT House Weight
The variable HOUWGT is proportional to TOTWGT multiplied by the ratio of the sample size (n) divided by the sum of the weights over all students in the grade. These sampling weights can be used when you want the actual sample size to be used in performing significance tests. Although some statistical computer software packages allow you to use the sample size as the divisor in the computation of standard errors, others will use the sum of the weights, and this results in severely deflated standard errors for the statistics if TOTWGT is used as the weighting variable. When performing analyses using such software, it is recommended to use the variable HOUWGT as the weight variable. Because of the clustering effect in most PIRLS samples, it may also be desirable to apply a correction factor such as a design effect to the HOUWGT variable.
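A minimal sketch of the relationship between the three weights, for a single country, is given below. The `rescale` and `weighted_mean` helpers and all values are invented. It illustrates that the senate and house weights are rescalings of TOTWGT, so they leave a weighted mean unchanged but sum to 500 and to the sample size respectively.

```python
def rescale(weights, target_sum):
    """Scale weights proportionally so that they add up to target_sum."""
    total = sum(weights)
    return [w * target_sum / total for w in weights]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy data for one country (invented values).
totwgt = [75.0, 75.0, 60.0, 90.0]
scores = [480.0, 520.0, 500.0, 540.0]

senwgt = rescale(totwgt, 500)          # sums to 500 within the country
houwgt = rescale(totwgt, len(totwgt))  # sums to the sample size n

for name, w in (("TOTWGT", totwgt), ("SENWGT", senwgt), ("HOUWGT", houwgt)):
    print(name, round(sum(w), 6), round(weighted_mean(scores, w), 6))
```

With real data the rescaling would be done separately within each country, which is what makes every country contribute equally when SENWGT is used for international estimates.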
*** Weight Variables Included in the Student-Teacher Linkage Files ***
The individual student sampling weights should generally be used when you want to obtain estimates at the student level. The exception is when student and teacher data are to be analyzed together. In this case, a separate set of weights has been computed to account for the fact that a student could have more than one teacher. This set of weights is included in the Student-Teacher Linkage file and is listed below.
This weight is computed by dividing the sampling weight for the student by the number of teachers that the student has. This weight should be used whenever you want to obtain estimates regarding students and their teachers. The Student-Teacher Linkage file also includes variables that indicate the number of teachers the student has.
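The sketch below illustrates this weighting on a toy linkage table with one record per student-teacher pair. All keys, identifiers and values are hypothetical and are not the variable names used in the Student-Teacher Linkage file. The result is reported in terms of students, in line with the guidance that teacher data be analyzed with students as the units of analysis.

```python
# One record per student-teacher pair (hypothetical keys and invented values).
links = [
    {"student": "s1", "teacher": "t1", "student_weight": 60.0, "n_teachers": 2},
    {"student": "s1", "teacher": "t2", "student_weight": 60.0, "n_teachers": 2},
    {"student": "s2", "teacher": "t1", "student_weight": 40.0, "n_teachers": 1},
]
# Hypothetical teacher attribute, e.g. holding a particular qualification.
teacher_has_attribute = {"t1": True, "t2": False}


def link_weight(link):
    """Student sampling weight divided by the student's number of teachers."""
    return link["student_weight"] / link["n_teachers"]


def share_of_students(records, attribute):
    """Weighted share of students taught by teachers with the attribute."""
    total = sum(link_weight(r) for r in records)
    matching = sum(link_weight(r) for r in records if attribute[r["teacher"]])
    return matching / total


share = share_of_students(links, teacher_has_attribute)
print(share)
```

Dividing by the number of teachers keeps each student's total contribution equal to the student's own sampling weight, however many linkage records the student has.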
*** Weight Variables Included in the School Data Files ***
The PIRLS samples are samples of students within countries. Although they are made up of a sample of schools within the countries, the samples of schools are selected so that the sampling of students, rather than the sampling of schools, is optimized. In particular, the probability-proportional-to-size sampling methodology causes large schools to be oversampled. Several weight variables are included in the school files, as follows:
WGTFAC1 School Weighting Factor
This variable corresponds to the inverse of the probability of selection for the school where the student is enrolled.
WGTADJ1 School Weighting Adjustment
This is an adjustment that is applied to WGTFAC1 to account for nonparticipating schools in the sample. Multiplying WGTFAC1 by WGTADJ1 gives the sampling weight for the school, adjusted for non-participation.
SCHWGT School-level Weight
The school sampling weight is the inverse of the probability of selection for the school, multiplied by its corresponding adjustment factor. It is computed as the product of WGTADJ1 and WGTFAC1. Although this weight variable can be used to estimate the number of schools with certain characteristics, it is important to keep in mind that the sample selected for PIRLS is a good sample of students, but not necessarily an optimal sample of schools. Schools are selected with probability proportional to their size, so it is expected that there is a greater number of large schools in the sample. For countries that sampled by track within school, the SCHWGT is based on the track size rather than the total school size. This may lead to invalid school-weighted analyses.
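As a small illustration, the school-level weight and a weighted count of schools with a given characteristic could be computed as below. The `is_rural` flag, the helper functions and all values are invented, and the caveats above about school-weighted analyses still apply.

```python
# Toy school records (invented values; "is_rural" is a hypothetical characteristic).
schools = [
    {"WGTFAC1": 10.0, "WGTADJ1": 1.5, "is_rural": True},
    {"WGTFAC1": 4.0, "WGTADJ1": 1.0, "is_rural": False},
    {"WGTFAC1": 20.0, "WGTADJ1": 1.25, "is_rural": True},
]


def school_weight(school):
    """School weighting factor times its non-participation adjustment."""
    return school["WGTFAC1"] * school["WGTADJ1"]


def estimated_count(records, flag):
    """Estimated number of schools in the population with the flagged characteristic."""
    return sum(school_weight(s) for s in records if s[flag])


rural_schools = estimated_count(schools, "is_rural")
print(rural_schools)
```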
*** Other Sampling Variables Included in the Student and Student-Teacher Link Files ***
With complex sampling designs that involve more than simple random sampling, as in the case of PIRLS where a multi-stage cluster design was used, there are several methods for estimating the sampling error of a statistic that avoid the assumption of simple random sampling. One such method is the jackknife repeated replication (JRR) technique (Wolter, 1985). The particular application of the JRR technique used in PIRLS is termed a paired selection model because it assumes that the sampled population can be partitioned into strata, with the sampling in each stratum consisting of two primary sampling units (PSU), selected independently. The following variables capture the information necessary to estimate correct standard errors using the JRR technique:
The variable JKZONE indicates the sampling zone or stratum to which the student’s school is assigned. The sampling zones can have values from 1 to 75 in the Student Background and Student Achievement data files. This variable is included in the Student Background and the Student Achievement data files.
The variable JKREP indicates the PSU, and its value is used to determine how the student is to be used in the computation of the replicate weights. This variable can have values of either 1 or 0. Those student records with a value of 0 should be excluded from the corresponding replicate weight, and those with a value of 1 should have their weights doubled. This variable is included in the Student Background and the Student Achievement data files. For each individual student, this variable is identical in these two files. Additionally, the variables JKCZONE and JKCREP are included in the school file.
The variable JKCREP can have values of either 1 or 0. It indicates whether this school is to be dropped or have its weight doubled when estimating standard errors. Those school records with a value of 0 should be excluded from the corresponding replicate weight, and those with a value of 1 should have their weights doubled.
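The sketch below shows one way the paired JRR computation described above might look for a weighted mean. It assumes the common form of the estimator, in which the sampling variance is the sum over zones of the squared difference between the replicate estimate and the full-sample estimate. The `score` key and all values are invented, and only sampling variance is covered.

```python
import math

# Toy student records: two zones, two PSUs per zone (all values invented).
records = [
    {"JKZONE": 1, "JKREP": 1, "TOTWGT": 10.0, "score": 500.0},
    {"JKZONE": 1, "JKREP": 0, "TOTWGT": 10.0, "score": 520.0},
    {"JKZONE": 2, "JKREP": 1, "TOTWGT": 10.0, "score": 480.0},
    {"JKZONE": 2, "JKREP": 0, "TOTWGT": 10.0, "score": 540.0},
]


def weighted_mean(rows, weights):
    return sum(r["score"] * w for r, w in zip(rows, weights)) / sum(weights)


def jrr_standard_error(rows):
    """Return the full-sample weighted mean and its JRR standard error."""
    full_weights = [r["TOTWGT"] for r in rows]
    estimate = weighted_mean(rows, full_weights)
    variance = 0.0
    for zone in sorted({r["JKZONE"] for r in rows}):
        # Within the current zone, double the weight where JKREP is 1 and
        # zero it where JKREP is 0; leave all other zones unchanged.
        replicate = [
            r["TOTWGT"] * (2.0 * r["JKREP"] if r["JKZONE"] == zone else 1.0)
            for r in rows
        ]
        variance += (weighted_mean(rows, replicate) - estimate) ** 2
    return estimate, math.sqrt(variance)


estimate, se = jrr_standard_error(records)
print(estimate, se)
```

The same pattern would apply to school-level statistics, using JKCZONE and JKCREP with the school weight in place of the student variables.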
Dates of Data Collection
Data Collection Mode
Data Collection Notes
Each country was responsible for carrying out all aspects of the data collection, using standardized procedures developed for the study. Manuals provided explicit instructions to the NRCs and their staff members on all aspects of the data collection – from contacting sampled schools to packing and shipping materials to the IEA Data Processing Center for processing and verification. Manuals were also prepared for test administrators and for individuals in the sampled schools who work with the national centers to arrange for the data collection within the schools. These manuals addressed all aspects of the assessment administration within schools (including test security, distribution of booklets, timing and conduct of the testing session, and returning materials to the national center).
The PIRLS International Study Center placed great emphasis on monitoring the quality of the PIRLS data collection. In particular, the International Study Center implemented an international program of site visits, whereby international quality control monitors visited a sample of 15 schools in each country and observed the test administration. In addition to the international program, NRCs were also expected to organize an independent national quality control program based upon the international model. The latter program required national quality control observers to document data collection activities in their country. The national quality control observers visited a random sample of 10 percent of the schools (in addition to those visited by the international quality control monitors), and recorded their observations from the testing sessions for later analysis.
PIRLS Background Questionnaires
By gathering information about children’s experiences together with reading achievement on the PIRLS test, it is possible to identify the factors or combinations of factors that relate to high reading literacy. An important part of the PIRLS design is a set of questionnaires targeting factors related to reading literacy. PIRLS administered four questionnaires: to the tested students, to their parents, to their reading teachers, and to their school principals.
Each student taking the PIRLS reading assessment completes the student questionnaire. The questionnaire asks about aspects of students’ home and school experiences – including instructional experiences and reading for homework, self-perceptions and attitudes towards reading, out-of-school reading habits, computer use, home literacy resources, and basic demographic information.
Learning to Read (Home) Survey
The Learning to Read survey is completed by the parents or primary caregivers of each student taking the PIRLS reading assessment. It addresses child-parent literacy interactions, home literacy resources, parents’ reading habits and attitudes, home-school connections, and basic demographic and socioeconomic indicators.
The reading teacher of each fourth-grade class sampled for PIRLS completes a questionnaire designed to gather information about classroom contexts for developing reading literacy. This questionnaire asks teachers about characteristics of the class tested (such as size, reading levels of the students, and the language abilities of the students). It also asks about instructional time, materials and activities for teaching reading and promoting the development of their students’ reading literacy, and the grouping of students for reading instruction. Questions about classroom resources, assessment practices, and home-school connections also are included. The questionnaire also asks teachers for their views on opportunities for professional development and collaboration with other teachers, and for information about their education and training.
The principal of each school sampled for PIRLS responds to the school questionnaire. It asks school principals about enrollment and school characteristics (such as where the school is located, resources available in the surrounding area, and indicators of the socioeconomic background of the student body), characteristics of reading education in the school, instructional time, school resources (such as the availability of instructional materials and staff), home-school connections, and the school climate.
To ensure the availability of comparable, high-quality data for analysis, PIRLS took rigorous quality control steps to create the international database. Countries used manuals and software provided by PIRLS to create and check their data files, so that the information would be in a standardized international format before being forwarded to the IEA Data Processing Center. Upon arrival at the DPC, the data underwent an exhaustive cleaning process involving several steps and procedures designed to identify, document, and correct deviations from the international instruments, file structures, and coding schemes. The process also emphasized consistency of information within national data sets, and appropriate linking among the student, parent, teacher, and school data files.
Public use files, available to all
International Association for the Evaluation of Educational Achievement (IEA). Progress in International Reading and Literacy Study 2001 [dataset]. Version 1. Chestnut Hill, MA: PIRLS International Study Centre [producer], 2003. Cape Town: DataFirst [distributor], 2015. DOI: https://doi.org/10.25828/e2rk-zf96