Evolving from early ILSAs such as the First International Mathematics and Science Study (FIMSS; Husén, 1967), TIMSS has the longest tradition and is IEA's most well-known ILSA, with 49 participating countries in recent cycles (Mullis, Martin, Foy, & Hooper, 2016). Every four years it assesses the mathematics and science skills of students in Grades 4 and 8. TIMSS is often said to be "curriculum-based" because it uses a curriculum model consisting of the intended, the implemented and the attained curriculum. These three aspects represent, respectively, what students are expected to learn according to countries' curriculum policies, the actual teaching in classrooms, and the achievement level of the students together with their attitudes towards the subjects (Mullis & Martin, 2013). To measure these levels, the set of instruments consists of a curriculum, school, teacher, student and, since 2011, home questionnaire, in addition to the mathematics and science test. Up until TIMSS-2015, the test was administered using paper-based booklets, though a transition to digital assessment is planned for upcoming cycles (Mullis & Martin, 2017). PIRLS is similar to TIMSS in its sampling design. Since 2001, PIRLS has assessed the reading literacy of students in Grade 4. The reading test is centred around two reading

Nevertheless, to best investigate the validity of a cross-national measurement, the response patterns should also be studied with a measurement model. Doing so makes it possible to check that the precautions taken in test construction and administration have indeed resulted in items that show the same measurement properties across groups, that is, to check for measurement invariance. Differences in item response behaviour between students of equal ability can complicate inferences about proficiency differences between countries or specific student populations. A lack of measurement invariance, characterized by such differences in response behaviour, is called differential item functioning (DIF).
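To make the notion of DIF concrete, the following minimal sketch computes the probability of a correct response for two students of equal ability who belong to different groups. It assumes a Rasch model with a uniform difficulty shift in one group; this model choice, the function name `rasch_probability`, and all numerical values are illustrative assumptions and are not taken from the studies in this thesis.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a correct response under a Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Hypothetical values, chosen only for illustration.
THETA = 0.0           # identical ability for both students
DIFFICULTY_REF = 0.0  # item difficulty in the reference group
DIF_SHIFT = 0.5       # additional difficulty in the focal group (uniform DIF)

p_reference = rasch_probability(THETA, DIFFICULTY_REF)
p_focal = rasch_probability(THETA, DIFFICULTY_REF + DIF_SHIFT)

# Under measurement invariance this gap would be exactly zero.
dif_gap = p_reference - p_focal

print(f"reference: {p_reference:.3f}, focal: {p_focal:.3f}, gap: {dif_gap:.3f}")
```

Because both students have the same ability, any non-zero gap reflects a property of the item rather than a difference in proficiency, which is why ignoring DIF can distort comparisons between groups.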
The issue of valid measurement in secondary analyses of both test and questionnaire data across (sub)populations is central to this thesis. Attention is directed at the modelling of item responses across (sub)populations, and particularly at dealing with potential DIF. The framework of item response theory (IRT; see, for example, Van

1.4 Research Objectives

This thesis comprises several studies that centre on analyses of ILSA data, using the framework of IRT to model the constructs of interest. Each study is driven by a substantive question from the field of educational research to which the unique properties of ILSA data, such as standardized measurements across countries, and the use of advanced IRT modelling may contribute in an innovative way. In addition, the studies aim to provide new approaches to studying validity issues, particularly DIF. The models in this dissertation are estimated within both a frequentist and a Bayesian framework.

1.5 Outline of the Thesis

This thesis continues in Chapter 2 with the modelling of computer and information l...