Achievement tests and optimal design for pretesting of questions
Stockholm University, Faculty of Social Sciences, Department of Statistics. ORCID iD: 0000-0003-2889-0263
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Achievement tests are used to measure students' proficiency in a particular area of knowledge. Computerized achievement tests (e.g. GRE and SAT) usually draw their questions from an item bank. An item bank is a large collection of items with known characteristics (e.g. difficulty). Item banks are continuously updated and revised: new items replace obsolete, overexposed or flawed ones over time. This thesis is devoted to updating and maintaining the item bank with high-quality questions and better estimates of item parameters (item calibration).

The thesis contains four manuscripts. One paper investigates the impact of student ability dimensionality on the estimated parameters and the other three deal with item calibration.

In the first paper, we investigate how ability dimensionality influences the estimates of the item parameters. Through a case study and a simulation study, we find that a multidimensional model discriminates better among the students.

The second paper describes a method for optimal item calibration by efficiently selecting the examinees based on their ability levels. We develop an algorithm which selects intervals for the students' ability levels for optimal calibration of the items. We also develop an equivalence theorem for item calibration to verify the optimal design.  

The algorithm developed in Paper II becomes complicated as the number of items to be calibrated increases. In Paper III we therefore develop a new exchange algorithm based on the equivalence theorem from Paper II.

Finally, the fourth paper generalizes the exchange algorithm described in Paper III by assuming that the students have multidimensional abilities to answer the questions.

Place, publisher, year, edition, pages
Department of Statistics, Stockholm University, 2019, p. 26
Keywords [en]
Achievement test, Equivalence theorem, Exchange algorithm, Item calibration, Item response theory model, Optimal experimental design
National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
URN: urn:nbn:se:su:diva-174079
ISBN: 978-91-7797-879-4 (print)
ISBN: 978-91-7797-880-0 (electronic)
OAI: oai:DiVA.org:su-174079
DiVA, id: diva2:1357038
Public defence
2019-11-15, William-Olssonsalen, Geovetenskapens hus, Svante Arrhenius väg 14, floor 1, Stockholm, 10:00 (English)
Note

At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 1: Manuscript. Paper 3: Manuscript. Paper 4: Manuscript.

Available from: 2019-10-23 Created: 2019-10-02 Last updated: 2022-02-26 Bibliographically approved
List of papers
1. Discrimination with unidimensional and multidimensional item response theory models for educational data
2022 (English). In: Communications in statistics. Simulation and computation, ISSN 0361-0918, E-ISSN 1532-4141, Vol. 51, no 6, p. 2992-3012. Article in journal (Refereed). Published
Abstract [en]

Achievement tests are used to characterize the proficiency of higher-education students. Item response theory (IRT) models are applied to these tests to estimate the ability of students (as a latent variable in the model). For IRT parameters, especially ability parameters, to be estimated well, it is important that the appropriate number of dimensions is identified. Through a case study based on a statistics exam for students in higher education, we show how dimensions and other model parameters can be chosen in a real situation. Our model choice is based on both empirical evidence and background knowledge of the test. We show that dimensionality influences the estimates of the item parameters, especially the discrimination parameter, which provides information about the quality of the item. We perform a simulation study to generalize our conclusions. Both the simulation study and the case study show that multidimensional models have the advantage of discriminating better between examinees. We conclude from the simulation study that, if it is unknown which model is the correct one, it is safer to use a multidimensional model than a unidimensional one.

Keywords
Achievement tests, Discrimination, Multidimensional four parameter logistic model, Multidimensional graded response model, Multidimensional item response theory
National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
urn:nbn:se:su:diva-177401 (URN)
10.1080/03610918.2019.1705344 (DOI)
000504938300001 ()
2-s2.0-85078600822 (Scopus ID)
Available from: 2020-01-05 Created: 2020-01-05 Last updated: 2022-09-27 Bibliographically approved
2. Optimal Item Calibration for Computerized Achievement Tests
2019 (English). In: Psychometrika, ISSN 0033-3123, E-ISSN 1860-0980, Vol. 84, no 4, p. 1101-1128. Article in journal (Refereed). Published
Abstract [en]

Item calibration is a technique to estimate characteristics of questions (called items) for achievement tests. In computerized tests, item calibration is an important tool for maintaining, updating and developing new items for an item bank. To efficiently sample examinees with specific ability levels for this calibration, we use optimal design theory, assuming that the probability of a correct answer follows an item response model. Locally optimal unrestricted designs usually have only a few design points for ability. In practice, it is hard to sample examinees with exactly these ability levels from a population, because such examinees are unavailable or available only in limited numbers. To counter this problem, we use the concept of optimal restricted designs and show that this concept fits item calibration naturally. We prove an equivalence theorem needed to verify the optimality of a design. Locally optimal restricted designs provide intervals of ability levels for optimal calibration of an item. Assuming a two-parameter logistic model, we present several scenarios with D-optimal restricted designs for calibration of a single item and for simultaneous calibration of several items. These scenarios show that the naive approach of sampling examinees around the unrestricted design points is not optimal.
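
To make the notion of "a few design points" concrete, the following minimal sketch computes a locally D-optimal unrestricted design for a single two-parameter logistic item. It is an illustration under stated assumptions, not the method of the paper: the parameterization P(correct) = 1 / (1 + exp(-a(theta - b))), the parameter values a = 1.2 and b = 0.3, the function names, and the restriction of the search to symmetric two-point designs with equal weights are all choices made here for illustration.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct answer (assumed parameterization)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_matrix(theta, a, b):
    """2x2 Fisher information for (a, b) from one examinee with ability theta."""
    p = p_correct(theta, a, b)
    w = p * (1.0 - p)
    g = (theta - b, -a)  # gradient of a * (theta - b) with respect to (a, b)
    return [[w * g[i] * g[j] for j in range(2)] for i in range(2)]

def design_det(points, weights, a, b):
    """Determinant of the weighted information matrix of a design."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for theta, wt in zip(points, weights):
        im = info_matrix(theta, a, b)
        for i in range(2):
            for j in range(2):
                m[i][j] += wt * im[i][j]
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def best_symmetric_offset(a, b, steps=4000, c_max=4.0):
    """Grid search over c for the equal-weight design {b - c/a, b + c/a}."""
    best_c, best_det = 0.0, -1.0
    for k in range(1, steps + 1):
        c = c_max * k / steps
        det = design_det([b - c / a, b + c / a], [0.5, 0.5], a, b)
        if det > best_det:
            best_c, best_det = c, det
    return best_c, best_det

# Hypothetical item parameters, chosen only for illustration.
a, b = 1.2, 0.3
c, det = best_symmetric_offset(a, b)
lower, upper = b - c / a, b + c / a
print(f"design points: {lower:.3f}, {upper:.3f}")
print(f"P(correct) at the points: {p_correct(lower, a, b):.3f}, {p_correct(upper, a, b):.3f}")
```

Under these assumptions the search settles near c = 1.54, so the two design points are the abilities at which the probability of a correct answer is roughly 0.18 and 0.82. A real examinee population rarely supplies enough examinees at exactly two ability values, which is the practical difficulty that motivates restricted designs.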

Keywords
achievement tests, computerized tests, item calibration, optimal restricted design, two-parameter logistic model
National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
urn:nbn:se:su:diva-169646 (URN)
10.1007/s11336-019-09673-6 (DOI)
000492593800010 ()
Available from: 2019-06-12 Created: 2019-06-12 Last updated: 2022-02-26 Bibliographically approved
3. An exchange algorithm for optimal calibration of items in computerized achievement tests
(English). Manuscript (preprint) (Other academic)
Abstract [en]

The importance of large-scale achievement tests, such as national tests in school, eligibility tests for university, or international assessments for evaluation of students, is increasing. Questions for these tests are pretested by adding them to an ordinary achievement test in order to determine their characteristic properties. If computerized tests are used, it has been shown using optimal experimental design methods that it is efficient to assign pretest questions to examinees based on their abilities. We can consider the specific distribution of abilities of the available examinees and apply restricted optimal designs. A previously used algorithm optimizes the criterion directly. Here we develop a new algorithm that builds on an equivalence theorem. It discretizes the design space, with the possibility of changing the grid during the run, makes use of an exchange idea, and filters computed designs. We illustrate with some examples how the algorithm works and how convergence can be checked. We show that this new algorithm can be used flexibly even if different models are assumed for different questions.
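
The exchange idea on a discretized design space can be sketched as follows. This is not the algorithm of the manuscript: it treats the simpler unrestricted D-optimality problem for one two-parameter logistic item, keeps the grid fixed, does no filtering, and uses a crude line search. The sensitivity function trace(M^{-1} M(theta)) and its bound of 2 come from the classical equivalence theorem for D-optimality, not from the restricted-design theorem of Paper II. All function names, the grid and the item parameters are assumptions made for illustration.

```python
import math

def info_terms(theta, a, b):
    """Entries (m11, m12, m22) of the 2PL information matrix for (a, b)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    w = p * (1.0 - p)
    x, y = theta - b, -a
    return (w * x * x, w * x * y, w * y * y)

def design_matrix(grid, weights, a, b):
    """Weighted sum of the per-point information matrices."""
    m11 = m12 = m22 = 0.0
    for theta, wt in zip(grid, weights):
        if wt > 0.0:
            e = info_terms(theta, a, b)
            m11, m12, m22 = m11 + wt * e[0], m12 + wt * e[1], m22 + wt * e[2]
    return (m11, m12, m22)

def log_det(m):
    det = m[0] * m[2] - m[1] * m[1]
    return math.log(det) if det > 0.0 else float("-inf")

def sensitivity(theta, m, a, b):
    """trace(M^{-1} M(theta)); at most 2 everywhere for a D-optimal design."""
    det = m[0] * m[2] - m[1] * m[1]
    e = info_terms(theta, a, b)
    return (m[2] * e[0] - 2.0 * m[1] * e[1] + m[0] * e[2]) / det

def exchange(grid, a, b, iterations=300):
    """Move weight from the least to the most informative grid point."""
    n = len(grid)
    weights = [1.0 / n] * n  # start from the uniform design on the grid
    for _ in range(iterations):
        m = design_matrix(grid, weights, a, b)
        d = [sensitivity(t, m, a, b) for t in grid]
        j_in = max(range(n), key=lambda j: d[j])
        j_out = min((j for j in range(n) if weights[j] > 0.0), key=lambda j: d[j])
        if j_in == j_out:
            break
        best, best_val = None, log_det(m)
        for frac in (0.25, 0.5, 0.75, 1.0):  # crude line search over the step
            trial = list(weights)
            delta = frac * weights[j_out]
            trial[j_out] -= delta
            trial[j_in] += delta
            val = log_det(design_matrix(grid, trial, a, b))
            if val > best_val:
                best, best_val = trial, val
        if best is None:  # no improving exchange found; stop
            break
        weights = best
    return weights

# Hypothetical setup: abilities on [-4, 4] in steps of 0.1, item with a = 1, b = 0.
grid = [-4.0 + 0.1 * k for k in range(81)]
weights = exchange(grid, a=1.0, b=0.0)
m_final = design_matrix(grid, weights, 1.0, 0.0)
print([(round(t, 1), round(w, 3)) for t, w in zip(grid, weights) if w > 1e-6])
print("max sensitivity:", max(sensitivity(t, m_final, 1.0, 0.0) for t in grid))
```

Printing the maximum sensitivity gives a simple convergence check: the closer it is to 2 (the number of item parameters), the closer the design is to D-optimal on the grid. Because an exchange is accepted only when it raises the log-determinant, the criterion never decreases from one iteration to the next.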

National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
urn:nbn:se:su:diva-174075 (URN)
Available from: 2019-10-02 Created: 2019-10-02 Last updated: 2022-02-26 Bibliographically approved
4. Optimal Calibration of Items for Multidimensional Achievement Tests
2024 (English). In: Journal of educational measurement, ISSN 0022-0655, E-ISSN 1745-3984, Vol. 61, no 2, p. 274-302. Article in journal (Refereed). Published
Abstract [en]

Multidimensional achievement tests have recently been gaining importance in educational and psychological measurement. For example, multidimensional diagnostic tests can help students to determine which particular domain of knowledge they need to improve for better performance. To estimate the characteristics of candidate items (calibration) for future multidimensional achievement tests, we use optimal design theory. We generalize a previously developed exchange algorithm for optimal design computation to the multidimensional setting. We also develop an asymptotic theorem stating which item should be calibrated by examinees with extreme abilities. For several examples, we compute the optimal design numerically with the exchange algorithm. We see clear structures in these results and explain them using the asymptotic theorem. Moreover, we investigate the performance of the optimal design in a simulation study.
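
As a hedged illustration of what calibration with multidimensional abilities involves, the block below writes out one common multidimensional item response model and the corresponding design criterion. The abstract does not state which model the paper uses; the compensatory multidimensional two-parameter logistic form, the symbols $\boldsymbol{a}$, $d$, $\xi$ and the choice of the D-criterion are assumptions made here.

```latex
% Assumed model: compensatory multidimensional 2PL with ability vector
% \boldsymbol{\theta} \in \mathbb{R}^k, discrimination vector \boldsymbol{a}, intercept d.
P(\boldsymbol{\theta})
  = \frac{1}{1 + \exp\!\bigl(-(\boldsymbol{a}^{\top}\boldsymbol{\theta} + d)\bigr)}

% Fisher information for the item parameters (\boldsymbol{a}, d)
% from one examinee with ability \boldsymbol{\theta}:
M(\boldsymbol{\theta})
  = P(\boldsymbol{\theta})\,\bigl(1 - P(\boldsymbol{\theta})\bigr)
    \begin{pmatrix} \boldsymbol{\theta} \\ 1 \end{pmatrix}
    \begin{pmatrix} \boldsymbol{\theta} \\ 1 \end{pmatrix}^{\!\top}

% Information of a design \xi (a distribution over abilities) and the D-criterion:
M(\xi) = \int M(\boldsymbol{\theta}) \,\mathrm{d}\xi(\boldsymbol{\theta}),
\qquad
\Psi(\xi) = \log \det M(\xi)
```

Under this assumed model, an examinee contributes little information when $P(\boldsymbol{\theta})$ is close to 0 or 1, so which examinees are informative for an item depends on all $k$ ability dimensions jointly.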

Keywords
Achievement tests, exchange algorithm, item calibration, multidimensional item response model, optimal restricted design
National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
urn:nbn:se:su:diva-174077 (URN)
10.1111/jedm.12386 (DOI)
001184813000001 ()
2-s2.0-85187667942 (Scopus ID)
Available from: 2019-10-02 Created: 2019-10-02 Last updated: 2024-09-05 Bibliographically approved

Open Access in DiVA

Achievement tests and optimal design for pretesting of questions (1085 kB), 1589 downloads
File information
File name: FULLTEXT01.pdf
File size: 1085 kB
Checksum SHA-512:
52354cfeb6c7004a775f276311aa99772215300fe309c007d38853e329b14ddc25d4536fe10c4e746986047a5d403cd23e8b6c158911377d0b222580f9b20fdd
Type: fulltext
Mimetype: application/pdf

Authority records

Ul Hassan, Mahmood

Total: 1589 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 874 hits