Achievement tests and optimal design for pretesting of questions
Stockholm University, Faculty of Social Sciences, Department of Statistics.
2019 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Achievement tests are used to measure students' proficiency in a particular area of knowledge. Computerized achievement tests (e.g. GRE and SAT) are usually based on questions drawn from an item bank. An item bank is a large collection of items with known characteristics (e.g. difficulty). Item banks are continuously updated and revised, with new items replacing obsolete, overexposed or flawed items over time. This thesis is devoted to updating and maintaining the item bank with high-quality questions and better estimates of item parameters (item calibration).

The thesis contains four manuscripts. One paper investigates the impact of student ability dimensionality on the estimated parameters and the other three deal with item calibration.

In the first paper, we investigate how the ability dimensionality influences the estimates of the item parameters. Through a case study and a simulation study, we found that a multidimensional model discriminates better among the students.

The second paper describes a method for optimal item calibration by efficiently selecting the examinees based on their ability levels. We develop an algorithm which selects intervals for the students' ability levels for optimal calibration of the items. We also develop an equivalence theorem for item calibration to verify the optimal design.  

The algorithm developed in Paper II becomes complicated as the number of calibrated items increases. In Paper III, we therefore develop a new exchange algorithm based on the equivalence theorem developed in Paper II.

Finally, the fourth paper generalizes the exchange algorithm described in Paper III by assuming that the students have multidimensional abilities to answer the questions.

Place, publisher, year, edition, pages
Department of Statistics, Stockholm University, 2019, p. 26
Keywords [en]
Achievement test, Equivalence theorem, Exchange algorithm, Item calibration, Item response theory model, Optimal experimental design
National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
URN: urn:nbn:se:su:diva-174079
ISBN: 978-91-7797-879-4 (print)
ISBN: 978-91-7797-880-0 (electronic)
OAI: oai:DiVA.org:su-174079
DiVA, id: diva2:1357038
Public defence
2019-11-15, William-Olssonsalen, Geovetenskapens hus, Svante Arrhenius väg 14, floor 1, Stockholm, 10:00 (English)
Note

At the time of the doctoral defense, the following papers were unpublished and had the following status: Paper 1: Manuscript. Paper 3: Manuscript. Paper 4: Manuscript.

Available from: 2019-10-23 Created: 2019-10-02 Last updated: 2019-10-16 Bibliographically approved
List of papers
1. Discrimination with Unidimensional and Multidimensional Item Response Theory Models for Educational Data
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Achievement tests are used to characterize the proficiency of higher-education students. Item response theory (IRT) models are applied to these tests to estimate the ability of students (as a latent variable in the model). For high-quality IRT parameters to be estimated, especially ability parameters, it is important that the appropriate number of dimensions is identified. Through a case study, based on a statistics exam for students in higher education, we show how dimensions and other model parameters can be chosen in a real situation. Our model choice is based on both empirical evidence and background knowledge of the test. We investigate whether dimensionality influences the estimates of the item parameters, especially the discrimination parameter, which provides information about the quality of the item. We perform a simulation study to generalize our conclusions. Both the simulation study and the case study show that multidimensional models have the advantage of discriminating better between examinees.
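
For orientation, the role of the discrimination parameter can be written out under one common pair of model formulations. This is an assumption for illustration only: the abstract does not state which parametrization the manuscript uses, and the symbols $a_i$, $b_i$, $d_i$ and $\boldsymbol{\theta}_j$ are introduced here, not taken from the source.

```latex
% Unidimensional two-parameter logistic (2PL) model: examinee j, item i,
% scalar ability \theta_j, discrimination a_i, difficulty b_i.
P(X_{ij} = 1 \mid \theta_j)
  = \frac{\exp\{a_i(\theta_j - b_i)\}}{1 + \exp\{a_i(\theta_j - b_i)\}}

% A compensatory multidimensional extension: ability vector \boldsymbol{\theta}_j,
% discrimination vector \boldsymbol{a}_i, intercept d_i.
P(X_{ij} = 1 \mid \boldsymbol{\theta}_j)
  = \frac{\exp\{\boldsymbol{a}_i^{\top}\boldsymbol{\theta}_j + d_i\}}
         {1 + \exp\{\boldsymbol{a}_i^{\top}\boldsymbol{\theta}_j + d_i\}}
```

In both forms, a larger discrimination value makes the response probability change more steeply with ability, which is why the parameter is read as an indicator of item quality.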

National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
urn:nbn:se:su:diva-174074 (URN)
Available from: 2019-10-02 Created: 2019-10-02 Last updated: 2019-10-04 Bibliographically approved
2. Optimal Item Calibration for Computerized Achievement Tests
2019 (English) In: Psychometrika, ISSN 0033-3123, E-ISSN 1860-0980, Vol. 84, no 4, p. 1101-1128. Article in journal (Refereed) Published
Abstract [en]

Item calibration is a technique to estimate characteristics of questions (called items) for achievement tests. In computerized tests, item calibration is an important tool for maintaining, updating and developing new items for an item bank. To efficiently sample examinees with specific ability levels for this calibration, we use optimal design theory, assuming that the probability of answering correctly follows an item response model. Locally optimal unrestricted designs usually have only a few design points for ability. In practice, it is hard to sample examinees from a population with these specific ability levels because such examinees are unavailable or in limited supply. To counter this problem, we use the concept of optimal restricted designs and show that this concept fits item calibration naturally. We prove an equivalence theorem needed to verify optimality of a design. Locally optimal restricted designs provide intervals of ability levels for optimal calibration of an item. Assuming a two-parameter logistic model, several scenarios with D-optimal restricted designs are presented for calibration of a single item and simultaneous calibration of several items. These scenarios show that the naive way to sample examinees around unrestricted design points is not optimal.
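
The D-criterion mentioned in the abstract can be illustrated with a minimal sketch. It assumes the two-parameter logistic model is parametrized as logit = a(theta - b), which the abstract does not specify, and all function names are hypothetical. The sketch covers only the unrestricted single-item case, not the restricted designs developed in the paper. It uses the standard result for logistic models that the locally D-optimal two-point design sits where the logit is roughly ±1.543.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta (assumed form)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """2x2 Fisher information for (a, b) from one examinee at ability theta."""
    p = p_correct(theta, a, b)
    w = p * (1.0 - p)
    # Gradient of the logit a*(theta - b) with respect to (a, b).
    ga, gb = theta - b, -a
    return [[w * ga * ga, w * ga * gb], [w * ga * gb, w * gb * gb]]

def d_criterion(design, a, b):
    """Log-determinant of the information matrix of [(theta, weight), ...]."""
    m00 = m01 = m11 = 0.0
    for theta, weight in design:
        info = item_information(theta, a, b)
        m00 += weight * info[0][0]
        m01 += weight * info[0][1]
        m11 += weight * info[1][1]
    det = m00 * m11 - m01 * m01
    return math.log(det) if det > 0 else float("-inf")

# Compare three symmetric two-point designs for an item with a = 1, b = 0.
for c in (1.0, 1.5434, 2.5):
    design = [(-c, 0.5), (c, 0.5)]
    print(c, d_criterion(design, a=1.0, b=0.0))
```

The design at ±1.5434 gives the largest criterion value of the three, which matches the abstract's remark that unrestricted optimal designs concentrate on a few ability levels.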

Keywords
achievement tests, computerized tests, item calibration, optimal restricted design, two-parameter logistic model
National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
urn:nbn:se:su:diva-169646 (URN)
10.1007/s11336-019-09673-6 (DOI)
000492593800010 ()
Available from: 2019-06-12 Created: 2019-06-12 Last updated: 2019-11-11 Bibliographically approved
3. An exchange algorithm for optimal calibration of items in computerized achievement tests
(English) Manuscript (preprint) (Other academic)
Abstract [en]

The importance of large-scale achievement tests, like national tests in school, eligibility tests for university, or international assessments for evaluation of students, is increasing. Pretesting of questions for the above-mentioned tests is done to determine characteristic properties of the questions by adding them to an ordinary achievement test. If computerized tests are used, it has been shown using optimal experimental design methods that it is efficient to assign pretest questions to examinees based on their abilities. We can consider the specific distribution of abilities of the available examinees and apply restricted optimal designs. A previously used algorithm optimizes the criterion directly. We develop here a new algorithm which builds on an equivalence theorem. It discretizes the design space with the possibility to change the grid during the run, makes use of an exchange idea and filters computed designs. We illustrate how the algorithm works in some examples and how convergence can be checked. We show that this new algorithm can be used flexibly even if different models are assumed for different questions.
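
The general idea of exchanging assignments on a discretized ability grid can be sketched as follows. This is not the manuscript's algorithm: it is a plain greedy exchange that evaluates the criterion directly, without the equivalence theorem, grid refinement or filtering described above. It assumes 2PL items with logit a(theta - b), a fixed grid with population masses, and a criterion equal to the sum of log-determinants over items. All names and numbers are hypothetical.

```python
import math

def info_2pl(theta, a, b):
    """Entries (m00, m01, m11) of the 2PL information for (a, b) at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    w = p * (1.0 - p)
    ga, gb = theta - b, -a
    return (w * ga * ga, w * ga * gb, w * gb * gb)

def criterion(assign, grid, mass, items):
    """Sum over items of log det of the information from assigned grid cells."""
    total = 0.0
    for i, (a, b) in enumerate(items):
        m00 = m01 = m11 = 0.0
        for theta, g, owner in zip(grid, mass, assign):
            if owner == i:
                i00, i01, i11 = info_2pl(theta, a, b)
                m00 += g * i00
                m01 += g * i01
                m11 += g * i11
        det = m00 * m11 - m01 * m01
        if det <= 0:
            return float("-inf")
        total += math.log(det)
    return total

def greedy_exchange(grid, mass, items, max_sweeps=50):
    """Reassign one grid cell at a time while the criterion improves."""
    assign = [j % len(items) for j in range(len(grid))]  # arbitrary start
    best = criterion(assign, grid, mass, items)
    for _ in range(max_sweeps):
        improved = False
        for j in range(len(grid)):
            current = assign[j]
            for i in range(len(items)):
                if i == current:
                    continue
                assign[j] = i  # tentative exchange of cell j to item i
                value = criterion(assign, grid, mass, items)
                if value > best + 1e-12:
                    best, current, improved = value, i, True
            assign[j] = current  # keep the best owner found for cell j
        if not improved:
            break
    return assign, best

# Illustrative setup: normal-shaped ability masses on a grid, two items.
grid = [-3.0 + 0.25 * j for j in range(25)]
raw = [math.exp(-0.5 * t * t) for t in grid]
mass = [r / sum(raw) for r in raw]
items = [(1.0, -1.0), (1.5, 1.0)]
assign, value = greedy_exchange(grid, mass, items)
print(assign, value)
```

Because every examinee on the grid must be assigned to exactly one item, the result is a partition of the ability scale, in the spirit of the restricted designs described in the abstract. A greedy search of this kind may stop at a local optimum.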

National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
urn:nbn:se:su:diva-174075 (URN)
Available from: 2019-10-02 Created: 2019-10-02 Last updated: 2019-10-04 Bibliographically approved
4. Optimal calibration of items for multidimensional achievement tests
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Multidimensional achievement tests have recently been gaining importance in educational and psychological measurement due to their diagnostic nature. Diagnostic pretests help the organization to assist students in determining which ability within a particular domain of knowledge needs to be improved for better performance in the test. To develop diagnostic pretest items for multidimensional achievement tests, we generalize the previously developed exchange algorithm to the multidimensional setting. We also develop an asymptotic theorem which helps us to choose an item at extreme ability levels to sample the examinees.

Keywords
Achievement tests, exchange algorithm, item calibration, multidimensional item response model, optimal restricted design
National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
urn:nbn:se:su:diva-174077 (URN)
Available from: 2019-10-02 Created: 2019-10-02 Last updated: 2019-10-04 Bibliographically approved

Open Access in DiVA

Achievement tests and optimal design for pretesting of questions (1085 kB), 12 downloads
File information
File name: FULLTEXT01.pdf
File size: 1085 kB
Checksum (SHA-512): 52354cfeb6c7004a775f276311aa99772215300fe309c007d38853e329b14ddc25d4536fe10c4e746986047a5d403cd23e8b6c158911377d0b222580f9b20fdd
Type: fulltext
Mimetype: application/pdf

Author
Ul Hassan, Mahmood