Theory of selective editing with score functions
2014 (English). In: Journal of Official Statistics, ISSN 0282-423X, E-ISSN 2001-7367. Article in journal (Refereed). Accepted.
In many real datasets there are values that we may suspect to be erroneous. To clean data cost-effectively, we must prioritise which observations to recontact or remeasure in order to validate or correct them. Sometimes auxiliary information, apart from the initially reported and possibly dubious values, allows the true values to be predicted prior to editing. The weighted absolute difference between a predicted and a reported value is referred to as an item score. A large item score indicates that the observation needs checking. Usually we want to edit and verify all items of the same unit that need to be looked at, rather than going back to the same unit several times for each item separately. The article discusses ways of forming a unit score out of a generic set of p item scores for continuous variables. A generalised unit score function that unifies the functions widely used in statistical editing is presented. The optimal choice of unit score function is discussed in a variety of scenarios. The problem of prioritising manual statistical editing of business survey data is the motivating example.
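The scoring idea summarised in the abstract can be sketched in a few lines: an item score is a weighted absolute difference between a predicted and a reported value, and a unit score combines the p item scores, for example via a Minkowski-type norm (one of the families the article's generalised function covers). This is a minimal illustrative sketch, not the article's actual formulation; the function names, weights, and example figures are assumptions.

```python
def item_scores(reported, predicted, weights):
    """Weighted absolute differences between reported and predicted values."""
    return [w * abs(y - yhat) for w, y, yhat in zip(weights, reported, predicted)]


def unit_score(scores, alpha=1.0):
    """Minkowski-type combination of item scores.

    alpha = 1 gives the sum of the item scores; a large alpha approaches
    the maximum item score. Both are common choices for unit score
    functions in the selective-editing literature.
    """
    return sum(s ** alpha for s in scores) ** (1.0 / alpha)


# Hypothetical example: one business unit reporting three items.
reported = [120.0, 45.0, 10.0]
predicted = [100.0, 44.0, 10.5]
weights = [1.0, 1.0, 1.0]

s = item_scores(reported, predicted, weights)  # [20.0, 1.0, 0.5]
print(unit_score(s, alpha=1))   # sum of item scores
print(unit_score(s, alpha=8))   # close to the largest item score
```

Units with the largest unit scores would then be prioritised for recontact, so the choice of alpha controls whether a unit is flagged for one very suspicious item or for several moderately suspicious ones.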
Keywords: measurement errors, Minkowski's metric, subsample for recontact, validation
Probability Theory and Statistics
Research subject: Statistics
Identifiers
URN: urn:nbn:se:su:diva-97146
OAI: oai:DiVA.org:su-97146
DiVA: diva2:675861