Definition: Data review (data checking)
UN - Data editing terminology
Activity through which the correctness conditions of the data are verified. It also includes the specification of the type of the error or condition not met, and the qualification of the data and its division into the "error-free" and "erroneous" data. Data checking may be aimed at detecting error-free data or at detecting erroneous data. Data review consists of both error detection and data analysis, and can be carried out in manual or automated mode.
Data review/error detection may occur at many levels:
a) within a questionnaire
- Item level / editing of individual data - the lowest logical level of checking and correction during which the relationships among data items are not considered. Validations at this level are generally named "range checking".
Example: age must be between 0 and 120.
In more complex range checks, the range may vary by stratum or some other identifier.
Example: if strata = "large farm operation", then the number of acres must be greater than 500.
- Questionnaire level / editing of individual records - a logical level of checking and correction during which the relationships among data items in one record/questionnaire are considered.
Examples:
1) If married = "Yes", then age must be greater than 14.
2) The sum of field acres must equal the total acres in the farm.
- Hierarchical - This level involves checking items in sub-questionnaires. Data relationships of this type are known as "hierarchical data" and include situations such as questions about an individual within a household. In this example, the common household information is on one questionnaire and each individual's information is on a separate questionnaire. Checks are made to ensure that the sum of the individuals' data for an item does not exceed the total reported for the household.
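The three within-questionnaire levels above can be sketched in Python as follows. This is a minimal illustration, not part of the UNECE definition: the record layout (plain dictionaries) and the field names `age`, `married`, `strata`, `total_acres`, `field_acres` and `income` are assumptions chosen to mirror the examples, and missing values are simply skipped rather than reported.

```python
def check_item_level(record):
    """Item-level range checks: each item is tested on its own,
    apart from the stratum that selects which range applies."""
    errors = []
    age = record.get("age")
    if age is not None and not 0 <= age <= 120:
        errors.append("age: outside 0-120")
    acres = record.get("total_acres")
    if (record.get("strata") == "large farm operation"
            and acres is not None and acres <= 500):
        errors.append("total_acres: must exceed 500 for large farm operations")
    return errors


def check_record_level(record):
    """Questionnaire-level checks: relationships among items in one record."""
    errors = []
    age = record.get("age")
    if record.get("married") == "Yes" and age is not None and age <= 14:
        errors.append("married/age: married respondents must be older than 14")
    fields = record.get("field_acres")
    total = record.get("total_acres")
    # Exact equality is used here; whole-number acres are assumed.
    if fields is not None and total is not None and sum(fields) != total:
        errors.append("field_acres/total_acres: fields do not sum to total")
    return errors


def check_hierarchical(household, individuals, item):
    """Hierarchical check: the individuals' values for an item must not
    exceed the total reported on the household questionnaire."""
    individual_sum = sum(person.get(item, 0) for person in individuals)
    household_total = household.get(item, 0)
    if individual_sum > household_total:
        return [f"{item}: individuals sum to {individual_sum}, "
                f"household reports {household_total}"]
    return []
```

Each function returns a list of messages naming the condition that was not met, so a record can be classified as "error-free" when all lists are empty and as "erroneous" otherwise.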
b) across questionnaires / editing of logical units
- A logical level of checking and correction during which the relationships among data in two or more records are considered, namely in a group of records that are logically coupled together. Across-questionnaire edits involve calculating valid ranges for each item, from the survey data distributions or from historic data, for use in outlier detection. Data analysis routines that are usually run at summary time may easily be incorporated into data review at this level. In this way, summary-level errors are detected early enough to be corrected during the usual error correction procedures. Across-questionnaire checks should identify the specific questionnaire that contains the questionable data. Across-questionnaire edits are generally grouped into two types: statistical edits and macro edits.
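As an illustration of a statistical edit at this level, the sketch below derives a valid range for one item from the distribution of the reported values and returns the identifiers of the questionnaires that fall outside it. The definition does not prescribe how the range is calculated; the quartile-based fences used here (1.5 times the interquartile range) are an assumed rule, and the function names and questionnaire identifiers are hypothetical.

```python
from statistics import quantiles


def valid_range(values, k=1.5):
    """Derive a valid range from the data: quartiles widened by k * IQR.
    The fence rule is an assumption, not part of the definition."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def flag_outliers(reported, k=1.5):
    """reported maps a questionnaire id to its value for one item.
    Returns the ids whose value lies outside the derived range."""
    low, high = valid_range(list(reported.values()), k)
    return sorted(qid for qid, value in reported.items()
                  if not low <= value <= high)


# Hypothetical total acres reported on eight questionnaires.
reported = {"Q1": 100, "Q2": 110, "Q3": 105, "Q4": 95,
            "Q5": 120, "Q6": 115, "Q7": 102, "Q8": 5000}
suspect = flag_outliers(reported)  # ids of questionnaires to review
```

Returning questionnaire identifiers rather than a summary statistic reflects the requirement that the check point to the specific questionnaire holding the questionable data. A range based on historic data could be obtained by passing the earlier survey's values to `valid_range` instead.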
Economic Commission for Europe of the United Nations (UNECE), The Knowledge Base on Statistical Data Editing, Online glossary developed by the UNECE Data Editing Group, 2000