Data validation and verification

Data validation is a method of verifying the accuracy and quality of your data, which is usually done before importing and processing it. It can also be thought of as a form of data purification. Data validation ensures that your data is complete (no blank or null values), unique (consisting of distinct values that aren't repeated), and that the range of values is as expected.

Data validation is frequently used as part of ETL (Extract, Transform, and Load) operations, which move data from a source database to a target data warehouse so that it can be joined with other data for analysis. When you undertake analysis, data validation ensures that your results are accurate.

 

Verification Main Methods

 
  • Double-entry - Entering the information twice and comparing the two versions is known as double entry. This practically doubles the workload, and because most individuals are paid by hour, it also increases the expense.

  • Proofreading data - entails someone double-checking the information entered against the original document. It's also time-consuming and expensive.

 

Types of Validation

Data Type Check

A data type check verifies that the information entered is of the appropriate data type. A field, for example, might only take numeric input. If this is frequently the case, the system should reject any data including additional characters such as letters or special symbols.

Code Check

A code check verifies that a field is selected from a valid list of options or that formatting requirements are followed. It's easy to check if a zip code is valid by comparing it to a list of valid codes, for example. Other things, such as country codes and NAICS industry codes, are frequently used in the same way.

Range Check

A range check verifies that the input file is inside a specified range. In geographic data, for example, latitude and longitude are frequently used. A latitude number should be between -90 and 90 degrees latitude, and a longitude value should be between -180 and 180 degrees longitude. Any values that fall outside of this range are considered invalid.

Format Check

Many data types have a predetermined format. Date columns recorded in a set format such as "YYYY-MM-DD" or "DD-MM-YYYYY" are a typical use case. A knowledge validation mechanism that ensures dates are in the appropriate format helps preserve consistency across data and across time.

Consistency Check

A consistency check is a type of logical check that ensures data is entered in a logically consistent manner. Checking if a parcel's delivery date is later than the shipment date is an example.

Uniqueness Check

Some data, such as IDs and e-mail addresses, are naturally unique. These fields in a database should almost certainly contain unique entries. A uniqueness check ensures that an item isn't entered into a database numerous times.

    Author: Hansi

Title
mLife