Data quality testing is the process of checking whether data meets defined standards for accuracy, completeness, consistency, validity, uniqueness, and timeliness. It uses structured rules and checks to find errors, gaps, and inconsistencies before data reaches reports, decisions, or automated systems. Organizations that skip it pay a measurable price. Gartner research estimates poor data quality costs organizations an average of $12.9 million per year.

Key Takeaways

  • Data quality testing ensures data meets standards of accuracy, completeness, consistency, validity, uniqueness, and timeliness.
  • Organizations face significant financial losses from poor data quality, averaging $12.9 million per year according to Gartner.
  • The testing process involves three key moments: at data ingestion, during transformation, and on stored data.
  • There are six core dimensions of data quality testing: accuracy, completeness, consistency, validity, uniqueness, and timeliness.
  • Data quality testing is essential for compliance with regulations like GDPR and CCPA, which require maintaining accurate personal data.

What Is Data Quality Testing?

Data quality testing is a structured evaluation process. You apply defined rules to a dataset. Those rules check whether the data meets the standards your organization requires.

Secoda defines data quality checks as “evaluations that measure metrics related to data quality and integrity. These checks involve identifying duplicate data, checking for mandatory fields, null values, and missing values, applying formatting checks for consistency, and verifying the recency of data.”

The goal is straightforward. You want to know whether your data is fit for its intended purpose. Data that works for one use case may fail for another. Testing tells you where the gaps are before they cause problems.

Data quality testing happens at three key moments:

  1. At data ingestion: before data enters a system or warehouse
  2. During data transformation: as data moves between systems or is processed
  3. On stored data: in regular audits of data already in production

Testing at ingestion is the most cost-effective approach. The longer a data error persists, the more expensive it becomes to fix.

Kevin Clay

Why Data Quality Testing Matters: The Evidence

The financial case for data quality testing is documented and consistent.

Gartner research estimates that poor data quality costs organizations an average of $12.9 million per year (cited by Cambridge Spark, 2025, and integrate.io, 2026). An IBM estimate puts the annual cost of poor data quality to US businesses at $3.1 trillion (the IBM study, as cited by Anodot).

The IBM Institute for Business Value’s 2025 CDO Study, which surveyed 1,700 chief data officers across 27 countries and 19 industries, found that 43% of chief operations officers identify data quality as their most significant data priority. Over a quarter of organizations estimate losses exceeding $5 million annually from poor data quality. Seven percent report losses of $25 million or more.

IBM’s published article on the cost of poor data quality (January 2026) identifies specific downstream failures:

  • Dashboards and business intelligence tools produce misleading guidance when inaccurate data underlies them.
  • Machine learning models learn from their training data. Inaccurate, biased, or inconsistent data produces flawed model outputs.
  • Leaders misjudge performance, misprice offerings, and pursue initiatives based on incorrect assumptions.

IBM states directly: “Poor data quality often goes unnoticed because its impact rarely appears at the point of failure. Instead, it surfaces downstream as lost revenue, inefficiencies, compliance risks, and missed opportunities.” (Source: IBM Think, January 2026)

Data quality testing prevents those downstream failures. It catches errors at the source.

The Six Dimensions Data Quality Testing Covers

Figure: The 5 Pillars of Data Quality

Data quality testing checks data against six core dimensions. The dimensional view of data quality has academic roots. Professors Richard Y. Wang and Diane M. Strong laid the groundwork in their 1996 paper “Beyond Accuracy: What Data Quality Means to Data Consumers,” published in the Journal of Management Information Systems. That paper identified a broader set of 15 dimensions grouped into four categories; the six used here are the subset most widely adopted in practice. IBM lists the six dimensions as accuracy, completeness, consistency, timeliness, validity, and uniqueness.

Each dimension represents a different type of failure. Each requires a different type of test.

1. Accuracy

Accuracy testing checks whether data values correctly represent real-world facts.

A customer phone number that no longer belongs to that customer is an accuracy failure. A product price that does not match the current catalog is an accuracy failure.

Testing method: Compare data values against a verified reference source. Calculate the percentage of matching values.
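The accuracy check above can be sketched in a few lines of Python. This is a minimal illustration, not a production tool: the `accuracy_rate` function, the `sku` and `price` fields, and the sample catalog are all hypothetical, and the sketch assumes the reference source is fully trusted.

```python
def accuracy_rate(records, reference, key, field):
    """Return the share of records whose field matches the verified reference value."""
    ref_by_key = {row[key]: row[field] for row in reference}
    checked = matched = 0
    for row in records:
        if row[key] not in ref_by_key:
            continue  # no verified value to compare against, so skip the record
        checked += 1
        if row[field] == ref_by_key[row[key]]:
            matched += 1
    return matched / checked if checked else None


# Hypothetical verified reference source (the current product catalog)
catalog = [
    {"sku": "A1", "price": 9.99},
    {"sku": "B2", "price": 24.50},
    {"sku": "C3", "price": 5.00},
    {"sku": "D4", "price": 12.00},
]
# Hypothetical dataset under test
warehouse = [
    {"sku": "A1", "price": 9.99},
    {"sku": "B2", "price": 19.50},  # stale price: an accuracy failure
    {"sku": "C3", "price": 5.00},
    {"sku": "D4", "price": 12.00},
]

rate = accuracy_rate(warehouse, catalog, key="sku", field="price")
print(f"Price accuracy: {rate:.0%}")  # prints "Price accuracy: 75%"
```

Records with no entry in the reference source are skipped rather than counted as failures. Whether that is the right choice depends on your business rules.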

2. Completeness

Completeness testing checks whether all required data is present.

A record missing a required field is incomplete. A dataset with a high null rate in a critical column is incomplete.

Testing method: Count the percentage of non-null values in each required field. Set acceptable thresholds per field based on business rules.
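As a rough sketch of that method, the function below counts non-null values per required field and compares each rate to a threshold. The field names, the sample records, and the threshold values are assumptions chosen for illustration. Empty strings are treated as missing here, which may not match every system.

```python
def completeness_report(records, thresholds):
    """Compute the non-null rate for each required field and compare it to its threshold."""
    report = {}
    total = len(records)
    for field, minimum in thresholds.items():
        # Treat None and empty strings as missing values
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        rate = filled / total if total else 0.0
        report[field] = {"rate": rate, "passed": rate >= minimum}
    return report


# Hypothetical customer records
customers = [
    {"customer_id": 1, "email": "ana@example.com", "notes": "VIP"},
    {"customer_id": 2, "email": "", "notes": None},
    {"customer_id": 3, "email": "li@example.com", "notes": "Prefers phone"},
    {"customer_id": 4, "email": "sam@example.com"},
]
# Hypothetical per-field thresholds set from business rules
thresholds = {"customer_id": 1.0, "email": 0.95, "notes": 0.50}

report = completeness_report(customers, thresholds)
for field, result in report.items():
    status = "PASS" if result["passed"] else "FAIL"
    print(f"{field}: {result['rate']:.0%} non-null ({status})")
```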

3. Consistency

Consistency testing checks whether the same data has the same value across every system where it appears.

A customer address stored differently in the CRM and the billing system is a consistency failure. Date fields stored in different formats across two databases are a consistency failure.

Testing method: Compare values for the same data element across different systems. Track mismatches and calculate a consistency rate.
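Here is one way the cross-system comparison might look. The two dictionaries stand in for a CRM and a billing system keyed by customer ID; both are hypothetical. The optional `normalize` argument is an assumption about how you want to treat harmless differences such as case and trailing spaces.

```python
def consistency_rate(system_a, system_b, normalize=lambda v: v):
    """Compare values for keys present in both systems; return (rate, mismatched keys)."""
    shared = system_a.keys() & system_b.keys()
    mismatches = sorted(
        k for k in shared if normalize(system_a[k]) != normalize(system_b[k])
    )
    rate = 1 - len(mismatches) / len(shared) if shared else None
    return rate, mismatches


# Hypothetical address data keyed by customer ID
crm = {101: "12 Oak St", 102: "5 Elm Ave", 103: "9 Pine Rd", 104: "1 Main St"}
billing = {101: "12 oak st ", 102: "5 Elm Avenue", 103: "9 Pine Rd", 105: "7 Lake Dr"}

rate, mismatches = consistency_rate(crm, billing, normalize=lambda v: v.strip().lower())
print(f"Consistency rate: {rate:.0%}, mismatched IDs: {mismatches}")
```

Only customers present in both systems are compared. Customers that exist in just one system are a separate problem, closer to a completeness or referential integrity check.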

4. Validity

Validity testing checks whether data conforms to defined formats, types, and business rules.

A phone number field containing letters is invalid. A historical order record with an order date in the future is invalid.

Testing method: Apply predefined business rules to each field. Calculate the percentage of records that pass each rule.
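The sketch below applies two validity rules, modeled on the examples above, and reports the pass rate for each. The phone pattern is an assumed format rule, not a universal standard, and the record layout and `AS_OF` date are hypothetical.

```python
import re
from datetime import date

AS_OF = date(2024, 6, 30)                   # fixed "today" so the result is repeatable
PHONE_PATTERN = re.compile(r"\+?\d{7,15}")  # assumed rule: optional +, then 7 to 15 digits

# Each rule takes a record and returns True when the record is valid
RULES = {
    "phone_format": lambda r: bool(PHONE_PATTERN.fullmatch(r["phone"])),
    "order_date_not_future": lambda r: r["order_date"] <= AS_OF,
}


def pass_rates(records, rules):
    """Return the share of records that pass each rule."""
    return {
        name: sum(rule(r) for r in records) / len(records)
        for name, rule in rules.items()
    }


# Hypothetical historical order records
orders = [
    {"phone": "+15551234567", "order_date": date(2024, 6, 1)},
    {"phone": "555-CALL-NOW", "order_date": date(2024, 5, 15)},  # letters in phone
    {"phone": "5559876543", "order_date": date(2025, 1, 1)},     # future order date
    {"phone": "5550001111", "order_date": date(2024, 6, 30)},
]

rates = pass_rates(orders, RULES)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%} valid")
```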

5. Uniqueness

Uniqueness testing checks whether each real-world entity appears exactly once in the dataset.

A customer record that appears twice under slightly different names is a uniqueness failure. Duplicate transaction records inflate totals and skew every metric they touch.

Testing method: Run duplicate detection checks on key fields and record identifiers. Use fuzzy matching for near-duplicates.
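A small sketch of both checks follows. Exact duplicates are found by counting key values. Near-duplicates are found with `difflib.SequenceMatcher` from the Python standard library, which is one of several possible fuzzy matching techniques. The 0.85 similarity threshold and the sample records are assumptions you would tune for your own data.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def exact_duplicates(records, key):
    """Return key values that appear on more than one record."""
    counts = Counter(r[key] for r in records)
    return sorted(k for k, n in counts.items() if n > 1)


def near_duplicates(records, field, threshold=0.85):
    """Return ID pairs whose field values are at least `threshold` similar."""
    pairs = []
    for a, b in combinations(records, 2):
        score = SequenceMatcher(None, a[field].lower(), b[field].lower()).ratio()
        if score >= threshold:
            pairs.append((a["id"], b["id"]))
    return pairs


# Hypothetical customer records
customers = [
    {"id": 1, "name": "Jonathan Smith", "email": "jon@example.com"},
    {"id": 2, "name": "Jonathon Smith", "email": "jsmith@example.com"},
    {"id": 3, "name": "Maria Garcia", "email": "maria@example.com"},
    {"id": 4, "name": "Wei Chen", "email": "wei@example.com"},
    {"id": 5, "name": "Maria Garcia", "email": "maria@example.com"},
]

duplicate_emails = exact_duplicates(customers, key="email")
similar_names = near_duplicates(customers, field="name")
print("Exact duplicate emails:", duplicate_emails)
print("Near-duplicate name pairs:", similar_names)
```

The pairwise comparison grows quadratically with the number of records. On large datasets, you would typically group records first (for example by postal code) and compare only within each group.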

6. Timeliness

Timeliness testing checks whether data is current and available when it is needed.

A customer record not updated after a known address change fails the timeliness test. A dashboard refreshed daily when decisions require hourly data fails the timeliness test.

Testing method: Track the lag between a real-world event and when data reflects it. Compare data refresh times against defined service level thresholds.
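To illustrate, the sketch below measures the lag between when an event happened and when the data reflected it, then flags anything over a service level threshold. The `occurred_at` and `loaded_at` field names and the one-hour threshold are hypothetical.

```python
from datetime import datetime, timedelta


def timeliness_check(events, max_lag):
    """Return (on-time rate, list of (event id, lag) for events over the threshold)."""
    late = []
    for event in events:
        lag = event["loaded_at"] - event["occurred_at"]
        if lag > max_lag:
            late.append((event["id"], lag))
    on_time_rate = 1 - len(late) / len(events) if events else None
    return on_time_rate, late


# Hypothetical events with the time they occurred and the time they were loaded
events = [
    {"id": "e1", "occurred_at": datetime(2024, 6, 1, 9, 0), "loaded_at": datetime(2024, 6, 1, 9, 20)},
    {"id": "e2", "occurred_at": datetime(2024, 6, 1, 9, 0), "loaded_at": datetime(2024, 6, 1, 12, 0)},
    {"id": "e3", "occurred_at": datetime(2024, 6, 1, 10, 0), "loaded_at": datetime(2024, 6, 1, 10, 59)},
]

on_time_rate, late = timeliness_check(events, max_lag=timedelta(hours=1))
print(f"On-time rate: {on_time_rate:.0%}")
for event_id, lag in late:
    print(f"{event_id} arrived {lag} after the event")
```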

Types of Data Quality Tests

Data quality testing includes several distinct test types. Each type targets a specific problem category.

Completeness tests check for null values, missing records, and unpopulated required fields.

Format tests verify that values follow required formatting rules, such as date formats, postal code structures, or phone number patterns.

Range tests confirm that numeric values fall within acceptable minimum and maximum bounds. An age field showing a value of 200 fails a range test.

Referential integrity tests verify that relationships between tables hold. Every order record should reference a valid customer ID. Every product code should match an entry in the product catalog.
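A referential integrity test can be as simple as looking for child rows that point to a parent that does not exist. The sketch below uses hypothetical `orders` and `customers` tables held as lists of dictionaries. In a database you would usually express the same idea as a join or a foreign key constraint.

```python
def orphaned_rows(child_rows, foreign_key, parent_rows, parent_key):
    """Return child rows whose foreign key has no matching parent row."""
    valid_keys = {parent[parent_key] for parent in parent_rows}
    return [child for child in child_rows if child[foreign_key] not in valid_keys]


# Hypothetical parent and child tables
customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
orders = [
    {"order_id": 101, "customer_id": 1},
    {"order_id": 102, "customer_id": 2},
    {"order_id": 103, "customer_id": 4},     # no such customer
    {"order_id": 104, "customer_id": None},  # missing reference
]

orphans = orphaned_rows(orders, "customer_id", customers, "customer_id")
print("Orders without a valid customer:", [o["order_id"] for o in orphans])
```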

Duplicate detection tests identify records where the same entity appears more than once, either as exact duplicates or near-duplicates with slight variations.

Cross-system consistency tests compare values for the same data element across two or more systems and flag mismatches.

Business rule tests apply organization-specific logic. Examples include: invoice amounts must be greater than zero, customer age must be 18 or older for a specific product category, or a status field can only contain values from a defined list.
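The three example rules above could be written as small named checks, as in this sketch. The field names, the `restricted` category label, and the list of allowed statuses are assumptions made up for the example. Your own rules will come from your business requirements.

```python
ALLOWED_STATUSES = {"open", "paid", "cancelled"}  # hypothetical defined list

# Each rule returns True when the record satisfies it
RULES = {
    "amount_positive": lambda r: r["amount"] > 0,
    "adult_for_restricted": lambda r: r["category"] != "restricted" or r["customer_age"] >= 18,
    "status_allowed": lambda r: r["status"] in ALLOWED_STATUSES,
}


def failed_rules(record):
    """Return the names of every business rule the record violates."""
    return [name for name, rule in RULES.items() if not rule(record)]


# Hypothetical invoice records
invoices = [
    {"invoice_id": "INV-1", "amount": 120.0, "category": "standard", "customer_age": 17, "status": "paid"},
    {"invoice_id": "INV-2", "amount": 0, "category": "restricted", "customer_age": 16, "status": "refunded"},
    {"invoice_id": "INV-3", "amount": 50.0, "category": "restricted", "customer_age": 30, "status": "open"},
]

violations = {r["invoice_id"]: failed_rules(r) for r in invoices if failed_rules(r)}
print(violations)
```

Listing violations per record, rather than only a pass rate, makes it easier to hand specific records to the team that owns the fix.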

How to Run Data Quality Testing: A Step-by-Step Process

Figure: Data Quality Testing Steps

The following six steps describe a structured approach to data quality testing.

Step 1: Define your data quality standards.

Start with the business purpose of the data. What decisions does it support? What processes depend on it? Define the acceptable threshold for each dimension in that context. A financial reporting dataset may require 99% accuracy. An internal log file may tolerate a lower threshold.

Step 2: Profile the data.

Data profiling gives you a baseline picture of your current data. Run profiling checks to understand the current null rates, value distributions, format patterns, and duplicate rates. This step shows you where problems exist before you write any rules.
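A basic profile of a single column might look like the sketch below. It reports the null rate, the number of distinct values, the duplicate rate, the most common values, and the most common format patterns. The `shape` helper, which maps digits to `9` and letters to `A`, is a common profiling convention used here as an assumption. The postal code sample is hypothetical.

```python
from collections import Counter


def shape(value):
    """Reduce a value to its format pattern: digits become 9, letters become A."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in str(value))


def profile(records, field):
    """Return baseline statistics for one field."""
    values = [r.get(field) for r in records]
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "null_rate": 1 - len(present) / len(values) if values else None,
        "distinct": len(counts),
        "duplicate_rate": (len(present) - len(counts)) / len(present) if present else None,
        "top_values": counts.most_common(3),
        "patterns": Counter(shape(v) for v in present).most_common(3),
    }


# Hypothetical address records
addresses = [
    {"postal_code": "90210"},
    {"postal_code": "10001"},
    {"postal_code": ""},
    {"postal_code": None},
    {"postal_code": "1000A"},  # unexpected letter shows up as a second pattern
    {"postal_code": "90210"},
]

result = profile(addresses, "postal_code")
for name, value in result.items():
    print(f"{name}: {value}")
```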

Step 3: Define test rules for each dimension.

Translate your quality standards into specific, testable rules. For accuracy, identify your reference source. For validity, document every business rule that applies to each field. For uniqueness, define the key fields used to identify duplicate records.

Step 4: Execute the tests.

Run each test against the dataset. Record which records pass and which fail each rule. Calculate the pass rate for each dimension.
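One possible shape for this step is a small runner that applies every rule to every record, logs each failure, and rolls the results up by dimension. The rules, the dimension labels attached to them, and the sample records below are all hypothetical.

```python
from collections import defaultdict

# (dimension, rule name, check) triples; each check returns True on a pass
RULES = [
    ("completeness", "email_present", lambda r: bool(r.get("email"))),
    ("validity", "age_in_range", lambda r: r.get("age") is not None and 0 <= r["age"] <= 120),
    ("validity", "email_has_at", lambda r: "@" in (r.get("email") or "")),
]


def run_tests(records, rules):
    """Return (list of (record id, failed rule), pass rate per dimension)."""
    failures = []
    tally = defaultdict(lambda: [0, 0])  # dimension -> [passed, total]
    for record in records:
        for dimension, name, check in rules:
            ok = check(record)
            tally[dimension][0] += ok
            tally[dimension][1] += 1
            if not ok:
                failures.append((record["id"], name))
    pass_rates = {d: passed / total for d, (passed, total) in tally.items()}
    return failures, pass_rates


# Hypothetical records under test
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 200},
    {"id": 3, "email": "bob.example.com", "age": 41},
    {"id": 4, "email": "d@example.com", "age": None},
]

failures, pass_rates = run_tests(records, RULES)
print("Failures:", failures)
print("Pass rates:", pass_rates)
```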

Step 5: Document and prioritize failures.

Not all failures carry equal weight. A 2% null rate in a rarely used descriptive field is lower priority than a 5% null rate in a key identifier field. Prioritize failures by business impact. Fix the most consequential issues first.
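One simple way to rank failures is to multiply each failure rate by a business impact weight and sort by the result. This scoring scheme is an assumption for illustration, not a standard formula, and the impact weights below are invented. In practice the weights should come from the people who own the affected processes.

```python
def prioritize(failures):
    """Sort failures by failure rate multiplied by business impact, highest first."""
    return sorted(failures, key=lambda f: f["failure_rate"] * f["impact"], reverse=True)


# Hypothetical test results; impact is a 1-10 weight agreed with the business
failures = [
    {"check": "description is null", "failure_rate": 0.02, "impact": 1},
    {"check": "customer_id is null", "failure_rate": 0.05, "impact": 10},
    {"check": "postal_code format", "failure_rate": 0.10, "impact": 3},
]

ranked = prioritize(failures)
for position, failure in enumerate(ranked, start=1):
    score = failure["failure_rate"] * failure["impact"]
    print(f"{position}. {failure['check']} (score {score:.2f})")
```

Note that the postal code check has the highest raw failure rate but ranks second once impact is taken into account.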

Step 6: Monitor continuously.

Data quality degrades over time. New sources, system updates, and business changes all introduce new quality problems. Automate monitoring where possible. Set alerts when quality metrics fall below defined thresholds. Review results on a defined schedule.
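A threshold alert can start as a simple comparison between the latest metrics and their minimum acceptable values. The sketch below only builds alert messages. How the metrics are collected and where the alerts are sent (email, chat, a ticketing system) are left out, and the metric names and thresholds are hypothetical.

```python
def check_thresholds(metrics, thresholds):
    """Return an alert message for every metric that is missing or below its threshold."""
    alerts = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            alerts.append(f"{name}: no measurement available")
        elif value < minimum:
            alerts.append(f"{name}: {value:.1%} is below the {minimum:.1%} threshold")
    return alerts


# Hypothetical results from the latest scheduled test run
latest_metrics = {"completeness": 0.991, "uniqueness": 0.962}
# Hypothetical minimum acceptable values per dimension
thresholds = {"completeness": 0.98, "uniqueness": 0.99, "timeliness": 0.95}

alerts = check_thresholds(latest_metrics, thresholds)
for alert in alerts:
    print("ALERT:", alert)
```

A missing measurement is reported as an alert too. A check that silently stops running is itself a data quality risk.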

Data Quality Testing in Six Sigma’s Measure Phase

Data quality testing connects directly to the Measure phase of the DMAIC framework.

In Six Sigma, the Measure phase has one primary job. You collect reliable data on the current state of the process. That data drives everything that follows. If the data is flawed, your root cause analysis in the Analyze phase is unreliable. Your improvement decisions in the Improve phase are based on wrong assumptions. Your Control phase monitoring uses a flawed baseline.

The Measure phase includes Measurement System Analysis (MSA). MSA evaluates whether your measurement process itself is reliable. Data quality testing and MSA work toward the same goal. Both confirm that the data you are working with accurately represents the process it is supposed to measure.

Six Sigma practitioners apply data quality testing specifically to:

  • Validate that data collected from process observations is accurate and complete
  • Confirm that historical process data is consistent across the time period being analyzed
  • Identify and remove duplicate records before calculating baseline metrics
  • Verify that all required measurement fields are populated before statistical analysis begins

Garbage in, garbage out. No statistical tool in Six Sigma compensates for poor input data quality.

Data Quality Testing and Regulatory Compliance

Data quality testing is not just a performance issue. It is a compliance requirement in many industries.

Regulations including GDPR (General Data Protection Regulation) and CCPA (California Consumer Privacy Act) require organizations to maintain accurate and secure personal data. Inaccurate data held about individuals can constitute a compliance failure under these frameworks.

IBM notes that regulatory requirements “impose stricter data governance practices, heightening the complexity of managing data quality.” Financial services, healthcare, and pharmaceutical industries face additional sector-specific requirements that make data quality testing a legal obligation.

The cost of a data breach, which poor data governance can worsen, reached an average of $4.88 million per incident according to IBM research cited by integrate.io (2026). Data quality testing reduces the risk of compliance failures that contribute to those costs.

FAQ: What Is Data Quality Testing?

What is the purpose of data quality testing?

Data quality testing evaluates whether data meets defined standards for accuracy, completeness, consistency, validity, uniqueness, and timeliness. Its purpose is to catch data errors before they reach reports, analytical models, or business decisions. Poor data quality costs organizations an average of $12.9 million per year, according to Gartner research. Testing prevents those costs by identifying failures at the source.

What are the main types of data quality tests?

The main types include completeness tests (checking for missing values and required fields), format tests (verifying correct data formats), range tests (confirming numeric values fall within acceptable limits), referential integrity tests (verifying relationships between tables), duplicate detection tests (identifying repeated records), cross-system consistency tests (comparing values across systems), and business rule tests (applying organization-specific logic to data fields).

What are the six dimensions of data quality?

The six core dimensions of data quality are accuracy, completeness, consistency, timeliness, validity, and uniqueness. The dimensional approach traces back to Professors Richard Y. Wang and Diane M. Strong and their 1996 paper “Beyond Accuracy: What Data Quality Means to Data Consumers,” which identified a broader set of 15 dimensions grouped into four categories. The six listed here are the subset most widely used in practice, and IBM lists these six as the standard dimensions for evaluating data quality across organizations.

How does data quality testing connect to Six Sigma?

In Six Sigma, the Measure phase requires reliable data on the current state of a process. Data quality testing validates that the data collected is accurate, complete, and consistent before analysis begins. Poor input data makes root cause analysis unreliable and improvement decisions unsound. Six Sigma practitioners use data quality testing and Measurement System Analysis together to confirm that their data accurately reflects the process being improved.

How often should organizations run data quality tests?

It depends on how fast the data changes and how critical it is to business operations. High-velocity data sources, such as transactional systems, benefit from automated checks at ingestion and continuous monitoring. Strategic datasets used for periodic reporting should be audited at least quarterly, in line with OvalEdge’s recommendation of a quarterly baseline data quality assessment for standard datasets and embedded automated checks for high-velocity feeds.

What does it cost to ignore data quality testing?

Gartner’s cross-industry research estimates that poor data quality costs organizations an average of $12.9 million per year in combined losses from operational inefficiencies, compliance risks, and flawed decision-making.

IBM’s 2025 CDO Study found that over a quarter of organizations lose more than $5 million annually due to poor data quality, with 7% reporting losses of $25 million or more. IBM’s published research also notes that by the time a quality issue reaches a boardroom dashboard, fixing it can cost 100 times more than catching it at ingestion.

How SSDSI Teaches Data Quality in Six Sigma Training

At Six Sigma Development Solutions, Inc., we teach data quality testing as a core skill in our Green Belt and Black Belt programs.

Our Measure phase content covers data collection planning, measurement system analysis, and data validation techniques. You learn how to define data quality standards, run baseline data profiles, and identify measurement system failures before they contaminate your analysis.

We deliver training in three formats:

Onsite training: We come to your location. Your team trains together with a live instructor and works on real datasets or representative process data.

Live virtual training: Instructor-led training in real time over five days. You get the same analytical exercises as the onsite format through a virtual classroom environment.

Online self-paced training: Work through the content on your own schedule. All exercises and reference tools are included.

Every format prepares you for the IASSC certification exam. SSDSI is an IASSC Accredited Training Organization.

Data quality testing is not a data engineering topic alone. It is a process improvement skill. Every Six Sigma project depends on it.

Ready to build your data quality skills in Six Sigma?

Explore SSDSI’s Green Belt and Black Belt programs in onsite, live virtual, or online formats.

About Six Sigma Development Solutions, Inc.

Six Sigma Development Solutions, Inc. offers onsite, public, and virtual Lean Six Sigma certification training. We are an IASSC (International Association of Six Sigma Certification) Accredited Training Organization. We offer Lean Six Sigma Yellow Belt, Green Belt, and Black Belt certifications, as well as Lean certifications.

Book a call and let us know how we can help meet your training needs.