Table of contents
Introduction
Quick Summary
When a Data Processing Pipeline Rejects Data
Correcting the Errors
Giving Business Users Access To Their Bad Data
There’s Gold in Data Error Management
Effective Handling of Errors
Dealing with Different Data Formats
Reconciliation
Auditing
Reporting
Conclusion
Data Integration
[email protected]
www.cloveretl.com
Quick Summary
In this paper we discuss the benefits of automated data processing pipelines designed
for error management. We outline tools and practices that enable business users
to effectively identify, correct and put bad data back into the processing pipeline.
This data correction loop, coupled with performance analysis, audit trails and smart
detection of anomalies, eliminates unexpected downtime, prevents data loss and
avoids delays in business operations.
When a Data Processing Pipeline Rejects Data
IT systems are great at detecting issues and bringing invalid data to a user’s attention, but
often lack the capability to fix the problems. Whenever data enters an organization or
a certain business process, it’s pushed through an (ideally automated) data pipeline,
which moves the data through complex processing stages and eventually produces a
transformed data set or sets that serve further business functions. That’s true when
everything is fine. What if there’s a problem with the data? At worst, the whole process
fails, leaving a mess and inconsistencies in the systems involved. At best, the process
produces rejected data sets that list the records that couldn’t be processed, augmented
with additional information describing the nature of the problem.
Data can be rejected at any point of its journey through the pipeline: typos or missing
references during input validation; duplicates or records violating business logic along
the way; and, at the end, issues with pushing data to its targets, such as mismatched or
changed data structures or network congestion.
[Figure: Automated Data Processing Pipeline. Stages: Data Source, Input Validation, Logic Step, Logic Step, Output, Data Target. Separate output: Rejected Data.]
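The rejection flow described above can be illustrated with a minimal Python sketch. It is not taken from any particular product: the names `run_pipeline`, `Rejected`, `validate_input` and `make_duplicate_check`, as well as the sample order records, are hypothetical and chosen only for illustration. The sketch assumes each stage is a simple check that either passes a record or returns a reason for rejecting it, so that bad records are collected with a description of the problem instead of failing the whole run.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Record = dict
# A check returns None when the record passes, or a reason string when it fails.
Check = Callable[[Record], Optional[str]]


@dataclass
class Rejected:
    """A record that could not be processed, plus where and why it failed."""
    record: Record
    stage: str
    reason: str


def run_pipeline(records, stages):
    """Push records through named stages; collect rejects instead of failing."""
    accepted = list(records)
    rejected = []
    for stage_name, check in stages:
        passed = []
        for record in accepted:
            reason = check(record)
            if reason is None:
                passed.append(record)
            else:
                rejected.append(Rejected(record, stage_name, reason))
        accepted = passed
    return accepted, rejected


def validate_input(record):
    """Hypothetical input validation: every record needs a customer reference."""
    if not record.get("customer_id"):
        return "missing customer_id"
    return None


def make_duplicate_check():
    """Hypothetical logic step: reject records whose order_id was already seen."""
    seen = set()

    def check(record):
        key = record["order_id"]
        if key in seen:
            return f"duplicate order_id {key}"
        seen.add(key)
        return None

    return check


# Made-up sample data for illustration only.
orders = [
    {"order_id": 1, "customer_id": "C1"},
    {"order_id": 2, "customer_id": ""},
    {"order_id": 1, "customer_id": "C1"},
]

accepted, rejected = run_pipeline(
    orders,
    [("input validation", validate_input),
     ("duplicate check", make_duplicate_check())],
)

for item in rejected:
    print(f"{item.stage}: {item.reason} -> {item.record}")
```

Keeping the stage name and reason next to each rejected record is what makes the rejected data set useful later: a business user can see what went wrong and where, correct the record, and feed it back into the pipeline.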