Table of Contents
Introduction
Data Anonymization Challenges
Removing explicit entities
Data sampling
Anonymization levels
Semantic relationship
Data Anonymization Requirements
SOA Architecture Model
Semantic dependencies analysis
Data Anonymization Solutions
CloverETL Infrastructure
Banking System Anonymization
Anonymization ETL Process
Conclusion
Acronyms and terms
cloveretl.com
[email protected]
Introduction
Production data is an ideal source of use-case scenarios for complex, heterogeneous systems
that have been running in production environments for some time. In today’s enterprise applications,
the use cases inherently stored in these systems are usually very complex.
Complex Systems – Intricate use-cases
Complexity introduces a general test data issue: how to get test data for new releases and updates.
Unlike in new system development, a release of an existing system must pass many kinds of tests:
functional tests for new and changed functionality, regression tests for existing functionality, and
especially load and performance tests, which ensure a satisfying customer experience. Thus, finding
enough reliable, high-quality data is often a nightmare for the test managers of most enterprise systems.
Synthetic Data – Not a Solution
The obvious approach of generating synthetic data often does not satisfy the stringent criteria
enterprise systems must meet, especially for regression and load-and-performance test needs.
As previously mentioned, real complex and heterogeneous production use cases usually go far
beyond the imagination of even the best senior business analyst, and they’re a common source of
potential production issues related to change management. Synthetic test data can only satisfy
small, isolated changes where regression and load-and-performance testing is not required.
Production Data for Testing?
Using production data for such complex testing seems like it would be the natural answer for most
project managers. However, a number of problems immediately arise with such an approach. The
most critical problems are privacy concerns and data security. Client and business process data
are part of a corporation’s most valuable assets. Thus, extending access to such data to the testing
team greatly increases overall security risk: it exposes sensitive client information to unauthorized
eyes and raises the cost and complexity of security procedures. In some businesses, there’s an
additional impact on internal policies too.
In banking environments, for example, many employees hold premium internal bank accounts,
which often come with benefits, special interest rates, etc. Now suppose that such
production data were available to the project team. Test analysts would be able to peek at sensitive
information about colleagues’ wages, transaction histories, and more. If such information were
revealed among bank employees, it would seriously undermine the corporate HR policy.