
Figure 1: Data flow
Extract, Transform, and Load
In computing, extract, transform and load (ETL) refers to a process in database usage and especially in data warehousing that involves: 1) extracting data from outside sources 2) transforming it to fit operational needs (which can include quality levels) 3) loading it into the end target (the database or, more specifically, operational data store, data mart, or data warehouse). [1]
Begin by affirming that the level of validation from a business's source systems is done in a consistent manner. Investigate each source system's business data and understand the relationships (meta-data management) across the enterprise. Therein lies the root cause for potential data problems. Examine out-bound data files mapping. Ask questions. What were the data relationships, and what business rules were applied? How and where were they applied during the ETL process? Did the tests cover stored procedures and functions, triggers, indexing, field size, field type, and formats? Did the data marts or data extracts need to include referential integrity tests to insure the reliability and integrity of the data?
Ensure that a common data dictionary and definition is in place to facilitate certification efforts. Be concerned about legacy data, as it is likely that the application has undergone many transformations throughout its life. With new applications or business rules, data gets appended and enhanced and might not resemble its original state. Look for the originating event, which could be embedded in the ETL process. These data anomalies should be clearly vetted across the organization and mitigated through a common approach.






