Business Intelligence and Data Quality


Figure 1: Data flow

Extract, Transform, and Load
In computing, extract, transform and load (ETL) refers to a process in database usage and especially in data warehousing that involves: 1) extracting data from outside sources 2) transforming it to fit operational needs (which can include quality levels) 3) loading it into the end target (the database or, more specifically, operational data store, data mart, or data warehouse). [1]

Begin by affirming that the level of validation from a business's source systems is done in a consistent manner. Investigate each source system's business data and understand the relationships (meta-data management) across the enterprise. Therein lies the root cause for potential data problems. Examine out-bound data files mapping. Ask questions. What were the data relationships, and what business rules were applied? How and where were they applied during the ETL process? Did the tests cover stored procedures and functions, triggers, indexing, field size, field type, and formats? Did the data marts or data extracts need to include referential integrity tests to insure the reliability and integrity of the data?

Ensure that a common data dictionary and definition is in place to facilitate certification efforts. Be concerned about legacy data, as it is likely that the application has undergone many transformations throughout its life. With new applications or business rules, data gets appended and enhanced and might not resemble its original state. Look for the originating event, which could be embedded in the ETL process. These data anomalies should be clearly vetted across the organization and mitigated through a common approach.

About the author

Paul Fratellone's picture
Paul Fratellone

Paul Fratellone is program director of quality and test consulting in the testing business unit of MindTree. Paul’s career of more than twenty-five years in information technology has been concentrated in testing, compliance, and quality management. He strives to achieve consistent execution to attain a predictable level of quality that is commensurate with the investment and enables leadership to objectively measure the success and continuous benefits from these investments.