ETL or Data Warehouse Testing Concepts
What is ETL TESTING?
ETL or Extract-Transform-Load defines the mechanism of data flow from a system to the data warehouse. Here, the data to be extracted must match the data warehouse schema before loading into the database. So, the data are first extracted from the OLTP database and transformed according to the data warehouse schema and then loaded into the data warehouse. However, the data could also be from a non-OLTP source.
The ETL concept simplifies the day to day data handling process. For example, there are different departments within an organization, say a retail chain, which handles data differently in different formats. Now, to sync all the departments, say sales, marketing, logistics, etc. and get the information related to a particular customer is quite tedious.
Datawarehouse simplifies the process of storing information from different sources in the form of ETL.
The ETL process has a typical road map as shown below.
- Extraction: Includes extraction of data from the source
- Transformation: Includes rule to set data in the form of standard schema
- Loading: Includes loading of data for use
- Processing: Includes data for browsing.
ETL Testing Concepts
ETL testing concepts ensure the accuracy of data that has been transformed from the source to the destination. Here, the data are verified in the intermediate steps between source and destination.
A typical process of ETL testing goes through multiple phases. The different phases of ETL testing are mentioned below.
- Business and Requirement Understanding
- Test Planning and Estimation
- Designing Test Cases and Preparing Test Data
- Test Execution and Bug Reporting
- Summary and Result analysis
- Test Closure
Different Types of ETL Testing
- Product Testing
Product validation testing or table balancing is done on data that are getting moved into the production. This kind of ETL testing ensures that the data are not compromised in the production systems and is in the correct order for the business decisions.
- Validation Testing
This kind of testing is done to ensure that the transformed data values are the expected one.
- Application Upgrades
This is an automatically generated ETL testing. This type of ETL testing checks whether the data in a new application is exactly same as the data extracted from an older application.
- Metadata Testing
Metadata testing includes checking of data type, data length, and index/constraint.
- Data Completeness Testing
Once all the data are loaded in the target, data completeness testing can verify that no data is missed. Data completeness testing includes compare and validates counts.
- Testing for Data Accuracy
This testing is done to ensure the accuracy of loaded and transformed data.
- Data Transformation Testing
Data transformation testing is done to compensate the shortcomings of SQL query, as it alone cannot verify the transformation rules.
- Data Quality Testing
Syntax and reference tests are included in data quality testing. This ensures that there is no error in date or order number in the business process. Syntax test will identify dirty data and Reference test will check the data based on the data model.
- Incremental ETL Testing
This testing checks the data integrity of old data and the newly added data and verifies that the addition of any new data is getting processed properly.
- Navigation Testing
This kind of testing is done to check the interface aspects of the front end reports.
Creating ETL Test Case
ETL testing involves ensuring that the data has been loaded from source to destination accurately as well as verification has been done in the intermediate stages. There are two documents that are required for performing ETL testing.
- ETL mapping sheets
- DB Schema of source and Target
ETL Performance Testing
ETL performance testing confirms that an ETL system can handle multiple users and transactions. This not only improves the performance of the session but also removes any performance bottlenecks. There are various areas, like source databases, target databases, sessions, system, etc. where the bottlenecks are found, which when removed can improve the performance. In fact, you can use performance testing tools like Informatica.
ETL Test Automation
A traditional approach to ETL testing involves SQL scripting and data ‘eyeballing’. However, this approach hardly covers the entire test and is often time-consuming and error-prone. In order to reduce the cost and improve defect detection, automation is required with the help of tools like Informatica.
Some of the best practices for ETL testing
- Ensure that the data has been transformed properly.
- Data should be loaded into the data warehouse without any data loss.
- Ensure that the ETL application reports invalid data.
- Confirm scalability and performance by ensuring that data is loaded within the given time frame.
- Irrespective of visibility, you must perform appropriate unit tests.
- Unit test must use valid coverage techniques.
- You must create unit tests that target exceptions.