Pages

4 September 2016

DWH - Data Validation Testing Using Minus Query Method

Data Validation in Data Warehouse has always been challenging. One of the most frequently way of data validation is by using MINUS/ EXCEPT operator. MINUS operator is used in ORACLE database while EXCEPT is used in SQL SERVER database. Before proceeding to testing part, first let us understand the functionality of MINUS operator.

      MINUS operator gives the differences between the two table. Suppose we have two tables. Lets consider table A and B. When we will do MINUS operation between table A and table B (ie. A - B), it will return all those records which are present in table A but not in B. Similarly when we do  MINUS operation between table B and table A (ie. B - A), it will return all those records which are present in table B but not in A.

       One point to remember while doing MINUS operation between first table and second table is that every row in first table is searched in second table and if not found , that row is returned as a result set. Similarly, while doing MINUS operation between second table and first table is that every row in second table is searched in first table and if not found , that row is returned as a result set. 

        If all the rows  present in first table is present in second table as well then output or result of first table MINUS second table will be empty result set or in other words nothing will be returned.  Similarly, if all the rows present in second table is also present first table, then output or result of second table MINUS first table will be empty result set or in other words nothing will be returned.
Let us understand the usage of MINUS operator in data validation. Below SourceTable and TargetTable is used for illustration. 


     SourceTable                                      TargetTable
ID
Name
ID
Name
1
ABC
1
ABC
2
DEF
2
DEF
3
GHI
3
GHI
4
JKL
4
JKL
5
LMN
5
LMN
6
OPQ
8
OPQ


SourceTable Minus TargetTable:

 DWH - Data Validation Testing Using Minus Query Method

SELECT ID, NAME
FROM SourceTable
MINUS
SELECT ID, NAME
FROM TargetTable;
 
ID
Name
6
OPQ

In the above query, we are applying MINUS operator between SourceTable and Target Table. 6th row data in SourceTable is not present in TargetTable so this record will be returned from SourceTable.




TargetTable Minus SourceTable :


SELECT ID, NAME
FROM TargetTable
MINUS
SELECT ID, NAME
FROM SourceTable;

ID
Name
8
OPQ

In the above query, we are applying MINUS operator between TargetTable and SourceTable. 6th row data in TargetTable is not present in SourceTable so this record will be returned fromTargetTable.



Now let us consider  below SourceTable and TargetTable  for illustration. Here both SourceTable and Target Table is same.


    Source Table                           TargetTable
ID
Name
ID
Name
1
ABC
1
ABC
2
DEF
2
DEF
3
GHI
3
GHI
4
JKL
4
JKL
5
LMN
5
LMN
6
OPQ
6
OPQ
SourceTable Minus TargetTable:

SELECT ID, NAME
FROM SourceTable
MINUS
SELECT ID, NAME
FROM TargetTable;

ID
Name

In the above query, we are applying MINUS operator between SourceTable and Target Table.Each row data in SourceTable is present in TargetTable so empty result set is returned. 





TargetTable Minus SourceTable :


SELECT ID, NAME
FROM TargetTable
MINUS
SELECT ID, NAME
FROM SourceTable;


ID
Name

In the above query, we are applying MINUS operator between TargetTable and SourceTable. Each row data in TargetTable is present in SourceTable so empty result set is returned. 


CONCLUSION: For Data validation to PASS,
(a.) Source Table MINUS Target Table should always return empty result set.
(b.) Target Table  MINUS Target Table should always return empty result set.




If you like this post, don't forget to share and post comment.


3 comments: