Pages

30 January 2017

DWH - SLOWLY CHANGING DIMENSION (SCD)

When dimension changes over a period of time, it is called as Slowly Changing Dimension and it is very popularly known as SCD. It is used for tracking history of dimension attributes.There are mainly six types of SCD which are TYPE 0, TYPE 1, TYPE 2, TYPE 3, TYPE 4 and TYPE 6.

   EmployeeDeatil table mentioned below is used as an illustration, contains columns Emp_Key, EmployeeID, Name and Designation. Emp_key is a Surrogate Key.

TYPE 0 (Retains original value): This is a passive method. Value remains exactly the same as that of when inserted for the first time. This is not often used.

TYPE 1 (Overwriting old value): In this type, old value is replaced by a new value in the dimension table. Old data is lost in this process. 
           Let us assume Employee has been promoted from Test Engineer to Test Lead. So, Designation column has been replaced with new value as shown below. Thus we are not maintaining any history of changes that has taken place.

Old Value


Emp_Key
EmpID
Name
Designation
111
1001
Sunil
Test Engineer

New Value

Emp_Key
EmpID
Name
Designation
111
1001
Sunil
Test Lead


TYPE 2 (Adding  new row): In this type, a complete
history is maintained by adding extra row to the dimension table. Whenever a new row is added, a new surrogate key (Emp_key) is assigned. Type 2 can be implemented by following two ways.

    (a.) By adding Effective Date Columns i.e. Effective Start Date and Effective End Date .

          Let us assume Employee has been promoted from Test Engineer to Test Lead.
            Here all old records are marked with start date in 'Effective Start Date ' Column and end date in 'Effective End Date' Column. The most recent or active record will have 'Effective End Date ' as NULL or very high date i.e 31/12/9999

Old Value


Emp_Key
EmpID
Name
Designation
111
1001
Sunil
Test Engineer

New Value


Emp_Key
EmpID
Name
Year
Designation
Effective Start Date
Effective End Date
111
101
Sunil
2012
Test Engineer
1/1/2012
31/12/2015
112
101
Sunil
2016
Test Lead
1/1/2016
31/12/9999

  (b.) By adding Flag Column i.e. ActiveFlag

Let us assume Employee has been promoted from Test Engineer to Test Lead.
            Here all old records in the dimension table is marked with Flag =0 in 'ActiveFlag' column while one recent or active record is marked with Flag=1 in  'ActiveFlag' column as shown below.

Old Value


Emp_Key
EmpID
Name
Designation
111
1001
Sunil
Test Engineer

New Value


Emp_Key
EmpID
Name
Designation
ActiveFlag
111
101
Sunil
Test Engineer
0
112
101
Sunil
Test Lead
1

TYPE 3 (Adding  new columns): In this type history is maintained by adding attributes/columns to the dimension table. Here only current and previous values are retained as history. New value is provided in 'New' column while old value is stored in 'old' column.
             Let us assume Employee has been promoted from Test Engineer to Test Lead.
              You can see 'New Designation' column is provided with new values while and 'Old Designation' column is provided with old values.

Old Value


Emp_Key
EmpID
Name
Designation
111
1001
Sunil
Test Engineer

New Value



Emp_Key
EmpID
Name
Old Designation
New Designation
111
1001
Sunil
Test Engineer
Test Lead

TYPE 4 (Adding history Table): In this type, a new history table is added to keep track of all the changes of attributes in the dimension table. The main dimension table contains latest or recent records. 
            Let us assume Employee has been promoted from Test Engineer to Test Lead.

EmployeeDetail (Original Table)


Emp_Key
EmpID
Name
Designation
112
1001
Sunil
Test Lead

EmployeeDetail_History (History Table)


Emp_Key
EmpID
Name
Designation
Effective Start Date
Effective End Date
111
101
Sunil
Test Engineer
1/1/2012
31/12/2015
112
101
Sunil
Test Lead
1/1/2016
31/12/9999

TYPE 6 (Hybrid): It combines the approach or method used in TYPE 1, TYPE 2 and  TYPE 3 (1+2+3=6).

Example of EmployeeDetail Table with one record


Emp_Key
EmpID
Name
Current Value
Historical Value
Effective Start Date
Effective End Date
ActiveFlag
111
1001
Sunil
Test Engineer
Test Engineer
1/1/2012
31/12/9999
Y

Let us assume Employee gets promotion from Test Engineer to Test Lead , we add a new record, as  shown below.

Emp_Key
EmpID
Name
Current Value
Historical Value
Effective Start Date
Effective End Date
Current_Flag
111
1001
Sunil
Test Lead
Test Engineer
1/1/2012
31/12/2015
N
112
1001
Sunil
Test Lead
Test Lead
1/1/2016
31/12/9999
Y

Let us assume again Employee gets promotion from Test  Test Lead  to Manager, we add a new record, as shown below.

Emp_Key
EmpID
Name
Current Value
Historical Value
Effective Start Date
Effective End Date
Current_Flag
111
1001
Sunil
Manager
Test Engineer
1/1/2012
31/12/2015
N
112
1001
Sunil
Manager
Test Lead
1/1/2016
31/12/2016
N
113
1001
Sunil
Manager
Manager
1/1/2017
31/12/9999
Y

We see that Current_Flag value is overwritten in with the new information, as in Type 1 processing. New records are created  to track the changes, as in Type 2 processing. And we store the history in Historical Value column , which incorporates Type 3 processing.

1 comment: