Thursday 19 February 2015

Surrogate key in Datawearhouse with example

Surrogate key in Datawearhouse with example

What is Surrogate Key?


  • Surrogate Keys are integers that are assigned sequentially in the dimension table which can be used as PK.
  • Surrogate key is a unique identification key, it is like an artificial or alternative key to production key, because the production key may be alphanumeric or composite key but the surrogate key is always single numeric key. 
  • Assume the production key is an alphanumeric field if you create an index for this fields it will occupy more space, so it is not advisable to join/index, because generally all theDatawearhouse fact table are having historical data. 
  • These fact Table are linked with so many dimension table. if it's a numerical fields the performance is high
  • Surrogate key is the primary key for the Dimensional table.
  • It’s a substitution for the natural primary key.
  • It is just a unique identifier or number for each row that can be used for the primary key to the table. 
  • The only requirement for a surrogate primary key is that it is unique for each row in the table.
  • Data warehouses typically use a surrogate, (also known as artificial or identity key)  key for the dimension tables primary keys. They can use Infa sequence generator, or Oracle sequence, or SQL Server Identity values for the surrogate key.
  • It is useful because the natural primary key (i.e. Customer Number in Customer table) can change and this makes updates more difficult.

Example:

          Let us consider an scenarios where you have designed a very good Data Warehouse , its catering all your reporting need and its in production. After two years organization decides to reuse there business keys of products

e.g.
         There was product called Baby Powder after two years organization decides to stop selling at its cost is high and sell is low and instead Baby Powder they launched a Talcum Powder for Men and want to give same key to the product say 336.
         Now while designing a Data Warehouse you have used Business Keys as Primary Key in dimension table with this new change you will have to update the dimension table to replace Baby Powder with talcum Powder and organization does want to remove all data of Baby Powder, what will you do now?
        So to avoid such situation its always better to use Surrogate Keys as Primary Key in dimension table along with Business Key.

e.g.
SK_Prouduct
Product_ID
Product Name
Cost
Acive
1
336
Baby Powder
444
N
2
345
Cream
34
Y
3
336
talkem Powder
44
Y


Now with above table you can use same code for another Product, mark the product as active and inactive and maintain all the product in Data Warehouse

Advantages of Surrogate Key:
  • Surrogate Key allow to cater all your data need that arises from the Business or operational changes and recycling.
  • Surrogate Keys allow the data warehouse to integrate data from all sources if they lack natural Business keys.
  • Surrogate keys are very helpful for ETL transformations.

Facebook Commentbox



7 comments:

subscribe
Subscribe Us
emailSubscribe to our mailing list to get the updates to your email inbox... We can't wait more to have your email in our subscribers email list. Just put your nice email in below box: