What is Surrogate Key?
- Surrogate Keys are integers that are assigned sequentially in the dimension table which can be used as PK.
- Surrogate key is a unique identification key, it is like an artificial or alternative key to production key, because the production key may be alphanumeric or composite key but the surrogate key is always single numeric key.
- Assume the production key is an alphanumeric field if you create an index for this fields it will occupy more space, so it is not advisable to join/index, because generally all theDatawearhouse fact table are having historical data.
- These fact Table are linked with so many dimension table. if it's a numerical fields the performance is high
- Surrogate key is the primary key for the Dimensional table.
- It’s a substitution for the natural primary key.
- It is just a unique identifier or number for each row that can be used for the primary key to the table.
- The only requirement for a surrogate primary key is that it is unique for each row in the table.
- Data warehouses typically use a surrogate, (also known as artificial or identity key) key for the dimension tables primary keys. They can use Infa sequence generator, or Oracle sequence, or SQL Server Identity values for the surrogate key.
- It is useful because the natural primary key (i.e. Customer Number in Customer table) can change and this makes updates more difficult.
Example:
Let us
consider an scenarios where you have designed a very good Data Warehouse , its
catering all your reporting need and its in production. After two years
organization decides to reuse there business keys of products
e.g.
There
was product called Baby Powder after two years organization decides to stop
selling at its cost is high and sell is low and instead Baby Powder they launched
a Talcum Powder for Men and want to give same key to the product say 336.
Now while
designing a Data Warehouse you have used Business Keys as Primary Key in
dimension table with this new change you will have to update the dimension
table to replace Baby Powder with talcum Powder and organization does want to
remove all data of Baby Powder, what will you do now?
So to
avoid such situation its always better to use Surrogate Keys as Primary Key in
dimension table along with Business Key.
e.g.
SK_Prouduct
|
Product_ID
|
Product
Name
|
Cost
|
Acive
|
1
|
336
|
Baby
Powder
|
444
|
N
|
2
|
345
|
Cream
|
34
|
Y
|
3
|
336
|
talkem
Powder
|
44
|
Y
|
Now with
above table you can use same code for another Product, mark the product as
active and inactive and maintain all the product in Data Warehouse
Advantages of Surrogate Key:
- Surrogate Key allow to cater all your data need that arises from the Business or operational changes and recycling.
- Surrogate Keys allow the data warehouse to integrate data from all sources if they lack natural Business keys.
- Surrogate keys are very helpful for ETL transformations.
