What is dimension?
A dimension is something that qualifies a
quantity (measure). For an example, consider this: If I just say… “20kg”, it does not mean anything. But if I say, "20kg of Rice (Product) is sold to Ramesh (customer) on 5th April (date)", then that gives a meaningful sense. These product, customer and dates are some dimension that qualified the measure - 20kg.
Dimensions are mutually independent. Technically speaking, a dimension is a data element that categorizes each item in a data set into non-overlapping regions.
What are different types of dimentions?
- Slowly changing Dimention
- Confirmed Dimention
- Junk Dimention
- Degenerated Dimention
Slowly Changing Dimensions (SCD) - dimensions that change slowly
over time, rather than changing on regular schedule, time-base. In Data
Warehouse there is a need to track changes in dimension attributes in
order to report historical data. In other words, implementing one of the
SCD types should enable users assigning proper dimension's attribute
value for given date. Example of such dimensions could be: customer,
geography, employee.
There are many approaches how to deal with SCD. The most popular are:
Type 1 - Overwriting the old value. In this method no history of
dimension changes is kept in the database. The old dimension value is
simply overwritten be the new one. This type is easy to maintain and is
often use for data which changes are caused by processing
corrections(e.g. removal special characters, correcting spelling
errors).
Before the change:
After the change:
Before the change:
Customer_ID | Customer_Name | Customer_Type |
---|---|---|
1 | Cust_1 | Corporate |
After the change:
Customer_ID | Customer_Name | Customer_Type |
---|---|---|
1 | Cust_1 | Retail |
Type 2 - Creating a new additional record. In this methodology
all history of dimension changes is kept in the database. You capture
attribute change by adding a new row with a new surrogate key to the
dimension table. Both the prior and new rows contain as attributes the
natural key(or other durable identifier). Also 'effective date' and
'current indicator' columns are used in this method. There could be only
one record with current indicator set to 'Y'. For 'effective date'
columns, i.e. start_date and end_date, the end_date for current record
usually is set to value 9999-12-31. Introducing changes to the
dimensional model in type 2 could be very expensive database operation
so it is not recommended to use it in dimensions where a new attribute
could be added in the future.
Before the change:
After the change:
Before the change:
Customer_ID | Customer_Name | Customer_Type | Start_Date | End_Date | Current_Flag |
---|---|---|---|---|---|
1 | Cust_1 | Corporate | 22-07-2010 | 31-12-9999 | Y |
After the change:
Customer_ID | Customer_Name | Customer_Type | Start_Date | End_Date | Current_Flag |
---|---|---|---|---|---|
1 | Cust_1 | Corporate | 22-07-2010 | 17-05-2012 | N |
2 | Cust_1 | Retail | 18-05-2012 | 31-12-9999 | Y |
Type 3 - Adding a new column. In this type usually only the
current and previous value of dimension is kept in the database. The new
value is loaded into 'current/new' column and the old one into
'old/previous' column. Generally speaking the history is limited to the
number of column created for storing historical data. This is the least
commonly needed techinque.
Before the change:
After the change:
Before the change:
Customer_ID | Customer_Name | Current_Type | Previous_Type |
---|---|---|---|
1 | Cust_1 | Corporate | Corporate |
After the change:
Customer_ID | Customer_Name | Current_Type | Previous_Type |
---|---|---|---|
1 | Cust_1 | Retail | Corporate |
Confirmed Dimention:
A
dimension that is used in multiple locations is called a conformed dimension. A
conformed dimension may be used with multiple fact tables in a single database,
or across multiple data marts or data warehouses.
Eg: The date dimension table connected to the sales facts is identical to the date dimension connected to the inventory facts.
Junk Dimention
A
junk dimension is a single table with a combination of different and unrelated
attributes to avoid having a large number of foreign keys in the fact table.
Junk dimensions are often created to manage the foreign keys created by rapidly
changing dimensions.
Eg: Assume that we have a gender dimension and marital status dimension.
In the fact table we need to maintain two keys referring to these
dimensions. Instead of that create a junk dimension which has all the
combinations of gender and marital status (cross join gender and marital
status table and create a junk table). Now we can maintain only one key
in the fact table.
Degenerated Dimention:
A
degenerate dimension is when the dimension attribute is stored as part of fact
table, and not in a separate dimension table. These are essentially dimension
keys for which there are no other attributes. In a data warehouse, these are
often used as the result of a drill through query to analyze the source of an
aggregated number in a report. You can use these values to trace back to
transactions in the OLTP system.
Facebook Commentbox
Eg: A transactional code in a fact table.
Data Mart Datawearhousing Schemas Dimention Data Modeling Dimention Fact OLAP vs OLTP
No comments:
Post a Comment