What is Dimention

What is dimension?

A dimension is something that qualifies a quantity (measure).
For an example, consider this: If I just say… “20kg”, it does not mean anything. But if I say, "20kg of Rice (Product) is sold to Ramesh (customer) on 5th April (date)", then that gives a meaningful sense. These product, customer and dates are some dimension that qualified the measure - 20kg.

Dimensions are mutually independent. Technically speaking, a dimension is a data element that categorizes each item in a data set into non-overlapping regions. 

What are different types of dimentions?

  • Slowly changing Dimention
  • Confirmed Dimention
  • Junk Dimention
  • Degenerated Dimention
Slowly Changing Dimensions (SCD) - dimensions that change slowly over time, rather than changing on regular schedule, time-base. In Data Warehouse there is a need to track changes in dimension attributes in order to report historical data. In other words, implementing one of the SCD types should enable users assigning proper dimension's attribute value for given date. Example of such dimensions could be: customer, geography, employee.

There are many approaches how to deal with SCD. The most popular are:


  • Type 1 - Overwriting the old value
  • Type 2 - Creating a new additional record
  • Type 3 - Adding a new column

  • Type 1 - Overwriting the old value. In this method no history of dimension changes is kept in the database. The old dimension value is simply overwritten be the new one. This type is easy to maintain and is often use for data which changes are caused by processing corrections(e.g. removal special characters, correcting spelling errors).

    Before the change:
    Customer_ID Customer_Name Customer_Type
    1 Cust_1 Corporate


    After the change:
    Customer_ID Customer_Name Customer_Type
    1 Cust_1 Retail

    Type 2 - Creating a new additional record. In this methodology all history of dimension changes is kept in the database. You capture attribute change by adding a new row with a new surrogate key to the dimension table. Both the prior and new rows contain as attributes the natural key(or other durable identifier). Also 'effective date' and 'current indicator' columns are used in this method. There could be only one record with current indicator set to 'Y'. For 'effective date' columns, i.e. start_date and end_date, the end_date for current record usually is set to value 9999-12-31. Introducing changes to the dimensional model in type 2 could be very expensive database operation so it is not recommended to use it in dimensions where a new attribute could be added in the future.

    Before the change:
    Customer_ID Customer_Name Customer_Type Start_Date End_Date Current_Flag
    1 Cust_1 Corporate 22-07-2010 31-12-9999 Y


    After the change:
    Customer_ID Customer_Name Customer_Type Start_Date End_Date Current_Flag
    1 Cust_1 Corporate 22-07-2010 17-05-2012 N
    2 Cust_1 Retail 18-05-2012 31-12-9999 Y

    Type 3 - Adding a new column. In this type usually only the current and previous value of dimension is kept in the database. The new value is loaded into 'current/new' column and the old one into 'old/previous' column. Generally speaking the history is limited to the number of column created for storing historical data. This is the least commonly needed techinque.

    Before the change:
    Customer_ID Customer_Name Current_Type Previous_Type
    1 Cust_1 Corporate Corporate


    After the change:
    Customer_ID Customer_Name Current_Type Previous_Type
    1 Cust_1 Retail Corporate

    Confirmed Dimention:
    A dimension that is used in multiple locations is called a conformed dimension. A conformed dimension may be used with multiple fact tables in a single database, or across multiple data marts or data warehouses.

    Eg: The date dimension table connected to the sales facts is identical to the date dimension connected to the inventory facts.

    Junk Dimention
    A junk dimension is a single table with a combination of different and unrelated attributes to avoid having a large number of foreign keys in the fact table. Junk dimensions are often created to manage the foreign keys created by rapidly changing dimensions.

    Eg: Assume that we have a gender dimension and marital status dimension. In the fact table we need to maintain two keys referring to these dimensions. Instead of that create a junk dimension which has all the combinations of gender and marital status (cross join gender and marital status table and create a junk table). Now we can maintain only one key in the fact table.

    Degenerated Dimention:
    A degenerate dimension is when the dimension attribute is stored as part of fact table, and not in a separate dimension table. These are essentially dimension keys for which there are no other attributes. In a data warehouse, these are often used as the result of a drill through query to analyze the source of an aggregated number in a report. You can use these values to trace back to transactions in the OLTP system.

    Facebook Commentbox

    Eg: A transactional code in a fact table.

    Datawearhouse concepts     Data Warehouse Architectures   Datawearhouse Architecture Overview
    Data Mart   Datawearhousing Schemas  Dimention Data Modeling  Dimention  Fact  OLAP vs OLTP

    No comments:

    Post a Comment

    subscribe
    Subscribe Us
    emailSubscribe to our mailing list to get the updates to your email inbox... We can't wait more to have your email in our subscribers email list. Just put your nice email in below box: