methods of dataloading in datawearhouse

Data-loading in datawearhouse:

After the data has been cleansed and transformed into a structure consistent with the data warehouse requirements, data is ready for loading into the data warehouse. You may make some final transformation during the loading operation, although you should complete any transformations that could identify inconsistencies before the final loading operation.
In almost all data warehousing scenario, we extract data from one or more source systems and keep storing them in the data warehouse for future analysis. The source systems are generally OLTP systems which store everyday transactional data. Now when it comes to loading these transactional data to data warehouse, we have 2 ways to accomplish this,
  1. Incremental Data loading
  2. Full load method     


    1. Incremental Data loading:      
                                          
    Image result for incremental data load
    It generally means only loading into the warehouse the records that have changed (inserts, updates etc.) since the last load; as opposed to doing a full load of all the data (all records, including those that haven't changed since the last load) into the warehouse.
     Advantage :
    it reduces the amount of data being transferred from system to system, as a full load may take hours / days to complete depending on volume of data.
    Disadvantage: 
    Around maintainability. With a full load, if there's an error you can re-run the entire load without having to do much else in the way of cleanup / preparation. With an incremental load, the files generally need to be loaded in order. So if you have a problem with one batch, others queue up behind it till you correct it. Alternately you may find an error in a batch from a few days ago, and need to re-load that batch once corrected, followed by every subsequent batch in order to ensure that the data in the warehouse is consistent.

    2.Full Load method:
    Also known as Destructive Load, is a process of completely destroying/deleting the existing data and reloading it from scratch. A lot of unchanged data is also deleted and reloaded in this process. But a destructive load ensures the highest data integrity easily.
    “Delete destination data. Read data from source. Load into destination.”
    In case we are to opt for full load method for loading, we will read the 2 source tables (Customer and Sales) everyday in full. So,
    On 22 Mar 2012: We will read 2 records from Customer and 3 records from Sales and load all of them in the target.
    On 23 Mar 2012: We will read 3 records from customer (including the 2 older records) and 5 records from sales (including 3 old records) and will load or update them in the target data warehouse.
    As you can clearly guess, this method of loading unnecessarily read old records that we need not read as we have already processed them before. Hence we need to implement a smarter way of loading.

    Facebook Commentbox



    No comments:

    Post a Comment

    subscribe
    Subscribe Us
    emailSubscribe to our mailing list to get the updates to your email inbox... We can't wait more to have your email in our subscribers email list. Just put your nice email in below box: