Thursday 12 March 2015

What is ETL (Extract, Transform, Load)


ETL refers to the methods used to access and manipulate source data and load it into a target database. In a typical ETL process, data is extracted from an OLTP database, transformed to match the data warehouse schema, and loaded into the data warehouse database.


ETL Architectures
ETL Process
Extraction:

  • Data Capturing: The ETL extraction element is responsible for extracting data from the source system. During extraction, data may either be removed from the source system or copied, with the original data retained in the source system.



  • Data Transmission: Removing data from legacy systems often takes too much effort to implement, so legacy data is usually copied into the data warehouse, leaving the original data in place. Extracted data is loaded into the data warehouse staging area (a relational database, usually separate from the data warehouse database) for manipulation by the remaining ETL processes.
  • Cleansing Process: 
  1. Data extraction is generally performed within the source system itself.
  2. Data extraction processes can be implemented using Transact-SQL stored procedures, Data Transformation Services (DTS) tasks, or custom applications developed in programming or scripting languages. 
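
The extraction step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Transact-SQL or DTS approach mentioned in the list: it uses Python's built-in sqlite3 module as a stand-in for both the source system and the staging area, and the table names `orders` and `stg_orders`, the column names, and the function `extract_to_staging` are all hypothetical.

```python
import sqlite3


def extract_to_staging(source, staging, move=False):
    """Copy all rows from the source `orders` table into `stg_orders`.

    With move=True the extracted rows are also deleted from the source,
    which corresponds to the "removed from the source system" variant.
    Returns the number of rows extracted.
    """
    rows = source.execute("SELECT id, customer, status FROM orders").fetchall()
    with staging:  # commit on success, roll back on error
        staging.executemany(
            "INSERT INTO stg_orders (id, customer, status) VALUES (?, ?, ?)",
            rows,
        )
    if move:
        with source:
            # Delete only the rows that were actually extracted.
            source.executemany(
                "DELETE FROM orders WHERE id = ?", [(row[0],) for row in rows]
            )
    return len(rows)


# Hypothetical source (OLTP) database with a few sample rows.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Asha", "on"), (2, "Ravi", "0"), (3, "Mei", "off")],
)
source.commit()

# Hypothetical staging database, kept separate from the warehouse.
staging = sqlite3.connect(":memory:")
staging.execute("CREATE TABLE stg_orders (id INTEGER, customer TEXT, status TEXT)")

copied = extract_to_staging(source, staging)  # copy, leaving the source intact
print(copied)
```

Copying is the default here because, as noted above, leaving the original data in place is usually the less risky choice for legacy sources.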



Transformation:
The ETL transformation element is responsible for data validation, data accuracy, data type conversion, and business rule application. An ETL system that uses inline transformations during extraction is less robust and flexible than one that confines transformations to the transformation element. Transformations performed in the OLTP system also impose a performance burden on the OLTP database.


  • Data Validation: Check that all rows in the fact table match rows in dimension tables to enforce data integrity.

  • Data Accuracy: Ensure that fields contain appropriate values, such as only "off" or "on" in a status field.

  • Data Type Conversion: Ensure that all values for a specified field are stored the same way in the data warehouse regardless of how they were stored in the source system. For example, if one source system stores "off" or "on" in its status field and another source system stores "0" or "1" in its status field, then a data type conversion transformation converts the content of one or both of the fields to a specified common value such as "off" or "on".

  • Business Rule Application: Ensure that the rules of the business are enforced on the data stored in the warehouse. For example, check that all customer records contain values for both FirstName and LastName fields.
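
The four transformation responsibilities listed above can be illustrated with a small sketch. It is a simplified example, not a production design: the row layout (`customer_id`, `FirstName`, `LastName`, `status`), the `STATUS_MAP` table, and the `transform` function are assumptions made for illustration, and the "dimension table" is reduced to a plain set of known customer IDs.

```python
# Hypothetical mapping that unifies the two source encodings of "status".
STATUS_MAP = {"on": "on", "off": "off", "1": "on", "0": "off"}


def transform(rows, known_customer_ids):
    """Split staged rows into (clean, rejected) lists.

    Each rejected entry is a (row, reason) pair so failures can be reviewed.
    """
    clean, rejected = [], []
    for row in rows:
        # Data type conversion: map "0"/"1" and "off"/"on" to one common form.
        status = STATUS_MAP.get(str(row["status"]).strip().lower())
        # Data accuracy: any value outside the known set is rejected.
        if status is None:
            rejected.append((row, "invalid status"))
            continue
        # Data validation: the row must reference a known dimension row.
        if row["customer_id"] not in known_customer_ids:
            rejected.append((row, "unknown customer"))
            continue
        # Business rule: both name fields must contain a value.
        if not row.get("FirstName") or not row.get("LastName"):
            rejected.append((row, "missing name"))
            continue
        clean.append({**row, "status": status})
    return clean, rejected


staged = [
    {"customer_id": 1, "FirstName": "Asha", "LastName": "Rao", "status": "1"},
    {"customer_id": 2, "FirstName": "Ravi", "LastName": "", "status": "off"},
    {"customer_id": 9, "FirstName": "Mei", "LastName": "Lin", "status": "on"},
    {"customer_id": 1, "FirstName": "Asha", "LastName": "Rao", "status": "maybe"},
]
clean, rejected = transform(staged, known_customer_ids={1, 2})
print(len(clean), [reason for _, reason in rejected])
```

Rejected rows are collected with a reason rather than silently dropped, so they can be corrected in the staging area and reprocessed.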


Loading:

  • The ETL loading element is responsible for loading transformed data into the data warehouse database.
  • Data warehouses are usually updated periodically rather than continuously, and large numbers of records are often loaded to multiple tables in a single data load.
  • The data warehouse is often taken offline during update operations so that data can be loaded faster and SQL Server 2000 Analysis Services can update OLAP cubes to incorporate the new data. In SQL Server, BULK INSERT, bcp, and the Bulk Copy API are the tools best suited to bulk data loading.
  • The design of the loading element should focus on efficiency and performance to minimize the data warehouse offline time.
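
The idea of loading many records in a single data load can be sketched as follows. This is not BULK INSERT or bcp, which are SQL Server tools; it is a stand-in using Python's built-in sqlite3 module, where the whole batch is inserted in one transaction. The table `fact_orders`, its columns, and the `load` function are hypothetical.

```python
import sqlite3


def load(warehouse, rows):
    """Insert all transformed rows in one transaction; return the table's row count."""
    with warehouse:  # one commit for the whole batch, rollback on any error
        warehouse.executemany(
            "INSERT INTO fact_orders (id, customer_id, status) VALUES (?, ?, ?)",
            rows,
        )
    return warehouse.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]


# Hypothetical warehouse database with a single fact table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)

total = load(warehouse, [(1, 10, "on"), (2, 11, "off"), (3, 10, "on")])
print(total)
```

Committing once per batch rather than once per row keeps the load short, and if any row fails, the whole batch is rolled back so the warehouse is not left half-updated.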

2 comments:

  1. Good job, friend.
    Please try to provide real scenarios for these concepts as well.
    Best of luck!

    1. Thank you. If you have any information regarding DWH or BI that you would like to share, please send it to businessintelligancedevloper@gmail.com and we will publish it on the blog.
      Suggestions are welcome.


      Thanks

