datawarehouse concepts-BusinessIntelligence

Thursday, 19 February 2015

Surrogate Key in a Data Warehouse, with Example

What is Surrogate Key?

A surrogate key is an integer assigned sequentially to each row of a dimension table, and it can be used as that table's primary key (PK).
It is a unique, artificial identifier that stands in for the production (source-system) key. The production key may be alphanumeric or composite, whereas the surrogate key is always a single numeric column.
Suppose the production key is an alphanumeric field. An index on that field takes more space and joins on it are slower, which matters because data warehouse fact tables generally hold large volumes of historical data.
Those fact tables are linked to many dimension tables, and joins on a compact numeric key generally perform better.
The surrogate key is the primary key of the dimension table.
It is a substitute for the natural primary key.
It is simply a unique number for each row that can serve as the table's primary key.
The only requirement for a surrogate primary key is that it is unique for each row in the table.
Data warehouses typically use a surrogate key (also known as an artificial or identity key) as the primary key of each dimension table. The values can come from an Informatica Sequence Generator transformation, an Oracle sequence, or a SQL Server IDENTITY column.
It is useful because the natural primary key (e.g. Customer Number in a Customer table) can change, which makes updates more difficult.

Example:

Consider a scenario where you have designed a very good data warehouse. It caters to all your reporting needs and is in production. After two years, the organization decides to reuse the business keys of its products.

e.g.

There was a product called Baby Powder. After two years, the organization decides to stop selling it because its cost is high and its sales are low. In its place they launch Talcum Powder for Men and want to give it the same product key, say 336.

If you used the business key as the primary key of the dimension table when designing the data warehouse, this change forces you to update the dimension table and replace Baby Powder with Talcum Powder. But the organization does not want to lose the Baby Powder data. What will you do now?

To avoid this situation, it is always better to use a surrogate key as the primary key of the dimension table, alongside the business key.

e.g.

SK_Product	Product_ID	Product Name	Cost	Active
1	336	Baby Powder	444	N
2	345	Cream	34	Y
3	336	Talcum Powder	44	Y

With the table above, you can reuse the same product code for another product, mark products as active or inactive, and keep every product in the data warehouse.
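The dimension table above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table name dim_product, the lower-case column names and the helper current_product are made up for the example, and SQLite's AUTOINCREMENT stands in for an Oracle sequence or SQL Server IDENTITY column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_product (
        sk_product   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        product_id   INTEGER NOT NULL,                   -- business key, may be reused
        product_name TEXT NOT NULL,
        cost         INTEGER,
        active       TEXT CHECK (active IN ('Y', 'N'))
    )
""")

# The surrogate key is never supplied by the load; the database assigns it.
rows = [
    (336, "Baby Powder", 444, "N"),
    (345, "Cream", 34, "Y"),
    (336, "Talcum Powder", 44, "Y"),  # business key 336 reused
]
conn.executemany(
    "INSERT INTO dim_product (product_id, product_name, cost, active) "
    "VALUES (?, ?, ?, ?)",
    rows,
)


def current_product(product_id):
    """Return (surrogate key, name) of the active row for a business key."""
    return conn.execute(
        "SELECT sk_product, product_name FROM dim_product "
        "WHERE product_id = ? AND active = 'Y'",
        (product_id,),
    ).fetchone()


# Both products that ever used business key 336 are still in the warehouse.
history = conn.execute(
    "SELECT sk_product, product_name, active FROM dim_product "
    "WHERE product_id = 336 ORDER BY sk_product"
).fetchall()

print(current_product(336))
print(history)
```

Fact rows would reference sk_product rather than product_id, so old sales stay attached to Baby Powder even after the code 336 is handed to the new product.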

Advantages of Surrogate Key:

Surrogate keys let the warehouse absorb business or operational changes, including recycled business keys.

Surrogate keys allow the data warehouse to integrate data from all sources, even if they lack natural business keys.

Surrogate keys are very helpful for ETL transformations.


Sunday, 15 February 2015

Operational Systems in a Data Warehouse

Operational Systems

Operational systems are the ones supporting the day-to-day activities of the enterprise. They are focused on processing transactions, ranging from order entry to billing to human resources transactions. In a typical organization, the operational systems use a wide variety of technologies and architectures, and they may include some vendor-packaged systems in addition to in-house custom-developed software. Operational systems are static by nature; they change only in response to an intentional change in business policies or processes, or for technical reasons, such as system maintenance or performance tuning.

Operational databases are normally "relational" - not "dimensional". They are designed for operational, data entry purposes and are not well suited for online queries and analytics.

These operational systems are the source of most of the electronically maintained data within the Corporate Information Factory (CIF). Because these systems support time-sensitive, real-time transaction processing, they have usually been optimized for performance and transaction throughput. Data in the operational systems environment may be duplicated across several systems, and is often not synchronized. These operational systems represent the first application of business rules to an organization’s data, and the quality of data in the operational systems has a direct impact on the quality of all other information used in the organization.

Sometimes operational systems are referred to as operational databases, transaction processing systems, or online transaction processing systems (OLTP). However, the use of the last two terms as synonyms may be confusing, because operational systems can be batch processing systems as well.

Any Enterprise must necessarily maintain a lot of data about its operation. This is its "Operational Data".

Operational systems vs. Data warehousing

The fundamental difference between operational systems and data warehousing systems is that operational systems are designed to support transaction processing whereas data warehousing systems are designed to support online analytical processing (or OLAP, for short).

Based on this fundamental difference, data usage patterns associated with operational systems are significantly different from those associated with data warehousing systems. As a result, data warehousing systems are designed and optimized using methodologies that differ drastically from those used for operational systems.

The table below summarizes many of the differences between operational systems and data warehousing systems.

Difference between operational systems and data warehousing systems

Operational systems	Data warehousing systems
Operational systems are generally designed to support high-volume transaction processing with minimal back-end reporting.	Data warehousing systems are generally designed to support high-volume analytical processing (i.e. OLAP) and subsequent, often elaborate report generation.
Operational systems are generally process-oriented or process-driven, meaning that they are focused on specific business processes or tasks. Example tasks include billing, registration, etc.	Data warehousing systems are generally subject-oriented, organized around business areas that the organization needs information about. Such subject areas are usually populated with data from one or more operational systems. As an example, revenue may be a subject area of a data warehouse that incorporates data from operational systems that contain student tuition data, alumni gift data, financial aid data, etc.
Operational systems are generally concerned with current data.	Data warehousing systems are generally concerned with historical data.
Data within operational systems are generally updated regularly according to need.	Data within a data warehouse is generally non-volatile, meaning that new data may be added regularly, but once loaded, the data is rarely changed, thus preserving an ever-growing history of information. In short, data within a data warehouse is generally read-only.
Operational systems are generally optimized to perform fast inserts and updates of relatively small volumes of data.	Data warehousing systems are generally optimized to perform fast retrievals of relatively large volumes of data.
Operational systems are generally application-specific, resulting in a multitude of partially or non-integrated systems and redundant data (e.g. billing data is not integrated with payroll data).	Data warehousing systems are generally integrated at a layer above the application layer, avoiding data redundancy problems.
Operational systems generally require a non-trivial level of computing skills amongst the end-user community.	Data warehousing systems generally appeal to an end-user community with a wide range of computing skills, from novice to expert users.


Friday, 30 January 2015

SQL interview Questions and answers for Freshers

DATABASE interview questions for freshers

SQL Interview Questions for Freshers

1. What is the difference between a "where" clause and a "having" clause?

- WHERE restricts rows: it is applied to individual rows before any grouping, so only rows that satisfy it are retrieved and aggregated. HAVING is a filter applied afterwards, to the groups produced by GROUP BY, and it can test aggregate values.
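The difference is easiest to see by running both clauses on the same data. Here is a small sketch using Python's sqlite3 module; the orders table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 50), ("ann", 200), ("bob", 30), ("bob", 40), ("cat", 500)],
)

# WHERE drops individual rows before grouping:
# only orders of at least 40 take part in the sums.
where_result = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "WHERE amount >= 40 GROUP BY customer ORDER BY customer"
).fetchall()

# HAVING drops whole groups after aggregation:
# only customers whose total exceeds 100 are returned.
having_result = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer HAVING SUM(amount) > 100 ORDER BY customer"
).fetchall()

print(where_result)
print(having_result)
```

With WHERE, bob still appears (with only his order of 40 counted); with HAVING, bob disappears entirely because his total of 70 fails the group-level test.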

2. What is the basic form of a SQL statement to read data out of a table?

- The basic form to read data out of a table is 'SELECT * FROM table_name;'. An answer such as 'SELECT * FROM table_name WHERE xyz = 'whatever';' cannot be called the basic form because of the WHERE clause.

3. What structure can you implement for the database to speed up table reads?

- Following the rules of DB tuning, we have to: 1] use indexes properly (including the different types of indexes); 2] place the different DB objects sensibly across different tablespaces, files and so on; 3] create a dedicated tablespace for data with special datatypes (for example CLOB, LOB and …).

4. What are the tradeoffs with having indexes?

- 1. Faster selects, slower updates.

2. Extra storage space to store indexes. Updates are slower because in addition to updating the table you have to update the index.

5. What is a "join"?

- A join logically connects two or more tables in a query, usually on a common field, although a cross join needs none.

6. What is "normalization"? "Denormalization"? Why do you sometimes want to denormalize?

- Normalizing data means eliminating redundant information from a table and organizing the data so that future changes to the table are easier. Denormalization means allowing redundancy in a table. The main benefit of denormalization is improved performance with simplified data retrieval and manipulation. This is done by reduction in the number of joins needed for data processing.

7. What is a "constraint"?

- A constraint lets you enforce integrity rules on a table. SQL Server supports these main types: PRIMARY KEY/UNIQUE - enforces uniqueness of a column or set of columns. FOREIGN KEY - validates that every value in a column exists in a column of another table. CHECK - verifies that every value stored in a column satisfies a specified condition, such as belonging to a given list. DEFAULT - supplies a value for a column when an insert operation does not provide one; SQL Server treats it as a constraint, although strictly speaking it does not restrict anything. NOT NULL - does not allow null values in the column, and it is declared at column level rather than as a table-level constraint.
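To see these constraint types reject bad data, here is a hedged sketch in Python's sqlite3 module. It uses SQLite rather than SQL Server, so the syntax details differ slightly, and the dept/employee tables and the helper rejected are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute(
    "CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        grade   TEXT DEFAULT 'A' CHECK (grade IN ('A', 'B', 'C')),
        dept_id INTEGER NOT NULL REFERENCES dept (dept_id)
    )
""")
conn.execute("INSERT INTO dept VALUES (1, 'Sales')")
# grade is omitted here, so the DEFAULT supplies it.
conn.execute("INSERT INTO employee (emp_id, name, dept_id) VALUES (1, 'Asha', 1)")


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


default_grade = conn.execute(
    "SELECT grade FROM employee WHERE emp_id = 1"
).fetchone()[0]

checks = {
    "primary key": rejected(
        "INSERT INTO employee (emp_id, name, dept_id) VALUES (1, 'Dup', 1)"),
    "check": rejected(
        "INSERT INTO employee (emp_id, name, grade, dept_id) VALUES (2, 'Ravi', 'Z', 1)"),
    "foreign key": rejected(
        "INSERT INTO employee (emp_id, name, dept_id) VALUES (3, 'Mei', 99)"),
    "not null": rejected(
        "INSERT INTO employee (emp_id, name, dept_id) VALUES (4, NULL, 1)"),
}
print(default_grade, checks)
```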

8. What types of index data structures can you have?

- An index helps to search for values in tables faster. The three most commonly used index types are:
- B-Tree: builds a balanced tree of key values whose leaf entries list the row IDs that hold each value. It needs comparatively more space and is the default index type for most databases.
- Bitmap: a string of bits for each possible value of the column, with one bit per row. It needs little space and is very fast, but the domain of values cannot be large, e.g. SEX (m, f) or degree (BS, MS, PHD).
- Hash: a hashing algorithm maps each key, such as a composite of keys or partial keys, to a short hash value that is used to locate the rows. It is fast for exact-match lookups but does not help range searches, can take longer to build, and is supported by relatively few databases.

9. What is a "primary key"?

- A PRIMARY KEY (sometimes called a primary index) is a concept that comes mainly from database theory. It behaves almost the same as a UNIQUE index, i.e. each value may appear only once in the column. Calling the key PRIMARY instead of UNIQUE also says something about your table design: it marks the column or columns as the main identifier of each row, the one that foreign keys in other tables refer to. A primary key is a constraint that enforces uniqueness and data integrity for each row of a table. A table can have only one, and all columns participating in it must be NOT NULL.

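10. What is a "functional dependency"? How does it relate to database table design?

- A functional dependency describes how one attribute depends on another within a table: column B is functionally dependent on column A (written A -> B) if each value of A is associated with exactly one value of B. For example, emp_id -> emp_name. Functional dependencies drive normalization: in a well-designed table every non-key column depends on the key, and dependencies between non-key columns are moved into separate tables.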
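In relational design, a functional dependency A -> B holds when every value of A maps to exactly one value of B. A quick way to build intuition is to test sample rows for it; the sketch below does that in plain Python. The function name holds and the employee rows are hypothetical, and passing on sample data only shows the dependency is not contradicted, not that it is a rule of the business.

```python
def holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to one rhs combination."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first value seen for this key and returns it;
        # a later, different value means the dependency is violated.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"emp_id": 1, "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 2, "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 3, "dept_id": 20, "dept_name": "HR"},
]

print(holds(rows, ["emp_id"], ["dept_id", "dept_name"]))  # key determines the rest
print(holds(rows, ["dept_id"], ["dept_name"]))            # non-key dependency
print(holds(rows, ["dept_name"], ["emp_id"]))             # does not hold
```

Here dept_id -> dept_name is a dependency between non-key columns, which is the usual signal that department details belong in their own table.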

11. What is a "trigger"?

- A trigger is a stored set of SQL statements, much like a stored procedure, that is directly associated with a particular table and is often created to enforce integrity rules in a database. It fires automatically whenever a data-modification statement of the specified type (insert, update or delete) is issued against that table. A trigger is also a way around the restrictions of a constraint. For instance: 1. A constraint cannot use pseudo-columns as criteria, whereas a trigger can. 2. A constraint cannot refer to the old and new values of a row, whereas a trigger can.
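As a rough illustration of a trigger that reads both the old and the new values of a row, here is a sketch using SQLite through Python's sqlite3 module. The product and price_audit tables and the trigger name are invented, and trigger syntax varies between database products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, price INTEGER NOT NULL);
    CREATE TABLE price_audit (product_id INTEGER, old_price INTEGER, new_price INTEGER);

    -- Fires automatically after each UPDATE of price on product.
    -- OLD and NEW expose the row as it was before and after the change.
    CREATE TRIGGER trg_price_audit
    AFTER UPDATE OF price ON product
    FOR EACH ROW
    WHEN OLD.price <> NEW.price
    BEGIN
        INSERT INTO price_audit VALUES (OLD.product_id, OLD.price, NEW.price);
    END;

    INSERT INTO product VALUES (336, 444);
    UPDATE product SET price = 44 WHERE product_id = 336;
    UPDATE product SET price = 44 WHERE product_id = 336;  -- no change, nothing logged
""")

audit = conn.execute("SELECT * FROM price_audit").fetchall()
print(audit)
```

Nothing in the application code calls the trigger; the audit row appears purely as a side effect of the UPDATE.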

12. Why can a "group by" or "order by" clause be expensive to process?

- Processing a "group by" or "order by" clause often requires sorting the rows and creating temporary tables to hold the intermediate results. Depending on the size of the result set, this can be very expensive.

13. What is "index covering" of a query?

- Index covering means that all the data a query needs can be found in the index alone, without touching the table.
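One way to check whether a query is covered is to ask the database for its plan. The sketch below does this in SQLite via Python's sqlite3 module; the orders table, the index name and the helper plan are assumptions for the example, and the exact plan wording depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER, note TEXT)"
)
conn.execute("CREATE INDEX idx_customer_amount ON orders (customer, amount)")


def plan(sql):
    """Return the human-readable detail text of SQLite's query plan."""
    # The detail string is the last column of each EXPLAIN QUERY PLAN row.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Both columns the query touches (customer, amount) live in the index.
covered = plan("SELECT amount FROM orders WHERE customer = 'ann'")

# note is not in the index, so the table itself must also be read.
not_covered = plan("SELECT note FROM orders WHERE customer = 'ann'")

print(covered)
print(not_covered)
```

SQLite marks the first plan with the words COVERING INDEX; the second still uses the index to find rows but has to visit the table for the note column.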

14. What is a SQL view?

- A view is a stored, named SQL query that selects data from one or more tables. It acts like a small virtual table that meets our criteria: it can be queried like a table, but it does not physically store the data, only the query definition. A view is a good way to present data in a particular format if you use that query quite often. Views can also be used to restrict users from accessing the tables directly.
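A short sketch, again with Python's sqlite3 module, shows that a view holds no data of its own: rows added to the base table show up the next time the view is queried. The employee table and the view name v_sales_staff are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employee VALUES (1, 'Asha', 'Sales', 5000), (2, 'Ravi', 'HR', 4000);

    -- The view exposes only two columns and only Sales rows;
    -- salary stays hidden from anyone who is given just the view.
    CREATE VIEW v_sales_staff AS
        SELECT emp_id, name FROM employee WHERE dept = 'Sales';
""")

before = conn.execute("SELECT * FROM v_sales_staff ORDER BY emp_id").fetchall()

# Insert into the base table; the view is not touched.
conn.execute("INSERT INTO employee VALUES (3, 'Mei', 'Sales', 5500)")

after = conn.execute("SELECT * FROM v_sales_staff ORDER BY emp_id").fetchall()
print(before, after)
```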

Thank you, readers. If you have more questions or suggestions for SQL interview questions for freshers, please comment in the box below.


Sunday, 25 January 2015

Data Warehousing OLAP

OLAP in a Data Warehouse

Definition

OLAP (Online Analytical Processing) is the technology behind many Business Intelligence (BI) applications. OLAP is a powerful technology for data discovery, including capabilities for limitless report viewing, complex analytical calculations, and predictive “what if” scenario (budget, forecast) planning.

How is OLAP Technology Used?

OLAP is an acronym for Online Analytical Processing. OLAP performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modeling. It is the foundation for many kinds of business applications for Business Performance Management, Planning, Budgeting, Forecasting, Financial Reporting, Analysis, Simulation Models, Knowledge Discovery, and Data Warehouse Reporting. OLAP enables end-users to perform ad hoc analysis of data in multiple dimensions, thereby providing the insight and understanding they need for better decision making.

Types of OLAP Servers

We have four types of OLAP servers:

Relational OLAP (ROLAP)

Multidimensional OLAP (MOLAP)

Hybrid OLAP (HOLAP)

Specialized SQL Servers

Relational OLAP

ROLAP servers are placed between the relational back-end server and the client front-end tools. To store and manage warehouse data, ROLAP uses a relational or extended-relational DBMS.

ROLAP includes the following:

Implementation of aggregation navigation logic.

Optimization for each DBMS back end.

Additional tools and services.

Multidimensional OLAP

MOLAP uses array-based multidimensional storage engines for multidimensional views of data. With multidimensional data stores, the storage utilization may be low if the data set is sparse. Therefore, many MOLAP servers use two levels of data storage representation to handle dense and sparse data sets.

Hybrid OLAP (HOLAP)

Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers the higher scalability of ROLAP and the faster computation of MOLAP. HOLAP servers can store large volumes of detailed data, while the aggregations are stored separately in a MOLAP store.

OLAP Operations

Since OLAP servers are based on a multidimensional view of data, we will discuss OLAP operations on multidimensional data.

Here is the list of OLAP operations:

Roll-up

Drill-down

Slice and dice

Pivot (rotate)

Roll-up

Roll-up performs aggregation on a data cube in any of the following ways:

By climbing up a concept hierarchy for a dimension

By dimension reduction

The following diagram illustrates how roll-up works.

· Roll-up is performed by climbing up a concept hierarchy for the dimension location.

· Initially the concept hierarchy was "street < city < province < country".

· On rolling up, the data is aggregated by ascending the location hierarchy from the level of city to the level of country.

· The data is grouped into countries rather than cities.

· When roll-up is performed by dimension reduction, one or more dimensions are removed from the data cube.
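Both kinds of roll-up can be sketched in a few lines of plain Python. The fact rows, the numbers and the function name roll_up are hypothetical; a real OLAP server would do this over pre-built cubes or with SQL GROUP BY.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, country, quarter, units_sold)
facts = [
    ("Toronto", "Canada", "Q1", 605),
    ("Vancouver", "Canada", "Q1", 825),
    ("Toronto", "Canada", "Q2", 680),
    ("Chicago", "USA", "Q1", 440),
    ("New York", "USA", "Q2", 1560),
]
CITY, COUNTRY, QUARTER, UNITS = range(4)


def roll_up(rows, group_by):
    """Sum units_sold over the columns listed in group_by."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in group_by)
        totals[key] += row[UNITS]
    return dict(totals)


# Starting point: one cell per (city, quarter).
by_city_quarter = roll_up(facts, [CITY, QUARTER])

# Climbing the location hierarchy: city -> country.
by_country_quarter = roll_up(facts, [COUNTRY, QUARTER])

# Dimension reduction: the time dimension is dropped altogether.
by_country = roll_up(facts, [COUNTRY])

print(by_country_quarter)
print(by_country)
```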

Drill-down

Drill-down is the reverse operation of roll-up. It is performed by either of the following ways:

By stepping down a concept hierarchy for a dimension

By introducing a new dimension.

The following diagram illustrates how drill-down works:

· Drill-down is performed by stepping down a concept hierarchy for the dimension time.

· Initially the concept hierarchy was "day < month < quarter < year".

· On drilling down, the time dimension is descended from the level of quarter to the level of month.

· When drill-down is performed by introducing a new dimension, one or more dimensions are added to the data cube.

· It navigates the data from less detailed data to highly detailed data.
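Drill-down along the time hierarchy can be sketched the same way: the detailed facts are kept at month level, and the view switches from quarter totals to month totals. The month figures, the MONTH_TO_QUARTER mapping and the function aggregate are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical mapping for one step of the time hierarchy: month -> quarter.
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}

# Hypothetical facts stored at the most detailed level: (month, units_sold)
facts = [("Jan", 100), ("Feb", 150), ("Mar", 200), ("Apr", 120)]


def aggregate(rows, level):
    """Total units_sold at the requested level: 'month' or 'quarter'."""
    totals = defaultdict(int)
    for month, units in rows:
        key = month if level == "month" else MONTH_TO_QUARTER[month]
        totals[key] += units
    return dict(totals)


by_quarter = aggregate(facts, "quarter")  # the less detailed view
by_month = aggregate(facts, "month")      # after drilling down quarter -> month

print(by_quarter)
print(by_month)
```

Drilling down never changes the grand total; it only splits each quarter's figure into the months that make it up.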

Slice

The slice operation performs a selection on one particular dimension of a given cube and provides a new sub-cube. Consider the following diagram that shows how slice works.

· Here slice is performed for the dimension "time" using the criterion time = "Q1".

· It forms a new sub-cube by fixing a single value on that one dimension.

Dice

Dice performs a selection on two or more dimensions of a given cube and provides a new sub-cube. Consider the following diagram that shows the dice operation.

The dice operation on the cube based on the following selection criteria involves three dimensions.

(location = "Toronto" or "Vancouver")

(time = "Q1" or "Q2")

(item = "Mobile" or "Modem")
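Slice and dice can both be pictured as filters over the cells of a cube. The sketch below uses a plain Python dict keyed by (location, time, item); the cell values and the function names slice_cube and dice_cube are made up for illustration.

```python
# Hypothetical cube cells keyed by (location, time, item) -> units sold
cube = {
    ("Toronto", "Q1", "Mobile"): 605,
    ("Toronto", "Q1", "Phone"): 14,
    ("Toronto", "Q3", "Modem"): 512,
    ("Vancouver", "Q2", "Modem"): 825,
    ("Chicago", "Q1", "Mobile"): 440,
    ("New York", "Q2", "Security"): 400,
}


def slice_cube(cells, time):
    """Fix one value on the time dimension; the result has only two axes left."""
    return {(loc, item): v for (loc, t, item), v in cells.items() if t == time}


def dice_cube(cells, locations, times, items):
    """Keep cells whose value on every dimension is in the allowed set."""
    return {
        key: v
        for key, v in cells.items()
        if key[0] in locations and key[1] in times and key[2] in items
    }


q1_slice = slice_cube(cube, "Q1")
diced = dice_cube(
    cube,
    locations={"Toronto", "Vancouver"},
    times={"Q1", "Q2"},
    items={"Mobile", "Modem"},
)

print(q1_slice)
print(diced)
```

The slice loses the time axis because every remaining cell has time = "Q1", while the dice keeps all three axes but with a narrower range on each.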

Pivot

The pivot operation is also known as rotation. It rotates the data axes in view in order to provide an alternative presentation of data. Consider the following diagram that shows the pivot operation.

Here the item and location axes of the 2-D slice are rotated.
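For a 2-D slice, pivoting amounts to swapping rows and columns. A minimal sketch in plain Python, with an invented table of locations by items and a hypothetical function pivot:

```python
# Hypothetical 2-D slice: rows are locations, columns are items.
table = {
    "Toronto": {"Mobile": 605, "Modem": 825},
    "Vancouver": {"Mobile": 440, "Modem": 512},
}


def pivot(rows):
    """Swap the two axes: columns become rows and rows become columns."""
    rotated = {}
    for row_key, columns in rows.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


rotated = pivot(table)  # rows are now items, columns are locations
print(rotated)
```

No values change and nothing is aggregated; pivoting twice gives back the original layout.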


Friday, 23 January 2015

Data Warehouse Concepts

Data Warehouse Concepts & Discussions and BI Job Postings

Hello all.........

Welcome to the world of Data Warehousing and Business Intelligence tutorials.

Enjoy the following things in this blog.

Data warehousing and Business Intelligence concepts.
Practical SQL query interview questions asked in technical interviews.
Data warehousing and BI job postings, especially for Mumbai and Pune people (other cities also).
Downloadable DWH and BI study materials such as DWH books, offline tutorials and scenario-based interview questions.
Study materials for other BI tools such as Informatica, Cognos, Business Objects and SSIS.
Database concepts and other database tools, plus interview questions on those tools.
