DATAWAREHOUSING

Friday, October 3, 2008

ETL

ETL Process: ETL is the Data Warehouse acquisition processes of Extracting, Transforming and Loading data from source systems into the data warehouse.

This requires an understanding of the business rules, the logical and the physical data models and also involves getting the data from the source and populating it into the target.

ETL-concepts

Extraction

Transformation

Loading

Slowly Changing Dimension –Classification

TYPE I

TYPE II

TYPE III

Slowly Changing Dimension implementation-types

Extraction types

Data transformation

Types of Data warehouse Loading

Types of Data Warehouse Updates

ETL - Approach in a nutshell

Data Modeling


The foundation of the data warehousing system is the data model. The first step in this stage is to build the Logical data model based on the user requirements and the next step would be to translate the Logical data model into a Physical data model.

Course Road map

An Introduction to DATA MODELING

E-R Data Model (Entities)

ENTITY

TYPES OF ENTITIES

RELATIONSHIP

TYPES OF RELATIONSHIP

STEPS TO DESIGN TRANSACTIONAL DATABASE

Normalization

1. 1st normal form

2. 2nd normal form

3. 3rd normal form

4. Boyce- codd normal form (BCNF)

5. 4th normal form

6. 5th normal form

De-normalization of database

Dimensional modeling

Measures

Fact

Dimension

Granularity

Dimensional Data Model -sub-models:

o Star Schema Model

o Star Flake Schema Model

o Snow Flake Schema Model

o Extended Star Schema Model

o Galaxy Schema Model

Surrogate key

Level

Hierarchy

Types of facts

Types of dimensions

Designing a star schema dimensional model

Thursday, October 2, 2008

DATAWAREHOUSING

Data warehousing is mainly done for the reporting purposes. All the historical data is put into a Data warehouse, which can be thought of as a Very large Database. Later on, reports are generating out of this Data Warehouse to do analysis of the business.


DATAWAREHOUSE

A Data Warehouse is a Subject Oriented, Integrated, Non volatile and Time Variant repository of data that is generally used for querying and analyzing the past trends to support management decisions for the future.

Data warehouse is a relational database that is designed for query and analysis Rather than for transaction processing.

It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources. In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.

Course Roadmap

What is Data Warehousing ?

The evolution of Data Warehousing

Need for Data Warehousing

OLTP Vs Warehouse Applications

Data marts Vs Data Warehouses

Operational Data Stores

Overview of Warehouse Architecture

Data Warehouse Architectures

Components of Data Warehousing Architecture

An overview of each of the components

Considerations for Data Warehouse Design

Common mistakes in Warehouse designs

An overview of Warehouse on the web