sac.blogspot.in

Wednesday, January 9, 2019

Top-Down vs. Bottom-Up In Data Warehousing

Data warehouse systems have gained popularity as companies from the most varied industries realize how useful these systems can be. A large number of these organizations, however, lack the experience and skills required to meet the challenges involved in data warehousing projects. In particular, a lack of a methodological approach prevents data warehousing projects from being carried out successfully. Generally, methodological approaches are created by closely studying similar experiences and minimizing the risks for failure by basing new approaches on a constructive analysis of the mistakes made previously.

Data warehouse design is one of the key technique in building the data warehouse. Choosing a right data warehouse design can save the project time and cost. Basically there are two data warehouse design approaches are popular.
1.5.1 Bottom-Up Design:
In the bottom-up design approach, the data marts are created first to provide reporting capability. A data mart addresses a single business area such as sales, Finance etc. These data marts are then integrated to build a complete data warehouse. The integration of data marts is implemented using data warehouse bus architecture. In the bus architecture, a dimension is shared between facts in two or more data marts. These dimensions are called conformed dimensions. These conformed dimensions are integrated from data marts and then data warehouse is built.
1.5.1.1 Advantages of bottom-up design are:

1. This model contains consistent data marts and these data marts can be delivered quickly.

2. As the data marts are created first, reports can be generated quickly.

3. The data warehouse can be extended easily to accommodate new business units. It is just creating new data marts and then integrating with other data marts.

1.5.1.2 Disadvantages of bottom-up design are:

The positions of the data warehouse and the data marts are reversed in the bottom-up approach design.

1.5.2 Top-Down Design:
In the top-down design approach the, data warehouse is built first. The data marts are then created from the data warehouse.
1.5.2.1 Advantages of top-down design are:

1. Provides consistent dimensional views of data across data marts, as all data marts are loaded from the data warehouse.

2. This approach is robust against business changes. Creating a new data mart from the data warehouse is very easy.

1.5.2.2 Disadvantages of top-down design are:

1. This methodology is inflexible to changing departmental needs during implementation phase.

2. It represents a very large project and the cost of implementing the project is significant.

1.5.3 Top-Down vs. Bottom-Up

When you consider methodological approaches, their top-down structures or bottom-up structures play a basic role in creating a data warehouse. Both structures deeply affect the Data Ware house lifecycle.

If you use a top-down approach, you will have to analyze global business needs, plan how to develop a data warehouse, design it, and implement it as a whole. This procedure is promising: it will achieve excellent results because it is based on a global picture of the goal to achieve, and in principle it ensures consistent, well integrated data warehouses. However, a long story of failure with top-down approaches teaches that:

High-cost estimates with long-term implementations discourage company managers from embarking on these kind of projects;
Analyzing and bringing together all relevant sources is a very difficult task, also because it is not very likely that they are all available and stable at the same time;
It is extremely difficult to forecast the specific needs of every department involved in a project, which can result in the analysis process coming to a standstill;
Since no prototype is going to be delivered in the short term, users cannot check for this project to be useful, so they lose trust and interest in it.

In a bottom-up approach, data warehouses are incrementally built and several data marts are iteratively created. Each data mart is based on a set of facts that are linked to a specific company department and that can be interesting for a user subgroup (for example, data marts for inventories, marketing, and so on). If this approach is coupled with quick prototyping, the time and cost needed for implementation can be reduced so remarkably that company managers will notice how useful the project being carried out is. In this way, that project will still be of great interest.

The bottom-up approach turns out to be more cautious than the top-down one and it is almost universally accepted. Naturally the bottom-up approach is not risk-free, because it gets a partial picture of the whole field of application. We need to pay attention to the first data mart to be used as prototype to get the best results: this should play a very strategic role in a company. In fact, its role is so crucial that this data mart should be a reference point for the whole data warehouse. In this way, the following data marts can be easily added to the original one. Moreover, it is highly advisable that the selected data mart exploit consistent data already made available.

http://cdn.ttgtmedia.com/rms/enterpriseApplications/Kimball's%20Approach.png

Data Granularity

The level of detail considered in a model or decision making process. The greater the granularity, the deeper the level of detail. Granularity is usually used to characterize the scale or level of detail in a set of data.

Les Barbusinski’s Answer: Granularity is usually mentioned in the context of dimensional data structures (i.e., facts and dimensions) and refers to the level of detail in a given fact table. The more detail there is in the fact table, the higher its granularity and vice versa. Another way to look at it is that the higher the granularity of a fact table, the more rows it will have.

Let me illustrate with the following example: Say we have a data mart with a single fact (Sales) and three dimensions (Time, Organization and Product). The fact table contains three metrics (Unit Price, Units Sold and Total Sale Amount). The Time dimension consists of four hierarchical elements (Year, Quarter, Month and Day). The Organization dimension consists of three hierarchical elements (Region, District and Store). The Product dimension consists of two hierarchical elements (Product Family and SKU).

As always, the metrics in the Sales fact table must be stored at some intersection of the dimensions (i.e., Time, Organization and Product). Hence, in this data mart, the highest granularity that we can store Sales metrics is by Day/Store/SKU (i.e., the lowest level in each dimensional hierarchy). Conversely, the lowest granularity that we can aggregate Sales metrics to in this data mart is by Year/Region/Product Family (i.e., the highest level in each dimensional hierarchy). We may also (for a variety of performance reasons) choose to store Sales metrics at some intermediate level of granularity (e.g., by Month/District/SKU).

Chuck Kelley’s Answer: Granularity is the level of depth of data. For example, you might have a date/time dimension which could be at the year, month, quarter, period, week, day, hour, minute, second, hundredths of seconds level of granularity. Most data warehouses do not go to the second or hundredths of seconds level, but it could be possible. The granularity with be the lowest level of the depth of data.

Joe Oates’ Answer: Granularity refers to the level of detail of the data stored fact tables in a data warehouse. High granularity refers to data that is at or near the transaction level. Data that is at the transaction level is usually referred to as atomic level data. Low granularity refers to data that is summarized or aggregated, usually from the atomic level data. Summarized data can be lightly summarized as in daily or weekly summaries or highly summarized data such as yearly averages and totals.

Clay Rehm’s Answer: Granularity simply means the level of detail. A typical data warehouse will have some tables in it that have a lot of detail and have other tables that are summarized or aggregated, which means less detail. Each non- key column in a fact table must be at the same level of granularity (detail). For example, if your primary measure on a specific fact table is daily total sales, then that is defining the granularity. Only when the grain for the fact table is chosen can we identify the dimensions of that fact table.

Difference between Data ware House and DBMS

A data warehouse is a database used to store data. It is a central repository of data in which data from various sources is stored. A data warehouse is also known as an enterprise data warehouse.

The data warehouse is then used for reporting and data analysis. It can be used for creating trending reports for senior management reporting such as annual and quarterly comparisons.

The purpose of a data warehouse is to provide flexible access to the data to the user. Data warehousing generally refers to the combination of many different databases across an entire enterprise. Data warehouses store current as well as historical data, so that all of the relevant data may be used for analysis. The analysis helps to find and show relationships among the data, to extract meaning from the data.

A database, on the other hand, is the basis or any data storage. It is an organized collection of data. Data from various sources are collected in to a single place, this place is the database. The data is organized into a structure of some sort, mainly according to a database model. The most commonly used database model is the relational model, others include hierarchical model, network model, etc.

In order to retrieve data from a database, one has to use a database management system (DBMS). The database management systems are designed applications that interact with the user, other applications, and the database itself to capture and analyze data. The DBMS is designed to allow the definition, creation, querying, update, and administration of databases. Some popular DBMSs include MySQL, PostgreSQL, SQLite, Microsoft SQL Server, Microsoft Access, Oracle, etc.

While, a database and a data warehouse may seem the same, they are actually different is a key aspect. A database is used to store data while a data warehouse is mostly used to facilitate reporting and analysis. Basically, database is just where the data is stored; in order to access this data or analyze it a database management system is required. However, a data warehouse does not necessarily require a DBMS. The purpose of a data warehouse is for easy access to the data for a user. The data warehouse may also be used to analyze the data; however the actual process of analysis is called data mining.

1. A database is used for Online Transactional Processing (OLTP) but can be used for other purposes such as Data Warehousing.

2. A data warehouse is used for Online Analytical Processing (OLAP). This reads the historical data for the Users for business decisions.

3. In a database the tables and joins are complex since they are normalized for RDMS. This reduces redundant data and saves storage space.

4. In data warehouse, the tables and joins are simple since they are de-normalized. This is done to reduce the response time for analytical queries.

5. Relational modeling techniques are used for RDMS database design, whereas modeling techniques are used for the Data Warehouse design.

6. A database is optimized for write operation, while a data warehouse is optimized for read operations.

7. In a database, the performance is low for analysis queries, while in a data warehouse, there is high performance for analytical queries.

8. A data warehouse is a step ahead of a database. It includes a database in its structure.

Objective of Dataware House

Data Warehouse is designed to meet some business needs, which is transaction system cannot do (and vice versa). Data Warehouse is designed to meet some business needs, which is transaction system cannot do (and vice versa). he big question is on why we have to create a Data-Warehouse?. The reasons are as follows.

As a starter, let’s say you want to know the monthly variations in 3 months running average on your customer balances over last twelve months grouped by products+ channels+ customer segments. Let’s see why do you need a data-warehouse for this purpose.

1.2.1 Keeping Analysis/Reporting and Production Separate.

If you run the above-said query on your production systems, you will find that it will lock all your tables and will eat-up most of your resources, as it will be accessing a lot of data doing a lot of calculations. This results in the production work to come to a virtual halt. Imagine hundreds of such above-said queries running at the same time on your production systems.

Reporting and analysis work typically access data across the database tables, whereas production work typically accesses specific customer OR product OR channel record at a point of time. That’s why it is important to have the Information generation work to be done from an offline platform (aka. Data Warehouse). Purpose of Data Warehouse is to keep analysis/reporting (non-production use data) separate from production data.

1.2.2 Information Integration from multiple systems- Single point source for information

As an example- Let’s say you have different systems for say a loan product vs. credit card product. The above-said query, if run on production will need to pick the data on real time basis from these systems.

This will make the query extremely slow, and will need to do connects in the intermediate tables OR in run-time memory. Moreover it will not be a reliable result as at a particular point of time, the databases may not be in synch as many of such synching happen in the end of day batch runs.

1.2.3 DW purpose for Data Consistency and Quality

Organizations are riddled with tens of important systems from which their information comes. Each of these systems may carry the information in different formats and also may be having out of synch information. (Different customer ID formats, mismatch in the supplier statuses). By bringing the data from these disparate sources at a common place, one can effectively undertake to bring the uniformity and consistency in data (Refer to cleansing and Data Transformation).

1.2.4 High Response Time- Production Databases are tuned to expected transaction load

Even if you run the above-said query on an offline database, it will take a lot of time on the database design, which is same as that of production. This is because the production databases are created to cater to production work. In production systems, there is some level of expected intensity for different kind of actions. Therefore, the indexing and normalization and other design considerations are for given transaction loads. However, the Data-warehouse has to be ready for fairly unexpected loads and type of queries, which demands a high degree of flexibility and quick response time.

1.2.5 Data Warehouse objective of providing an adaptive and flexible source of information.

Its easier for users to define the production work and functionalities they want, but difficult to define the analysis they need. The analysis needs keep on changing and Data-Warehouse has the capabilities to adapt quickly to the changing requirements. Please refer to 'Dimension Modeling'

1.2.6 Establish the foundation for Decision Support

Decision process of an organization will involve analysis, data mining, forecasting, decision modeling etc. By having a common point, which can provide consistent, quality data with high response time provides the core enabler for making fast and informed decisions.

Definition of Data ware house

1.1 Definition of Data ware house

“A data ware is a subject –oriented, integrated, time varying, non-volatile collection of data in support of the management’s decision making process”

By W H Inmon(1993)

1.1.1 Subject-Oriented:

A data warehouse is organized around major subject such a customer, products, sale etc. Data are organized according to subject instead of application. The data organized by subject obtains only information necessary for the decision support system.

1.1.2 Non-Volatile:

The data are not updated and changed in any way once they enter the data warehouse but data is only loaded, refreshed and accessed for query. Data warehouse is a always physically separate store of data which is transformed from the application from the appropriate environment, therefore DWH don’t need transaction, processing, recovery, concurrency control etc.

1.1.3 Time Varying:

Data are stored in a DWH to provide a historical perspective. Every key structure in DWH contain implicitly or explicitly, an element of time. The DWH contains a place for sorting data that are 5 to 10 years old or older to be used for comparisons, trends, forecasting.

1.1.4 Integrated

A DWH is usually constructed by integrating multiple, heterogeneous source such as relation databases, flat files and OLTP.

Tuesday, November 20, 2018

Java Script History

Introduction to Java Script

Monday, November 12, 2018

Database Connectivity ( NetBean Configuration)

Link: Download Connectivity Documents

Step-1:Configure Database in NetBeans 8.1

Step-2: Right click on database following window will popup

Step-3:

Step-4:

Step-5:Give your username, password, service ID.

Step-6: Click Test Button to Check The Connectivity: It Connection is successful then message will be displayed in the screen.

Step-7:Select Schema (User Name)

Step-8:Connection String will be generated by NetBean

Step-9:

Monday, November 5, 2018

JSP- Java Server Page

1. Introduction, Why JSP is used.
2. JSP Comparative

Sunday, November 4, 2018

Inheritance Programs in C++

Ref: Kamthane.

/*C++ program to demonstrate example of private simple inheritance.*/

#include
using namespace std;
class A
{ private:
int a;
protected:
int x; //can access by the derived class
public:
void setVal(int v)
{ x=v; }
};
class B:private A
{
public:
void printVal(void)
{
setVal(10); //accessing public member function here //protected data member direct access here
cout << "value of x: " << x << endl;
}
};
int main()
{
B objB; //derived class creation
objB.printVal();
return 0;
}

/* C++ program to demonstrate example of multilevel inheritance.*/

#include
using namespace std;

//Base Class : class A
class A
{
private:
int a;
public:
void get_a(int val_a)
{ a=val_a; }

void disp_a(void)
{ cout << "Value of a: " << a << endl; }
};
class B: public A
{
private:
int b;
public:
void get_b(int val_a, int val_b)
{
get_a(val_a); //assign value of a by calling function of class A
b=val_b;
}
void disp_b(void)
{
//display value of a
disp_a();
cout << "Value of b: " << b << endl;
}
};

//Here class C is derived class and B is Base class

class C: public B
{
private:
int c;
public:
//assign value of a from here
void get_c(int val_a, int val_b,int val_c)
{
/*** Multilevel Inheritance ***/
//assign value of a, bby calling function of class B and Class A
//here Class A is inherited on Class B, and Class B in inherited on Class B
get_b(val_a,val_b);
c=val_c;
}

void disp_c(void)
{
disp_b(); //display value of a and b using disp_b()
cout << "Value of c: " << c << endl;
}
};

int main()
{
//create object of final class, which is Class C
C objC;
objC.get_c(10,20,30);
objC.disp_c();
return 0;
}
---------------------

/* C++ program to demonstrate example of Multiple inheritance.*/
class A
{
protected:
int a;
}
class B
{
protected:
int b;
}
class C
{
protected:
int c;
}
class D
{
protected:
int d;
}

class E:public A,B,C,D
{
int e;
public:
void getdata()
{
cout<<"\n Enter the values of a,b,c,d,e";
cin>>a>>b>>c>>d>>e;
}
void showdata()
{
cout<<"\n Values of a"<

Saturday, November 3, 2018

Proejct Table Structures - Setting General Tables

Master Tables

State_Master
id	Auto Generated
code	Auto Generated
Name	User Input
ShortName	User Input
createby	Logged in User
createdate	System Date
updateby	Logged in User
updatedate	System Date
STATUS	User Input

Country_Master
id	Auto Generated
code	Auto Generated
Name	User Input
ShortName	User Input
createby	Logged in User
createdate	System Date
updateby	Logged in User
updatedate	System Date
STATUS	User Input

Type_Master
id	Auto Generated
code	Auto Generated
Name	User Input
createby	Logged in User
createdate	System Date
updateby	Logged in User
updatedate	System Date
STATUS	User Input

SubType_Master
id	Auto Generated
code	Auto Generated
Name	User Input
typeid	Ref - Type_Master
createby	Logged in User
createdate	System Date
updateby	Logged in User
updatedate	System Date
STATUS	User Input

User_Master
id	Auto Generated
code	Auto Generated
F_NAME	User Input
M_NAME	User Input
L_NAME	User Input
USER_NAME
PASSWORD	User Input
TYPEID	Ref - Type_Master
FIRST_LOGIN	Auto Generated
EXPIRYDATE	Auto Generated
PASSUPDATE	Auto Generated
createby	System Date
createdate	Logged in User
updateby	System Date
updatedate	User Input
STATUS	User Input

First_Load (Back End table)
COMPANY_CODE
COMPANY_NAME
SOFTWARE_VERSION
SOFTWARE_NAME
MAINSCREEN_PATH
MOBILE_NO1
MOBILE_NO2
WEBSITE
ADDRESS
TAG_LINE
PHONE_NO
createby
createdate
updateby
updatedate
STATUS	User Input

SETTING_MASTER
ID	Auto Generated
CODE	Auto Generated
FORM_NAME	User Input
FORM_NAME_MENU	User Input
FORM_PATH	User Input
createby	Logged in User
createdate	System Date
updateby	Logged in User
updatedate	System Date
STATUS	User Input

List