Manual Building the Data Warehouse

  1. Building the Data Warehouse, 4th Edition
  2. Building a Data Warehouse: 6 Crucial Steps
  3. Evaluation Copy
  4. Building a custom data warehouse dashboard
  5. Building a Data Warehouse: 6 Crucial Steps | PTR

Magnetic tapes were good for storing a large volume of data cheaply, but the drawback was that they had to be accessed sequentially. In a given pass of a magnetic tape file, where every record has to be read, typically only 5 percent or fewer of the records are actually needed. In addition, accessing an entire tape file may take as long as 20 to 30 minutes, depending on the data on the file and the processing that is done.

In time, the growth of master files and magnetic tape exploded, and with that growth came huge amounts of redundant data. It is interesting to speculate what the world of information processing would look like if the only medium for storing data had been magnetic tape.

If there had never been anything to store bulk data on other than magnetic tape files, the world would never have had large, fast reservations systems, ATM systems, and the like. Indeed, the ability to store and manage data on new kinds of media opened up the way for a more powerful type of processing that brought the technician and the businessperson together as never before.

Figure: The early evolutionary stages of the architected environment.

Then came disk storage, or the direct access storage device (DASD). Disk storage was fundamentally different from magnetic tape storage in that data could be accessed directly on a DASD. There was no need to go through records 1, 2, 3, and so on to reach a later record. In fact, the time to locate a record on a DASD could be measured in milliseconds. Later, online transaction processing (OLTP) made even faster access to data possible, opening whole new vistas for business and processing.
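The difference between sequential and direct access can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the names `sequential_lookup` and `direct_lookup`, and the use of a dictionary as a stand-in for a disk index are all hypothetical and do not come from the original text.

```python
# Hypothetical records keyed by account number (illustrative data only).
records = [(n, f"account-{n}") for n in range(1, 1001)]


def sequential_lookup(tape, key):
    """Read records in order, as a tape must, counting reads until the key is found."""
    reads = 0
    for record_key, value in tape:
        reads += 1
        if record_key == key:
            return value, reads
    return None, reads


# A dict stands in for direct access: the key leads straight to the record.
disk_index = dict(records)


def direct_lookup(index, key):
    """Fetch a record by key without touching the records stored before it."""
    return index.get(key)


value, reads = sequential_lookup(records, 900)
```

In this sketch the sequential lookup has to pass over every earlier record before it reaches the one it wants, while the direct lookup goes to the record by its key alone.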

The computer could now be used for tasks not previously possible, including driving reservations systems, bank teller systems, manufacturing control systems, and the like. Had the world remained in a magnetic-tape-file state, most of the systems that we take for granted today would not have been possible. The end user began to assume a role previously unfathomed, that of directly controlling data and systems, a role previously reserved for the professional data processor.

With PCs and 4GL technology came the notion that more could be done with data than simply processing online transactions. Previously, data and technology were used exclusively to drive detailed operational decisions. No single database could serve both operational transaction processing and analytical processing at the same time.

The single-database paradigm was shown in an earlier figure. The extract program is the simplest of all programs. It rummages through a file or database, uses some criteria for selecting data, and, on finding qualified data, transports the data to another file or database.
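A minimal sketch of such an extract program, in Python, might look like the following. The CSV layout, the `extract` function, the `qualifies` parameter, and the balance threshold are assumptions made for illustration; the original text does not prescribe any particular implementation.

```python
import csv
import io


def extract(source, destination, qualifies):
    """Copy every row of `source` that satisfies `qualifies` into `destination`."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(destination, fieldnames=reader.fieldnames)
    writer.writeheader()
    copied = 0
    for row in reader:
        if qualifies(row):        # the selection criteria
            writer.writerow(row)  # transport the qualified data
            copied += 1
    return copied


# Hypothetical source file held in memory so the sketch is self-contained.
source = io.StringIO("account,balance\nA1,250\nA2,12000\nA3,9800\n")
destination = io.StringIO()
copied = extract(source, destination, lambda row: float(row["balance"]) > 5000)
```

Because the selection criteria are passed in as a function, each department could run the same program with its own rule, which is part of why extracts multiplied so easily.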

The end user then owns the data once he or she takes control of it. For these and probably a host of other reasons, extract processing was soon found everywhere.

Figure: The nature of extract processing.

First, there were extracts; then there were extracts of extracts; then extracts of extracts of extracts; and so forth. It was not unusual for a large company to perform tens of thousands of extracts per day.

The larger and more mature the organization, the worse the problems of the naturally evolving architecture become.

Figure: Lack of data credibility in the naturally evolving architecture.

Say two departments are delivering a report to management: one department claims that activity is down 15 percent, the other says that activity is up 10 percent. Not only are the two departments not in sync with each other, they are off by very large margins.

In addition, trying to reconcile the different information from the different departments is difficult. Unless very careful documentation has been done, reconciliation is, for all practical purposes, impossible.

When management receives the conflicting reports, it is forced to make decisions based on politics and personalities because neither source is more or less credible. This is an example of the crisis of data credibility in the naturally evolving architecture. This crisis is widespread and predictable. The first reason is a discrepancy in the time basis of the data. One department has extracted its data for analysis on a Sunday evening, and the other department extracted on a Wednesday afternoon. Is there any reason to believe that analysis done on one sample of data taken on one day will be the same as the analysis for a sample of data taken on another day?

Of course not. Data is always changing within the corporation. Any correlation between analyzed sets of data that are taken at different points in time is only coincidental.

Figure: The reasons for the predictability of the crisis in data credibility in the naturally evolving architecture.

The second reason is the algorithmic differential.

For example, one department has chosen to analyze all old accounts. Another department has chosen to analyze all large accounts. Is there any necessary correlation between the characteristics of customers who have old accounts and customers who have large accounts? Probably not. So why should a very different result surprise anyone? The third reason is one that merely magnifies the first two reasons. There are extracts, extracts of extracts, extracts of extracts of extracts, and so on. Each new level of extraction exaggerates the other problems that occur.
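The algorithmic differential can be made concrete with a small Python sketch. The account data, the cutoffs for "old" and "large", and the `share_active` measure are all invented for illustration; the point is only that two selection rules applied to the same source can report very different results.

```python
# Hypothetical accounts; every value here is made up for the example.
accounts = [
    {"id": 1, "opened_year": 1985, "balance": 400, "active": False},
    {"id": 2, "opened_year": 1988, "balance": 90000, "active": True},
    {"id": 3, "opened_year": 2001, "balance": 75000, "active": True},
    {"id": 4, "opened_year": 1990, "balance": 1200, "active": False},
    {"id": 5, "opened_year": 2003, "balance": 60000, "active": True},
]


def share_active(selected):
    """Fraction of the selected accounts that are flagged as active."""
    return sum(a["active"] for a in selected) / len(selected)


# Department A analyzes old accounts; department B analyzes large accounts.
old_accounts = [a for a in accounts if a["opened_year"] < 1995]
large_accounts = [a for a in accounts if a["balance"] >= 50000]

old_share = share_active(old_accounts)
large_share = share_active(large_accounts)
```

With this made-up data, one department would report that a third of its accounts are active and the other that all of them are, even though both started from the same five accounts.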

The fourth reason for the lack of credibility is the problem posed by external data. For example, an earlier figure showed one analyst bringing data into the mainstream of analysis from the Wall Street Journal, and another analyst bringing data in from Business Week. However, when the analyst brings data in, he or she strips the external data of its identity.

Because the origin of the data is not captured, it becomes generic data that could have come from any source. Furthermore, the analyst who brings in data from the Wall Street Journal knows nothing about the data being entered from Business Week, and vice versa. No wonder, then, that external data contributes to the lack of credibility of data in the naturally evolving architecture. The last contributing factor to the lack of credibility is that often there is no common source of data to begin with. Analysis for department A originates from file XYZ.

Analysis for department B originates from database ABC. Given these reasons, it is no small wonder that there is a crisis of credibility brewing in every organization that allows its legacy of hardware, software, and data to evolve naturally into the spider web.

Problems with Productivity

Data credibility is not the only major problem with the naturally evolving architecture. Productivity is also abysmal, especially when there is a need to analyze data across the organization.

Consider an organization that has been in business for a while and has built up a large collection of data. Management wants to produce a corporate report, using the many files and collections of data that have accumulated over the years.

Figure: The naturally evolving architecture is not conducive to productivity.

To locate the data, many files and layouts of data must be analyzed. Lots of extract programs, each customized, have to cross many technological barriers. Furthermore, there are complicating factors. Having to go through every piece of data, not just by name but by definition and calculation, is a very tedious process.

But if the corporate report is to be produced, this exercise must be done properly. The next task for producing the report is to compile the data once it is located. The program that must be written to get data from its many sources should be simple.

In short, even though the report-generation program should be simple to write, retrieving the data for the report is tedious. In a corporation facing exactly the problems described, an analyst recently estimated a very long time to accomplish the tasks. If the designer had asked for only two or three man-months of resources, then generating the report might not have required much management attention.

But when an analyst requisitions many resources, management must consider the request with all the other requests for resources and must prioritize the requests. In other words, if the first corporate report generated required a large amount of resources, and if all succeeding reports could build on the first report, then it might be worthwhile to pay the price for generating the first report. But that is not the case.

Unless future corporate reporting requirements are known in advance and are factored into building the first corporate report, each new corporate report will probably require the same large overhead. In other words, it is unlikely that the first corporate report will be adequate for future corporate reporting requirements. Productivity, then, in the corporate environment is a major issue in the face of the naturally evolving architecture and its legacy systems.

Simply stated, when using the spider web of legacy systems, information is expensive to access and takes a long time to create.

Figure: When the first report is being written, the requirements for future reports are not known.

From Data to Information

As if productivity and credibility were not problems enough, there is another major fault of the naturally evolving architecture: the inability to go from data to information. At first glance, the notion of going from data to information seems to be an ethereal concept with little substance.

But that is not the case at all.

The DSS analyst will have to deal with lots of unintegrated legacy applications. For example, a bank may have separate savings, loan, direct-deposit, and trust applications. However, trying to draw information from them on a regular basis is nearly impossible because the applications were never constructed with integration in mind, and they are no easier for the DSS analyst to decipher than they are for anyone else. But integration is not the only difficulty the analyst meets in trying to satisfy an informational request.

A second major obstacle is that there is not enough historical data stored in the applications to meet the needs of the DSS request. Next, you run into the lack of integration across applications. The applications were built to service the needs of current balance processing. They were never designed to hold the historical data needed for DSS analysis.

It is no wonder, then, that going to existing systems for DSS analysis is a poor choice. But where else is there to go? The systems found in the naturally evolving architecture are simply inadequate for supporting information needs. They lack integration, and there is a discrepancy between the time horizon (or parameter of time) needed for analytical processing and the available time horizon that exists in the applications.