A successful Data Warehouse implementation, known by the acronym “ACE”
Central fact table: over 120 million records
Activity on 6 million financial accounts
Size is about 400 GB
Complex Snowflake schema
About fifty users
Average of 400 queries per week, retrieving 2.1 million rows.
Bursts of 600 queries per week fetching 10.6 million rows.
All queries ad-hoc
Average performance per query
- average elapsed time per row returned: 0.9 seconds
Internal statistics warehouse
Provide a name to standardize discussion
Describe a problem
Describe a solution
Enumerate the consequences of design decisions
Why use them...
Provide a common framework for describing problems and solutions
Easy to catalog successful approaches
Easy to make incremental improvements
Pattern Language includes a template for describing a pattern
Intent - purpose of the pattern,
Aliases - other names,
Motivation - background and justification for this approach,
Applicability - circumstances for use,
Structure - object model diagram,
Participants - summary of classes involved,
Collaborations - clients of and services to,
Consequences - benefits and liabilities,
Implementation - issues to be solved as part of the implementation
Related Patterns - other patterns which are similar to, incorporated in or which would incorporate this pattern,
References - my own experience
Sample Code - skipped
Known Uses - ACE warehouse
Intent
- want to keep history of facts on completed business transactions
- no updates – only insert and retrieval
Aliases
- OLAP, Decision Support System, Executive Information System
Motivation
- Decision support different from day-to-day business operations.
- Historical analysis different from decision support.
- Relational database performance degrades with the size of the database.
- Contrast warehouse (On-Line Analytical Processing–OLAP) vs. operational (On-Line Transaction Processing–OLTP)
Applicability. Use a Data Warehouse when:
- Keeping historical data
- Retaining accurate, uniform history.
Structure
- Data Warehouse is an aggregate of one or more Star Schemas in the role of User-DB and Maint-DB.
- User-DB is a copy of Maint-DB, provided to users to keep availability close to 24x7.
Attributes
- two operating states: user retrieval or maintenance
- a number of products, i.e. data from operational systems; each product's state includes the date of its most recent load
- meta-data with source and transformations
Operations
- Fact Load
- Fact Proof
- Dimension Maintenance
- Clone: we clone the maintenance database to create a user database
- User Query
Participants
- the warehouse is only an aggregate of star schemas
- the user GUI is the most visible component
- a background query processor permits users to do other things on their desktop (MS-Windows limitation)
- load utilities do transformations; they may automate dimension maintenance
- log everything in an administrative database
- trusted user interface for database loads; no informal developer access
Collaborations
- fed from operational systems
- feeds data marts
- the user GUI may be a collaborator instead of a participant; it may be outside the warehouse, with user-selected tools
- metadata maintenance may be a collaborator or a participant; meta-data is often kept separate from the warehouse (we use an Excel spreadsheet!)
Consequences
- remove (expensive) history from operational databases
- warehouse data must be final; what about adjustments?
- to get availability close to 24x7, clone the maintenance DB to make the user DB; our database can be cloned in 4 hours
Implementation. Solutions will be required for all pattern elements
- Star Schema
- User GUI
- Background Query Processing
- Administrative DB
- Trusted User Interface
- Fact Load
- Warehouse Metadata
Related Patterns
- Operational Database
- A Warehouse GUI
- Background Query Processing
- Warehouse Load Utilities
- Warehouse Statistics
- Warehouse Operator Interface
Problem
- permit slicing and dicing by various keys
Solution
- prejoin numeric facts with all relevant dimensions
Consequences
- complex prejoined load
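The prejoined star-schema idea above can be sketched in miniature with SQLite; the table and column names (`fact_activity`, `dim_account`, `dim_date`) are illustrative, not the ACE warehouse's actual schema:

```python
import sqlite3

# Hypothetical miniature star schema: one fact table prejoined (via
# surrogate keys) to two dimension tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_account (account_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_activity (
    account_key INTEGER REFERENCES dim_account,
    date_key    INTEGER REFERENCES dim_date,
    amount      REAL
);
INSERT INTO dim_account VALUES (1, 'East'), (2, 'West');
INSERT INTO dim_date    VALUES (10, 1998), (11, 1999);
INSERT INTO fact_activity VALUES (1, 10, 100.0), (1, 11, 50.0), (2, 11, 25.0);
""")

# "Slice and dice": aggregate the numeric fact by any dimension attribute.
rows = con.execute("""
    SELECT d.region, t.year, SUM(f.amount)
    FROM fact_activity f
    JOIN dim_account d ON f.account_key = d.account_key
    JOIN dim_date    t ON f.date_key    = t.date_key
    GROUP BY d.region, t.year
    ORDER BY d.region, t.year
""").fetchall()
print(rows)  # [('East', 1998, 100.0), ('East', 1999, 50.0), ('West', 1999, 25.0)]
```

Any dimension attribute can replace `region` or `year` in the GROUP BY, which is what makes the schema sliceable by various keys.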
Problem
- dimensions are complex
Solution
- code dimensions with surrogate key
- dimension table has attributes
Consequences
- complex dimension maintenance during loads
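The dimension-maintenance consequence can be sketched as a surrogate-key lookup applied during a load; the class and key names here are hypothetical:

```python
# Hypothetical surrogate-key map for dimension maintenance during a load:
# translate a source system's natural key into a warehouse surrogate key,
# allocating a new key (i.e. a new dimension row) on first sight.
class DimensionKeyMap:
    def __init__(self):
        self._keys = {}   # natural key -> surrogate key
        self._next = 1

    def surrogate_key(self, natural_key):
        if natural_key not in self._keys:
            self._keys[natural_key] = self._next   # new dimension row
            self._next += 1
        return self._keys[natural_key]

dim = DimensionKeyMap()
print(dim.surrogate_key("ACCT-0042"))  # 1  (first time seen: new row)
print(dim.surrogate_key("ACCT-0099"))  # 2
print(dim.surrogate_key("ACCT-0042"))  # 1  (reused on later loads)
```

The complexity in practice comes from keeping this map consistent across loads and source systems, which is why dimension maintenance is called out as its own operation.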
Problem
- support end-user ad-hoc query
Solution
- show warehouse state, construct queries, submit for processing, collect results
Consequences
- repository for query construction, background query processor
Intent
- How to remove the processing load from the user desktop.
- Permit compression of results before sending back to the desktop.
Motivation
- user desktop limitations:
- may not be multi-tasking
- may not be reliable enough for long-running queries
- may not be secure enough to leave unattended
- DB server may not have enough performance to do long-running queries
- network connection may not be suitable for large query results
Applicability. Use a Background Query Processor when:
- desktop unsuitable for long-running queries
Structure
- a queue of SQL requests
- a query executor; creates results, reports status
- a results manager; compress, cache and transmit
Participants
- queue of SQL requests (SQL text and start time) with states: waiting, ready-to-run, finished.
- query executor gets SQL, establishes a DB connection, runs the SQL query, collects status.
- results manager compresses, caches and transmits results.
Collaborations
- queue fed from warehouse GUI
- status feed to warehouse GUI
- results compressed and transferred via a file server to the user's workstation
Consequences
- protocol to enqueue a query; we insert SQL text into a table which represents the query queue.
- protocol to fetch results; we use PKZip and FTP
- protocol to show status; we put status into rows of a table
- background control will be large and complex
- multiple subtypes of background executions
- ad-hoc vs. canned production
- different result formats (depends on desktop tools)
Implementation
- pick a mechanism to enqueue queries
- pick a mechanism to execute the query; we use a sleep loop to poll for work
- pick a mechanism to reply with status
- pick a mechanism to send back results; our GUI is Business Objects; we reverse engineered their mechanism so we could replace it with PKZIP and FTP transfers
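The queue-table and sleep-loop mechanisms above can be sketched as follows; the `query_queue` table, its columns, and the polling helper are assumed names for illustration, not the actual ACE implementation:

```python
import sqlite3
import time

# Sketch of the queue-table protocol: the GUI inserts SQL into a
# 'query_queue' table, and a background executor polls for waiting work
# in a sleep loop, runs the query, and writes status back into the row.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE query_queue (
    id INTEGER PRIMARY KEY, sql TEXT, state TEXT, result TEXT)""")

def enqueue(sql):
    con.execute("INSERT INTO query_queue (sql, state) VALUES (?, 'waiting')",
                (sql,))

def poll_once():
    """One iteration of the executor's sleep loop."""
    row = con.execute(
        "SELECT id, sql FROM query_queue WHERE state = 'waiting' LIMIT 1"
    ).fetchone()
    if row is None:
        time.sleep(0.01)          # nothing to do; sleep before re-polling
        return False
    qid, sql = row
    result = str(con.execute(sql).fetchall())   # run the user's query
    con.execute("UPDATE query_queue SET state = 'finished', result = ? "
                "WHERE id = ?", (result, qid))
    return True

enqueue("SELECT 2 + 2")
while poll_once():
    pass
print(con.execute("SELECT state, result FROM query_queue").fetchone())
# ('finished', '[(4,)]')
```

Compression and transfer of the cached result (PKZip and FTP in ACE) would sit behind the results column rather than in it; this sketch only shows the enqueue, poll, and status-reply mechanisms.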
Related Patterns
- Star Schema will be queried by background processes running on a query server.
- Warehouse GUI enqueues queries, checks status and retrieves results.
- record all queries and all performance-related data in the Warehouse Statistics DB.
Problem
- Move normalized data to denormalized star or snowflake schema
Solution
- Standardized load utilities
Consequences
- Unique subclass for each legacy system
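The standardized-utility-plus-subclass idea can be sketched like this; the class names and record fields are hypothetical:

```python
# Sketch of standardized load utilities with one subclass per legacy
# source system. The base class fixes the load protocol; each subclass
# supplies the transformation from its system's normalized records to
# denormalized star-schema rows.
class LoadUtility:
    def load(self, records):
        return [self.transform(r) for r in records]

    def transform(self, record):          # overridden per legacy system
        raise NotImplementedError

class GeneralLedgerLoad(LoadUtility):
    def transform(self, record):
        # e.g. map the ledger's (acct, cents) record to a fact row
        return {"account": record["acct"], "amount": record["cents"] / 100}

rows = GeneralLedgerLoad().load([{"acct": "A1", "cents": 1250}])
print(rows)  # [{'account': 'A1', 'amount': 12.5}]
```

Each legacy system gets its own `transform`, which is exactly the "unique subclass per legacy system" consequence.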
Problem
- Analyze performance and usage
Solution
- Keep load and query processing statistics
Consequences
- Need meta warehouse for statistics
- Need to parse ad-hoc SQL
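Parsing ad-hoc SQL for the statistics warehouse might look roughly like this; the regex approach is a deliberate simplification (a real parser would handle subqueries, aliases, and quoting):

```python
import re

# Hypothetical statistics parser: ad-hoc SQL must be parsed to find which
# warehouse tables each query touches, so usage can be recorded in the
# statistics DB. A regex over FROM/JOIN clauses is only a rough sketch.
def tables_used(sql):
    return sorted(set(re.findall(
        r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)", sql, re.IGNORECASE)))

sql = """SELECT d.region, SUM(f.amount)
         FROM fact_activity f
         JOIN dim_account d ON f.account_key = d.account_key
         GROUP BY d.region"""
print(tables_used(sql))  # ['dim_account', 'fact_activity']
```

Per-table usage counts derived this way, joined with elapsed-time logs, are the kind of data the meta warehouse would hold.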
Problem
- must assure quality of warehouse data
Solution
- controlled, trusted, production update process
- user proof and sign-off
Consequences
- no developer or user write access to DB
Problem
- what is meaning and provenance of data?
- maintenance of extracts, loads, etc.
Solution
- repository of meta data to describe source, business rules, and warehouse location
Consequences
- must be kept up-to-date and used for all maintenance
Problem
- users don’t have a useful business model
Solution
- look for correlations among facts
Consequences
- haphazard queries
Patterns capture knowledge
Patterns ease learning
Patterns facilitate incremental improvement
Warehouse is built from standard patterns
Pick and choose to make your own implementation
Patterns Leverage Experience
| version: | 1 |
|---|---|
| date: | 1999 |