A successful Data Warehouse implementation
Internal statistics warehouse
- “Application Management” discipline
- Over 118 weeks of data
- Over 46,000 queries
- All tables and all columns referenced by all queries ever run in production
- Weekly merge from admin into stats DB
Four broad categories of financial products
About 12 operational systems
User Requirements:
- “Yeah, we might need that, include it, also.”
Over 1000 data items
Fairly simple set of measures
Outstanding debt
Recent Payments
Complex set of dimensions
100’s of attributes
- Some well understood, many very murky
Common feature of most businesses
“Building a warehouse is like sending a person to the Moon” – Matthew Schall, IBM
Two Solutions
- Document all special-cases, insulate the users
- Discover special-cases, involve the users
Don’t insulate the users from operational details
Remember your purpose
Data → Information → Decision → Action
The warehouse doesn’t provide reports!
- It supports actionable decisions
– Operational systems are full of unused and misused features.
- No one has 3 Christmas Club accounts
- So the xmas_club_3 field is now used to track accounts created due to live-check promotions
- This may be graceless, but it’s a fact
You will need very detailed meta data
A simple Filename, byte offset, domain map is inadequate
You must capture detailed semantics
You must guard against the evil of
- Semantic Heterogeneity
- “I call them customers, you call them accounts”
User access to dimension information
- No need to keep private dimension facts
- No need for complex desktop processing
- No need for large result sets
Production queries
- Reduce (nearly) identical queries
- Additional time-domain data
Warehouse has multiple users
Warehouse has a "generic" business focus
Warehouse works correctly
- It isn’t broke, but it isn’t working
- Datamart is a business-focused subset of the warehouse
Two Paths
- Warehouse Datamarts
- Datamarts Warehouse
Start from Warehouse
- Sychronization of data guaranteed
Start from Datamart
- Very hard to reconcile semantics or timing
118 reports produced by one department
Some reports are so complex only key individuals produce them
- And they recently quit!
Other reports have interesting discrepancies
Some reports include manually fed data
Some reports include adjustments
About 24 queries produce all reports
Complex reports involve non-standard processing
- non-star-schema joins
- additional dimension information
4 new (simple) aggregates required
- Some new dimensions
Add extract and rollup process to warehouse load to produce aggregates
Not always the right solution
Ours takes 24+ hours, once each month
- YMMV
Add to Business Objects repository
Reconcile Dimensions
- Desktop - Datamart - Warehouse
Heavy use of datamart objects
Able to produce comprehensive reports 24 hours after load
- With very fast queries against aggregate
All Private dimension information integrated into datamart
Partially integrated into warehouse
User changes to dimensions now visible
- Fewer mystery totals on reports
Use of aggregates standardizes results
Fewer rogue analysts in user department
Demand for “adjustment” to match G/L data
- down to the dollar
- unprecedented accuracy
Warehouse provides bulk data
Datamart provides needed business rules
Transform data into information
Need both to make actionable decisions
Decision-making is what a warehouse or datamart supports
- not reporting
| version: | 1 |
|---|---|
| date: | 1998 |