Data Warehouse Design Patterns¶

Steven F. Lott, Consultant¶

Data Warehouse Design Patterns¶

1. Objectives and Background
2. Patterns and Pattern Language
3. Warehouse Example Patterns
- Data Warehouse
- Background Query Processing
4. Conclusion

1. Objectives and Background¶

Discuss patterns (briefly)
Two detailed examples
- Warehouse
- Background Query Processing
Eight summary examples

Background¶

A successful Data Warehouse implementation, acromym is “ACE”

Physical Size
Usage
Performance
History

Physical Size¶

Central fact table: over 120 million records

Activity on 6 million financial accounts

Size is about 400Gb

We use Oracle/RDB compression to keep it small

Complex Snowflake schema

Usage¶

About fifty users

Average of 400 queries per week, retrieving 2.1 million rows.

Bursts of 600 queries per week fetching 10.6 million rows.

All queries ad-hoc

No aggregation in advance!

Performance¶

Average performance per query

8500 rows each
26 minutes each

Average elapsed time per row returned

0.9 seconds each

History¶

Internal statistics warehouse

Application Managament
Over 118 weeks of data
Over 46,000 queries
all tables and columns referenced by all queries ever run in production

2. Patterns and Pattern Language¶

Provide a name to standardize discussion

Describe a problem

Describe a solution

Enumerate the consequences of design decisions

Why use them...

Why Use Patterns?¶

Provide a common framework for describing problems and solutions

Easy to catalog successful approaches

Easy to make incremental improvements

Pattern Language includes a template for describing a pattern

Pattern Language¶

Intent - purpose of the pattern,

Aliases - other names,

Motivation - background and justification for this approach,

Applicability - circumstances for use,

Structure - object model diagram,

Participants - summary of classes involved,

Collaborations - clients of and services to,

Consequences - benefits and liabilities,

Implementation - issues to be solved as part of the implementation

Related Patterns - other patterns which are similar to, incorporated in or which would incorporate this pattern,

References - my own experience

Sample Code - skipped

Known Uses - ACE warehouse

3. Warehouse Example Patterns¶

Background Query Processing

Warehouse Load Utility

Warehouse Statistics

Restricted Update Interface

Metadata Repository

Data Mining

Data Warehouse¶

Intent

want to keep history of facts on completed business transactions

no updates – only insert and retrieval

Aliases

OLAP, Decision Support System, Executive Information System

Motivation

Decision support different from day-to-day business operations.

Historical analysis different from decision support.

Relational database performance degrades with the size of the database.

Contrast warehouse (On-Line Analytical Processing–OLAP) vs. operational (On-Line Transaction Processing–OLTP)

Applicability. Use a Data Warehouse when:

Keeping historical data

Retaining accurate, uniform history.

Structure

Data Warehouse is an aggregate of one or more Star Schemas in the role of User-DB and Maint-DB.

User-DB is a copy of Maint-DB, provided to users to keep availability close to 24x7.

Attributes

two operating states: user retrieval or maintenance - a number of products, i.e. data from operational systems. Each product has state with date of most recent load - meta-data with source and transformations

Operations

Fact Load - Fact Proof - Dimension Maintenance - Clone. We clone the maintenance database to create a user database - User Query

Participants

warehouse is only an aggregate of star-schemae - user GUI is the most visible component - a background query processor permits users to do other things on their desktop (MS-Windows limitation) - load utilities do transformations; they may automate dimension maintenance - log everything in an administrative database - trusted user interface for database loads; no informal developer access

Collaborations

fed from operational systems - feeds data marts - user GUI may be a collaborator instead of a participant. May be outside the warehouse – user-selected tools - metadata maintenance may be collaborator or participant; often meta-data kept separate from warehouse (we use excel spread-sheet!)

Consequences

remove (expensive) history from operational databases - warehouse data must be final. What about adjustments? - to get availability to 24x7 clone maintenance DB to make user DB. Our database can be cloned in 4 hours

Implementation. Solutions will be required for all pattern elements

Star Schema - User GUI - Background Query Processing - Administrative DB - Trusted User Interface - Fact Load - Warehouse Metadata

Related Patterns

Operational Database

A Warehouse GUI

Background Query Processing

Warehouse Load Utilities

Warehouse Statistics

Warehouse Operator Interface

Star Schema¶

Problem

permit slicing and dicing by various keys

Solution

prejoin numeric facts with all relevant dimensions

Consequences

complex prejoined load

Snowflake Schema¶

Problem

dimensions are complex

Solution

code dimensions with surrogate key

dimension table has attributes

Consequences

complex dimension maintenance during loads

Warehouse GUI¶

Problem

support end-user ad-hoc query

Solution

show warehouse state, construct queries, submit for processing, collect results

Consequences

repository for query construction, background query processor

Background Query Processing¶

Intent

How to remove the processing load from the user desktop.

Permit compression of results before sending back to the desktop.

Motivation

user desktop limitations:

may not be multi-tasking

may not be reliable enough for long-running queries

may not be secure enough to leave unattended

DB server may not have enough performance to do long-running queries

network connection may not be suitable for large query results

Applicability. Use a Background Query Processor when:

desktop unsuitable for long-running queries

Structure

a queue of SQL requests

a query executor; creates results, reports status

a results manager; compress, cache and transmit

Participants

queue of SQL requests (SQL and start time) states: waiting, ready-to-run, finished.

query executor gets SQL, establishes a DB connection, runs the SQL query, collects status.

results manager compresses, caches and transmits results.

Collaborations

queue fed from warehouse GUI

status feed to warehouse GUI

results compressed and transferred to file server to users workstation

Consequences

protocol to enqueue a query; we insert SQL text into a table which represents the query queue.

protocol to fetch results; we use PKZip and FTP

protocol to show status; we put status into rows of a table

background control will be large and complex

multiple subtypes of background executions

ad-hoc vs. canned production

different result formats (depends on desktop tools)

Implementation

pick a mechanism to enqueue queries

pick a mechansim to execute the query; we use a sleep loop to poll for work

pick a mechanism to reply with status

pick a mechanism to send back results; our GUI is Business Objects; we reverse engineered their mechanism so we could replace it with PKZIP and FTP transfers

Related Patterns

Star Schema will be queried by background processes running on a query server.

Warehouse GUI enqueues queries, checks status and retrieves results.

record all queries and all performance-related data in the Warehouse Statistics DB.

Warehouse Load Utility¶

Problem

Move normalized data to denormalized star or snowflake schema

Solution

Standardized load utilities

Consequences

Unique subclass for each legacy system

Warehouse Statistics¶

Problem

Analyze performance and usage

Solution

Keep load and query processing statistics

Consequences

Need meta warehouse for statistics

Need to parse ad-hoc SQL

Restricted Update Interface¶

Problem

must assure quality of warehouse data

Solution

controlled, trusted, production update process

user proof and sign-off

Consequences

no developer or user write access to DB

Metadata Repository¶

Problem

what is meaning and provenance of data?

maintenance of extracts, loads, etc.

Solution

repository of meta data to describe source, business rules, and warehouse location

Consequences

must be kept up-to-date and used for all maintenance

Data Mining¶

Problem

users don’t have a useful business model

Solution

look for correlations among facts

Consequences

haphazard queries

4. Conclusion¶

Patterns capture knowledge

Patterns ease learning

Patterns facilitate incremental improvement

Warehouse is built from standard patterns

Star Schema
Warehouse GUI
Load Utilities
etc.

Pick and choose to make your own implementation

Patterns Leverage Experience

Focus on elements
Make improvements
Not overwhelmed by whole architecture

version:	1
date:	1999

Data Warehouse Design Patterns¶

Steven F. Lott, Consultant¶