| date: | 2005-12-01 11:24:46 |
|---|---|
| category: | Data Structures and Algorithms |
The open question is “What is the cost of fragmentation?”
The cost has some absolute components and some relative components. Since fragmentation is difficult to avoid except through grotesque over-allocation of space, the issue is to control fragmentation through normalization. A more important pair of questions, then, are these:
The absolute costs are relatively easy to identify: they are the costs of defragmenting. These are the downtime to defragment, and risk of failure in this processing which adds complexity but does not create value.
The relative costs can be measured through a comparison between three designs: denormalized, partially normalized and fully normalized.
Here is the denormalized MESS design. The ev’s are “Events” which are filled in by updates, leading to fragmentation. The business processing makes ev1 mandatory, and the other events may, or may not happen.
CREATE TABLE MESS(
key VARCHAR(20),
ev1text VARCHAR(100),
ev1date TIMESTAMP,
ev2text VARCHAR(100),
ev2date TIMESTAMP,
ev3text VARCHAR(100),
ev3date TIMESTAMP );
Here’s a normalized design which controls fragmentation by isolating ev2 and ev3 event data into a separate table. The M2 table can be sparse and can fragment.
CREATE TABLE M1C(
key VARCHAR(20),
ev1text VARCHAR(100),
ev1date TIMESTAMP );
CREATE TABLE M1X(
key VARCHAR(20),
ev2text VARCHAR(100),
ev2date TIMESTAMP,
ev3text VARCHAR(100),
ev3date TIMESTAMP );
CREATE UNIQUE INDEX M1C_X1 ON M1C( key );
CREATE UNIQUE INDEX M1X_X1 ON M1X( key );
The most interesting design is the following. This uses inserts instead of updates to fold in the additional data.
CREATE TABLE M2C(
key VARCHAR(20) );
CREATE TABLE M2E(
key VARCHAR(20),
evt NUMBER,
evtext VARCHAR(100),
evdate TIMESTAMP );
CREATE UNIQUE INDEX M2C_X1 ON M2C( key );
CREATE INDEX M2E_X1 ON M2E( key );