Fantastic Fireworks Data Warehousing Schema Design

1 Executive Summary

Fantastic Fireworks is a successful enterprise organization which continues to evolve through smart management and strategic vision. As a modern enterprise in the current business world, it requires the necessary strategy information to help make business decisions and compete commercially with other companies. After a general understanding of Fantastic Fireworks, a few conclusions can be drawn:

The management of data could be difficult. Fantastic Fireworks uses individual customer table for each store rather than a unified customer table. Also, it uses two separate systems, the inventory system, and sales system. These conditions make data integration and aggregation more difficult, and to some extent, will have a negative impact on the operation of the company.
It is necessary to store historical data. Data like the old addresses of customers would still be valuable for analytic use even if they are not up-to-date information. This data is crucial to the statistics of the best-selling products and the company’s most profitable customers. As a result, the data housing solution must provide a solution that can store historical data.
A large number of transactions are processed at Fantastic Fireworks every day. In order to improve the efficiency of daily operating and save costs, Fantastic Fireworks needs the system to provide a more convenient way to query data.

By adopting the data warehousing solution, these problems would be solved correspondingly:

As a single repository of organization data, the data warehouse has natural advantages for data integration and aggregation. The same type of data stored in different stores and systems will be converted into a uniform format to increase data consolidation.
A data warehouse has many ways of handling historical information. By introducing version numbers for the same customer, the system will be able to more easily track that customer’s address.
Data warehouse makes data available for users and easy to query. Like normal databases, its data could be queried. However, data warehouses are more intuitive and friendly to users without IT support. It allows users to run queries and get results online.

2 Data Warehousing Overview

Data warehouses are systems which are used to help organizations do decision-making and acquire strategic information. A data warehouse is a collection of strategies that provide all types of data support for the decision-making process of the enterprise. Different from normal databases, which focus on day to day operations, data warehouses focus more on helping users with revenue analysis and market segmentation. To have a better understanding of data warehousing, a few concepts need to be introduced.

Data mart: data mart is a subset of data warehouse, which represents a single business process. Different from the enterprise-width depth of data warehouse, a data mart is often controlled by a single department. In general, databases are used to record transactions, which means they are friendly to updating but not reading. In this case, data marts are necessary to be implemented for the convenience of users querying.

OLAP & OLTP: data processing can be divided into two broad categories: online analytical processing (OLAP) and online transaction processing (OLTP). OLTP mainly focuses on basic and daily transaction like bank transactions. The main application of OLTP is the use of databases. In an OLTP system, the memory efficiency of a database is the key, because there is a large number of transactions is processed per second.

OLAP is the primary application of data warehouse system. OLAP supports complex analytical operations, focuses on decision support, and provides intuitive query results. Data warehouses, the main application of OLAP, are targeted for decision support. Historical, summarized and consolidated data is more important than detailed, individual records (Chaudhuri, S., & Dayal, U. 1997). With data warehouse and OLAP, the decision makers and senior managers can make better decisions based on strategic information that can be relied on.

Dimensional modeling (star): dimensional modeling correlated with Bottom Up approach, one of the approaches to design a data warehouse. It is a design technique to identify and implement business processes. In dimensional modeling, business processes are measured and described from different dimensions. The most core business transaction will be placed in a table called fact table. A row in a fact table corresponds to a measurement. A measurement is a row in a fact table. All the measurements in a fact table must be at the same grain. (Kimball, R., & Ross, M. 2011)

There are other tables, called dimension tables, used to help describe the business event from different dimensions. There are many attributes contained in a dimension table, which are used to describe the rows in the table. Dimension tables are the entry points into the fact table. Robust dimension attributes deliver robust analytic slicing and dicing capabilities. The dimensions implement the user interface to the data warehouse. (Kimball, R., & Ross, M. 2011)

In general, data warehousing is the foundation of successful business intelligence. It provides a single repository for analytical use and decision making. By using data warehouses, managers will no longer make business decisions with limited data and their intuition. Besides, data warehouses allow users to query data without IT supports, which saves more time and money. The historical data in the data warehouse enables the enterprise to predict and evaluate the future through the analysis of different periods.

3 Dimensional Model

This part will firstly state the methodology and architecture that used to design the data warehouse. Then it will demonstrate and justify the process and consideration in data warehouse design.

3.1 Designing Methodology and Architecture

In this project, the Bottom Up approach is adopted to design the data warehouse. The requirements of Fantastic Fireworks contains the business processes as well as the required source data samples that are used in building the data warehouse schema model. Then the design is started with a single business process, analyzing its measurement and granularity, defining its assist dimensions. This analyzing process is iterated with all other business processes. Finally, these business processes are joint together by connecting the conformed dimensions to generate the whole view of the data warehouse model.

Considering the choice of the Bottom Up design methodology, the data warehouse was conducted on the basis of the Data Mart Bus architecture which is highly compatible with Kimball’s approach. With regard to this architecture, we choose one of the business subjects from the real business environment as the supermart, then we gradually combine other data marts by conforming the common dimensions.

3.2 Star Schema Design

A star schema comprises one or more centric fact tables and each fact table links its relevant dimension tables. The star metaphor treats the fact table as the “center” of the star and the dimension table as the “point” (Moody & Kortink, 2003). More specifically, the fact table corresponds to a certain business process, while the dimension tables are responsible for providing the analyzing aspects, considered as the granularity. The dimensional modeling consists of four steps, including the finding the business process; declare the granularity; defining the dimension tables and defining the fact tables. Figure 1 shows the star schema of the Fantastic Firework data warehouse. The design of this star schema will be discussed and justified with these four design steps.

Figure 1 The Star Schema of Fantastic Firework Data Warehouse

3.2.1 Business Processes

According to the business process of the Fantastic Fireworks company, there are two main business subjects in its business domain, including the sales and inventory. Besides, the data sources of the data warehouse are stored in the sales system and inventory system respectively, which should take into consideration when unifying the data format in the conformed dimensions.

3.2.2 Granularity

Granularity addresses the lowest level of detailed information stored in the data warehouse. It is used to slice the information cube to satisfy the different information demands of end users. Consequently, different fact tables may have different granularities.

Regarding the Fantastic Firework project, the sales and inventory are two main business processes(fact tables) of the data warehouse. Referring to the business requirements, the sales answers the question about which consumer has purchased how many products in which store and was served by which employee within a given time. Considering the inventory business event, it answers the question about which product has been ordered how many at a specific point of time. Therefore, the consumer, product, store, employee, and date are the identified granularity of the sales process. The product and date is the required granularity of the inventory process.

3.2.3 Dimension Tables

As we mentioned above, the dimension table is used to provide the fact tables with multiple analyzing aspects to support multiple detailed levels of information queries. Consequently, we can extract dimension tables from the granularities required by the related business processes. Figure 2 demonstrates the dimensions of each business processes and the common dimension between these two processes.

Figure 2 Bus Matrix of Fantastic Firework Data Warehouse

It is clear that the sales has five analyzing dimensions including consumer, product, store, employee as well as date, while the inventory just comprises product and date dimensions. It is worth mentioning that these two business processes have two common product dimensions.

When designing the data structure of each dimension table, it is important to define the appropriate metadata to integrate the source data and only load the data that is needed. The following paragraphs discuss the detailed design and consideration of each dimension tables with referring to Figure 1. The indexes that are used to support fast query in each dimensional table will be discussed in detail in Appendix 2.

Consumer Dimension Table

Attributes: consumerID, validUnitlDate, name, dob, age, postcode
Index: primary(consumerID, validUntilDate), columnstore(age)

The consumer might change the address(postcode), so it is a slowly changing dimension. The validUntilDate attribute in the composite primary key can be used to distinguish customers’ valid postcode in different periods and store all the history records. Besides, the current four stores have the four independent consumer tables, which can be integrated into a single customer table in the data warehouse. Apart from the existed attributed in the original system, the age field is added to the consumer dimension table as the summarized information of the customer that can assist in grouping customer by age.

Product Dimension Table

Attributes: partNumber, validFromDate, description, unitPrice
Index: primary(partNumber, validFromDate)

The product dimension is the conformed dimension between the sales process and inventory process. It integrates the product table in the inventory system together with the product price list table in the sales system. Considering the unit price of the product might vary sometimes, the validFromDate field is used to distinguish the current price with the historical prices. Besides, the natural key including the attributes of partNumber and the validFromDate act as the composite primary key in the product dimension table.

Store Dimension Table

Attributes: storeID, description, address
Index: primary(storeID)

In the store dimension table, we directly use the natural key of the original database as the primary key and each row in the table identifies a specific store.

Employee Dimension Table

Attributes: employeeID, name, commissionRate
Index: primary(employeeID)

We directly use the natural key in our employee dimension table to distinguish employees. Each row in this table stands for a sales person.

Date Dimension Table

Attributes: dateID, Day, Week, Month, Quarter, Year
Index: primary(dateID), column(Day/Week/Month/Quarter/Year), clustered(Day, Week, Month, Quarter, Year)

The date dimension table is the basic table in almost all data warehouses. The dateID is the surrogate key in the data warehouse. We use the rest attributes to identify a time point to customizable analyzing facts of sales and inventory processes.

3.2.4 Fact Tables

As we mentioned above, the fact table corresponds to the relevant business processes. Therefore, this data warehouse consists of two fact tables including the sales fact table and the inventory fact table. Each fact table comprises the business measurements which used to provide the valuable summarized information that can support decision making. The fact table also contains the foreign keys that link with the dimension tables to do information slicing and selection.

Sales Fact Table

Attributes: saleID, Employee_employeeID, Consumer_consumerID, Consumer_validUntilDate, Product_partNumber, Product_validFromDate, Date_dataID, Store_storeID, unitSales, dollarSales, margin, commission
Index: primary(saleID), columnstore(margin, commission, unitSales), foreign key indexes

The sales fact table integrates the sales table and sales item table in the original source database. The saleID is used as the surrogate key of sales fact table rather than the oderID and line number in the original system. The sales fact table contains the additive measurements of unitSales, dollarSales, margin and commission which can be accumulated among all linked dimensions. In addition, there are five foreign keys that link to the five dimension tables respectively to do information filter and selection around the specific customer, employee, store, product and purchase time in sale records.

Inventory Fact Table

Attributes: inventoryID, Date_dateID, Product_partNumber, Product_validFromDate, productDescription, quantity, unitCost
Index: primary(inventoryID), foreign key indexes

The inventory fact table uses the inventoryID as the surrogate key to identify the inventory record uniquely. It has two foreign keys to connect with the product dimension table as well as date dimension table so as to assist the analysis of the inventory orders with a given product at a given time.

4 Key Decisions

To begin with, there are some key attributes in the following sections that may cause misunderstanding. In order to provide a detailed explanation, some Data Dictionary Tables are given in Appendix 1.

4.1 Key Customers

In order to identify the key customers and their age groups in a given time period, the data warehouse should provide information about the margin, unitSales, dollarSales, dateID (date dimension), customerID (customer dimension), age and postcode. In this case, the data that need to be concerned most are margin, unitSales, dollarSales measurements (in Sales Table) and all attributes in the Customer table. Some other corresponding attributes are in date dimension. Additionally, the summarized information including the dollarSales and margin of each sales record are calculated at the data load stage. The following steps demonstrate how to locate the key customers.

Select all the sales records and the required information by joint the sales, customer and date table, in the given time period (e.g. All records in last year)
Group the selected records which share the same consumerID
Calculate the total margin, unitSales, dollarSales in each customer group
Sort these group with one of the three measurements (margin, unitSales, dollarSallers) in descending order with other required information
The top x sort results in the rank list can be used as the top x profitable customers. These top x customers can be grouped with the different age grades

With the results of key customers, it would be possible for the Fantastic Fireworks to make business decisions. The age attribute in Customer table can be used to analyze the age distribution of the top profitable consumer. With the margin ranking, Fantastic Firework can find out the customers who bring the most profit. Also, to choose the city of new stores, some centralized user clusters must be determined. With the help of the postcode (in Customer table) and margin (in Sales table), it is easy to identify the potential cities that has the high-margin customers. These cities would be more suitable for new stores.

4.2 Most Profitable Product

Those most profitable products in a time period can be elicited by comparing the accumulated margin, unitSales, dollarSales of different products in the given time granularity (monthly, quarterly, yearly). In this case, partNumber (product dimension), dateID (date dimension), margin, unitSales and dollarSales attributes (in Sales table) are most concerned.

The same calculating method as before, the margin attribute is derived by subtracting Cost (unitCost*unitSales) from dollarSales. This calculation is accomplished at the data load stage.

Select all the sales records within the given time period
Group the selected sales records that have the same productNumber
Calculate the total margin, unitSales, dollarSales in each product group
Sort these groups with one of the three measurement (margin, unitSales, dollarSales) in descending order together with the rest measurements, the productID and given time period
The most profitable product is at the top of the rank lit in the given time period

With the sorting list of profitable products, Fantastic Fireworks could make business decisions, like putting those highly profitable products in prominent places in the store in the particular month of the year.

4.3 Most Profitable Store

Based on the storeID (store dimension), dateID (date dimension), unitSales, dollarSales and margin attributes (in Sales table), the most profitable store in the given time period could be easily derived.

Select all the sales records between the given time period (monthly in the previous year)
Group the selected sales records which share the same storeID
Calculate the total margin, total unitSales, total dollarSales in each group
Sort the group with calculated margin (or unitSales or dollarSales) in descending order together with the rest summarized measurements, the given time period and storeID
The most profitable store in each month of the previous year is on the front of the sorting list

Some business decisions could be made with the help of these results. it would be wise to expand the floor space of those stores that own more sale records, earn more profits or have more dollar sales.

4.4 Most Profitable Time Period

The most profitable time period can be measured by adding the margin of all sales records with which the date associated with the dateID is located within the given time period. To identify the most profitable time period, the dateID (date dimension) and the margin in the sales table should be concerned.

Grouping the sales records with a given time granularity(weekday, monthly, quarterly or yearly)
Calculate the total margin in each group
Sorting the calculated margin and the corresponding time granularity in descending order, and the most profitable time should list in the front of the result.

With the help of busy time period like seasons, weekdays and monthly, Fantastic Fireworks is able to optimize store opening time more reasonable. Also, according to the current stocking strategy, most of the reordering decisions are made automatically. In order to prevent the sudden shortage of products, the manager of the inventory system could adjust the reordering decisions with the consideration of the busy time period.

4.5 Key Employees

The key employees could be measured by the commission or by the number of served sales records in a given time period. To identify the key employees in a specific time period, the commission (in Sales table), dateID (date dimension) and employeeID (employee dimension) should be taken into consideration. The commission in each record is calculated by dollarSales (in Sales table) multiply the corresponding commissionRate (in Employee table) according to the employeeID.

Select all the sales records from sales fact table within the given time period
Group the selected sales records which share the same employeeID
Calculate the total commission in each group as the total commission of that employee in the given time period
Sort the calculated commission with the corresponding employee information with descend order. The key employees would be seen in the front rank.

The approach to get the key employee who owns the most sales records can be elicited similarly by count the number of records in each group.

With these two different measurements, Fantastic Fireworks is able to evaluate its employees in different ways. Further strategies of staff management would be adjusted based on that.

Appendix 1: Data Dictionary

Product Table

Figure 3 Product Table

Inventory Table

Figure 4 Inventory Table

Sales Table

Figure 5 Sales Table

Customer Table

Figure 6 Customer Table

Employee Table

Figure 7 Employee Table

Store Table

Figure 8 Store Table

Date Table

Figure 9 Date Table

Appendix 2: SQL Statements

SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0;
SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0;
SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION';

-- -----------------------------------------------------
-- Schema Fireworks
-- -----------------------------------------------------

-- -----------------------------------------------------
-- Schema Fireworks
-- -----------------------------------------------------
CREATE SCHEMA IF NOT EXISTS `Fireworks` DEFAULT CHARACTER SET utf8 ;
USE `Fireworks` ;

-- -----------------------------------------------------
-- Table `Fireworks`.`Store`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `Fireworks`.`Store` (
  `storeID` INT NOT NULL,
  `description` VARCHAR(45) NULL,
  `address` VARCHAR(45) NULL,
  PRIMARY KEY (`storeID`))
ENGINE = InnoDB;


-- -----------------------------------------------------
-- Table `Fireworks`.`Product`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `Fireworks`.`Product` (
  `partNumber` VARCHAR(45) NOT NULL,
  `validFromDate` DATETIME NOT NULL,
  `description` VARCHAR(45) NULL,
  `unitPrice` DOUBLE NULL,
  PRIMARY KEY (`partNumber`, `validFromDate`))
ENGINE = InnoDB;


-- -----------------------------------------------------
-- Table `Fireworks`.`Customer`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `Fireworks`.`Customer` (
  `customerID` VARCHAR(45) NOT NULL,
  `validUntilDate` DATETIME NOT NULL,
  `name` VARCHAR(45) NULL,
  `dob` DATETIME NULL,
  `age` INT NULL,
  `postcode` VARCHAR(45) NULL,
  PRIMARY KEY (`customerID`, `validUntilDate`),
  INDEX `age` (`age` ASC) VISIBLE)
ENGINE = InnoDB;


-- -----------------------------------------------------
-- Table `Fireworks`.`Employee`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `Fireworks`.`Employee` (
  `employeeID` VARCHAR(45) NOT NULL,
  `name` VARCHAR(45) NULL,
  `commissionRate` DOUBLE NULL,
  PRIMARY KEY (`employeeID`))
ENGINE = InnoDB;


-- -----------------------------------------------------
-- Table `Fireworks`.`Date`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `Fireworks`.`Date` (
  `dateID` INT NOT NULL,
  `Day` INT NULL,
  `Week` INT NULL,
  `Month` INT NULL,
  `Quarter` INT NULL,
  `Year` INT NULL,
  PRIMARY KEY (`dateID`),
  INDEX `Day` (`Day` ASC) VISIBLE,
  INDEX `Week` (`Week` ASC) VISIBLE,
  INDEX `Month` (`Month` ASC) VISIBLE,
  INDEX `Quarter` (`Quarter` ASC) VISIBLE,
  INDEX `Year` (`Year` ASC) VISIBLE,
  INDEX `Date` (`Day` ASC, `Week` ASC, `Month` ASC, `Quarter` ASC, `Year` ASC) VISIBLE)
ENGINE = InnoDB;


-- -----------------------------------------------------
-- Table `Fireworks`.`Sales`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `Fireworks`.`Sales` (
  `saleID` INT NOT NULL,
  `Employee_employeeID` VARCHAR(45) NOT NULL,
  `Date_dateID` INT NOT NULL,
  `Store_storeID` INT NOT NULL,
  `Product_partNumber` VARCHAR(45) NOT NULL,
  `Product_validFromDate` DATETIME NOT NULL,
  `Customer_customerID` VARCHAR(45) NOT NULL,
  `Customer_validUntilDate` DATETIME NOT NULL,
  `unitSales` INT NULL,
  `salePrice` DOUBLE NULL,
  `dollarSales` DOUBLE NULL,
  `margin` DOUBLE NULL,
  `commission` DOUBLE NULL,
  PRIMARY KEY (`saleID`),
  INDEX `fk_Sales_Employee_idx` (`Employee_employeeID` ASC) VISIBLE,
  INDEX `fk_Sales_Date1_idx` (`Date_dateID` ASC) VISIBLE,
  INDEX `fk_SalesItem_Store1_idx` (`Store_storeID` ASC) VISIBLE,
  INDEX `margin` (`margin` ASC) VISIBLE,
  INDEX `commission` (`commission` ASC) VISIBLE,
  INDEX `fk_Sales_Product1_idx` (`Product_partNumber` ASC, `Product_validFromDate` ASC) VISIBLE,
  INDEX `fk_Sales_Customer1_idx` (`Customer_customerID` ASC, `Customer_validUntilDate` ASC) VISIBLE,
  INDEX `unitSales` (`unitSales` ASC) VISIBLE,
  CONSTRAINT `fk_Sales_Employee`
    FOREIGN KEY (`Employee_employeeID`)
    REFERENCES `Fireworks`.`Employee` (`employeeID`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_Sales_Date1`
    FOREIGN KEY (`Date_dateID`)
    REFERENCES `Fireworks`.`Date` (`dateID`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_SalesItem_Store1`
    FOREIGN KEY (`Store_storeID`)
    REFERENCES `Fireworks`.`Store` (`storeID`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_Sales_Product1`
    FOREIGN KEY (`Product_partNumber` , `Product_validFromDate`)
    REFERENCES `Fireworks`.`Product` (`partNumber` , `validFromDate`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_Sales_Customer1`
    FOREIGN KEY (`Customer_customerID` , `Customer_validUntilDate`)
    REFERENCES `Fireworks`.`Customer` (`customerID` , `validUntilDate`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;


-- -----------------------------------------------------
-- Table `Fireworks`.`Inventory`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `Fireworks`.`Inventory` (
  `inventoryID` INT NOT NULL,
  `Date_dateID` INT NOT NULL,
  `Product_partNumber` VARCHAR(45) NOT NULL,
  `Product_validFromDate` DATETIME NOT NULL,
  `productDescription` VARCHAR(45) NULL,
  `quantity` INT NULL,
  `unitCost` DOUBLE NULL,
  PRIMARY KEY (`inventoryID`),
  INDEX `fk_Inventory_Date1_idx` (`Date_dateID` ASC) VISIBLE,
  INDEX `fk_Inventory_Product1_idx` (`Product_partNumber` ASC, `Product_validFromDate` ASC) VISIBLE,
  CONSTRAINT `fk_Inventory_Date1`
    FOREIGN KEY (`Date_dateID`)
    REFERENCES `Fireworks`.`Date` (`dateID`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_Inventory_Product1`
    FOREIGN KEY (`Product_partNumber` , `Product_validFromDate`)
    REFERENCES `Fireworks`.`Product` (`partNumber` , `validFromDate`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;


SET SQL_MODE=@OLD_SQL_MODE;
SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS;
SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS;

References

[1] Chaudhuri, S., & Dayal, U. (1997). An overview of data warehousing and OLAP technology. ACM Sigmod record, 26(1), 65-74.

[2] Kimball, R., & Ross, M. (2011). The data warehouse toolkit: the complete guide to dimensional modeling. John Wiley & Sons.

[3] Moody, D. L., & Kortink, M. A. R. (2003). From ER models to dimensional models: bridging the gap between OLTP and OLAP design, Part I. Business Intelligence Journal, 8, 7-24.