Source data
The POINT table/file holds 100 records similar to this. Point ranges from 1000 to 5000.
The CUSTOMER table/file holds 100 records similar to shown below.
Requirement
1. Select only customers with points > 1000
2. Join with customer details on CustNo to populate customer details with points
3. Based on customer gender, create salutation field in output as Mr/Ms First name Surname
4. Create 3 feeds named Gold, Silver & Bronze based on following logic
• Gold - point > 4001
• Silver - point between 2001 and 4000
• Bronze - point between 1001 and 2000
All output feeds will have following fields:
CustNo
Salutation
Address
City PostCode
Point
Category (Gold/Silver/Bronze)
The output will look like this.
NB. There will actually be 3 output feeds - for gold, silver and bronze customers.
ETL Solution
The Clover ETL solution will be like shown in next figure.
Database Solution
The solution shown here uses Oracle database.
There will be a table named PROCESS_FLOW which will control the overall flow as explained further.
The sample contents of this table will be like this for the particular example we are discussing here.
CREATE OR REPLACE VIEW CUSTOMER_FILTER AS SELECT * FROM POINT WHERE POINT > 1000;
CREATE TABLE CUSTOMER_FILTER_TABLE AS SELECT * FROM CUSTOMER_FILTER WHERE 1=2;
INSERT INTO CUSTOMER_FILTER_TABLE SELECT * FROM CUSTOMER_FILTER;
COMMIT;
CREATE OR REPLACE VIEW CUSTOMER_JOIN AS
SELECT P.CUSTNO, C.CUSTNAME, C.ADDRESS, C.TOWN, C.POSTCODE, P.POINT, C.SEX
,CASE WHEN P.POINT > 4000 THEN 'GOLD' WHEN (P.POINT > 2000 AND P.POINT <= 4000) THEN 'SILVER' WHEN (P.POINT > 1000 AND P.POINT <= 2000) THEN 'BRONZE' END CATEGORY FROM CUSTOMER_FILTER_TABLE P JOIN CUSTOMER C ON P.CUSTNO = C.CUSTNO; INSERT INTO CUSTOMER_JOIN_TABLE SELECT CJ.CUSTNO, DECODE(SEX,'M','Mr','F','Ms','??') || ' ' || CJ.CUSTNAME , CJ.ADDRESS, CJ.TOWN || ' ' || CJ.POSTCODE , NULL,CJ.POINT, CJ.SEX,CJ.CATEGORY FROM CUSTOMER_JOIN CJ; COMMIT; SELECT * FROM CUSTOMER_JOIN_TABLE; -- POSTCODE WILL BE NULL AS WE COMBINED THAT WITH TOWN IN PREVIOUS FIELD CREATE OR REPLACE VIEW CUSTOMER_GOLD AS SELECT * FROM CUSTOMER_JOIN_TABLE WHERE CATEGORY = 'GOLD'; CREATE TABLE CUSTOMER_GOLD_TABLE AS SELECT * FROM CUSTOMER_GOLD WHERE 1=2; INSERT INTO CUSTOMER_GOLD_TABLE SELECT * FROM CUSTOMER_GOLD; COMMIT; SELECT * FROM CUSTOMER_GOLD; -- list of GOLD customers
The POINT table/file holds 100 records similar to this. Point ranges from 1000 to 5000.
CustNo | Point |
1054 | 2781 |
1055 | 3050 |
1056 | 3714 |
1057 | 3385 |
1058 | 3910 |
1059 | 3041 |
1060 | 3301 |
1061 | 4615 |
1062 | 4300 |
1063 | 5126 |
The CUSTOMER table/file holds 100 records similar to shown below.
CustNo | Name | Address | Town | Postcode | Sex |
1054 | Kirby Roman | QO8 9DS | F | ||
1055 | Brianna Kaufman | Ap # | MD01 6PY | F | |
1056 | Quon Maldonado | EJ7 3BW | F | ||
1057 | Sarah Burke | DuBois | J2 0ZW | F | |
1058 | Alice Pickett | Grambling | C6G 8YI | F | |
1059 | Odette Richard | O5 0QM | F | ||
1060 | Ursa Nunez | P30 8FZ | F | ||
1061 | Caryn Walker | Ap #969-5153 Ipsum. Rd. | GF1 2VR | F | |
1062 | Dai | Pass Christian | B3R 2JV | F | |
1063 | Claudia Calhoun | XU8 7LC | F |
Requirement
1. Select only customers with points > 1000
2. Join with customer details on CustNo to populate customer details with points
3. Based on customer gender, create salutation field in output as Mr/Ms First name Surname
4. Create 3 feeds named Gold, Silver & Bronze based on following logic
• Gold - point > 4001
• Silver - point between 2001 and 4000
• Bronze - point between 1001 and 2000
All output feeds will have following fields:
CustNo
Salutation
Address
City PostCode
Point
Category (Gold/Silver/Bronze)
The output will look like this.
CustNo | Name | Address | Town Postcode | Point | Category |
1073 | Ms Cassidy Stokes | 4120 | Gold | ||
1074 | Ms Germane Olson | 5402 | Gold | ||
1044 | Mr Emery Andrews | 6721 Quis Av. | 3158 | Silver | |
1045 | Mr Leonard Logan | 2569 | Silver | ||
1030 | Mr Stewart Sanford | Ap #246-7429 Vitae Av. | 1522 | Bronze |
NB. There will actually be 3 output feeds - for gold, silver and bronze customers.
ETL Solution
The Clover ETL solution will be like shown in next figure.
Database Solution
The solution shown here uses Oracle database.
There will be a table named PROCESS_FLOW which will control the overall flow as explained further.
The sample contents of this table will be like this for the particular example we are discussing here.
STEP | SOURCE_VIEW | TARGET_TABLE | ACTION | DESCRIPTION |
10 | CUSTOMER_FILTER | CUSTOMER_FILTER_TABLE | TRUNCATE INSERT | selecting customers with point > 1000 |
20 | CUSTOMER_JOIN | CUSTOMER_JOIN_TABLE | TRUNCATE INSERT | join above step with customer on CustNo |
30 | CUSTOMER_GOLD | CUSTOMER_GOLD_TABLE | TRUNCATE INSERT | select category = "GOLD" from above output |
CREATE OR REPLACE VIEW CUSTOMER_FILTER AS SELECT * FROM POINT WHERE POINT > 1000;
CREATE TABLE CUSTOMER_FILTER_TABLE AS SELECT * FROM CUSTOMER_FILTER WHERE 1=2;
INSERT INTO CUSTOMER_FILTER_TABLE SELECT * FROM CUSTOMER_FILTER;
COMMIT;
CREATE OR REPLACE VIEW CUSTOMER_JOIN AS
SELECT P.CUSTNO, C.CUSTNAME, C.ADDRESS, C.TOWN, C.POSTCODE, P.POINT, C.SEX
,CASE WHEN P.POINT > 4000 THEN 'GOLD' WHEN (P.POINT > 2000 AND P.POINT <= 4000) THEN 'SILVER' WHEN (P.POINT > 1000 AND P.POINT <= 2000) THEN 'BRONZE' END CATEGORY FROM CUSTOMER_FILTER_TABLE P JOIN CUSTOMER C ON P.CUSTNO = C.CUSTNO; INSERT INTO CUSTOMER_JOIN_TABLE SELECT CJ.CUSTNO, DECODE(SEX,'M','Mr','F','Ms','??') || ' ' || CJ.CUSTNAME , CJ.ADDRESS, CJ.TOWN || ' ' || CJ.POSTCODE , NULL,CJ.POINT, CJ.SEX,CJ.CATEGORY FROM CUSTOMER_JOIN CJ; COMMIT; SELECT * FROM CUSTOMER_JOIN_TABLE; -- POSTCODE WILL BE NULL AS WE COMBINED THAT WITH TOWN IN PREVIOUS FIELD CREATE OR REPLACE VIEW CUSTOMER_GOLD AS SELECT * FROM CUSTOMER_JOIN_TABLE WHERE CATEGORY = 'GOLD'; CREATE TABLE CUSTOMER_GOLD_TABLE AS SELECT * FROM CUSTOMER_GOLD WHERE 1=2; INSERT INTO CUSTOMER_GOLD_TABLE SELECT * FROM CUSTOMER_GOLD; COMMIT; SELECT * FROM CUSTOMER_GOLD; -- list of GOLD customers
So, we can see how involved database processing steps are!
Conclusion
An ETL process is much easier to maintain than DB specific process.
Difference between DB & ETL processing
Topic | DB | ETL |
Parallel processing | DB process is sequential. | ETL process is parallel. |
Phasing | DB process each step in a different phase. | Independent ETL components may be in same phase. |
I/O | DB process lands data in disk (= tables). | ETL process can run in pipeline manner, without landing data to disk thus reducing I/O. |
Intuitiveness | DB process is not intuitive. It makes maintaining them very difficult. | ETL process is a diagram. Very easy to understand and modify. |
Platform specific | DB process is DB specific. | ETL process is generic. Heterogeneous environments can implement ETL processes in similar fashion. |
Debugging | Debugging in DB is very cumbersome. | Debugging on ETL is often visual - so easy to spot problem. |
Conclusion
An ETL process is much easier to maintain than DB specific process.
No comments:
Post a Comment