30 May 2009

Do you really need an ETL tool?

Most ETL tools are quite expensive (especially those which actually work in very large DWH systems).

Many small and medium size organizations perform ETL operations via regular programming languages viz. C++ or PL/SQL.

White it may not be the most efficient and productive way of ETL processing, we must remember the cost benefit analysis.

In a project where source and target databases are both the same (say Oracle), it makes perfect sense to carry out moderate ETL processing via PL/SQL alone as long as you get acceptable performance out of it.

The requirement of a full scale ETL tool most felt in a heterogeneous environment.

The main disadvantage of using traditional programming languages for ETL work is the performance. Programming languages tend process data in loops. For small data volume it is fine. However, looping technique is not very performing for large data sets. Yes, it is possible to tailor made your procedural language code to take advantage of parallelism (tormenting in C++ but bearable in PL/SQL) but in reality it is as involving as writing an ETL tool yourself and thus not really rewarding experience (unless you can market your tool).

In ETL tools, you don’t have to do anything (except keeping your data organized across partitions etc.) to gain performance advantage. They are designed to take parallelism etc. in their execution plan.

But in procedural language, you are of your own. One way to improve performance in PL/SQL (or similar) is to write cursors less often and use set operations more often. Most databases are designed to use best parallelism available when you use set operations.

Don’t forget cost of recruiting personnel in your chosen ETL tool. Depending on market condition, experts in one ETL tool might demand more compensation than other one. You also need to budget for organizing training for your existing staffs.

Most multinational companies can easily afford the best tool sets. However, for medium sized firms, budget is always a constraint. So, they are more prone to in-house development rather than buying tool set of the shelf.

No comments:

Post a Comment