
Finding balance in your ETL strategy

ETL bottlenecks

Organizations large and small can experience ETL (Extract, Transform, Load) bottlenecks in one way or another. The task is complex, and an inadequate ETL strategy (or the lack of one) can do significant damage to your business.

In general, companies approach solving ETL needs in three different ways.

  1. ETL empires

These are companies that have a developed, structured, and well-documented approach to ETL activities. They have typically implemented an end-to-end integration process using batch file transfers. Some of them are able to do a complete load-and-replace or load-and-merge every hour. Yet they struggle with integrating new data sources and creating new ETL processes, because of undesired downtime, the large amount of manual work involved, and the logistics of process management.

  2. ETL ninjas

In organizations that do not have a data warehouse, do not integrate their data at all, or use a large data lake (in which a large amount of raw flat data is stored), people tend to master some form of self-service ETL process. But these ninjas act solo most of the time. Data transformations are repeated among different ETL ninjas and hours are lost every day.

  3. Others

Companies that have inconsistent, undocumented and cumbersome ETL processes can be scared of the huge pile of ETL jobs they have on their hands. The lack of consistency and documentation creates a great deal of confusion. They are frequently unsure what data has been refreshed, when, and who has access to it.
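To make the "what was refreshed, when, and who has access" problem concrete, here is a minimal sketch of a refresh log in Python. It is an illustration only: the names `RefreshRecord` and `RefreshLog` and the in-memory storage are assumptions made for this example, not part of any particular ETL tool. A real log would live in a shared database.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class RefreshRecord:
    """Hypothetical record: what was refreshed, when, by whom, and who may read it."""
    dataset: str
    refreshed_at: datetime
    refreshed_by: str
    readers: set[str] = field(default_factory=set)


class RefreshLog:
    """In-memory registry that keeps the latest refresh per data set (assumption)."""

    def __init__(self) -> None:
        self._records: dict[str, RefreshRecord] = {}

    def record(self, dataset: str, refreshed_by: str, readers: set[str],
               when: datetime | None = None) -> None:
        # Overwrite any earlier entry so the log always shows the latest refresh.
        when = when or datetime.now(timezone.utc)
        self._records[dataset] = RefreshRecord(dataset, when, refreshed_by, set(readers))

    def last_refresh(self, dataset: str) -> datetime | None:
        rec = self._records.get(dataset)
        return rec.refreshed_at if rec else None

    def can_read(self, dataset: str, user: str) -> bool:
        rec = self._records.get(dataset)
        return rec is not None and user in rec.readers

    def stale(self, max_age: timedelta, now: datetime | None = None) -> list[str]:
        # Names of data sets whose latest refresh is older than max_age.
        now = now or datetime.now(timezone.utc)
        return sorted(name for name, rec in self._records.items()
                      if now - rec.refreshed_at > max_age)
```

If every ETL job and every self-service flow called `record` when it finished, anyone could answer the three questions above with a lookup instead of asking around.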

So many companies struggle with ETL because their approach to it is inadequate. They are either constrained by an overly structured and inflexible ETL process, or are reluctant to establish an ETL process at all and end up with analysts doing too much self-serve data preparation, which in turn results in numbers that do not add up and time wasted on repeated data preparation activities.

Use a hybrid approach to break through ETL bottlenecks in your organization

More and more innovative data services pop up in the cloud every day, so ETL is not going to go away anytime soon. Integrating this data gives your organization a well-rounded understanding of the business over time. Regardless of which of the three categories above you are in, you have the opportunity to act immediately.

So what is the hybrid approach to ETL? In most cases, it is a balance between the three categories listed above. Analysts need to have a standardized, structured ETL process available while also being able to rely on self-service ETL tools like Datamartist for agile ETL and ad-hoc analysis. New data should be allowed to come in and be used right away, before it is documented. Once it is documented, it is either implemented as part of an ETL job or left to self-service tools, to be accessed when required.

Think of running an ETL process as operating a flight route for large aircraft. Establishing a new flight route is a long and laborious process. Although it will take a lot of people from point A to point B, in most cases it only makes economic sense to run this flight once a day, to prevent aircraft from flying empty. The same goes for data: even if it is inexpensive to do a complete refresh every minute, why do it if the refreshed data isn’t used?

Also, don’t limit your people to air transportation. It doesn’t make sense to fly a large aircraft to a town that’s only 30 minutes away by car. It also doesn’t make sense to create a flight to a new destination, unless you know that it will remain in high demand for a long time. So why limit your people to a standardized ETL? Let them ‘drive’ using self-service tools! Yes, they may take different routes to the destination and drive at different speeds, but they’ll get their job done without the establishment of yet another ETL job.

Finally, don’t let all of them drive for days to the same destination. If many of your employees are spending days creating nearly identical data pipelines using self-service data preparation tools, this repetition of work is slowing your business down and wasting their time.
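The three rules of thumb in the analogy (don't refresh more often than the data is used, don't build a flight route for a short or short-lived trip, and don't let everyone drive separately to the same place) can be sketched as a small decision function. This is only an illustration in Python: the `DatasetUsage` fields and the threshold values are hypothetical and would need to be tuned to your own organization.

```python
from dataclasses import dataclass


@dataclass
class DatasetUsage:
    """Hypothetical usage figures for one data set."""
    name: str
    analysts_preparing_it: int   # people building their own version of the pipeline
    reads_per_day: float         # how often the prepared data is actually used
    expected_lifetime_days: int  # how long the demand is expected to last


# Assumed thresholds, chosen only for illustration.
MIN_ANALYSTS_FOR_SHARED_JOB = 3
MIN_LIFETIME_DAYS_FOR_SHARED_JOB = 90
MAX_REFRESHES_PER_DAY = 24


def recommend(usage: DatasetUsage) -> str:
    """Suggest 'fly' (a scheduled ETL job) or 'drive' (self-service)."""
    worth_a_flight_route = (
        usage.analysts_preparing_it >= MIN_ANALYSTS_FOR_SHARED_JOB
        and usage.expected_lifetime_days >= MIN_LIFETIME_DAYS_FOR_SHARED_JOB
    )
    if not worth_a_flight_route:
        return "self-service"
    # Never refresh more often than the data is read; cap the schedule
    # at MAX_REFRESHES_PER_DAY and run at least once a day.
    refreshes = max(1, min(MAX_REFRESHES_PER_DAY, int(usage.reads_per_day)))
    return f"scheduled ETL job, {refreshes} refresh(es) per day"
```

For example, `recommend(DatasetUsage("one_off_survey", 1, 2.0, 14))` returns `"self-service"`, while a data set prepared separately by six analysts and expected to stay in demand for a year would be promoted to a scheduled job.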

Benefits of the hybrid approach

This approach can help in the following ways:

  • Self-service data preparation will reduce the number of ETL jobs that an IT department has to get through by increasing the number of people who can handle small-scale ETL jobs.
  • Self-service ETL tools reduce technical demands: time and money spent on the creation, maintenance and administration of ETL tasks.
  • Automation of routine ETL activities will allow you to maintain highly consistent, reliable, and available data sets.
  • Automation of ETL processes that handle the most frequently used data will increase your reporting and analytics capacity.

There are many BI tools out there that will promise everything: automation of highly sophisticated ETL processes; comfortable use for both IT and business users; flexible workflows; integration with all data sources, including streaming data, etc. The truth is that the hybrid approach may require several ETL tools, each doing the job it was designed to do. Find your own balance, to prevent ETL bottlenecks from killing your business.
