What is ETL?
ETL is a commonly used acronym that describes the three main stages of a data migration: Extract, Transform and Load.
Extract – This phase is when the data is collected from the source system which currently holds the business data that needs migrating. More complex data migrations may involve several data sources. The collected data is then consolidated into a single repository. Data engineers may choose to hand-code the extraction with scripts and queries, but this approach is error-prone and time-intensive.
Transform – In this phase the extracted data is changed to match the formats required in the target system. For example, a source CRM system may hold a single Full Name for each contact, whilst the target CRM uses separate First Name and Surname fields; the single field will need to be split before the next phase. Transformation is the most important phase of ETL: it is at this point that you have the opportunity to ensure that the data is fit for purpose and error free.
Load – The final part of the ETL process. All data that has been extracted and transformed is now ready to be transferred into the target system. Different approaches can be taken at this point, such as loading the data all at once (Big Bang) or in intervals (Phased). A minimal sketch of all three phases follows below.
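To make the three phases concrete, here is a minimal sketch in Python using SQLite from the standard library. The database, table and column names (contacts, full_name, first_name, surname) are illustrative assumptions, not taken from any particular CRM.

```python
import sqlite3

source = sqlite3.connect("source_crm.db")
target = sqlite3.connect("target_crm.db")
target.execute(
    "CREATE TABLE IF NOT EXISTS contacts (id INTEGER, first_name TEXT, surname TEXT)"
)

# Extract: collect the contact records from the source system.
rows = source.execute("SELECT id, full_name FROM contacts").fetchall()

# Transform: split the source's single Full Name field into the separate
# First Name and Surname fields the target system expects.
transformed = []
for contact_id, full_name in rows:
    first_name, _, surname = full_name.partition(" ")
    transformed.append((contact_id, first_name, surname))

# Load: transfer the reshaped records into the target system.
target.executemany(
    "INSERT INTO contacts (id, first_name, surname) VALUES (?, ?, ?)",
    transformed,
)
target.commit()
```

In practice each step would be driven by the business rules for the migration; the point of the sketch is simply to show the three phases as distinct, ordered steps.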
Common ETL use cases
As data volumes grow, and with them the availability and value of the insight data can provide, the reasons for ETL are wide-ranging.
Legacy system replacements
Digital transformation initiatives bring new software solutions into the business, and these require legacy data to maximise their value. Legacy data held in older systems will need transforming to fit the new systems. If data quality issues are not addressed before the migration, problems will surface down the line.
Business Intelligence
Businesses have more data than ever, and there is huge potential to gain real insight from this data to make intelligent business decisions. The key is being able to access and analyse the data. Low-code platforms make it easy to access multiple data sources; merge, correct and enrich the data; then present it in user screens and reports. A short sketch of this merge-and-report step follows.
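As an illustration of merging two sources into a report, here is a sketch assuming two CSV extracts, sales.csv (customer_id, amount) and customers.csv (customer_id, region); the file and column names are invented for the example.

```python
import csv
from collections import defaultdict

# Access the first source: a lookup of customer regions.
with open("customers.csv", newline="") as f:
    region_by_customer = {row["customer_id"]: row["region"]
                          for row in csv.DictReader(f)}

# Merge with the second source, correcting missing joins as we go,
# and aggregate revenue per region for the report.
revenue_by_region = defaultdict(float)
with open("sales.csv", newline="") as f:
    for row in csv.DictReader(f):
        region = region_by_customer.get(row["customer_id"], "Unknown")
        revenue_by_region[region] += float(row["amount"])

for region, total in sorted(revenue_by_region.items()):
    print(f"{region}: {total:,.2f}")
```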
Master Data Management
Enterprise data is typically spread across many systems, each with its own purpose. With your data in silos it is difficult to know which system holds the most up-to-date information. The solution is to use low-code platforms, such as PhixFlow, to create a master data set made up of golden records, as sketched below.
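Here is a sketch of building a golden record from siloed copies of the same customer, assuming each system stamps its records with a last-updated date; the field values and system names are illustrative.

```python
from datetime import datetime

crm_record = {"email": "j.smith@old.example", "phone": "01234 567890",
              "updated": datetime(2023, 1, 10)}
billing_record = {"email": "jane.smith@example.com", "phone": "",
                  "updated": datetime(2024, 6, 2)}

def golden_record(*records):
    """For each field, keep the most recently updated non-empty value."""
    merged = {}
    for record in sorted(records, key=lambda r: r["updated"]):
        for field, value in record.items():
            if field != "updated" and value:
                merged[field] = value  # newer records overwrite older ones
    return merged

print(golden_record(crm_record, billing_record))
# {'email': 'jane.smith@example.com', 'phone': '01234 567890'}
```

Note that the newer billing record wins for the email field, but its empty phone number does not overwrite the older value: the golden record keeps the best of each silo.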
Why ETL isn't enough
In its simplest form ETL does the job, but it can be better. The three-step process includes no validation or reconciliation to confirm that all of the data was captured in the extraction phase and arrived intact after the load phase. Left unchecked, businesses run the risk of missing or incomplete data. This may only surface later, when dealing with customer issues, leading to a drop in customer satisfaction. A sketch of the kind of reconciliation check that basic ETL omits follows below.
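For example, after the load the key sets held by the source and target can be compared so that missing or unexpected records are caught immediately. The database and table names below are illustrative, carried over from the earlier sketch.

```python
import sqlite3

source = sqlite3.connect("source_crm.db")
target = sqlite3.connect("target_crm.db")

source_ids = {row[0] for row in source.execute("SELECT id FROM contacts")}
target_ids = {row[0] for row in target.execute("SELECT id FROM contacts")}

missing = source_ids - target_ids     # extracted but never loaded
unexpected = target_ids - source_ids  # loaded but absent from the source

if missing or unexpected:
    print(f"Reconciliation failed: {len(missing)} missing, "
          f"{len(unexpected)} unexpected")
else:
    print(f"Reconciliation passed: {len(target_ids)} records match")
```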
Many standard ETL tools also lack the ability to report on the data, relying instead on separate tools. This means connecting to an external service, or making fresh connections to your data, and runs the risk of missing key information.
Why use low-code as an alternative to ETL?
Whilst many won't immediately think of low-code as a solution to their data migration challenges, these platforms offer a strong alternative.
Firstly, low-code platforms make it easy to connect to your data sources, such as a data warehouse, data lake or business platform, using simple-to-configure, drag-and-drop connectors. All of your source systems can be connected in this way, allowing data quality checks and analytics to be performed.
With the source system data connected and checked for errors, the data can be pulled into a master dataset, ideally in small chunks at a time. Any data ingested into this dataset is then reconciled to ensure that the correct data has been collected and is fit for purpose. This makes it easy to identify any changes needed to the business rules for extracting the data, and changes can be made iteratively until all reconciliation checks pass. A sketch of this chunked ingest-and-reconcile loop follows.
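In the sketch below, fetch_chunk(offset, size) and source_count are hypothetical stand-ins for a source connector and the source system's own record count, and the required-field rule is an invented business rule.

```python
CHUNK_SIZE = 500

def ingest(fetch_chunk, source_count):
    master, rejects = [], []
    offset = 0
    while chunk := fetch_chunk(offset, CHUNK_SIZE):
        for record in chunk:
            # Business rule: a record is fit for purpose only if it carries
            # an id and an email. Anything else is set aside so the rules
            # can be refined iteratively and the chunk re-run.
            if record.get("id") and record.get("email"):
                master.append(record)
            else:
                rejects.append(record)
        offset += len(chunk)
    # Reconcile: everything the source holds must be ingested or explained.
    if len(master) + len(rejects) != source_count:
        raise RuntimeError(f"Expected {source_count} records, "
                           f"processed {len(master) + len(rejects)}")
    return master, rejects
```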
With the expected data held in a separate dataset, it is time to transform the data. Low-code platforms, like PhixFlow, allow an iterative approach. For example, in a CRM migration you may wish to concentrate on the contact details first: create the business rules for this data and perform a test run with reconciliation to ensure that all data has been transformed correctly and matches the requirements of the new CRM, as in the sketch below. The process is then repeated for each remaining subset of the data until everything has been accounted for.
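Here is a sketch of such a test run for the contact-details subset: transform the records, then reconcile the result against the new CRM's requirements. The field names and the "both name fields populated" rule are assumptions for the example.

```python
def transform_contact(record):
    first_name, _, surname = record["full_name"].partition(" ")
    return {"first_name": first_name, "surname": surname,
            "email": record["email"].strip().lower()}

def test_run(records):
    transformed = [transform_contact(r) for r in records]
    assert len(transformed) == len(records), "records lost in transformation"
    # Flag records that do not yet meet the target CRM's requirements,
    # so the business rules can be adjusted and the run repeated.
    failures = [t for t in transformed
                if not (t["first_name"] and t["surname"])]
    return transformed, failures

contacts = [{"full_name": "Jane Smith", "email": " Jane.Smith@Example.com "},
            {"full_name": "Cher", "email": "cher@example.com"}]
transformed, failures = test_run(contacts)
print(f"{len(failures)} record(s) need a rule change: {failures}")
```

Running this flags the single-word name, which produces an empty surname: exactly the kind of finding that prompts an iterative refinement of the business rules before the final load.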