Data integration is the process of consolidating data from different sources.  Data integration is often a prerequisite to other processes including analysis, reporting, and forecasting.  

 

Data integration vs application integration vs ETL

Data integration is often confused with application integration and ETL.  While they are closely related, there are important distinctions between the three terms.

Data integration is a process where data from many sources goes to a single centralized location, which is often a data warehouse. The end location needs to be flexible enough to handle many different kinds of data at potentially large volumes. Data integration is ideal for powering analytical use cases.

Application integration involves moving data back and forth between individual applications to keep them in sync. Typically, each individual application has a particular way it emits and accepts data, and this data moves in smaller volumes. Application integration is ideal for powering operational use cases. One example is ensuring that a customer support system has the same customer records as the accounting system.   
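To make the customer-record example concrete, here's a minimal sketch of a one-way sync. It assumes both systems can be represented as plain dictionaries keyed by customer ID; the function name `sync_customers` and the record fields are hypothetical, and a real integration would call each application's own API instead.

```python
def sync_customers(source, destination):
    """Copy new or changed customer records from source into destination.

    Both arguments are dicts keyed by customer ID (a stand-in for two
    applications). Returns the IDs that were created or updated.
    """
    changed = []
    for customer_id, record in source.items():
        if destination.get(customer_id) != record:
            # Copy the record so later edits in one system don't leak into the other.
            destination[customer_id] = dict(record)
            changed.append(customer_id)
    return changed


# Made-up records for illustration only.
accounting = {
    1: {"name": "Ana", "email": "ana@example.com"},
    2: {"name": "Bo", "email": "bo@example.com"},
}
support = {
    1: {"name": "Ana", "email": "ana@example.com"},
}
```

Calling `sync_customers(accounting, support)` copies only the missing record, which reflects the small, frequent data movements typical of application integration.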

ETL stands for Extract, Transform, and Load.  This refers to the process of extracting data from source systems, transforming it into a different structure or format, and loading it into a destination. Data integration and application integration are two types of ETL, which is the umbrella term.  
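The three ETL steps can be sketched in a few lines of Python. This is an illustration only: the CSV text stands in for a source system, an in-memory SQLite database stands in for the destination, and the table and column names are made up.

```python
import csv
import io
import sqlite3

# Hypothetical source data: a CSV export from some source system.
SOURCE_CSV = """id,email,signup_date
1,Ana@Example.com,2017-03-01
2,bo@example.com,2017-03-02
"""


def extract(raw_csv):
    """Extract: read rows out of the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows):
    """Transform: cast IDs to integers and normalize emails to lowercase."""
    return [
        (int(row["id"]), row["email"].strip().lower(), row["signup_date"])
        for row in rows
    ]


def load(conn, rows):
    """Load: write the transformed rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(id INTEGER PRIMARY KEY, email TEXT, signup_date TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO users VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(raw_csv=SOURCE_CSV):
    """Run all three steps against an in-memory SQLite destination."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(raw_csv)))
    return conn
```

Keeping the three steps as separate functions makes it easy to swap in a different source or destination without touching the others.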
 

Want to learn about setting the data strategy for your organization?

Sign up for a free 30-day course to learn what you need in order to succeed with data. We have worked with over 500 companies of all sizes and helped them build their data infrastructure, run analytics, and make data-driven decisions. Learn how the data landscape has changed and what that means for your company.

We will never share your email

Data integration example 

Let's take the example of a company called See Food Inc (SFI). SFI's product is a mobile app where users can take pictures of different items and identify if the item in the picture is, or is not, a hot dog. SFI has a lot of different tools to run its business.  

  • Facebook Ads and Google AdWords to acquire new users
  • Google Analytics to track events on their website and in their mobile app
  • MySQL database to store user information and image metadata (e.g. hot dog or not hot dog)
  • Marketo to send marketing emails and nurture leads
  • Zendesk to perform customer support
  • NetSuite for accounting and financial tracking

Each of those applications has a silo of information about SFI's operations.  In order to get a 360-degree view of the business, all of that data needs to be combined in one place. That process is data integration.
 

Data integration ROI

Getting a 360-degree view sounds nice, but before undertaking any data integration project, it's important to understand what the return on investment will be.  Your use case will vary, but here's an example of the value data integration can bring.

Let's say that SFI is considering increasing its advertising budget, but they are not sure if they should spend more on Facebook or Google. They could ask if the cost of acquisition is lower on Facebook or Google, but that misses out on whether there are differences between the kinds of users they acquire on the two different channels. Some additional questions they might want to ask are:

  • Do users from Facebook post more photos of hot dogs?
  • Do users from Google file more customer support tickets?
  • Which users are more likely to refer friends?

Each of these questions can be combined and further segmented to individual campaigns and variations on ad creative. These questions can only be answered when the data is integrated. 
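Once the data sits in one place, questions like these become ordinary queries. Here's a rough sketch that uses an in-memory SQLite database as a stand-in for a data warehouse. The table names, columns, and rows are all invented for illustration and are not SFI's real schema.

```python
import sqlite3


def build_warehouse():
    """Create an in-memory stand-in for a data warehouse with made-up rows.

    users   : app database joined with ad attribution (hypothetical)
    photos  : image metadata from the app database (hypothetical)
    tickets : tickets from the support system (hypothetical)
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, channel TEXT);
        CREATE TABLE photos (user_id INTEGER, is_hot_dog INTEGER);
        CREATE TABLE tickets (id INTEGER PRIMARY KEY, user_id INTEGER);
        INSERT INTO users VALUES (1, 'facebook'), (2, 'facebook'), (3, 'google');
        INSERT INTO photos VALUES (1, 1), (1, 1), (2, 0), (3, 1);
        INSERT INTO tickets VALUES (10, 3), (11, 3), (12, 1);
        """
    )
    return conn


# Correlated subqueries count photos and tickets per user, so joining
# several one-to-many tables cannot inflate the totals.
CHANNEL_QUERY = """
    SELECT u.channel,
           COUNT(*) AS users,
           SUM((SELECT COUNT(*) FROM photos p
                WHERE p.user_id = u.id AND p.is_hot_dog = 1)) AS hot_dog_photos,
           SUM((SELECT COUNT(*) FROM tickets t
                WHERE t.user_id = u.id)) AS tickets
    FROM users u
    GROUP BY u.channel
    ORDER BY u.channel
"""


def channel_summary(conn):
    """Return (channel, users, hot_dog_photos, tickets) rows, one per channel."""
    return conn.execute(CHANNEL_QUERY).fetchall()
```

With the sample rows above, `channel_summary(build_warehouse())` returns one row per acquisition channel. None of these numbers could be computed while the ad, app, and support data lived in separate tools.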
 


In-house data integration

Note - This section is for companies that are comfortable writing code and using the command line. If that's not for you, skip to the section on simple data integration with Stitch. 

If you have software engineers on your team, you may want to initiate an in-house data integration project. Software engineers who specialize in building the systems that transmit data throughout a company are often called data engineers.

While you can start your data integration project from scratch, it's often helpful to leverage an open source project to save time. One option is the Singer open source ETL project.  Singer leverages reusable components for pulling from data sources (taps) and sending to destinations (targets). That means if you need one additional source that hasn't been built before, you only need to build a single tap and it will automatically work with all of the existing targets.
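To give a feel for why taps and targets are interchangeable, here's a hand-written sketch of the kind of output a tap produces. It assumes the Singer convention of writing one JSON message per line to standard output, using SCHEMA, RECORD, and STATE message types. The stream name, fields, and the `last_id` bookmark are hypothetical, and a real tap would more likely use Singer's Python helper library than build the messages by hand.

```python
import json
import sys

STREAM = "issues"  # hypothetical stream name


def build_messages(rows):
    """Return Singer-style messages: a SCHEMA, one RECORD per row, then a STATE."""
    schema = {
        "type": "SCHEMA",
        "stream": STREAM,
        "key_properties": ["id"],
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "title": {"type": "string"},
            },
        },
    }
    records = [{"type": "RECORD", "stream": STREAM, "record": row} for row in rows]
    messages = [schema] + records
    if rows:
        # The state value is a bookmark the tap could use to resume later.
        messages.append({"type": "STATE", "value": {"last_id": max(r["id"] for r in rows)}})
    return messages


def emit(messages, out=sys.stdout):
    """Write each message as one line of JSON, which is what a target reads."""
    for message in messages:
        out.write(json.dumps(message) + "\n")
```

Because every tap writes to the same line-oriented format, a target only needs to understand these messages, not the source they came from.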

Here's a quick guide for how to pull data from GitHub. (You can access the code and more details here).

First, create a virtual environment and install the GitHub tap.

 
> virtualenv -p python3 venv
> source venv/bin/activate
> pip install tap-github

Then create a configuration file named config.json that contains your GitHub access token and the path to the repository from which you want data.  It should look something like this:

 
{"access_token": "your-access-token",
 "repository": "singer-io/tap-github"}

Finally, run the application:

 
tap-github --config config.json
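Run on its own, a tap writes its messages to standard output, so the data still needs a target to land somewhere. As a rough illustration of what a target does, here's a minimal consumer that groups records by stream. It assumes the tap emits one JSON message per line with `type`, `stream`, `record`, and `value` fields. The function name `collect_records` is made up, and a production target would also validate records against the SCHEMA messages.

```python
import json
from collections import defaultdict


def collect_records(lines):
    """Group RECORD messages by stream and remember the most recent STATE value."""
    records = defaultdict(list)
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        message = json.loads(line)
        if message["type"] == "RECORD":
            records[message["stream"]].append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
        # SCHEMA messages are ignored in this sketch.
    return dict(records), state


# To consume real tap output, pass an iterable of lines such as sys.stdin:
#   records, state = collect_records(sys.stdin)
```

Saved in a script that calls `collect_records(sys.stdin)`, this could sit on the receiving end of a pipe from `tap-github --config config.json`.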

Simple data integration with Stitch

Stitch is a cloud data integration service. Stitch connects to today’s most popular business tools – including Salesforce, Facebook Ads, and more than 60 others – and automatically replicates the raw data to a data warehouse. There's no code to write, and it automatically keeps your data up to date.  

Stitch was built to solve data integration. With just a few clicks, Stitch will extract your data from wherever it lives and get it ready to be analyzed, understood, and acted upon.  

14-day free trial  |  setup in minutes  |  no ETL scripts necessary
