By Gary Nakanelua

The Unspoken Cost of Building Data Pipelines

December 11, 2018 | Analytics, Conduit

Data pipelines are an integral part of any analytics initiative.

Many organizations opt to build their data pipelines in-house, taking advantage of open source software and an internal development team. However, what starts with the best of intentions often turns into an extended project with ballooning delivery times and cost. We launched Conduit to help companies focus on the core organizational outcomes of their data analytics initiatives rather than getting bogged down in access control and data connectivity issues in their pipelines.

Real Story. Real Cost.

I had the pleasure of working with one of the largest news organizations in the United States on one of their data analytics initiatives. As part of that initiative, they were evaluating a very popular data visualization product that promised actionable, real-time insight into customer usage of their portfolio of mobile applications. After several demos, a decision was made to test the product with data from App Annie, Adobe Analytics (formerly Omniture), and a custom application usage tracking system.

Armed with marketing collateral and some demo videos, the team began the project. The scope was kept narrow so the team could focus, and they estimated the work could be completed in a two-week sprint. The journey, however, proved to be filled with roadblocks.

Access

The first roadblock was getting access to all the relevant data. In the case of the custom app tracking system, there was data the database admin wasn’t comfortable exposing, such as the sales data associated with in-app purchases. Since the data visualization product required “all or nothing” access to a data store, the team decided to set up an intermediate data store and write scripts to duplicate the approved data into it.

Due to the operational environment, various access requests and approvals were required to set up the intermediate data store. Because scripting work was involved, the project team performed a significant amount of testing to ensure the source data was left untouched.
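To make that workaround concrete, here is a minimal sketch of the duplication script in Python. The table name, columns, and connection strings are hypothetical; the point is that only approved columns are copied into the intermediate store, while the sales data tied to in-app purchases stays behind in the source database.

```python
import pyodbc

# Connection strings are placeholders; real servers, credentials, and
# database names would come from the operational environment.
source = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=source-db;"
    "DATABASE=AppTracking;Trusted_Connection=yes;"
)
intermediate = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=intermediate-db;"
    "DATABASE=AppTrackingCopy;Trusted_Connection=yes;"
)

# Hypothetical set of columns the database admin approved for exposure;
# sales/in-app purchase columns are deliberately excluded.
APPROVED_COLUMNS = ["event_id", "user_id", "event_type", "event_time"]

# Read only the approved columns from the source (read-only, leaving
# the source data untouched).
rows = source.cursor().execute(
    f"SELECT {', '.join(APPROVED_COLUMNS)} FROM dbo.app_events"
).fetchall()

# Duplicate the approved data into the intermediate store that the
# visualization product will be pointed at.
cur = intermediate.cursor()
cur.executemany(
    f"INSERT INTO dbo.app_events ({', '.join(APPROVED_COLUMNS)}) "
    "VALUES (?, ?, ?, ?)",
    [tuple(r) for r in rows],
)
intermediate.commit()
```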

Time & Cost Breakdown

Connection

The second roadblock was getting connected to all of the data. The data from Adobe Analytics and the custom app tracking system, stored in MS SQL, was directly supported by the visualization product; App Annie was not. The rep at the visualization product vendor mentioned there was no intent to add App Annie support in the near future, but suggested a few approaches they had seen customers take when faced with an unsupported data source.

The team settled on an approach where custom code would pull the relevant data via the App Annie API and store it in an MS SQL database. The visualization product would then pull data directly from that MS SQL database.
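As a rough illustration of that approach, here is a Python sketch. The App Annie report path, response shape, and MS SQL table layout shown here are assumptions for illustration only; the real API endpoints, parameters, and schema would differ.

```python
import requests
import pyodbc

API_KEY = "..."  # issued by App Annie; placeholder here
BASE_URL = "https://api.appannie.com"

# Hypothetical report path and parameters.
resp = requests.get(
    f"{BASE_URL}/v1.3/apps/ios/app/example-app-id/sales",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={"start_date": "2018-01-01", "end_date": "2018-01-31"},
)
resp.raise_for_status()
records = resp.json().get("sales_list", [])  # hypothetical response shape

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=intermediate-db;"
    "DATABASE=AppAnnie;Trusted_Connection=yes;"
)
cur = conn.cursor()
for rec in records:
    # Column names are illustrative; the visualization product would
    # read this table directly.
    cur.execute(
        "INSERT INTO dbo.app_annie_daily (report_date, downloads, revenue) "
        "VALUES (?, ?, ?)",
        rec.get("date"), rec.get("units"), rec.get("revenue"),
    )
conn.commit()
```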

Time & Cost Breakdown

Latency

The third roadblock revealed itself once the data was connected and visualizations were built in the product. The news organization often curated content based on popularity, which required real-time insight into audience interactions with content delivered via their mobile applications. However, the team learned that the visualization product only supported real-time data refresh for a small set of connection types; all other connection types were limited to hourly refreshes. This would severely limit the organization’s ability to deliver relevant content to its audience in a timely manner and significantly impact advertising revenue.

The team experimented with a number of different approaches, but none proved successful at enabling real-time access to the data. After seven weeks of hard work, the decision was made to pass on the visualization product and build an internal tool instead.

Time & Cost Breakdown

The Unspoken Cost of Building the Data Pipeline

The project started with a clear objective and an experienced team. Over the course of seven weeks, the organization spent nearly $12,000 attempting to build a small data pipeline just to evaluate a data visualization product. That figure doesn’t include the financial impact on other projects caused by the work taking more than three times as long as originally estimated.

Now consider if the evaluation had been successful and the data visualization product had been adopted by the business. From a pure pricing perspective, the product would have represented a significant investment. With a purchase that large, use of the product typically becomes a mandate for various groups within the enterprise, involving a wide variety of data sources across an even larger number of internal use cases. Using the spend on the initial evaluation pipeline as a baseline for each use case, a quick back-of-the-napkin estimate of the cost of adopting the product across the organization is simply the number of use cases multiplied by $12,000. That estimate doesn’t include the cost of maintaining the data pipelines over time.
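For illustration, the back-of-the-napkin math looks like this; the use-case count below is an assumption, not a figure from the project.

```python
cost_per_pipeline = 12_000   # spend on the single evaluation pipeline
use_cases = 25               # hypothetical number of internal use cases
total = cost_per_pipeline * use_cases
print(f"Estimated pipeline build-out cost: ${total:,}")  # $300,000
```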

We built Conduit to help companies easily set up secure, real-time data pipelines. In the scenario above, Conduit would have enabled quick access to the data the developer needed while giving the database admin control over what data was exposed. Conduit would have reduced the work from weeks to a couple of hours, saving nearly $10,000 per data source. And with the real-time data access Conduit provides, the business could have completed the visualization evaluation and continued the analytics initiative rather than putting the entire project on hold.

If this scenario sounds familiar, try Conduit free for 30 days and avoid the unspoken cost of building data pipelines.

 
