Gotta keep ’em separated: A primer on why storage and compute belong apart

By Blueprint Team

Whether to combine cloud storage and compute is a debate approaching the intensity of longstanding arguments like Mac vs. PC or leasing vs. buying a car. The two approaches are radically different, and a case can be made for either.

At Blueprint we’re not going to weigh in on the Mac vs. PC or leasing vs. buying questions – those are arguments for another day. But we do come down on the side of separating storage and compute, and with good reason: separation is a fundamental tenet of cloud computing, it’s more affordable, and it preserves flexibility and adaptability as technologies mature and change.

Combining cloud storage and compute is meant to simplify things for data managers while maintaining flexibility. In practice it does the opposite: you lose the flexibility to work with different data sets and to adopt emerging compute engines, and you end up feeding more data into the compute engine, which is the most expensive part of operating in a cloud environment.

You can achieve affordability and flexibility without compromising simplicity.

Affordability

Cloud data storage is just storage, and it should be thought of that way. It is inexpensive, fast, supports all data types and works with virtually every cloud service, data-ingestion tool and app. It also keeps data in its native format, and it remains your data, meaning you can take it wherever you want in the future.

We suggest keeping storage simple, cheap and distinct from compute by parking it in an Azure Data Lake. That lets you use any compute engine – we often recommend Databricks – and pay for compute only on the data sets you need, and only while you are running analytics. When you park all your data in a warehouse that also runs your compute, you pay a steeper price, because compute runs against all your data rather than pulling what you need from an inexpensive storage location, when you need it and for only as long as you need it.
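To make that concrete, here is a minimal sketch of the pattern in PySpark, as you might run it on a Databricks cluster. The storage account, container, path and column names are hypothetical placeholders; the point is that compute touches only the data set you select, and only while the job runs.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# On Databricks a session already exists; this line keeps the sketch self-contained.
spark = SparkSession.builder.appName("adhoc-analytics").getOrCreate()

# Read just the one data set you need from inexpensive lake storage (ADLS Gen2).
sales = spark.read.parquet(
    "abfss://lake@example.dfs.core.windows.net/sales/2021/"
)

# Pay for compute only for the duration of this aggregation.
sales.groupBy("region").agg(F.sum("amount").alias("total_sales")).show()

# When the cluster terminates, compute spend stops; the data sits untouched
# in cheap storage until the next job needs it.
```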

It is simple: the more you reduce compute, the more you reduce cost.

No company or organization should pay for resources it doesn’t need – we are no longer in the age of monolithic platforms and the massive hardware spend once required to run data analysis. By leveraging the power of the data lake and coupling it with a compute engine like Databricks, you pay only for the services you need, when you need them.

Flexibility

Companies ingest, buy and own immense amounts of data. Some of it may not have a purpose or use yet, and that is OK. If you don’t have an immediate use for your data, a data lake tiers it down to the cheapest possible storage level, re-tiering it only when you decide it is useful and needed for business intelligence.
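As a rough illustration of tiering, here is a sketch using the azure-storage-blob Python SDK. The connection string, container and blob names are hypothetical, and in practice an Azure Storage lifecycle-management policy can automate the same moves based on access patterns.

```python
from azure.storage.blob import BlobClient

# Hypothetical cold data set sitting in the lake.
blob = BlobClient.from_connection_string(
    conn_str="<your-storage-connection-string>",
    container_name="lake",
    blob_name="raw/clickstream/2020/events.parquet",
)

# Park data you have no immediate use for in the cheapest tier.
blob.set_standard_blob_tier("Archive")

# Promote it again when it becomes useful for business intelligence.
# (Rehydrating from the archive tier can take hours.)
blob.set_standard_blob_tier("Hot")
```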

Separating storage and compute, with the data lake as your storage layer, lets you better manage your team’s experience, your data and your usage. Multiple compute resources can leverage the same data in the lake, and different users can interact with it in different ways: one person can work on machine learning with Spark while another runs reports on the same data set through a high-speed Power BI connector.
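As a sketch of that sharing, the snippet below trains a simple model with Spark MLlib against the same lake path a report author might query through the Power BI connector. The path and column names are hypothetical; the point is that neither workload copies or moves the data.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("ml-on-shared-data").getOrCreate()

# The same files a BI user queries via a connector; nothing is copied for this job.
df = spark.read.parquet("abfss://lake@example.dfs.core.windows.net/sales/")

# Assemble hypothetical feature columns and fit a simple regression.
features = VectorAssembler(
    inputCols=["units", "discount"], outputCol="features"
).transform(df)

model = LinearRegression(featuresCol="features", labelCol="amount").fit(features)
print(model.coefficients)
```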

By building a modern data estate on the data lake, data sources that were historically disparate – copied and moved around for different queries – can be viewed and queried holistically and simultaneously using the many tools and connectors available through platforms like Databricks. Not only is this a more affordable model, it also eliminates the wasted time and energy of moving data between platforms that perform different tasks. Speed is your friend when it comes to extracting insights from data; don’t waste time over-processing data if it isn’t needed.
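For instance, two sources that once lived in separate systems can be read and joined in place. The formats, paths and columns here are again hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("holistic-queries").getOrCreate()

# A CSV export and a JSON feed, landed in the lake by different pipelines.
crm = spark.read.option("header", True).csv(
    "abfss://lake@example.dfs.core.windows.net/raw/crm_exports/"
)
web = spark.read.json("abfss://lake@example.dfs.core.windows.net/raw/weblogs/")

# Query them together without copying either into another platform first.
crm.join(web, on="customer_id").groupBy("segment").count().show()
```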

With Databricks Delta Lake, for example, you have one complete compute platform on top of your data, from which you can run BI-style queries, data-engineering workloads in SQL or Python, and data science with any of the common frameworks, right where your data are.
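Here is a minimal sketch of that one-platform idea with Delta Lake, assuming a Databricks workspace where Delta support is built in; the table and path names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-all-in-one").getOrCreate()

# Data engineering: land raw files as a Delta table, right where they live.
orders = spark.read.parquet("abfss://lake@example.dfs.core.windows.net/raw/orders/")
orders.write.format("delta").mode("overwrite").save(
    "abfss://lake@example.dfs.core.windows.net/delta/orders"
)

# BI-style query over the same files with plain SQL.
spark.sql(
    "CREATE TABLE IF NOT EXISTS orders USING DELTA "
    "LOCATION 'abfss://lake@example.dfs.core.windows.net/delta/orders'"
)
spark.sql("SELECT region, COUNT(*) AS n FROM orders GROUP BY region").show()
```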

Taking it one step further, streaming analytics represents the next frontier in unlocking insights from data. Embrace it! With cloud storage and compute separated, storage can collect data from streaming services and Databricks can process it easily, without running compute over your whole database at once. Leaders can start integrating this into their data estates now, so they are more agile as more streaming data becomes available from IoT devices and web, mobile and customer-experience platforms.
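As a sketch of that split, Spark Structured Streaming can watch a lake folder where a streaming service lands events and process only the new arrivals. The schema and paths below are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("stream-from-lake").getOrCreate()

# Events land in cheap storage first (e.g., from IoT or web telemetry).
schema = (
    StructType()
    .add("device_id", StringType())
    .add("reading", DoubleType())
    .add("ts", TimestampType())
)
events = spark.readStream.schema(schema).json(
    "abfss://lake@example.dfs.core.windows.net/streaming/iot/"
)

# Compute processes only the increments, never the whole data estate at once.
query = (
    events.groupBy("device_id").avg("reading")
    .writeStream.outputMode("complete")
    .format("memory").queryName("device_averages")
    .start()
)
```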

Because Databricks is so feature-rich in data engineering, data science and support for all business intelligence tools, you should be asking yourself: “Why am I paying more, losing time making things more complicated and moving data to yet another data store? Shouldn’t I find out what Databricks can do with the data I already have in my data lake?”

At Blueprint, we love to talk data. If you’re interested in learning more about how you can decrease costs while increasing the productivity of your data-driven insights, let’s start a conversation.
