
Databricks vs Snowflake – December 2022 Take

By Blueprint Team

Introduction

As technology advisors, we take great care to recommend best-fit solutions to our clients. We’re often asked to compare Databricks vs. Snowflake, but these two platforms were born to serve different functions and long coexisted as a complementary pairing addressing different needs. Over time, we’ve seen their features overlap to the extent that they now often compete to be the center of gravity for your data universe.

Before we begin, you need to understand two things:

  1. Data warehouses, data lakes, and lakehouses have evolved, are built for different purposes, and have their own advantages and disadvantages. We assume you have a general understanding of this.
  2. Keep in mind your purpose in evaluating a data platform. What do you need your data to do for your business? Who are the primary data producers, consumers, and beneficiaries?

Every use case and every persona has a unique need that should be considered when making an architectural decision. To get the conversation started, we take a broad view of the platforms. In many respects this is an apples-to-oranges comparison, so you’ll need to weigh the tradeoffs that matter for your needs. Follow along with us as we compare and share our take on the latest.

For each dimension below, we describe Databricks (the Lakehouse Platform), then Snowflake (the Data Cloud data warehouse platform), and then give Blueprint’s take.

Year founded

Databricks: 2013. Its foundation was laid in 2009, when Apache Spark was created.

Snowflake: 2012

Service Model

Databricks: Platform as a Service (PaaS)

Snowflake: Software as a Service (SaaS)

Who's it for primarily?

Databricks: Analysts, data scientists, and data engineers

Snowflake: Data analysts

Blueprint’s take: Snowflake is primarily for data analysts. While Databricks started off primarily for data scientists and engineers, there’s now plenty there for analysts, especially those who want to get closer to the data.

Core competency

Databricks: Built on Apache Spark’s distributed computing framework, which makes infrastructure management easier. Databricks is a data lake rather than a data warehouse, with more emphasis on use cases such as streaming, machine learning, and data science-based analytics. It can handle raw, unprocessed data at large volume and runs on the AWS, Azure, and Google clouds.

Snowflake: Uses a SQL engine to manage information stored in the database. It processes queries against virtual warehouses, each in its own independent cluster node. On top of that sit cloud services for authentication, infrastructure management, queries, and access controls. Snowflake enables users to store and analyze data using Amazon S3 or Azure resources.

Blueprint’s take: For those wanting a top-class data warehouse, Snowflake may be sufficient. For those needing more robust ETL, data science, and machine learning features, Databricks is the winner. Databricks is the first and only lakehouse platform in the cloud, combining the best of data warehouses and data lakes to offer an open, unified, and seamless platform for data and AI at massive scale. If you want to future-proof your investment with advanced capabilities to accommodate future use cases, Databricks may be the way to go.

Data engineering setup

Databricks: Offers auto-scaling of clusters but is not as user friendly. Its more advanced UI has a steeper learning curve because it is designed for a technical audience, and it allows finer-grained control and tuning of Spark. The release of Delta Live Tables (DLT) in April 2022 simplifies ETL development and management with declarative pipeline development, automatic data testing, and detailed logging for real-time monitoring and recovery.

Snowflake: The data warehouse has a user-friendly, intuitive SQL interface that makes it easy to get set up and running. It also has automation features that ease operation; for example, auto-scaling and auto-suspend stop and start clusters during idle or peak periods, and clusters can be resized easily.

Blueprint’s take: Snowflake wins on ease of setup, but Databricks was designed for more advanced users and AI/ML use cases, which require more robust ETL, data science, and machine learning features.
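Auto-suspend matters mostly for cost: a warehouse bills while it is running and suspends after an idle timeout. The pure-Python sketch below simulates that billing behavior so you can reason about timeout settings. It is an illustration only, not Snowflake’s actual billing engine; the 60-second minimum per resume reflects Snowflake’s documented per-second billing model, but the default timeout and all numbers are assumptions.

```python
def billed_seconds(query_windows, auto_suspend=600, minimum=60):
    """Estimate billed seconds for a warehouse given (start, end) query windows.

    The warehouse stays up from the first query until `auto_suspend` seconds
    after the last activity, then suspends and stops billing. Each resume
    bills at least `minimum` seconds.
    """
    if not query_windows:
        return 0
    windows = sorted(query_windows)
    total = 0
    run_start, run_end = windows[0][0], windows[0][1] + auto_suspend
    for start, end in windows[1:]:
        if start <= run_end:            # warehouse still awake: extend the run
            run_end = max(run_end, end + auto_suspend)
        else:                           # it suspended: close out the prior run
            total += max(run_end - run_start, minimum)
            run_start, run_end = start, end + auto_suspend
    total += max(run_end - run_start, minimum)
    return total

# With a 60s timeout, two 10-second queries an idle hour apart bill two short
# runs (70s each) instead of one hour-long run.
cost = billed_seconds([(0, 10), (3600, 3610)], auto_suspend=60)  # → 140
```

The takeaway from the simulation is that a shorter auto-suspend timeout trades resume latency for substantially less idle billing on bursty workloads.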

Data ownership

Databricks: Focuses primarily on the data application and data processing layers. Your data can live anywhere, even on-premises, in any format. Databricks runs on top of Amazon S3, Azure Blob Storage, and Google Cloud Storage.

Snowflake: Decouples the processing and storage layers so each can be scaled independently (you typically process less data than you store). However, Snowflake provides the storage layer (AWS or Azure, through Snowflake) and does not decouple data ownership: it retains ownership of both the data processing and data storage layers.

Blueprint’s take: Databricks fully decouples ownership of the data processing and storage layers. You can use Databricks to process data in any format, anywhere.

What kind of data does it store and process?

Databricks: Works with all data types in their original format (unstructured, semi-structured, structured).

Snowflake: Lets you save and upload both semi-structured and structured files without using an ETL tool to organize the data before loading it into the EDW; the data is then transformed into Snowflake’s internal structured format. Unstructured data currently stays external (AWS S3, Azure Blob Storage, etc.), and the Snowpark API (launched in 2022) helps with processing it.

Blueprint’s take: Databricks natively handles huge amounts of unstructured data. This is the “data lake” part of the Lakehouse, specifically Delta Lake, and Snowflake is playing catch-up when it comes to unstructured data. You can also use Databricks as an ETL tool to add structure to unstructured data so that other tools (like Snowflake) can work with it, putting Databricks ahead on data structure.
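The distinction above boils down to schema-on-write (structure imposed at ingest, the warehouse style) versus schema-on-read (keep the raw format, the lake style). As a toy illustration in plain Python, not either platform’s API, here is the kind of flattening step a warehouse-style ingest performs on semi-structured JSON before storing it in tabular form:

```python
import json

def flatten(record, prefix=""):
    """Flatten nested JSON into dotted column names, schema-on-write style."""
    cols = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            cols.update(flatten(value, name + "."))  # recurse into nesting
        else:
            cols[name] = value
    return cols

raw = json.loads('{"id": 7, "device": {"os": "linux", "ver": 3}}')
row = flatten(raw)
# → {"id": 7, "device.os": "linux", "device.ver": 3}
```

In a schema-on-read system this step is deferred: the raw JSON stays as-is and structure is applied only when a query asks for it.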

Performance (query engine)

Databricks: Has shown 2-4x acceleration of Spark SQL across deployments and claims up to 60x performance improvements for specific queries. Delta Engine (launched June 2020), layered on top of Delta Lake, boosts SQL query performance, and adjacent features like Photon (a C++ execution engine) can speed up large jobs further. (Source: Photon - Databricks)

Snowflake: Its query processing layer consists of multiple independent compute clusters whose nodes process queries in parallel. Snowflake calls these clusters virtual warehouses; each is packed with the compute resources (CPU, memory, and temporary storage) required to perform SQL and DML (Data Manipulation Language) operations. (Source: Overview of Warehouses - Snowflake Documentation)

Blueprint’s take: Both vendors have released a series of blog posts as they battle for dominance in performance benchmarks. Today, it looks like Databricks has the cost/performance advantage.

Here's one take from ZDNet on the TPC-DS benchmark wars:

“What the TPC and BSC results do show is that the lakehouse architecture can take these BI workloads on. This is significant because most Spark-based systems, including Databricks, had previously been best for data engineering, machine learning, and intermittent queries in the analytics realm. Getting such a system to service ongoing analytics workloads, or ad hoc analysis involving multiple queries that build on each other, was harder to come by.”

Andrew Brust, Jan. 24, 2022
Databricks' TPC-DS benchmarks fuel analytics platform wars | ZDNET

Query performance summary (for laypeople)

Databricks: According to Gartner, users have run Databricks successfully on extremely challenging workloads, up to petabytes of storage.

Snowflake: Better at interactive queries, since Snowflake optimizes storage at ingestion time. It is the go-to for (smaller) BI workloads and for report and dashboard production.

Blueprint’s take: For big data (50 GB+) and/or intense computing, Databricks is not just faster; it scales better in both performance and cost.

Integration Platforms & Dev Tools

Both platforms integrate with Fivetran, Rivery, Data Factory, Informatica Cloud, and others.

Blueprint’s take: For integrations, both platforms now enjoy compatibility with most major data acquisition vendors. This wasn’t always the case: with the advent of the Databricks SQL warehouse engine, all vendors now have the methods in place to integrate data into either platform from nearly all sources.

For tooling, Snowflake has enjoyed a longer run and market dominance and, until recently, claimed a wider set of data design and ETL tools. However, this gap has effectively closed: dbt, a popular ETL and data modeling tool, supports both platforms, as do a wealth of CI/CD tools and repositories for managing coded artifacts.

Data sharing

Databricks: Delta Sharing (launched 2021) is an open protocol for real-time collaboration, based on an open-source project by Databricks. Organizations can easily collaborate with customers and partners on any cloud and run complex computations and workloads using SQL, Python, R, and Scala, with consistent data privacy controls. Databricks Marketplace (launched 2022) lets data providers securely package and monetize digital assets like data tables, files, machine learning models, notebooks, and dashboards.

Snowflake: The Snowflake Marketplace (its data marketplace and sharing platform) is one of Snowflake’s most powerful features. It can securely share data, without replication, in a GDPR-compliant and scalable environment. Snowflake data sharing shares selected objects with other Snowflake accounts; users can be granted read-only access (a reader account) to query and view data, but cannot perform any of the DML tasks allowed in full accounts (data loading, insert, update, etc.).

Blueprint’s take: Snowflake-to-Snowflake sharing is supported, but Databricks wins with Delta Sharing, the industry’s first open protocol for secure data sharing, which makes it simple to share data with other organizations regardless of which computing platforms they use.
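To make the "open protocol" point concrete: a Delta Sharing recipient needs only a small JSON "profile" file (an endpoint plus a bearer token) to reach shared tables from any client, not a Databricks account. The sketch below parses such a profile with the standard library. The field names follow the published protocol; the endpoint, token, and REST path construction are illustrative assumptions, not a full client.

```python
import json

# Example profile a data provider would hand to a recipient. The endpoint and
# token values here are made up.
PROFILE = """{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing",
  "bearerToken": "not-a-real-token"
}"""

def table_url(profile_json, share, schema, table):
    """Build the REST path a Delta Sharing client would query for one table,
    following the protocol's shares/schemas/tables layout."""
    profile = json.loads(profile_json)
    if profile["shareCredentialsVersion"] != 1:
        raise ValueError("unsupported profile version")
    return f'{profile["endpoint"]}/shares/{share}/schemas/{schema}/tables/{table}'

url = table_url(PROFILE, "retail", "sales", "orders")
# → "https://sharing.example.com/delta-sharing/shares/retail/schemas/sales/tables/orders"
```

Because the profile is just credentials plus an HTTPS endpoint, any tool that speaks the protocol (pandas, Spark, BI clients) can consume the share, which is the interoperability argument made above.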

Data Science and Machine Learning capabilities

Databricks: Spark provides the tools and environment for running ML workloads across huge, distributed data repositories. In addition to horsepower, Databricks provides a mature, unified ML capability to manage the ML lifecycle from start to finish. MLflow, an open-source package developed at Databricks, is the most widely used tool for MLOps, and AutoML functionality means low-code, faster deployment of models.

Snowflake: ML capability is available only via additional tools, such as its Snowpark API, which has Python integration (to build and optimize complex data pipelines), and third-party integrations, though those are plentiful.

Blueprint’s take: Databricks is the clear winner in this category. Since day one, the platform has been geared towards data science use cases like recommendation engines and predictive analytics.
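To make "managing the ML lifecycle" concrete: the core of experiment tracking, MLflow’s best-known component, is recording each training run’s parameters and metrics so runs are comparable and reproducible. The sketch below is a pure-Python illustration of that pattern, not the MLflow API; all names here are our own.

```python
class RunTracker:
    """Minimal stand-in for MLflow-style experiment tracking."""

    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        """Record one training run's hyperparameters and results; return its id."""
        run = {"run_id": len(self.runs) + 1,
               "params": dict(params),
               "metrics": dict(metrics)}
        self.runs.append(run)
        return run["run_id"]

    def best_run(self, metric, maximize=True):
        """Return the recorded run with the best value of `metric`."""
        pick = max if maximize else min
        return pick(self.runs, key=lambda run: run["metrics"][metric])

tracker = RunTracker()
tracker.log_run({"max_depth": 4}, {"auc": 0.81})
tracker.log_run({"max_depth": 8}, {"auc": 0.86})
best = tracker.best_run("auc")   # → the run logged with {"max_depth": 8}
```

Real tracking systems add artifact storage, model registries, and UI comparison on top of this record-and-compare core.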


Key Takeaways

Overall, Snowflake and Databricks are both good data platforms for BI and analytics. Selecting the best platform for your business depends on your data strategy, usage patterns, data needs and volumes, and workloads. Snowflake is a solid choice for standard data transformation and analysis, particularly for SQL users. However, our clients have consistently chosen Databricks for its advanced capabilities in streaming, ML, AI, and data science workloads, especially because of its support for raw unstructured data and Spark’s support for multiple languages.

As businesses advance in their data maturity and data needs, we’re more and more in favor of the Databricks Lakehouse Platform as the best choice for unifying the best of data warehouses and data lakes into one simple platform for handling all your data, analytics, and AI use cases at massive scale.

NOTE: You’ll notice that a pricing comparison is conspicuously missing here. Pricing depends on many variables related to your specific processing and storage configurations and should be evaluated on a total-cost-of-ownership basis, so we couldn’t adequately cover it here. Contact us if you’d like a deeper analysis and comparison.

What's next?

Blueprint has a solution for all of your Databricks needs.

Have questions or need some advice? Wherever you are in your data journey, we can be an extension of your team. Our data engineering and operations teams are best-in-class. Let’s talk.

Sources

“Databricks CTO: Making our bet on the lake house.” Tiernan Ray. The Technology Letter.

“Gartner Magic Quadrant for Cloud Database Management Systems.” Henry Cook and Merv Adrian, Dec 14, 2021. Gartner.

“The Good and the Bad of Snowflake Data Warehouse.” Apr 26, 2022. AltexSoft.

“Snowflake vs Databricks vs Firebolt.” Robert Meyer, Jun 15, 2022. Firebolt.

“Snowflake vs. Databricks: A Practical Comparison.” Upsolver.

“What is Databricks? Components, Pricing, and Reviews.” Eran Levy, Oct 14, 2022. Upsolver.

“Deep Dive: Databricks vs Snowflake.” Francis Odum, Sep 15, 2022. Contrary.

“Databricks vs Snowflake: A Side By Side Comparison.” Mar 15, 2022. Macrometa.

“Snowflake Co-Founder Reveals His Multi-Billion Dollar Secrets.” Gabrielle Olya, Dec 20, 2018. Yahoo Finance.

“Complicated rivalry between Snowflake and Databricks spotlights key trends in enterprise computing.” Mark Albertson, Aug 8, 2022. SiliconANGLE.

“What Does Databricks Do and Why Should Investors Care?” Sep 6, 2021. Nanalyze.

“Databricks’ TPC-DS Benchmarks Fuel Analytics Platform Wars.” Andrew Brust, Jan 24, 2022. ZDNet.

“Comparison of Data Lake Table Formats (Apache Iceberg, Apache Hudi and Delta Lake).” Dremio.

“Snowflake Data Governance: Data Discovery, Security & Access Policies.” Atlan.

“Introduction to Unstructured Data Support.” Snowflake Documentation.

“Snowflake Launches Unstructured Data Support in Public Preview.” Saurin Shah and Scott Teal. Snowflake.
