The business case for Apache Beam

By Gary Nakanelua

You’ve just learned about a new streaming data processing technology that would solve many of the technical challenges you are experiencing within your organization today. Unfortunately, it would require significant time and budget to integrate and operationalize within your current solution.

Enter Apache Beam.

According to the project’s website, “Apache Beam provides an advanced unified programming model, allowing you to implement batch and streaming data processing jobs that can run on any execution engine.” It’s analogous to a general contractor: the contractor brings in specialized subcontractors to perform the work, yet you only ever deal with the general contractor. If you need a new roof on your home because a previous subcontractor did a sub-par job, you still only work with the general contractor. They don’t rebuild the entire house; they simply hire a new subcontractor to put on the new roof.
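
For the technically curious, here is a minimal sketch of what that looks like in Beam’s Python SDK. Nothing in the pipeline names a particular execution engine; the file paths are hypothetical placeholders used purely for illustration.

```python
import apache_beam as beam

# A word count expressed once in Beam's unified model. The same code can be
# handed to any supported runner; the pipeline itself never names an engine.
# "input.txt" and "counts" are hypothetical paths for illustration only.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("input.txt")
        | "Split" >> beam.FlatMap(lambda line: line.split())
        | "PairWithOne" >> beam.Map(lambda word: (word, 1))
        | "Count" >> beam.CombinePerKey(sum)
        | "Format" >> beam.Map(lambda kv: "%s: %d" % kv)
        | "Write" >> beam.io.WriteToText("counts")
    )
```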

Dealing with “Out of Scope”

Today’s agile sprint teams are driven by their solution backlog. This backlog is filled with bugs, feature requests and spikes written to address needs that should be delivered by the current solution. Yet, how often does a feature get requested, only to have the technical team dismiss it as “out of scope”? They note the original specification document didn’t include any mention of the need for stateful computations, event-time windowing or some other fancy set of words used to describe the technical approach to address your request. “If only you had made it part of the original requirements,” they say, “then we could have accounted for it in our architecture and approach”.

So another project team is started, tasked with creating the “v-next” version of the original solution that will include all current functionality plus the new features requested. It will be leaner, meaner and built in the latest technology so as to avoid the mistakes of the past. “It will scale with all your needs,” the super-motivated project team touts. Product backlogs are created. Releases are made. The world rejoices until an “out of scope” feature is requested. Then the cycle repeats itself. As a decision maker, how do you break this cycle?

Enter Apache Beam.

Beam gives you a unified, portable and extensible foundation from which to make your top-level streaming architecture decisions. I’ve had the pleasure of meeting and talking with Andrew Psaltis, author of “Streaming Data: Understanding the real-time pipeline,” on several occasions. In his Apache Beam presentation at QCon in 2016, he noted:

“You can switch to whatever is more performant, more scalable, maybe something that requires a smaller footprint. Whatever your requirements are, it becomes easy to switch”.

You can view his presentation in its entirety at https://www.infoq.com/presentations/apache-beam.
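
To make that concrete: the “fancy words” from earlier, such as event-time windowing, are ordinary pipeline constructs in Beam rather than architectural rewrites. The sketch below is illustrative only, using made-up sensor records of the form (key, value, event time in seconds); adding 60-second event-time windows is a single transform.

```python
import apache_beam as beam
from apache_beam.transforms.window import FixedWindows, TimestampedValue

# Illustrative only: made-up (key, value, event-time-in-seconds) records.
events = [("sensor-a", 1, 10.0), ("sensor-a", 1, 45.0), ("sensor-a", 1, 75.0)]

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Create" >> beam.Create(events)
        | "Stamp" >> beam.Map(lambda e: TimestampedValue((e[0], e[1]), e[2]))
        | "Window" >> beam.WindowInto(FixedWindows(60))  # event-time windowing is one transform
        | "SumPerKey" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )
```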

Encouraging The “New Hotness”

Engineers and developers love working with new frameworks, libraries and APIs. Whether it’s for performance, ease of development, speed of deployment or sheer intellectual curiosity, the desire to utilize < insert new technology here /> will always be a topic of conversation within technical teams.

Consider stream processing computation engines. In the last six years, we’ve seen Storm, Spark, Flink and Apex grow in popularity (to name a few). Each is, or was, the “new hotness,” and all promise scalable, performant and fault-tolerant solutions to today’s streaming data problems. In practice, each has its pros and cons when used within a solution for any given organization. How do you enable a technical team to stay relevant, curious and motivated to experiment with the next big thing without draining your budget?

Enter Apache Beam.

Admittedly, my interest in Apache Beam grew from a conversation I had with another engineer, Ryan Harris, at a local Apache Spark meetup. I’ve spent a lot of time with Spark and wanted to see what his excitement was all about.

I ran through the Python quick start at https://cloud.google.com/dataflow/docs/quickstarts/quickstart-python with a local runner. Next I gave it a go with Google’s Cloud Dataflow runner. Finally, I ran it using the Spark runner. Aside from a few local development environment configuration adjustments (those were my own fault), Apache Beam let me experiment with capabilities from a few different technologies quickly.
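
In case it’s useful, here is roughly what that switching looks like from the code’s point of view: the pipeline stays the same and only the options handed to it change. This is a sketch; the Dataflow project, region and bucket values are placeholders, and runner-specific flags vary by environment.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# The same pipeline, pointed at different runners purely through options.
# Uncomment one flag list; the Dataflow values below are placeholders.
flags = ["--runner=DirectRunner"]  # local runner
# flags = ["--runner=DataflowRunner", "--project=my-project",
#          "--region=us-central1", "--temp_location=gs://my-bucket/tmp"]
# flags = ["--runner=SparkRunner"]  # may need additional Spark-specific options

with beam.Pipeline(options=PipelineOptions(flags)) as pipeline:
    (
        pipeline
        | "Create" >> beam.Create(["to", "be", "or", "not", "to", "be"])
        | "Count" >> beam.combiners.Count.PerElement()
        | "Print" >> beam.Map(print)
    )
```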

You can check out the current Apache Beam capability matrix at https://beam.apache.org/documentation/runners/capability-matrix/. Don’t see the latest technology listed? Apache Beam is open source and has well-documented SDKs, so new runners can be created. Plus, Apache Beam is a core component of Google’s Cloud Dataflow service, so look for new additions to Apache Beam all the time.

Conclusion

As a decision maker, you want the peace of mind that a technical solution can scale with future business needs and enable innovation within your organization through technology experimentation. Apache Beam is a worthwhile addition to a streaming data architecture to give you that peace of mind.
