The business case for Apache Beam

By Gary Nakanelua

You’ve just learned about a new streaming data processing technology that would solve many of the technical challenges you are experiencing within your organization today. Unfortunately, it would require significant time and budget to integrate and operationalize within your current solution.

Enter Apache Beam.

According to the project’s website, “Apache Beam provides an advanced unified programming model, allowing you to implement batch and streaming data processing jobs that can run on any execution engine.” It’s analogous to a general contractor: specialized subcontractors perform the work, yet you only ever deal with the general contractor. If you need a new roof on your home because a previous subcontractor did a sub-par job, you work with the general contractor alone. They don’t rebuild the entire house; they simply hire a new subcontractor to put on a new roof.
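
To make that unified model concrete, here is a minimal sketch of the classic word count using Beam’s Python SDK. The input and output paths are placeholders; everything else is standard Beam:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Classic word count: read lines, split into words, count, write results.
# "input.txt" and "counts" are placeholder paths.
with beam.Pipeline(options=PipelineOptions()) as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("input.txt")
        | "Split" >> beam.FlatMap(lambda line: line.split())
        | "Pair" >> beam.Map(lambda word: (word, 1))
        | "Count" >> beam.CombinePerKey(sum)
        | "Format" >> beam.Map(lambda kv: f"{kv[0]}: {kv[1]}")
        | "Write" >> beam.io.WriteToText("counts")
    )

The point of the general contractor analogy is that this pipeline code never names an execution engine; the engine is chosen by the options passed at launch.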

Dealing with “Out of Scope”

Today’s agile sprint teams are driven by their solution backlog, which is filled with bugs, feature requests and spikes written to address needs the current solution should deliver. Yet how often does a feature get requested, only to have the technical team dismiss it as “out of scope”? They note that the original specification document never mentioned stateful computations, event-time windowing or whatever other fancy set of words describes the technical approach your request requires. “If only you had made it part of the original requirements,” they say, “then we could have accounted for it in our architecture and approach.”

So another project team is started, tasked with creating the “v-next” version of the original solution: all current functionality plus the new features requested. It will be leaner, meaner and built on the latest technology so as to avoid the mistakes of the past. “It will scale with all your needs,” the super-motivated project team touts. Product backlogs are created. Releases are made. The world rejoices, until an “out of scope” feature is requested and the cycle repeats itself. As a decision maker, how do you break this cycle?

Enter Apache Beam.

Beam gives you a unified, portable and extensible solution from which to answer your top-level streaming architecture decisions. I’ve had the pleasure of meeting and talking with Andrew Psaltis, author of “Streaming Data: Understanding the real-time pipeline,” on several occasions. In his Apache Beam presentation at QCon in 2016, he noted:

“You can switch to whatever is more performant, more scalable, maybe something that requires a smaller footprint. Whatever your requirements are, it becomes easy to switch.”

You can view his presentation in its entirety at https://www.infoq.com/presentations/apache-beam.
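
As a taste of how those earlier “out of scope” features look in Beam, event-time windowing is a single transform in the pipeline rather than an architectural rewrite. A minimal sketch with the Python SDK, using made-up keys, values and timestamps:

import apache_beam as beam
from apache_beam.transforms import window

# Hypothetical (key, value, event-time-in-seconds) records; the first two fall
# in one 60-second window, the third in the next.
events = [("clicks", 1, 0), ("clicks", 3, 30), ("clicks", 2, 90)]

with beam.Pipeline() as p:
    (
        p
        | "Create" >> beam.Create(events)
        | "Stamp" >> beam.Map(lambda e: window.TimestampedValue((e[0], e[1]), e[2]))
        | "Window" >> beam.WindowInto(window.FixedWindows(60))
        | "Sum" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )

Because windowing is part of Beam’s model rather than any one engine’s feature set, adding it later doesn’t force the “v-next” rewrite described above.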

Encouraging The “New Hotness”

Engineers and developers love working with new frameworks, libraries and APIs. Whether it’s for performance, ease of development, speed of deployment or just intellectual curiosity, the desire to utilize < insert new technology here /> will always be a topic of conversation within technical teams.

Consider stream processing computation engines. In the last six years, we’ve seen Storm, Spark, Flink and Apex grow in popularity (to name a few). Each was, in its turn, the “new hotness,” and all promise scalable, performant and fault-tolerant solutions to today’s streaming data problems. In practice, each has its pros and cons when used within a solution for any given organization. How do you enable a technical team to stay relevant, curious and motivated to experiment with the next big thing without draining your budget?

Enter Apache Beam.

Admittedly, my interest in Apache Beam grew from a conversation I had with another engineer, Ryan Harris, at a local Apache Spark meetup. I’ve spent a lot of time with Spark and wanted to see what his excitement was all about.

I ran through the Python quick start at https://cloud.google.com/dataflow/docs/quickstarts/quickstart-python with a local runner. Next, I gave it a go with Google’s Cloud Dataflow runner. Finally, I ran it using the Spark runner. Aside from a few local development environment configuration adjustments (those were my own fault), Apache Beam let me experiment with capabilities from a few different technologies quickly.
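
For illustration, here is roughly what that runner switching looks like in the Python SDK. The pipeline definition is untouched; only the options change. The project, region and bucket names below are hypothetical:

from apache_beam.options.pipeline_options import PipelineOptions

# Local development run.
local_opts = PipelineOptions(["--runner=DirectRunner"])

# Google Cloud Dataflow run (hypothetical project and bucket).
dataflow_opts = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-gcp-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/tmp",
])

# Apache Spark run.
spark_opts = PipelineOptions(["--runner=SparkRunner"])

# e.g. beam.Pipeline(options=spark_opts) runs the same pipeline code on Spark.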

You can check out the current Apache Beam capability matrix at https://beam.apache.org/documentation/runners/capability-matrix/. Don’t see the latest technology listed? Apache Beam is open source and has well-documented SDKs, so new runners can be created. Plus, Apache Beam is a core component of Google’s Cloud Dataflow service, so expect regular additions to the project.

Conclusion

As a decision maker, you want the peace of mind that a technical solution can scale with future business needs and enable innovation within your organization through technology experimentation. Apache Beam is a worthwhile addition to a streaming data architecture to give you that peace of mind.
