Light Speed your Spark Deployments and Best Practices

When

November 15, 2022

4:00 pm PST

Where

Blueprint

2600 116th Ave NE

Bellevue, WA 98004

Featured Speakers

Karthikeyan Ramasamy - Head of Streaming at Databricks

Praveen Gattu - Engineering Leader at Databricks

Nan Zhu - Platform Engineer at SafeGraph

Lively discussions + industry-leading presentations + delicious food + networking

Featured Topics

Karthik Ramasamy and Praveen Gattu

Project #Lightspeed - Next generation Apache Spark Structured Streaming

Streaming data is a critical area of computing today. Stream processing handles data as it moves from source to destination in real time, enabling quick insights. To meet these needs, Structured Streaming was introduced in Apache Spark™ 2.0. Spark Structured Streaming has experienced over 150% year-over-year growth and is widely adopted across thousands of organizations, processing more than 1 PB of compressed data per day on the Databricks platform alone. As adoption accelerated and the diversity of applications moving to streaming increased, new requirements emerged. Project Lightspeed is a new initiative that will take Spark Structured Streaming to the next generation. In this talk, we will give an overview of the features proposed in Project Lightspeed, covering both performance and functionality.

Nan Zhu

Avoid Burning Money with Spark!

SafeGraph is a geospatial data company providing comprehensive and accurate information on tens of millions of global places and how people interact with these locations. We build our data processing stack on top of Spark to transform and generate massive datasets.

As user demand grew rapidly and our data processing pipelines and algorithms became more complex, our Spark computing costs rose at an undesirable pace until we took action. In this talk, Nan will share his experience building, operating, and optimizing the Spark infrastructure, which saved hundreds of thousands of dollars per year. Specifically, Nan will cover building an observability stack that makes money burners easy to detect, optimizing resource provisioning, and examples of business logic optimization in Spark applications.

Hosted at Blueprint