Head of Streaming
Lively discussions + industry-leading presentations + delicious food + networking
Karthik Ramasamy and Praveen Gattu
Project #Lightspeed - Next generation Apache Spark Structured Streaming
Streaming data is a critical area of computing today. Stream processing handles data in real time as it moves from source to destination, enabling quick insights. To meet these needs, Structured Streaming was introduced in Apache Spark™ 2.0. Spark Structured Streaming has experienced over 150% YOY growth and has been adopted by thousands of organizations, processing more than 1 PB of compressed data per day on the Databricks platform alone. As adoption accelerated and the diversity of applications moving into streaming increased, new requirements emerged. Project Lightspeed is a new initiative that will take Spark Structured Streaming to the next generation. In this talk, we will give an overview of the new features proposed in Project Lightspeed, covering both performance and functionality.
Avoid Burning Money with Spark!
SafeGraph is a geospatial data company providing comprehensive and accurate information on tens of millions of global places and how people interact with these locations. We build our data processing stack on top of Spark to transform and generate massive datasets.
With rapidly growing user demand and increasingly complex data processing pipelines and algorithms, our Spark computing costs grew at an undesirable pace until we took action. In this talk, Nan will share his experience building, operating, and optimizing Spark infrastructure, saving hundreds of thousands of dollars per year. Specifically, Nan will cover building an observability stack that makes money burners easy to detect, optimizing resource provisioning, and examples of business logic optimization in Spark applications.
Hosted at Blueprint