Mastering data migration from Cloudera to Databricks

Blueprint partnered with a data-driven real estate investment firm to streamline its data operations. Tasked with migrating from Cloudera to Databricks within a tight timeframe, the Blueprint team executed a smooth transition while addressing critical challenges around data silos and heavy storage costs. Through meticulous planning, a robust strategy, and hands-on mentorship, Blueprint not only delivered a successful migration but also equipped its customer with stronger data governance, automation capabilities, and insightful analytics.

Client Snapshot

Who:

A real estate investment firm

Industry:

Real Estate Investment and Development

Stakeholders:

Head of Engineering and Director of Engineering

Work Summary

What we did:

  • Migrated more than 100 TB of complex data spanning 20 companies from Cloudera to Databricks on AWS
  • Addressed critical Geographic Information System (GIS) requirements
  • Implemented a robust CI/CD pipeline
  • Finalized a security catalog, facilitated an onboarding process, and ensured compatibility with external application dependencies
  • Implemented the Lakehouse Optimizer powered by Blueprint to provide cost transparency and insight into where the customer can improve its Databricks platform

Client background

Our customer is a real estate investment and development company that focuses on untapped pockets of real estate in the United States. With over $18B in assets under management, the firm builds innovative, technology-driven solutions to provide the best service to its investors. Combining data, technology, and analytics allows it to make time-sensitive decisions that can positively affect its customers’ investments. Its technology also helps it identify value in properties, supporting better property selection and risk management strategies.

The Blueprint way

The customer’s goal was to migrate from Cloudera to Databricks in approximately three months to cut the substantial costs it was incurring. The Blueprint team stepped in to make that transition smooth and to help ensure that no data was lost in the process. The result was a full transition to Databricks within the allotted timeframe. During the same three-month window, the Blueprint team also trained and mentored the customer’s engineers so they could sustain that success once the migration was complete.

The challenge

Their biggest challenge was not being able to see all the information they needed, in the right place, at the right time. Most of their data was siloed, which created inefficiencies across the business and pulled valuable time away from their data teams. They were also facing heavy data storage costs on Cloudera, plus an additional annual licensing fee. Finally, their on-premises server imposed constraints that limited them.

The solution

The Blueprint team developed a strategy in which the names of the main data pipelines matched those used in the workstream approach. This ensured consistency across the organization, because these key pipelines were grouped into Data Products that business stakeholders could easily recognize and understand. The team also learned, built on, and enhanced the customer’s orchestration tool, enabling automated code deployment. Blueprint used a ‘move-and-improve’ technique to expedite the migration while retaining critical business logic, including a complex Java application that calculated tax impacts for real estate investors. Beyond migrating the customer’s data to Databricks, we segmented it into separate areas for security and privacy, giving the customer better control over its data governance procedures. At each daily stand-up, Blueprint shared a progress tracker with the customer’s leadership to show how the development and testing phases were advancing.

Impact
