Do your organization’s data cubes perform the way you want them to?
At Blueprint Technologies, we have a client that relies on large amounts of data for insight into its sales strategy performance. Sales data flows from multiple big-box retailers into an expansive central database used by teams across the organization. Each department maintains its own set of Power BI dashboards for reporting.
Due to the database’s size and complexity, different departments have created function-specific data cubes. The maintenance costs of these cubes have grown to such a degree that our client had to seek external help. The inefficiency of the existing data ecosystem has significantly increased time to insight into the company’s sales strategy performance.
After reviewing the current architecture, we arrived at a single question:
Do they even need these data cubes?
We designed and conducted an experiment to determine if we could maintain, or even improve, system performance by removing the data cubes entirely. We hypothesized that it would be possible by virtualizing direct database access via Conduit, our lightweight data virtualization tool. We selected three different scenarios to test that hypothesis.
The first scenario was intended to replicate the client’s current environment in a simplified manner. Power BI was connected to the source data via a data cube in Azure Analysis Services (AAS). Data adjustments would occur within the cube.
The second scenario removed the data cube from the equation. With the idea of maximizing performance, we virtualized data directly from the source database via Conduit and imported that data, on a scheduled refresh, into Power BI. Data adjustments would occur within Power BI.
The final scenario was similar to the second, except that DirectQuery was used to access the virtualized data within Power BI. Data adjustments would again occur in Power BI.
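The import-versus-DirectQuery distinction at the heart of scenarios 2 and 3 can be sketched in a few lines. The snippet below is a toy illustration only: it uses SQLite in place of the virtualized source and a `sales` table invented for the example, not Conduit's or Power BI's actual API.

```python
import sqlite3

# Stand-in for the virtualized source database (hypothetical; the real
# setup queries the source through Conduit rather than a local SQLite file).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("West", 120.0), ("East", 80.0), ("West", 50.0)])
conn.commit()

AGG = "SELECT region, SUM(amount) FROM sales GROUP BY region"

# Scenario 2 style ("import"): snapshot the data on a scheduled refresh,
# then serve every visual from the local copy.
imported_rows = conn.execute(AGG).fetchall()

def visual_from_import():
    # Fast: no round trip to the source, but only as fresh as the last refresh.
    return dict(imported_rows)

# Scenario 3 style ("DirectQuery"): every visual refresh re-queries the source.
def visual_from_direct_query():
    # Always current, but each visual refresh pays the query cost.
    return dict(conn.execute(AGG).fetchall())

snapshot = visual_from_import()        # state as of the last scheduled refresh
conn.execute("INSERT INTO sales VALUES ('East', 40.0)")  # new data arrives
conn.commit()
live = visual_from_direct_query()      # sees the new row immediately
```

After the new row arrives, `snapshot` still shows East at 80.0 while `live` shows 120.0, which is the freshness-versus-responsiveness tradeoff the two scenarios measured.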
Blueprint Data Engineer Alex Kuksenko conducted the experiment and explains the results of each scenario:
Scenario 1 - Data cube usage
“It took some time to prebake the data (perform the calculations and relationship mapping within the cube) — usually 1-2 minutes for 50 GB of data with simple math. Dashboard refreshes had delays, and I often had to run the AAS to see new data. The visuals loaded fairly quickly and were responsive, but the big downside is that you can’t add new measures and work with the data.”
Scenario 2 - Virtualized data
“I was impressed by how quickly Conduit picked up the new data and virtualized it. I didn’t even have time to switch tabs before the refresh pulled the new stuff. On import, the refresh of visuals and filters is under 1 second on average, though the data refresh does take up to 2 minutes.”
Scenario 3 - Virtualized data with DirectQuery
“With DirectQuery, the refresh speed of the data is mind-blowing, but the refresh of the visuals is over 5 seconds on average.”
Based on those results, we have concluded that it is possible to decrease overall time to insight by removing data cubes and virtualizing data access with Conduit. The cost of generating those insights also drops, because the storage and maintenance costs associated with the data cubes disappear. In our opinion, the best results came from scenario 2. We were able to remove the data cube and maximize performance by virtualizing data directly from the source database. Data refresh was quick, and the flexibility of this scenario enables self-serve business intelligence, removing the need for data upkeep by third parties.
One more thing
This experiment focused strictly on performance comparison. While immersing ourselves in the team’s current workflow, however, we learned that access control to the data was also an issue. Service accounts were being created on an ad hoc basis and used within Power BI. As part of its audits, the IT department would frequently delete those service accounts, blocking the Power BI dashboards’ access to data. The constant creation and deletion of accounts further added to the system’s inefficiency.
Using Conduit rather than the data cube allows for granular permissions to be handled via Active Directory accounts.
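As a rough illustration of what granular, directory-driven permissions can look like, the sketch below maps an Active Directory group name to a row-level filter appended to the query sent to the source. The group names, table, and filter strings are invented for this example; a real deployment would resolve group membership through Active Directory itself rather than a hard-coded dictionary.

```python
# Hypothetical mapping from an Active Directory group to a row-level filter.
# Group names and the `sales` table are invented for this illustration.
GROUP_FILTERS = {
    "Sales-West": "region = 'West'",   # regional team sees only its region
    "Sales-Leadership": "1 = 1",       # leadership sees everything
}

def build_scoped_query(ad_group: str) -> str:
    # Default-deny: an unknown group gets a filter that matches no rows.
    where = GROUP_FILTERS.get(ad_group, "1 = 0")
    return f"SELECT region, amount FROM sales WHERE {where}"
```

For example, `build_scoped_query("Sales-West")` yields a query filtered to the West region, while an unrecognized group gets back a query that returns nothing. Because the scoping is keyed to durable directory groups rather than disposable service accounts, an account-cleanup audit no longer severs dashboard access.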
Try it yourself
You can run a very similar experiment in your own Azure environment. Preeti Benedict, one of our amazing data engineers, had the entire sandbox environment spun up in less time than it took to make a smoothie, using data from Microsoft’s free example database, AdventureWorks. Power BI is available for free here. To learn more about Conduit and to schedule a demo or free thirty-day trial, visit our Conduit homepage.