Your supply chain uses SAP, customer service uses an Oracle CRM, and the sales team uses Salesforce. If you want to run analysis across those data sets, how do you bring all that data together? With Conduit.
At Blueprint Technologies, one of our core products is Conduit. Because it is still new, we find that many do not understand the basics of Conduit, nor do they accurately estimate the benefits to be gained from investing in a good data virtualization product and partner. We interviewed our Conduit expert, Bobby Huang, to help answer some of the most common questions.
What is Conduit?
Conduit is Blueprint’s lightweight data virtualization tool. Conduit allows you to unify and query data from disparate locations without having to make a copy of the data. In any large business, you’re likely to find that different lines of business or teams use different software, formats and architecture to manage and store their data. For example, supply chain may use SAP, customer service may use an Oracle CRM and the sales team may use Salesforce. So, the question then becomes, if you want to do analysis across those data sets, how do you bring all that data together? With Conduit.
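Conduit’s own API isn’t shown here, but the core idea of querying across sources in place, without copying data into a warehouse, can be sketched with standard SQL federation. In this hypothetical example, two separate SQLite databases stand in for a CRM and a sales system (the names `crm.db`, `sales.db`, and the table schemas are illustrative, not Conduit’s), and a single `ATTACH`-ed connection joins them where they live:

```python
import os
import sqlite3
import tempfile

# Two stand-in "source systems" (hypothetical; Conduit itself connects
# to real systems such as SAP, Oracle, or Salesforce).
tmp = tempfile.mkdtemp()
crm_path = os.path.join(tmp, "crm.db")
sales_path = os.path.join(tmp, "sales.db")

with sqlite3.connect(crm_path) as crm:
    crm.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
    crm.execute("INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC')")

with sqlite3.connect(sales_path) as sales:
    sales.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    sales.execute("INSERT INTO orders VALUES (1, 100.0), (1, 50.0), (2, 75.0)")

# One connection "virtualizes" both sources: the data stays where it
# is, and a single query joins across the attached databases.
con = sqlite3.connect(crm_path)
con.execute("ATTACH DATABASE ? AS sales", (sales_path,))
rows = con.execute("""
    SELECT c.region, SUM(o.amount)
    FROM customers AS c
    JOIN sales.orders AS o ON o.customer_id = c.id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(rows)  # [('APAC', 75.0), ('EMEA', 150.0)]
```

The point of the sketch is the access pattern, not the engine: no copy of either source is materialized, yet one query can aggregate across both.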
Conduit easily scales up and out to meet the compute needs of many business use cases and does it at a fraction of the cost.
What do you mean by lightweight?
The traditional way of bringing disparate data together is with a data warehouse ETL methodology, a process that ultimately limits flexibility. It’s not easy to set up one of these data warehouses, and the process generally begins with establishing stakeholder consensus on what data sets should be shared at a company level for various reporting purposes. Data architects then step in to design an architecture to store clean copies of that data in one place. As the company gathers and creates more data, they find themselves continually planning and waiting for these new data assets to be incorporated into the data warehouse.
Data virtualization, on the other hand, allows organizations to reach out and analyze data quickly rather than migrating it to the data warehouse and duplicating it. Modern data virtualization takes that a step further, adding the capability of cataloging and tagging data for easy searching as well as the ability to query it. Lightweight data virtualization, like Conduit, represents the cutting edge, allowing end users to query using their preferred analytics tool (from Power BI to Jupyter Notebooks) and incorporating intelligent caching policies that minimize impact to source databases. Best of all, organizations can leverage their own hardware to run Conduit on premises or with a hybrid-cloud setup rather than buying an expensive branded product and paying a third party based on the amount of compute used.
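To make “caching policies that minimize impact to source databases” concrete, here is a minimal sketch of one common policy, a time-to-live (TTL) result cache sitting in front of a source system. The class and names are hypothetical illustrations of the general technique, not Conduit’s implementation:

```python
import time

class QueryCache:
    """Hypothetical TTL result cache in front of a source database.

    Identical queries repeated within `ttl` seconds are served from the
    cache, so the source system is hit only once per window.
    """

    def __init__(self, run_query, ttl=60.0):
        self.run_query = run_query   # callable that hits the real source
        self.ttl = ttl
        self._cache = {}             # sql text -> (timestamp, result)

    def execute(self, sql):
        now = time.monotonic()
        hit = self._cache.get(sql)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]            # cache hit: no load on the source
        result = self.run_query(sql)
        self._cache[sql] = (now, result)
        return result

# Demo: count how often the "source" is actually queried.
calls = []
def fake_source(sql):
    calls.append(sql)
    return [("row", 1)]

cache = QueryCache(fake_source, ttl=60.0)
first = cache.execute("SELECT * FROM t")
second = cache.execute("SELECT * FROM t")  # served from cache
print(len(calls))  # 1
```

A real policy would also bound cache size and invalidate on writes; the sketch only shows why repeated dashboard queries need not repeatedly load the underlying operational system.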
What makes Conduit different from other data virtualization tools available on the market?
That’s where the lightweight part comes in. Most other data virtualization tools on the market charge based on the compute that you use. If you look at just the biggest one, Denodo, they haven’t really invested much in queryability. Denodo doesn’t incorporate a fast, scalable native query engine within its tool, so its customers are forced to use a separate query engine that charges based on how much data is run through it. This is cumbersome and unnecessary. Conduit is unique in that it uses an optimized Spark query engine, and optionally GPU engines, to deliver fast results and scalability. Conduit not only unifies your data and provides a powerful query engine, but its cost is a function of the virtual machine or on-premises machine you already rent or own. Usage cost is not tied to the number of queries run or the compute they consume, which results in significant savings.
What are the chief benefits of using Conduit for data virtualization?
Conduit benefits companies in two major ways: flexibility and cost management. It gives a company the flexibility to connect to and query new data sets over time. The data environment they have today may not be the same a year or two in the future. Their marketing department may buy a large data set, they may merge with another company whose data needs to be integrated, or they may build out a line of business that becomes successful and generates its own valuable data that needs to be connected. There are a host of reasons why data asset environments change over time, and companies can find themselves trapped in an inflexible architecture because they spent all their resources building the perfect warehouse without considering future needs. Organizations should strive to build in as much flexibility and scalability as possible, which is where data virtualization as a philosophy comes in.
The other main benefit of a lightweight data virtualization product like Conduit is on the cost management side. Large, well-known hosted brands like Teradata, Snowflake and Denodo charge based on the compute that is run; the second someone submits a query, costs begin to rack up. Conduit is installed on a machine and uses that machine’s compute. That could be a single virtual machine or a cluster of them, but what you are paying for at the end of the day is the cost of running Conduit on your hardware, and you’ll never pay more for compute than that.
What are the risks companies run by not paying attention to, and investing in, data virtualization with Conduit?
The short answer is not getting the insights that drive critical business decisions in a timely manner.
Any time their data assets change, companies are saddled with the cost of bringing new data into the existing warehouse environment. By not investing in data virtualization, organizations set themselves up for ever-growing costs as their data asset environment evolves. It really is all about cost. The cost to move, store and serve up data, from a hardware, security and human labor perspective, is high. Conduit, on the other hand, is pointed at data you can report against using the compute of the machine you already own or rent, either on premises or in the cloud. Then, when you are done analyzing that data, it’s deleted from the environment; nothing persists.
Most data problems can be solved with enough time, money and people, but that doesn’t mean they should be. Data virtualization leverages new technology to shortcut traditional ways of accessing data, and Conduit provides all of these features at a fraction of the cost.