OData is a vastly underrated and under-utilized data access standard that solves some otherwise intractable problems. One example: You have data in an on-premises SQL Server instance, and your IT folks will only allow external access to your data center on ports 80 and 443 (HTTP and HTTPS, respectively). Since SQL Server doesn’t use those ports, you’re sunk—unless you bring OData into the mix.
A Standard Ahead of Its Time
A half-dozen years after George Clooney and company gave us the classic O Brother, Where Art Thou?, Microsoft and a host of others created the OData standard (https://www.odata.org/). At least part of the motivation driving the standard was to bring REST semantics to data access. OData was slow to gain traction because, quite frankly, most of us didn’t need it: tools like Hibernate bridged the object-to-relational divide, connecting to databases on their native ports with standard JDBC/ODBC drivers was easier, and even in the corporate world, that’s what we did.
Ah, the good/bad old days.
If you can remember the state of the world a dozen years ago, you’ll recall that many enterprises served web pages from a web server in a network DMZ that allowed limited and highly controlled access to on-premises data stores. The cloud was a white puffy thing on the horizon or, if you lived in or around Seattle, the 5,000-foot grey ceiling above your head from September through May.
As illustrated in Figure 1, the code on the server in the DMZ could connect directly to a server or database in the internal subnet. This was usually secure because of the two firewalls—hackers needed to compromise two servers to breach the network instead of just one.
But a funny thing happened on the way to “data democratization”: The cloud appeared, filled the digital sky, and changed everything.
Today’s Forecast: Cloudy with a Chance of Hacking
The advent of cloud computing has impacted many things, not the least of which is the architecture of data-centric applications. The economics of cloud computing are such that it doesn’t usually pay to put your applications (web, mobile, and others) on physical hardware in your data center, so everybody and their brothers are moving those to the cloud.
Databases, however, are a stickier wicket: Whether it is general concern over data security or regulatory necessity, many databases are still on-prem. And, as a group, network administrators lose a lot of sleep trying to protect those data assets. This, in turn, has meant that sleep-deprived and occasionally grumpy admins are not all that receptive to the idea of allowing access from “the public cloud” to the walled garden in which the corporate crown jewels are physically located.
Some companies’ network admins just forbid external access altogether. This devalues the data, of course, and makes any access at all expensive, slow, and indirect.
However, many of these same admins are now taking a lesson from the days of yore when applications lived in a DMZ: Allow limited and highly controlled access via HTTP and SSL. Not only does this constrain access to two well-known ports (80 and 443), but software must be deliberately installed on the server to listen on those ports. Generally speaking, that software is a web server hosting a REST API, and web servers have a very narrow range of things you can ask them to do. This is a Good Thing in the eyes of rightly paranoid network admins.
Unfortunately, none of those things involve getting us any closer to our data… unless you bring OData into the mix.
Remember when I said that OData was designed to bring REST semantics to data access? REST rides on HTTP/S, which travels over ports 80 and 443. And although the standard permits a wide range of queries, it does not allow clients to connect to databases directly.
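To make that concrete, OData queries are expressed entirely in the URL using standard system query options such as `$filter`, `$select`, and `$top`. Here is a minimal sketch of composing such a URL in Python; the service root and entity set names are hypothetical, but the `$`-prefixed options are defined by the OData standard itself:

```python
from urllib.parse import quote

def build_odata_query(service_root, entity_set, **options):
    """Compose an OData query URL from standard system query options.

    The $-prefixed options (filter, select, top, ...) come from the
    OData spec; the endpoint and entity set used below are made up
    for illustration.
    """
    query = "&".join(f"${k}={quote(str(v))}" for k, v in options.items())
    return f"{service_root}/{entity_set}" + (f"?{query}" if query else "")

# Ask for the top 10 Seattle customers, selecting only two columns:
url = build_odata_query(
    "https://example.com/odata",   # hypothetical service root
    "Customers",                   # hypothetical entity set
    filter="City eq 'Seattle'",
    select="CustomerID,CompanyName",
    top=10,
)
print(url)
```

Note that nothing in that URL is SQL. The server decides how (and whether) to translate the query options into database operations, which is exactly why admins can tolerate exposing it.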
Enlightened network admins have upped their security game while still providing very controlled access to databases and flat files as shown in Figure 2, below.
There are several key elements to Figure 2:
- Unlike the old-school DMZ approach, there is no direct access from the public internet to anything. This is courtesy of a VPN defined in Azure; only specified members of the VPN (on local, non-routable IP addresses) can pass the firewall. As far as Boris, Dark Lord of Hexadecimal is concerned, your data center and on-premises database don’t exist. You can’t get there from here—or anywhere else outside the VPN.
- Within the data center itself, the only pinhole in the firewall is to allow HTTP/S requests to pass to an OData API. The API itself is a sort of reverse proxy server to the database and more… OData does not pass SQL or anything else except its own query commands, and those are carefully crafted and well-defined.
Assuming you have appropriate credentials (e.g. a valid OAuth2 token), you can get your data and nothing more. That is all that OData will give you.
Fortunately, that is all we want. The network configuration in Figure 2 allows code running in Azure to access on-prem databases in a highly secure manner. This approach provides a mechanism whereby you can code in the cloud and access data on-prem without risking your entire internal network. Your code resides in the VPN, grabs the data it needs via Conduit’s OData API, executes analysis or aggregations, and then returns “answers” (but generally not the original raw data) where it is needed.
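The fetch-then-aggregate pattern described above can be sketched in a few lines of Python. In practice your cloud code would issue an HTTP GET against the OData endpoint (for example with the `requests` library); here, to keep the sketch self-contained, the response is a hard-coded sample shaped like an OData v4 payload, where the rows live in the `value` array. The entity and field names are hypothetical:

```python
import json

# Sample payload shaped like an OData v4 response; in real code this
# would come from an HTTP GET against the OData endpoint.
SAMPLE = json.loads("""{
  "@odata.context": "https://example.com/odata/$metadata#Orders",
  "value": [
    {"OrderID": 1, "Region": "West", "Total": 120.0},
    {"OrderID": 2, "Region": "East", "Total": 80.0},
    {"OrderID": 3, "Region": "West", "Total": 50.0}
  ]
}""")

def totals_by_region(payload):
    """Aggregate raw rows into per-region totals -- the 'answer' that
    leaves the VPN, rather than the raw data itself."""
    totals = {}
    for row in payload["value"]:
        totals[row["Region"]] = totals.get(row["Region"], 0.0) + row["Total"]
    return totals

print(totals_by_region(SAMPLE))  # {'West': 170.0, 'East': 80.0}
```

Only the small aggregated dictionary needs to travel onward; the raw order rows never leave the trusted boundary.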
The API Gotcha
The first thing that an astute reader will realize is that there are virtually no databases that provide OData APIs out of the box. In almost all cases, you will need to write the API that “fronts” the database. This is not extraordinarily difficult, and there are solid libraries for both Java and .NET that can help (references are at the end).
It is time-consuming, however, and a bit frustrating to have to go to so much work just to get at your data. The amount of code you must write is directly proportional to the number of tables you need to access; a system of significant complexity, say, 50 tables, can take a couple of weeks to code and test by hand in Java or C#.
Enter Conduit… As the name implies, Conduit is a tool for facilitating secure database connectivity as well as providing lightweight data virtualization functions. It can do hybrid joins between two radically different database technologies, for example, which can make your report writing or in-code calculations far less burdensome.
Back to the point, one of the great functionalities within Conduit is that it auto-generates an OData API around your database and provides an endpoint for connections from your code or reporting tool.
Yes, you read that right: Conduit auto-generates OData APIs. This means you can stand up such an API in just a couple of minutes regardless of the size or complexity of the database.
A Conduit Walkthrough
Just to show you how easy it is to create an OData API with Conduit, let’s look at each step of the configuration. Incidentally, the process you’ll see here is how you would set up any kind of connection from Conduit to a database; the OData API creation is automatic.
Step 1: Pick your database type.
Conduit currently supports seven different databases and multiple cloud file systems, including Azure and AWS, with many more on the road map. This means you can expose CSV or Parquet data files that Conduit will present to your code or your reporting tool as if they were databases. The data can be cached, and you can write queries and filters against it.
For our OData purposes, we want to wrap a SQL Server instance in an OData wrapper, so we will select the tile shown below. Pro-tip: Importing your file data into a database may have other advantages, of course, but for many applications, exposing file data as a queryable data source is a huge time and money saver compared to loading it into a cloud database.
Step 2: Enter basic connection data
The connector name and description fields are most useful when they are as descriptive as possible. The connection URL is used by Conduit itself—not your app or reporting tool—to connect to the database, and it can accept an IP address. The port is not required because we chose the SQL Server connector type, so Conduit knows we’re talking to port 1433.
Step 3: Configure the authentication mechanism
The Active Directory with impersonation option shown below means that Conduit will first make sure the entity connecting to Conduit itself has valid Active Directory credentials. These would usually belong to a service principal used for connecting with code.
Then Conduit would also connect to the database itself using the additional set of database credentials.
This is probably the most secure combination of authentication options because it effectively sets up a “double login”.
Nevertheless, the first time you try to write code to access a Conduit-generated OData API in a development environment, I highly recommend that you choose the first selection, Anonymous with Impersonation. This just means that Conduit will let your code connect to it anonymously, but it will use the database credentials you configure to connect from Conduit to the database. This lets you make sure all your database connection and data-handling code functions correctly without having to deal with the intricacies of programmatic Active Directory authentication—which is worth an article of its own. That article is, indeed, forthcoming.
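One practical consequence of starting with Anonymous with Impersonation is that your client-side code barely changes when you later tighten security: the only difference is whether an OAuth2 bearer token accompanies each request. A minimal sketch, assuming a generic OData endpoint (the header-building helper and token variable names are mine, not part of Conduit’s documented API):

```python
def odata_headers(token=None):
    """Build request headers for an OData call.

    With Anonymous with Impersonation configured, the client sends no
    credentials. Once you switch to Active Directory authentication,
    pass an OAuth2 bearer token and the Authorization header is added.
    """
    headers = {"Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers

# Development, anonymous (Conduit holds the database credentials):
#   resp = requests.get(endpoint_url, headers=odata_headers())
# Production, with an Active Directory service principal's token:
#   resp = requests.get(endpoint_url, headers=odata_headers(aad_token))
print(odata_headers())
print(odata_headers("example-token"))
```

Structuring the code this way means the switch from anonymous development access to the “double login” production setup is a one-argument change, not a rewrite.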
Step 4: Select the tables to expose via the connector
Step 5: Specify which Active Directory Group members can access your connector.
Step 6: Optionally set caching configuration for each table
Step 7: Copy the OData endpoint URL and press Submit to save
Yes, that is all there is to it. Your SQL Server or Azure SQL Server database is now accessible via an OData API.
In our next installment, we will show how to connect to an OData endpoint from both Power BI and one or more programming languages. Hint: OData is a REST standard, and REST APIs usually return JSON. JSON is easy to work with, well supported across many languages, and quite human-readable.
If you’re a coder or a report author, part 2 is where the fun really starts.