Ross Lambert
Author Ross Lambert

OData, Where Art Thou?: The Sequel

July 25 2019 | Big Data, Product, Technology, Conduit

Executive Summary (AKA TL;DR)

In the first installment we looked at how the OData protocol solves some real-world security and network access problems in several key hybrid cloud scenarios. We also walked through setting up a connector in Blueprint Technologies’ Conduit product, the point being that it auto-generates an OData API, saving tons of development time and money in the process while fitting nicely into the security story of a hybrid cloud.

In today’s installment, we will show what you can do with the OData data source Blueprint’s Conduit gives you in both Microsoft’s Power BI and in your own code.

An OData API: Now that we’ve got one, what do we do with it?

If you remember from our first installment, we are solving for a specific use case, namely a Conduit instance running in a hybrid network where the application code is in the cloud and the highly Secret Squirrel data is in the data center. Thanks to an Azure VPN, there is zero access from the public internet, and our Conduit instance provides the OData connection to the database.

We’ll reprint Figure 1 from last time as a quick refresher as to what is where: 

Figure 1 – Azure VPN member connection to OData API

First Things First: The Data 

As much fun as Microsoft’s Northwind database has been over the years, I decided to take a walk on the wild side and look up some new and more interesting data sources. I like to be interested in my data when playing around in Power BI, and Northwind just wasn’t doing it for me anymore.

So lookey here what I found:  First, this awesome article about publicly available datasets…

https://www.freecodecamp.org/news/https-medium-freecodecamp-org-best-free-open-data-sources-anyone-can-use-a65b514b0f2d/

…which led me to this bit of data:

https://datacatalog.worldbank.org/dataset/quarterly-external-debt-statistics-sdds

…which for those of you not clicking along, is a quarterly accounting of the national debt for all the nations of the world since 1998 (although not all go back that far). The World Bank has many more datasets, too, but I chose that one since it is easy to understand, but still rather fun.

And, as it happens, there is an interesting finding or two therein, too.

OData Pro-Tip: For your own sanity, obey these two tidbits of advice and your OData life will be significantly less frustrating:
  1. Avoid: Column names with spaces, and
  2. Avoid: Column names that don’t start with a letter

This is not an article about SQL Server (Azure or otherwise), so suffice it to say that I merely created a table in my Azure SQL instance that we can use to hold the data from the CSV. I also added an index for the most likely filter column we’d want (Country Name), like so:

Listing 1 – DebtStats Schema

CREATE TABLE [dbo].[DebtStats]  ( 
   [Country_Name]     varchar(80) NOT NULL,
   [Country_Code]     varchar(10) NOT NULL,
   [Indicator_Name]   varchar(P200) NOT NULL,
   [Indicator_Code]   varchar(100) NOT NULL,
   [P1998Q1]          money NULL,
   [P1998Q2]          money NULL,
   [P1998Q3]          money NULL,
   [P1998Q4]          money NULL,
   [P1999Q1]          money NULL,
   [P1999Q2]          money NULL,
   [P1999Q3]          money NULL,
   [P1999Q4]          money NULL,
   [P2000Q1]          money NULL,
   [P2000Q2]          money NULL,
   [P2000Q3]          money NULL,
   [P2000Q4]          money NULL,
   [P2001Q1]          money NULL,
   [P2001Q2]          money NULL,
   [P2001Q3]          money NULL,
   [P2001Q4]          money NULL,
   [P2002Q1]          money NULL,
   [P2002Q2]          money NULL,
   [P2002Q3]          money NULL,
   [P2002Q4]          money NULL,
   [P2003Q1]          money NULL,
   [P2003Q2]          money NULL,
   [P2003Q3]          money NULL,
   [P2003Q4]          money NULL,
   [P2004Q1]          money NULL,
   [P2004Q2]          money NULL,
   [P2004Q3]          money NULL,
   [P2004Q4]          money NULL,
   [P2005Q1]          money NULL,
   [P2005Q2]          money NULL,
   [P2005Q3]          money NULL,
   [P2005Q4]          money NULL,
   [P2006Q1]          money NULL,
   [P2006Q2]          money NULL,
   [P2006Q3]          money NULL,
   [P2006Q4]          money NULL,
   [P2007Q1]          money NULL,
   [P2007Q2]          money NULL,
   [P2007Q3]          money NULL,
   [P2007Q4]          money NULL,
   [P2008Q1]          money NULL,
   [P2008Q2]          money NULL,
   [P2008Q3]          money NULL,
   [P2008Q4]          money NULL,
   [P2009Q1]          money NULL,
   [P2009Q2]          money NULL,
   [P2009Q3]          money NULL,
   [P2009Q4]          money NULL,
   [P2010Q1]          money NULL,
   [P2010Q2]          money NULL,
   [P2010Q3]          money NULL,
   [P2010Q4]          money NULL,
   [P2011Q1]          money NULL,
   [P2011Q2]          money NULL,
   [P2011Q3]          money NULL,
   [P2011Q4]          money NULL,
   [P2012Q1]          money NULL,
   [P2012Q2]          money NULL,
   [P2012Q3]          money NULL,
   [P2012Q4]          money NULL,
   [P2013Q1]          money NULL,
   [P2013Q2]          money NULL,
   [P2013Q3]          money NULL,
   [P2013Q4]          money NULL,
   [P2014Q1]          money NULL,
   [P2014Q2]          money NULL,
   [P2014Q3]          money NULL,
   [P2014Q4]          money NULL,
   [P2015Q1]          money NULL,
   [P2015Q2]          money NULL,
   [P2015Q3]          money NULL,
   [P2015Q4]          money NULL,
   [P2016Q1]          money NULL,
   [P2016Q2]          money NULL,
   [P2016Q3]          money NULL,
   [P2016Q4]          money NULL,
   [P2017Q1]          money NULL,
   [P2017Q2]          money NULL,
   [P2017Q3]          money NULL,
   [P2017Q4]          money NULL,
   [P2018Q1]          money NULL,
   [P2018Q2]          money NULL,
   [P2018Q3]          money NULL,
   [P2018Q4]          money NULL,
   [P2019Q1]          money NULL,
   [Id]               bigint IDENTITY(1,1) NOT NULL 
   )
GO
CREATE NONCLUSTERED INDEX [idx_debtstats_name]
   ON [dbo].[DebtStats]([Country_Name])
GO

There are lots of ways to import the CSV that the World Bank provides, but SQL Server’s BULK INSERT will do the trick nicely. Here’s a video that describes the ins and outs of that:

https://www.youtube.com/watch?v=FPd7EMZ0Fzw 

Powering Up Power BI

IMPORTANT NOTE: If you’re a hard-core coder who thinks Courier (or maybe Consolas) is the One True Font and Power BI is for Weenies, there’s a good chance you’re ready to bail out on this article right now. Rather than doing that, just skip to the Show Me the Code section. It will be far more likely to ring your chimes.

For the rest of us… With the creation and population of our new DebtStats table out of the way, we can now we reopen our Conduit connector, select the Publish section, and make sure to add DebtStats:

Figure 2 – Adding the DebtStats Database to our Existing Connector

Be sure and navigate to the Endpoints tab and copy the OData Endpoint:

Figure 3 – Copying the OData Endpoint URL

Warning: Don’t forget to also select the Submit button… you’ll lose your changes if you don’t.

Now let’s open Power BI and get to work. I’ll trust you can select New and open the data sources selector (Get Data) to speed things along. From there, select the OData Feed option:

Figure 4 – Selecting Other -> OData Feed from Get Data in Power

Next, we paste the URL we copied out of Conduit into the next screen in Power BI as the OData feed URL:

Figure 5 – Adding the OData feed URL

And not quite last and definitely not least, we set the authentication method:

Figure 6 – Selecting the Authentication Method

Astute readers may be gasping in horror because I allowed Anonymous access in the screenshot above. Do not try this at home, kids.

Seriously, the different kinds of authentication and authorization available through Conduit, Azure Active Directory and your database merit a full and lengthy discussion—and I promise that will be my next series. For the sake of brevity (relatively speaking, since this is a long article already), we’re just going to let anybody connect and read data. This is, after all, a publicly available data set, so no real harm done here. But as I said previously, don’t do this with your data.

Figure 7 – Selecting the DebtStats table for Power BI

Figure 8 – Power BI Doing its Initial Load 

The World Bank national debt database has approximately 270,000 rows. Depending on the speed of your internet connection, it will likely take Power BI a little over five minutes to load all the data from the Azure SQL instance. You’ll see something like this while you wait:


Once the initial data load completes, you’ll be dumped back into a blank Power BI report canvas.

Power BI Pro-Tip: There is often a temptation when starting a new Power BI report to start grabbling grids and charts, but here is a ProTip: If at all possible, start with a “slicer”. The reason is that all of your other grids and charts will react to your changes faster with smaller data sets. Use the slicer to pick a very small set at the outset and everything else you do will go faster thereafter. 

The purpose of our little report is simple: We’re just proving that OData data sources behave pretty much like any other data source. Beyond that we want to compare the debt of a few countries in both actual and relative terms. We will accomplish the former by placing a grid and picking the Q4 data for four different years spaced five years apart (2003, 2008, 2013 and 2018).

The relative spending in actual dollars is easily seen with a grouped bar chart; each country is a group. The fascinating thing is that there are actually at least a couple European countries, the UK and Belgium, with less debt in 2018 than in 2008 for a few major debt types.

The Power BI report itself consists of two slicers across the top. The slicer on the left is the country-picker, whereas the single select slicer on the right is the debt-type picker. Both slicers control what is shown in the raw data grid and the grouped bar chart, both shown in Figure 9.

Note that if a selected country does not appear at all, that is because it has no data for the type of debt chosen.

Figure 9 – The Power BI Global Debt Report

The Power BI *.pbix file for the report above can be downloaded here.

Show Me the Code

I know that all the “real” developers are not having a good day until they see some code, so let me cheer you up now and we will look at ways to read our OData quickly and easily from Python. 

There is a pre-requisite, however: We need to install the requests package (https://realpython.com/python-requests/)

The quick way is to jump out to your terminal/command line and type: 

> pip install requests

Or perhaps:

> pip3 install requests

If you recall the first article in this series, you’ll realize that OData is a RESTful API protocol, so it makes perfect sense that we would need the requests library

We are also going to use the built-in json library for Python, which also makes sense in context: OData returns JSON results. There is an OData library for Python, pyodata, but it is not yet Python 3-compatible, not to mention that OData is straightforward enough that wrapping it in another library isn’t particularly valuable to me.

But then again, I make jokes in hexadecimal (hey, $D00D, where’s the $BEEF?). I know, it is sad.

Nevertheless, let’s walk through the code and I think you’ll agree it is all very straightforward. My comments are interspersed within the Python listing. Even if you’re not a Python person, the principles are the same and the code is not far different in Java or even C#.

Listing 2 – azure-odata-worldbank-example.py

# -------------------------------------------
#  Azure OData API Sample for Python
#  by Ross Lambert, Principal Architect
#  Blueprint Technologies (http://bpcs.com)
# -------------------------------------------
# Purpose: 1. Illustrate how to connect to and iterate data returned
#             from an OData endpoint
#          2. Shows usage of a Conduit-generated OData endpoint
import requests
import json

The SERVICE_URL below was copied from the Endpoint tab in Conduit…

# This URL is not guaranteed to work past August 31st, 2019,
# but you can use the other sample file, azure-odata-northwind-example.py,
# indefinitely.
SERVICE_URL = "https://demo-conduit.westus.cloudapp.azure.com/odata/sample_odata_wrapper/odata.svc/"
# Many APIs require an API key, but the example above does not
# However, if it is needed as a header value, you can put it
# here. NOTE: You will probably need to change both the key's
# name as well as the value to match what the service you are
# calling expects.
headers = {
    "cache-control": "no-cache",
    "x-api-key": "YOUR_API_KEY",
    "OData-Version": "4.0",
    "OData-MaxVersion": "4.0"
}

As the comment below suggests, a GET request to the root of the SERVICE_URL of an OData endpoint returns a list of all the tables that can be fetched from that endpoint

# a call to the root returns a list of the tables exposed
# by this endpoint as well as the schema of each table
response = requests.request("GET", SERVICE_URL, headers=headers)
# json.loads puts the JSON response into a Dictionary
dict = json.loads(response.text)
tables = dict['value'];
print("--- list of tables ---")
for table in tables:
    print(table['name'])
print()

The $metadata “command” added to the root service URL returns the schema for each table in XML. This can be useful if you’re exploring an OData source “blind”, but that is fairly unusual. Nevertheless, it is invaluable if you really need it.

# a call to $metadata returns a list of the tables exposed
# by this endpoint as well as the schema of each table
response = requests.request("GET", SERVICE_URL + "$metadata", headers=headers)
# IMPORTANT: $metadata returns XML, not JSON
# json.loads puts the JSON response into a Dictionary
print(response.text)

Now we will build some typical queries, but we will start simple. In this case, we’re just getting the top 10 rows from the table. As it happens with our World Bank data, they will all belong to Afghanistan.

# adding a table name to the URL requests rows from
# that table. $top limits the number of rows
table = "dbo___DebtStats"
topNum = str(10)
url = SERVICE_URL + table + "?$top=" + topNum
response = requests.request("GET", url, headers=headers)
# the response is a composite dictionary that contains the context in
# one element and the list of rows in the value element
dict = json.loads(response.text)
rows = dict['value'];
print("--- top " + topNum + " country entries ---")
print
for row in rows:
    print(row['Country_Name'], row['Indicator_Code'], row['Indicator_Name'])
print()
    

Next we will stretch our legs with a more complex OData command. First, since we don’t really want all the columns, we will use the $select command to restrict the columns to just those of interest. That is easy enough to construct; the general form looks like this:

$select=column1,column2,column

For our purposes now, we are only interested in one type of debt with a code of “DT.AMT.DECT.CD.CB.24P.US”. As it happens, that code refers to a debt type with this name: “Ext. Debt Service Pmt, Deposit-Taking Corp., exc. CB, More than 2yrs, All instruments, Principal, USD”. Translated into layman’s terms, it is the amount of money paid towards the principal of debt instruments greater than two years in term length. 

It was interesting to me which countries showed up with tons of this kind of debt. I will refrain from editorializing more than I already have. This is not a political science or economics essay, but I’d sure be willing to crunch some numbers if someone wants to pick up that mantle. Let’s just say the USA keeps strange economic company when it comes to this kind of debt.

Moving on… in the code, we built up a couple strings, one for the $select and one for the $filter. We glued them altogether with the table name at the end of the SERVICE_URL. I’m a fan of Python 3’s .format statement, in case you’ve not seen it before. It is handy and can help you write much more flexible string generation code with fewer errors.

 # SQL to translate to OData:
# SELECT Country_Name, Indicator_Code, ds.[2018Q4] FROM [dbo].[DebtStats] ds
# where ds.Indicator_Code = 'DT.AMT.DECT.CD.CB.24P.US' and ds.[2018Q4] > 0
# add a filter
select = "$select=Country_Name,Indicator_Code,P2018Q4"
debtType = "DT.AMT.DECT.CD.CB.24P.US"
filterCol = 'Indicator_Code'
valueCol = "P2018Q4"
filter = "$filter={0} eq '{1}' and ({2} gt 0)".format(filterCol, debtType, valueCol)
url = SERVICE_URL + table + "?" + select + "&" + filter
response = requests.request("GET", url, headers=headers)
dict = json.loads(response.text)
rows = dict['value'];
print("--- ' Ext. Assets in Debt Instruments, General Government, Long-term, Currency and deposits, USD' ---")
print()
for row in rows:
    # print value for end of 2018 for each country
    print(row['Country_Name'], row['Indicator_Code'], row['P2018Q4'])


Here are the results of the last query, with values rounded to whole USD and formatted as USD. Note that this is just a single type of debt, not total national debt (not by a longshot!).

Argentina

1,978,329,854

Belarus

1,188,161,153

Croatia

1,252,080,716

Czech Republic

5,534,616,322

Egypt, Arab Rep.

2,817,020,000

Georgia

1,750,816,006

Kazakhstan

3,542,349,336

Kyrgyz Republic

92,604,420

Lithuania

685,855

Moldova

76,491,448

North Macedonia

312,455,962

Romania

1,470,706,700

Russian Federation

43,608,125,332

South Africa

941,000,000

Thailand

9,060,427,178

United States

459,503,000,000 


The full Python listing as well as another publicly available OData data source demo is available via the links in the Resources section at the end.

A Modest Conclusion

OData and Conduit are a power couple. You’ll notice we did not have to write an API at all—Conduit just offered that up for us, no muss, no fuss. Dealing with the OData API isn’t trivial, but it still very straightforward. And OData itself is mature standard (v 4.01) with lots of documentation (cf. Resources below). 

As we’ve seen, OData APIs are first-class citizens in most reporting tools nowadays—and especially Power BI—and they’re not hard to code against, either. Best of all, if you have Conduit fronting your databases, you don’t have to write a single line of OData API code.

Remember: The best code is the code you don’t have to write. That frees you up to do more interesting and valuable work.


Resources

Power BI

Example World Bank Data Report

Python Sample Code

World Bank Sample Debt Data

OData.org Northwind OData Service

Ready to start your next big thing?

Contact Us