I used to be a research scientist studying insect behavior. I’ve long since transitioned into the software industry, but I never lost my interest in experimenting. Fortunately for me, here at Blueprint we constantly experiment internally. Our successes and challenges help keep us at the cutting edge of technology and the software industry. This in turn helps us deliver the best products and solutions to our clients.
Lately I’ve been running some experiments on how to apply data science at the Edge to problems in the oil and gas industry. For those unfamiliar with the term, Edge computing refers to placing computational resources at the outer edge of a network, as opposed to centralizing them on servers in data centers. The intersection of data science and the oil and gas industry is a particularly interesting problem space. At the well site, unreliable network connectivity often prevents valuable data from being collected and sent back to a company’s data center (or the Cloud), and power constraints limit the compute infrastructure available for on-site analysis.
I’ve been especially interested in the problem of geosteering. Geosteering is the process of steering a drill during directional well drilling to follow a target stratum (layer of rock) to maximize collection of valuable hydrocarbons. The problem is complicated; essentially an operator is steering a drill thousands of feet under the earth while trying to stay within a stratum which might angle up or down at any given point, and the operator cannot directly detect how that layer is changing. Fortunately, sensors at the drill head (and elsewhere) can generate data which can hint at the drill’s location relative to the target stratum. Successful geosteering depends on using this sensor data to accurately determine the drill head location and doing this quickly enough to steer the drill in real-time. This real-time requirement, combined with the unreliable network constraint, creates an ideal situation for Edge computing.
As with any experiment, the first step is to explore what we already know. In this case we have a reasonable amount of publicly available data, courtesy of the University of Texas “University Lands” project. It consists of actual operations data from companies drilling on lands owned by the UT system, and it gives us an idea of the kinds of sensors actually used during drilling and the quality of the data those sensors produce. As an example, I’ve taken the data from a well used for exploration. Those interested can find the data here (log in as guest).
My initial exploration focused on the gamma radiation logs from drilling. Trace amounts of gamma radiation can be observed from naturally occurring radioisotopes in the Earth. Different layers of the Earth emit more or less gamma radiation depending on their mineral makeup. This can be a useful tool during drilling: the intensity and energy of the gamma radiation observed can hint at the type of rock, and if measured continuously during drilling, changes in gamma radiation can indicate that the drill has moved from one stratum to another. Gamma radiation data is also commonly collected during drilling, which makes it a potentially useful and practical candidate for a geosteering algorithm.
While looking at the plot of gamma radiation vs. depth from the well, I initially thought the measurements had an intense Gaussian noise component (due to rapid and large fluctuations), and I tried to eliminate it with a moving average filter. Upon further analysis, however, I found two pieces of evidence suggesting that this “noise” was real signal. First, the fluctuations I observed were correlated across adjacent measurements, which should be impossible for independent Gaussian noise. Second, the intensity of the fluctuations changed in sudden jumps, which would also be unexpected if the fluctuations were random Gaussian noise. My current thinking is that the large fluctuations reflect actual changes in gamma radiation levels over depths of five to ten feet.

I also found another interesting and troubling characteristic of the gamma radiation data. At irregular intervals, the sensor seems to switch into a “low resolution” mode, where for distances of a few feet to tens of feet it reports a constant value close to the most recent “good” value. In every one of these cases, the reported value is evenly divisible by seven, which, based on my experience with other types of sensors, suggests that the electrical power, computational power, or bandwidth available to the sensor was limited during these periods.
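For readers who want to see what screening for this artifact might look like, here is a small sketch that flags those suspicious constant runs before any modeling. The function name, the minimum run length, and the divisible-by-seven heuristic are my own assumptions for illustration, not part of any real logging toolchain:

```python
import numpy as np

def flag_low_res_runs(gamma, min_run=3):
    """Flag indices inside a run of repeated readings whose constant
    value is evenly divisible by seven -- the apparent signature of
    the sensor's "low resolution" mode described above."""
    gamma = np.asarray(gamma, dtype=float)
    flags = np.zeros(len(gamma), dtype=bool)
    start = 0
    for i in range(1, len(gamma) + 1):
        # A run ends at the array boundary or when the value changes.
        if i == len(gamma) or gamma[i] != gamma[start]:
            run_len = i - start
            if run_len >= min_run and gamma[start] % 7 == 0:
                flags[start:i] = True
            start = i
    return flags
```

Readings flagged this way could then be masked out (or modeled separately) so the low-resolution stretches don’t bias later inference.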
During drilling, an operator usually has access to reference log data from nearby exploratory wells, which they can compare against the data coming in from drilling. For instance, they might use the reference log to identify the gamma radiation levels in each stratum, and then compare those to the gamma radiation levels observed during drilling to see whether the target stratum sits at the same depth as it did at the exploratory well. I don’t have access to this kind of reference data, but I have a workaround: the team at the well created a sophisticated well log, which I use as a “simulated” reference log.
Now that I have a good amount of data available, how do I use it to do geosteering? The answer is to use machine learning to infer the movement of the drill head relative to the target stratum and, if necessary, correct the direction of the drill. For the purposes of this experiment I am only considering the inclination of the drill, but the process could easily be extended to include the azimuth of the drill as well.
To infer the location and movement of the drill, I can use a class of machine learning models known as Dynamic Bayesian Networks (DBNs). This class of models, which includes common models such as the Kalman filter and Hidden Markov Models (HMMs), uses Bayesian statistics to infer the values of hidden variables over time from known or observed variables. In my case, the observed variables include the gamma readings from drilling, the gamma readings from the reference well, the inclination of the drill, and the distance the drill has traveled at each time step. The hidden variable I want to infer is the position and orientation of the drill relative to the target stratum at each time step.
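To make the idea concrete, here is a minimal one-dimensional Kalman filter step, the simplest member of the DBN family, tracking a single hidden variable: the drill’s vertical offset from the target stratum. The variable names, the noise variances, and the assumption that a noisy offset measurement can be derived from matching gamma readings against the reference log are all illustrative, not taken from a production geosteering system:

```python
def kalman_step(x, P, u, z, Q=0.05, R=1.0):
    """One predict/update cycle of a 1-D Kalman filter.

    x : current estimate of the vertical offset (ft) of the drill
        head relative to the target stratum (the hidden variable)
    P : variance (uncertainty) of that estimate
    u : control input -- vertical movement this step, e.g. derived
        from step length and the drill/stratum inclination difference
    z : noisy offset measurement, e.g. from matching observed gamma
        readings against the reference log (assumed available)
    Q, R : process and measurement noise variances (assumed values)
    """
    # Predict: the drill moves by u; uncertainty grows by Q.
    x_pred = x + u
    P_pred = P + Q
    # Update: blend prediction and measurement via the Kalman gain.
    K = P_pred / (P_pred + R)
    x_new = x_pred + K * (z - x_pred)
    P_new = (1.0 - K) * P_pred
    return x_new, P_new
```

Run over every time step of the drilling log, this yields a continuously updated estimate of where the drill sits relative to the stratum, along with an honest measure of how uncertain that estimate is.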
The value of DBNs here is that I can encode domain knowledge directly into the model. For instance, I’ve composed mathematical equations relating the inclination of the drill to the inclination of the target stratum, such that if the two inclinations are aligned, forward movement keeps the drill in the same position relative to the stratum. If instead I know that the inclinations of the stratum and the drill are not equal, I can determine the rate at which the drill bit moves up or down within the stratum. I can also relate the observed gamma radiation levels to the reference log gamma radiation levels to estimate my current location relative to the target stratum. If I were a geologist, I could encode even deeper domain knowledge into the model, but this will do for now.
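A minimal sketch of that first relationship might look like the following. The angle conventions here (drill inclination measured from vertical, as is usual in drilling, and stratum dip measured from horizontal) and the small-step geometry are my assumptions:

```python
import math

def stratum_relative_step(step_ft, drill_incl_deg, stratum_dip_deg):
    """Vertical movement of the drill relative to the stratum over one
    step of drilling. Drill inclination is measured from vertical
    (so 90 degrees = horizontal drilling); stratum dip is measured
    from horizontal. Both conventions are assumptions for this sketch.
    When the drill path parallels the stratum, the result is zero."""
    # Convert the drill's inclination into an angle above horizontal.
    drill_angle_from_horiz = 90.0 - drill_incl_deg
    # Angle between the drill path and the stratum plane.
    rel = math.radians(drill_angle_from_horiz - stratum_dip_deg)
    return step_ft * math.sin(rel)
```

An equation like this becomes the transition model of the DBN: it says how the hidden offset evolves from one time step to the next given the known inclination measurements.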
Gamma radiation is not the only data available to use for geosteering. There are two other pieces of data I would like to incorporate into the model.
The first piece is rate of penetration (ROP): basically, how fast the drill is moving through the Earth. Different types of rock have different physical properties, such as hardness, which affect how fast a drill can move through them. Based on my exploration of the data, it seems that rock type might primarily affect the acceleration of ROP, so actual ROP behaves as a state function in which the value at each time step is determined by the ROP at the previous time step and the type of rock the drill is currently in. This can be expressed as mathematical equations and encoded in the DBN model fairly easily.
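One hedged way to sketch that transition is below; the hardness score, the rate constant, and the target-ROP relationship are all illustrative placeholders rather than calibrated physics:

```python
def rop_next(rop, rock_hardness, dt=1.0, k=0.2, c=5.0):
    """Toy ROP transition for a DBN: the current rock type (summarized
    by a hardness score) sets a target ROP, and the drill accelerates
    toward it. All parameter names and values are assumptions.

        rop_next = rop + k * (target - rop) * dt
    """
    target = c / rock_hardness  # harder rock -> lower steady-state ROP
    return rop + k * (target - rop) * dt
```

The key point is structural: in this formulation ROP at the next step depends only on the current ROP and the current rock type, which is exactly the kind of first-order dependency a DBN encodes naturally.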
The second piece of data is more complicated. Mudlogging can provide very valuable data: lithologic data, along with the volume and composition of hydrocarbons, can help determine the relative location of the drill. However, there are challenges to using this data for real-time geosteering. Observations made during mudlogging reflect where the drill was at some earlier point, because the rock and hydrocarbons must travel with the mud back to the surface. To make matters worse, this time delay changes over the course of drilling: as the measured depth of the well increases, so does the delay. The delay would need to be compensated for before mudlogging data could be included in any geosteering model.
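A first-order sketch of that compensation might look like the following. A real lag calculation would use pump rate and annular volume; assuming a constant annular velocity, as this does, is a deliberate simplification:

```python
def cutting_origin_depth(bit_depth_ft, rop_ft_per_min,
                         annular_velocity_ft_per_min):
    """Estimate the depth at which material now arriving at the
    surface was actually cut. The mud (and cuttings) take roughly
    bit_depth / annular_velocity minutes to reach the surface, and
    during that lag the bit advances at roughly the current ROP.
    Constant annular velocity and constant ROP over the lag are
    simplifying assumptions."""
    lag_min = bit_depth_ft / annular_velocity_ft_per_min
    return bit_depth_ft - rop_ft_per_min * lag_min
```

Shifting each mudlog sample back to its estimated origin depth this way would let it line up with the gamma and ROP data before feeding it into the model.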
Another complication with mudlogging is band broadening, much like what happens in gas chromatography or HPLC. When the drill grinds up a small section of rock, the hydrocarbons and rock from that section start moving with the mud toward the surface in a tight band of mud. By the time that material reaches the surface, it will have spread out over a longer length of mud due to diffusion and other physical processes, reducing the resolution of the mudlogging data. Additionally, increasing measured depth intensifies this effect, and the rock and hydrocarbons may diffuse (and move) at different rates on their way up the well.
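A simple model of this broadening is convolution with a Gaussian kernel, the same first-order picture used for peak spreading in chromatography. Treating the spread as constant with depth, as this sketch does, is a simplifying assumption; in reality sigma would grow with measured depth:

```python
import numpy as np

def broaden(signal, sigma_samples):
    """Simulate band broadening by convolving a depth-indexed signal
    with a Gaussian kernel. sigma_samples controls how far a sharp
    feature spreads; a constant sigma is a simplifying assumption."""
    radius = int(3 * sigma_samples)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma_samples) ** 2)
    kernel /= kernel.sum()  # preserve total signal mass
    return np.convolve(signal, kernel, mode="same")
```

Run forward, a model like this predicts what a sharp lithologic boundary should look like by the time it reaches the surface; inverted (deconvolution), it is one route to partially recovering the lost resolution.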
Geosteering is an interesting problem, and an important one in the oil and gas industry. Using Dynamic Bayesian Networks, domain knowledge about geology and petroleum engineering can easily be combined with data available before and during drilling. This in turn enables rapid inference of the relative path of the drill and real-time geosteering. Best of all, because these algorithms require relatively little compute power, real-time geosteering can be pushed to, and run on, low-power computers at the Edge (in this case, the drill site). Modern IoT software from the Cloud providers makes this easy: geosteering algorithms can be designed, built, and validated in a familiar development environment in the Cloud and, when ready, pushed to the Edge for use, all while streaming summary data back to the Cloud for later analysis.