As Blueprint Technologies’ data science team began to work on the video processing capabilities of NASH, our video analytics product, we faced a philosophical question: Should we take the tried-and-tested approach and put our computer vision skills to good use, or are we better off experimenting with newer concepts and technologies, getting our hands dirty with some spicy deep neural network magic?
At Blueprint, we prioritize innovation and experimentation as avenues to get the best results for our clients, and we thrive in the unknown. After some consideration, we decided this was one of those situations where our data scientists could get the best of both worlds. We set out to couple our computer vision skills with neural networks, to enable more precision and better flexibility when analyzing videos.
NASH allows users to extract information from videos whose length would make human analysis unfeasible. The hypothesis during NASH’s development was that marrying these two techniques would improve the video analytics capabilities of the product, allowing users to extract more complex information from the videos they process. For example, instead of simply counting how many cars appear in a video, we wanted to also analyze the most common paths the cars take – and which cars deviate from those paths, making them anomalies.

Cars being detected and assigned unique IDs within NASH
When developing NASH, an early step for Blueprint’s team of data scientists was identifying a reliable way to detect certain objects (e.g., cars and trucks) and track them throughout a video, applying a unique ID to each object that remains consistent despite:
- Occlusions (other cars passing in front of it)
- Illumination changes (passing from light to shadow)
- Hiccups in the detection algorithm
- Issues with the tracking algorithm
Neural networks
The main function of an object detector is, as the name implies, to detect objects. After detecting whichever object is needed, the detector draws a rectangle around that object and passes the coordinates forward. We went with an EfficientDet neural network as our object detector because it outshines classic computer vision techniques in two areas: adaptability and robustness.
Adaptability – If the neural network is properly trained it can be used over a multitude of videos with different viewing angles and resolutions without needing to tweak a long list of parameters.
Robustness – Because it is less finicky than a classic computer vision algorithm, you can be more confident that it will miss fewer detections.
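To make this concrete, here is a minimal sketch of the post-processing step that sits right after any object detector: keeping only confident detections of the classes we care about. The detector output format and function name here are illustrative, not NASH's actual API; we assume the model emits `(x1, y1, x2, y2, class_name, score)` tuples per frame.

```python
# Illustrative sketch: filtering raw detector output into usable boxes.
# A detector such as EfficientDet is assumed to return, per frame, a list
# of (x1, y1, x2, y2, class_name, score) tuples; names are hypothetical.

def filter_detections(detections, wanted_classes, min_score=0.5):
    """Keep only confident detections of the classes we care about."""
    return [
        (x1, y1, x2, y2, cls, score)
        for (x1, y1, x2, y2, cls, score) in detections
        if cls in wanted_classes and score >= min_score
    ]

raw = [
    (10, 20, 110, 90, "car", 0.92),
    (200, 40, 260, 120, "truck", 0.47),    # below threshold, dropped
    (300, 30, 340, 150, "giraffe", 0.88),  # wrong class, dropped
]
boxes = filter_detections(raw, {"car", "truck"})
```

The surviving boxes (coordinates plus class and score) are what gets handed to the tracker described below.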
The table below explains some of the reasons we chose EfficientDet over other neural networks.
High Level | Detail
---|---
It was the best solution available at the time | State of the art for the real-time object detection task on the COCO dataset, with 55.1 mAP (mean average precision)
Close to real-time processing capability | The smallest version (D0) reaches up to 36 FPS, while the D3 version maintains 23 FPS
Flexibility to emphasize either accuracy or speed | Options range from the D0 version (quick processing, low resource usage, less accurate results) to the D7 version (slower processing, greater resource usage, more accurate results)
For Blueprint’s project, when balancing accuracy against resource consumption, D3 proved the optimal default model. It achieved 45.6 mAP at 23 FPS while using 1.6 GB of VRAM. While it is the best model for most applications, it can easily be swapped out should a client need a more lightweight model or a heavier, more performant one.

Object detection graph showing the flow of cars as it changes in time
Multi-object tracking
The basic function of a tracker is to assign an ID to a detected object and to keep that ID consistent throughout the entire video. It can, however, also work in unison with our neural network to help us mitigate any shortfalls in image detecting. We created our tracker to use a mix of multiple techniques that enable us to identify and monitor object activity. These include:
- Kalman filtering
  - predicts the future position of detection boxes
- Joint detection embedding (JDE) tracking
  - assigns a tracklet to each bounding box
  - associates each detection with its corresponding tracklet
  - identifies lost, removed, found and active tracklets
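To make the Kalman filtering step concrete, here is a minimal constant-velocity predict/update sketch for a box centre. This is not NASH's actual tracker code: the state layout and both noise covariances are assumptions chosen for illustration.

```python
import numpy as np

# Constant-velocity Kalman filter over a box centre (cx, cy).
# State vector: [cx, cy, vx, vy]. A sketch of how a tracker predicts
# where a detection box will be on the next frame, then corrects that
# prediction with the detector's measurement.

F = np.array([[1, 0, 1, 0],    # cx' = cx + vx
              [0, 1, 0, 1],    # cy' = cy + vy
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],    # we only observe position, not velocity
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 0.01           # process noise (assumed)
R = np.eye(2) * 1.0            # measurement noise (assumed)

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    y = z - H @ x                     # innovation: measurement - prediction
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    return x + K @ y, (np.eye(4) - K @ H) @ P

x = np.array([50.0, 50.0, 5.0, 0.0])  # car moving right at 5 px/frame
P = np.eye(4)
x, P = predict(x, P)                  # predicted centre: (55, 50)
x, P = update(x, P, np.array([55.5, 50.2]))
```

The updated state lands between the prediction and the measurement, weighted by the gain; this is what lets the tracker ride out a few frames of missed detections.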
The tracker also has a sliding window search, which uses template matching and custom feature extractors to locate objects that were lost track of.
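The sliding window search can be sketched as brute-force template matching: slide the last known appearance of a lost object over a search region and pick the best-scoring position. This simplified version uses sum-of-squared-differences on a plain array; a production implementation would typically use normalised cross-correlation and learned feature extractors instead.

```python
import numpy as np

# Sliding-window template matching sketch (illustrative, not NASH's code).
# `template` is the lost object's last known appearance; `search` is the
# region of the new frame we scan for it.

def find_template(search, template):
    th, tw = template.shape
    sh, sw = search.shape
    best, best_pos = np.inf, (0, 0)
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            patch = search[y:y + th, x:x + tw]
            ssd = np.sum((patch - template) ** 2)  # lower = better match
            if ssd < best:
                best, best_pos = ssd, (x, y)
    return best_pos  # (x, y) of the best match's top-left corner

search = np.zeros((20, 20))
search[5:9, 7:11] = 1.0        # the "lost" object sits at x=7, y=5
template = np.ones((4, 4))
pos = find_template(search, template)
```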

Cars being detected and assigned unique IDs within NASH
Marriage
The concept of using our object tracking algorithm in tandem with the neural network is not complicated. We feed the video to the detector, which processes each frame, detects the required objects, and draws a bounding box around each of them. The tracker then takes the coordinates of these boxes and assigns an ID to each one. On every frame, the tracker checks whether each object in the current frame was present in the previous frame. If it was, the object keeps the same ID; if not, it gets a new one.
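The frame-to-frame ID hand-off can be sketched with a simple IoU (intersection-over-union) matcher. This is a simplified stand-in for the JDE association and Kalman prediction the real tracker uses; the function names and the 0.3 matching threshold are illustrative assumptions.

```python
# Sketch of per-frame ID assignment: match each new detection to the
# previous frame's boxes by IoU; matched boxes inherit their ID,
# unmatched ones get a fresh one. Boxes are (x1, y1, x2, y2).

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def assign_ids(prev, boxes, next_id, thresh=0.3):
    """prev: {id: box} from the last frame; boxes: detections this frame."""
    out = {}
    for box in boxes:
        match = max(prev, key=lambda i: iou(prev[i], box), default=None)
        if match is not None and iou(prev[match], box) >= thresh:
            out[match] = box                                # keep old ID
            prev = {i: b for i, b in prev.items() if i != match}
        else:
            out[next_id] = box                              # brand-new ID
            next_id += 1
    return out, next_id

prev = {1: (10, 10, 50, 50)}
tracks, next_id = assign_ids(
    prev, [(12, 11, 52, 51), (200, 200, 240, 240)], next_id=2)
```

Here the first detection overlaps the previous frame's box 1 and inherits its ID, while the second detection is new and receives ID 2.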

Object detection and tracking pipeline within NASH
Neural networks and object tracking are both great when used alone, but they each have their own weaknesses. The neural network can recognize a type of object, but it can be difficult for it to get more specific. For example, it can tell the difference between a vehicle and a giraffe, but it cannot identify a specific type of vehicle. The object tracker can track something once it knows what it is supposed to track but needs to be told precisely what the user wants it to follow.

Common paths identified and clustered
Because NASH combines neural networks with object tracking, instead of simply counting vehicles, it can follow specific vehicles to see if any do anything out of the ordinary. This marriage of technology means NASH has fewer false positives. Furthermore, NASH independently figures out what to track thanks to the neural network, alleviating any need for constant human intervention.
This novel and efficient combination of two data science techniques has made NASH a superior video analytics product. But it’s also a great example of what makes Blueprint so special. We’re into everything data, and our data science team works on the bleeding edge of technology and innovation. That experience and expertise makes for some exciting results. Let’s start a conversation about how Blueprint can help you get the most out of your data.
Want to know more about video analytics and how Blueprint can move you forward into this new technology? Introducing NASH, Blueprint’s Advanced Video Analytics tool.