Catching Anomalies in Cold Storage Temperature Data

 

Disclaimer

Disruptive Technologies (DT) does not provide an end solution for cold storage anomaly detection. This application note proposes one approach, meant to serve as an example for developers who want to get started with cold storage applications using DT Wireless Temperature Sensors.

This guide assumes you are familiar with the DT ecosystem and have access to temperature sensors and Cloud Connectors. If not, take a look at our Getting Started guide and consider ordering a sensor kit from our Webshop.

Introduction

Cold storage applications are often subject to strict temperature requirements during operation. Continuously tracking the condition of fridges containing food, medicine, or other easily spoiled products can help avoid losses by detecting failure onset [1]. However, merely setting a fixed temperature threshold for alerts can result in many false alarms. During business hours, staff might open the doors of a fridge several times, causing short but large spikes in temperature. In addition, industry-grade refrigeration equipment often features a regular defrosting cycle, quickly raising the temperature at set intervals to promote de-icing.

In this application note, an alternative approach to triggering alarms is proposed: short-time oscillations in temperature are separated from the more representative baseline with the aim of reducing false alarms. The following sections explain step by step how you can set this up for your own DT Studio project and build further upon the proposed method.

intro.png

Figure 1: A significant change in temperature triggering an alarm.

 

DT Studio Project Configuration

The implementation is built around using the Developer API to interact with a single DT Studio project containing all temperature sensors used in anomaly detection. If not already done, a project needs to be created and configured to enable API functionality.

Project Authentication

For authenticating the Developer API against the DT Studio project, three separate authentication details have to be located and generated, which are later used in the example code. By following this guide, the project should be ready to interface with the API.

Adding temperature sensors to the project

By default, all temperature sensors in the project are assumed to be independent of each other and are processed as such. The number of sensors does not have to be configured beforehand and is scaled automatically. The option to move a sensor between projects can be found when selecting a sensor in a DT Studio project, as shown in figure 2.

studio_project.png

Figure 2: Detailed overview of sensors in the DT Studio project.

 

Example Code

An example code repository is provided with this application note. It illustrates one way of detecting anomalies in temperature data and is meant to serve as a starting point for further development and implementation. It uses the Developer API to interact with the DT Studio project.

Source Access 

The example code source is publicly hosted on the official Disruptive Technologies GitHub repository under the MIT license. It can be found by following this link.

Environment Setup

All code has been written in and tested for Python 3. While not required, it is recommended to use a virtual environment to avoid conflicts. Required dependencies can be installed using pip and the provided requirements text file.

pip3 install -r requirements.txt 

Using the details found in the Project Authentication section, edit the following lines in sensor_stream.py to authenticate the API against your DT Studio project.

USERNAME   = "SERVICE_ACCOUNT_KEY"    # this is the key
PASSWORD   = "SERVICE_ACCOUNT_SECRET" # this is the secret
PROJECT_ID = "PROJECT_ID"             # this is the project id 
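Under the hood, the key and secret pair is used as an HTTP basic authentication username and password against the REST API. As a minimal sketch of what sensor_stream.py does for you, the following lists the devices in the project using the USERNAME, PASSWORD, and PROJECT_ID values above; the endpoint URL is an assumption based on the public DT REST API and is handled by the example code itself.

# Minimal sketch: list the project's devices using basic authentication.
# The base URL below is assumed; sensor_stream.py performs these requests for you.
import requests

API_BASE = "https://api.disruptive-technologies.com/v2"

response = requests.get(
    "{}/projects/{}/devices".format(API_BASE, PROJECT_ID),
    auth=(USERNAME, PASSWORD),  # service account key and secret
)
response.raise_for_status()

for device in response.json().get("devices", []):
    print(device["name"])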

Usage

If the example code is correctly authenticated against the DT Studio project as described above, running the script sensor_stream.py will start streaming data from every temperature sensor in the project, continuously monitoring their temperatures.

python3 sensor_stream.py 

For more advanced usage, such as visualizing the results, one or several flags can be provided upon execution.

usage: sensor_stream.py [-h] [--path] [--starttime] [--endtime] [--plot]
                        [--debug]

Cold Storage Anomaly Detection on Stream and Event History.

optional arguments:
  -h, --help    show this help message and exit
  --path        Absolute path to local .csv file.
  --starttime   Event history UTC starttime [YYYY-MM-DDTHH:MM:SSZ].
  --endtime     Event history UTC endtime [YYYY-MM-DDTHH:MM:SSZ].
  --plot        Plot the temperature data, baseline, and envelope.
  --debug       Plot algorithm operation.

The arguments --starttime and --endtime should be of the format YYYY-MM-DDThh:mm:ssZ, where YYYY is the year, MM the month, and DD the day. Likewise, hh, mm, and ss are the hours, minutes, and seconds, respectively. Notice the separators, T and Z, which must be included. Note also that the time is given in UTC; local timezone corrections should therefore be made accordingly.
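If your timestamps are in a local timezone, a small conversion such as the following (a sketch using only the Python standard library) produces the expected UTC string:

# Convert a local wall-clock time to the UTC string expected by --starttime/--endtime.
from datetime import datetime, timezone

local_time = datetime(2021, 3, 1, 8, 30, 0).astimezone()  # naive time interpreted as local
utc_string = local_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
print(utc_string)  # e.g. 2021-03-01T07:30:00Z when the local timezone is UTC+1

The resulting string can then be passed directly, for example python3 sensor_stream.py --starttime 2021-03-01T07:30:00Z.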

By providing the --plot argument, a visualization like the one shown in figure 3 will be generated. If historical data is included, an interactive plot is produced after the historical data has been processed. When this plot is closed, the stream starts, and a non-interactive plot is updated for each new sample that arrives.

Similar to --plot, the --debug argument also generates a visualization. It shows an overview of the thresholds and other values calculated by the algorithm and is meant mainly for debugging purposes. By default, it does not work for streaming data.

output.png

Figure 3: 20 days of data for three sensors, visualized using the --plot argument.

 

Proposed Implementation

With the aim of automatically monitoring temperature changes in cold storage applications with a reduced number of false alarms, a simple yet effective approach has been proposed and implemented. Using robust statistics [2] on historical data, upper and lower envelopes are continuously calculated as new samples arrive in the stream.

For new data to be considered an anomaly, the calculated envelopes have to be breached, and the duration for which the envelopes are exceeded is also considered before sending an alert. The temperature baseline is extracted using a centered rolling median. Figure 4 shows an overview of the algorithm flow.

The implementation has been structured such that the typical tuning parameters for the algorithm are located in the file ./config/parameters.py. As no single configuration works for all data, users are encouraged to experiment with combinations better suited to their own data.
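As an illustration only (the names and values below are hypothetical, not the repository's actual identifiers), such a parameter file could gather the quantities introduced in the following sections:

# ./config/parameters.py -- illustrative example; names and values are hypothetical.
MEDIAN_WINDOW_HOURS = 2    # L: width of the centered rolling median used for the baseline
HISTORY_LENGTH_DAYS = 3    # T: amount of historical data used to calculate the envelope
SUBWINDOW_HOURS     = 8    # W: size of each subwindow assessed in isolation
N_SUBWINDOWS        = 9    # M: number of (possibly overlapping) subwindows
ENVELOPE_MODIFIER   = 3.0  # epsilon: scales the MAD term, controlling envelope width
MAX_STORAGE_TEMP_C  = 4.0  # regulatory maximum storage temperature used for alerts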

cold_storage_flow.png

Figure 4: Algorithm flow chart from a new event sample to a triggered warning.

 

Baseline Extraction

Many of the false alarms can be removed merely by extracting the temperature baseline, \(b\), from the raw temperature data \(x\). This alone might be sufficient for many applications and would result in a straightforward implementation, but it is expanded upon in this application note. Instead of a rolling average, the centered rolling median has been chosen, as it is much more robust against outliers and generally spiky data.

Only a single parameter, the window width \(L\), has to be set for this operation. For each new temperature sample, the baseline value is calculated by taking the median of the previous samples within a window of length \(L\), as shown in figure 5. A larger \(L\) will therefore produce a smoother baseline, and vice versa. However, a larger \(L\) also results in a longer introduced delay, given by

\begin{equation}
\tau_L = \frac{L}{2},
\end{equation}
and should be minimized. It is therefore recommended that \(L\) is set no longer than the duration of the feature in the data one wishes to remove. As shown in figure 5, this is enough to completely remove said feature, due to the median being used, while minimizing the delay. Subtracting \(b\) from \(x\), the resulting baseline-subtracted temperature \(y\) is then given by

\begin{equation}
y[k] = x[k] - b[k],
\end{equation}
where \(k=0,1,...,N-L/2\) and \(N\) is the total number of temperature samples.
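As a minimal sketch of the above, assuming the temperature samples are held in a pandas Series at a fixed sampling interval (pandas is an assumption here, not necessarily what the repository uses):

# Minimal sketch: centered rolling median baseline and baseline-subtracted signal.
import pandas as pd

def extract_baseline(temperature: pd.Series, window_samples: int) -> pd.Series:
    # Centered rolling median over L = window_samples samples; min_periods=1
    # allows a (less smooth) estimate near the edges of the data.
    return temperature.rolling(window_samples, center=True, min_periods=1).median()

def subtract_baseline(temperature: pd.Series, baseline: pd.Series) -> pd.Series:
    # y = x - b, the short-time deviation from the baseline.
    return temperature - baseline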

It should be mentioned that the introduced delay \(\tau_L\) could, in practice, be removed by using a right-aligned rolling median instead of the center-aligned one presented here, as the calculated baseline value would then be aligned with the latest temperature value. However, this would cause the baseline to lag, not really reducing the delay at all, while introducing unnecessary artifacts in later steps when it is subtracted from the temperature data.

baseline.png

Figure 5: Extracting the temperature baseline through a rolling median.

 

Historic Data Envelope

In order to detect when the temperature behavior changes significantly, previous historical data of length \(T\) is used to produce an envelope that spans the region of normal operation. Integral to the envelope calculation, the minimum and maximum values, together with the median absolute deviation (MAD), must be found. While similar to the standard deviation (STD), the MAD, given by

\begin{equation}
\text{MAD} = \text{median}( |y_k - \text{median}(y_k)| ),
\end{equation}
is much more robust towards outliers than the STD due to the latter's squared terms.
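Translated directly into code (a small helper assuming NumPy arrays), the MAD reads:

# Median absolute deviation of a window of baseline-subtracted samples y.
import numpy as np

def mad(y: np.ndarray) -> float:
    return float(np.median(np.abs(y - np.median(y))))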

Instead of evaluating the whole time period \(T\) at once, \(M\) smaller windows of size \(W\) can be assessed individually, finding the minimum, maximum, and MAD in isolation. Thereafter, by taking the median of all found values, the result is a much better representation of the general data behavior over the time period \(T\). The windows can also be overlapped to increase the number of resulting calculations. Figure 6 shows the minimum and maximum values found for \(M=9\) windows of \(W=8\) hours over \(T=3\) days.

windowing.png

Figure 6: Minimum and maximum values found for nine isolated temperature windows.

 

After windowing the data and finding the maximum, minimum, and MAD values for each window separately, the upper and lower envelopes \(e_{u/l}\) are given by

\begin{align}
e_u &=  b[k] + \text{median}(\text{max}_W[m]) + \epsilon \cdot \text{median}(\text{MAD}_W[m]) \\
e_l &=  b[k] + \text{median}(\text{min}_W[m]) - \epsilon \cdot \text{median}(\text{MAD}_W[m]),
\end{align}
where \(\text{max}_W\), \(\text{min}_W\), and \(\text{MAD}_W\) are the respective values found for each subwindow \(m=0,1,...,M\), and \(\epsilon\) is a modifier that controls the envelope width. Regardless of any spurious changes in the temperature data, this envelope should behave rather consistently over time, providing upper and lower thresholds to evaluate outliers against. As the baseline is used to shift the envelope level, only unnaturally large spikes in temperature should be caught.
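A sketch of this windowed calculation is shown below, reusing the mad helper defined earlier. It assumes \(y = x - b\) as above, so the subwindow minima are signed (negative below the baseline); the repository's exact window placement and overlap may differ.

# Minimal sketch: envelope offsets from M (possibly overlapping) subwindows of width W.
import numpy as np

def envelope_offsets(y_history: np.ndarray, n_windows: int, window_len: int, eps: float):
    # Evenly spaced subwindow start indices covering the history of length T.
    starts = np.linspace(0, len(y_history) - window_len, n_windows).astype(int)
    windows = [y_history[s:s + window_len] for s in starts]

    med_max = np.median([w.max() for w in windows])  # median of subwindow maxima
    med_min = np.median([w.min() for w in windows])  # median of subwindow minima (signed)
    med_mad = np.median([mad(w) for w in windows])   # median of subwindow MADs

    upper = med_max + eps * med_mad  # added to the baseline: e_u[k] = b[k] + upper
    lower = med_min - eps * med_mad  # added to the baseline: e_l[k] = b[k] + lower
    return upper, lower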

bounds.png

Figure 7: Calculated envelope for one week of temperature data.

 

Defining Anomalies

What does and does not constitute an anomaly ultimately comes down to the specific use case. No single configuration works for all types of data. Still, if the implemented anomaly detection system is flexible, only a few parameters have to be changed to produce decent performance and a low number of false alarms.

Using the baseline and the calculated envelope from the previous sections, this implementation proposes three alarm levels for sufficient granularity (a minimal decision sketch follows the list):

No Alarm: If a temperature value is within the calculated envelope, nothing of interest is happening, and the sample can be promptly ignored.
Warning: If a temperature value exceeds the calculated envelope but returns within a period of \(L/2\), the baseline is mostly unaffected and does not rise. Such a spike is often caused by opening the fridge doors during a typical working day and can be ignored. One could, however, mark these short-time spikes with a warning label for posterity without triggering a full-fledged alert.
Alert: Should the temperature rise for a period longer than \(L/2\), the baseline will rise too. This is useful information: for applications such as food storage, where a maximum storage temperature is often set by regulations [1], such a prolonged rise in temperature is unfortunate. Triggering an alarm when the baseline exceeds said maximum value is therefore a natural choice.
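As a minimal decision sketch of the three levels above (names are illustrative, and the repository's implementation may differ):

# Illustrative three-level classification of the latest sample; names are hypothetical.
def classify(temperature, baseline, upper_env, lower_env, max_storage_temp):
    if baseline > max_storage_temp:
        return "alert"    # the baseline itself has risen, i.e. a prolonged (> L/2) breach
    if temperature > upper_env or temperature < lower_env:
        return "warning"  # short-time spike outside the envelope, e.g. an opened door
    return "no alarm"     # within the envelope: nothing of interest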

Several occurrences of all three aforementioned alarm types can be seen in figure 8. The benefit of this method is that even though the temperature exceeds the maximum storage temperature several times, only two of the excursions do so for a prolonged time and trigger an alert. This drastically reduces alarm fatigue, and the only parameter that has to be tuned is the median window length \(L\).

anomaly.png

Figure 8: Two alarms triggered due to a prolonged rise in temperature. 

 

References

  1. https://www.mattilsynet.no/mat_og_vann/transport_og_lager/frakt_og_transport/krav_til_transport_av_mat
  2. https://en.wikipedia.org/wiki/Robust_statistics