Advanced AI Algorithm Development in Energy Forensics: A Python Guide to Transformer Models for Smart Grid Theft Detection through Consumption Patterns

Insights from German Energy Data: A Guide by Stephanie Ness on Consumption Patterns and Online Data-Supported Analytics for Small-Scale Settings

Amid the vast sea of data, each kilowatt-hour whispers its tale. Deep within the intricate consumption patterns may lie anomalies, the discreet telltale signs of energy theft.

These stories, though frequently mundane, sometimes echo with the whispers of foul play. Malevolent actors subtly siphon off electricity, their actions masked within the labyrinth of consumption data. Yet we possess a potent weapon, forged in the crucible of machine learning’s relentless innovation: the transformer model.

Even though the transformer model is no longer the newest star in the AI galaxy, its strength in modeling sequences makes it a strong candidate for the detective role in our mission. Armed with Python, we can build our own detection algorithm to uncover anomalies, identify the malevolent actors behind them, and take corrective action.

The Essence of Energy Analysis

In a world increasingly reliant on sustainable energy, understanding consumption patterns is vital. From homes and businesses to industries and government establishments, a comprehensive analysis of energy consumption serves manifold purposes. Whether it’s about efficiency, cost-saving, or sustainability, there’s always more to energy than meets the eye.

Unraveling Theft in the Maze of Consumption

The dataset in question, sourced from the Open Energy Data Initiative and published in May 2022, contains energy consumption records for a diverse range of consumers. Six specific types of theft are represented in it:

  1. A substantial reduction in electricity use during daytime hours.
  2. Arbitrary and sudden drops in consumption to zero.
  3. Random multiplicative reductions in hourly consumption.
  4. Consumption reported as a random fraction of the mean.
  5. Consistent reporting of average consumption, irrespective of actual usage.
  6. A complete reversal in the order of consumption readings.

Each of these theft types was intricately embedded within the dataset, necessitating a sophisticated approach to detect them effectively.
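The six theft types listed above can be illustrated as simple transformations of one day of honest hourly readings. The sketch below is hypothetical: the function name apply_theft, the daytime window (08:00 to 18:00), and the random-factor range of 0.1 to 0.8 are assumptions chosen for illustration, not the exact parameters used to build the dataset.

```python
import numpy as np

def apply_theft(day, theft_type, rng):
    """Return a tampered copy of a 24-value array of honest hourly readings.

    Hypothetical transformations mirroring the six theft types; the daytime
    window (indices 8-17) and the factor range (0.1-0.8) are assumptions.
    """
    day = np.asarray(day, dtype=float)
    if theft_type == 1:    # substantial reduction during daytime hours
        out = day.copy()
        out[8:18] *= rng.uniform(0.1, 0.8)
        return out
    if theft_type == 2:    # readings drop to zero over a random interval
        out = day.copy()
        start = rng.integers(0, 23)
        end = rng.integers(start + 1, 25)
        out[start:end] = 0.0
        return out
    if theft_type == 3:    # each hour multiplied by its own random factor
        return day * rng.uniform(0.1, 0.8, size=day.shape)
    if theft_type == 4:    # each hour reported as a random fraction of the mean
        return day.mean() * rng.uniform(0.1, 0.8, size=day.shape)
    if theft_type == 5:    # the daily average reported for every hour
        return np.full_like(day, day.mean())
    if theft_type == 6:    # order of the readings reversed
        return day[::-1].copy()
    raise ValueError(f"unknown theft type: {theft_type}")

rng = np.random.default_rng(0)
honest = 1.0 + np.sin(np.linspace(0, 2 * np.pi, 24, endpoint=False)) ** 2  # toy profile
for t in range(1, 7):
    tampered = apply_theft(honest, t, rng)
    print(t, round(honest.sum(), 2), round(tampered.sum(), 2))
```

Types 5 and 6 leave the daily total unchanged, which is why a detector has to look at the shape of the profile rather than only at totals.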

The method explained in this tutorial is transformer-based. It is chosen primarily because it capitalizes on the inherent parallelism of transformers: instead of stepping through the readings one at a time, as a recurrent network does, self-attention processes all positions in a window simultaneously, which speeds up training and analysis.
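The parallelism comes from self-attention: the attention weights for every pair of time steps in a window are computed with one matrix product rather than a step-by-step loop. A minimal NumPy sketch of scaled dot-product self-attention, assuming random projection matrices in place of learned weights and illustrative sizes:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a (seq_len, d_model) array."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (seq_len, seq_len): every hour vs every hour
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(42)
seq_len, d_model, d_k = 24, 8, 4   # 24 hourly steps; sizes are illustrative
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

All 24 output rows come out of the same matrix products, so none of them waits for the previous hour to be processed.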

Step 1: Setting Up Your Environment


Before any data analysis, it’s crucial to set up a conducive environment. This ensures all tools are at our disposal when we need them.

pip install pandas numpy tensorflow

This command installs three libraries: pandas (for data manipulation), numpy (for numerical computations), and tensorflow (our deep learning framework).
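To confirm the setup, a short check can report which of the three packages are installed and at what version. This sketch relies only on the standard library's importlib.metadata (Python 3.8 or later); the helper name check_packages is hypothetical.

```python
from importlib import metadata

REQUIRED = ["pandas", "numpy", "tensorflow"]

def check_packages(names):
    """Map each distribution name to its installed version, or None if missing."""
    status = {}
    for name in names:
        try:
            status[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = None
    return status

if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        print(f"{name:12s} {version or 'NOT INSTALLED'}")
```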

Step 2: Loading the Dataset


Datasets are the backbone of any machine learning project. Here, we’re sourcing our dataset from an online link.

import pandas as pd
import numpy as np

# Load dataset
# Set url to the location of the CSV file (a local file path also works)
url = ""
data = pd.read_csv(url)

Upon execution, data holds the entire dataset as a pandas DataFrame, ready for preprocessing.

Step 3: Preliminary Data Analysis


Before diving into model-building, it’s wise to understand the data you’re working with. This step is about gaining that understanding.

print(data.head())
print(data.describe())

This displays the first few rows and a summary (count, mean, standard deviation, quartiles) of the dataset’s numeric columns.

Step 4: Data Preprocessing


Raw data isn’t always suited for machine learning. Preprocessing refines the data, making it more digestible for our models.

consumption_data = data["Electricity:Facility [kW](Hourly)"].values

We focus on the column Electricity:Facility [kW](Hourly) because it records hourly electricity consumption, the quantity our analysis is built on.

Normalization rescales the data to zero mean and unit standard deviation (z-scores), which keeps training numerically stable:

# Z-score normalization: zero mean, unit standard deviation
mean = consumption_data.mean()
std = consumption_data.std()
consumption_data = (consumption_data - mean) / std
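Because the model is trained on z-scored values, its predictions have to be mapped back with the same mean and standard deviation before they can be read in kW. A minimal sketch with hypothetical helper names (zscore, inverse_zscore) on toy readings; it also shows that one normalized unit equals one standard deviation of the original series:

```python
import numpy as np

def zscore(values):
    """Standardize values; return the scaled array plus the statistics used."""
    values = np.asarray(values, dtype=float)
    mean, std = values.mean(), values.std()
    if std == 0:
        raise ValueError("a constant series cannot be standardized")
    return (values - mean) / std, mean, std

def inverse_zscore(scaled, mean, std):
    """Map standardized values back to the original units (kW here)."""
    return np.asarray(scaled, dtype=float) * std + mean

readings_kw = np.array([3.0, 4.0, 5.0, 6.0, 7.0])  # toy hourly readings
scaled, mean, std = zscore(readings_kw)
print(scaled)                              # zero mean, unit standard deviation
print(inverse_zscore(scaled, mean, std))   # recovers the original kW values
print(f"1.0 normalized unit = {std:.3f} kW")
```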

Step 5: Building the Transformer Model


Transformers, introduced in the paper “Attention Is All You Need”[1], are a breakthrough in handling sequences. Unlike recurrent models, which process a sequence one step at a time, transformers use self-attention to weigh every position of the input against every other position, paying selective “attention” to the parts that matter most.

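import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Model parameters
window = 24          # hours per input sequence (matches Step 6)
embedding_dim = 64   # width each hourly reading is projected to
num_heads = 4
ff_dim = 32
num_blocks = 2

# Transformer encoder block (pre-normalization) with a residual connection around each sub-layer
def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0):
    # Multi-head self-attention sub-layer
    x = layers.LayerNormalization(epsilon=1e-6)(inputs)
    x = layers.MultiHeadAttention(
        key_dim=head_size, num_heads=num_heads, dropout=dropout
    )(x, x)
    x = layers.Dropout(dropout)(x)
    res = x + inputs

    # Position-wise feed-forward sub-layer; the second Conv1D maps back to the
    # input width so the residual addition has matching shapes
    x = layers.LayerNormalization(epsilon=1e-6)(res)
    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(x)
    x = layers.Dropout(dropout)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    return x + res

# Each sample is a window of 24 hourly readings with one feature per hour
inputs = layers.Input(shape=(window, 1))
# Project every scalar reading to an embedding_dim-dimensional vector
x = layers.Dense(embedding_dim)(inputs)

for _ in range(num_blocks):
    x = transformer_encoder(x, embedding_dim, num_heads, ff_dim)

x = layers.GlobalAveragePooling1D()(x)
x = layers.Dense(30, activation="relu")(x)
x = layers.Dense(1, activation="linear")(x)  # predicted next-hour consumption (normalized)

model = keras.Model(inputs=inputs, outputs=x)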

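Here, we’ve set up a transformer encoder block that normalizes its input, applies multi-head self-attention, and then passes the result through a position-wise feed-forward network, with a residual connection around each of the two sub-layers. Two such blocks are stacked, their output is averaged over time, and a small dense head predicts a single value: the next hour’s consumption.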
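Self-attention by itself is order-agnostic, and the encoder above adds no positional information. Combined with global average pooling, this makes the model’s output identical for any reordering of the readings within a window, which can make reversed readings (theft type 6) harder to detect. One common remedy, from Vaswani et al. (2017), is a sinusoidal positional encoding added to the embedded inputs before the first encoder block. The NumPy sketch below only computes that encoding; the function name is hypothetical, and how it is wired into the model is left to adapt.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Positional encoding from "Attention Is All You Need":

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    positions = np.arange(seq_len)[:, np.newaxis]           # (seq_len, 1)
    two_i = np.arange(0, d_model, 2)[np.newaxis, :]         # (1, d_model / 2): values 2i
    angles = positions / np.power(10000.0, two_i / d_model)  # (seq_len, d_model / 2)
    pe = np.zeros((seq_len, d_model), dtype=np.float32)
    pe[:, 0::2] = np.sin(angles)   # even columns
    pe[:, 1::2] = np.cos(angles)   # odd columns
    return pe

pe = sinusoidal_positional_encoding(seq_len=24, d_model=64)
print(pe.shape)
```

In the Keras model, one option is to add this (24, 64) array to the embedded sequence of shape (batch, 24, 64) just before the encoder loop; broadcasting handles the batch dimension.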

Step 6: Preparing Training Data and Model Training


Machine learning models learn from data. This step feeds our preprocessed data into the model so it can learn to predict energy consumption patterns.

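# Build 24-hour input windows, each paired with the reading that follows it
X = []
y = []

for i in range(len(consumption_data) - 24):
    X.append(consumption_data[i : i + 24])
    y.append(consumption_data[i + 24])

X = np.array(X)[..., np.newaxis]  # shape (samples, 24, 1): one feature per hour
y = np.array(y)

# Optimizer and loss are assumptions: Adam with mean squared error suits one-step regression
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, validation_split=0.1)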

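This code breaks our time series into sliding 24-hour windows, each paired with the hour that follows it, so the transformer learns to predict the next reading from the preceding day. The model is then trained for 10 epochs, with the last 10% of the windows held out for validation.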
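The windowing loop can also be expressed with NumPy’s sliding_window_view (available since NumPy 1.20), which builds the same windows without an explicit Python loop. A sketch on a toy series; the helper name make_windows is hypothetical:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(series, window=24):
    """Return (X, y): each row of X holds `window` consecutive readings and
    y holds the reading that immediately follows that window."""
    series = np.asarray(series, dtype=float)
    X = sliding_window_view(series, window)[:-1]  # drop the last window: it has no target
    y = series[window:]
    return X[..., np.newaxis], y                  # add a feature axis: (samples, window, 1)

series = np.arange(30, dtype=float)  # toy stand-in for consumption_data
X, y = make_windows(series, window=24)
print(X.shape, y.shape)
```

The windows are read-only views into the original array, so memory use stays low even for long series.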

Step 7: Anomaly Detection


Anomalies are deviations from the norm. By predicting and comparing with actual consumption, we can pinpoint where these anomalies lie.

# Flatten predictions from shape (n, 1) to (n,) so they compare element-wise with y
predictions = model.predict(X).ravel()

anomalies = np.where(np.abs(predictions - y) > 1)[0]

print(f"Anomaly Indices: {anomalies}")

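After training, the model predicts the next data point for every window. Where its prediction misses the actual reading by more than 1 in normalized units (one standard deviation of the original series), we may have detected energy theft or another anomaly. Note that index i refers to window i, whose target is position i + 24 of consumption_data.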
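The fixed threshold of 1 is a simple starting point. One alternative, an assumption rather than part of the method above, is to derive the threshold from the errors themselves, for example flagging errors more than k standard deviations above the mean absolute error (k = 3 here is an assumption), and to shift each flagged index by the window length so it points at the hour it refers to. A NumPy sketch with a hypothetical flag_anomalies helper on synthetic data:

```python
import numpy as np

def flag_anomalies(predictions, actual, window=24, k=3.0):
    """Flag hours whose absolute prediction error is more than k standard
    deviations above the mean absolute error.

    Returns positions in the original series and the threshold used.
    """
    # Flatten so (n, 1) predictions do not broadcast against (n,) targets into (n, n)
    predictions = np.asarray(predictions, dtype=float).ravel()
    actual = np.asarray(actual, dtype=float).ravel()
    errors = np.abs(predictions - actual)
    threshold = errors.mean() + k * errors.std()
    window_idx = np.flatnonzero(errors > threshold)
    # Window i predicts the reading at position i + window of the original series
    return window_idx + window, threshold

rng = np.random.default_rng(7)
actual = rng.normal(size=500)
predictions = actual + rng.normal(scale=0.1, size=500)  # mostly accurate forecasts
predictions[[100, 250]] += 3.0                         # two injected large errors
hours, threshold = flag_anomalies(predictions[:, np.newaxis], actual)
print(hours, round(threshold, 3))
```

Large anomalies inflate the standard deviation they are judged against, so on heavily tampered data a robust statistic such as the median absolute deviation may be a better basis for the threshold.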

Alternative detection methods:

The transformer model, introduced by Vaswani et al. (2017), has revolutionized many domains of machine learning, especially natural language processing. But before its rise, and in some niches even now, other models and techniques have been used for sequence data and related applications. Here are some alternatives:

Recurrent Neural Networks (RNNs):

Vanilla RNN: The original RNNs carry a hidden state from one time step to the next, which lets them remember past data in a sequence. However, they suffer from the vanishing gradient problem, which makes long-range dependencies hard to learn.

Long Short-Term Memory (LSTM): An RNN variant whose gated memory cell mitigates the vanishing gradient problem, allowing it to retain information over long sequences. It has been widely used for tasks such as text generation, speech recognition, and time-series forecasting.

Gated Recurrent Units (GRU): A simplified variant of the LSTM with fewer parameters and often similar performance.
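The LSTM gating described above can be written out for a single time step. A NumPy sketch of the standard LSTM cell equations (input, forget, and output gates plus a candidate cell state), assuming random weights in place of trained ones and a stacked-gate layout chosen for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4 * hidden, n_inputs), U: (4 * hidden, hidden), b: (4 * hidden,);
    rows are stacked as [input gate, forget gate, candidate, output gate].
    """
    z = W @ x_t + U @ h_prev + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_t = f * c_prev + i * g   # keep part of the old memory, write new content
    h_t = o * np.tanh(c_t)     # expose a gated view of the memory
    return h_t, c_t

rng = np.random.default_rng(0)
n_inputs, hidden = 1, 8        # one hourly reading per step
W = rng.normal(scale=0.5, size=(4 * hidden, n_inputs))
U = rng.normal(scale=0.5, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for reading in rng.normal(size=24):   # run over a 24-hour window
    h, c = lstm_step(np.array([reading]), h, c, W, U, b)
print(h.shape, c.shape)
```

When the forget gate is close to 1 and the input gate close to 0, the cell state passes through a step almost unchanged, which is how information survives across many time steps.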

Convolutional Neural Networks (CNNs):

Traditionally used for image data due to their ability to recognize patterns, textures, and shapes. However, they have also been adapted for sequence data by applying 1D convolutions.
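A 1D convolution slides a small kernel along the sequence and produces one response per position. As an illustration with a hand-picked kernel (a trained CNN learns its kernels), a difference kernel highlights abrupt changes such as the sudden drops to zero of theft type 2. The toy readings and the threshold of -1.5 are assumptions:

```python
import numpy as np

# Hourly readings with an abrupt drop to zero (toy data)
readings = np.array([2.0, 2.1, 1.9, 2.0, 0.0, 0.0, 0.0, 2.0, 2.1])

# Hand-picked kernel: response[t] = readings[t + 1] - readings[t].
# np.correlate in "valid" mode computes the same sliding dot product that
# deep-learning Conv1D layers use (cross-correlation, no kernel flip).
kernel = np.array([-1.0, 1.0])
response = np.correlate(readings, kernel, mode="valid")

# Large negative responses mark sudden drops; the threshold is an assumption
drops = np.flatnonzero(response < -1.5) + 1  # index of the first reading after the drop
print(response)
print(drops)
```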

Radial Basis Function Networks (RBFNs):

Mostly used for classification and regression tasks, they are not designed specifically for sequence data but have been employed in a wide range of applications.

Hidden Markov Models (HMMs):

Especially popular for speech recognition and other sequence labeling tasks before deep learning models became dominant.

Sequence-to-Sequence models:

Initially designed for tasks like machine translation, where an input sequence is transformed into an output sequence.

Attention Mechanisms (Without Transformers):

Attention was developed to allow models, especially in sequence-to-sequence tasks, to focus on specific parts of the input when producing an output. It was the stepping stone for transformers, but attention can be and has been used without the full transformer architecture.

Feedforward Neural Networks (FNNs) or Multi-layer Perceptrons (MLPs):

These are the standard neural networks used for various tasks, from classification to regression.

Autoencoders:

Mainly used for dimensionality reduction and anomaly detection, they learn patterns in the data by compressing the input and then reconstructing it; inputs that reconstruct poorly are candidate anomalies.
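The reconstruction-error idea can be sketched without a deep-learning framework: a linear autoencoder trained with squared-error loss learns the same subspace as principal component analysis, so compressing daily profiles onto their top principal components and measuring what is lost gives a simple stand-in. The simulated profiles, the choice of two components, and the helper names are assumptions:

```python
import numpy as np

def fit_linear_autoencoder(train_days, n_components=2):
    """Fit PCA on reference daily profiles; the top principal directions act as
    the weights of a linear encoder/decoder."""
    train_days = np.asarray(train_days, dtype=float)
    mean = train_days.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance
    _, _, vt = np.linalg.svd(train_days - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_errors(days, mean, basis):
    """Mean squared error after compressing each day to len(basis) numbers and back."""
    centered = np.asarray(days, dtype=float) - mean
    codes = centered @ basis.T       # encode: 24 values -> n_components
    reconstructed = codes @ basis    # decode back to 24 values (still centered)
    return ((centered - reconstructed) ** 2).mean(axis=1)

def simulate_days(n, rng):
    """Toy daily profiles: an evening peak, scaled per day, plus noise."""
    hours = np.arange(24)
    base = 1.0 + 0.8 * np.exp(-((hours - 19) ** 2) / 8.0)
    return base * rng.uniform(0.8, 1.2, size=(n, 1)) + rng.normal(scale=0.05, size=(n, 24))

rng = np.random.default_rng(3)
mean, basis = fit_linear_autoencoder(simulate_days(200, rng))
test_days = simulate_days(20, rng)
test_days[5] = test_days[5][::-1].copy()   # tamper one day: readings in reverse order
errors = reconstruction_errors(test_days, mean, basis)
print(np.argmax(errors))
```

Fitting on reference days that are believed to be clean keeps tampered days from shaping the subspace they are scored against.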

Tree-based Models:

Models like Random Forests and Gradient Boosted Trees can be used for a variety of tasks. They do not model sequence order natively, but with engineered features such as lagged readings or hourly statistics they can be powerful alternatives, depending on the application.

Time Series Forecasting Models:

For sequence data specifically related to time series forecasting, traditional models like ARIMA, Exponential Smoothing State Space Model (ETS), and Prophet can be effective.
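As a point of comparison, simple exponential smoothing (the most basic member of the exponential smoothing family) forecasts each hour from a smoothed level that weights recent readings more heavily, with older weights decaying by a factor of (1 - alpha) per step. A pure-NumPy sketch of one-step-ahead forecasts and residuals; the smoothing factor alpha = 0.3 and the toy readings are assumptions, and in practice alpha would be fitted, for example with a statistics library:

```python
import numpy as np

def ses_one_step_forecasts(series, alpha=0.3):
    """One-step-ahead forecasts from simple exponential smoothing.

    forecasts[t] is the prediction of series[t] made after seeing series[:t];
    the level is initialized with the first observation.
    """
    series = np.asarray(series, dtype=float)
    forecasts = np.empty_like(series)
    level = series[0]
    for t, value in enumerate(series):
        forecasts[t] = level                          # forecast before observing value
        level = alpha * value + (1 - alpha) * level   # update the smoothed level
    return forecasts

readings = np.array([2.0, 2.2, 2.1, 2.3, 0.0, 2.2, 2.1])
forecasts = ses_one_step_forecasts(readings)
residuals = readings - forecasts
print(np.round(residuals, 3))
```

As with the transformer, large residuals (here the drop to zero at index 4) are the candidates for closer inspection.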

The choice among these alternatives depends on the nature and structure of the data, the specific application, and the problem’s requirements. For instance, if temporal ordering is central to the problem, models like LSTMs or GRUs might be more suitable. On the other hand, if spatial patterns are more critical (as in images), CNNs would be a natural choice.

While the transformer model has shown impressive results across many domains, it’s always essential to consider the specifics of the problem, such as theft detection here, and not assume that the latest or most popular model is the best fit. Sometimes, simpler or older models can outperform newer ones depending on the dataset and problem constraints.

In Closing:

Through the marriage of energy data and advanced AI techniques, we’ve crafted a rudimentary detective capable of spotting unusual energy consumption patterns. While our model is basic, it serves as a foundation upon which more sophisticated theft detection systems can be built.



Zidi, S., Mihoub, A., Qaisar, S. M., Krichen, M., & AbuAl-Haija, Q. (2022). Theft detection dataset for benchmarking and machine learning based classification in a smart grid environment. *Journal of King Saud University – Computer and Information Sciences*. [Accessed 24 Jul 2023].

National Renewable Energy Laboratory. (2015). Smart Grid Data from the American Recovery and Reinvestment Act (ARRA) Projects [data set]. [Accessed 24 Jul 2023].

Academic references:

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762v6. [Accessed 24 Jul 2023].

Liu, H., Liang, J., Liu, Y., & Wu, H. (2023). A Review of Data-Driven Building Energy Prediction. *Buildings*, 13(2), 532. [Accessed 24 Jul 2023].
