Towards reproducible SOTA Energy Disaggregation

Well unlike other notes that have an appendix at the end, I’ll give some context at the very beginning so the entirety of the paper makes sense to you.

Context

Smart meters are digital electricity meters that transmit consumption data to energy providers in near real time, over IoT or a wired connection. This helps providers plan better.
Disaggregation estimates the power consumption of a particular device or appliance within a building or metered unit.
NILM : Non-Intrusive Load Monitoring
Historically, NILM has had fragmented/inconsistent metrics
Newer work mostly compares against basic benchmark models rather than the current state of the art. The authors propose a platform to compare newer work with SOTA by keeping current state-of-the-art work as benchmarks.

DE-aggregation ?

Smart meters provide us with readings from a single connection / meter, which in most cases covers an entire commercial/residential unit.
We aim to estimate the energy drawn by each appliance, $X_i$, from the aggregate $Y$ that our smart meter reports in near real time.
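To make the setup concrete, here is a minimal sketch of how the aggregate relates to the appliance signals. The appliance names and wattages are made up for illustration, and the purely additive model (aggregate = sum of appliance loads, with no noise or unmetered loads) is a simplifying assumption of mine, not something taken from the paper.

```python
# Toy per-appliance power traces in watts (hypothetical values).
appliance_power = {
    "fridge": [100, 100, 0, 100],
    "kettle": [0, 2000, 2000, 0],
    "tv": [50, 50, 50, 0],
}


def aggregate(traces):
    """Sum the appliance traces sample by sample to get the aggregate Y."""
    return [sum(samples) for samples in zip(*traces.values())]


Y = aggregate(appliance_power)
print(Y)  # [150, 2150, 2050, 100]
```

Disaggregation is the reverse direction: given only `Y`, recover each trace in `appliance_power`.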

Choice of train test split

Geography and user behaviour affect power consumption, so device/appliance-level performance is poor when the test set differs widely from the train set.
Take care to use buildings from similar geographies; if available, use data from different datasets.
The paper discusses a case where a mix of European and American buildings causes poor performance, which has nothing to do with the model.

What does this paper talk about ?

New Experiment API

The authors simplify the train/eval process.
The aim is to make experimentation modular, lower the barrier for new researchers, and simplify benchmarking of results against SOTA.
Earlier, users had to iterate over data chunks across different buildings from different datasets, combine the predictions for each of these blocks, and then pass them through an interface to obtain metrics.
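A rough sketch of what a declarative experiment could look like, in contrast to that manual loop. This is not the toolkit's actual API: `run_experiment`, `MeanModel`, and the layout of the `experiment` dictionary are hypothetical names I chose for illustration, and the data is made up.

```python
class MeanModel:
    """Hypothetical plug-in model: predicts the mean appliance power seen in training."""

    def fit(self, mains, appliance):
        self.mean = sum(appliance) / len(appliance)

    def predict(self, mains):
        return [self.mean] * len(mains)


def mae(truth, predicted):
    """Mean absolute error in watts."""
    return sum(abs(t - p) for t, p in zip(truth, predicted)) / len(truth)


def run_experiment(experiment):
    """Fit each method on all train buildings, then average its MAE over test buildings."""
    train_mains, train_appliance = [], []
    for building in experiment["train"]:
        train_mains.extend(building["mains"])
        train_appliance.extend(building["appliance"])

    results = {}
    for name, model in experiment["methods"].items():
        model.fit(train_mains, train_appliance)
        errors = [
            mae(b["appliance"], model.predict(b["mains"]))
            for b in experiment["test"]
        ]
        results[name] = sum(errors) / len(errors)
    return results


# The whole experiment is described as data; swapping in a new model is one line.
experiment = {
    "methods": {"mean": MeanModel()},
    "train": [
        {"mains": [150, 2150, 2050, 100], "appliance": [100, 100, 0, 100]},
        {"mains": [60, 110], "appliance": [50, 100]},
    ],
    "test": [{"mains": [120, 130], "appliance": [100, 50]}],
}
results = run_experiment(experiment)
print(results)  # {'mean': 25.0}
```

The point of the design is that the loop over buildings, the stitching of predictions, and the metric computation live in one place, so a new algorithm only has to provide `fit` and `predict`.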

Unified benchmarking

3 baseline + 9 SOTA NILM algorithms
New ideas can be easily evaluated against SOTA architectures (plug and play) on publicly available datasets, which usually contain aggregated power consumption for buildings.
It was hard to assess the generalizability of NILM work, as most papers were evaluated on a single dataset, or sometimes even on subsets of a dataset, which made comparing different works very difficult.

From the paper

Decouple the data loader and model train function
For algorithms that require pre-processing, a call_preprocessing method was added which allows users to store pre-processed data in HDF5 file format.
This way, instead of holding data in limited memory, we can export it to a file on disk and read it back at train time.

Chunk : Contiguous portion of time series data that fits into memory
Window : Portion of the chunk on which the model trains in one iteration. The window can be rolling (with or without overlap), as shown in the graphic below.

Illustration
A single window (window == chunk) is used for traditional algorithms like FHMM (Factorial Hidden Markov Model), whereas for seq-to-point we would consider multiple overlapping windows.
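The chunk/window distinction can be sketched in a few lines. `rolling_windows` is a hypothetical helper (not a toolkit function) and the sample values are made up; `stride` controls how much consecutive windows overlap.

```python
def rolling_windows(chunk, size, stride):
    """Return windows of length `size`, starting every `stride` samples."""
    return [chunk[i:i + size] for i in range(0, len(chunk) - size + 1, stride)]


chunk = [10, 20, 30, 40, 50, 60]  # one in-memory chunk of readings

# Overlapping windows (stride < size), as used by sliding-window neural models.
overlapping = rolling_windows(chunk, size=4, stride=1)
print(overlapping)  # [[10, 20, 30, 40], [20, 30, 40, 50], [30, 40, 50, 60]]

# Non-overlapping windows (stride == size).
non_overlapping = rolling_windows(chunk, size=3, stride=3)
print(non_overlapping)  # [[10, 20, 30], [40, 50, 60]]

# window == chunk: a single window covering the whole chunk.
whole = rolling_windows(chunk, size=len(chunk), stride=1)
print(whole)  # [[10, 20, 30, 40, 50, 60]]
```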

Algorithms

Mean

Predict a constant mean power value for each appliance.
Useful as a sanity check for always-on appliances, but it does not work well for appliances that have on/off patterns.
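A small sketch of why the mean baseline behaves so differently on the two kinds of appliance. The traces are made-up numbers, and for brevity the baseline is scored on the same data it was fitted on, which is an assumption of this toy example only.

```python
def mean_baseline(train_power, n_samples):
    """Predict the training mean for every sample."""
    mean = sum(train_power) / len(train_power)
    return [mean] * n_samples


def mae(truth, predicted):
    """Mean absolute error in watts."""
    return sum(abs(t - p) for t, p in zip(truth, predicted)) / len(truth)


always_on = [40, 42, 38, 40]  # e.g. a router: nearly flat
on_off = [0, 0, 2000, 0]      # e.g. a kettle: mostly off, briefly very high

mae_always_on = mae(always_on, mean_baseline(always_on, len(always_on)))
mae_on_off = mae(on_off, mean_baseline(on_off, len(on_off)))
print(mae_always_on)  # 1.0
print(mae_on_off)     # 750.0
```

For the on/off appliance the constant prediction (500 W) is wrong at every single sample: too high when the appliance is off and far too low when it is on.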

Edge Detection

Find significant changes in the power signal to identify an appliance switching between on/off states.
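A minimal sketch of the idea, assuming an edge is simply a jump between consecutive samples that exceeds a fixed threshold. `detect_edges`, the threshold, and the readings are all hypothetical; real implementations typically also smooth the signal and pair up on/off edges.

```python
def detect_edges(power, threshold):
    """Return (index, delta) for every step whose absolute change is >= threshold watts."""
    events = []
    for t in range(1, len(power)):
        delta = power[t] - power[t - 1]
        if abs(delta) >= threshold:
            events.append((t, delta))
    return events


mains = [100, 105, 2100, 2095, 2100, 110, 100]
events = detect_edges(mains, threshold=500)
print(events)  # [(2, 1995), (5, -1990)]
```

A positive delta is read as something switching on and a negative delta as something switching off; the similar magnitudes (about 2 kW) suggest both edges belong to the same appliance.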

misses out on :

Combinatorial Optimization

Assign a state to each appliance so that the summed power of the chosen states, $\sum_i \hat{X}_i$, is close to the aggregate power consumed, $Y$.
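This can be sketched as a brute-force search over every combination of appliance states. The function name and the per-state wattages below are hypothetical, and the exhaustive search is just the simplest way to show the idea.

```python
from itertools import product


def combinatorial_optimisation(aggregate_watts, appliance_states):
    """Pick one state per appliance so the summed power is closest to the aggregate."""
    names = list(appliance_states)
    best_combo, best_error = None, float("inf")
    # Enumerate every combination of states: one choice per appliance.
    for combo in product(*(appliance_states[name] for name in names)):
        error = abs(aggregate_watts - sum(combo))
        if error < best_error:
            best_combo, best_error = combo, error
    return dict(zip(names, best_combo))


# Hypothetical power level (watts) of each state an appliance can be in.
states = {"fridge": [0, 100], "kettle": [0, 2000], "tv": [0, 50]}

print(combinatorial_optimisation(2140, states))
# {'fridge': 100, 'kettle': 2000, 'tv': 50}
print(combinatorial_optimisation(95, states))
# {'fridge': 100, 'kettle': 0, 'tv': 0}
```

The number of combinations is the product of the per-appliance state counts, so this search grows exponentially with the number of appliances.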

Denoising Autoencoder

Denoise on a per-appliance basis: treat the aggregate power usage as the target appliance's usage plus noise.

Recurrent Neural Network

From a sequence of power readings from our smart meter, output a single power consumption value for a particular appliance.

This is not a many-to-many RNN architecture, as the outputs are not linked to each other, fed back into the model, or determined by intermediate signals.


Seq-2-seq

Predict a window of target appliance power from a window of aggregate power consumed.

Seq-2-point

Similar to Seq-2-seq explained above.
Instead of predicting an entire window for a particular appliance, we only predict a single point: the midpoint of the window.
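The difference between the two is easiest to see in how the training pairs are built. In this sketch the single-point target is taken to be the midpoint of the appliance window, which is my understanding of the sequence-to-point formulation; `make_pairs` and the readings are hypothetical.

```python
def make_pairs(mains, appliance, size):
    """Build (input, target) pairs for seq-2-seq and seq-2-point from sliding windows."""
    seq2seq, seq2point = [], []
    mid = size // 2  # index of the window midpoint
    for i in range(len(mains) - size + 1):
        x = mains[i:i + size]
        y = appliance[i:i + size]
        seq2seq.append((x, y))         # target: the whole appliance window
        seq2point.append((x, y[mid]))  # target: one value, the midpoint
    return seq2seq, seq2point


mains = [150, 2150, 2050, 100, 150]
kettle = [0, 2000, 2000, 0, 0]
s2s, s2p = make_pairs(mains, kettle, size=3)

print(s2s[0])  # ([150, 2150, 2050], [0, 2000, 2000])
print(s2p[0])  # ([150, 2150, 2050], 2000)
print([target for _, target in s2p])  # [2000, 2000, 0]
```

Both variants see the same inputs; only the shape of the target changes.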

Online GRU

Lightweight RNN for quicker inference, as this method is used to get appliance usage on the fly.
Gated Recurrent Units are RNNs with gates that control how much of the previous state is kept and how much new information is let in, which improves the model's ability to learn context.
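To show what the gates do, here is a scalar GRU step written out by hand. It is a toy with made-up weights, not the paper's network, and it uses the convention where the update gate `z` weights the new candidate state (some references swap `z` and `1 - z`). Inputs are assumed to be scaled to roughly [0, 1].

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h, p):
    """One GRU update for scalar input x and scalar hidden state h."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])  # reset gate
    candidate = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    # Blend old state and candidate: z near 0 keeps the old state.
    return (1.0 - z) * h + z * candidate


# Hypothetical, hand-picked weights (a real model learns these).
params = dict(wz=0.5, uz=0.5, bz=0.0,
              wr=0.5, ur=0.5, br=0.0,
              wh=1.0, uh=1.0, bh=0.0)

h = 0.0
for x in [0.1, 0.9, 0.8, 0.05]:  # normalised aggregate readings
    h = gru_step(x, h, params)

# With the update gate forced shut, the hidden state is carried over unchanged.
closed = gru_step(1.0, 0.3, {**params, "bz": -50.0})
print(round(closed, 6))  # 0.3
```

Because each reading updates `h` in constant time, the model can run sample by sample as data arrives, which is what makes it a fit for online inference.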

What new could be done ?

newer SOTA ?

More focus on Online Inference

Fix Typo

Things I didn’t understand

(I plan to revisit foundational concepts like Hidden Markov Models and sparse coding)

Factorial HMM Metrics

I read about Hidden Markov Models in the context of predicting the next speech token as part of an undergrad course on Speech Processing
But honestly it’s been a long long time, and I will have to revisit it in its entirety to understand the underlying workings.
From a very naive understanding, these methods might work to predict the appliance energy consumption, or the next state (on/off), based on a prior sequence of energy usage.
We have one HMM per appliance, and the observed aggregate depends on all of their hidden states together. The joint state space is the product of the per-appliance state spaces, hence the term factorial in its name, and this is what makes scaling to many appliances difficult.

Discriminative Sparse Coding

Haven’t heard of this, will need to dive deep
Left this for now, to prioritise on other tasks at hand, apologies.

What is the artificial aggregate ?

Do we have the truth values for energy consumed by each appliance and sum it all up for a single commercial/residential unit to get the aggregate ?


Personal notes

Well I kinda was dumbfounded for most of the paper, until it talked about getting the energy consumption for each device/appliance, after which I could piece it all together.
Electricity providers in India are pushing for wider adoption and a faster rollout of ‘smart meters’, as this would reduce theft and give energy generators door-to-door consumption metrics in near real time. That would help them better predict base load demand and the fluctuations we see within a day, and understand seasonality in demand.

Last updated: July 10, 2025