Hestia Task 3.3: Consumer behaviour profiling service
Hestia Task 3.3 aims to deploy the infrastructure for profiling user consumption
behaviour. This profiling will:
- enable devising a suitable consumer engagement strategy, and
- serve as the basis for predicting household consumption behaviour for the next day.
This webpage serves two purposes:
- It dives into the pilot data (grouped at the apartment level) to assess the quality and
readiness of the data for machine learning tasks.
- It demonstrates the kinds of patterns that can be extracted from raw household electricity
consumption time series.
The menu bar at the top has two main groups:
- The Data menu links to interactive dashboards for exploring apartment data from the Dutch and
Italian pilots, as well as weather data from all the pilots. These give an indication of the
quality and completeness of the collected data, both at a glance and per household.
- The Clustering menu links to four cluster analysis variants, which differ as follows
(explained in detail below):
  - Variant 1: household-days are normalized and there are 6 clusters.
  - Variant 2: household-days are normalized and there are 12 clusters.
  - Variant 3: households are normalized and there are 6 clusters.
  - Variant 4: households are normalized and there are 12 clusters.
The clustering is carried out on a collection of items that we call household-days. Household-days
arise from chopping the single time series of each household into individual days, so that each
24-hour piece belongs to exactly one household on one day.
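A minimal sketch of how one household's time series could be chopped into household-days. It assumes hourly readings supplied as hypothetical `(timestamp, value)` pairs in local time, and it drops any day that does not have exactly 24 readings. The source states neither the data resolution nor how gaps are handled, so both choices are illustrative, and the name `to_household_days` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: hourly resolution, so a complete household-day has 24 readings.
READINGS_PER_DAY = 24


def to_household_days(household_id, readings):
    """Chop one household's time series into household-days.

    `readings` is an iterable of (timestamp, value) pairs (hypothetical format).
    Returns a dict mapping (household_id, date) -> list of values ordered by time.
    Days without exactly READINGS_PER_DAY readings (gaps, DST switches) are dropped.
    """
    by_day = defaultdict(list)
    for timestamp, value in sorted(readings):
        by_day[timestamp.date()].append(value)
    return {
        (household_id, day): values
        for day, values in by_day.items()
        if len(values) == READINGS_PER_DAY
    }


# Usage: 50 hourly readings starting at midnight on 1 January 2021.
start = datetime(2021, 1, 1)
readings = [(start + timedelta(hours=h), float(h)) for h in range(50)]
days = to_household_days("apt-01", readings)
print(len(days))  # 2 (the two readings on 3 January do not form a complete day)
```

Keying each piece by `(household_id, date)` mirrors the definition above: every household-day belongs to exactly one household on exactly one day. Note that with local timestamps, days on which daylight saving time starts or ends have 23 or 25 hourly readings and would be dropped by this sketch.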
The number of clusters corresponds to the number of different behaviour groups to which the
algorithm will try to assign daily consumption profiles. There are automatic methods for
determining an optimal number of clusters for a particular dataset (for example, the elbow method
or silhouette analysis), but depending on our needs, we may want more or fewer clusters than the
“optimal” number. We opted to present results for two fixed numbers of clusters to illustrate the
differences that may arise.
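The source does not name its clustering algorithm. Purely to illustrate what a fixed number of clusters means, the sketch below assumes k-means (Lloyd's algorithm) with squared Euclidean distance and a deterministic farthest-point initialization. All function names and the toy 4-value "profiles" are hypothetical. The cluster count `k` is the only knob: every household-day ends up in exactly one of `k` groups. The returned inertia (sum of squared distances to the assigned centroids) is the quantity inspected by the elbow method when choosing `k`.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))


def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means. Returns (labels, centroids, inertia)."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iterations):
        # Assignment step: every household-day joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: every centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    inertia = sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))
    return labels, centroids, inertia


# Toy profiles: three morning-peak days and three evening-peak days.
morning = [[3.0, 1.0, 0.5, 0.5], [2.8, 1.1, 0.6, 0.4], [3.1, 0.9, 0.5, 0.6]]
evening = [[0.5, 0.5, 1.0, 3.0], [0.4, 0.6, 1.2, 2.9], [0.6, 0.5, 0.9, 3.1]]
profiles = morning + evening

labels, _, _ = kmeans(profiles, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]

# Elbow-style inspection: inertia for increasing k; look for where the drop flattens.
for k in range(1, 5):
    print(k, round(kmeans(profiles, k)[2], 3))
```

Farthest-point seeding keeps the example deterministic but is sensitive to outliers; on real data, k-means++ seeding with several restarts is a common alternative.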
Normalization is a common pre-processing step carried out before clustering. It rescales the data
in a controlled way so that the items being clustered become more comparable. The two
normalization types mentioned above differ as follows:
- In the first type, we treat every household-day separately: we first calculate the mean of
all of its values, and then divide the values by that mean. This way, every household-day
has the same mean (namely 1), so the clustering focuses solely on differences in the shape of
the daily pattern.
- The second type is simpler: we calculate the mean of all values belonging to a household,
divide the values by that mean, and only then chop the time series into household-days. This
way, every household has the same mean, but household-days generally have different means, so
the clustering can separate household-days by their overall level of consumption as well as by
their pattern.
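The two normalization types can be sketched in a few lines. The sketch assumes hourly readings that start at midnight, so a household-day is 24 consecutive values, and uses hypothetical function names. It does not handle missing values or all-zero days, which the source does not discuss. The usage example builds two household-days with the same shape but different levels: the first type makes them identical, while the second preserves their factor-of-two level difference.

```python
# Assumption: hourly readings starting at midnight, so each household-day is 24 values.
HOURS_PER_DAY = 24


def mean(values):
    return sum(values) / len(values)


def chop(series):
    """Split a household's series into consecutive household-days,
    dropping a trailing partial day."""
    n_days = len(series) // HOURS_PER_DAY
    return [series[d * HOURS_PER_DAY:(d + 1) * HOURS_PER_DAY] for d in range(n_days)]


def normalize_per_household_day(series):
    """Type 1: chop first, then divide each household-day by its own mean.
    Every resulting household-day has mean 1, so only the shape remains."""
    result = []
    for day in chop(series):
        m = mean(day)  # an all-zero day would raise ZeroDivisionError; not handled here
        result.append([v / m for v in day])
    return result


def normalize_per_household(series):
    """Type 2: divide the whole series by the household mean, then chop.
    Household-days keep their relative consumption levels."""
    m = mean(series)
    return chop([v / m for v in series])


# Usage: two days with the same shape, the second at twice the level.
day_low = [1.0] * 12 + [3.0] * 12
day_high = [2.0] * 12 + [6.0] * 12
series = day_low + day_high

type1 = normalize_per_household_day(series)
type2 = normalize_per_household(series)
print(type1[0] == type1[1])                # True: identical after type-1 normalization
print([round(mean(d), 3) for d in type2])  # [0.667, 1.333]: levels still differ
```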
Our Team
Milos Sipetic
Jan Kurzidim
Adam Buruzs
AIT Austrian Institute of Technology GmbH
Sustainable Thermal Energy Systems
Center for Energy
Giefinggasse 2 | 1210 Vienna | Austria