Scripts for processing and analyzing data from the Swiss Household Energy Demand Survey (SHEDS) in R and Python. SHEDS data are distributed in SPSS format (`.sav`), which stores value labels and variable descriptions alongside the data. Descriptions and more information can be found at SHEDS - Sweet Cross.
The project provides example scripts showing how to work with SHEDS data in R and Python. It also includes a CSV file (`sheds_questions_up2025.csv`) that lists every question identifier together with the survey years in which it was used. This makes changes in the questionnaire transparent and simplifies longitudinal analysis.
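The column layout of `sheds_questions_up2025.csv` is not described in this README, so the sketch below assumes a hypothetical layout: a `question_id` column plus one column per survey year that holds `1` when the question was asked. Under that assumption, pandas can answer the two typical longitudinal questions: in which years was a question asked, and which questions are available in every wave of interest. The function names and question identifiers are illustrative only.

```python
import io

import pandas as pd


def years_asked(table: pd.DataFrame, question_id: str) -> list[int]:
    """Survey years in which `question_id` was used (assumes 1 = asked)."""
    row = table.set_index("question_id").loc[question_id]
    return [int(year) for year, used in row.items() if used == 1]


def asked_in_all(table: pd.DataFrame, years: list[int]) -> list[str]:
    """Question identifiers that were used in every one of `years`."""
    cols = [str(y) for y in years]
    mask = (table[cols] == 1).all(axis=1)
    return table.loc[mask, "question_id"].tolist()


# Toy stand-in for pd.read_csv("sheds_questions_up2025.csv"),
# using hypothetical identifiers and the ASSUMED layout described above.
csv_text = """question_id,2016,2017,2018
q_example_a,1,1,0
q_example_b,1,1,1
"""
table = pd.read_csv(io.StringIO(csv_text))

print(years_asked(table, "q_example_a"))
print(asked_in_all(table, [2016, 2017, 2018]))
```

If the real file marks usage differently (for example with `x` or the year itself), only the `== 1` comparisons need to change.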
```
sheds_data_scripts/
├── sheds_questions_up2025.csv   # Table of question identifiers over the years
├── README.md
├── .gitignore
└── src/
    ├── python/
    │   ├── utils.py             # Contains useful functions
    │   ├── sheds_explore.ipynb
    │   ├── longitudinal_exploration.ipynb
    │   └── read_sav_example.ipynb
    └── R/
        ├── utils.R              # Contains useful functions
        ├── sheds_explore.Rmd
        └── longitudinal_exploration.rmd
```
```bash
pip install pandas numpy pyreadstat matplotlib seaborn
```

```r
install.packages(c("haven", "tidyverse", "zoo", "scales"))
```

```r
source("utils.R")

# Load a single wave (drops respondents with screen == 3)
sheds <- read_clean_sheds("/path/to/SHEDS2025.sav")

# Get summary statistics
sheds_summary <- get_data_summary(sheds)
```

```python
from utils import read_clean_sheds, get_data_summary

# Load a single wave (drops respondents with screen == 3)
sheds = read_clean_sheds("/path/to/SHEDS2025.sav")

# Get summary statistics
sheds_summary = get_data_summary(sheds)
```

```python
import pyreadstat

# Load the data together with its metadata
df, meta = pyreadstat.read_sav("/path/to/SHEDS2025.sav", encoding="UTF-8")

# Variable label (question text)
meta.column_names_to_labels["accom11_1"]
# -> "How satisfied are you with your current heating system?"

# Value labels (response options)
meta.variable_value_labels["accom11_1"]
# -> {1: 'Very dissatisfied', 2: 'Dissatisfied', ..., 5: 'Very satisfied'}

# Map the numeric codes to their labels in a new, readable column
df["accom11_1_label"] = df["accom11_1"].map(meta.variable_value_labels["accom11_1"])
```

```r
library(haven)
library(dplyr)

sheds <- read_sav("/path/to/SHEDS2025.sav")

# Variable label (question text)
attr(sheds$accom11_1, "label")

# Value labels (response options)
attr(sheds$accom11_1, "labels")

# Convert the labelled codes into a factor column and keep the result
sheds <- sheds %>%
  mutate(accom11_1_label = as_factor(accom11_1))
```

| Function | Description |
|---|---|
| `read_clean_sheds(filepath)` | Read an SPSS file and drop screened-out respondents (keeps rows with `screen != 3`) |
| `get_data_summary(data)` | Return `n_respondents`, `n_variables`, `completion_rate`, and `avg_duration` |
| `build_car_history(all_waves_dict)` | Combine waves and carry car data forward for longitudinal analysis |
| `analyze_ev_ownership_data(data_history, year)` | Analyze EV/hybrid ownership for a specific year |
| `save_plot(plot, path, filename)` | Save a figure in PDF and EPS formats |
| `check_finished(data, year)` | Report completion statistics for a wave |
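The table only states what `read_clean_sheds()` and `get_data_summary()` do, so the following is a minimal pandas sketch of equivalent logic, not the code in `utils.py`. It assumes `completion_rate` is the share of rows with `finished == 1` and `avg_duration` is the plain mean of `q_totalduration` (in minutes); the function names `drop_screened`, `read_clean_sheds_sketch`, and `data_summary_sketch` are hypothetical.

```python
import pandas as pd


def drop_screened(df: pd.DataFrame) -> pd.DataFrame:
    """Keep respondents whose screening status is not 3 (3 = screened out)."""
    return df[df["screen"] != 3].copy()


def read_clean_sheds_sketch(path: str) -> pd.DataFrame:
    """Read a SHEDS .sav file with pyreadstat and drop screened-out respondents."""
    import pyreadstat  # imported lazily so the other helpers work without it

    df, _meta = pyreadstat.read_sav(path)
    return drop_screened(df)


def data_summary_sketch(df: pd.DataFrame) -> dict:
    """The four figures get_data_summary() is documented to return (ASSUMED definitions)."""
    return {
        "n_respondents": int(df["id"].nunique()),
        "n_variables": df.shape[1],
        # share of respondents with finished == 1
        "completion_rate": float((df["finished"] == 1).mean()),
        # q_totalduration is recorded in minutes; a simple mean is assumed
        "avg_duration": float(df["q_totalduration"].mean()),
    }


toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "screen": [1, 3, 1, 1],
    "finished": [1, 1, 0, 1],
    "q_totalduration": [20.0, 5.0, 30.0, 40.0],
})
clean = drop_screened(toy)  # respondent 2 (screen == 3) is removed
print(data_summary_sketch(clean))
```

Keeping the filter separate from the file reading makes it easy to test on small in-memory frames like `toy`.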

Respondents are not asked about their car type in every wave, only when they report a change. Each wave asks whether they have changed their car since the previous survey: if they have, we record the type of the new car; if not, the car-type question is skipped. Reconstructing the full car-ownership history therefore requires carrying forward ("rolling forward") the car type from the most recent wave in which it was reported whenever no change is indicated, and updating it only in waves where a change is reported.
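One way to implement this roll-forward in pandas is a per-respondent forward fill. The sketch below illustrates the idea and is not the `build_car_history()` implementation: it assumes that a skipped car question shows up as a missing `mob3_3` value (or a missing column) in that wave, and it ignores cases such as a household giving up its car. `stack_waves` and `carry_forward_car_type` are hypothetical helper names.

```python
import pandas as pd


def stack_waves(waves: dict[str, pd.DataFrame], col: str = "mob3_3") -> pd.DataFrame:
    """Stack {"2016": df, ...} into one long table with id, year and `col`."""
    frames = [
        # reindex adds `col` as all-NaN if a wave does not contain it
        df.reindex(columns=["id", col]).assign(year=int(year))
        for year, df in waves.items()
    ]
    return pd.concat(frames, ignore_index=True)


def carry_forward_car_type(long_df: pd.DataFrame, col: str = "mob3_3") -> pd.DataFrame:
    """Add `<col>_filled`, carrying each respondent's last reported value forward."""
    out = long_df.sort_values(["id", "year"]).reset_index(drop=True)
    # groupby(...).ffill() only fills within a respondent, never across people
    out[f"{col}_filled"] = out.groupby("id")[col].ffill()
    return out


# Toy waves: respondent 1 drives gasoline (1), then switches to electric (8) in 2018;
# in 2017 nobody reported a change, so the car question is absent.
waves = {
    "2016": pd.DataFrame({"id": [1, 2], "mob3_3": [1, 2]}),
    "2017": pd.DataFrame({"id": [1, 2]}),
    "2018": pd.DataFrame({"id": [1], "mob3_3": [8]}),
}
history = carry_forward_car_type(stack_waves(waves))
print(history[["id", "year", "mob3_3_filled"]])
```

Because the fill happens within each `id`, a respondent who misses an entire wave simply keeps their last reported car type when they return.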
```python
import pandas as pd
from utils import read_clean_sheds, build_car_history, analyze_ev_ownership_data

# Load all waves (keys are the survey years as strings)
years = [2016, 2017, 2018, 2019, 2020, 2021, 2023, 2025]
waves = {}
for year in years:
    waves[str(year)] = read_clean_sheds(f"/path/to/SHEDS{year}.sav")

# Build the car history, carrying car data forward between waves
car_history = build_car_history(waves)

# Analyze EV/hybrid ownership year by year and stack the results
results = pd.concat([
    analyze_ev_ownership_data(car_history, year)
    for year in [2019, 2020, 2021, 2023, 2025]
])
```

```r
source("utils.R")
library(dplyr)

# Load all waves (list names are the survey years)
years <- c(2016, 2017, 2018, 2019, 2020, 2021, 2023, 2025)
waves <- list()
for (year in years) {
  waves[[as.character(year)]] <- read_clean_sheds(paste0("/path/to/SHEDS", year, ".sav"))
}

# Build the car history, carrying car data forward between waves
car_history <- build_car_history(waves)

# Analyze EV/hybrid ownership year by year and stack the results
results <- bind_rows(
  lapply(c(2019, 2020, 2021, 2023, 2025),
         function(y) analyze_ev_ownership_data(car_history, y))
)
```

| Variable | Description |
|---|---|
| `id` | Respondent ID (consistent across waves) |
| `finished` | Survey completion (1 = finished) |
| `screen` | Screening status (3 = screened out) |
| `mob2_1` | Number of cars in household |
| `mob3_3` | Fuel type of main car (8 = Electric) |
| `mob2_e` | Has an electric vehicle as secondary car (1 = yes) |
| `q_totalduration` | Survey duration (minutes) |
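With these variables, EV and hybrid shares for a single year can be computed as below. This is a hedged sketch, not `analyze_ev_ownership_data()` itself. It assumes a long table with one row per respondent and `year`, counts only households with at least one car (`mob2_1 > 0`), treats a household as EV-owning if its main car is electric (`mob3_3 == 8`) or it reports an electric secondary car (`mob2_e == 1`), and uses the hybrid codes 5 to 7 for the main car. `ev_share_sketch` is a hypothetical name; it returns a one-row DataFrame so that several years can be stacked with `pd.concat`.

```python
import numpy as np
import pandas as pd

ELECTRIC = 8          # mob3_3 code for an electric main car
HYBRIDS = [5, 6, 7]   # hybrid gasoline, plug-in hybrid, hybrid diesel


def ev_share_sketch(history: pd.DataFrame, year: int, fuel_col: str = "mob3_3") -> pd.DataFrame:
    """One-row summary of EV/hybrid ownership among car-owning households in `year`.

    Pass a carried-forward fuel-type column as `fuel_col` if one is available.
    """
    wave = history[(history["year"] == year) & (history["mob2_1"] > 0)]
    main_ev = wave[fuel_col] == ELECTRIC
    any_ev = main_ev | (wave["mob2_e"] == 1)  # electric main OR secondary car
    hybrid = wave[fuel_col].isin(HYBRIDS)
    return pd.DataFrame({
        "year": [year],
        "n_car_households": [len(wave)],
        "share_ev_main": [main_ev.mean()],
        "share_ev_any": [any_ev.mean()],
        "share_hybrid_main": [hybrid.mean()],
    })


# Toy data: three car-owning households in 2025, one without a car, plus a 2023 row.
history = pd.DataFrame({
    "id":     [1, 2, 3, 4, 1],
    "year":   [2025, 2025, 2025, 2025, 2023],
    "mob2_1": [1, 2, 1, 0, 1],
    "mob3_3": [8, 1, 6, np.nan, 1],
    "mob2_e": [0, 1, 0, np.nan, 0],
})
print(ev_share_sketch(history, 2025))
```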

Fuel type codes used in `mob3_3`:

| Code | Fuel Type |
|---|---|
| 1 | Gasoline |
| 2 | Diesel |
| 3 | Natural Gas |
| 4 | LPG |
| 5 | Hybrid gasoline |
| 6 | Plug-in Hybrid |
| 7 | Hybrid diesel |
| 8 | Electric |
| 9 | Other |
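These codes can be turned into readable labels with a fixed category order, which keeps tables and plots consistent across waves. The sketch below hard-codes the table as a dictionary; the same mapping may also be available from the file's value labels (for example `meta.variable_value_labels["mob3_3"]` in pyreadstat), but that is an assumption about the SHEDS files. `FUEL_TYPES` and `label_fuel_type` are hypothetical names.

```python
import numpy as np
import pandas as pd

# mob3_3 codes, copied from the fuel type table
FUEL_TYPES = {
    1: "Gasoline", 2: "Diesel", 3: "Natural Gas", 4: "LPG",
    5: "Hybrid gasoline", 6: "Plug-in Hybrid", 7: "Hybrid diesel",
    8: "Electric", 9: "Other",
}


def label_fuel_type(codes: pd.Series) -> pd.Series:
    """Map numeric mob3_3 codes to a categorical of fuel-type labels.

    Codes read from .sav files are often floats (1.0, 8.0, ...); dict.get still
    finds the integer keys because 1.0 == 1. Missing or unknown codes become NaN.
    Category order follows the codes, so plots list fuel types consistently.
    """
    labels = codes.map(FUEL_TYPES.get)
    return pd.Series(
        pd.Categorical(labels, categories=list(FUEL_TYPES.values())),
        index=codes.index,
    )


fuel = label_fuel_type(pd.Series([1.0, 8.0, np.nan, 6.0]))
print(fuel)
```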