netCBS

netCBS efficiently creates network-based measures using CBS POPNET network tables (e.g. family, colleagues, neighbors, schoolmates, housemates). For example: compute the average income of a person’s parents, or the average income of the parents of their classmates, using CBS network links.

Installation

pip install netcbs

Quick start

See the tutorial notebook (tutorial_netCBS.ipynb) for an accessible introduction and worked examples.

The core function is transform(query, df_sample, df_agg, ...).

Inputs

df_sample: your “ego” sample. Must contain:
- RINPERSOON (unique person identifier). Note: the corresponding RINPERSOONS value must be R.
df_agg: the table containing the variables you want to aggregate for the alters reached by the network traversal. Must contain:
- RINPERSOON. Note: the corresponding RINPERSOONS value must be R.
- all variables referenced in the query’s aggregation-variable list (e.g. Income, Age)
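
A quick pre-flight check of both tables can catch missing columns before calling transform. The sketch below is illustrative only: missing_columns is a hypothetical helper, not part of netCBS, and it works on plain lists of column names (with polars, these could be taken from df_sample.columns and df_agg.columns).

```python
def missing_columns(sample_columns, agg_columns, variables):
    """Return the required columns that are absent from each table.

    Hypothetical helper (not part of netCBS); it only mirrors the
    requirements listed above.
    """
    required_sample = ["RINPERSOON"]
    required_agg = ["RINPERSOON", *variables]
    return {
        "df_sample": [c for c in required_sample if c not in sample_columns],
        "df_agg": [c for c in required_agg if c not in agg_columns],
    }


# Toy column lists: df_agg lacks the Age column requested by the query.
report = missing_columns(
    sample_columns=["RINPERSOON"],
    agg_columns=["RINPERSOON", "Income"],
    variables=["Income", "Age"],
)
print(report)  # {'df_sample': [], 'df_agg': ['Age']}
```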

Query format

A query describes:

- which variables to aggregate (first segment), and
- which network hops to traverse (one or more context segments), ending in sample.

Format:

"[Var1, Var2, ...] -> ContextA[types] -> ContextB[types] -> ... -> sample"

The first segment must be in square brackets: "[Income]" or "[Income, Age]".
Each context is one of: Family, Colleagues, Neighbors, Schoolmates, Housemates.
Context type selector is either:
- [all] (use all relationship codes valid for that context), or
- [101,102,...] (explicit relationship codes)
The final segment should be sample (writing it in lowercase, exactly as shown, is recommended).

Example:

query = "[Income, Age] -> Family[301] -> Schoolmates[all] -> sample"

This means: find the aggregated Income and Age of parents (301) of the schoolmates of the people in the sample (df_sample).
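
To make the format concrete, here is a minimal sketch of how such a query string could be split into its aggregation variables and network hops. parse_query and VALID_CONTEXTS are hypothetical names used for illustration; this is not netCBS's own parser, and it assumes the final segment is spelled exactly as lowercase sample.

```python
import re

# Context names as listed in the format description above.
VALID_CONTEXTS = {"Family", "Colleagues", "Neighbors", "Schoolmates", "Housemates"}


def parse_query(query):
    """Split a query string into (variables, hops).

    Hypothetical illustration of the documented format, not netCBS code.
    """
    segments = [s.strip() for s in query.split("->")]
    # Need at least: variables, one context, and the final "sample".
    if len(segments) < 3 or segments[-1] != "sample":
        raise ValueError("expected '[vars] -> Context[types] -> ... -> sample'")

    first = segments[0]
    if not (first.startswith("[") and first.endswith("]")):
        raise ValueError("first segment must be in square brackets")
    variables = [v.strip() for v in first[1:-1].split(",") if v.strip()]

    hops = []
    for segment in segments[1:-1]:
        match = re.fullmatch(r"(\w+)\[(.*)\]", segment)
        if match is None or match.group(1) not in VALID_CONTEXTS:
            raise ValueError(f"invalid context segment: {segment!r}")
        selector = match.group(2).strip()
        # "all" keeps every relationship code; otherwise expect integer codes.
        types = "all" if selector == "all" else [int(t) for t in selector.split(",")]
        hops.append((match.group(1), types))
    return variables, hops


variables, hops = parse_query("[Income, Age] -> Family[301] -> Schoolmates[all] -> sample")
print(variables)  # ['Income', 'Age']
print(hops)       # [('Family', [301]), ('Schoolmates', 'all')]
```

The hops are listed in the order they appear in the string, so the traversal itself starts from the last hop (the one next to sample) and works back towards the first.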

Usage

import polars as pl
import netcbs

query = "[Income, Age] -> Family[301] -> Schoolmates[all] -> sample"

df_out = netcbs.transform(
    query=query,
    df_sample=df_sample,     # must contain: RINPERSOON
    df_agg=df_agg,           # must contain: RINPERSOON, Income, Age
    year=2021,
    format_file="parquet",   # "parquet" (recommended) or "csv"
    agg_funcs=("avg", "sum", "count"),  # DuckDB aggregate function names (strings)
    return_pandas=False,
)

About `agg_funcs` (important)

agg_funcs must be a sequence of DuckDB aggregate function names as strings, e.g.:

"avg", "sum", "count", "min", "max" (and other DuckDB aggregates)

The output columns are named:

{agg_func}_{variable}

So with agg_funcs=("avg","sum") and "[Income, Age]", you get:

avg_Income, sum_Income, avg_Age, sum_Age
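
If you want to know the expected column names in advance (for example, to select or rename them afterwards), the naming in the example above can be reproduced in a few lines. output_columns is a hypothetical helper, not a netCBS function, and the column order simply follows the example; it is an assumption rather than a guarantee about the actual output.

```python
def output_columns(variables, agg_funcs):
    """List expected aggregate column names, one per (variable, function) pair.

    Hypothetical helper; the ordering (variable first, then function)
    follows the README example and is an assumption.
    """
    return [f"{func}_{var}" for var in variables for func in agg_funcs]


columns = output_columns(["Income", "Age"], ("avg", "sum"))
print(columns)  # ['avg_Income', 'sum_Income', 'avg_Age', 'sum_Age']
```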

How it works

1. Validate query
   validate_query() checks:
   - query structure
   - df_sample has RINPERSOON
   - df_agg has RINPERSOON and all requested aggregation variables
   - each context and relationship-type selector is valid
   - (optionally) referenced CBS files exist for the requested year
2. Resolve network files
   For each hop, format_path() selects the latest available version of the CBS network file for the requested year.
   - For format_file="parquet", files are expected under a geconverteerde data subfolder.
   - For format_file="csv", files are read with read_csv_auto(..., delim=';').
3. Traverse the network
   DuckDB reads each network file, filters by the requested relationship codes, and joins hop by hop from egos to alters.
4. Aggregate
   DuckDB joins the persons reached in the final hop to df_agg and computes the requested aggregates, grouped by the original sample person.
5. Join back to sample
   Results are left-joined back onto the sample, so every sample person remains in the output (persons with no network produce null aggregates).
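
The traverse, aggregate, and join-back steps can be sketched on toy data. The example below is an illustration only, not netCBS's implementation: it uses sqlite3 from the Python standard library as a stand-in for DuckDB, and the table names, the column names (src, dst, code), the relationship codes 501 and 302, and all values are made up. It mimics the query "[Income] -> Family[301] -> Schoolmates[all] -> sample" and makes no claim about how netCBS handles details such as duplicate paths.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE sample (person INTEGER);
    CREATE TABLE schoolmates (src INTEGER, dst INTEGER, code INTEGER);
    CREATE TABLE family (src INTEGER, dst INTEGER, code INTEGER);
    CREATE TABLE agg (person INTEGER, Income REAL);
    """
)
con.executemany("INSERT INTO sample VALUES (?)", [(1,), (2,), (3,)])
# Person 3 has no schoolmates, so it should end up with null aggregates.
con.executemany(
    "INSERT INTO schoolmates VALUES (?, ?, ?)",
    [(1, 10, 501), (1, 11, 501), (2, 10, 501)],
)
# Code 301 marks parents (as in the README); 302 is a made-up non-parent code.
con.executemany(
    "INSERT INTO family VALUES (?, ?, ?)",
    [(10, 100, 301), (10, 101, 301), (11, 102, 301), (11, 103, 302)],
)
con.executemany(
    "INSERT INTO agg VALUES (?, ?)",
    [(100, 30000), (101, 50000), (102, 40000), (103, 99999)],
)

rows = con.execute(
    """
    WITH hop1 AS (   -- sample -> schoolmates (all relationship codes)
        SELECT s.person AS ego, n.dst AS reached
        FROM sample AS s
        JOIN schoolmates AS n ON n.src = s.person
    ),
    hop2 AS (        -- schoolmates -> their parents (code 301 only)
        SELECT h.ego, n.dst AS reached
        FROM hop1 AS h
        JOIN family AS n ON n.src = h.reached
        WHERE n.code IN (301)
    ),
    stats AS (       -- aggregate over reached persons, per sample person
        SELECT h.ego,
               AVG(a.Income) AS avg_Income,
               COUNT(a.Income) AS count_Income
        FROM hop2 AS h
        JOIN agg AS a ON a.person = h.reached
        GROUP BY h.ego
    )
    SELECT s.person, st.avg_Income, st.count_Income
    FROM sample AS s
    LEFT JOIN stats AS st ON st.ego = s.person   -- keep every sample person
    ORDER BY s.person
    """
).fetchall()

for row in rows:
    print(row)
# (1, 40000.0, 3)
# (2, 40000.0, 2)
# (3, None, None)
```

Person 1 reaches three parents through two schoolmates, person 2 reaches two parents through one schoolmate, and person 3 stays in the output with null aggregates because of the final left join.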

Contributing

Please refer to the repository’s CONTRIBUTING guide for issues and pull requests.

License and citation

netCBS is published under the MIT license.
For academic citation: Garcia-Bernardo, J. (2025). netCBS: Package to efficiently create network measures using CBS networks in the RA. Zenodo. https://doi.org/10.5281/zenodo.17992329

Contact

Developed and maintained by the ODISSEI Social Data Science (SoDa) team.
Questions or suggestions: please open an issue or contact via the ODISSEI SoDa website.
