MATSUlab Issue Exchange Analysis

Metrics and evidence bundles from an analysis of the Akamatsu Lab (MATSUlab) discourse graph, built in Roam Research using the Discourse Graph extension.

This repository accompanies a study exploring how a shared issue board promotes idea exchange, structured knowledge production, and rapid researcher onboarding in a research lab.

What's in this repository

Evidence Bundles

The primary outputs are four evidence bundles in output/evidence_bundles/. An evidence bundle is a self-contained package that pairs a research finding with the data, figure, methods, and metadata needed to evaluate it. Each bundle follows the RO-Crate packaging standard and uses JSON-LD metadata with a discourse graph evidence vocabulary (dge:).

Each bundle contains:

| File | Description |
| --- | --- |
| `evidence.jsonld` | Canonical metadata: evidence statement, observable, method, system, provenance, figure legend |
| `ro-crate-metadata.json` | RO-Crate 1.1 manifest listing all bundle contents |
| `fig*.png` | Static figure (primary visualization) |
| `fig*.html` | Interactive figure (Plotly or HTML/JS, where applicable) |
| `data/` | Underlying data files (JSON or CSV) sufficient to regenerate the figure |
| `methods_excerpt.md` | Relevant methods sections for the specific analysis |
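
As a minimal sketch of how a reader might inspect a bundle, the snippet below lists the files declared in a bundle's RO-Crate manifest. It relies only on the RO-Crate 1.1 convention that the metadata descriptor's `about` property points at the root dataset, whose `hasPart` property lists the contents. The function names `load_manifest` and `list_bundle_parts` are illustrative assumptions and are not part of this repository's code.

```python
import json
from pathlib import Path


def load_manifest(bundle_dir):
    """Parse ro-crate-metadata.json from a bundle directory."""
    path = Path(bundle_dir) / "ro-crate-metadata.json"
    return json.loads(path.read_text(encoding="utf-8"))


def list_bundle_parts(manifest):
    """Return the @id of each entry in the root dataset's hasPart list."""
    entities = {e["@id"]: e for e in manifest.get("@graph", [])}

    # RO-Crate 1.1: the metadata descriptor references the root dataset via "about".
    descriptor = entities.get("ro-crate-metadata.json", {})
    root_id = descriptor.get("about", {}).get("@id", "./")

    parts = entities.get(root_id, {}).get("hasPart", [])
    if isinstance(parts, dict):  # JSON-LD permits a single object instead of a list
        parts = [parts]
    return [p["@id"] for p in parts]
```

For example, `list_bundle_parts(load_manifest("output/evidence_bundles/evd1-conversion-rate"))` should return whatever entries that bundle's manifest declares.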

EVD 1 — Issue Conversion Rate (`evd1-conversion-rate/`)

29% of MATSUlab issues (n=445) were claimed as experiments.

Figure: Stacked bar chart showing the composition of all 445 issues (explicitly claimed, inferred, unclaimed)
Data: conversion_data.json

EVD 5 — Issue-to-Experiment-to-Result Flow (`evd5-issue-funnel/`)

Of 130 claimed experiments, 50 produced formal results (139 RES nodes), and 15% of claims involved cross-person idea exchange.

Primary figure: Alluvial (Sankey) diagram showing researcher-level flow from Issue Created → Claimed By → Result Created
Supplemental figure: Aggregate conversion funnel bar chart
Data: funnel_summary.json, experiment_details.csv (anonymized)
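
The stage-to-stage rates implied by the counts in EVD 1 and EVD 5 (445 issues, 130 claimed experiments, 50 experiments with formal results) can be recomputed with a few lines of arithmetic. This sketch assumes that the 130 claimed experiments reported in EVD 5 are the same set behind the 29% figure in EVD 1; the variable names are illustrative.

```python
# Counts as reported in EVD 1 and EVD 5.
total_issues = 445
claimed_experiments = 130
experiments_with_results = 50

claim_rate = claimed_experiments / total_issues                # issues -> experiments
result_rate = experiments_with_results / claimed_experiments   # experiments -> results
overall_rate = experiments_with_results / total_issues         # issues -> results

print(f"Issues claimed as experiments: {claim_rate:.1%}")             # 29.2%
print(f"Claimed experiments with a formal result: {result_rate:.1%}") # 38.5%
print(f"Issues that led to a formal result: {overall_rate:.1%}")      # 11.2%
```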

EVD 6 — Time to Result (`evd6-time-to-result/`)

Among the 50 MATSUlab experiments that produced formal results, the median time from issue claiming to first result was 12 days, with a wide spread (n=50, IQR 0–50 days).

Figure: Swimmer plot showing experiment lifecycles from issue creation through claiming to result production (linear, log-scale, and interactive versions)
Data: data/time_to_result_data.json (summary statistics and per-experiment timing intervals)
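
For readers less familiar with the summary statistics quoted in EVD 6, the following sketch shows one way to compute a median and interquartile range from per-experiment intervals using Python's standard library. The `days_to_first_result` values are made-up placeholders, not data from the bundle, and the quantile method the pipeline actually uses is not stated here, so the `"inclusive"` method is an assumption.

```python
import statistics


def summarize_intervals(days):
    """Return (median, q1, q3) for a list of claim-to-result intervals in days."""
    q1, _, q3 = statistics.quantiles(days, n=4, method="inclusive")
    return statistics.median(days), q1, q3


# Placeholder values for illustration only; not taken from time_to_result_data.json.
days_to_first_result = [0, 0, 3, 12, 20, 50, 180]

median, q1, q3 = summarize_intervals(days_to_first_result)
print(f"median = {median} days, IQR = {q1} to {q3} days")
```

A long right tail, like the single 180-day value in the placeholder list, pulls the mean well above the median. That makes the median and IQR a reasonable choice of summary for skewed timing data like this.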

EVD 7 — Undergraduate Researcher Onboarding (`evd7-student-onboarding/`)

Three undergraduate researchers tracked in this analysis each produced a formal result within ~4 months, with two reaching their first result within ~1 month.

Figure: Pin/stem timeline showing four milestones (lab start, first experiment, first plot, first result) for three researchers
Data: student_milestones.json

Analysis Notebook

notebooks/evd1_evd7_analysis.ipynb is a pre-executed Jupyter notebook that walks through the full analysis pipeline, from raw data loading through metric computation to each evidence bundle. It serves as a transparent trace from data to results. The notebook:

Loads and parses the discourse graph exports (JSON-LD + Roam JSON)
Shows issue classification, claiming detection, and attribution logic
Computes each metric with inline commentary
Generates the data underlying each evidence bundle

Note: The raw data files are not included in this repository (they contain identifiable information). The notebook is pre-executed with all outputs visible, so readers can follow the analysis without the source data.

Source Code

| File | Purpose |
| --- | --- |
| `src/main.py` | Pipeline orchestrator: runs all steps end-to-end |
| `src/parse_jsonld.py` | Parse JSON-LD discourse graph export |
| `src/parse_roam_json.py` | Stream-parse Roam JSON export (block timestamps, experimental logs) |
| `src/calculate_metrics.py` | Merge data sources and compute all metrics |
| `src/generate_visualizations.py` | Generate static figures (conversion rate, time distributions, contributor breadth, idea exchange, funnel) |
| `src/handoff_visualizations.py` | Generate alluvial/Sankey flow diagrams |
| `src/experiment_lifecycle_visualizations.py` | Experiment lifecycle swimmer plots and result cascade visualizations |
| `src/student_timeline_analysis.py` | Student onboarding timeline extraction and visualization |
| `src/create_evidence_bundle.py` | Generate RO-Crate evidence bundles |
| `src/anonymize.py` | Central de-identification module (researcher name → pseudonym mapping) |

Conversation Log

conversation_log.md documents the iterative prompt-response process between Matt Akamatsu and Claude that produced this pipeline. User prompts are reproduced verbatim; Claude responses are summarized. Together they constitute the specification: any output can be traced to the prompt that requested it.

How results trace from data to evidence bundles

Raw data (not in repo)
  Roam JSON export (~47 MB)          JSON-LD export (~11 MB)
        │                                    │
        ▼                                    ▼
  src/parse_roam_json.py              src/parse_jsonld.py
        │                                    │
        └──────────────┬─────────────────────┘
                       ▼
              src/calculate_metrics.py
              (merge, compute 5 metrics)
                       │
           ┌───────────┼───────────────┐
           ▼           ▼               ▼
  src/generate_    src/handoff_    src/student_timeline_
  visualizations   visualizations  analysis.py
           │           │               │
           └───────────┼───────────────┘
                       ▼
           src/create_evidence_bundle.py
           src/experiment_lifecycle_visualizations.py
                       │
        ┌──────────┬───┴────┬──────────┐
        ▼          ▼        ▼          ▼
    evd1-      evd5-    evd6-      evd7-
    conversion issue-   time-to-   student-
    -rate/     funnel/  result/    onboarding/

The Jupyter notebook (notebooks/evd1_evd7_analysis.ipynb) executes this same pipeline interactively, showing intermediate results at each step.

De-identification

Researcher names have been anonymized throughout all outputs:

Lab members are labeled R1–R11
Undergraduate researchers in EVD 7 are labeled Researcher A, B, C
The PI (Matt Akamatsu) remains identified as the evidence bundle creator

The mapping is maintained in src/anonymize.py and applied consistently across all generated data files, visualizations, and notebook outputs.
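
A name-to-pseudonym mapping of this kind can be sketched as follows. This is an illustrative stand-in, not the contents of `src/anonymize.py`: the function names `build_pseudonym_map` and `anonymize_text`, the sorted-order labeling rule, and the example names are all assumptions.

```python
import re


def build_pseudonym_map(names, prefix="R"):
    """Assign stable labels (R1, R2, ...) to names in sorted order."""
    return {name: f"{prefix}{i}" for i, name in enumerate(sorted(set(names)), start=1)}


def anonymize_text(text, mapping):
    """Replace every mapped name that appears in text with its pseudonym."""
    if not mapping:
        return text
    # Try longer names first so a full name wins over a shorter name it contains.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


# Fictional names used purely for illustration.
mapping = build_pseudonym_map(["Blake Example", "Alex Sample"])
print(anonymize_text("Issue created by Alex Sample, claimed by Blake Example", mapping))
# prints: Issue created by R1, claimed by R2
```

Keeping the mapping in one module and applying it at every output step avoids the same person getting different labels in different files.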

Running the pipeline

pip install -r requirements.txt
python src/main.py

The pipeline requires the raw Roam Research exports in the `graph raw data/` directory (not included in this repository).

Source material

Contact The Discourse Graphs Project for read access to the following source material:

Experimental log
Result page: EVD 5
Result page: EVD 6
Result page: EVD 7
Raw data: MATSUlab graph in JSON-LD and Roam JSON

License

This work is licensed under CC-BY-4.0. See LICENSE for the full text.

Attribution

Analysis and evidence bundles: Matt Akamatsu and Claude (Anthropic)
Review: Joel Chan
Discourse graph system: Discourse Graphs Project, Joel Chan, Matt Akamatsu
Lab discourse graph data: Akamatsu Lab, University of Washington
Discourse Graph extension: DiscourseGraphs
