Metrics and evidence bundles from an analysis of the Akamatsu Lab (MATSUlab) discourse graph, built in Roam Research using the Discourse Graph extension.
This repository accompanies a study exploring how a shared issue board promotes idea exchange, structured knowledge production, and rapid researcher onboarding in a research lab.
The primary outputs are four evidence bundles in output/evidence_bundles/. An evidence bundle is a self-contained package that pairs a research finding with the data, figure, methods, and metadata needed to evaluate it. Each bundle follows the RO-Crate packaging standard and uses JSON-LD metadata with a discourse graph evidence vocabulary (dge:).
Each bundle contains:
| File | Description |
|---|---|
| `evidence.jsonld` | Canonical metadata: evidence statement, observable, method, system, provenance, figure legend |
| `ro-crate-metadata.json` | RO-Crate 1.1 manifest listing all bundle contents |
| `fig*.png` | Static figure (primary visualization) |
| `fig*.html` | Interactive figure (Plotly or HTML/JS, where applicable) |
| `data/` | Underlying data files (JSON or CSV) sufficient to regenerate the figure |
| `methods_excerpt.md` | Relevant methods sections for the specific analysis |
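As a rough illustration of how a bundle's manifest can be read, the sketch below lists the files declared in an RO-Crate 1.1 `ro-crate-metadata.json`. It relies only on the general RO-Crate layout (a JSON-LD `@graph` whose root dataset lists its contents under `hasPart`). The function names `load_manifest` and `list_bundle_files` and the in-memory example manifest are hypothetical, and the bundles in this repository may carry additional fields.

```python
import json


def load_manifest(path: str) -> dict:
    """Read an ro-crate-metadata.json file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def list_bundle_files(manifest: dict) -> list[str]:
    """Return the @id of every entry the root dataset lists under hasPart."""
    by_id = {entity.get("@id"): entity for entity in manifest.get("@graph", [])}
    # The metadata descriptor points at the root dataset through "about".
    descriptor = by_id.get("ro-crate-metadata.json", {})
    root_id = descriptor.get("about", {}).get("@id", "./")
    parts = by_id.get(root_id, {}).get("hasPart", [])
    if isinstance(parts, dict):  # JSON-LD allows a single object instead of a list
        parts = [parts]
    return [part["@id"] for part in parts]


# Hypothetical minimal manifest, for illustration only.
example_manifest = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "hasPart": [{"@id": "evidence.jsonld"}, {"@id": "methods_excerpt.md"}],
        },
    ],
}

print(list_bundle_files(example_manifest))
```

For a bundle on disk, `list_bundle_files(load_manifest(path))` would take the path to that bundle's `ro-crate-metadata.json`.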
EVD 1: 29% of MATSUlab issues (n=445) were claimed as experiments.
- Figure: Stacked bar chart showing the composition of all 445 issues (explicitly claimed, inferred, unclaimed)
- Data: conversion_data.json
EVD 5: Of 130 claimed experiments, 50 produced formal results (139 RES nodes), and 15% of claims involved cross-person idea exchange.
- Primary figure: Alluvial (Sankey) diagram showing researcher-level flow from Issue Created → Claimed By → Result Created
- Supplemental figure: Aggregate conversion funnel bar chart
- Data: funnel_summary.json, experiment_details.csv (anonymized)
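The counts quoted in the first two bundles can be chained into a simple funnel. The snippet below is a worked arithmetic check using only the numbers stated in this README; the derived ratios other than the 29% claim rate are not reported here and are shown purely for orientation, and the variable names are illustrative.

```python
# Counts reported in this README.
total_issues = 445   # all MATSUlab issues
claimed = 130        # issues claimed as experiments
with_results = 50    # claimed experiments that produced a formal result
res_nodes = 139      # RES nodes attached to those experiments

claim_rate = claimed / total_issues           # share of issues claimed
result_rate = with_results / claimed          # share of claimed experiments with a result
end_to_end = with_results / total_issues      # share of all issues reaching a result
res_per_experiment = res_nodes / with_results

print(f"claimed: {claim_rate:.1%}")
print(f"claimed -> result: {result_rate:.1%}")
print(f"issue -> result: {end_to_end:.1%}")
print(f"RES nodes per productive experiment: {res_per_experiment:.2f}")
```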
EVD 6: Among the 50 MATSUlab experiments that produced formal results, the median time from issue claiming to first result was 12 days, with wide variance (IQR 0–50 days).
- Figure: Swimmer plot showing experiment lifecycles from issue creation through claiming to result production (linear, log-scale, and interactive versions)
- Data: data/time_to_result_data.json (summary statistics and per-experiment timing intervals)
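A median and interquartile range like the ones above can be computed from per-experiment claim and result dates. The following is a minimal sketch with toy dates, not the study's data. The helper names are hypothetical, and the quartile method (`inclusive`) is an assumption, since this README does not state which method the pipeline uses.

```python
from datetime import date
from statistics import median, quantiles


def days_to_first_result(intervals):
    """Days between claim date and first-result date for each experiment."""
    return [(result - claimed).days for claimed, result in intervals]


def summarize(days):
    """Sample size, median, and inclusive-method quartiles of day counts."""
    q1, _, q3 = quantiles(days, n=4, method="inclusive")
    return {"n": len(days), "median": median(days), "q1": q1, "q3": q3}


# Toy (claimed, first result) pairs; not the study's data.
toy_intervals = [
    (date(2024, 1, 1), date(2024, 1, 1)),
    (date(2024, 1, 1), date(2024, 1, 1)),
    (date(2024, 1, 1), date(2024, 1, 4)),
    (date(2024, 1, 1), date(2024, 1, 13)),
    (date(2024, 1, 1), date(2024, 1, 21)),
    (date(2024, 1, 1), date(2024, 2, 20)),
    (date(2024, 1, 1), date(2024, 3, 31)),
]

summary = summarize(days_to_first_result(toy_intervals))
print(summary)
```

A lower quartile of 0 days, as in the reported IQR, corresponds to experiments whose first result was recorded on the same day the issue was claimed.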
EVD 7: Three undergraduate researchers tracked in this analysis each produced a formal result within ~4 months, with two reaching their first result within ~1 month.
- Figure: Pin/stem timeline showing four milestones (lab start, first experiment, first plot, first result) for three researchers
- Data: student_milestones.json
notebooks/evd1_evd7_analysis.ipynb is a pre-executed Jupyter notebook that walks through the full analysis pipeline, from raw data loading through metric computation to each evidence bundle. It serves as a transparent trace from data to results. The notebook:
- Loads and parses the discourse graph exports (JSON-LD + Roam JSON)
- Shows issue classification, claiming detection, and attribution logic
- Computes each metric with inline commentary
- Generates the data underlying each evidence bundle
Note: The raw data files are not included in this repository (they contain identifiable information). The notebook is pre-executed with all outputs visible, so readers can follow the analysis without the source data.
| File | Purpose |
|---|---|
| `src/main.py` | Pipeline orchestrator; runs all steps end-to-end |
| `src/parse_jsonld.py` | Parse JSON-LD discourse graph export |
| `src/parse_roam_json.py` | Stream-parse Roam JSON export (block timestamps, experimental logs) |
| `src/calculate_metrics.py` | Merge data sources and compute all metrics |
| `src/generate_visualizations.py` | Generate static figures (conversion rate, time distributions, contributor breadth, idea exchange, funnel) |
| `src/handoff_visualizations.py` | Generate alluvial/Sankey flow diagrams |
| `src/experiment_lifecycle_visualizations.py` | Experiment lifecycle swimmer plots and result cascade visualizations |
| `src/student_timeline_analysis.py` | Student onboarding timeline extraction and visualization |
| `src/create_evidence_bundle.py` | Generate RO-Crate evidence bundles |
| `src/anonymize.py` | Central de-identification module (researcher name → pseudonym mapping) |
conversation_log.md documents the iterative prompt-response process between Matt Akamatsu and Claude that produced this pipeline. User prompts are reproduced verbatim; Claude responses are summarized. Together they constitute the specification: any output can be traced to the prompt that requested it.

    Raw data (not in repo)
    Roam JSON export (~47 MB)             JSON-LD export (~11 MB)
                │                                    │
                ▼                                    ▼
     src/parse_roam_json.py                 src/parse_jsonld.py
                │                                    │
                └──────────────┬─────────────────────┘
                               ▼
                   src/calculate_metrics.py
                  (merge, compute 5 metrics)
                               │
                   ┌───────────┼───────────────┐
                   ▼           ▼               ▼
        src/generate_   src/handoff_    src/student_timeline_
        visualizations  visualizations  analysis.py
                   │           │               │
                   └───────────┼───────────────┘
                               ▼
                 src/create_evidence_bundle.py
          src/experiment_lifecycle_visualizations.py
                               │
                ┌──────────┬───┴────┬──────────┐
                ▼          ▼        ▼          ▼
           evd1-        evd5-    evd6-     evd7-
           conversion   issue-   time-to-  student-
           -rate/       funnel/  result/   onboarding/

The Jupyter notebook (notebooks/evd1_evd7_analysis.ipynb) executes this same pipeline interactively, showing intermediate results at each step.
Researcher names have been anonymized throughout all outputs:
- Lab members are labeled R1–R11
- Undergraduate researchers in EVD 7 are labeled Researcher A, B, C
- The PI (Matt Akamatsu) remains identified as the evidence bundle creator
The mapping is maintained in src/anonymize.py and applied consistently across all generated data files, visualizations, and notebook outputs.
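The idea of a single name-to-pseudonym mapping applied everywhere can be sketched as follows. This is not the code in src/anonymize.py, whose contents are not shown here: the function names, the first-seen ordering rule, and the placeholder names are all assumptions made for illustration.

```python
import re


def build_pseudonyms(names, prefix="R"):
    """Assign R1, R2, ... in first-seen order; repeated names keep their label."""
    mapping = {}
    for name in names:
        if name not in mapping:
            mapping[name] = f"{prefix}{len(mapping) + 1}"
    return mapping


def anonymize_text(text, mapping):
    """Replace every mapped name in text with its pseudonym."""
    if not mapping:
        return text
    # Longest names first so a full name wins over a shorter name it contains.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered))
    return pattern.sub(lambda match: mapping[match.group(0)], text)


# Placeholder names, not real lab members.
mapping = build_pseudonyms(["Alex Example", "Sam Placeholder", "Alex Example"])
line = "Issue created by Alex Example, claimed by Sam Placeholder"
print(anonymize_text(line, mapping))
```

Building the mapping once and passing it to every writer is what keeps labels consistent across data files, figures, and notebook outputs.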

    pip install -r requirements.txt
    python src/main.py

Requires the raw Roam Research exports in graph raw data/ (not included in this repository).
Contact The Discourse Graphs Project for read access to the following source material:
- Experimental log
- Result page: EVD 5
- Result page: EVD 6
- Result page: EVD 7
- Raw data: MATSU lab graph in JSON-LD and JSON
This work is licensed under CC-BY-4.0. See LICENSE for the full text.
- Analysis and evidence bundles: Matt Akamatsu and Claude (Anthropic)
- Review: Joel Chan
- Discourse graph system: Discourse Graphs Project, Joel Chan, Matt Akamatsu
- Lab discourse graph data: Akamatsu Lab, University of Washington
- Discourse Graph extension: DiscourseGraphs