Skip to content

Metrics and evidence bundles from an analysis of the MATSUlab discourse graph: issue exchange, knowledge production, and researcher onboarding

License

Notifications You must be signed in to change notification settings

DiscourseGraphs/MATSUlab-issue-exchange-analysis

Repository files navigation

MATSUlab Issue Exchange Analysis

Metrics and evidence bundles from an analysis of the Akamatsu Lab (MATSUlab) discourse graph, built in Roam Research using the Discourse Graph extension.

This repository accompanies a study exploring how a shared issue board promotes idea exchange, structured knowledge production, and rapid researcher onboarding in a research lab.

What's in this repository

Evidence Bundles

The primary outputs are four evidence bundles in output/evidence_bundles/. An evidence bundle is a self-contained package that pairs a research finding with the data, figure, methods, and metadata needed to evaluate it. Each bundle follows the RO-Crate packaging standard and uses JSON-LD metadata with a discourse graph evidence vocabulary (dge:).

Each bundle contains:

File Description
evidence.jsonld Canonical metadata: evidence statement, observable, method, system, provenance, figure legend
ro-crate-metadata.json RO-Crate 1.1 manifest listing all bundle contents
fig*.png Static figure (primary visualization)
fig*.html Interactive figure (Plotly or HTML/JS, where applicable)
data/ Underlying data files (JSON or CSV) sufficient to regenerate the figure
methods_excerpt.md Relevant methods sections for the specific analysis

EVD 1 — Issue Conversion Rate (evd1-conversion-rate/)

29% of MATSUlab issues (n=445) were claimed as experiments.

  • Figure: Stacked bar chart showing the composition of all 445 issues (explicitly claimed, inferred, unclaimed)
  • Data: conversion_data.json

EVD 5 — Issue-to-Experiment-to-Result Flow (evd5-issue-funnel/)

Of 130 claimed experiments, 50 produced formal results (139 RES nodes), with 15% of claiming involving cross-person idea exchange.

  • Primary figure: Alluvial (Sankey) diagram showing researcher-level flow from Issue Created → Claimed By → Result Created
  • Supplemental figure: Aggregate conversion funnel bar chart
  • Data: funnel_summary.json, experiment_details.csv (anonymized)

EVD 6 — Time to Result (evd6-time-to-result/)

Among 50 experiments in the MATSUlab that produced formal results, the median time from issue claiming to first result was 12 days, with wide variance (n=50, IQR 0–50 days).

  • Figure: Swimmer plot showing experiment lifecycles from issue creation through claiming to result production (linear, log-scale, and interactive versions)
  • Data: data/time_to_result_data.json (summary statistics and per-experiment timing intervals)

EVD 7 — Undergraduate Researcher Onboarding (evd7-student-onboarding/)

Three undergraduate researchers tracked in this analysis each produced a formal result within ~4 months, with two reaching their first result within ~1 month.

  • Figure: Pin/stem timeline showing four milestones (lab start, first experiment, first plot, first result) for three researchers
  • Data: student_milestones.json

Analysis Notebook

notebooks/evd1_evd7_analysis.ipynb is a pre-executed Jupyter notebook that walks through the full analysis pipeline, from raw data loading through metric computation to each evidence bundle. It serves as a transparent trace from data to results. The notebook:

  • Loads and parses the discourse graph exports (JSON-LD + Roam JSON)
  • Shows issue classification, claiming detection, and attribution logic
  • Computes each metric with inline commentary
  • Generates the data underlying each evidence bundle

Note: The raw data files are not included in this repository (they contain identifiable information). The notebook is pre-executed with all outputs visible, so readers can follow the analysis without the source data.

Source Code

File Purpose
src/main.py Pipeline orchestrator — runs all steps end-to-end
src/parse_jsonld.py Parse JSON-LD discourse graph export
src/parse_roam_json.py Stream-parse Roam JSON export (block timestamps, experimental logs)
src/calculate_metrics.py Merge data sources and compute all metrics
src/generate_visualizations.py Generate static figures (conversion rate, time distributions, contributor breadth, idea exchange, funnel)
src/handoff_visualizations.py Generate alluvial/Sankey flow diagrams
src/experiment_lifecycle_visualizations.py Experiment lifecycle swimmer plots and result cascade visualizations
src/student_timeline_analysis.py Student onboarding timeline extraction and visualization
src/create_evidence_bundle.py Generate RO-Crate evidence bundles
src/anonymize.py Central de-identification module (researcher name → pseudonym mapping)

Conversation Log

conversation_log.md documents the iterative prompt-response process between Matt Akamatsu and Claude that produced this pipeline. User prompts are reproduced verbatim; Claude responses are summarized. Together they constitute the specification: any output can be traced to the prompt that requested it.

How results trace from data to evidence bundles

Raw data (not in repo)
  Roam JSON export (~47 MB)          JSON-LD export (~11 MB)
        │                                    │
        ▼                                    ▼
  src/parse_roam_json.py              src/parse_jsonld.py
        │                                    │
        └──────────────┬─────────────────────┘
                       ▼
              src/calculate_metrics.py
              (merge, compute 5 metrics)
                       │
           ┌───────────┼───────────────┐
           ▼           ▼               ▼
  src/generate_    src/handoff_    src/student_timeline_
  visualizations   visualizations  analysis.py
           │           │               │
           └───────────┼───────────────┘
                       ▼
           src/create_evidence_bundle.py
           src/experiment_lifecycle_visualizations.py
                       │
        ┌──────────┬───┴────┬──────────┐
        ▼          ▼        ▼          ▼
    evd1-      evd5-    evd6-      evd7-
    conversion issue-   time-to-   student-
    -rate/     funnel/  result/    onboarding/

The Jupyter notebook (notebooks/evd1_evd7_analysis.ipynb) executes this same pipeline interactively, showing intermediate results at each step.

De-identification

Researcher names have been anonymized throughout all outputs:

  • Lab members are labeled R1–R11
  • Undergraduate researchers in EVD 7 are labeled Researcher A, B, C
  • The PI (Matt Akamatsu) remains identified as evidence bundle creator

The mapping is maintained in src/anonymize.py and applied consistently across all generated data files, visualizations, and notebook outputs.

Running the pipeline

pip install -r requirements.txt
python src/main.py

Requires the raw Roam Research exports in graph raw data/ (not included in this repository).

Source material

Contact The Discourse Graphs Project for read access to the following source material:

License

This work is licensed under CC-BY-4.0. See LICENSE for the full text.

Attribution

About

Metrics and evidence bundles from an analysis of the MATSUlab discourse graph: issue exchange, knowledge production, and researcher onboarding

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors