Character Energy Analysis

Every letter you write costs energy. Your fingers flex along one axis (~225°), your wrist pivots along another (~315°), and every curve, pen-lift, and stroke crossing adds to the bill. This project measures that cost.

We model each character as a physical stroke path using Hershey vector fonts, then compute a biomechanical production effort index — not Joules, but a dimensionless quantity that is monotone with plausible effort, internally consistent, and calibratable to observed handwriting kinematics.

What Gets Measured

For each of 65 characters (A-Z, a-z, punctuation) across Latin, Greek, and Cyrillic scripts:

| Metric | What it captures |
| --- | --- |
| Writing energy | Two-axis motor cost (finger + wrist), direction-dependent |
| Ink distance | Total path length, pen-down only |
| Curvature | Integrated squared curvature (bending cost) |
| Pen lifts | Number of stroke segments + pen-up travel distance |
| Distinctiveness | Nearest-neighbor distance in shape space (confusability) |
| Perimetric complexity | Ink² / enclosed area (how ornate the form is) |
| Convex hull ratio | How much of the bounding area the character fills |
| Topology | Enclosed regions (Betti-1), endpoints, crossings |
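As a rough illustration of the simpler path metrics above, here is a minimal sketch. It assumes a character is stored as a list of strokes, each stroke being a list of (x, y) points drawn pen-down. That representation and the function names are assumptions for illustration, not the project's actual data structures.

```python
import math

# Assumed representation: a character is a list of strokes,
# and each stroke is a list of (x, y) points drawn with the pen down.

def ink_distance(strokes):
    """Total pen-down path length, summed over every stroke."""
    return sum(
        math.dist(p, q)
        for stroke in strokes
        for p, q in zip(stroke, stroke[1:])
    )

def pen_up_travel(strokes):
    """Straight-line distance covered with the pen lifted between strokes."""
    return sum(
        math.dist(prev[-1], nxt[0])
        for prev, nxt in zip(strokes, strokes[1:])
    )

def stroke_count(strokes):
    """Number of separate pen-down stroke segments."""
    return len(strokes)

# Toy "T": a horizontal bar, then a vertical stem (made-up coordinates).
toy_t = [[(0, 2), (2, 2)], [(1, 2), (1, 0)]]
print(ink_distance(toy_t), pen_up_travel(toy_t), stroke_count(toy_t))
```

For the toy glyph, the bar and the stem each contribute 2 units of ink, and the pen travels 1 unit in the air from the end of the bar to the top of the stem.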

The directional cost model uses empirical biomechanics: finger flexion/extension is cheapest at ~225° (Thomassen & Teulings, 1983), wrist abduction at ~315° (Teulings & Maarse, 1984), with fingers costing ~1.4x wrist per unit distance (Van Galen & de Jong, 1995).
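One plausible way to turn those three numbers into a per-segment cost is sketched below. Because the two axes are 90° apart, a pen displacement can be split into a finger component and a wrist component by projection. Summing the weighted absolute components is an assumption made here for illustration; the project's actual formula may differ.

```python
import math

# Values taken from the text above.
FINGER_AXIS_DEG = 225.0  # finger flexion/extension axis
WRIST_AXIS_DEG = 315.0   # wrist abduction axis
FINGER_COST = 1.4        # fingers cost ~1.4x wrist per unit distance
WRIST_COST = 1.0

def segment_energy(p, q):
    """Effort index for a straight pen-down move from p to q.

    Assumed model: project the displacement onto the two (orthogonal)
    joint axes and sum the weighted absolute components.
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    f = math.radians(FINGER_AXIS_DEG)
    w = math.radians(WRIST_AXIS_DEG)
    finger = dx * math.cos(f) + dy * math.sin(f)
    wrist = dx * math.cos(w) + dy * math.sin(w)
    return FINGER_COST * abs(finger) + WRIST_COST * abs(wrist)

def unit_step(angle_deg):
    """End point of a unit-length move from the origin at the given angle."""
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))

# A unit move along the finger axis uses fingers only; along the wrist axis, wrist only.
print(segment_energy((0, 0), unit_step(225.0)))
print(segment_energy((0, 0), unit_step(315.0)))
```

Under this sketch a unit move along the finger axis costs about 1.4, a unit move along the wrist axis about 1.0, and any other direction costs a mix of the two.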

Key Findings

Writing systems optimize for a trade-off between cheapness and distinctiveness.

Figure: Pareto Frontier. Left: the Pareto frontier, i.e. characters that are optimally cheap to write for their level of distinctiveness. Right: uppercase (cyan) vs lowercase (magenta) show different trade-off strategies.
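A Pareto frontier over (energy, distinctiveness) pairs can be extracted with a single sorted sweep. The sketch below is a generic implementation, not necessarily what `pareto_frontier.py` does. The glyph names and values are made up purely to exercise the function.

```python
def pareto_frontier(points):
    """Return names of points not dominated by any other point.

    `points` maps a name to (energy, distinctiveness). Lower energy is
    better, higher distinctiveness is better. Exact duplicates are
    reported once.
    """
    frontier = []
    best_distinctiveness = float("-inf")
    # Visit cheapest first; among equal energies, most distinctive first.
    ordered = sorted(points.items(), key=lambda kv: (kv[1][0], -kv[1][1]))
    for name, (_energy, distinctiveness) in ordered:
        # Keep a point only if it is more distinctive than everything cheaper.
        if distinctiveness > best_distinctiveness:
            frontier.append(name)
            best_distinctiveness = distinctiveness
    return frontier

# Hypothetical values, for illustration only.
toy = {
    "glyph_1": (1.0, 0.2),
    "glyph_2": (2.0, 0.5),
    "glyph_3": (2.5, 0.4),  # dominated by glyph_2: costlier and less distinct
    "glyph_4": (3.0, 0.9),
}
print(pareto_frontier(toy))
```

The sweep runs in O(n log n), which is more than enough for 65 characters. A brute-force pairwise dominance check would work equally well at this scale.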

Figure: Cross-Script Comparison. Top-left: energy distributions across scripts. Top-right: cognate pairs (A, B, E, etc.) show near-identical energy in Latin vs Greek simplex fonts. Bottom-left: energy scales linearly with ink path. Bottom-right: Cyrillic complex (serif) has ~3x the perimetric complexity of simplex scripts.

Figure: Character Distributions. Eight metrics across all 65 characters, sorted by rank. Each metric shows a distinct distributional shape: energy and ink follow Zipf-like decay, while convex hull ratio is nearly uniform.

Figure: Transition Energy. Top-left: characters with cheap exits tend to have expensive entries (and vice versa). Bottom-left: transition angles cluster around finger-axis and wrist-axis directions. Bottom-right: the most expensive bigrams by frequency × transition cost.
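The between-character quantities mentioned here can be sketched as follows. The sketch assumes a transition is the pen-up move from the last point of one glyph to the first point of the next. The `advance` parameter (horizontal offset of the second glyph) and all function names are hypothetical, and straight-line distance stands in for whatever cost the project actually applies.

```python
import math

def transition_vector(first, second, advance):
    """Pen-up displacement from the exit of `first` to the entry of `second`.

    Glyphs are lists of strokes, each a list of (x, y) points.
    `advance` shifts the second glyph to the right (hypothetical parameter).
    """
    x0, y0 = first[-1][-1]   # exit point: last point of the last stroke
    x1, y1 = second[0][0]    # entry point: first point of the first stroke
    return (x1 + advance - x0, y1 - y0)

def transition_distance_and_angle(first, second, advance):
    """Length of the transition and its direction in degrees, in [0, 360)."""
    dx, dy = transition_vector(first, second, advance)
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)) % 360.0

def weighted_bigram_cost(frequency, transition_cost):
    """Ranking score for a bigram: frequency x transition cost."""
    return frequency * transition_cost

# Toy glyphs with made-up coordinates.
glyph_a = [[(0, 0), (1, 0)]]
glyph_b = [[(0, 1), (1, 1)]]
dist, angle = transition_distance_and_angle(glyph_a, glyph_b, advance=2)
print(dist, angle)
```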

Figure: Topology. Enclosed regions (Betti-1): most characters have 0 (open forms like C, L) or 1 (closed forms like O, D). Characters with more endpoints cost more energy. More crossings reduce the confusability margin.
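If a character's strokes are treated as a planar graph, with a vertex at every endpoint, junction, and crossing, the number of enclosed regions follows from Euler's formula: b1 = E - V + C, where C is the number of connected components. The sketch below applies that identity with a small union-find. Treating glyphs as graphs this way is an assumption about the approach, not a description of the project's code.

```python
def betti_1(num_vertices, edges):
    """Number of independent loops (enclosed regions) of a graph: E - V + C."""
    parent = list(range(num_vertices))

    def find(i):
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = num_vertices
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(edges) - num_vertices + components

closed_loop = betti_1(4, [(0, 1), (1, 2), (2, 3), (3, 0)])          # O-like form
open_path = betti_1(3, [(0, 1), (1, 2)])                            # L-like form
two_loops = betti_1(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])    # loop split by a bar
print(closed_loop, open_path, two_loops)
```

The count is only correct if stroke crossings have been inserted as explicit vertices. Otherwise two strokes that cross visually would be counted as disconnected.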

Frequency ≠ energy. Common letters are not systematically cheaper to write (r = +0.054). Writing systems don't appear to optimize individual character cost by usage frequency — they optimize the alphabet as a whole for the cheapness/distinctiveness trade-off.

Figure: Zipf vs Energy. No correlation between letter frequency (Norvig 2013 English corpus) and writing energy. The alphabet is not a frequency-optimized code.
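The reported r is a Pearson correlation coefficient. A dependency-free sketch of that computation is given below for reference. It is checked against toy inputs only, not against the project's data, and in practice `scipy.stats.pearsonr` (already among the requirements) gives the same coefficient along with a p-value.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Toy inputs: perfectly correlated and perfectly anti-correlated.
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pearson_r([1, 2, 3], [3, 2, 1]))
```

A value near zero, such as the one reported above, means letter frequency explains essentially none of the linear variation in energy.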

Usage

cd src/
python3 primitives.py                  # Core measurement functions
python3 measure.py                     # Extract strokes from Hershey fonts
python3 analyze.py                     # Energy distributions + Pareto frontier
python3 bigram_transition_analysis.py  # Between-character transition costs
python3 cross_script_analysis.py       # Latin vs Greek vs Cyrillic
python3 explore_correlations.py        # Metric correlation mining
python3 pareto_frontier.py             # Optimality analysis

Requirements

Python 3.8+, numpy, matplotlib, scipy

Related