Aspasia of Miletus (~470-400 BCE) taught Socrates the art of rhetoric and dialectic. Plato credits her in the Menexenus; Plutarch describes her as a thinker consulted on matters of state. She didn’t cross-examine for sport — she sharpened others' reasoning constructively.
This system is named for her because it corrects the corrector. StatistEase uses Socratic method (LLM routes questions, Julia answers). Aspasia audits that entire process — the teacher checking the student’s work. And like her namesake, Aspasia is not an annoying gadfly. She is a wise advisor who helps you feel more confident in your results, not less.
StatistEase computes statistics using Julia. An LLM routes questions to Julia functions and explains the results. No neural numbers — good.
But three problems remain:

- Did Julia compute correctly? Software has bugs. Numerical libraries have edge cases. A single implementation is a single point of failure.
- Was this the right test? The LLM chose a t-test. Should it have been Mann-Whitney? The LLM doesn’t know — it pattern-matches. It might be right. It might be confidently wrong.
- Is the explanation accurate? The LLM says "large effect size" but Cohen’s d is 0.35. That’s small-to-medium. The number was correct; the interpretation was not.
Aspasia solves all three by providing an independent neurosymbolic audit from a completely separate codebase, in a completely different language, using a completely different reasoning engine.
This is the most important design decision in the project, and it is not overcomplicated — it is the minimum necessary for genuine independence:
| Property | StatistEase (Julia) | Aspasia (GNU Octave) |
|---|---|---|
| Numerical backend | OpenBLAS | LAPACK/BLAS (system) |
| Sorting algorithm | Julia’s QuickSort | Octave’s std::sort |
| Statistical library | StatsBase.jl | Octave statistics package |
| Floating-point path | Julia’s LLVM codegen | GCC/gfortran codegen |
| Reasoning engine | LLM tool calling | Prolog + DeepProbLog |
| Developer community | Scientific computing | Engineering + applied maths |
If both systems produce the same answer via different code paths, that answer is far more trustworthy than either system alone. If they disagree, that disagreement is valuable information — it reveals either a bug or a genuine numerical sensitivity.
Using the same language would mean the same library, the same bugs, the same blind spots. That is not independence. That is redundancy.
This also attracts different developers. Julia people and Octave/MATLAB people come from different disciplines, think differently about numerical problems, and catch different classes of bugs. Two independent communities competing toward the same goal — correctness — makes both systems better.
StatistEase Aspasia
┌────────────────────┐ ┌────────────────────┐
│ Julia computation │ │ Octave recompute │
│ (the numbers) │──── JSON ───►│ (same data, diff │
│ │ transaction│ code path) │
└────────┬───────────┘ └────────┬───────────┘
│ │
│ ┌────────▼───────────┐
│ │ Prolog ontology │
│ │ (was this the │
│ │ RIGHT test?) │
│ └────────┬───────────┘
│ │
│ ┌────────▼───────────┐
│ │ Interpretation │
│ │ audit (does the │
│ │ explanation match │
│ │ the numbers?) │
│ └────────┬───────────┘
│ │
▼ ▼
┌────────────────────────────────────────────────────────┐
│ USER SEES BOTH │
│ Result: t(38) = 2.847, p = .007, d = 0.90 │
│ Audit: VERIFIED — computation, test selection, and │
│ interpretation all check out. │
└────────────────────────────────────────────────────────┘

"Did the computation produce the correct numbers?"
Aspasia independently recomputes every statistical result using GNU Octave. Different language, different BLAS, different floating-point code paths. If the results match within tolerance (1e-10), the numbers are verified.
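The comparison step can be sketched as follows. This is an illustrative Python sketch, not the actual Octave implementation; the function name and the relative-tolerance refinement (so that large statistics are not held to the same absolute bar as small p-values) are assumptions.

```python
# Hypothetical sketch of Aspasia's match-within-tolerance check.
# The real audit runs in Octave; names here are illustrative.

def results_match(statistease_value: float, aspasia_value: float,
                  tol: float = 1e-10) -> bool:
    """Verify two independently computed statistics agree.

    Scaling the tolerance by the magnitude of the values keeps the
    check meaningful whether the statistic is near 0.007 or near 100.
    """
    diff = abs(statistease_value - aspasia_value)
    scale = max(abs(statistease_value), abs(aspasia_value), 1.0)
    return diff <= tol * scale

# Both engines compute the same t-statistic via different BLAS paths:
print(results_match(2.846993, 2.846993))          # True: verified
print(results_match(2.846993, 2.846993 + 1e-6))   # False: flag for resolution
```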
"Was this the right test to run?"
A Prolog knowledge base encodes Stevens' measurement scales, test prerequisites, assumption requirements, and nonparametric alternatives. DeepProbLog extends this with probabilistic confidence.
This is genuinely neurosymbolic — the probabilities can be learned from data (neural) while the logical structure is fixed (symbolic).
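The flavour of rule the ontology encodes can be sketched in Python (the real rules live in Prolog; the rule set, names, and return strings below are illustrative assumptions, not the actual knowledge base):

```python
# Hypothetical sketch of test-selection rules: Stevens scales,
# assumption requirements, and nonparametric fallbacks.

RULES = {
    "t_test_independent": {
        "scales": {"interval", "ratio"},   # needs at least interval data
        "assumptions": {"normality", "equal_variance"},
        "nonparametric_alternative": "mann_whitney_u",
    },
    "mann_whitney_u": {
        "scales": {"ordinal", "interval", "ratio"},
        "assumptions": set(),
        "nonparametric_alternative": None,
    },
}

def audit_test_choice(test: str, scale: str, assumptions_met: set) -> str:
    rule = RULES[test]
    if scale not in rule["scales"]:
        return f"CHALLENGE: {test} requires {sorted(rule['scales'])} data, got {scale}"
    missing = rule["assumptions"] - assumptions_met
    if missing and rule["nonparametric_alternative"]:
        return (f"CHALLENGE: {sorted(missing)} not established; "
                f"consider {rule['nonparametric_alternative']}")
    return "VERIFIED: test choice consistent with the ontology"

# A t-test on ordinal data draws a challenge:
print(audit_test_choice("t_test_independent", "ordinal", {"normality"}))
```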
"Does the LLM’s explanation accurately represent the result?"
Cross-references effect size labels against Cohen’s conventions, checks for p-value misinterpretation (ASA 2016 statement), detects significance inflation with large N, and flags missing assumption discussions.
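The effect-size cross-reference can be sketched like this. The thresholds follow Cohen's (1988) conventions for d (0.2 small, 0.5 medium, 0.8 large); the function names and message format are illustrative assumptions:

```python
# Hypothetical sketch of the interpretation audit for effect-size labels.

def cohens_d_label(d: float) -> str:
    """Conventional label for Cohen's d (Cohen 1988)."""
    d = abs(d)
    if d < 0.2:
        return "negligible"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"

def audit_effect_size(claimed_label: str, d: float) -> str:
    actual = cohens_d_label(d)
    if claimed_label == actual:
        return f"VERIFIED: d = {d} is conventionally '{actual}'"
    return (f"CHALLENGE: explanation says '{claimed_label}' but "
            f"d = {d} is conventionally '{actual}'")

# The example from earlier: the LLM called d = 0.35 a "large" effect.
print(audit_effect_size("large", 0.35))
```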
Aspasia operates as an AUDITOR, not a gatekeeper:

- It NEVER modifies StatistEase output
- It NEVER prevents computation
- It ALWAYS explains WHY it raises a concern
- It tracks its own accuracy (precision and recall of challenges)
- It learns from user feedback (Logtalk knowledge base)
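Tracking the precision and recall of challenges amounts to scoring each challenge against later user feedback. A minimal sketch, assuming a simple feedback log (the field names and function are illustrative, not the Logtalk knowledge base's actual schema):

```python
# Hypothetical sketch of challenge-accuracy tracking from user feedback.

def challenge_accuracy(log: list) -> dict:
    """Precision and recall of Aspasia's challenges.

    Each entry records whether Aspasia challenged a result and
    whether the user confirmed a real problem existed.
    """
    tp = sum(1 for e in log if e["challenged"] and e["real_problem"])
    fp = sum(1 for e in log if e["challenged"] and not e["real_problem"])
    fn = sum(1 for e in log if not e["challenged"] and e["real_problem"])
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return {"precision": precision, "recall": recall}

feedback = [
    {"challenged": True,  "real_problem": True},   # justified challenge
    {"challenged": True,  "real_problem": False},  # false alarm
    {"challenged": False, "real_problem": True},   # missed problem
    {"challenged": True,  "real_problem": True},   # justified challenge
]
print(challenge_accuracy(feedback))  # precision 2/3, recall 2/3
```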
A disagreement between two independent systems is not a failure — it is information. The magnitude, location, and nature of the disagreement tells you something about your data that neither system alone could reveal.
Before asking a human, Aspasia runs a systematic resolution protocol:
| Step | Method | Confidence |
|---|---|---|
| 1 | NIST StRD reference values — certified answers to 15+ digits (McCullough & Wilson 1999) | Definitive |
| 2 | Arbitrary precision recomputation — Neumaier compensated summation at extended precision | High |
| 3 | Interval arithmetic — guaranteed enclosures; if both values fall inside, they’re compatible | High |
| 4 | Perturbation analysis — jitter inputs by 1 ULP; if output swings wildly, the problem is ill-conditioned and neither answer is reliable | Diagnostic |
| 5 | Symbolic verification — compute exact answer via sorted summation or CAS (Maxima) | Definitive |
| 6 | Escalate to human — with FULL evidence from steps 1-5 and both systems' working | Last resort |
Most disagreements resolve at steps 1-3. Step 4 is particularly valuable: when it triggers, it means the data itself doesn’t support the precision being claimed — that’s a finding about the research, not a bug in the software.
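Step 4 can be sketched concretely. This is an illustrative Python sketch (the real ladder runs elsewhere); it jitters each input by one ULP, using the standard library's `math.ulp`, and measures how far a statistic moves:

```python
import math
import statistics

# Hypothetical sketch of step 4 (perturbation analysis).
# The statistic here is a simple mean; the real system would
# perturb whatever statistic is under dispute.

def perturbation_spread(data: list, stat=statistics.mean) -> float:
    """Max deviation of the statistic when each input is jittered by 1 ULP."""
    baseline = stat(data)
    spread = 0.0
    for i, x in enumerate(data):
        for jittered in (x + math.ulp(x), x - math.ulp(x)):
            perturbed = list(data)
            perturbed[i] = jittered
            spread = max(spread, abs(stat(perturbed) - baseline))
    return spread

# Well-conditioned data: the mean barely moves under 1-ULP jitter,
# so answers agreeing to 1e-10 are meaningful. A spread comparable to
# the disagreement itself would mean the problem is ill-conditioned.
print(perturbation_spread([1.2, 3.4, 5.6, 7.8]) < 1e-12)
```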
When the resolution ladder exhausts automated methods:

1. StatistEase computes (Julia)
2. Aspasia audits (Octave + Prolog)
3. echidna arbitrates (formal proofs via GraphQL)
If echidna cannot resolve the dispute (because it’s a judgment call, not a mathematical fact), the system escalates to the human with full evidence from all three systems. It says:
"We tried our best but we are coming up against conflicts. Here is everything we checked, everything we found, and what each system thinks. You need to decide."
This is honest. It is more useful than silently picking one answer.
- GNU Octave 8+ with the statistics package
- SWI-Prolog 9+ (for ontological reasoning)
- Logtalk 3+ (for knowledge base management)
- Optional: DeepProbLog (for probabilistic logic)
octave --eval "pkg install -forge statistics"
octave --path src/verification:src/audit:src/interface \
  --eval "audit_from_json('/path/to/transaction.json')"

PMPL-1.0-or-later (Palimpsest License)
Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) <j.d.a.jewell@open.ac.uk>