How It Works

eigenhelm evaluates code quality by measuring how closely a source file's structural properties resemble those found in curated, high-quality codebases.

The pipeline

Source code
    │
    ▼
┌─────────────────────┐
│  Feature extraction  │  tree-sitter AST + Lizard metrics → 69-dim vector
│  (VirtueExtractor)   │  Halstead, cyclomatic, WL hash, structural features
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│  PCA projection      │  Project into trained eigenspace
│  (EigenspaceModel)   │  Measure drift + alignment against manifold
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│  Aesthetic scoring   │  5-dimension weighted score
│  (AestheticCritic)   │  Entropy, Birkhoff, NCD, drift, alignment
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│  Attribution         │  Map score to source locations
│  (Directives)        │  Generate actionable improvement suggestions
└─────────┬───────────┘
          │
          ▼
    Decision: accept / marginal / reject

Feature extraction

eigenhelm parses source code using tree-sitter and extracts a 69-dimensional feature vector per code unit:

Halstead metrics (5 dims): volume, difficulty, effort, vocabulary, length
Weisfeiler-Leman hash bins (64 dims): AST structural fingerprint capturing the distribution of subtree shapes
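
As an illustration of the Halstead portion of the vector, the sketch below computes the five listed metrics from operator and operand token lists using the standard textbook definitions. The function name `halstead_metrics` and the pre-split token lists are assumptions made for this example; how eigenhelm classifies tokens into operators and operands is not described here.

```python
import math


def halstead_metrics(operators, operands):
    """Five Halstead metrics from token lists (textbook definitions)."""
    n1, n2 = len(set(operators)), len(set(operands))  # distinct operators / operands
    N1, N2 = len(operators), len(operands)            # total operators / operands
    vocabulary = n1 + n2
    length = N1 + N2
    # Volume is undefined for an empty or single-symbol vocabulary; use 0.0 there.
    volume = length * math.log2(vocabulary) if vocabulary > 1 else 0.0
    difficulty = (n1 * N2) / (2 * n2) if n2 else 0.0
    effort = difficulty * volume
    return {
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
        "vocabulary": vocabulary,
        "length": length,
    }


# Tokens for the statement `x = a + b * a`, split by hand for the example.
metrics = halstead_metrics(["=", "+", "*"], ["x", "a", "b", "a"])
print(metrics)
```

For this statement the vocabulary is 6, the length is 7, and the difficulty is 2.0.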

The WL hash captures structural patterns — repetitive code, unusual nesting, idiomatic constructs — without depending on naming or formatting.

Eigenspace projection

The feature vector is projected into a PCA eigenspace trained on curated high-quality corpora. This projection yields two measurements:

Manifold drift: How far the code sits from the quality manifold (reconstruction error)
Manifold alignment: How well the code aligns with the principal quality directions

Low drift combined with high alignment indicates code that structurally resembles the elite examples.
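
The two measurements can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the EigenspaceModel code: it assumes the trained model supplies a mean vector and orthonormal principal components, takes drift to be the Euclidean norm of the reconstruction residual, and takes alignment to be the share of the centered vector's length that lies inside the retained subspace. The source does not give the exact alignment formula, so that definition and the function name `drift_and_alignment` are hypothetical.

```python
import math


def drift_and_alignment(x, mean, components):
    """Return (drift, alignment) for vector x against an orthonormal PCA basis."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    # Coordinates in the eigenspace: one dot product per principal component.
    coords = [sum(c * v for c, v in zip(comp, centered)) for comp in components]
    # Reconstruction using only the retained components.
    recon = [
        sum(coords[k] * components[k][i] for k in range(len(components)))
        for i in range(len(centered))
    ]
    # Drift: length of the part of the vector the subspace cannot explain.
    drift = math.sqrt(sum((v - r) ** 2 for v, r in zip(centered, recon)))
    norm = math.sqrt(sum(v * v for v in centered))
    in_subspace = math.sqrt(sum(c * c for c in coords))
    # Assumed alignment: 1.0 means fully inside the subspace, 0.0 means orthogonal to it.
    alignment = in_subspace / norm if norm else 1.0
    return drift, alignment


# Toy 3-D example with a 2-D eigenspace spanned by the first two axes.
mean = [0.0, 0.0, 0.0]
components = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(drift_and_alignment([3.0, 0.0, 4.0], mean, components))
```

In the toy example the vector has length 5, of which 3 lies inside the subspace and 4 outside it, so the drift is 4.0 and the assumed alignment is 0.6.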

Aesthetic scoring

The final score combines five dimensions with learned weights:

Dimension	Weight	Source
Manifold drift	0.30	PCA reconstruction error
Manifold alignment	0.30	Projection onto quality axes
Token entropy	0.15	Shannon entropy of token stream
Compression structure	0.15	Birkhoff aesthetic measure (zlib)
NCD exemplar distance	0.10	Compression distance to nearest exemplar
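
Two of the sources in the table, token entropy and compression distance, have well-known definitions that can be sketched with the standard library. The code below is an illustration only: the function names are hypothetical, the token stream is assumed to be a list of strings, and the NCD formula is the usual normalized compression distance with zlib as the compressor. The mapping from zlib output to the Birkhoff measure is not specified in this section, so it is not sketched.

```python
import math
import zlib
from collections import Counter


def token_entropy(tokens):
    """Shannon entropy of a token sequence, in bits per token."""
    total = len(tokens)
    if total == 0:
        return 0.0
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    ca, cb = len(zlib.compress(a)), len(zlib.compress(b))
    cab = len(zlib.compress(a + b))
    return (cab - min(ca, cb)) / max(ca, cb)


def nearest_exemplar_distance(source: bytes, exemplars) -> float:
    """Smallest NCD between the source and any stored exemplar."""
    return min(ncd(source, exemplar) for exemplar in exemplars)


code = b"def add(a, b):\n    return a + b\n" * 20
unrelated = bytes(range(256)) * 3
print(token_entropy(["a", "b", "a", "b"]))
print(nearest_exemplar_distance(code, [unrelated, code]))
```

A stream that alternates evenly between two tokens carries exactly one bit per token, and a source compared against a copy of itself yields a much smaller distance than against unrelated bytes.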

The score is normalized to [0.0, 1.0] and compared against calibrated thresholds to produce one of three classifications: accept, marginal, or reject.
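
The weighted combination and the threshold check can be sketched as follows. The weights come from the table above; everything else is an assumption made for illustration. In particular, the sketch assumes each dimension has already been mapped to [0, 1] with lower meaning better, so that a lower combined score is better (the source does not state the direction), and the threshold values 0.4 and 0.6 are hypothetical placeholders, not eigenhelm's calibrated values.

```python
WEIGHTS = {
    "manifold_drift": 0.30,
    "manifold_alignment": 0.30,
    "token_entropy": 0.15,
    "compression_structure": 0.15,
    "ncd_exemplar_distance": 0.10,
}


def combine(dimensions):
    """Weighted sum of per-dimension values, clamped to [0.0, 1.0]."""
    score = sum(weight * dimensions[name] for name, weight in WEIGHTS.items())
    return min(1.0, max(0.0, score))


def classify(score, accept_below=0.4, reject_above=0.6):
    """Map a score to a decision; thresholds here are hypothetical, lower is assumed better."""
    if score < accept_below:
        return "accept"
    if score > reject_above:
        return "reject"
    return "marginal"


sample = {name: 0.5 for name in WEIGHTS}
print(classify(combine(sample)))
```

Because the weights sum to 1.0, inputs in [0, 1] already produce a score in [0, 1]; the clamp only guards against floating-point rounding.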

Training

Models are trained on curated corpora of high-quality code:

Collect source files from elite repositories
Extract feature vectors for each file
Fit PCA to learn the quality manifold
Store exemplars for NCD comparison
Calibrate thresholds from the score distribution
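
The last step, calibrating thresholds from the score distribution, could be done with percentiles of the scores the training corpus receives. The sketch below is one plausible approach under assumptions, not a description of what `eh train` does: the nearest-rank percentile method, the 50th and 90th percentile cut-offs, the assumption that lower scores are better, and the names `percentile` and `calibrate_thresholds` are all hypothetical.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile (p in 0..100) of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def calibrate_thresholds(corpus_scores, accept_pct=50, reject_pct=90):
    """Derive accept/reject cut-offs from the scores of the training corpus."""
    ordered = sorted(corpus_scores)
    return {
        "accept": percentile(ordered, accept_pct),  # at or below: typical of the corpus
        "reject": percentile(ordered, reject_pct),  # above: worse than most of the corpus
    }


# Ten made-up scores standing in for a scored training corpus.
scores = [i / 10 for i in range(1, 11)]
print(calibrate_thresholds(scores))
```

Tying the cut-offs to corpus percentiles means the thresholds move with the model: retraining on a different corpus shifts the score distribution and the decision boundaries together.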

See eh train for details on training custom models.