How It Works¶
eigenhelm evaluates code quality by measuring how closely a source file's structural properties resemble those found in curated, high-quality codebases.
The pipeline¶
Source code
│
▼
┌─────────────────────┐
│ Feature extraction │ tree-sitter AST + Lizard metrics → 69-dim vector
│ (VirtueExtractor) │ Halstead, cyclomatic, WL hash, structural features
└─────────┬───────────┘
│
▼
┌─────────────────────┐
│ PCA projection │ Project into trained eigenspace
│ (EigenspaceModel) │ Measure drift + alignment against manifold
└─────────┬───────────┘
│
▼
┌─────────────────────┐
│ Aesthetic scoring │ 5-dimension weighted score
│ (AestheticCritic) │ Entropy, Birkhoff, NCD, drift, alignment
└─────────┬───────────┘
│
▼
┌─────────────────────┐
│ Attribution │ Map score to source locations
│ (Directives) │ Generate actionable improvement suggestions
└─────────┬───────────┘
│
▼
Decision: accept / marginal / reject
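The pipeline ends in a three-way decision. A minimal sketch of that step, with illustrative threshold values and function name (the real thresholds are calibrated during training, as described below):

```python
# Hypothetical sketch of the final decision step; the threshold values
# and the function name are illustrative, not eigenhelm's actual API.

def classify(score: float, accept_at: float = 0.7, reject_at: float = 0.4) -> str:
    """Map a normalized [0, 1] aesthetic score to the three-way decision."""
    if score >= accept_at:
        return "accept"
    if score >= reject_at:
        return "marginal"
    return "reject"
```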
Feature extraction¶
eigenhelm parses source code using tree-sitter and extracts a 69-dimensional feature vector per code unit:
- Halstead metrics (5 dims): volume, difficulty, effort, vocabulary, length
- Weisfeiler-Leman hash bins (64 dims): AST structural fingerprint capturing the distribution of subtree shapes
The WL hash captures structural patterns — repetitive code, unusual nesting, idiomatic constructs — without depending on naming or formatting.
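As a rough illustration of the WL-bin idea, the sketch below hashes every subtree of a toy AST (nested tuples standing in for tree-sitter nodes) and bins the hashes into a 64-slot histogram. The real extractor's hashing scheme is not specified here; this only shows why the fingerprint depends on shape, not on names or formatting:

```python
import hashlib

def subtree_hashes(node, acc):
    """Recursively hash each subtree's shape; node = (kind, *children)."""
    kind, *children = node
    child_hs = sorted(subtree_hashes(c, acc) for c in children)
    h = hashlib.sha1((kind + "(" + ",".join(child_hs) + ")").encode()).hexdigest()
    acc.append(h)
    return h

def wl_histogram(ast, bins=64):
    """Bin all subtree hashes into a fixed-length structural fingerprint."""
    acc = []
    subtree_hashes(ast, acc)
    hist = [0] * bins
    for h in acc:
        hist[int(h, 16) % bins] += 1
    return hist

# Toy AST for something like: def f(): return x + y
ast = ("function", ("name",), ("return", ("add", ("id",), ("id",))))
hist = wl_histogram(ast)
```

Because node kinds rather than identifiers feed the hash, the two `("id",)` leaves land in the same bin regardless of what the variables are called.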
Eigenspace projection¶
The feature vector is projected into a PCA eigenspace trained on curated high-quality corpora. This projection yields two measurements:
- Manifold drift: How far the code sits from the quality manifold (reconstruction error)
- Manifold alignment: How well the code aligns with the principal quality directions
Low drift + high alignment = code that structurally resembles elite examples.
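A minimal numpy sketch of the two measurements, assuming drift is the residual norm after back-projection and alignment is the fraction of the centered vector's norm captured by the top-k axes; eigenhelm's exact normalizations may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in "quality corpus": 200 feature vectors in 69 dims.
X = rng.normal(size=(200, 69))
mean = X.mean(axis=0)
# Fit PCA via SVD; rows of `components` are the principal quality axes.
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:8]  # (8, 69)

def drift_and_alignment(x):
    """Reconstruction error (drift) and projected-energy ratio (alignment)."""
    centered = x - mean
    coords = components @ centered   # project into the eigenspace
    recon = components.T @ coords    # back-project onto the manifold
    drift = np.linalg.norm(centered - recon)
    alignment = np.linalg.norm(coords) / (np.linalg.norm(centered) + 1e-12)
    return drift, alignment

d, a = drift_and_alignment(X[0])
```

A vector lying exactly on the manifold (e.g. `mean + components[0]`) reconstructs perfectly: drift is zero and alignment is one.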
Aesthetic scoring¶
The final score combines five dimensions with learned weights:
| Dimension | Weight | Source |
|---|---|---|
| Manifold drift | 0.30 | PCA reconstruction error |
| Manifold alignment | 0.30 | Projection onto quality axes |
| Token entropy | 0.15 | Shannon entropy of token stream |
| Compression structure | 0.15 | Birkhoff aesthetic measure (zlib) |
| NCD exemplar distance | 0.10 | Compression distance to nearest exemplar |
The score is normalized to [0.0, 1.0] and compared against calibrated thresholds to produce a classification.
Training¶
Models are trained on curated corpora of high-quality code:
1. Collect source files from elite repositories
2. Extract feature vectors for each file
3. Fit PCA to learn the quality manifold
4. Store exemplars for NCD comparison
5. Calibrate thresholds from the score distribution
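The steps above can be sketched on synthetic stand-in data as follows; the component count and threshold percentiles are illustrative assumptions, and feature extraction itself is elided:

```python
import numpy as np

rng = np.random.default_rng(0)

# Steps 1-2: a corpus of extracted 69-dim feature vectors, one per file
# (random stand-ins here; real vectors come from the extractor above).
corpus = rng.normal(size=(500, 69))

# Step 3: fit PCA -- the quality manifold is the span of the top axes.
mean = corpus.mean(axis=0)
_, _, Vt = np.linalg.svd(corpus - mean, full_matrices=False)
components = Vt[:8]

# Step 4: store exemplars for later NCD comparison (here: raw vectors).
exemplars = corpus[:32]

# Step 5: calibrate thresholds from the training drift distribution,
# e.g. flag the worst 10% for rejection (percentiles are assumptions).
centered = corpus - mean
drifts = np.linalg.norm(centered - centered @ components.T @ components, axis=1)
accept_at, reject_at = np.percentile(drifts, [30, 90])
```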
See `eh train` for details on training custom models.