Research Statement
When you solve "2 + 2," you don't stop and deliberate - the answer just arrives. But when you encounter a multi-step word problem, something shifts: you slow down, break it apart, reason carefully. Cognitive scientists call this the difference between System 1 (fast, heuristic) and System 2 (slow, deliberate) thinking. Modern transformer architectures, however, treat every input identically - routing a trivially simple problem through the same depth of computation as a complex one, with no mechanism to adapt reasoning depth to problem difficulty.
This paper asks: can we build that adaptivity into the architecture itself? We introduce THMI (Tiered Hierarchical Multi-Path Intelligence), a transformer that processes every input simultaneously through three parallel reasoning paths - a shallow heuristic path, a medium analytical path, and a deep deliberative path - each operating in its own representational space. Rather than telling the model which path to trust, a confidence-weighted ensemble lets the architecture learn to self-allocate, naturally leaning on deeper reasoning for harder problems without any explicit complexity supervision. The result is a system that doesn't just perform well - it does so in an interpretable, cognitively motivated way.
Architecture
The THMI architecture consists of:
- Frozen FLAN-T5 Encoder (109M params) - generates shared representation h_shared (B × T × 768)
- Multi-Head Context Projection - splits into three parallel paths
- Shallow Path (System 1) - 1 layer, 4 heads, 256d, fast/heuristic
- Medium Path (System 2a) - 2 layers, 6 heads, 768d, analytical
- Deep Path (System 2b) - 4 layers, 8 heads, 768d, deliberative
- Confidence-Weighted Ensemble - w_i = c_i / (c_1 + c_2 + c_3)
- Trainable T5 Decoder - outputs arithmetic expressions, not direct answers
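The ensemble rule above, w_i = c_i / (c_1 + c_2 + c_3), is a convex combination of the path outputs. A minimal sketch (the outputs and confidence values below are illustrative placeholders, not the trained model's values):

```python
def ensemble(outputs, confidences):
    """Blend path outputs by normalized confidence weights w_i = c_i / sum(c)."""
    total = sum(confidences)
    weights = [c / total for c in confidences]
    dim = len(outputs[0])
    blended = [
        sum(w * out[j] for w, out in zip(weights, outputs))
        for j in range(dim)
    ]
    return blended, weights

# Toy 3-dim outputs from the shallow, medium, and deep paths:
outs = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
confs = [0.2, 0.3, 0.5]   # deeper path most confident, e.g. on a hard input
blended, weights = ensemble(outs, confs)
# weights sum to 1; blended is the convex combination of path outputs
```

Because the weights sum to one, the blended output always stays inside the convex hull of the three path outputs, so no single path can be drowned out unless its confidence collapses.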
Simplified Explanation
Imagine you're a teacher grading math tests. For "what is 5 + 3?", you barely glance at it. But for a multi-step word problem, you slow down and work through it carefully. Standard AI models don't do this - they apply the same computation to every input. THMI fixes that.
Every math problem gets processed by three parallel reasoning paths at once. A confidence-weighted ensemble blends all three outputs. The model learns to assign higher confidence to deeper paths on harder problems automatically - no supervision needed.
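The three-way split can be sketched with the shapes from the architecture section. The random projections below are illustrative stand-ins for the actual transformer paths, which are small attention stacks, not linear maps:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared encoder output for a batch of B=2 problems, T=5 tokens, 768-d
# (the frozen FLAN-T5 encoder in the paper; a random stand-in here).
h_shared = rng.normal(size=(2, 5, 768))

# Hypothetical projections into each path's representational space.
proj = {
    "shallow": rng.normal(size=(768, 256)) / 768**0.5,  # System 1: 256-d
    "medium":  rng.normal(size=(768, 768)) / 768**0.5,  # System 2a: 768-d
    "deep":    rng.normal(size=(768, 768)) / 768**0.5,  # System 2b: 768-d
}

# All three paths see the same shared representation in parallel.
path_states = {name: h_shared @ W for name, W in proj.items()}
for name, z in path_states.items():
    print(name, z.shape)
```

The key point the sketch makes concrete: the paths run on one shared encoding rather than re-encoding the input, so the extra cost of three paths is confined to the (small) path networks themselves.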
Key Results
Performance Comparison
| Model | Validation Accuracy | 5-Fold CV Accuracy (mean ± std) |
|---|---|---|
| FLAN-T5 Baseline | 91.06% | 92.79% ± 0.68% |
| THMI (no memory) | 97.97% | 98.15% ± 0.43% |
| THMI (with memory) | 98.58% | 98.79% ± 0.39% |
Parameter Efficiency
| Model | Params | Accuracy |
|---|---|---|
| PMB (sequential) | 315M | 81.10% |
| MoD (adaptive routing) | 223M | 79.88% |
| THMI | 290M | 98.58% |
Key Findings
- Complexity-adaptive behavior emerges without supervision (Spearman's ρ = +0.594, p < 10⁻⁴⁰)
- Three paths disagree on 88% of examples - diversity drives the +8.4% ensemble gain
- Deep path rescues 60% of cases when both other paths fail (12.8× uplift)
- Episodic memory gives a +8.95% accuracy boost at epoch 1 but converges to the same final accuracy
- High-complexity inputs show higher lateral attention entropy and shallower energy decay slopes
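The complexity-adaptivity finding above is a rank correlation between problem complexity and reliance on deeper paths. For tie-free data, Spearman's ρ is just the Pearson correlation of the ranks; a minimal sketch with hypothetical per-example values (not the paper's data):

```python
def spearman_rho(x, y):
    """Spearman rank correlation for tie-free data: Pearson correlation of ranks."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical per-example data: problem complexity vs. weight on the deep path.
complexity = [1, 2, 3, 4, 5]
deep_weight = [0.20, 0.25, 0.40, 0.35, 0.50]
rho = spearman_rho(complexity, deep_weight)  # positive: deeper path favored on harder inputs
```

A positive ρ of the kind reported (+0.594) means harder problems systematically receive more deep-path weight, even though the monotone trend need not be perfectly strict example by example.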