Research Statement
When you solve "2 + 2," you don't stop and deliberate - the answer just arrives. But when you encounter a multi-step word problem, something shifts: you slow down, break it apart, reason carefully. Cognitive scientists call this the difference between System 1 (fast, heuristic) and System 2 (slow, deliberate) thinking. Modern transformer architectures, however, treat every input identically - routing a trivially simple problem through the same depth of computation as a complex one, with no mechanism to adapt reasoning depth to problem difficulty.
This paper asks: can we build that adaptivity into the architecture itself? We introduce THMI (Tiered Hierarchical Multi-Path Intelligence), a transformer that processes every input simultaneously through three parallel reasoning paths - a shallow heuristic path, a medium analytical path, and a deep deliberative path - each operating in its own representational space. Rather than telling the model which path to trust, a confidence-weighted ensemble lets the architecture learn to self-allocate, naturally leaning on deeper reasoning for harder problems without any explicit complexity supervision. The result is a system that doesn't just perform well - it does so in an interpretable, cognitively motivated way.
Architecture
The THMI architecture consists of:
- Frozen FLAN-T5 Encoder (109M params) - generates shared representation h_shared (B × T × 768)
- Multi-Head Context Projection - splits into three parallel paths
- Shallow Path (System 1) - 1 layer, 4 heads, 256d, fast/heuristic
- Medium Path (System 2a) - 2 layers, 6 heads, 768d, analytical
- Deep Path (System 2b) - 4 layers, 8 heads, 768d, deliberative
- Confidence-Weighted Ensemble - w_i = c_i / (c_1 + c_2 + c_3)
- Trainable T5 Decoder - outputs arithmetic expressions, not direct answers
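The ensemble rule above, w_i = c_i / (c_1 + c_2 + c_3), is a convex combination of the path outputs. A minimal sketch (the outputs and confidence values below are illustrative placeholders, not the trained model's values):

```python
def ensemble(outputs, confidences):
    """Blend path outputs by normalized confidence weights w_i = c_i / sum(c)."""
    total = sum(confidences)
    weights = [c / total for c in confidences]
    dim = len(outputs[0])
    blended = [
        sum(w * out[j] for w, out in zip(weights, outputs))
        for j in range(dim)
    ]
    return blended, weights

# Toy 3-dim outputs from the shallow, medium, and deep paths:
outs = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
confs = [0.2, 0.3, 0.5]   # deeper path most confident, e.g. on a hard input
blended, weights = ensemble(outs, confs)
# weights sum to 1; blended is the convex combination of path outputs
```

Because the weights sum to one, the blended output always stays inside the convex hull of the three path outputs, so no single path can be drowned out unless its confidence collapses.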
Simplified Explanation
Imagine you're a teacher grading math tests. For "what is 5 + 3?", you barely glance at it. But for a multi-step word problem, you slow down and work through it carefully. Standard AI models don't do this - they apply the same computation to every input. THMI fixes that.
Every math problem gets processed by three parallel reasoning paths at once. A confidence-weighted ensemble blends all three outputs. The model learns to assign higher confidence to deeper paths on harder problems automatically - no supervision needed.
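The three-way split can be sketched with the shapes from the architecture section. The random projections below are illustrative stand-ins for the actual transformer paths, which are small attention stacks, not linear maps:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared encoder output for a batch of B=2 problems, T=5 tokens, 768-d
# (the frozen FLAN-T5 encoder in the paper; a random stand-in here).
h_shared = rng.normal(size=(2, 5, 768))

# Hypothetical projections into each path's representational space.
proj = {
    "shallow": rng.normal(size=(768, 256)) / 768**0.5,  # System 1: 256-d
    "medium":  rng.normal(size=(768, 768)) / 768**0.5,  # System 2a: 768-d
    "deep":    rng.normal(size=(768, 768)) / 768**0.5,  # System 2b: 768-d
}

# All three paths see the same shared representation in parallel.
path_states = {name: h_shared @ W for name, W in proj.items()}
for name, z in path_states.items():
    print(name, z.shape)
```

The key point the sketch makes concrete: the paths run on one shared encoding rather than re-encoding the input, so the extra cost of three paths is confined to the (small) path networks themselves.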
Key Results
Performance Comparison
| Model | Validation Accuracy | 5-Fold CV Accuracy (mean ± std) |
|---|---|---|
| FLAN-T5 Baseline | 91.06% | 92.79% ± 0.68% |
| THMI (no memory) | 97.97% | 98.15% ± 0.43% |
| THMI (with memory) | 98.58% | 98.79% ± 0.39% |
Parameter Efficiency
| Model | Params | Accuracy |
|---|---|---|
| PMB (sequential) | 315M | 81.10% |
| MoD (adaptive routing) | 223M | 79.88% |
| THMI | 290M | 98.58% |
Key Findings
- Complexity-adaptive behavior emerges without supervision (Spearman's ρ = +0.594, p < 10⁻⁴⁰)
- Three paths disagree on 88% of examples - diversity drives the +8.4% ensemble gain
- Deep path rescues 60% of cases when both other paths fail (12.8× uplift)
- Episodic memory gives a +8.95% accuracy boost at epoch 1 but converges to the same final accuracy
- High-complexity inputs show higher lateral attention entropy and shallower energy decay slopes
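The complexity-adaptivity finding above is a rank correlation between problem complexity and reliance on deeper paths. For tie-free data, Spearman's ρ is just the Pearson correlation of the ranks; a minimal sketch with hypothetical per-example values (not the paper's data):

```python
def spearman_rho(x, y):
    """Spearman rank correlation for tie-free data: Pearson correlation of ranks."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical per-example data: problem complexity vs. weight on the deep path.
complexity = [1, 2, 3, 4, 5]
deep_weight = [0.20, 0.25, 0.40, 0.35, 0.50]
rho = spearman_rho(complexity, deep_weight)  # positive: deeper path favored on harder inputs
```

A positive ρ of the kind reported (+0.594) means harder problems systematically receive more deep-path weight, even though the monotone trend need not be perfectly strict example by example.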