Performance metrics across our model variants
AAII scores are estimated and based on our internal evaluation framework
Ultra Preview: AAII Score 65.1
Pro: AAII Score 61.0
Mini: AAII Score 45.0
Detailed scores for our Deca 2.5 model variants
Measured accuracy on 57 tasks
Ultra Preview: 87
Pro: 83-84
Mini: 82-83
Google-Proof Q&A accuracy
Ultra Preview: 88
Pro: 82-83
Mini: 74-75
Note: Scores for the Pro and Mini models are estimated ranges based on internal testing.
Our approach to evaluating model performance
We evaluate our models using a combination of direct testing and estimation based on internal benchmarks. The Artificial Analysis Intelligence Index (AAII) is our internal scoring system: it weights performance across multiple benchmarks and aggregates it into a single comparable score. AAII scores are estimates and should not be directly compared with scores from other benchmarking systems.
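As a rough illustration of what "weighting performance across multiple benchmarks into a single score" can look like, here is a minimal Python sketch. It is not the AAII formula: the equal weights, the benchmark keys (`accuracy_57_tasks`, `google_proof_qa`), the midpoint handling of estimated ranges, and the function names are all assumptions made for this example. It uses only the two benchmarks listed on this page, so its output will not match the AAII scores reported above.

```python
from typing import Dict, Union

Score = Union[int, float, str]


def to_point(score: Score) -> float:
    """Turn a score into one number; an 'a-b' range string becomes its midpoint."""
    if isinstance(score, (int, float)):
        return float(score)
    low, high = (float(part) for part in score.split("-"))
    return (low + high) / 2


def weighted_index(scores: Dict[str, Score], weights: Dict[str, float]) -> float:
    """Weighted mean of benchmark scores, normalised by the total weight used."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(to_point(scores[name]) * weights[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical equal weights; the real AAII weighting is not published here.
weights = {"accuracy_57_tasks": 0.5, "google_proof_qa": 0.5}

# Scores as reported on this page; Pro and Mini values are estimated ranges.
reported = {
    "Ultra Preview": {"accuracy_57_tasks": 87, "google_proof_qa": 88},
    "Pro": {"accuracy_57_tasks": "83-84", "google_proof_qa": "82-83"},
    "Mini": {"accuracy_57_tasks": "82-83", "google_proof_qa": "74-75"},
}

indices = {model: weighted_index(scores, weights) for model, scores in reported.items()}

for model, value in indices.items():
    print(f"{model}: {value:.1f}")
```

Collapsing a range to its midpoint is only one option; carrying the low and high ends through the same weighted mean would instead give an estimated range for the aggregate.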
The benchmarks and scores presented here are estimates and subject to change. Pro and Mini model scores are estimated from internal testing and comparative analysis. All benchmarks are still in progress, and final scores may differ.