Deca 2.5 Benchmarks

Performance metrics across our model variants

Deca 2.5 models show strong performance across benchmarks, summarized here by our Artificial Analysis Intelligence Index (AAII).

AAII scores are estimates based on our internal evaluation framework.

Model            AAII Score
Ultra Preview    65.1
Pro              61.0
Mini             45.0

Benchmark Performance

Detailed scores for our Deca 2.5 model variants

Benchmark   Ultra Preview   Pro     Mini    Description
MMLU        87              83-84   82-83   Measured accuracy on 57 tasks
GPQA        88              82-83   74-75   Google-Proof Q&A accuracy

Note: Scores for the Pro and Mini models are estimated ranges based on internal testing.
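Because some scores are single values and others are ranges, it can help to normalize them before comparing models. The following is a minimal Python sketch, assuming each score is written either as one number or as a `low-high` range. The names `ScoreRange` and `parse_score` are hypothetical and are not part of any Deca tooling, and treating a single number as a non-estimated score is an assumption made for illustration only.

```python
from typing import NamedTuple


class ScoreRange(NamedTuple):
    """A benchmark score stored as an inclusive (low, high) range."""
    low: float
    high: float

    @property
    def midpoint(self) -> float:
        # Centre of the range; equals the score itself for single values.
        return (self.low + self.high) / 2

    @property
    def is_estimate(self) -> bool:
        # Assumption: a range (low != high) marks an estimated score.
        return self.low != self.high


def parse_score(text: str) -> ScoreRange:
    """Parse a score such as '87' or '83-84' into a ScoreRange."""
    parts = [float(p) for p in text.strip().split("-")]
    if len(parts) == 1:
        return ScoreRange(parts[0], parts[0])
    if len(parts) == 2 and parts[0] <= parts[1]:
        return ScoreRange(parts[0], parts[1])
    raise ValueError(f"unrecognized score format: {text!r}")


# MMLU scores as listed in the table above.
mmlu = {"Ultra Preview": "87", "Pro": "83-84", "Mini": "82-83"}

for model, raw in mmlu.items():
    score = parse_score(raw)
    label = "estimated" if score.is_estimate else "single value"
    print(f"{model}: midpoint {score.midpoint} ({label})")
```

Using the midpoint is only one way to summarize a range; reporting the low and high bounds separately preserves more of the uncertainty.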

Performance Visualization

[Charts: AAII Scores Comparison; MMLU & GPQA Performance]

Benchmark Methodology

Our approach to evaluating model performance

Methodology Summary

We evaluate our models using a combination of direct testing and estimation based on internal benchmarks. The Artificial Analysis Intelligence Index (AAII) is our internal scoring system designed to aggregate performance across multiple benchmarks into a single comparable metric.

Key Points:

  • The Ultra Preview model has been directly tested on the MMLU and GPQA benchmarks
  • Pro and Mini scores are estimated based on internal testing and comparative analysis
  • AAII scores are our internal aggregation metric and are not directly comparable to other benchmark suites
  • All scores are subject to change as we continue testing and refining our models

About AAII:

The Artificial Analysis Intelligence Index (AAII) is our internal scoring system that weights performance across multiple benchmarks to produce a single comparable score. These scores are estimates and should not be directly compared to other benchmarking systems.
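The general idea of weighting several benchmark scores into one number can be sketched as a weighted average. The Python below is illustrative only: the benchmarks and weights that actually make up AAII are not stated here, so the function name `weighted_index` and the equal weights are assumptions, and the result does not reproduce the published AAII scores.

```python
def weighted_index(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted average of benchmark scores.

    Both dictionaries must use the same benchmark names as keys.
    """
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same benchmarks")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize by the total weight so the weights need not sum to 1.
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Ultra Preview scores from this page; the equal weights are hypothetical.
ultra_preview = {"MMLU": 87.0, "GPQA": 88.0}
hypothetical_weights = {"MMLU": 1.0, "GPQA": 1.0}

print(weighted_index(ultra_preview, hypothetical_weights))
```

With equal weights over only these two benchmarks the result differs from the AAII score of 65.1 listed for Ultra Preview, which suggests the real index draws on additional benchmarks, different weights, or both.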

Important Disclaimer

The benchmarks and scores presented here are estimates and subject to change. The AAII scores are our internal metrics and may not be directly comparable to other benchmarking systems. Pro and Mini model scores are estimated based on internal testing and comparative analysis. All benchmarks are still in progress, and final scores may differ.