Testing & research
Independent testing, reproducible benchmarks, and continuous research.
Methodology
- Reproducibility – fixed seeds where the provider supports them, and a fixed sampling temperature.
- Multiple runs – median (p50) and p95 across ≥5 runs.
- Token accounting – input/output tokens & effective cost.
- Transparent datasets – public prompts where licensing permits.
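As a minimal sketch of how the run statistics and token accounting above could be computed: the function names, the nearest-rank percentile method, and the per-million-token price parameters are all assumptions made for illustration, not a description of the actual test harness. "Effective cost" is assumed here to mean input and output tokens weighted by their respective prices.

```python
import math
import statistics


def percentile(values, q):
    """Nearest-rank percentile: the smallest sample with at least q% of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_runs(measurements, min_runs=5):
    """Return the median (p50) and p95 of one metric, refusing to report on fewer than min_runs runs."""
    if len(measurements) < min_runs:
        raise ValueError(f"need at least {min_runs} runs, got {len(measurements)}")
    return {
        "runs": len(measurements),
        "p50": statistics.median(measurements),
        "p95": percentile(measurements, 95),
    }


def effective_cost(input_tokens, output_tokens,
                   input_price_per_million, output_price_per_million):
    """Cost of one request, pricing input and output tokens separately (hypothetical price units)."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Example with made-up numbers: five measurements of one metric, one request's token counts.
summary = summarize_runs([1.0, 2.0, 3.0, 4.0, 10.0])
cost = effective_cost(1000, 500, 3.0, 15.0)
```

With only five runs, a nearest-rank p95 is simply the largest observation, so the p95 figure becomes more informative as the run count grows beyond the minimum.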
Release cadence
We re-run affected tests when models or prices change, and version results with change logs.
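One way to decide which tests count as "affected" is to compare two snapshots of the model catalog and re-run only the tests tied to entries that changed. The sketch below assumes a hypothetical snapshot format (model name mapped to a dict of version and price fields) and a hypothetical mapping from model to test names; neither is taken from the actual pipeline.

```python
def changed_models(previous, current):
    """Return sorted names of models that were added, removed, or whose entry differs."""
    names = set(previous) | set(current)
    return sorted(name for name in names if previous.get(name) != current.get(name))


def affected_tests(tests_by_model, changed):
    """Return the sorted, de-duplicated test names attached to any changed model."""
    return sorted({test for model in changed for test in tests_by_model.get(model, [])})


# Example with made-up model names and prices: only model-a's input price changed.
previous = {
    "model-a": {"version": "1", "input_price": 3.0},
    "model-b": {"version": "2", "input_price": 1.0},
}
current = {
    "model-a": {"version": "1", "input_price": 2.5},
    "model-b": {"version": "2", "input_price": 1.0},
}
tests_by_model = {"model-a": ["latency", "cost"], "model-b": ["latency"]}

changed = changed_models(previous, current)
to_rerun = affected_tests(tests_by_model, changed)
```

The list of changed models can double as the basis for a change-log entry, since it records exactly which entries triggered the re-run.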
Community input
We incorporate developer feedback and real-world failure cases into future test suites.