Model Reviews
Frontier Models Score Below 50% on ITBench-AA for Enterprise IT Tasks
Artificial Analysis and IBM release ITBench-AA, a rigorous benchmark revealing that even top-tier LLMs like GPT-4o and Claude 3.5 Sonnet struggle with complex, agentic enterprise IT workflows.