On Monday, Box Inc. (NYSE:BOX) CEO Aaron Levie said that advancements in artificial intelligence, from model training to enterprise deployment, depend fundamentally on how well systems can be evaluated and measured.
AI Evals Seen As Core Driver Of Model Progress
In a post on X, Levie argued that "almost all AI model and agent progress is downstream from evals."
He added that improvements in open-weight models, post-training for specific domains, and agentic AI systems all depend on rigorous evaluation methods.
"Agent improvements in the applied AI layer is all about evals," he wrote, emphasizing that enterprise AI deployments capable of augmenting work also rely heavily on testing frameworks.
Levie further stated, "It’s all evals," underscoring his view that evaluation systems are central to the development of reliable AI agents.
AI Scaling Stalls Without Reliable Evals, Experts Say
The Box CEO's comments echoed a broader industry discussion highlighted by Garrett Lord, who said companies are struggling to move AI beyond pilot programs because they lack consistent ways to measure performance.
"Everyone is coming to the same realization: if you want production-quality agents that can actually do the work, it starts with evals," Lord wrote.
He added that firms often cannot "quantify how accurate their AI programs are," making scaling difficult.
AI Boom Fuels Meta And Nvidia Moves
Investors have increasingly prioritized AI exposure over profitability, with unprofitable small-cap companies outperforming profitable peers as enthusiasm for AI-linked growth has strengthened.
Earlier, Meta Platforms Inc. (NASDAQ:META) introduced "AI Mode" in Facebook Search, using AI to generate answers from public content and expanding its push into AI-powered tools.
The feature includes photo and video editing and positions Meta against traditional search engines.
Nvidia Corp. (NASDAQ:NVDA) CEO Jensen Huang unveiled an expanded AI infrastructure strategy, highlighting AI factories, full-stack systems, and agentic AI designed to generate business value, as the company moved beyond chips into broader computing platforms.
Disclaimer: This content was partially produced with the help of AI tools and was reviewed and published by Benzinga editors.
Photo courtesy: Alexander56891 from Shutterstock