How AI systems are tested, measured, and compared against each other
It’s World Quantum Day, so why wouldn’t quantum stocks be ripping? Shares of and its quantum computing peers surged Tuesday as IonQ disclosed new contracts and a new set of benchmarks, investors pou...
The flood of new AI models with increasingly advanced “reasoning” capabilities is forcing the AI industry to abandon early benchmark tests and invent new ones to test for many skills. To watch the evo...
If you want to know who’s up and who’s down in the AI model world, look no further than LMArena’s leaderboard. The startup has just raised a $150 million series A fundraising round, with a valuation ...
has announced a breakthrough that addresses a key challenge in developing superconducting gate-based quantum computers: how to gather a ton of quantum bits (or qubits) in the same place while keeping ...
How do you measure what an AI model can do? You ask it to spell strawberry, make a video of Will Smith eating spaghetti, or do some basic math. But, once you’ve exhausted all of the obvious tests, you...
One of the biggest fears fueling the public’s apprehension toward AI is that the technology will eventually take their jobs. We’ve already seen evidence that some roles like entry-level software deve...
This weekend, Meta surprised everyone and released two flavors (“Maverick” medium and “Scout” small) of its highly anticipated Llama 4 AI model. Llama 4’s release is a big deal, as the company has been hy...
In December, OpenAI CEO Sam Altman announced that its new o3 “reasoning” model had, for the first time, achieved a winning score on the ARC-AGI benchmark, a notoriously difficult test that had stumped...
A consensus is emerging in AI circles that the way forward involves models that use “chain of reasoning” to get better performance, at the expense of costlier computing resources. This process involve...
OpenAI can terminate its $13 billion partnership with Microsoft when the startup achieves artificial general intelligence, a squishy, debated term that generally means when AI can perform any tasks a human do...
What we’re talking about this week on the podcast: Mark Zuckerberg and Elon Musk teaming up against OpenAI, Databricks raising a massive Series J, and Google’s quantum-computing breakthrough sending s...
AI companies are eager to show how much “smarter” and more capable their latest large language models are. To highlight these improvements, companies point to scores on widely used standardized tests ...
When a car manufacturer develops a new vehicle, it delivers one to the National Highway Traffic Safety Administration to be tested. The NHTSA drives the vehicle into a head-on collision with a concret...
Bad news for the “emotion AI system” tasked with determining how you’re feeling in everything from job interviews to the classroom: Scientists can’t tell what anyone is feeling from their facial expre...
The state of AI: As AI has gone mainstream, companies haven’t been shy in deploying the technology, applying it to manual and repetitive tasks and even citing it as a reason for mass layoffs. But ho...