AI Evaluation & Benchmarks

2026-04-14T16:13:56.554Z By: Nate Becker

Quantum stocks rip as IonQ discloses new contracts, hits benchmark milestones

It’s World Quantum Day, so why wouldn’t quantum stocks be ripping? Shares of IonQ and its quantum computing peers surged Tuesday as IonQ disclosed new contracts and a new set of benchmarks, investors pou...

https://sherwood.news/tech/quantum-stocks-rip-as-ionq-discloses-new-contracts-hits-benchmark-milestones/

2026-03-26T19:10:40.573Z By: Jon Keegan

The toughest AI benchmark just got a whole lot tougher

The flood of new AI models with increasingly advanced “reasoning” capabilities is forcing the AI industry to abandon early benchmark tests and invent new ones to test for many skills. To watch the evo...

https://sherwood.news/tech/the-toughest-ai-benchmark-just-got-a-whole-lot-tougher/

2026-01-06T20:55:49.495Z By: Jon Keegan

AI leaderboard maker LMArena hits $1.7 billion valuation

If you want to know who’s up and who’s down in the AI model world, look no further than LMArena’s leaderboard. The startup has just raised a $150 million series A fundraising round, with a valuation ...

https://sherwood.news/tech/ai-leaderboard-maker-lmarena-hits-usd1-7-billion-valuation/

2026-01-06T12:00:13.609Z By: Luke Kawa

D-Wave Quantum touts tech breakthrough that lets gate models scale

D-Wave Quantum has announced a breakthrough that addresses a key challenge in developing superconducting gate-based quantum computers: how to gather a ton of quantum bits (or qubits) in the same place while keeping ...

https://sherwood.news/markets/d-wave-quantum-touts-tech-breakthrough-that-lets-gate-models-scale/

2025-11-19T16:48:09.608Z By: David Crowther

Gemini 3 is insanely good at visual reasoning... and running a vending machine

How do you measure what an AI model can do? You ask it to spell strawberry, make a video of Will Smith eating spaghetti, or do some basic math. But, once you’ve exhausted all of the obvious tests, you...

https://sherwood.news/tech/gemini-3-is-insanely-good-at-visual-reasoning-and-running-a-vending-machine/

2025-09-29T15:59:08.621Z By: Jon Keegan

How well can top AI models do these jobs?

One of the biggest fears fueling the public’s apprehension toward AI is that the technology will eventually take their jobs. We’ve already seen evidence that some roles like entry-level software deve...

https://sherwood.news/tech/how-well-can-top-ai-models-do-these-jobs/

2025-04-08T15:23:02.576Z By: Jon Keegan

Meta scrambling to defend its AI after Llama 4 benchmark bungle

This weekend, Meta surprised everyone and released two flavors (“Maverick” medium and “Scout” small) of its highly anticipated Llama 4 AI model. Llama 4’s release is a big deal, as the company has been hy...

https://sherwood.news/tech/meta-scrambling-to-defend-its-ai-after-llama-4-benchmark-bungle/

2025-04-02T21:32:04.840Z By: Jon Keegan

OpenAI’s record-breaking test score might have cost $30,000 per puzzle

In December, OpenAI CEO Sam Altman announced that its new o3 “reasoning” model had, for the first time, achieved a winning score on the ARC-AGI benchmark, a notoriously difficult test that had stumped...

https://sherwood.news/tech/openais-record-breaking-test-score-might-have-cost-usd30-000-per-puzzle/

2025-03-10T17:06:14.089Z By: Jon Keegan

Quit the yapping: New AI technique could cut costs 90% by saying less

A consensus is emerging in AI circles that the way forward involves models that use “chain of reasoning” to get better performance, at the expense of costlier computing resources. This process involve...

https://sherwood.news/tech/quit-the-yapping-new-ai-technique-could-cut-costs-90-by-saying-less/

2024-12-26T17:24:23.505Z By: Rani Molla

Microsoft and OpenAI have agreed on a definition of artificial general intelligence

OpenAI can terminate its $13 billion partnership with Microsoft when the startup achieves artificial general intelligence, a squishy, debated term that generally means when AI can perform any tasks a human do...

https://sherwood.news/tech/microsoft-and-openai-have-agreed-on-a-definition-of-artificial-general/

2024-12-20T16:59:02.965Z By: Nia Warfield

“Snacks Mix”: Zuck and Musk play nice, quantum goes to the moon, and our hot takes for 2025

What we’re talking about this week on the podcast: Mark Zuckerberg and Elon Musk teaming up against OpenAI, Databricks raising a massive Series J, and Google’s quantum-computing breakthrough sending s...

https://sherwood.news/business/snacks-mix-zuckerberg-musk-quantum-computing-2025-predictions/

2024-11-11T18:10:24.482Z By: Jon Keegan

If AI models can ace every test, it’s actually not a good thing

AI companies are eager to show how much “smarter” and more capable their latest large language models are. To highlight these improvements, companies point to scores on widely used standardized tests ...

https://sherwood.news/tech/ai-models-improved-benchmark-tests-needed/

2024-10-15T15:43:53.529Z By: Jon Keegan

The crash test dummies for new AI models

When a car manufacturer develops a new vehicle, it delivers one to the National Highway Traffic Safety Administration to be tested. The NHTSA drives the vehicle into a head-on collision with a concret...

https://sherwood.news/tech/ai-regulation-red-teaming-model-safety-checks/

2024-05-16T15:30:09.683Z By: Rani Molla

Scientists can’t figure out your emotions and AI can’t either

Bad news for the “emotion AI system” tasked with determining how you’re feeling in everything from job interviews to the classroom: Scientists can’t tell what anyone is feeling from their facial expre...

https://sherwood.news/tech/gen-ai-emotional-intelligence/

2024-04-17T15:46:05.688Z By: David Crowther

How do AI models stack up vs. humans on standardized benchmarks?

The state of AI: As AI has gone mainstream, companies haven’t been shy in deploying the technology, applying it to manual and repetitive tasks and even citing it as a reason for mass layoffs. But ho...

https://sherwood.news/tech/how-do-ai-models-stack-up-vs-humans-on-standardized-benchmarks/
