GPT-5 Launch Demo Plagued With Catastrophically Dumb Errors

OpenAI's attempt to show off its latest GPT-5 model's awesome performance stats produced wildly embarrassing gaffes.

OpenAI’s GPT-5 is finally here and already powering ChatGPT, but it hasn’t made a great first impression.

In a livestream dedicated to the release, OpenAI tried to show off its newest large language model, which CEO Sam Altman called a “significant step along the path to AGI” — but instead turned heads with some catastrophically dumb errors.

Across several examples, bar graphs meant to show off GPT-5’s awesome performance benchmarks, while professional-looking at a glance, turned out to be horribly inaccurate nonsense upon closer inspection.

The gaffes were flagged on social media and highlighted by The Verge. The most egregious example is a bar graph comparing GPT-5’s coding benchmark scores to those of older models. Somehow, the bar for GPT-5’s score of 52.8 percent accuracy is nearly twice as tall as the bar for the o3 model’s score of 69.1 percent. Even more bafflingly, the 69.1 percent bar is the exact same size as another bar representing 30.8 percent for GPT-4o. Make it make sense!
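To be clear about what got botched: in an honest bar chart, each bar’s height is proportional to the value it represents, so a 69.1 percent bar should tower over a 52.8 percent one. Here’s a minimal Python sketch of a correctly scaled version, using matplotlib and the figures cited above (the labels and styling are our own, purely illustrative):

```python
import matplotlib.pyplot as plt

# Accuracy figures cited in the article
models = ["GPT-4o", "o3", "GPT-5"]
scores = [30.8, 69.1, 52.8]  # percent

fig, ax = plt.subplots()
ax.bar(models, scores)
ax.set_ylim(0, 100)  # a zero-based axis keeps bar heights honest
ax.set_ylabel("Accuracy (%)")
ax.set_title("Coding benchmark scores, drawn to scale")

# Label each bar with its value so the heights can be sanity-checked
for i, score in enumerate(scores):
    ax.text(i, score + 1, f"{score}%", ha="center")

plt.show()
```

Drawn this way, o3’s bar is visibly the tallest, which is exactly what OpenAI’s slide failed to show.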

OpenAI hasn’t confirmed whether it used GPT-5 to generate the graphs — and at this point, it has every reason not to — but it’s an incredibly embarrassing mistake from a company that’s valued somewhere in the region of half a trillion smackeroos.

It’s also a little poetic. Some research suggests that newer models could actually be getting dumber in key ways, hallucinating more frequently than earlier versions. One study even found that the longer these new reasoning models “think,” the more their performance deteriorates. Other research implicates the AI slop increasingly poisoning the training data these models learn from. Circling back to that bar graph, you have OpenAI visually spinning GPT-5’s lower score of 52.8 percent as beating its predecessor’s 69.1 percent.

Altman, playing it cool, tried to laugh off the blunder.

“[W]ow a mega chart screwup from us earlier,” he tweeted, in his typical lower-case patois. “wen GPT-6?!”

OpenAI corrected the charts in its blog post, but the originals are still there in the livestream.

Human error may or may not be to blame for the charts, but following GPT-5’s release, users were quick to expose how error-prone its image- and diagram-generating capabilities remain. One user asked ChatGPT to draw a map of two cities in Virginia with their neighborhoods labeled, prompting it to return names that were complete gobbledygook.

And in what should’ve been a layup for GPT-5, Ed Zitron of the “Where’s Your Ed At?” newsletter found that the AI couldn’t even nail a simple map of the US. Ever think of visiting “West Wigina,” “Delsware,” “Fiorata,” or “Rhoder land”? Or maybe “Tonnessee” and “Mississipo?”

The irony is that OpenAI bragged back in March that an update to its previous GPT-4o model meant that ChatGPT could now excel at generating text in images.

“As you can tell now it’s very good at text,” one of the example generated images read. “Look at all this accurate text!”

Sounds like they might’ve spoken too soon. Or maybe AI models really are going backwards.

More on OpenAI: GPT-5 Users Say It Seriously Sucks

