OpenAI releases GPT-5.4 in Pro and Thinking editions


On Thursday, OpenAI released GPT-5.4, a new baseline model the company describes as “the most capable and efficient frontier model we have for professional work.” In addition to the standard version, GPT-5.4 is also available as a reasoning model (GPT-5.4 Thinking) and in a higher-performance variant (GPT-5.4 Pro).

The API version of the model will be available with context windows of up to 1 million tokens, the largest context window ever available from OpenAI.

OpenAI also emphasized improving token efficiency, saying that GPT-5.4 was able to solve the same problems using significantly fewer tokens than its predecessor.

The new model comes with significantly improved benchmark scores, including record scores on the OSWorld-Verified and WebArena computer-use benchmarks. The new model also scored a record 83 percent on GDPval, OpenAI’s evaluation of cognitive work tasks.

GPT-5.4 also took the lead on Mercor’s APEX agent benchmark, designed to test professional skills in law and finance, according to a statement issued by Mercor CEO Brendan Foody.

“(GPT-5.4) excels at creating long-term deliverables such as slide decks, financial models and legal analysis,” Foody said in the statement, “delivering peak performance while working faster and at a lower cost than competing frontier models.”

GPT-5.4 continues the company’s efforts to reduce hallucinations and factual errors. OpenAI said the new model was 33% less likely to make errors on individual prompts than GPT-5.2, and its responses overall were 18% less likely to contain errors.


As part of the launch, OpenAI reworked how the GPT-5.4 API handles tool calls, introducing a new system called Tool Search. Previously, system prompts had to include definitions for every available tool when the model was called, a process that could consume a large number of tokens as the number of available tools grew. The new system lets the model search for tool definitions as needed, resulting in faster and cheaper requests for systems with many tools available.
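The idea can be illustrated with a minimal sketch. Everything below is hypothetical and does not reflect OpenAI’s actual API: a client keeps a local registry of tool definitions and attaches only the ones relevant to the task, so prompt size stays roughly constant as the registry grows.

```python
# Hypothetical sketch of the idea behind Tool Search. The registry,
# function names, and definition format here are illustrative only,
# not OpenAI's real API surface.

# A registry of tool definitions; in a real system this could hold
# hundreds of entries, far too many to inline into every prompt.
TOOL_REGISTRY = {
    "get_weather": {"description": "Fetch current weather for a city"},
    "send_invoice": {"description": "Create and send an invoice to a client"},
    "query_database": {"description": "Run a read-only SQL query"},
}

def search_tools(query: str) -> dict:
    """Return only the tool definitions whose description matches the query."""
    q = query.lower()
    return {
        name: spec
        for name, spec in TOOL_REGISTRY.items()
        if q in spec["description"].lower()
    }

# Only the matching definition is attached to the request, instead of
# the entire registry.
relevant = search_tools("weather")
print(sorted(relevant))
```

The token savings come from the request carrying one matching definition rather than the full registry, which is what makes calls faster and cheaper as the tool count climbs.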

OpenAI also included a new safety evaluation that tests the model’s chain of thought, the running commentary a reasoning model produces to show its thinking process through multi-step tasks. AI safety researchers have long worried that reasoning models can misrepresent their chain of thought, and testing suggests this can happen under the right circumstances.

A new OpenAI evaluation found that this kind of deception is less likely in the Thinking version of GPT-5.4, “suggesting that the model lacks the ability to hide its reasoning and that CoT monitoring remains an effective safety tool.”
