On Tuesday, Google introduced Gemini 2.5, a new set of AI reasoning models that take a moment to “think” before responding to a question.
The first release in the family is Gemini 2.5 Pro Experimental, a multimodal reasoning model that the company says is its most advanced to date. It is available starting Tuesday in Google AI Studio, the company's developer platform, and in the Gemini app for subscribers to Gemini Advanced, Google's $20-a-month AI plan.
Going forward, Google says all of its new AI models will have reasoning capabilities built in.
Ever since OpenAI introduced the first AI reasoning model, o1, in September 2024, the tech industry has raced to build models that match or surpass its performance. Today, Anthropic, DeepSeek, Google, and xAI all offer reasoning models, which use additional computing power and time to fact-check and reason through problems before responding.
AI models have reached new levels of performance in math and coding tasks thanks to reasoning techniques. Many in the tech industry believe that reasoning models will play a crucial role in AI agents—autonomous systems capable of performing tasks with minimal human involvement. These models, however, come at a higher cost.
Google has experimented with AI reasoning models before, launching a "thinking" version of Gemini in December. But Gemini 2.5 represents the company's most serious attempt yet to best OpenAI's "o" series of models.
Google claims that Gemini 2.5 Pro outperforms its previous frontier AI models, as well as several leading rival models, on a number of benchmarks. In particular, the company says it designed Gemini 2.5 to excel at creating visually compelling web apps and at agentic coding applications.
According to Google, Gemini 2.5 Pro achieved a score of 68.6% on the code-editing evaluation Aider Polyglot, surpassing leading AI models from OpenAI, Anthropic, and the Chinese AI lab DeepSeek.
On SWE-bench Verified, however, a test that measures software development abilities, Gemini 2.5 Pro scored 63.8%. That beats OpenAI's o3-mini and DeepSeek's R1, but falls short of Anthropic's Claude 3.7 Sonnet, which scored 70.3%.
Google reports that Gemini 2.5 Pro achieves a score of 18.8% on Humanity’s Last Exam, a multimodal test comprising thousands of crowdsourced questions across mathematics, humanities, and natural sciences, outperforming most competing flagship models.
At launch, Gemini 2.5 Pro ships with a context window of 1 million tokens, meaning the model can process roughly 750,000 words in one go. That is longer than the entire "The Lord of the Rings" book series. Soon, Google says, Gemini 2.5 Pro will support input twice that long.
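The 750,000-word figure comes from a common rule of thumb that one token corresponds to roughly 0.75 English words; the exact ratio varies by tokenizer and text. A quick back-of-the-envelope check, with that ratio as an assumption:

```python
# Rough tokens-to-words conversion. The 0.75 words-per-token ratio is a
# widely used rule of thumb, not an exact property of Gemini's tokenizer.
WORDS_PER_TOKEN = 0.75

def estimate_words(context_tokens: int) -> int:
    """Approximate word capacity of a given token context window."""
    return int(context_tokens * WORDS_PER_TOKEN)

print(estimate_words(1_000_000))  # current 1M-token window -> 750000 words
print(estimate_words(2_000_000))  # planned doubled window -> 1500000 words
```

Doubling the window to 2 million tokens would, by the same estimate, accommodate about 1.5 million words.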
Google has not yet released API pricing for Gemini 2.5 Pro; the company says it will share more in the coming weeks.