New ORCA results show Gemini leading in practical math, but no AI matches the consistency of a simple calculator.
KRAKOW, POLAND, March 3, 2026 /EINPresswire.com/ — The results are in for the second ORCA (Omni Research on Calculation in AI) Benchmark, and the leaderboard looks very different than it did two months ago. Gemini 3 Flash has surged to the top, becoming the first model to solve nearly three-quarters of real-world math and logic problems correctly.
While the industry often focuses on academic tests, the ORCA Benchmark uses 500 practical questions drawn from the kind of “messy” math people actually deal with every day. In this latest run, Gemini 3 Flash hit an accuracy rate of 72.8%, a significant jump from its previous performance. Meanwhile, ChatGPT-5.2 and DeepSeek V3.2 posted modest, steady gains, while Grok 4.1 saw its scores slip.
The “Calculator” Problem
Despite the high scores, the study highlights a lingering frustration for AI users: inconsistency. Unlike a standard calculator, which gives the same answer every time, these AI models are probabilistic. They “predict” answers rather than calculating them through fixed rules.
The ORCA team tracked this using a new “instability metric.” It turns out that even when models are wrong, they aren’t consistently wrong.
– ChatGPT-5.2 changed its answer on 65% of persistent errors.
– Gemini 3 Flash proved more “stubborn,” changing only 46% of the time.
– DeepSeek was the most erratic, shifting its response 69% of the time.
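For readers curious how such an instability figure can be derived, here is a minimal sketch in Python. It assumes (the article does not specify ORCA’s actual procedure) that “persistent errors” are questions a model answered incorrectly in both of two runs, and that instability is the share of those where the wrong answer also changed between runs. The function name and toy data are illustrative, not from ORCA.

```python
# Hypothetical sketch of an "instability metric": among questions a model
# got wrong in BOTH runs (persistent errors), how often did its wrong
# answer change between the runs?

def instability(run_a: dict, run_b: dict, correct: dict) -> float:
    """Share of persistent errors where the model's answer changed."""
    persistent = [q for q in correct
                  if run_a[q] != correct[q] and run_b[q] != correct[q]]
    if not persistent:
        return 0.0
    changed = sum(1 for q in persistent if run_a[q] != run_b[q])
    return changed / len(persistent)

# Toy example: three questions the model never answers correctly.
correct = {"q1": 4, "q2": 10, "q3": 7}
run_a   = {"q1": 5, "q2": 12, "q3": 6}
run_b   = {"q1": 5, "q2": 11, "q3": 8}  # q2 and q3 changed answers

print(instability(run_a, run_b, correct))  # 2 of 3 persistent errors changed
```

Under this reading, a higher percentage means the model is more erratic, while a lower one (like Gemini’s 46%) means it repeats the same wrong answer more often.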
“A calculator is predictable. Ask it the same question today or next year, and the answer stays the same,” says Dawid Siuda, researcher at ORCA. “AI doesn’t work that way. These systems are predicting the next likely word based on patterns. Mathematically, it’s possible for a model to get a question right today and wrong tomorrow.”
Winners and Losers by Category
The progress wasn’t even across the board. Gemini saw massive gains in biology, chemistry, and physics, but actually dropped slightly in engineering. DeepSeek V3.2, now out of its “alpha” phase, saw its biology scores skyrocket from 11% to 44%. On the flip side, Grok 4.1 struggled, losing ground in health, sport, and statistics.
What This Means for Users
The data shows that while AIs are getting better at rounding numbers and formatting results, they still trip up on core arithmetic. Calculation errors now account for 39.8% of all mistakes made across the models tested.
The takeaway? AI is a powerful assistant, but it’s not a replacement for a human eye or a calculator.
About ORCA Benchmark
The ORCA Benchmark (Omni Research on Calculation in AI) is an initiative by Omni Calculator designed to provide a genuine assessment of the mathematical capabilities of today’s leading AI chatbots. For our second iteration, we tested four prominent models: ChatGPT-5.2 (OpenAI), Gemini 3 Flash (Google), Grok-4.1 (xAI), and DeepSeek V3.2.
Our methodology prioritizes accessibility: we test only models that offer a free tier to the public. This is why Anthropic’s Claude 4.5 Sonnet was not retested this round (it has not been updated since our first report), and why DeepSeek V3.2 was included again—while the name remains the same, the model has transitioned from an alpha version to a stable release, resulting in a performance jump of over 3 percentage points. By avoiding paid-tier or private prototypes, ORCA provides a genuine assessment of the mathematical tools available to the average user right now.
For the full report, visit https://www.omnicalculator.com/reports/orca-ai-benchmark-2026-update
Reyhaneh Mansouri
Omni Calculator sp. z o.o.
+48 730 061 124
email us here
Visit us on social media:
LinkedIn
Legal Disclaimer:
EIN Presswire provides this news content “as is” without warranty of any kind. We do not accept any responsibility or liability
for the accuracy, content, images, videos, licenses, completeness, legality, or reliability of the information contained in this
article. If you have any complaints or copyright issues related to this article, kindly contact the author above.