Grok 4.1 'Sees Real Time'

Grok-4 has achieved an IQ score of 126 in the latest TrackingAI benchmark, placing it second and remarkably close to Google's newly released Gemini 3 Pro.

The result highlights the model's continuing strength more than four months after its launch. Data cited by XFreeze on X indicates that Grok-4 still outperforms nearly every top model currently available.

The latest figures have intensified industry interest in how established models are keeping pace with rapid AI advancements.

TrackingAI Benchmark Shows Tight Competition

TrackingAI, a widely watched benchmark used to assess reasoning performance across frontier models, has become a key reference point for evaluating AI capability.

Its scoring system uses a human-comparable IQ scale to measure analytical reasoning, consistency and problem-solving across a wide range of test categories.

The 126 IQ score places Grok-4 firmly in the upper tier of modern AI systems and indicates a high level of cognitive performance in tasks designed to mirror human logic.
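
To put the 126 figure in context, here is a minimal sketch that converts an IQ score into an approximate human percentile. It assumes the conventional IQ norming of mean 100 and standard deviation 15; the article does not state how TrackingAI norms its scale, so this is an assumption rather than the benchmark's own method. The function name iq_to_percentile is illustrative.

```python
from statistics import NormalDist

# Assumption: conventional human IQ norming (mean 100, SD 15).
# The article does not confirm that TrackingAI uses exactly this scale.
IQ_MEAN = 100.0
IQ_SD = 15.0


def iq_to_percentile(iq: float) -> float:
    """Return the fraction of the reference population scoring at or below `iq`."""
    return NormalDist(mu=IQ_MEAN, sigma=IQ_SD).cdf(iq)


grok4_score = 126  # score reported in the article
z = (grok4_score - IQ_MEAN) / IQ_SD  # distance from the mean in standard deviations
percentile = iq_to_percentile(grok4_score)

print(f"z-score: {z:.2f}, percentile: {percentile:.1%}")
```

Under that assumption, a score of 126 sits a little under two standard deviations above the mean, which corresponds to roughly the 95th to 96th percentile of the human reference population.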

Gemini 3 Pro currently holds a marginal lead in the benchmark, but the narrow gap suggests a more competitive landscape than expected.

The TrackingAI dataset shows that Grok-4 is positioned closely enough that small variations in future evaluations could shift rankings.

Some commentary on AI benchmarking suggests that IQ-style metrics offer a simpler way to compare model reasoning performance, even though the underlying assessments remain highly technical.

Grok-4's Consistency Stands Out Months After Launch

Elon Musk also highlighted the benchmark results on X, resharing the XFreeze post and writing 'Not bad for now' in response to Grok-4's showing.

What makes the result notable for researchers and developers is the timing. Grok-4 was released over four months ago, yet its performance continues to match or exceed many of the newest models entering the market.

This stability has become a point of interest for organisations that rely on predictable performance in long-term deployments.

Grok-4 continues to place near the top of publicly reported TrackingAI results, demonstrating consistent performance months after its launch.

As AI developers race to release upgraded systems, the endurance shown by Grok-4 suggests that its architecture and training approach have allowed it to retain relevance far longer than some expected.

Grok-4 Versus Gemini 3 Pro in Performance Race

The comparison between Grok-4 and Gemini 3 Pro has become one of the most closely watched matchups in the AI field this year.

While Gemini 3 Pro is a newer model designed to push reasoning and general intelligence forward, the TrackingAI figures indicate that Grok-4 is delivering results that are close enough to influence perceptions of leadership in the sector.

For enterprise users and researchers, the narrow performance gap between the two models means more choice and stronger competitive pressure.

It also shows that benchmark leadership is no longer guaranteed by release date alone, with older models demonstrating an ability to keep pace under certain evaluation conditions.

Wider Implications for AI Research and Model Development

The strong performance also has wider implications for the industry. As IQ-style benchmarks become more prominent in capability discussions, models like Grok-4 illustrate how reasoning benchmarks can remain relevant indicators of practical usefulness.

High scores in these evaluations often correspond with improved handling of complex tasks such as data interpretation, structured problem-solving and multi-step reasoning.

With developers placing greater emphasis on reasoning and continuous improvement, the close margin between Grok-4 and Gemini 3 Pro signals an increasingly competitive environment across major AI research organisations.

The TrackingAI benchmark results show that the race to lead in general intelligence remains intense, with Grok-4's continued strength reinforcing that top-tier performance is no longer dominated solely by the newest releases.