Elon Musk Hails Grok Vision After Netizens Claim It Runs Smoothly Across iOS and Android — But How Does It Actually Work?
Musk now boldly predicts that the upcoming Grok 5 model could achieve AGI

Elon Musk recently praised Grok's new vision capabilities, following widespread claims from users on social media that the AI is performing well on both Apple and Android devices. This positive reception has sparked significant interest, but it also raises a crucial question.
Beyond the smooth performance and the celebrity endorsement, what is the actual mechanism powering Grok's operations? We explore the technology behind xAI's latest innovation.
Grok Vision: Smooth Performance and High Praise
In a reply to an X post from user X Freeze (@amXFreeze), Musk celebrated Grok Vision, writing, 'Grok Vision can understand pretty much anything you point the camera at'. The original post encouraged people to 'Try Grok Vision on both iOS and Android', with X Freeze adding, 'Just point your phone and ask... It's incredibly good'.
The X user went on to add, 'Grok analyses what you see, explains it, translates text, and even finds products and answers all your questions.... It's smart and fast and super easy to use'.
This praise from the public and the billionaire himself brings us to the core of the matter: what exactly is Grok Vision, and how does this technology function?
xAI, the artificial intelligence company founded by Elon Musk, introduced a new update to its Grok chatbot in April, giving it the ability to 'see' the world. This feature enables the chatbot to process and understand visual information, marking a major advance in how people interact with AI.
Grok Vision can understand pretty much anything you point the camera at https://t.co/9IIpuhc32Z
— Elon Musk (@elonmusk) September 26, 2025
Ebby Amir, a member of xAI's technical staff, announced the news in an X post, stating: 'Introducing Grok Vision, multilingual audio, and real-time search in Voice Mode. Available now'.
The Technology Powering Grok's Visual Leap
The upgraded Grok chatbot now uses computer vision to analyse images and videos and to give responses that take their context into account. For example, users can upload a photo of a product, and Grok can identify it, suggest potential uses, or even recommend similar items. This capability links a text-based chatbot to what users see in the real world, making Grok more useful and intuitive.
The new vision feature opens up possibilities in numerous sectors, including e-commerce, education, and healthcare, by improving user experiences through visual understanding. From helping to diagnose medical conditions by analysing images to assisting with design projects, Grok's new ability is poised to transform how AI is woven into our daily routines.
This update from xAI positions the company as a formidable contender in the artificial intelligence arena, directly challenging industry leaders such as OpenAI and Google.
Grok 5: Musk's Next Major AI Move?
Looking beyond Grok Vision, Musk recently made a bold prediction that the upcoming Grok 5 model could achieve artificial general intelligence (AGI).
Responding to an X post that celebrated Grok 4's record-breaking results on AGI benchmarks — where it beat its own previous top score in open program synthesis — the entrepreneur commented: 'I now think @xAI has a chance of reaching AGI with @Grok 5'. He then added, 'Never thought that before'.
I now think @xAI has a chance of reaching AGI with @Grok 5. Never thought that before. https://t.co/FaBUYegl3D
— Elon Musk (@elonmusk) September 17, 2025
The original X user, @amXFreeze, continued by stating that 'No other model even comes close and has not passed Grok 4 previous raw performance. Currently, Grok is more closer to AGI than any other AI models'.
Musk's high praise and user endorsements suggest that Grok Vision is a significant technical step. However, as the chatbot continues to evolve and chase the elusive goal of AGI, the focus must remain on the mechanics, because understanding how this technology works matters more than the smooth experience alone.
© Copyright IBTimes 2025. All rights reserved.