Elon Musk's AI company is releasing a new version of its chatbot, Grok
Following user claims of seamless performance across devices, Elon Musk praised Grok Vision, the new xAI feature launched in April that enables the chatbot to interpret images using computer vision. AFP News

Elon Musk recently praised Grok's new vision capabilities, following claims from users on social media that the AI is performing well on both iOS and Android devices. This positive reception has sparked significant interest, but it also raises a crucial question.

Beyond the smooth performance and the celebrity endorsement, what is the actual mechanism powering Grok's operations? We explore the technology behind xAI's latest innovation.

Grok Vision: Smooth Performance and High Praise

In a reply to an X post from user X Freeze (@amXFreeze), Musk celebrated Grok Vision, claiming, 'Grok Vision can understand pretty much anything you point the camera at'. The billionaire was responding to a post that encouraged people to 'Try Grok Vision on both iOS and Android', in which X Freeze wrote, 'Just point your phone and ask... It's incredibly good'.

The X user went on to add, 'Grok analyses what you see, explains it, translates text, and even finds products and answers all your questions.... It's smart and fast and super easy to use'.

This praise from the public and the billionaire himself brings us to the core of the matter: what exactly is Grok Vision, and how does this technology function?

xAI, the artificial intelligence company founded by Elon Musk, introduced a new update to its Grok chatbot in April, giving it the ability to 'see' the world. This feature enables the chatbot to process and understand visual information, marking a major advance in how people interact with AI.

Ebby Amir, a member of xAI's technical staff, announced the news in an X post, stating: 'Introducing Grok Vision, multilingual audio, and real-time search in Voice Mode. Available now'.

The Technology Powering Grok's Visual Leap

The upgraded Grok chatbot now uses computer vision technology to analyse images and videos and provide context-aware responses. For example, users can upload a photo of a product, and Grok can identify it, suggest potential uses, or even recommend similar items. This capability connects the text-based nature of AI with real-world applications, making Grok more useful and intuitive.
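For readers curious about what such a 'context-aware' image answer looks like in practice, the underlying request pattern is simple: a photo and a text prompt are sent together to a vision-capable chat model, which returns a written answer. The sketch below illustrates that flow against xAI's OpenAI-compatible API; the base URL, model name and file name are assumptions for illustration, not a confirmed description of how the Grok app works internally.

```python
# Minimal sketch of an image-understanding request, assuming xAI's
# OpenAI-compatible chat API. The endpoint, model name and file name
# are illustrative and may differ from xAI's current documentation.
import base64

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_XAI_API_KEY",       # assumption: key issued by xAI
    base_url="https://api.x.ai/v1",   # assumption: xAI's OpenAI-compatible endpoint
)

# Encode a local product photo so it can be sent inline as a data URL.
with open("product_photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="grok-2-vision-latest",     # assumption: a vision-capable Grok model
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
                {
                    "type": "text",
                    "text": "Identify this product and suggest similar items.",
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The consumer-facing Grok Vision feature presumably layers live camera capture on top of a loop like this, but that detail is not something xAI has spelled out publicly.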

The new vision feature opens up possibilities in numerous sectors, including e-commerce, education, and healthcare, by improving user experiences through visual understanding. From helping to diagnose medical conditions by analysing images to assisting with design projects, Grok's new ability is poised to transform how AI is woven into our daily routines.

This update from xAI positions the company as a formidable contender in the artificial intelligence arena, directly challenging industry leaders such as OpenAI and Google.

Grok 5: Musk's Next Major AI Move?

Regarding Grok's capabilities, Elon Musk recently made a bold prediction that the upcoming Grok 5 model could achieve artificial general intelligence (AGI).

Responding to an X post that celebrated Grok 4's record-breaking results on AGI benchmarks — where it beat its own previous top score in open program synthesis — the entrepreneur commented: 'I now think @xAI has a chance of reaching AGI with @Grok 5'. He then added, 'Never thought that before'.

The original X user, @amXFreeze, went on to claim: 'No other model even comes close and has not passed Grok 4 previous raw performance. Currently, Grok is more closer to AGI than any other AI models'.

Musk's high praise and user endorsements suggest that Grok Vision is a significant technical step. However, as the chatbot continues to evolve and chase the elusive goal of AGI, understanding how the technology actually works matters as much as the smooth experience it delivers.