ChatGPT Can Now See, Hear, And Speak To Users: Breaking Boundaries

On Monday, Sam Altman’s artificial intelligence startup OpenAI rolled out new upgrades for ChatGPT, its widely popular generative AI chatbot, that will allow it to interact verbally with users and respond to queries that include images.

According to a blog post from the company, the updated voice and image capabilities will offer “a new, more intuitive type of interface” that enables users to strike up a conversation in their own voice with the large language model (LLM) or show the AI what they are talking about.

The latest upgrade marks a milestone in OpenAI’s path towards developing an artificial general intelligence model that is able to perceive and process information in multiple modalities, such as text, speech, and images.

ChatGPT Can Converse With Users In Human-Like Voice And Also Recognize Images

OpenAI provided video demonstrations showcasing three different use cases of ChatGPT’s voice and image capabilities. In the first scenario, the chatbot was prompted to tell a children’s bedtime story about a “super-duper sunflower hedgehog” named Larry. ChatGPT went on to describe the story’s hedgehog protagonist and gave more details about its home and friends.

In the second example, a picture of a bicycle was uploaded to the ChatGPT smartphone app with a prompt requesting help to lower the bike seat. The AI responded with a step-by-step guide on how to get it done that included recommended tools, similar pictures uploaded by other users, and text inputs.

The company also described situations where users could upload pictures of what’s in their refrigerators or pantries and ChatGPT would suggest meals that can be prepared using the available ingredients. Users can also upload pictures of landmarks to the ChatGPT app and the AI will strike up conversations by providing historical facts.

Spotify Is Using OpenAI’s Text-To-Speech Model To Translate Podcasts

OpenAI has implemented a new voice technology powered by a novel text-to-speech model that is capable of recreating human voices from just a few seconds of real speech. Because the developers are aware of the potential risks posed by the technology, which could be used by bad actors to impersonate public figures or commit fraud, the feature will only be made available for voice chat.

Users can choose from five realistic voices that were created by OpenAI with the help of professional voice actors.

Furthermore, Swedish music streaming platform Spotify is incorporating ChatGPT’s voice service to pilot its Voice Translation feature that will help podcasters convert their podcasts into additional languages in their own voices.

OpenAI Implements Image Processing Model DALL-E Into ChatGPT

ChatGPT can now recognize images, a capability that OpenAI says is powered by multimodal versions of its GPT-3.5 and GPT-4 models. Separately, the company’s text-to-image model DALL-E is being integrated into the chatbot following the announcement of DALL-E 3 last week. The new version of the image-generating AI can create high-fidelity pictures from text prompts while being able to understand complex contexts and concepts expressed in natural language.

The new features are part of the company’s GPT-Vision program, which is focused on expanding the capabilities of its Generative Pre-trained Transformer (GPT) AI into a complete virtual assistant that could function as a key accessibility tool for users.

OpenAI is working with Be My Eyes, a mobile app designed to assist blind and low-vision people, to understand the uses and limitations of ChatGPT’s speech and vision capabilities.

Microsoft Collaborates With OpenAI To Launch ‘Copilot’ AI Assistant For Windows

Microsoft – the largest backer of OpenAI – is integrating a number of the company’s advanced generative AI features into its own products and services. The tech giant had previously implemented ChatGPT and DALL-E into the Bing search engine.

At its annual Surface launch event held last week, the Redmond-based firm announced it was embedding Copilot, Microsoft’s own generative AI assistant developed in collaboration with OpenAI, into the Windows 11 operating system.

Copilot can be accessed from anywhere across the Windows desktop and will enhance users’ experience in the Paint app and Microsoft 365 suite applications such as Word, Excel, and PowerPoint. The AI-powered Windows 11 debuted on September 26.

Microsoft made a $10 billion investment in OpenAI back in January with the intention of leading the AI assistant race. The Redmond-based tech giant is in the ring with Google, which is working on its Bard AI, and Amazon, whose recently upgraded Alexa now has generative AI abilities.
