ChatGPT can now “see, hear and speak”. In a major upgrade from OpenAI, the chatbot gains voice chat and the ability to discuss images.
“We are beginning to roll out new voice and image capabilities in ChatGPT. They offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you’re talking about,” OpenAI said in its official blog post.
OpenAI also said the update includes voice chat powered by a new text-to-speech model capable of generating human-like voices, alongside the ability to discuss images. The image capability is known as GPT-4 with Vision.
The OpenAI API’s new GPT-4 with Vision capability enables the GPT-4 model to analyze images and answer questions about them. In the API, this functionality is sometimes referred to as GPT-4V or gpt-4-vision-preview. Previously, language model systems were restricted to a single input modality: text. That limited the domains in which models such as GPT-4 could be applied. GPT-4 with Vision is now available to all developers with GPT-4 access, via the gpt-4-vision-preview model and the Chat Completions API, which now accepts image inputs.
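To make the shape of such a request concrete, here is a minimal sketch of a Chat Completions payload that mixes text and an image. The prompt and image URL are placeholders of our own choosing, and the actual network call (shown commented out) assumes the `openai` Python SDK and an API key in the environment.

```python
# Sketch of a GPT-4 with Vision request via the Chat Completions API.
# The image is supplied as one part of a multi-part user message.

def build_vision_request(prompt: str, image_url: str) -> dict:
    """Assemble a Chat Completions payload combining text and an image."""
    return {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 300,
    }

payload = build_vision_request(
    "What is shown in this image?",
    "https://example.com/photo.jpg",  # placeholder image URL
)

# With the `openai` SDK (v1+), the request would be sent like this:
# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# response = client.chat.completions.create(**payload)
# print(response.choices[0].message.content)
```

The key difference from a text-only request is the `content` field: instead of a plain string, it is a list of parts, letting a single user turn interleave text with one or more images.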
Microsoft, aiming to lead the AI assistant race, has invested more than $10 billion in OpenAI. Meanwhile, Microsoft 365 Chat applies OpenAI’s language models to automate complex work tasks.
“OpenAI’s goal is to build AGI that is safe and beneficial,” OpenAI wrote in its announcement. “We believe in making our tools available gradually, which allows us to make improvements and refine risk mitigations over time while also preparing everyone for more powerful systems in the future.”