The company said GPT-4o, where the o stands for omni, is “a step towards much more natural human-computer interaction”.
The new AI model accepts any combination of text, audio, and image as input. It can then generate any combination of text, audio, and image outputs.
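As a rough illustration of what a combined text-and-image input can look like, here is a minimal sketch of a request body in the style of OpenAI's Chat Completions API; the helper function name and the image URL are illustrative placeholders, not details from the article.

```python
# Sketch: building a mixed text-and-image request payload for a multimodal
# model such as GPT-4o. The message format follows OpenAI's Chat Completions
# convention; the image URL below is a placeholder.

def build_multimodal_request(prompt: str, image_url: str) -> dict:
    """Combine a text prompt and an image reference in one request body."""
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

request = build_multimodal_request(
    "What is shown in this image?",
    "https://example.com/photo.png",  # placeholder URL
)
```

In practice the resulting dictionary would be sent to the model endpoint (for example via OpenAI's official SDK), which returns the generated text, audio, or image output.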
It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds. GPT-4o also offers improvements to vision and audio understanding compared to existing models.
OpenAI demoed the new model in a livestream, showing capabilities including two GPT-4os interacting and singing, and GPT-4o being used for interview prep.
GPT-4o’s text and image capabilities began rolling out yesterday in ChatGPT, OpenAI’s chatbot.
The company said it is making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits. A new version of Voice Mode with GPT-4o in alpha will be available within ChatGPT Plus in the coming weeks.
Over the last year, vendors selling through the Channel have launched numerous offerings using ChatGPT. That includes PromptVoice’s AI integration for the provisioning of audio recordings and Logitech’s AI prompt builder.