Sep 25, 2023

New ChatGPT features: pictures and voice commands

OpenAI is introducing features that let users interact with ChatGPT by speaking aloud or uploading a picture, in addition to the existing text input. The features will reach paying ChatGPT users over the next two weeks and will be extended to other users shortly thereafter.

Voice Interaction: A Familiar Touch

The new voice interaction feature is designed to feel familiar, much like talking to Alexa or Google Assistant. Users tap a button and speak their question; ChatGPT converts the speech to text, processes it, and responds in a synthesized voice. OpenAI's Whisper model handles the speech-to-text conversion, and the improved underlying technology is expected to produce more coherent responses.
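The voice flow described above (speech to text, text to reply, reply to speech) can be sketched as three chained steps. This is only an illustration of the pipeline's shape, not OpenAI's implementation: the function names, the `VoiceTurn` type, and the `fake_*` stand-ins are all hypothetical, and no real speech or language model is called.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """Result of one spoken exchange (hypothetical type for this sketch)."""
    transcript: str     # text recognized from the user's speech
    reply_text: str     # text answer produced by the chat step
    reply_audio: bytes  # synthesized audio of the answer


def run_voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> VoiceTurn:
    """Run one exchange: speech -> text -> reply -> speech."""
    transcript = transcribe(audio)        # speech-to-text step
    reply_text = respond(transcript)      # chat model step
    reply_audio = synthesize(reply_text)  # text-to-speech step
    return VoiceTurn(transcript, reply_text, reply_audio)


# Hypothetical stand-ins so the sketch runs without any real model.
def fake_transcribe(audio: bytes) -> str:
    # Pretend the audio bytes are already UTF-8 text.
    return audio.decode("utf-8")


def fake_respond(prompt: str) -> str:
    # Echo the question instead of generating a real answer.
    return f"You asked: {prompt}"


def fake_synthesize(text: str) -> bytes:
    # Pretend the encoded text is the audio waveform.
    return text.encode("utf-8")


turn = run_voice_turn(
    b"what is the weather", fake_transcribe, fake_respond, fake_synthesize
)
print(turn.reply_text)
```

Passing the three steps in as callables keeps the sketch independent of any particular service; a real system would substitute actual speech recognition, chat, and speech synthesis components.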

Human-Like Audio Synthesis

OpenAI is also rolling out a new text-to-speech model that can generate human-like audio from text and a few seconds of sample speech. Users will be able to choose from five different voices for ChatGPT. The technology has many potential applications: OpenAI is collaborating with Spotify, for example, to translate podcasts into other languages while preserving the podcaster's original voice.

However, the ability to create synthetic voices poses potential risks, including impersonation and fraud. OpenAI is addressing these concerns by restricting the broad use of this model, limiting it to specific use cases and partnerships.

Image Search: A Step Towards Multimodal Interaction

The image search feature lets users upload a picture, which ChatGPT analyzes before responding to the query. The feature is reminiscent of Google Lens and is complemented by the app’s drawing tool, which lets users mark up the image to clarify what they are asking about. Because ChatGPT is conversational, users can refine their queries over several turns and get more accurate responses.

Citing privacy and accuracy concerns, OpenAI has limited ChatGPT’s ability to analyze and make direct statements about people. As a result, the sci-fi vision of an AI that identifies individuals on sight will not be realized here.

Balancing Innovation and Responsibility

OpenAI continues to expand ChatGPT’s capabilities while working to mitigate the potential problems and downsides. The company is trying to balance innovation and responsibility by imposing deliberate limitations on the new models.

As ChatGPT evolves into a truly multimodal and versatile virtual assistant, the challenges of maintaining ethical guardrails will intensify. OpenAI remains committed to advancing the frontiers of AI technology while addressing the emerging ethical considerations and potential risks.

Conclusion

The integration of voice interaction and image search features marks a significant stride in OpenAI’s journey to make ChatGPT a more interactive and user-friendly platform. While the advancements bring forth exciting possibilities, they also underscore the importance of responsible innovation in the rapidly evolving landscape of artificial intelligence.