Google's latest AI tool substitutes visual cues for text

“Whisk,” Google’s newest artificial intelligence product, lets users upload photographs and receive a merged, AI-generated image without typing any text to describe what they want.

Users can upload images representing a subject, a scene, and a style, and Whisk will merge them into a single image.

Google said in a blog post that Whisk is a “creative tool” for instant inspiration rather than a “traditional image editor.” In essence, Whisk is meant to be a fun AI feature, not a sophisticated professional tool.

Even as critics warn that the lack of guardrails on AI development poses risks to humanity, major tech firms like Google and OpenAI are racing to release consumer products that demonstrate uses for the buzzy new technology.

Since OpenAI first introduced Dall-E, a text-to-image generator, in 2021, AI-generated art has taken over social media and become a popular theme in consumer products. Building on the popular idea of text-to-image generators, Google’s Whisk is an image-to-image generator.

By changing their inputs and combining the categories, users of Whisk can “remix” the final image to create images of items such as stickers, enamel pins, or plush toys. Although text is not required to create an image, users can add it if they want to steer particular details.

According to a statement from Thomas Iljic, director of product management at Google Labs, “Whisk is designed to allow users to remix a subject, scene, and style in new and creative ways, offering rapid visual exploration instead of pixel-perfect edits.”

Google’s Whisk is built on generative AI from DeepMind, the AI research lab that Google acquired in 2014.

Whisk works by combining Imagen 3, DeepMind’s most recent text-to-image generator, with Gemini, Google’s foundational AI product, which debuted in December 2023.

Gemini writes a caption for each photo a user uploads, and that caption is then fed into Imagen 3. Because the process captures the “essence” of the subject rather than an exact replica, it allows for remixing, but the finished image may deviate from the original input.

For instance, Google said in a blog post that the generated image may differ from the uploaded images in height, hairstyle, or skin tone.

Google previously faced criticism in February, when Gemini’s text-to-image generator produced historically inaccurate images.

According to the company, Whisk is still in the early stages of development and is initially available to US users as a website on Google Labs.

OpenAI also recently unveiled Sora, a text-to-video generator, underscoring the competition over consumer products.

Dan Ives, managing director and senior equity analyst at Wedbush Securities, told CNN that Whisk is just another of Google’s “flex the muscles” moments in the AI and tech battle.

“DeepMind is a key asset for Google,” Ives said, adding that it is part of Google’s “treasure chest” of new products for 2025, which also includes a new Android operating system developed in partnership with Samsung and Qualcomm.