Google Whisk is a new way to create AI images using image prompts. Here’s how to try it

By James on Dec 22, 2024

Google Whisk uses images as input instead of text-based directions
It is built on Google’s generative AI model Imagen 3
The experimental tool is free to try for users in the US

Google’s new AI tool makes it easier to create and remix your visual concepts. Instead of asking you to describe what you have in mind, Whisk lets you enter three image prompts: one for subject, one for scene, and one for style. Whisk takes care of the rest, making it a more intuitive way to experiment with different ideas.

While most of the best AI image generators require you to write a detailed prompt, Whisk takes care of that behind the scenes. When you place images into the web-based Whisk interface for inspiration, Google’s Gemini model automatically analyzes them and writes a detailed caption for each. These are then fed into the Imagen 3 model to create an appropriate image.

For example, you can add an image of a car as the subject and a photo of a rural landscape as the scene, then add a watercolor painting as the style to see what Whisk creates. Press the button and you'll get a few images based on your input.

From here it's easy to remix the images. The interface lets you add text-based details to customize the results, swap in different source images, or roll the dice if you need inspiration. New results appear in pairs in the feed, making it an intuitive way to generate ideas. You can also refine an image by displaying its underlying text prompt and adding more details.


While Whisk is designed to eliminate the need for text prompts, Google lets you edit the written prompts, since the results don't always match the source material.

In a blog post about the experimental tool, Google explains that Whisk "captures the essence of your subject, not an exact replica." It's only as effective as Gemini's analysis of the images you submit. That analysis is generally very impressive, but it can also miss the mark: you might expect Whisk to pick up one detail from an image, only for it to focus on another.

The post further explains: "Since Whisk only extracts a few key features from your image, it may produce images that differ from your expectations. For example, the generated subject may have a different height, weight, hairstyle or skin color. We understand that these features may be crucial to your project and that Whisk may miss the mark, so we let you view and edit the underlying prompts at any time."

Even with these shortcomings, Whisk is an interesting application of Google's existing AI tools. The underlying generative models are the same ones you would use when prompting Gemini through its text interface. By relying on image input, however, Whisk offers visual creators a more accessible and intuitive way to play with their ideas.

Drawing on early feedback from digital creatives, Google describes Whisk as "a new type of creative tool" intended for "fast visual exploration, not pixel-perfect editing."

How to try Google Whisk

Google Whisk is currently only available to US users. If you’re based there, you can try it out via your web browser at labs.google/whisk.

The experimental tool is completely free to use. Data from your use of Whisk is fed back to Google to help refine and develop future AI products.