How well can AI models solve (and create) rebus puzzles?

GPT-4, when prompted with: “create me a rebus puzzle for ‘Visual Word Puzzle’”

What does it mean for an AI to be creative?

Last year, I wrote an article about measuring creativity in Large Language Models (LLMs) using several word-based creativity tests.

Since then, AI has developed rapidly, and some models can now process and create both text and images. These models, sometimes referred to as “Multimodal Large Language Models” (MLLMs), are extremely powerful and have advanced abilities to understand complex textual and visual inputs.

In this article, I explore one way to measure creativity in two popular MLLMs: OpenAI’s GPT-4 Vision and Google’s Gemini Pro Vision. I use rebus puzzles, which are word puzzles that require combining both visual and language cues to solve.

Creativity is extremely multi-faceted and difficult to define as a single trait. Therefore, in this article, I aim not to measure creativity in general, but to evaluate one very specific aspect of creativity.

Note [modified from my earlier article]: These experiments aim not to measure how creative AI models are, but rather to measure the level of creative process present in the models' generations. I am not claiming that AI models possess creative thinking in the same way humans do. Rather, I aim to show how the models respond to particular measures of creative processes.

A rebus puzzle is a pictorial representation of a common word or phrase. Solving one often requires combining visual and spatial cues. Below are six examples of rebus puzzles (answers are at the end of the article).

Example rebus puzzles from Normative Data for 84 UK English Rebus Puzzles (CC BY). One example is shown for each of the six “types” categorized in the paper.