Exploring the Vulnerabilities in Detecting AI-Generated Media

High-level illustration of how invisible watermarking works. Image by author.

Watermarks are all over the internet — and for obvious reasons. How else could you protect your art or photography from ending up in someone’s PowerPoint presentation without crediting the creator? The simplest way of addressing this problem is to create visible watermarks like the one below.

Example of a visible watermark. Image by author based on DALL-E 3.

The primary downside of this method is that it can compromise the art itself. No one would purchase and use the cat image with a watermark plastered across it. Therefore, while perceptible watermarks mitigate unauthorized copying, they can also discourage the target audience from buying the art in the first place.

In the music domain, perceptible watermarks are also common in free Hip-Hop beats. Beat producers often insert a voice sample with their brand name right before the first verse starts. This can serve either as a safeguard against illegal downloads or as a marketing tool when the beat is free-to-use.

An example of a Hip-Hop beat with an audible watermark at ~10 seconds. “Solitude” by Direct Beats.

For stock photos and Hip-Hop beats alike, a common practice is to place watermarks on the online previews and send the original product to clients after payment. However, this is also prone to misuse. As soon as the watermark-free product is purchased, it can be copied and reuploaded to the internet.

Protection of Intellectual Property

Imperceptible watermarks come with a distinct advantage: You can prove ownership over any digital copy of your product without negatively affecting product quality. It’s like a piece of paper with invisible ink on it. The paper is fully functional, but it carries a secret message that can be revealed at any time.

Example of an imperceptible watermark. Lemon juice can be used as invisible ink. It can be made visible through heat. Watch this video for a demonstration. Image by author.

With this technology, creators can encode any kind of message within their works. More importantly, as they have access to the decoder, they can always assert ownership over any digital copy of their original work. Another emerging opportunity for rights-holders is to use web crawlers to search the web and report any detected misuse.
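The invisible-ink analogy maps directly onto code. Below is a minimal sketch of one classic imperceptible watermarking technique, least-significant-bit (LSB) steganography, assuming an image is just a flat list of 8-bit grey values. Real systems from Meta or Google are far more sophisticated and robust; this only illustrates the encode/decode idea.

```python
# Minimal LSB watermarking sketch. Assumption: the "image" is a flat list
# of 8-bit pixel values. Changing only the least significant bit shifts
# each pixel by at most 1 out of 255, which is imperceptible to the eye.

def embed(pixels, message):
    """Hide each bit of `message` in the least significant bit of a pixel."""
    bits = [(byte >> i) & 1 for byte in message.encode() for i in range(8)]
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # overwrite only the LSB
    return out

def extract(pixels, length):
    """Read `length` bytes back out of the pixels' LSBs (the 'decoder')."""
    data = bytearray()
    for b in range(length):
        byte = 0
        for i in range(8):
            byte |= (pixels[b * 8 + i] & 1) << i
        data.append(byte)
    return data.decode()

image = [120] * 256                    # a flat grey stand-in "image"
marked = embed(image, "owned by me")
assert max(abs(a - b) for a, b in zip(image, marked)) <= 1  # imperceptible
print(extract(marked, 11))             # only the decoder owner can do this
```

Only whoever holds the decoder (here, the `extract` function and the message length) can reveal the hidden message, which is exactly what lets a creator assert ownership over any copy.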

Detection of AI-Generated Content

Another valuable application of imperceptible watermarks is the detection of AI-generated content. The advent of ChatGPT and similar AI tools has raised concerns about a potential flood of harmful AI-generated content on the internet. Tech companies like Meta and Google are presenting imperceptible watermarking systems as technological breakthroughs to mitigate this problem. Their tools can add watermarks to images or music without any noticeable change in quality.

In principle, this is a noteworthy development. With imperceptible watermarks, only the owner of the technology can decode and detect the presence of such watermarks. Using our example from above, Meta & Google own both the invisible ink and the means to reveal it. This allows them to accurately detect and filter content generated with their own tools on their platforms (e.g. Instagram, YouTube). Through collaborations, even independent platforms like X (formerly Twitter) could use this tech to limit AI-generated misinformation or other harmful content.

AI providers like Meta or Google are building their own watermarking systems to detect their own generated content — or sell others the ability to do so. Image by author.

Although imperceptible watermarks sound promising and are being promoted by big tech companies, they are far from perfect. In fact, many of these watermarks can be reliably removed using smart AI algorithms. But how can AI remove something that is imperceptible?

Removing Perceptible Watermarks

Let’s start by understanding how perceptible watermarks can be removed with AI. Let me propose a simple approach: Start by collecting hundreds of thousands of images from the web. Next, automatically add artificial watermarks to these images, making sure they resemble real watermarks and cover a wide variety of fonts, sizes, and styles. Finally, train an AI to remove watermarks by repeatedly showing it pairs of the same image, once with and once without the watermark.
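The data-generation step of this recipe can be sketched in a few lines. This is a hypothetical, stripped-down version that treats greyscale images as flat pixel lists and alpha-blends a fixed pattern as the "logo"; a real pipeline would render text watermarks in many fonts, sizes, and positions.

```python
# Sketch: generating (watermarked, clean) training pairs for a watermark-
# removal model. Assumption: images are flat lists of grey values 0-255;
# the "logo" overlay here is just a fixed stripe pattern.
import random

def add_watermark(image, overlay, alpha=0.4):
    """Alpha-blend a watermark overlay onto a clean image."""
    return [round((1 - alpha) * p + alpha * o) for p, o in zip(image, overlay)]

def make_pair(size=64):
    clean = [random.randint(0, 255) for _ in range(size)]      # stand-in photo
    overlay = [255 if i % 7 == 0 else 0 for i in range(size)]  # stand-in logo
    return add_watermark(clean, overlay), clean                # (input, target)

# A removal model would be trained on hundreds of thousands of such pairs,
# learning to map the watermarked input back to the clean target:
pairs = [make_pair() for _ in range(3)]
```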

While there are certainly more sophisticated approaches, this illustrates the ease with which watermarks can be removed if the AI is trained to recognize their appearance or sound. There are numerous tools online that allow me to easily remove the watermark from my cat image above:

Watermark removed using watermarkremover.io. In this example, both the image and the watermark are artificial. Please don’t use such tools to undermine the intellectual property of others.

Removing Imperceptible Watermarks

To apply the simple approach from above, you need to provide the AI with "before and after" examples. However, if the watermarks are imperceptible, how can we find these examples? Even worse, we can’t tell whether a watermark is present at all just by looking at an image or listening to a song.

To solve this problem, researchers had to get creative. Zhao et al., 2023 came up with a two-stage procedure.

  1. Destroy the watermark by adding random noise to the image
  2. Reconstruct the real image by using a denoising algorithm
Two-stage procedure for removing imperceptible watermarks on images. Adapted from Zhao et al., 2023.

This is brilliant because it challenges the intuition that, in order to remove a watermark, you must be able to detect it. The approach never locates the watermark. However, if the only goal is to remove it, simply destroying it by adding enough random noise to the image is quick and effective.

Of course, adding noise may break the watermark, but it also leaves you with a noisy picture. The most fascinating part is how the authors then reconstructed the original image from the noise. For that, they used AI diffusion models, such as the ones behind DALL-E 3 or Midjourney. These models generate images by iteratively turning random noise into realistic pictures.

How diffusion models generate images from noise. Taken from David Briand.

As a side effect, diffusion models are also incredibly effective denoising systems, both for images and for audio. By leveraging this technology, anyone can remove imperceptible watermarks using this exact two-step procedure.
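Here is a toy version of that two-stage attack. Assumptions: the watermark lives in the pixels' least significant bits (as in the LSB scheme sketched earlier), and a crude mean filter stands in for the diffusion-based denoiser the authors actually used.

```python
# Toy two-stage watermark attack in the spirit of Zhao et al. (2023).
# Stand-ins: LSB watermark instead of a real scheme, mean filter instead
# of a diffusion model.
import random

random.seed(0)
# A "watermarked image": alternating LSBs encode the watermark bits.
marked = [(120 & ~1) | (i % 2) for i in range(100)]

# Stage 1: destroy the watermark by adding random noise.
noisy = [min(255, max(0, p + random.randint(-8, 8))) for p in marked]

# Stage 2: reconstruct a clean-looking image (a 3-pixel mean filter here;
# a diffusion model would reconstruct far more faithfully).
denoised = []
for i in range(len(noisy)):
    window = noisy[max(0, i - 1):i + 2]
    denoised.append(round(sum(window) / len(window)))

lsb_intact = sum((d & 1) == (i % 2) for i, d in enumerate(denoised))
print(f"{lsb_intact}/100 watermark bits survive")  # roughly chance level
```

The decoded bits end up near chance level: the image still looks like the original, but the hidden message is gone.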

Photo by Anthony Tori on Unsplash

Can imperceptible watermarks be protected against such attacks? Yes and no. On the one hand, it seems likely that any imperceptible watermarking system invented so far can be broken by bad actors through one method or another. When I first posted about this problem on LinkedIn, one person commented: “It’s the adblocker blocker blocker game all over again”, and I couldn’t agree more.

The obvious defence against the attack proposed by Zhao et al. (2023) is to develop an invisible watermarking system that is robust to it. For instance, we could train our watermarking system such that current state-of-the-art diffusion models cannot reconstruct the image well after the watermark has been destroyed with random noise. Or we could try to build a watermark that survives random noise attacks in the first place. In either case, new vulnerabilities would quickly be found and exploited.

So are imperceptible watermarks simply useless? In a recent article, Sharon Goldman argues that while watermarks might not stop bad actors, they could still be beneficial for good actors. They are a bit like metadata, but encoded directly into the object of interest. Unlike MP3 metadata, which may be lost when the audio is converted to a different format, imperceptible watermarks would always be traceable, as they are embedded directly in the music itself.

However, if I am honest with myself, I had hoped that imperceptible watermarks could be a viable solution for flagging and detecting AI-generated content. Apparently, I was wrong. By and large, these watermarks will not prevent bad actors from flooding the internet with harmful AI-generated content.

Image generated by the author using DALL-E 3.

Development of Countermeasures

As highlighted above, developing countermeasures to known attack algorithms is always an option. In many cases, however, it is easier for the attackers to iterate on their attack algorithms than for the defenders to develop safeguards. Still, we can’t dismiss the possibility that we might discover a new approach to watermarking that isn’t as easily breakable. It is therefore definitely worth investing time and resources into further research on this topic.

Legal Consequences Against Watermark Attackers

While generating images with AI and uploading them to a social media platform is generally not considered illegal, purposefully removing watermarks from AI-generated images could very well be. Having no legal expertise myself, I can only argue that it would make sense to threaten legal consequences against such malicious actions.

Of course, normal users who reshare images they found online should be exempt from this. However, purposefully removing watermarks to spread misinformation is clearly immoral. And even if legal pressure does not eradicate misuse (it never has), it can be one mitigating factor.

Rethinking Proofs of Ownership

Many approaches have been proposed for how blockchain technology and/or smart contracts could help prove ownership in the digital age. A blockchain, in simple terms, is an information store that tracks interactions between the members of a network. Each transaction can be uniquely identified and can’t be manipulated at any later point in time. Adding smart contracts to this network allows us to tie transactions to binding responsibilities that are automatically fulfilled once the transaction is completed.
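The tamper-proof property can be sketched with a minimal hash-chained ledger: each block stores the hash of its predecessor, so changing any past transaction breaks the chain. This is an illustration only, not a real blockchain (no network, no consensus mechanism, no smart contracts).

```python
# Minimal hash-chained ledger: tampering with any past entry is detectable
# because every block commits to the hash of the block before it.
import hashlib
import json

def add_block(chain, transaction):
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"tx": transaction, "prev": prev_hash}
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    chain.append(block)

def verify(chain):
    for i, block in enumerate(chain):
        body = {"tx": block["tx"], "prev": block["prev"]}
        payload = json.dumps(body, sort_keys=True).encode()
        if block["hash"] != hashlib.sha256(payload).hexdigest():
            return False                       # block contents were altered
        if i > 0 and block["prev"] != chain[i - 1]["hash"]:
            return False                       # chain linkage was broken
    return True

ledger = []
add_block(ledger, "alice sells image #42 to bob")   # hypothetical transactions
add_block(ledger, "bob pays 5% royalty to alice")
assert verify(ledger)
ledger[0]["tx"] = "mallory sells image #42 to bob"  # tampering...
assert not verify(ledger)                           # ...is detected
```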

In less abstract terms, blockchains and smart contracts could be used in the future to automate ownership checks or royalty payments for intellectual property in any shape or form. So far, no such system has found widespread adoption. Still, we might be only a few technical breakthroughs away from these technologies becoming invaluable assets in our economies.

Digital watermarks have been used since the early days of the internet to prevent misuse of intellectual property such as images or music. Recently, they have also been discussed as a method for flagging and detecting AI-generated content. However, it turns out that AI is not only great at generating fake images. It is just as good at removing any kind of watermark from these images, rendering most detection systems useless.

It is clear that we can’t let this discourage us from searching for alternative ways of proving ownership in the age of AI. By developing concrete technical and legal countermeasures and, at the same time, exploring how blockchains and/or smart contracts could be leveraged in the future, we might just figure out how to solve this important problem.