University of Waterloo researchers proved that any AI watermark can be removed from at least 50% of watermarked content while preserving quality, according to Zhao et al., 2023. Only 38% of AI image generators implement adequate watermarking, according to a Partnership on AI audit, 2025. With the EU AI Act requiring AI content labeling by August 2026, organizations need to understand what watermarking can and cannot do. This article covers the technical limitations, current deployment, and verification approaches that work regardless of whether a watermark survives.
- Any AI watermark is removable at least 50% of the time, per University of Waterloo research.
- Only 38% of AI image generators implement adequate watermarking or content provenance.
- Text watermarking via token distribution shifts is even less robust than image watermarking.
- The EU AI Act requires AI content labeling by August 2026 but does not mandate watermarking specifically.
- Verification APIs provide content-level accuracy checks that work regardless of whether a watermark is present.
How AI Watermarking Works
AI watermarking embeds a hidden signal into generated content that humans can’t perceive but machines can detect. The signal acts as a fingerprint: when a detector encounters watermarked content, it reads the embedded signal and confirms AI origin.
For images, watermarking modifies pixel values at a level below human perception. Google DeepMind’s SynthID, the most prominent image watermarking system, uses a trained neural network to embed and detect these signals, according to Google DeepMind, 2023. The embedding network learns to modify pixels in ways that survive common transformations (JPEG compression, resizing, color adjustment) while remaining invisible. The detection network learns to identify those modifications even after transformation.
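SynthID's networks are proprietary, but the embed/detect loop can be illustrated with a deliberately crude stand-in: classic least-significant-bit (LSB) steganography, which hides a bit pattern in pixel values below human perception. This toy is far more fragile than a learned watermark (ordinary JPEG compression destroys it, which is exactly the weakness trained systems are designed to overcome), but the pairing of the two functions mirrors the embed/detect structure described above.

import numpy as np

# Toy stand-in for learned watermarking: hide a repeating bit
# pattern in each pixel's least significant bit. Illustrative only;
# SynthID uses trained networks, not LSB tricks.
SIGNATURE = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)

def embed(pixels):
    # Overwrite every pixel's lowest bit with the signature pattern.
    flat = pixels.flatten()
    bits = np.resize(SIGNATURE, flat.shape)
    return ((flat & 0xFE) | bits).reshape(pixels.shape)

def detect(pixels):
    # Fraction of LSBs matching the signature: ~0.5 is chance level
    # for an unwatermarked image, ~1.0 indicates the watermark.
    flat = pixels.flatten()
    bits = np.resize(SIGNATURE, flat.shape)
    return float(np.mean((flat & 1) == bits))

image = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
print(detect(image))         # ~0.5 (chance)
print(detect(embed(image)))  # 1.0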
For text, watermarking takes a fundamentally different approach. Since text is discrete (individual tokens rather than continuous pixel values), the watermark is embedded in the statistical distribution of token choices. During generation, the model slightly adjusts the probability of selecting certain tokens at each position. Over a passage of sufficient length, these adjustments create a detectable statistical pattern. Google open-sourced SynthID’s text watermarking component in October 2024, according to Google DeepMind, 2024.
For audio, watermarking modifies the waveform in imperceptible ways, similar to image watermarking but in the frequency domain. SynthID supports audio watermarking for AI-generated speech and music.
The appeal of watermarking is clear: if it works perfectly, any AI-generated content carries an embedded proof of origin that persists through editing, sharing, and re-distribution. The problem is that it doesn’t work perfectly.
Why Watermarks Fail: The Impossibility Result
The University of Waterloo research, published by Zhao, Pang, et al. in 2023, established a theoretical and practical impossibility result for AI watermarking. Their core finding: for any watermarking scheme, an attacker can remove the watermark from at least 50% of watermarked content while maintaining content quality, provided the attacker has access to an unwatermarked generative model.
The attack exploits a fundamental tension. A watermark must be imperceptible (or the content quality degrades), but it must also be robust (or it can be easily removed). These two properties work against each other. Making the watermark more robust requires stronger modifications to the content, which makes it more perceptible. Making it more imperceptible requires weaker modifications, which makes it easier to remove.
Practical removal techniques include:
- Re-encoding or re-compression. JPEG re-compression at different quality levels can degrade image watermarks (see the sketch after this list). Re-encoding audio or video through a different codec has similar effects.
- Regeneration attacks. Passing watermarked content through a second AI model (e.g., using an image-to-image model to reproduce the content) strips the original watermark entirely while producing visually near-identical output.
- Adversarial perturbations. Small, computed modifications that specifically target the watermark’s detection mechanism. These perturbations are imperceptible but destroy the watermark signal.
- Paraphrasing for text. Rephrasing AI-generated text through a different model eliminates the token distribution patterns that text watermarks rely on. Even manual editing or translation disrupts the statistical signature.
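As a concrete illustration of the first technique, re-encoding is a few lines with the Pillow library (filenames here are placeholders). Whether it defeats a given watermark depends on the scheme and the quality setting; robust systems like SynthID are trained to survive moderate re-compression, which is why attackers chain several of the techniques above.

from PIL import Image

# Re-encoding attack sketch: downscaling plus aggressive JPEG
# re-compression perturbs the pixel-level signal many watermarks
# rely on. Results vary by watermarking scheme.
img = Image.open("watermarked.png").convert("RGB")
img = img.resize((img.width // 2, img.height // 2))
img.save("recompressed.jpg", format="JPEG", quality=40)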
A 2024 study by researchers at UIUC and Princeton tested seven leading text watermarking methods and found that all could be defeated by paraphrasing attacks while maintaining over 90% similarity to the original content, according to Kirchenbauer et al. (survey), 2024. Text watermarking, which relies on subtle token distribution shifts across sequences, is fundamentally less robust than image watermarking because text is discrete and easily rephrased.
Current Deployment: Who Watermarks and Who Doesn’t
Despite the theoretical limitations, several major AI companies have deployed watermarking systems in production.
Google DeepMind (SynthID)
SynthID is the most comprehensive watermarking deployment. It covers images (Imagen), text (Gemini), audio (speech synthesis), and video (Veo), according to Google DeepMind, 2023. Google applies SynthID to AI-generated content across its product suite, including Google Search AI overviews and YouTube. The text watermarking component was released as open source in October 2024 under the name SynthID Text.
Adobe (C2PA, not watermarking)
Adobe chose C2PA Content Credentials over imperceptible watermarking. Photoshop, Lightroom, and Firefly embed cryptographic provenance metadata rather than hidden signals in the content itself, according to Adobe Content Authenticity Initiative. This approach has different trade-offs: C2PA credentials disappear if the file's metadata is stripped, but they can't be defeated by regeneration or perturbation attacks.
OpenAI
OpenAI adds C2PA metadata to images generated by DALL-E 3 and GPT-4o's image generation capabilities, and is a member of the Coalition for Content Provenance and Authenticity. For text generation, OpenAI investigated watermarking but paused deployment, citing concerns about effectiveness and impact on output quality, according to The Verge, 2024.
Midjourney, Stability AI, and others
Midjourney adds visible metadata tags to generated images but does not implement imperceptible watermarking. Stability AI's Stable Diffusion models are open source, so anyone running them locally can omit metadata or disable any watermarking step entirely. The open-source nature of many image generators fundamentally undermines watermarking at the generation layer: if users can modify the model, they can disable the watermark.
The Partnership on AI, a consortium including Google, Microsoft, Apple, Amazon, Meta, and OpenAI, audited the ecosystem in 2025 and found that only 38% of AI image generators implement adequate watermarking or content provenance measures, according to Partnership on AI, 2025. The remaining 62% either add minimal metadata, rely on voluntary user disclosure, or provide no labeling at all.
Text Watermarking: Even Less Robust Than Images
Text watermarking deserves special attention because AI-generated text is the content type most likely to contain factual claims, and therefore the type where detection matters most for content integrity.
Text watermarks work by biasing the token sampling process during generation. At each token position, the watermarking algorithm divides the vocabulary into “green” and “red” lists based on a secret key and the preceding tokens. The model is then biased toward selecting green-list tokens. Over a passage of 200+ tokens, this bias creates a detectable statistical signature, according to Kirchenbauer et al., 2023.
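A minimal sketch of the mechanism follows, assuming a Kirchenbauer-style green/red partition. The vocabulary size, key, and bias strength are illustrative values, and production schemes like SynthID Text differ in detail.

import hashlib
import random

VOCAB_SIZE = 50_000      # illustrative vocabulary size
SECRET_KEY = "example-key"
GREEN_FRACTION = 0.5     # half the vocabulary is "green" at each step
BIAS = 2.0               # logit boost for green-list tokens

def green_list(prev_token_id):
    # Derive this position's green list from the secret key and the
    # preceding token, so the partition changes at every step.
    seed = hashlib.sha256(f"{SECRET_KEY}:{prev_token_id}".encode()).digest()
    rng = random.Random(seed)
    ids = list(range(VOCAB_SIZE))
    rng.shuffle(ids)
    return set(ids[: int(VOCAB_SIZE * GREEN_FRACTION)])

def bias_logits(logits, prev_token_id):
    # During generation: nudge sampling toward green-list tokens by
    # adding BIAS to their logits before the softmax.
    green = green_list(prev_token_id)
    return [l + BIAS if i in green else l for i, l in enumerate(logits)]

def green_rate(token_ids):
    # Detection: fraction of tokens that landed on their green list.
    # Unwatermarked text hovers near GREEN_FRACTION; watermarked text
    # scores measurably higher over long passages.
    hits = sum(1 for prev, tok in zip(token_ids, token_ids[1:])
               if tok in green_list(prev))
    return hits / max(len(token_ids) - 1, 1)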
The problems with this approach:
Short text fragments don't carry enough signal. A 50-token response (a few sentences) provides too few data points for reliable detection. The watermark requires hundreds of tokens to produce a confident detection result. For applications where AI generates short responses, watermarking is ineffective.
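The arithmetic behind this is straightforward. Green-list detectors typically apply a one-proportion z-test, and with a green fraction of 0.5 and a watermark that lifts the observed green rate to roughly 60% (illustrative numbers), 50 tokens sit within noise while 500 tokens are conclusive:

import math

def z_score(tokens, observed=0.6, gamma=0.5):
    # Standard one-proportion z-test: how far the observed green-token
    # rate sits above the rate gamma expected by chance.
    return (observed - gamma) * math.sqrt(tokens) / math.sqrt(gamma * (1 - gamma))

print(round(z_score(50), 2))   # 1.41 -- indistinguishable from chance
print(round(z_score(500), 2))  # 4.47 -- p < 0.00001, confident detection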
Paraphrasing destroys the signal. Because the watermark depends on specific token choices at specific positions, rephrasing the text with different words in different order eliminates the pattern. A study demonstrated that GPT-4-based paraphrasing removed detectable watermarks from all tested watermarking methods while preserving content meaning, according to Sadasivan et al., 2024.
Translation eliminates the watermark entirely. Translating AI-generated text to another language and back produces content with no trace of the original token distribution pattern. In a globalized content ecosystem, this is a trivial bypass.
Quality degradation is measurable. Biasing token selection toward green-list tokens necessarily reduces the model's effective vocabulary at each position. Research has shown perplexity increases of 5-15% in watermarked text compared to unwatermarked text, meaning the model produces slightly less fluent, lower-probability responses when watermarking is active.
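Perplexity is the exponentiated average negative log-likelihood of the generated tokens, so even a small per-token drop in log-probability compounds into a measurable increase. The numbers below are illustrative, not drawn from any specific paper:

import math

def perplexity(token_logprobs):
    # exp of the average negative log-likelihood per token.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Forcing slightly less likely (green-list) words lowers the average
# log-probability, which raises perplexity.
print(perplexity([-2.0] * 200))  # ~7.39 unwatermarked
print(perplexity([-2.1] * 200))  # ~8.17, roughly 10% higher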
These limitations explain why OpenAI, despite developing a text watermarking system, chose not to deploy it broadly. The technology is not yet reliable enough to serve as the primary mechanism for AI content identification.
Regulatory Requirements: What the EU AI Act Actually Mandates
The EU AI Act’s Article 50 requires that AI-generated content be disclosed and labeled, but the regulation is deliberately technology-neutral on how labeling is implemented, according to the EU AI Act full text. The law does not mandate watermarking specifically. It requires:
- That providers of AI systems generating synthetic content ensure the output is marked in a machine-readable format as artificially generated or manipulated.
- That deployers of AI systems generating text published for informing the public must disclose that the content was AI-generated.
- That deepfake content (AI-generated audio, video, or image that resembles existing persons, places, or events) be labeled as AI-generated.
Compliant approaches include:
- Imperceptible watermarking (SynthID, custom implementations)
- C2PA Content Credentials (cryptographic metadata)
- Visible disclosure labels (“Generated by AI”)
- Metadata tagging in file formats (EXIF, IPTC, XMP fields)
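For the metadata-tagging route, here is a minimal sketch using Pillow. The EXIF ImageDescription tag and label text are our illustrative choices; IPTC's DigitalSourceType vocabulary, with the value trainedAlgorithmicMedia, is the emerging standard term for fully AI-generated media, and a production pipeline would write it to the proper XMP/IPTC fields rather than a free-text EXIF tag.

from PIL import Image

# Illustrative metadata labeling: write a machine-readable
# AI-generation notice into the EXIF ImageDescription tag (0x010E).
# Filenames and label text are placeholders.
img = Image.open("generated.png").convert("RGB")
exif = img.getexif()
exif[0x010E] = "AI-generated (DigitalSourceType: trainedAlgorithmicMedia)"
img.save("generated_labeled.jpg", exif=exif)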
The regulation takes full effect on August 2, 2026. Organizations deploying AI in the European market need a labeling strategy, but they don’t need to solve the watermarking robustness problem. They need to implement a compliant labeling method that works for their content type and distribution channel.
Verification as a Complementary Layer
Watermarking and provenance standards like C2PA address one question: “was this content generated by AI?” They don’t address the question that matters most for content consumers: “is this content accurate?”
A perfectly watermarked AI-generated article that contains fabricated statistics and false claims is still harmful. A C2PA-signed image from a credible source that is captioned with inaccurate descriptions is still misleading. Detection and provenance tell you the content’s origin. Verification tells you the content’s accuracy.
Verification APIs operate at the claim level. They take a specific factual statement, check it against independent sources, and return a verdict with supporting evidence. This approach works regardless of whether the content carries a watermark, C2PA credentials, or no provenance information at all. For details on how claim-level verification works in practice, see our guide on what is a verification API.
The Webcite verification API checks claims against real-world sources and returns structured results:
import requests

response = requests.post(
    "https://api.webcite.co/api/v1/verify",
    headers={
        "x-api-key": "your-api-key",
        "Content-Type": "application/json",
    },
    json={
        "claim": "Only 38% of AI image generators implement adequate watermarking",
        "include_stance": True,
        "include_verdict": True,
    },
)

result = response.json()
print(result["verdict"]["result"])      # "supported"
print(result["verdict"]["confidence"])  # 87
print(result["citations"][0]["url"])    # source URL
For production content integrity, the most robust strategy combines three layers:
- Provenance: C2PA Content Credentials proving creation context and chain of custody
- Detection: watermarking or AI classifiers identifying synthetic content
- Verification: claim-level fact-checking confirming factual accuracy
No single layer is sufficient. Watermarks can be removed. C2PA metadata can be stripped. AI classifiers produce false positives. But verification against independent sources works on any content, regardless of its provenance status. Over 67% of enterprise AI users cite accuracy as their top concern, according to McKinsey, 2025. Verification addresses that concern directly.
For teams building content integrity pipelines, see our guide on C2PA Content Credentials for the provenance layer, and the EU AI Act compliance guide for regulatory requirements.
Frequently Asked Questions
Can AI watermarks be removed?
Yes. University of Waterloo researchers proved that any AI watermark can be removed at least 50% of the time without significantly degrading content quality. Common removal techniques include image re-encoding, compression, cropping, applying adversarial perturbations, and regenerating content through a second model. Text watermarks based on token distribution shifts are even more fragile than image watermarks.
What is SynthID?
SynthID is Google DeepMind’s AI watermarking system that embeds imperceptible signals into AI-generated images, audio, text, and video. For images, it modifies pixel values in ways invisible to humans but detectable by the SynthID classifier. For text, it adjusts token sampling probabilities. Google open-sourced the text watermarking component in October 2024.
Does the EU AI Act require AI watermarking?
The EU AI Act requires that AI-generated content be labeled in a machine-readable format by August 2026, but it does not mandate any specific technical approach. Watermarking is one option. C2PA Content Credentials, metadata tagging, and visible disclosure are also compliant methods. The regulation is technology-neutral on implementation.
What percentage of AI image generators use watermarking?
Only 38% of AI image generators implement adequate watermarking or content provenance features, per the Partnership on AI’s 2025 audit. Major platforms like Midjourney and Stable Diffusion offer metadata-based labeling but not robust imperceptible watermarking. Google’s Imagen and DeepMind tools use SynthID, and Adobe Firefly uses C2PA Content Credentials.
What is the alternative to watermarking for detecting AI content?
Alternatives include C2PA Content Credentials (cryptographic provenance metadata embedded at creation), AI text classifiers (statistical detection of AI writing patterns), and verification APIs that check factual claims against real sources. No single approach is sufficient alone. Production systems should combine provenance, detection, and verification for comprehensive content integrity.