Google DeepMind SynthID AI Watermarking Technology Open-Sourced to Businesses and Developers

October 24, 2024

Google DeepMind open-sourced a technology for watermarking AI-generated text on Wednesday. Dubbed SynthID, the artificial intelligence (AI) watermarking tool works across several modalities, including text, images, video, and audio. For now, however, the company is offering only the text watermarking tool to businesses and developers. It hopes that wider adoption of the tool will make AI-generated content easier to detect. Individuals and enterprises can access the tool via the Mountain View-based tech giant’s updated Responsible Generative AI Toolkit.

Google DeepMind Open-Sources AI Text Watermarking Technology

In a post on X (formerly known as Twitter), the official handle of Google DeepMind announced that SynthID’s text watermarking capability is now freely available to developers and businesses. Apart from the Responsible GenAI Toolkit, the tool can also be downloaded from Google’s Hugging Face listing.

AI-generated text has already begun crowding the Internet. A study published earlier this year by Amazon Web Services’ AI lab claimed that as much as 57.1 percent of all sentences online that have been translated into two or more languages might have been generated using AI tools.

While AI chatbots filling up the Internet with gibberish AI-generated text might appear to be a case of harmless spamming, there is a darker side to it. In the hands of bad actors, AI tools can be used to mass-generate misinformation or misleading content. With a significant portion of social discourse occurring online, such actions could impact real-life events such as elections and be used to create propaganda against public figures.

Of all modalities, detecting AI-generated text has proven to be the most difficult so far. This is largely because words cannot carry a hidden mark the way pixels or audio samples can, and even if they could, bad actors could always rephrase the content by running it through a second generation pass.

However, Google DeepMind’s SynthID uses a novel way to watermark AI-generated text. The tool uses machine learning to predict the words that could appear after a specific word in a sentence. For instance, consider the sentence “John was feeling extremely tired after working the entire day.” Here, only a limited number of words can appear after the word “extremely”.

In the example, several words could plausibly follow “extremely”, and the language model assigns each of them a probability. According to Google DeepMind, SynthID subtly adjusts these probability scores while the text is being generated, following a pattern derived from secret watermarking keys, so that certain word choices become slightly more likely. Such adjustments are spread throughout the entire piece of content. Later, when the tool checks for AI-generated content, it measures how closely the word choices match the expected pattern to estimate whether the text was watermarked.
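To make the idea of a statistical text watermark more concrete, here is a minimal Python sketch. It is not SynthID's actual algorithm or API: the vocabulary, the keyed_score function, the two-candidate "model", and the 0.6 threshold are all hypothetical stand-ins chosen for illustration. The generator prefers whichever candidate word scores higher under a keyed hash, and the detector averages those scores over the text.

```python
import hashlib
import random

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = [f"w{i}" for i in range(50)]


def keyed_score(secret: int, prev: str, candidate: str) -> float:
    """Pseudorandom value in [0, 1) that depends on the secret and both words."""
    digest = hashlib.sha256(f"{secret}|{prev}|{candidate}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def generate(secret: int, length: int, rng: random.Random, watermark: bool) -> list:
    """Emit `length` words after a start word, optionally watermarked."""
    words = ["extremely"]
    for _ in range(length):
        # Stand-in for a language model: two equally plausible next words.
        first, second = rng.sample(VOCAB, 2)
        if watermark:
            # Nudge the choice toward the candidate with the higher keyed score.
            prev = words[-1]
            chosen = max(first, second, key=lambda w: keyed_score(secret, prev, w))
        else:
            chosen = first
        words.append(chosen)
    return words


def mean_score(secret: int, words: list) -> float:
    """Average keyed score over every consecutive pair of words."""
    values = [keyed_score(secret, prev, cur) for prev, cur in zip(words, words[1:])]
    return sum(values) / len(values)


def looks_watermarked(secret: int, words: list, threshold: float = 0.6) -> bool:
    """Flag text whose average keyed score is well above the 0.5 baseline."""
    return mean_score(secret, words) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    marked = generate(secret=42, length=500, rng=rng, watermark=True)
    plain = generate(secret=42, length=500, rng=rng, watermark=False)
    print("marked:", mean_score(42, marked), "plain:", mean_score(42, plain))
```

In this toy setup, unmarked text averages about 0.5 in expectation, while marked text averages about two-thirds, because it always takes the larger of two roughly uniform scores. Without the secret, the marked text looks like any other sequence of words, which is why detection depends on the key rather than on spotting particular words.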

Notably, for images and videos, SynthID adds a watermark directly into the pixels of the frames so that it remains invisible to viewers but can still be detected by the tool. For audio, the sound waves are first converted into a spectrogram, and the watermark is added to that visual representation. These capabilities are currently not available to anyone outside of Google.