
Google offers its AI watermarking technology as a free, open-source toolkit

Google also points out that this type of watermarking works best when the LLM's output distribution contains a lot of “entropy,” i.e., many plausible candidates for each next token. In situations where an LLM “almost always returns the exact same answer to a given prompt” – such as basic factual questions or models tuned to a lower “temperature” – watermarking is less effective.
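To see why temperature matters here, consider a minimal sketch of how temperature reshapes a model's next-token distribution. The logits below are invented for illustration; the point is that lowering the temperature concentrates probability on the top token, which lowers the distribution's entropy and leaves less room to embed a watermark signal:

```python
import math

def softmax(logits, temperature):
    # Scale logits by temperature, then normalize to probabilities.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in bits; higher means more token-level randomness.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical next-token logits, for illustration only.
logits = [2.0, 1.0, 0.5, 0.1]
high_t = entropy(softmax(logits, temperature=1.0))
low_t = entropy(softmax(logits, temperature=0.5))
# Lower temperature -> lower entropy -> weaker watermark signal.
```

Running this, `high_t` comes out meaningfully larger than `low_t`, mirroring the article's point that low-temperature (near-deterministic) generation gives the watermark little to work with.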

A diagram explaining how SynthID's text watermark works.


Image credit: Google / Nature

Google says SynthID builds on previous AI text watermarking tools by introducing a so-called tournament sampling approach. During the token-generation loop, this approach puts potential candidate tokens through a multi-stage, bracket-style tournament, where each round is “scored” by a different random watermarking function. Only the ultimate winner of this process makes it into the final output.
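The bracket idea can be sketched in a few lines. This is a simplified illustration, not Google's implementation: the scoring function `g`, the round keys, and the candidate tokens are all invented, and real SynthID derives its watermarking functions from cryptographic keys tied to the context:

```python
import hashlib
import random

def g(token, key):
    # Pseudorandom watermark score in {0, 1}, derived from the token and a
    # per-round key (a stand-in for SynthID's random watermarking functions).
    return hashlib.sha256(f"{key}:{token}".encode()).digest()[0] & 1

def tournament_sample(candidates, round_keys, rng):
    # Run a bracket: in each round, adjacent pairs compete and the
    # higher-scoring token advances (ties broken at random).
    pool = list(candidates)
    for key in round_keys:
        next_pool = []
        for a, b in zip(pool[0::2], pool[1::2]):
            sa, sb = g(a, key), g(b, key)
            if sa != sb:
                next_pool.append(a if sa > sb else b)
            else:
                next_pool.append(rng.choice([a, b]))
        pool = next_pool
    return pool[0]

rng = random.Random(0)
# 8 candidate tokens sampled from the model's distribution -> 3 rounds.
winner = tournament_sample(["the", "a", "an", "cat", "dog", "sat", "ran", "was"],
                           round_keys=[101, 202, 303], rng=rng)
```

Because each round uses a different watermarking function, the winning tokens are statistically biased toward high scores across all of them – a bias a detector holding the same keys can later measure.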

Can you tell it's Folgers?

Changing an LLM's token selection process with a random watermarking tool could obviously degrade the quality of the generated text. But in its paper, Google shows that SynthID can be “non-biasing” at either the level of individual tokens or of short text sequences, depending on the specific settings used for the tournament algorithm. Other settings increase the “distortion” caused by the watermarking tool while also increasing the detectability of the watermark, Google says.

To test whether possible watermark bias might affect the perceived quality and usefulness of LLM output, Google routed “a random fraction” of Gemini requests through the SynthID system and compared them to non-watermarked counterparts. Across a total of 20 million responses, users gave watermarked answers 0.1 percent more thumbs-up ratings and 0.2 percent fewer thumbs-down ratings – a difference unlikely to be noticeable across a large number of real-world LLM interactions.

Research from Google shows that SynthID is more reliable than other AI watermarking tools, but its success rate depends heavily on length and entropy.


Image credit: Google / Nature

Google's testing also showed that its SynthID detection algorithm recognized AI-generated text significantly more often than previous watermarking schemes such as Gumbel sampling. However, the size of that improvement – and the overall rate at which SynthID successfully recognizes AI-generated text – depends heavily on the length of the text in question and the temperature setting of the model used. For example, SynthID was able to recognize nearly 100 percent of 400-token AI-generated text samples from Gemma 7B-1T at a temperature of 1.0, compared to about 40 percent for 100-token samples from the same model at a temperature of 0.5.
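The length dependence follows from basic statistics, and a toy detector makes it concrete. This sketch assumes a simplified mean-score detector (not SynthID's actual detection algorithm): each token contributes a pseudorandom bit, unwatermarked text averages near 0.5, and the detector's z-statistic grows with the square root of the token count:

```python
import hashlib
import math

def g(token, key):
    # Pseudorandom bit per (token, key) pair -- a hypothetical stand-in
    # for the watermarking function the generator favored.
    return hashlib.sha256(f"{key}:{token}".encode()).digest()[0] & 1

def detection_score(tokens, key):
    # Mean watermark score over the text. Unwatermarked text hovers near
    # 0.5; watermarked text drifts above it.
    return sum(g(t, key) for t in tokens) / len(tokens)

def z_score(mean_score, n):
    # Under the no-watermark null, per-token scores are Bernoulli(0.5),
    # so the z-statistic scales with sqrt(n): the same per-token bias is
    # far easier to detect in a 400-token sample than a 100-token one.
    return (mean_score - 0.5) * math.sqrt(n) / 0.5
```

With the same per-token bias (say a mean score of 0.6), a 400-token sample yields twice the z-score of a 100-token sample – which is consistent with the detection-rate gap Google reports between long and short texts.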
