Google DeepMind scientists have developed a tool that adds watermarks to text generated by large language models, making artificially created content easier to identify and trace.
Large language models (LLMs) are widely used AI systems that generate text for chatbots, writing assistance, and other applications, but attributing the content they produce to a specific source can be difficult.
In an article published in the journal Nature, researchers Sumanth Dathathri and Pushmeet Kohli of Google DeepMind present a strategy called SynthID-Text, which uses a novel sampling algorithm to watermark AI-generated text.
According to the researchers, the algorithm subtly biases the LLM's word choice, embedding a statistical signature that the associated detection software can recognize. In their evaluations, this strategy yielded watermarks that are more detectable than those produced by previous approaches.
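The Nature paper describes its own sampling scheme, but the general idea behind this family of watermarks can be illustrated with a toy sketch: a secret key and the recent context deterministically assign each candidate token a pseudorandom score, generation tilts sampling toward high-scoring tokens, and detection checks whether a text's average score is suspiciously high. The Python below is a hypothetical simplification for illustration only; the key, function names, and scoring scheme are assumptions, not DeepMind's actual implementation.

```python
import hashlib
import random

SECRET_KEY = b"watermark-demo-key"  # hypothetical key; a real system keeps this private


def g_score(context: tuple, token: str) -> float:
    """Pseudorandom score in [0, 1) derived from the key, recent context, and candidate token."""
    payload = SECRET_KEY + "|".join(context).encode() + b"|" + token.encode()
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def watermarked_sample(context: tuple, token_probs: dict, bias: float = 2.0) -> str:
    """Sample the next token, tilting the model's probabilities toward high-scoring tokens."""
    weights = {t: p * (1.0 + bias * g_score(context, t)) for t, p in token_probs.items()}
    tokens = list(weights)
    return random.choices(tokens, weights=[weights[t] for t in tokens])[0]


def detection_score(tokens: list, window: int = 2) -> float:
    """Average keyed score over a text; unwatermarked text averages ~0.5, watermarked text higher."""
    scores = [
        g_score(tuple(tokens[max(0, i - window):i]), tok)
        for i, tok in enumerate(tokens)
    ]
    return sum(scores) / len(scores)


# Toy usage: sample from a made-up next-token distribution, then score the result.
context = ("the", "cat")
next_token = watermarked_sample(context, {"sat": 0.5, "ran": 0.3, "slept": 0.2})
print(detection_score(["the", "cat", next_token]))
```

Because the biased sampler oversamples tokens with high keyed scores, the detector needs only the secret key and the text itself, not access to the model, to flag likely watermarked content.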
The watermark proposed by SynthID-Text can help identify synthetic text and limit its misuse, whether accidental or deliberate. Furthermore, the scientists note that the tool has a negligible impact on the computational cost of running the LLM, which eases its deployment.
Pablo Haya, of the Linguistic Computing Laboratory at the Autonomous University of Madrid, describes the technical solution as robust and necessary for improving the identification of AI-generated text. He stresses the importance of technologies that help establish the authorship of documents, especially at a time when existing detection systems achieve low accuracy on this task.