The 'Harry Potter and the Cursed Child' logo at the Palace Theatre in London, in 2019. Photo: Jakub Porzycki / NurPhoto / Gettyimages.ru

Mark Russinovich and Ronen Eldan, researchers at Microsoft, have developed a technique for editing the large language models (LLMs) that power generative artificial intelligence (AI) chatbots so that they selectively forget copyrighted content present in their training data.

Using the technique, the researchers made an LLM developed by Meta selectively forget direct references to J.K. Rowling's Harry Potter books, including their characters and storylines. The proposed method lets an LLM unlearn a subset of its training data without retraining from scratch, and without sacrificing the AI system's overall decision-making and analysis capabilities.

A Legal and Ethical Challenge

Premiere of 'Harry Potter and the Deathly Hallows: Part 2' at Avery Fisher Hall in New York, in 2011. Photo: Dimitrios Kambouris / Gettyimages.ru

According to the study's authors, LLMs are trained by analyzing massive internet datasets that often contain copyrighted information, private data, biased content, false data, and even toxic or harmful elements. This poses legal and ethical challenges for the developers and users of these models, as well as the original authors and publishers.

Evaluation of the technique for unlearning

Russinovich and Eldan tested the technique on the task of unlearning the Harry Potter books with Llama2-7b, an LLM recently developed by Meta. In a not-yet-peer-reviewed paper published on arXiv, the authors explain that while pre-training the model took more than 184,000 hours of processing, about an hour of fine-tuning was enough to effectively erase the model's ability to generate or retrieve Harry Potter-related content, while leaving its overall performance virtually unaffected.

A three-step algorithm

The Microsoft researchers detailed that the technique consists of three main components. First, they identified the tokens most associated with the target content by creating a reinforced model.

"We create a model whose knowledge of unlearned content is reinforced by further fine-tuning of target data (such as Harry Potter) and see what odds of tokens have increased significantly. These are likely to be tokens related to the content we want to avoid generating," they wrote.

Second, they replaced idiosyncratic expressions in the target data with generic counterparts, so that the model could then generate alternative labels for these tokens. Finally, they fine-tuned the model on these alternative labels. "In essence, whenever the model encounters context related to the target data, it 'forgets' the original content," the authors explain.
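The second step can be sketched as a simple text substitution: idiosyncratic expressions are swapped for generic counterparts, and text rewritten this way is what supplies the alternative fine-tuning labels. The mapping below is purely illustrative, not taken from the paper.

```python
# Hypothetical mapping from target-specific expressions to generic ones.
GENERIC_MAP = {
    "Harry Potter": "Jon",
    "Hogwarts": "the academy",
    "Quidditch": "the sport",
}

def genericize(text, mapping=GENERIC_MAP):
    """Replace idiosyncratic expressions with generic counterparts."""
    for specific, generic in mapping.items():
        text = text.replace(specific, generic)
    return text

passage = "Harry Potter flew to Hogwarts to play Quidditch."
print(genericize(passage))
# → Jon flew to the academy to play the sport.
```

A model fine-tuned to predict labels derived from the genericized text, rather than the original, stops reproducing the target-specific expressions in context.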

Why Harry Potter?

Press tour for the world premiere of the exhibition 'Harry Potter: Visions of Magic' on the Odysseus. Germany (2023). Photo: Christoph Hardt / www.globallookpress.com

Specialists note that references to Harry Potter have become increasingly frequent in AI studies. In J.K. Rowling's work, "the abundance of scenes, dialogue, and emotional moments makes it very relevant to the specific area of natural language processing," Leila Wehbe, a researcher at Carnegie Mellon University, told Bloomberg this week.

Another reason is that "Harry Potter is popular with younger researchers," she added. "They would have read the books as children or teenagers and thought of them when choosing a corpus of written or spoken text," the expert concluded.

(With information from RT en Español)
