Opinion: AI comes for the journalists
Opinion by Seán O’Connor
(CNN) — Editor’s Note: Seán O’Connor is a professor of law and the faculty director of the Center for Intellectual Property x Innovation Policy at George Mason University’s Antonin Scalia Law School. The views expressed in this commentary are his own. View more opinion at CNN.
When artificial intelligence firms set their systems to scrape and ingest millions of painstakingly produced news stories, is it comparable to art students learning to paint by recreating the “Mona Lisa” — or unfair misappropriation?
The New York Times claims the latter in a recent lawsuit filed in federal court in Manhattan, joining other creators and copyright owners now challenging AI companies’ use of their works without permission. Defendants OpenAI and Microsoft Corp. will almost certainly respond that training ChatGPT and similar systems on millions of the Times’ and others’ copyrighted works is “fair use” under the law.
Indeed, OpenAI already previewed a fair use defense in a motion to dismiss a separate ongoing lawsuit that comedian Sarah Silverman and other authors filed against it in federal court in San Francisco last year, based on a similar scenario of ChatGPT reproducing substantial portions of their books after being trained on them. A parallel suit by the same authors against Meta has not been faring particularly well (the judge recently granted Meta’s motion to dismiss all but one of Silverman et al.’s claims), but it was largely based on different theories from those in the Times’ case.
Some academics have argued a theory of “fair learning” to justify the wholesale reproduction of copyrighted materials in generative AI training sets, by analogy to humans privately reproducing copyrighted works to study and learn from them, which is generally held to be non-infringing or fair use. But these AI outputs are substantially similar to specific Times articles, according to the complaint. This use is anything but fair.
Without a fair use defense, generative AI firms are likely liable for copyright infringement. This gives the Times and other publishers both the right to share in the profits generative AI will make off the publishers’ materials and the ability to negotiate “guardrails” for how their materials are used or end up in generative AI outputs.
Responding to the Times suit, a spokesperson for OpenAI said the tech firm respected the rights of content creators and owners and was working with them to ensure they benefit from AI technology and new revenue models: “Our ongoing conversations with the New York Times have been productive and moving forward constructively, so we are surprised and disappointed with this development. We’re hopeful that we will find a mutually beneficial way to work together, as we are doing with many other publishers.” Microsoft didn’t respond to a request for comment on the suit.
When art students copy the “Mona Lisa,” they seek to understand how Leonardo da Vinci executed his artistic vision. Their goal is to develop the tools to express their own vision in their own original style, not to slavishly duplicate the style of another.
OpenAI and its ilk, by contrast, engineer their generative AI systems to replicate existing human creations and styles. Generative AI is so-called for the text, images and other expression created in response to prompts from users.
I analyzed this business model in the context of music. Apps such as Jukebox and MuseNet — two other OpenAI projects — promote their ability to create “new” works in the styles of specifically named artists and composers. Whether there is a long game that turns instead to generating unique output is unclear.
On its best days, the nascent industry promotes a vision of a tool that helps humans create distinctive works. But right now generative AI is limited to mashups of existing styles (in part, because the systems need to be trained on existing materials).
Breakthrough creativity is not just a rejiggering of current stylistic inputs where each remains recognizable. It is, instead, a wholly new stylistic creation that only hints at its influences. “Frank Sinatra singing an Ed Sheeran song,” as generated by current generative AI systems, would sound just like what its title describes. The listener would hear what sounds like Sinatra’s actual voice, as if he were singing a cover of an Ed Sheeran song (the kind of melodies, chord changes and phrasing that typify Sheeran’s songs), even though it is not any actual Sheeran song.
By contrast, when human musicians create their own new style from the styles of other musicians they admire and emulate, the result does not sound like one of their influences singing the song of another influence. For example, singer-songwriter Brandi Carlile has famously been clear about the deep influence of earlier artists, including Joni Mitchell and Elton John, on her own style. Yet, except when Carlile has in fact covered a Joni or an Elton song, her original songs do not sound directly like either the performance or compositional style of her two idols. The output of skilled creative humans sounds like something new, while the output of AI sounds like awkward juxtapositions of the different humans’ work it was trained on.
On its worst days, the generative AI industry seems intent on replacing human creativity altogether. AI systems will churn out new works on their own internal prompts at scale for all tastes and budgets. Can any valuable aesthetic or authentic new style emerge from this?
When it comes to news, people might ask: Isn’t news “just the facts”? Under copyright law, facts are unprotectable. Further, if text is “functional,” like a recipe, then it’s not protectable either. But even if news reporting were merely factual and functional, the US Supreme Court’s 1918 decision in International News Service v. Associated Press still holds that it is misappropriation to immediately reproduce noncopyrightable news stories.
At the same time, journalism isn’t “just the facts.” It’s also storytelling. Readers want insightful, original points of view and analysis all wrapped up in attractively stylized passages. In some cases, norm-shattering styles such as Hunter S. Thompson’s “gonzo journalism” may even give readers a fresh way of understanding world events.
Generative AI is intentionally set up to replicate the style of established journalists. Doing so gives its output the qualities readers expect from news and commentary. In practice, it also means that generative AI is outputting text not only in the style of known writers but is also exactly reproducing previously published passages. The Times’ complaint documents a number of these instances.
Can a generative AI reproduction of previously published news and commentary be fair use? I don’t think so. Some journalists become more widely read than others not just because they publish first or have better insights but also because they express their ideas well. When generative AI free-rides on these stylistic successes, it fails the four-factor statutory test for fair use: the purpose and character of the use (e.g., commercial or noncommercial); the nature of the copyrighted work; the amount and substantiality of the portion used in relation to the work as a whole; and the effect of the use on the market for the original. Courts often use a test of “transformative use” as shorthand for some or all of these factors: Is the allegedly infringing work using the reproduced portions in a different manner than the original work did?
Generative AI’s use is not “transformative” in that it is not commenting on or critiquing the original story or shifting it to a different medium, purpose or context. It is, instead, reproducing substantial parts of others’ work simply to compete in the same market channels as the original.
Even more problematic for a world flooded with misinformation, generative AI is “hallucinating” stories while making them appear to come from legitimate publications of respected news outlets. “Hallucinating” is the name for when generative AI fabricates facts and stories that do not exist, or alters real ones into falsehoods, yet presents them in a convincing manner (e.g., a legal citation that matches the technical format even though no such case actually exists). Thus, generative AI is infringing trademark rights as well as misattributing stories and ideas.
Ultimately, generative AI is engaged in exactly the opposite of what human learning is supposed to achieve. Rather than mastering the styles of other experts to develop new and better ones, it is a snaking hose flailing around uncontrollably, spewing thoughtless sequences of text based solely on probabilities that one word comes after another in human expression.
The thoughtful expressions of skilled human creators have been co-opted into an egregious firehose of inanity that threatens to upend not only the creative industries but also democracy and our very sense of truth and reality. While copyright infringement may seem the least of our worries, enforcing intellectual property rights is the best start to reining in generative AI for the good of humanity.