Publishers' growing concerns about OpenAI: Can AI news dilemmas be resolved?

Publishers are increasingly worried about OpenAI's impact on the news business. Can these concerns be addressed effectively?

May 3, 2024 - 12:02

May 4, 2024 - 14:19

News outlets express frustration with generative AI and web scraping.

This week, a group of publishers filed a lawsuit against Microsoft and OpenAI, accusing them of using copyrighted articles to train ChatGPT and Copilot without permission or payment. The coalition includes major newspapers owned by Alden Global Capital (AGC), such as the New York Daily News and the Chicago Tribune. AGC's legal action underscores concerns that generative AI poses a serious threat to news publishers, who must now compete not only with virtual assistants like ChatGPT but also with AI-generated news content.

The ethics of Large Language Model (LLM) development

Generative AI relies on copyrighted material for robust training, and there is a legal and ethical argument for compensating the copyright holders. This week's complaint highlights publishers' investments in real-world reporting, which AI models like ChatGPT and Copilot then use without permission or compensation, undermining publishers' core businesses. The complaint includes excerpts from conversations with these chatbots to illustrate the issue. It is easy to sympathize with publishers and journalists, but the legal status of using copyrighted data for LLM training remains somewhat ambiguous.

The fragile foundation of generative AI

Generative AI relies heavily on quality written content for training, often obtained through web scraping or from curated repositories like Common Crawl. Much of this content, however, is copyrighted. Rather than seeking permission from copyright holders beforehand, many LLM developers have used the material without authorization, risking legal repercussions.

Is there even a third alternative?

While the outcome of this lawsuit remains uncertain, LLM vendors face a clear dilemma: engage in a legal battle with publishers or offer compensation for access to copyrighted content. OpenAI is pursuing partnerships, such as its recent agreement with The Financial Times, to use its content for AI model training. However, publications wary of AI's impact on their business may resist short-term payouts, and embracing partnerships with AI companies could be risky for publishers concerned about their long-term viability. The rise of AI-generated news and of language models as alternative sources of information further complicates the landscape, potentially diverting traffic from traditional publishers. As a result, partnerships like OpenAI's with The Financial Times are likely to be rare and costly.