Nvidia facing allegations of using YouTube and Netflix videos for AI training

Nvidia faces allegations of downloading YouTube and Netflix videos, including content licensed only for academic use, to train its AI models, and of using VMs to avoid detection and bans.

Aug 7, 2024 - 16:07



Nvidia is under scrutiny for allegedly downloading a vast number of videos from YouTube, Netflix, and other platforms to train its AI systems. A report by 404 Media says the company acquired these videos to improve AI models used in products such as the Omniverse 3D world generator and the GR00T project for digital humans. The report, based on leaked documents and communications, suggests that Nvidia instructed employees to gather videos from various sources, including MovieNet, video game footage libraries, and the WebVid dataset on GitHub.

Although some employees raised ethical and legal objections, the report claims that these practices were sanctioned at the highest levels of Nvidia's leadership. The extensive use of scraped videos from diverse sources raises questions about whether the company's AI training complies with copyright law and fair use principles.

According to the report, Nvidia accessed videos from a vast collection of YouTube content intended exclusively for academic use. The company reportedly argued that these videos, part of the HD-VG-130M library with 130 million YouTube videos, were suitable for commercial AI applications despite their academic usage license. To avoid detection and potential bans from YouTube, Nvidia allegedly used virtual machines (VMs) with rotating IP addresses to download the content systematically. If the report is accurate, this method would have let the company covertly acquire a large amount of training material while evading restrictions meant to limit the content to academic use.