Is Microsoft's new AI leader correct in advocating for 'freeware' web content?

Microsoft's new AI chief advocates for web content to be 'freeware'. Is this approach the right one?

Jul 4, 2024 - 09:44

Earlier this year, Mustafa Suleyman, co-founder of DeepMind (now Google DeepMind), made waves by joining Microsoft to lead a new team focused on consumer AI products such as Copilot, Bing, and Edge, reporting directly to CEO Satya Nadella. Suleyman recently stirred controversy in an interview with CNBC's Andrew Ross Sorkin, arguing that publicly available content on the open web, which is crucial for training AI models, is effectively "freeware". His remarks drew criticism and renewed concerns about tech giants' data practices and their intention to profit from freely accessible internet content.

Discussions about AI often feature prominent tech companies emphasizing ethics, governance, and responsible AI use. However, recent scrutiny of AI training data and privacy policies suggests a pattern akin to Mark Zuckerberg's approach: public pledges that contrast with actions that may undermine privacy.

The Microsoft AI CEO's assertion that web content should be viewed as 'freeware' further intensified the debate. He argued that, since the 1990s, content on the open web has been treated as fair use, implying that anyone may freely replicate and reproduce it. This stance raises questions about the ethical limits of data use, particularly when user-generated content is treated as free training data. Suleyman's reading of copyright law also appears to diverge from Microsoft's own approach, which has included legal action against infringement of its copyrights. Meanwhile, the rules governing how data may be indexed and reused are still evolving, and web scraping and indexing practices continue to be contested in court, keeping both the legal and the ethical questions open.

Microsoft's controversial data collection tool

In recent years, big tech companies have rushed to gather vast amounts of data for AI development, from scraping content across the web to converting YouTube videos into usable transcripts. Concerns are mounting that such data may run short, with some projections suggesting the supply available for AI training could begin to dwindle by 2026. Microsoft's Recall feature, reminiscent of something from "Black Mirror," captures screenshots of users' activity on Windows PCs every few seconds and stores them locally, giving users a visual, searchable record of their digital history. This capability has raised privacy alarms and prompted scrutiny from regulators such as the UK's Information Commissioner's Office (ICO), which has stressed the need for transparency and safeguards in how data is used. Critics argue that such technologies give tech giants unrestricted access to personal data, including messages, location, and online behavior, which could be used to optimize algorithms for targeted advertising.