Reddit CEO Steve Huffman has announced that Microsoft and other companies need to pay to search and scrape data from Reddit, following deals made with Google and OpenAI. Huffman highlighted the importance of having control over Reddit’s data usage and display, which led to blocking companies unwilling to negotiate terms.
Key Points
- Licensing Agreements: Reddit has successfully negotiated agreements with Google and OpenAI to control how its data is used. Huffman is now urging Microsoft, Anthropic, and Perplexity to follow suit and pay for accessing Reddit’s data.
- Blocking Uncooperative Companies: Reddit has updated its robots.txt file to prevent web crawlers without agreements from accessing its site. This change means that Reddit results are now only visible in Google searches, where a deal is in place.
- Microsoft’s Use of Reddit Data: Huffman accused Microsoft of using Reddit’s data to train AI models and display content in Bing results without permission. He noted that Reddit’s data has been sold through the Bing API to other search engines.
- Public Data Debate: Huffman criticized the viewpoint expressed by Microsoft AI CEO Mustafa Suleyman, who referred to public internet data as “freeware.” Huffman emphasized the need for licensing agreements to ensure fair use and compensation for Reddit’s content.
- Response from Microsoft: Microsoft’s head of search, Jordi Ribas, stated that Reddit has blocked Bing from crawling its site, impacting competition. Microsoft spokesperson Caitlin Roulston added that Microsoft respects the directives of websites regarding content use.
Industry Context- Reddit
- Shift in Value Exchange: Huffman highlighted a shift in the traditional value exchange between search engines and content providers. With the merging of search, summarization, and AI training, the exchange of crawling for traffic has become less clear.
- Reddit’s Position: Reddit aims to establish a model similar to its agreement with OpenAI, where its data can be used under specific terms. None of Reddit’s current licensing deals grant exclusive use rights.
- Media Industry Trends: Reddit joins traditional media publishers in seeking compensation for the use of their content in generative AI models. This reflects a broader industry trend toward valuing and monetizing digital content.
Industry Reactions
- Anthropic’s Stance: Anthropic, one of the companies mentioned by Huffman, stated that Reddit has been on its block list since mid-May, and no Reddit URLs have been added to its crawler since then. They affirmed their commitment to respecting the robots.txt protocol for web crawling.
Steve Huffman’s call for licensing deals reflects a growing concern among content providers about the fair use of their data by AI models and search engines. As the digital landscape evolves, the need for clear agreements and compensation for data usage becomes increasingly important.