‘Would-be bank robbers’: Reddit sues Perplexity and data companies over AI scraping

Reddit has filed a lawsuit against AI search developer Perplexity and the data companies it allegedly obtains AI training data from, claiming that the data companies are illegally scraping its content in violation of its copyright protections.

The lawsuit was filed Wednesday in the U.S. District Court for the Southern District of New York. In addition to Perplexity, three data companies are named as defendants: Oxylabs UAB, AWMProxy, and SerpApi.

In the filing, Reddit said the data companies circumvented Reddit and Google’s technological barriers by accessing nearly three billion search engine results pages (SERPs) in a two-week period in July using techniques to mask their identities and locations. Reddit described them as “would-be bank robbers who, knowing they couldn’t get into the bank vault, broke into the armored truck carrying the money instead.”

Reddit said it traced that illegally collected data to Perplexity, which is why it previously issued a cease-and-desist letter. Perplexity is still listed as a client of one of the data companies, SerpApi, on its website, along with Meta, Samsung, and Nvidia.

Reddit is one of the most popular platforms on the internet, with the company reporting more than 110 million daily active users and over 22 billion posts and comments. As such, it has become one of the most sought-after sources of the human-generated data that AI companies want. Reddit has made deals with OpenAI and Google to license its data. It has also sued Anthropic, alleging misuse of its data.

Perplexity was also recently sued for copyright infringement by Encyclopedia Britannica, which owns the Merriam-Webster dictionary.

Perplexity did not immediately respond to a request for comment.

Copyright is one of the most controversial legal issues for AI companies. They need massive amounts of human-generated content — like Reddit posts — to train and improve their AI models. Much of this content is protected by copyright, which usually requires the company to negotiate with the rights holder in order to license and use it.

While some AI companies have struck multi-million-dollar licensing deals with publishers like Axel Springer, others have argued that their use of copyrighted material is fair use and therefore does not require them to pay. A series of lawsuits are hashing out the details in court, with Meta and Anthropic scoring fair use wins this summer. (Disclosure: Ziff Davis, CNET’s parent company, in April filed a lawsuit against OpenAI, alleging that it infringed Ziff Davis’s copyrights in training and operating its AI systems.)
