Reddit sues Perplexity over alleged data scraping

Mashable, October 24, 04:47

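Reddit alleges that Perplexity, a well-known AI company, scraped its data without authorization to feed its AI program. The complaint names Perplexity alongside three data scraping firms, AWMProxy, Oxylabs, and SerpApi, and claims that Perplexity does business with at least one of them to obtain Reddit data. Reddit has signed agreements with other AI companies, but not with Perplexity. Reddit says it sent Perplexity a cease-and-desist letter, after which Perplexity cited Reddit more often rather than less. To test its suspicion, Reddit created a “test post” that only Google's search engine could crawl; within hours, Perplexity's “answer engine” produced the contents of that post, which Reddit argues proves the scraping. Perplexity defended itself, saying it is committed to users' right to freely and fairly access public knowledge and will firmly defend openness and the public interest.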

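🔍 Reddit alleges that Perplexity does business with data scraping firms such as AWMProxy, Oxylabs, and SerpApi, and uses them to scrape Reddit data without authorization to feed its AI program.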

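📝 Reddit has signed agreements with other AI companies, but not with Perplexity. Reddit says it sent Perplexity a cease-and-desist letter, after which Perplexity cited Reddit more often rather than less.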

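🔬 To test its suspicion, Reddit created a “test post” that only Google's search engine could crawl. Within hours, Perplexity's “answer engine” produced the contents of that post, which Reddit argues proves the scraping.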

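🛡️ Perplexity defended itself, saying it is committed to users' right to freely and fairly access public knowledge, will firmly defend openness and the public interest, and will not tolerate threats against either.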

Reddit claims it caught Perplexity doing something it shouldn't have.

The popular message board website filed a lawsuit against Perplexity, a notable AI firm, alleging that Perplexity engaged in improper data scraping to feed its AI program. The complaint (courtesy of The Verge) lists Perplexity alongside three data scraping firms: AWMProxy, Oxylabs, and SerpApi. According to Reddit, Perplexity does business with at least one of these companies, allegedly using them to get data from Reddit without the site's permission.

While Reddit has signed agreements with other AI companies in the recent past, it has not done so with Perplexity. Reddit claims that it once sent a cease-and-desist letter to Perplexity for scraping Reddit content. Per Reddit's complaint, after the letter was sent, Perplexity started citing Reddit even more than before, not less. Where this really gets juicy is how Reddit claims it caught Perplexity in the alleged act of stealing data. In Reddit's words:

"To confirm this hypothesis, Reddit created a “test post” – the equivalent of a digital “marked bill” – that could only be crawled by Google’s search engine and was not otherwise accessible anywhere on the internet. Within hours, queries to Perplexity’s “answer engine” produced the contents of that test post. The only way that Perplexity could have obtained that Reddit content and then used it in its “answer engine” is if it and/or its Co-Defendants scraped Google SERPs for that Reddit content and Perplexity then quickly incorporated that data into its answer engine."

Perplexity provided a statement defending itself to The Verge.

“Perplexity has not yet received the lawsuit, but we will always fight vigorously for users’ rights to freely and fairly access public knowledge,” the company told The Verge. “Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.”

We'll have to wait and see how the lawsuit pans out, but at least Reddit's tactic for allegedly catching Perplexity in the act is funny, if nothing else.
