AI companies are reportedly still scraping websites despite protocols meant to block them

Joystiq

Joystiq News
Perplexity, a company that describes its product as "a free AI search engine," has been under fire over the past few days. Shortly after another publication accused it of stealing its story and republishing it across multiple platforms, Wired reported that Perplexity has been ignoring the Robots Exclusion Protocol, or robots.txt, and has been scraping its website and other Condé Nast publications. Another technology website also accused the company of scraping its articles. Now, Reuters has reported that Perplexity isn't the only AI company that's bypassing robots.txt files and scraping websites to get content that's then used to train its technologies.

Reuters said it saw a letter addressed to publishers from TollBit, a startup that pairs them up with AI firms so they can reach licensing deals, warning them that "AI agents from multiple sources (not just one company) are opting to bypass the robots.txt protocol to retrieve content from sites." The robots.txt file contains instructions for web crawlers on which pages they can and can't access. Web developers have been using the protocol since 1994, but compliance is completely voluntary.
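
As a rough illustration of what honoring robots.txt looks like on the crawler's side, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt contents, the bot name `ExampleAIBot`, the `example.com` URLs and the `is_allowed` helper are all made up for this example and do not come from any publisher or AI company mentioned here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks a made-up AI crawler from the whole site
# and keeps every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "OtherBot"):
        for url in ("https://example.com/news/story",
                    "https://example.com/private/draft"):
            print(agent, url, is_allowed(agent, url))
```

A well-behaved crawler runs a check like this before every request and skips any URL that comes back disallowed. Nothing in the protocol enforces that step, which is why a crawler that never performs it can fetch the same pages anyway.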


TollBit's letter didn't name any company, but another outlet says it has learned that OpenAI and Anthropic, the creators of the ChatGPT and Claude chatbots, respectively, are also bypassing robots.txt signals. Both companies previously proclaimed that they respect "do not crawl" instructions websites put in their robots.txt files.

During its investigation, Wired discovered that a machine on an Amazon server "certainly operated by Perplexity" was bypassing its website's robots.txt instructions. To confirm whether Perplexity was scraping its content, Wired provided the company's tool with headlines from its articles or short prompts describing its stories. The tool reportedly came up with results that closely paraphrased its articles "with minimal attribution." At times, it even generated inaccurate summaries of its stories: in one instance, Wired says, the chatbot falsely claimed that the publication had reported on a specific California police officer committing a crime.

In an interview with Fast Company, Perplexity CEO Aravind Srinivas told the publication that his company "is not ignoring the Robot Exclusions Protocol and then lying about it." That doesn't mean, however, that it isn't benefiting from crawlers that do ignore the protocol. Srinivas explained that the company uses third-party web crawlers on top of its own, and that the crawler Wired identified was one of them. When Fast Company asked if Perplexity told the crawler provider to stop scraping Wired's website, he only replied that "it's complicated."

Srinivas defended his company's practices, telling the publication that the Robots Exclusion Protocol is "not a legal framework" and suggesting that publishers and companies like his may have to establish a new kind of relationship. He also reportedly insinuated that Wired deliberately used prompts to make Perplexity's chatbot behave the way it did, so ordinary users would not get the same results. As for the inaccurate summaries that the tool had generated, Srinivas said: "We have never said that we have never hallucinated."

This article originally appeared on Engadget.




Console Bang News!
 
The practice of praying regularly is an important aspect of various religions around the world, including Islam. In Islam, Muslims are required to perform five daily prayers at specified times throughout the day and night. The precise prayer timings are crucial for Muslims to observe, as they are believed to have a spiritual significance and are a way to connect with Allah.

The five daily prayers in Islam are Fajr (dawn), Dhuhr (midday), Asr (afternoon), Maghrib (evening), and Isha (night). The precise timings for these prayers are determined by the position of the sun in the sky and they vary each day. The times for each prayer are calculated based on the movement of the sun, with specific formulas used to determine the exact moment when each prayer should be performed.

The importance of observing the precise prayer timings in Islam cannot be overstated. Muslims believe that performing their prayers at the designated times is a form of obedience to Allah and helps them maintain a strong connection to their faith. In addition, praying at the specified times is believed to bring blessings and spiritual benefits to the individual.

To ensure that they are observing the correct prayer timings, many Muslims use prayer timetables or apps that provide accurate information on when each prayer should be performed. These timetables are often based on astronomical calculations and are updated regularly to reflect the changing position of the sun throughout the year.

In some countries, mosques also use loudspeakers to announce the call to prayer, known as the Adhan, which alerts Muslims to the beginning of each prayer time. This helps remind individuals to pause their daily activities and devote time to prayer, strengthening their connection to Allah and their faith.

Overall, the precise prayer timings in Islam are an important aspect of the religion that helps Muslims maintain their spiritual connection and strengthen their faith. By observing the designated times for prayer, Muslims can experience a sense of peace and tranquility, as well as a closer relationship with Allah.
 
