US court fully legalizes website scraping and prohibits technically blocking it
On September 9, 2019, the U.S. Court of Appeals for the Ninth Circuit ruled (on appeal from the United States District Court for the Northern District of California) that web scraping of public sites does not violate the CFAA (Computer Fraud and Abuse Act).
This is a really important decision. The court not only legalized the practice, but also prohibited site owners from blocking competitors who automatically collect information from a site, provided the site is public. The court confirmed the clear logic that a visit by a web scraper bot is not legally different from a visit by a browser. In both cases, the “user” requests open data and then does something with it on their side.
Many site owners currently try to put technical obstacles in the way of competitors who copy, in full, information that is not protected by copyright: for example, ticket prices, product listings, public user profiles, and so on. Some sites consider this information “their own” and regard web scraping as “theft”. Legally, this is not the case, and that is now officially established in the US.
The decision came in a dispute between LinkedIn (owned by Microsoft) and a small data analytics company called hiQ Labs.
hiQ collected data from publicly available LinkedIn user profiles and then used it to advise employers whose employees had posted their resumes on the site.
LinkedIn tolerated hiQ's activity for several years, but in 2017 it sent the company a cease-and-desist letter demanding that it stop automated data collection from profiles. Among other things, LinkedIn claimed that hiQ was violating the Computer Fraud and Abuse Act (CFAA), the main American anti-hacking law. Adopted more than 30 years ago, this law prohibits “access to a computer without authorization or with exceeding access rights.”
The demand became an existential threat to hiQ, since the LinkedIn website is hiQ's main data source. The analytics firm had no choice but to sue LinkedIn. It sought not only to have web scraping recognized as legal, but also to have technical obstacles to it banned.
In 2017, the court of first instance sided with hiQ. The defendant filed an appeal, and yesterday the Ninth Circuit Court of Appeals agreed with the lower court: it stated that the Computer Fraud and Abuse Act does not apply to information available to the general public.
“The CFAA was adopted to prevent deliberate intrusion into someone else’s computer, in particular computer hacking,” the court said. The court noted that participants in the process had repeatedly drawn analogies with physical crimes, such as breaking and entering. According to the judges, this means that the CFAA applies only to information or computer systems that are closed to the public to begin with, which is usually indicated by an authorization requirement at the point of entry.
The court noted that the CFAA was originally passed in the 1980s specifically to protect certain categories of computers containing military, financial, or other sensitive data. But when the law was extended to more computers in 1996, a Senate report said its goal was to “increase privacy.” In other words, its purpose is to protect private information.
HiQ only takes information from public LinkedIn profiles. By definition, any member of the public has the right to access this information.
Most importantly, the appeals court also upheld a lower court ruling that prohibits LinkedIn from interfering with hiQ’s web scraping of its site. This fundamentally changes the balance of power in dealing with such cases in the future.
Perhaps this is a peculiarity of American law. In this case, hiQ argued that LinkedIn’s technical measures to block web scraping interfered with hiQ’s contracts with its own customers, who rely on this data. In legal jargon, this is called “tortious interference with contract,” which American law prohibits.
In Russia, protecting your site from bots, including web scrapers, is considered normal practice, even if the site owner does not own intellectual property rights to published information (for example, user profiles).
ThumbOne · 03.02.2020 at 04:29
One would imagine it remains perfectly legal to restrict web scraping if it acts, intentionally or as a side effect, as a DoS (Denial of Service) attack. That is, robot traffic can interfere with a site's own business, which is the same case they made for permitting it.
Selective robot denial is also common practice and will remain so, especially when both parties (target site and scraper) agree on it. This crops up mostly with search engines: it's perfectly fine for a site to say that all these pages are public, but these ones don't really need indexing in a search engine (a waste of your time and ours), or even that we'd just rather you didn't.
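The usual mechanism for this kind of voluntary, selective denial is a robots.txt file, which a cooperating crawler reads before fetching pages. Below is a minimal sketch of how a scraper might honor such rules using Python's standard `urllib.robotparser`. The robots.txt content, the bot names (`SomeCrawler`, `ExampleBot`), and the example.com URLs are all hypothetical and serve only as illustration; nothing here is taken from the LinkedIn case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to skip /drafts/ and /search,
# and one specific (made-up) bot is asked to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Disallow: /search

User-agent: ExampleBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A generic crawler may read public profiles but not the excluded sections.
allowed_public = parser.can_fetch("SomeCrawler", "https://example.com/profiles/alice")
allowed_drafts = parser.can_fetch("SomeCrawler", "https://example.com/drafts/post-1")

# The bot that the site singled out is denied everywhere.
allowed_banned = parser.can_fetch("ExampleBot", "https://example.com/profiles/alice")

print(allowed_public, allowed_drafts, allowed_banned)
```

Note that this check is purely cooperative: the file only states the site's wishes, and it is up to the crawler to call something like `can_fetch` and respect the answer.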