New research reveals the scale of the growing problem of web scraping on some of the world’s largest websites.
Smartproxy’s report on the most scraped websites of 2024 shows that social media pages account for more than a quarter (27%) of the most scraped sites.
In 2023 and the first three months of 2024, bots were most interested in search engines like Google (42%), but social media accounts and community forums together accounted for a third (34%) of observed scraping instances.
Google is the most scraped website
Although the scale is disturbing, many of the most frequently scraped sites are fortunately not targets for personal data mining. The leading categories are search engines and e-commerce platforms.
“This trend underscores the urgent need for real-time search data across industries, including the ever-growing AI field, where data plays a critical role in training AI models,” said Vytautas Savickas, CEO of Smartproxy.
“Additionally, e-commerce platforms contribute to a large portion of the most scraped targets, reflecting the industry’s push for competitive intelligence needed for dynamic pricing strategies.”
E-commerce sites, which account for about a fifth (18%) of scraping requests, represent a growing segment. Smartproxy noted that shopping trends are emerging and real-time data is becoming increasingly important as consumers seek more competitive prices.
The report also details seasonal surges in e-commerce scraping, with shopping periods such as Black Friday (+64%), Christmas (+46%) and Amazon Prime Day (+22%) all seeing significant increases.
“Businesses are ramping up their scraping efforts during these times to harness the value of data generated by the influx of online shoppers looking for discounts and special offers,” Savickas added.