Imagine if a mob of people came into your store with no intention of buying and blocked real customers from entering. That is what Microsoft and Amazon are doing to our digital storefronts.
This is a screenshot of my inbox, full of error alerts, on Friday morning. Microsoft (to power its MSN search engine) was hitting our sites hundreds of times per minute. This scraping takes bandwidth away from real users and customers. It stops customers from making purchases at indie stores online. While Microsoft was scraping our servers, a store owner in Pennsylvania emailed me and said: "I can't get on our website. I'm getting a timeout message."
Microsoft says it abides by the rules web developers set for how often to crawl a site. We tell Microsoft to crawl at a reasonable rate of once per second, which works out to at most 60 requests per minute. Hundreds of hits per minute is well beyond that, so one can clearly see here that it doesn't abide by the rules. (To show that this actor is MSN, I've shown on the right of the screenshot that the IP resolves to msnbot-157-55-39-207.search.msn.com.)
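For anyone who wants to run the same check on their own logs, here is a minimal sketch in Python. It assumes you have already pulled the request timestamps for one crawler out of your server log. The function names and the 60-per-minute threshold are my own illustration, derived from the once-per-second rule above; they are not part of any Microsoft tool.

```python
from collections import Counter

# Assumption: a limit of one request per second means at most 60 per minute.
MAX_PER_MINUTE = 60


def requests_per_minute(timestamps):
    """Count how many requests fall in each calendar minute."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)


def minutes_over_limit(timestamps, limit=MAX_PER_MINUTE):
    """Return {minute: request_count} for every minute above the limit."""
    counts = requests_per_minute(timestamps)
    return {minute: n for minute, n in counts.items() if n > limit}


def looks_like_msn_crawler(hostname):
    """Check whether a reverse-DNS hostname sits under search.msn.com.

    This is only a name check. Reverse DNS can be faked, so a stricter
    check would also resolve the hostname forward and confirm it maps
    back to the same IP address.
    """
    return hostname.lower().rstrip(".").endswith(".search.msn.com")
```

Any minute that comes back from minutes_over_limit is a minute in which the crawler went faster than the rate we asked for.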
I am also seeing massive scraping of our sites from Amazon. Do you know how Alexa can answer your questions so quickly? Because Amazon is scraping all the web's data, storing it in a database, and then using it to answer you quickly. Last week, Amazon sent our server 3,200 scrape requests in an hour. In the screenshot, you can see a file with approximately 3,200 rows. Each row represents an Amazon bot scrape request. The screenshot shows the 60 Bridge pages Amazon scraped. Those pages all belong to indie stores using Bridge. The purpose: scrape data for Alexa. If you know someone who uses Alexa, they are helping create the logjams that keep you from accessing sites quickly.
I believe the goal of big tech companies is to suck up all the data, even if that means soaking up our bandwidth and keeping real visitors from using our sites. I believe there is an arms race among big tech companies to scrape all sites regardless of the impact on the site owner. We’re just bodies to be rolled over in their war against each other. It’s Google vs. Microsoft vs. Amazon. Oh, did I mention Facebook scrapes sites, too? :)