Craigslist IP Blocks #1

Open
singlerider opened this issue Nov 8, 2015 · 1 comment
Comments

@singlerider
Owner

After a certain number of requests in a short timeframe, Craigslist automatically blocks access to the segments of the service being scraped. Craigslist also blocks known Tor connections by default. If either condition is met, this application returns a "403" (Forbidden) error. The best combination will likely be a sleep(n) between requests, with n a randomized float between 5 and 10 seconds, plus external proxies. Once I have done enough testing, with time between my IP getting blocked, or have resolved this another way, I will close this issue with the best method.
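A minimal sketch of the randomized-sleep-plus-proxy idea, not the code this project actually uses. The names `random_delay`, `make_opener`, and `fetch_all` are hypothetical, and stopping at the first 403 is an assumption about how a block should be handled.

```python
import random
import time
import urllib.error
import urllib.request


def random_delay(low=5.0, high=10.0):
    """Return a random float number of seconds between low and high."""
    return random.uniform(low, high)


def make_opener(proxy_url=None):
    """Build a urllib opener, optionally routed through an external proxy."""
    handlers = []
    if proxy_url:
        # Send both plain and TLS traffic through the same proxy.
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
    return urllib.request.build_opener(*handlers)


def fetch_all(urls, fetch, sleep=time.sleep, low=5.0, high=10.0):
    """Fetch each URL in turn, pausing a random interval between requests.

    `fetch` is any callable that takes a URL and returns the response body.
    On a 403 the loop stops early, on the assumption that the IP is blocked
    and further requests would only prolong the block.
    """
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(random_delay(low, high))  # no pause before the first request
        try:
            results[url] = fetch(url)
        except urllib.error.HTTPError as err:
            if err.code == 403:
                break
            raise
    return results
```

To use it against a real site, pass something like `lambda url: opener.open(url, timeout=30).read()` as `fetch`, where `opener = make_opener(...)` is given the address of whatever proxy is available. Keeping `fetch` and `sleep` injectable makes the pacing logic easy to exercise without network access.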

@singlerider
Owner Author

Craigslist also uses CAPTCHA checkboxes to deter automated scraping.
