A humble SEO spider and link checker
Install the dependencies:

```
pip install -r requirements.txt
```
- Rename `config-sample.json` to `config.json`
- Set the `origin_domain` value in `config.json` to your site's domain
config.json

```json
"origin_domain": "example.com",
```
Run the spider:

```
python seaspider.py
```
Domain restriction is on by default: `allow_outside_starting_domain` is set to `false`, which prevents the spider from crawling URLs outside the starting domain.
config.json

```json
"allow_outside_starting_domain": false,
"origin_domain": "example.com",
"operation_mode": "domain_scan"
```
The `max_crawl_depth` setting controls how many levels of links the spider will follow recursively: it crawls a page, harvests its links, then crawls each of those links, repeating until the maximum depth is reached.
⚠ Warning: The cost of a crawl grows exponentially with depth. Depending on how many unique links each page contributes to the crawl network, traversing the entire network can become very expensive; for example, if every page yielded roughly 50 new unique links, a depth of 5 could in the worst case reach on the order of 50⁵ ≈ 300 million URLs.
config.json

```json
"max_crawl_depth": 5,
```