Further development continues at https://github.com/Megaputer/inepta; see the examples in the examples folder.
This repository contains examples of web scrapers used in the Internet Source
node, demonstrating their use cases and capabilities.
- Install the newest version of Python from https://python.org/downloads. Python 3.7+ is required.
- Download this repository (here it is placed on the D drive, so the full path is D:\python-scraper-examples)
- Open Command Prompt and navigate to the repository root folder
- Create a virtual environment
python -m venv env
- Install scraper dependencies
env\Scripts\pip install -r requirements.txt
- Download the Chromium browser for webapp_scraper:
env\Scripts\python -m playwright install chromium
- Register web scrapers in PolyAnalyst:
  - Navigate to Server settings in PolyAnalyst Administrative Tool
  - Open the Web scrapers context menu and click Add item
  - Enter the scraper name in the Name field. This name will be displayed in the drop-down Scraper menu in the Internet Source node wizard
  - Enter a command in the Command field. For example,
    D:\python-scraper-examples\env\Scripts\python.exe D:\python-scraper-examples\megaputer_blog.py
  - Click Save changes to apply the new settings
- Add an Internet Source node to the workspace
- Choose one of the scrapers registered earlier in the drop-down Scraper menu
- Set parameters if the selected scraper supports them
- Execute the node
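
As a rough illustration of the steps above, the following Python sketch assembles the same setup commands and the string to paste into the Command field. The helper names (env_tool, setup_commands, scraper_command, run_setup) are hypothetical and not part of this repository, and the paths assume the D:\python-scraper-examples location used above.

```python
import ntpath
import subprocess
from typing import List

# Assumption: the repository location used in the steps above.
REPO_ROOT = r"D:\python-scraper-examples"


def env_tool(repo_root: str, tool: str) -> str:
    """Return the path of an executable in the virtual environment's Scripts folder."""
    # ntpath builds Windows-style paths regardless of the platform running this code.
    return ntpath.join(repo_root, "env", "Scripts", tool)


def setup_commands(repo_root: str) -> List[List[str]]:
    """Return the three setup commands from the list above, in order."""
    return [
        ["python", "-m", "venv", "env"],
        [env_tool(repo_root, "pip"), "install", "-r", "requirements.txt"],
        [env_tool(repo_root, "python"), "-m", "playwright", "install", "chromium"],
    ]


def scraper_command(repo_root: str, script: str) -> str:
    """Return the value to enter in the Command field for a given scraper script."""
    return f"{env_tool(repo_root, 'python.exe')} {ntpath.join(repo_root, script)}"


def run_setup(repo_root: str) -> None:
    """Run the setup commands from the repository root, stopping on the first failure."""
    for command in setup_commands(repo_root):
        subprocess.run(command, cwd=repo_root, check=True)


# Print the commands instead of running them, so the sketch has no side effects.
for command in setup_commands(REPO_ROOT):
    print(" ".join(command))
print(scraper_command(REPO_ROOT, "megaputer_blog.py"))
```

Calling run_setup(REPO_ROOT) on the machine that hosts the repository would execute the same commands; the printed last line is the Command value for the megaputer_blog.py example.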
This project is licensed under the MIT License - see the LICENSE file for details