Megaputer/python-scraper-examples

Python scraper examples

Further development continues at https://github.com/Megaputer/inepta; see the examples folder in that repository.

This repository contains example web scrapers for the PolyAnalyst Internet Source node. They demonstrate the scrapers' use cases and capabilities.

Installation

  1. Install the latest version of Python from https://python.org/downloads. Python 3.7+ is required.
  2. Download this repository. These instructions assume it is placed on the D drive, so the full path is D:\python-scraper-examples.
  3. Open Command Prompt and navigate to the repository root folder.
  4. Create a virtual environment:
     python -m venv env
  5. Install the scraper dependencies:
     env\Scripts\pip install -r requirements.txt
  6. Download the Chromium browser for webapp_scraper:
     env\Scripts\python -m playwright install chromium
  7. Register the web scrapers in PolyAnalyst:
    • Navigate to Server settings in the PolyAnalyst Administrative Tool.
    • Open the Web scrapers context menu and click Add item.
    • Enter the scraper name in the Name field. This name will be displayed in the drop-down Scraper menu in the Internet Source node wizard.
    • Enter a command in the Command field. For example:
    D:\python-scraper-examples\env\Scripts\python.exe D:\python-scraper-examples\megaputer_blog.py
    • Click Save changes to apply the new settings.
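
The installation commands and the Command field value can be generated from the repository path, which helps when the repository is not placed at D:\python-scraper-examples. The following is a minimal sketch, not part of this repository: the helper names (python_is_supported, setup_commands, scraper_command) are hypothetical, and it assumes a Windows layout with the virtual environment in the env folder and paths that contain no spaces.

```python
import sys
from pathlib import PureWindowsPath

# The installation steps state that Python 3.7+ is required.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given interpreter version is 3.7 or newer."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def setup_commands():
    """Return the three setup commands, to be run from the repository root."""
    return [
        ["python", "-m", "venv", "env"],
        [r"env\Scripts\pip", "install", "-r", "requirements.txt"],
        [r"env\Scripts\python", "-m", "playwright", "install", "chromium"],
    ]


def scraper_command(repo_root, script_name):
    """Build the text for the Command field (hypothetical helper).

    Assumes the paths contain no spaces, so no quoting is applied.
    """
    root = PureWindowsPath(repo_root)
    interpreter = root / "env" / "Scripts" / "python.exe"
    return f"{interpreter} {root / script_name}"


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.7+ is required")
    # Print the commands instead of running them, so this is a dry run.
    for command in setup_commands():
        print(" ".join(command))
    print(scraper_command(r"D:\python-scraper-examples", "megaputer_blog.py"))
```

To run the setup commands instead of printing them, each list can be passed to subprocess.run(command, check=True) from the repository root in Command Prompt.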

Usage

  • Add an Internet Source node to the workspace.
  • Choose one of the previously registered scrapers from the drop-down Scraper menu.
  • Set parameters if the selected scraper supports them.
  • Execute the node.

License

This project is licensed under the MIT License. See the LICENSE file for details.
