- Create a Python 3 virtual environment.
  python3 -m venv myvenv
  source myvenv/bin/activate
- Install the required dependencies.
  pip install -r requirements.txt
- Create a PostgreSQL database and set it up with the provided script.
  psql -U username -d myDataBase -a -f db_setting.sql
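  To confirm the setup worked, you can optionally list the tables it created (standard psql usage):
  psql -U username -d myDataBase -c "\dt"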
- Edit the PostgreSQL connection information in the web_crawler.conf file.
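  The exact option names depend on the project's conf format, so the following is only a sketch of the kind of connection settings web_crawler.conf is expected to hold, not the actual keys:
  host = localhost
  port = 5432
  dbname = myDataBase
  user = username
  password = yourpassword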
This program requires 3 arguments.
- component name: url_queue / web_downloader / parser / sql_extractor / table_extractor / nl_extractor
- input file type: sql / list
- input file name
1. Component name
   - url_queue: It executes crawling using Google Search. Its input file should contain search keywords (see below).
   - web_downloader: It executes crawling with the outputs of the url_queue.
   - parser: It parses the downloaded HTML and filters it using content keywords.
   - sql_extractor: It extracts SQL from the output of the parser and filters out HTML that does not contain any SQL. It follows the syntax of SQLite3.
   - table_extractor: It extracts tables from the output of the sql_extractor.
   - nl_extractor: It extracts tables from the HTML that was not filtered.
2. Input file type
   - sql: you can use a SQL query as the input (e.g. select id from url_info;).
   - list: you can use a list as the input (e.g. a file containing: 0 1 3).
3. Input file name
   The file contains the list or the SQL query to use as input (see the example below).
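For example, the following run (the file name is illustrative) would feed the sql_extractor component a list-type input file:
python web_crawler.py sql_extractor list url_ids.list
Here url_ids.list would simply contain the ids to process, e.g. 0 1 3.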
- url_queue
  It needs keywords.
  Example for sql:
  select content from topic_keywords;
  Example for file:
  SQL select SQL+where+from
- web_downloader, parser, sql_extractor, table_extractor, nl_extractor
  They need URL ids.
  Example for sql:
  select id from url_ids;
  Example for file:
  0 1 3
You can test with the run.sh script or with the following commands:
- python web_crawler.py url_queue sql topic_keywords.sql
- python web_crawler.py url_queue list topic_keywords.list
- python web_crawler.py web_downloader sql url_ids.sql
- python web_crawler.py web_downloader list url_ids.list
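The remaining components presumably follow the same pattern (the file names here are illustrative, not files shipped with the repository):
- python web_crawler.py parser list url_ids.list
- python web_crawler.py sql_extractor list url_ids.list
- python web_crawler.py table_extractor list url_ids.list
- python web_crawler.py nl_extractor list url_ids.list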