Commit ea97d92
updated extended cmds
ahmedshahriar committed May 14, 2022
1 parent 8762b9e commit ea97d92
Showing 4 changed files with 35 additions and 22 deletions.
34 changes: 24 additions & 10 deletions README.md
@@ -1,5 +1,5 @@
# bd-medicine-scraper
[![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/) ![Django CI](https://github.com/ahmedshahriar/bd-medicine-scraper/actions/workflows/django-ci.yml/badge.svg) [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/ahmedshahriarsakib/bangladesh-medicine-analytics) [![Open in Visual Studio Code](https://open.vscode.dev/badges/open-in-vscode.svg)](https://github.dev/ahmedshahriar/bd-medicine-scraper)
[![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/) ![Django CI](https://github.com/ahmedshahriar/bd-medicine-scraper/actions/workflows/django-ci.yml/badge.svg) [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/ahmedshahriarsakib/bangladesh-medicine-analytics) [![Open in Visual Studio Code](https://img.shields.io/static/v1?logo=visualstudiocode&label=&message=Open%20in%20Visual%20Studio%20Code&labelColor=2c2c32&color=007acc&logoColor=007acc)](https://github.dev/ahmedshahriar/bd-medicine-scraper)

## Overview
Welcome to the bd-medicine-scraper repository!
@@ -11,8 +11,15 @@ I also customized the django admin panels, added additional features such as -
- custom filtering (alphabetical, model property)
- bulk actions (export to csv; a minimal sketch of this pattern follows below)

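The bulk "export to csv" action above is the standard Django admin-action pattern; here is a minimal sketch of it. The Medicine model comes from crawler.models, but the admin registration and field handling shown here are illustrative assumptions, not the repository's actual admin code.
```
# Illustrative sketch only -- not the repository's actual admin.py
import csv

from django.contrib import admin
from django.http import HttpResponse

from crawler.models import Medicine


@admin.register(Medicine)
class MedicineAdmin(admin.ModelAdmin):
    actions = ["export_as_csv"]

    def export_as_csv(self, request, queryset):
        # Write the selected rows back to the browser as a CSV attachment.
        field_names = [field.name for field in queryset.model._meta.fields]
        response = HttpResponse(content_type="text/csv")
        response["Content-Disposition"] = "attachment; filename=medicine.csv"
        writer = csv.writer(response)
        writer.writerow(field_names)
        for obj in queryset:
            writer.writerow([getattr(obj, name) for name in field_names])
        return response

    export_as_csv.short_description = "Export selected to CSV"
```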
I customized the scrapy command to run scrapy spiders from django command line. (ex- `python manage.py spider_name`)\
Integrated custom django command to export models to csv. (ex- `python manage.py export_model_name export_data_path`)\
Other Customizations:
- custom scrapy command to run scrapy spiders from the django command line (e.g. `python manage.py <spider_name>`; see the sketch after this list)
- custom django commands
  - to export models to csv (`python manage.py <export_model_name> <export_data_path>`)
    ```
    python manage.py export_medicine_data /home/ahmed/Desktop/medicine_data.csv
    ```
  - to export generic monograph PDFs
    ```
    python manage.py export_generics_monograph
    ```
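A minimal sketch of the spider-runner pattern referenced in the first bullet, assuming a hypothetical crawl_medicine command and a MedicineSpider class (the real command and spider names in this repository may differ):
```
# crawler/management/commands/crawl_medicine.py -- hypothetical name, illustrative only
from django.core.management.base import BaseCommand
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from crawler.spiders.medicine import MedicineSpider  # assumed spider module and class


class Command(BaseCommand):
    help = "Run the medicine spider from the Django command line"

    def handle(self, *args, **options):
        # CrawlerProcess loads the Scrapy project settings and runs the
        # spider in this process, blocking until the crawl finishes.
        process = CrawlerProcess(get_project_settings())
        process.crawl(MedicineSpider)
        process.start()
```
Dropping a module like this under crawler/management/commands/ is what makes `python manage.py crawl_medicine` (or, in this project, `python manage.py <spider_name>`) available.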
I also added proxy configuration to scrapy.
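Proxy support in Scrapy is commonly added by setting request.meta["proxy"] from a small downloader middleware, which Scrapy's HTTP downloader honours. A minimal sketch of that approach follows; the middleware name, module path, and proxy URL are placeholders, not this project's actual configuration.
```
# crawler/middlewares.py -- illustrative sketch; the proxy URL below is a placeholder
class CustomProxyMiddleware:
    """Attach a proxy to every outgoing request."""

    PROXY_URL = "http://user:pass@proxy.example.com:8080"  # placeholder endpoint

    def process_request(self, request, spider):
        # Scrapy's HTTP downloader honours the "proxy" key in request.meta;
        # returning None lets the request continue through the middleware chain.
        request.meta["proxy"] = self.PROXY_URL


# settings.py -- enable the middleware (priority 350 is a conventional slot):
# DOWNLOADER_MIDDLEWARES = {
#     "crawler.middlewares.CustomProxyMiddleware": 350,
# }
```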
@@ -66,15 +73,15 @@ Here is a list of the CSV files with their featured columns:
- Package Size (unit price)
2. manufacturer.csv (245 entries)
- name
3. indication.csv (2000+ entries)
3. indication.csv (2k+ entries)
- name
4. generic.csv (~1700-1800 entries)
- name
- monographic link (PDF URL)
- drug class
- indication
- generic details such as "Indication description", "Pharmacology description", "Dosage & Administration description" etc.
5. drug class.csv (452 entries)
5. drug class.csv (~400 entries)
- name
6. dosage form.csv (~120 entries)
- name
@@ -85,13 +92,20 @@ Here is a list of the CSV files with their featured columns:
## Tests
Workflow script - [django-ci.yml](https://github.com/ahmedshahriar/bd-medicine-scraper/blob/dev/.github/workflows/django-ci.yml)
Run the tests using:\
`coverage run --omit='*/venv/*' manage.py test`
Run the tests using:
```
coverage run --omit='*/venv/*' manage.py test
```
or
`python manage.py test`
```
python manage.py test
```
Check the coverage\
`coverage html`
Check the coverage
```
coverage html
```
## Built With
9 changes: 5 additions & 4 deletions crawler/management/commands/export_csv.py
@@ -1,9 +1,9 @@
import csv
import logging
import datetime

from crawler.models import Medicine, Generic, DosageForm, DrugClass, Indication, Manufacturer
from django.core.management import BaseCommand
from django.utils.autoreload import logger


class Command(BaseCommand): # see https://gist.github.com/2724472
@@ -13,7 +13,8 @@ class Command(BaseCommand): # see https://gist.github.com/2724472
def add_arguments(self, parser):
parser.add_argument('model_name',
type=str,
help='model name for the csv export')
help='model name for the csv export, e.g. medicine, generic, dosage_form, drug_class, '
'indication, manufacturer')

parser.add_argument('outfile',
nargs='?',
@@ -23,7 +24,7 @@ def add_arguments(self, parser):
def handle(self, *args, **options):
model_name = options['model_name']
export_file = f"{options['outfile']}.csv" if options['outfile'] else '{}.csv'.format(model_name)
print("Exporting... %s" % model_name)
logger.info("Exporting... %s" % model_name)

model_dict = {'medicine': Medicine, 'generic': Generic, 'dosage_form': DosageForm, 'drug_class': DrugClass,
'indication': Indication, 'manufacturer': Manufacturer}
@@ -48,4 +49,4 @@ def handle(self, *args, **options):
value = value.strftime('%d/%m/%Y')
data_row.append(value)
writer.writerow(data_row)
print(f.name, "exported")
logger.info(f.name, "exported")
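Django derives a management command's name from its module filename, so the command above would be invoked roughly as follows (the model name and output path here are illustrative):
```
python manage.py export_csv medicine medicine_export
```
Per the handle() logic above, the outfile argument is optional and the export falls back to `<model_name>.csv` when it is omitted.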
8 changes: 4 additions & 4 deletions crawler/management/commands/export_generics_monograph.py
@@ -1,4 +1,3 @@
import logging
import os
from pathlib import Path

@@ -11,14 +10,15 @@


class Command(BaseCommand):
help = "Export Generic Monograph to PDFs"
help = "Export Generic Monograph to PDFs. This command will download the drug monograph PDFs from the URLs listed " \
"on generic data. "

def handle(self, *args, **options):
logger.info("Export Generic Monograph to PDFs")
try:
monograph_links = (
Generic.objects.values_list("monograph_link", flat=True).exclude(monograph_link__isnull=True)
.exclude(monograph_link__exact=''))
.exclude(monograph_link__exact=''))
logger.info("Total monograph links: {}".format(len(monograph_links)))
for monograph_link in monograph_links:
if monograph_link:
@@ -35,7 +35,7 @@ def handle(self, *args, **options):
response = requests.get(monograph_link)
dirname = 'monograph-data/'
os.makedirs(os.path.dirname(dirname), exist_ok=True)
with open(Path(dirname+str(monograph_link).split("/")[-1]+'.pdf'), 'wb') as f:
with open(Path(dirname + str(monograph_link).split("/")[-1] + '.pdf'), 'wb') as f:
f.write(response.content)

except Exception as ge:
6 changes: 2 additions & 4 deletions crawler/management/commands/med_generic_mapper.py
@@ -1,8 +1,6 @@
import logging
import os

from django.core.management.base import BaseCommand
from crawler.models import Generic, Medicine
from django.utils.autoreload import logger


class Command(BaseCommand):
@@ -20,4 +18,4 @@ def handle(self, *args, **options):
medicine.generic = generic
medicine.save()
except Exception as ge:
logging.info(ge)
logger.info(ge)
