Commit

Update draco.md
vdelannee authored Jan 8, 2025
1 parent 92802ad commit 7b24d76
Showing 1 changed file (docs/tools/draco.md) with 19 additions and 2 deletions.
# Draco: a Chemical Data Extractor

[Draco] is a tool that aims to extract chemical data (molecules, reactions, procedures, etc.) from documents. In its first version, Draco focuses on extracting molecules (fragments and complete structures) from PDF documents.

## File Inputs

Upload to the Data Hub a PDF file containing the chemical structures you want to extract.
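Before uploading, a quick local check can catch files that are not actually PDFs (for example, a renamed Word document). The sketch below only tests for the `%PDF-` signature that PDF files begin with; it does not validate the rest of the file. The helper names are hypothetical and not part of Deep Origin's tooling.

```python
from pathlib import Path

# PDF files begin with this header signature (e.g. b"%PDF-1.7").
PDF_SIGNATURE = b"%PDF-"


def has_pdf_signature(data: bytes) -> bool:
    """Return True if the given leading bytes match the PDF header signature."""
    return data.startswith(PDF_SIGNATURE)


def looks_like_pdf(path) -> bool:
    """Read only the first few bytes of the file and check the PDF signature."""
    with Path(path).open("rb") as fh:
        return has_pdf_signature(fh.read(len(PDF_SIGNATURE)))
```

For example, `looks_like_pdf("patent_test.pdf")` should return `True` for a genuine PDF; a `False` result suggests the file should be inspected before uploading.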

### Output Files

Draco produces a .xlsx (Excel) file that contains all the chemical structures extracted from a given document. Each row includes the extracted image, the predicted image, the predicted SMILES, the overall confidence score, and the confidence score associated with each token in the SMILES.
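The confidence scores make it possible to triage predictions before trusting them. The sketch below flags rows whose overall score, or weakest per-token score, falls below a threshold. The column names, the storage format of the per-token scores (a list or a comma-separated string), and the threshold values are all assumptions; check them against an actual Draco output file.

```python
from __future__ import annotations

from typing import Any, Iterable

# Hypothetical column names: check the headers of your actual Draco output file.
SMILES_COL = "Predicted SMILES"
SCORE_COL = "Confidence"
TOKEN_SCORES_COL = "Token confidences"


def parse_token_scores(value: Any) -> list[float]:
    """Convert per-token scores to floats.

    Accepts a list of numbers or a string such as "0.99, 0.97" or "[0.99, 0.97]"
    (both formats are assumptions about how the cell is stored).
    """
    if isinstance(value, str):
        return [float(part) for part in value.strip("[] ").split(",") if part.strip()]
    return [float(part) for part in value]


def needs_review(
    rows: Iterable[dict[str, Any]],
    min_score: float = 0.9,  # arbitrary example threshold
    min_token_score: float = 0.5,  # arbitrary example threshold
) -> list[str]:
    """Return the SMILES of rows whose overall or weakest per-token score is low."""
    flagged = []
    for row in rows:
        token_scores = parse_token_scores(row[TOKEN_SCORES_COL])
        # An empty token list does not trigger a flag on its own.
        weakest_token = min(token_scores, default=1.0)
        if float(row[SCORE_COL]) < min_score or weakest_token < min_token_score:
            flagged.append(row[SMILES_COL])
    return flagged
```

One way to obtain `rows` is `pandas.read_excel("output.xlsx").to_dict("records")`, where `output.xlsx` stands in for the file Draco wrote (reading .xlsx with pandas also requires `openpyxl`).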

## Running Draco on Deep Origin

To run Draco on Deep Origin, follow these steps:

### 1. Create a database to store input and output files

Create a column containing the PDF files and an output column to store the output files. Both columns should be of type File.

![draco_database_example](https://github.com/user-attachments/assets/926a4f06-3c27-4b4b-9b47-79fc98e96723)

### 2. Start a tool run on Deep Origin

To start a tool run, call `run.draco` from `deeporigin.tools` (a complete call is shown in the Example section below).

To wait for the tool run to finish, use:

```python
from deeporigin.tools.utils import wait_for_job
wait_for_job("9f7a3741-e392-45fb-a349-804b7fca07d7")
```
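`wait_for_job` handles the waiting for you. If you instead need a bounded wait (for example, in a CI job), the generic polling pattern looks like the sketch below. It is not how `wait_for_job` is implemented; `get_status` is a hypothetical callable you would supply to report a job's status, and the status names are placeholders.

```python
import time
from typing import Callable, Collection

# Hypothetical terminal states; the real status names may differ.
TERMINAL_STATES = ("Succeeded", "Failed", "Cancelled")


def poll_until_done(
    get_status: Callable[[], str],
    terminal_states: Collection[str] = TERMINAL_STATES,
    interval_s: float = 10.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} s")
        sleep(interval_s)  # injectable so tests can skip real waiting
```

Using `time.monotonic` rather than wall-clock time keeps the timeout correct even if the system clock is adjusted during the wait.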

## Example
Using the database shown in step 1 above, the following code extracts molecules from the uploaded document (`patent_test.pdf`):

```python
from deeporigin.tools import run

# Start a Draco run on one row of the database
job_id = run.draco(
    database_id="draco_use_case",
    row_id="draco-use-case-1",
    input_file_column_name="Document",  # File column holding the PDF
    output_column_name="Result",  # File column that receives the .xlsx output
)
```
