diff --git a/docs/tools/draco.md b/docs/tools/draco.md
index db2cf1c..be6f494 100644
--- a/docs/tools/draco.md
+++ b/docs/tools/draco.md
@@ -1,6 +1,6 @@
 # Draco: a Chemical Data Extractor
 
-[Draco] is a tool which aims to extract chemical data (Molecules, Reactions, procedures, etc.) from documents. In its first version, Draco focuses at extracting molecules (fragments and complete structures) from PDF documents
+[Draco] is a tool that aims to extract chemical data (molecules, reactions, procedures, etc.) from documents. In its first version, Draco focuses on extracting molecules (fragments and complete structures) from PDF documents.
 
 ## File Inputs
 
@@ -12,7 +12,7 @@ Upload a PDF file containing chemical structures to extract to the Data Hub.
 
 ### Output Files
 
-Draco produces an .xlsx (Excel) file which contains all the extracted chemical strucutres for a given document. Each row contains the extracted image, the predicted image, the predicted SMILES, the confidence score and the confidence score associated to each token in the SMILES
+Draco produces a .xlsx (Excel) file that contains all the extracted chemical structures for a given document. Each row includes the extracted image, the predicted image, the predicted SMILES, the confidence score, and the confidence score associated with each token in the SMILES.
 
 ## Running Draco on Deep Origin
 
@@ -20,6 +20,10 @@ To run Draco on Deep Origin, follow these steps:
 
 ### 1. Create a database to store input and output files
 
+Create a column for the input PDF files and an output column to store the output files. Set the type of both columns to File.
+
+![draco_database_example](https://github.com/user-attachments/assets/926a4f06-3c27-4b4b-9b47-79fc98e96723)
+
 ### 2. Start a tool run on Deep Origin
 
 To start a tool run, use:
@@ -56,3 +60,16 @@ To wait for the tool run to finish, use:
 from deeporigin.tools.utils import wait_for_job
 wait_for_job("9f7a3741-e392-45fb-a349-804b7fca07d7")
 ```
+
+## Example
+Using the database from step 1 above, the following code extracts molecules from the uploaded document (`patent_test.pdf`):
+```python
+from deeporigin.tools import run
+
+job_id = run.draco(
+    database_id="draco_use_case",
+    row_id="draco-use-case-1",
+    input_file_column_name="Document",  # File column holding the input PDF
+    output_column_name="Result",  # File column that receives the .xlsx output
+)
+```