deeporiginbio · sg-s · Dec 16, 2024 · Dec 13, 2024 · Dec 13, 2024 · Dec 13, 2024
@@ -0,0 +1,6 @@
+This document describes how to discover and list tools on the Deep Origin platform. 
+
+Available tools are listed in the panel on the left, along with documentation on each tool and how to run it.
+
+!!! info "Coming soon"
+    More tools will be added soon. The ability to list tools from the API is coming soon.
@@ -0,0 +1 @@
+This document describes how to run tools on the Deep Origin platform. 
@@ -0,0 +1,12 @@
+# Tools on Deep Origin
+
+This page describes how you can run tools on Deep Origin.
+
+## What are tools? 
+
+Tools are containerized scientific applications that can be run on the Deep Origin platform. The Deep Origin API (and this Python client) makes it easy to run tools on the platform and to wire up their inputs and outputs.
+
+
+## Can I make my own tools?
+
+This ability is coming soon. 
@@ -0,0 +1,139 @@
+# AutoDock Vina
+
+[AutoDock Vina](https://vina.scripps.edu/) is a molecular docking tool widely used in computational drug discovery. It predicts the binding mode of a small molecule (ligand) to a protein (receptor) by modeling their interactions. Vina is known for its high performance, accuracy, and user-friendly interface. It calculates the binding affinity and provides docking poses of the ligand in the receptor’s active site.
+
+Vina is commonly used to screen potential drug candidates, study protein-ligand interactions, and explore binding mechanisms. It employs a scoring function to evaluate the strength of binding and an efficient optimization algorithm to search for the best docking configuration.
+
+
+## File Inputs 
+
+
+
+### 1. Receptor File
+
+- Format: [PDBQT](https://userguide.mdanalysis.org/2.6.0/formats/reference/pdbqt.html) (Protein Data Bank, with charges and torsions).
+- This file represents the target (e.g., a protein or DNA structure).
+
+Upload this file to the Data Hub. Typically, you would upload this file to a database dedicated to runs with this tool, in a column named `receptor`.
+
+### 2. Ligand File
+
+- Format: [PDBQT](https://userguide.mdanalysis.org/2.6.0/formats/reference/pdbqt.html).
+- Represents the small molecule (e.g., a drug or compound).
+- Prepared by calculating torsions and assigning charges using AutoDock Tools or MGLTools.
+- Other input formats (e.g., PDB or MOL2) must be converted to PDBQT using preparation tools before use in AutoDock Vina.
+
+Upload this file to the Data Hub. Typically, you would upload this file to a database dedicated to runs with this tool, in a column named `ligand`.
+
+## Parameters
+
+This section describes the parameters for a tool run, which are passed to the `start_run` function.
+
+
+### Search Space
+
+Defines the area of the receptor where docking will occur. This is critical for focusing on the active site or binding pocket.
+
+#### `center_x`, `center_y`, `center_z`
+
+- Coordinates of the center of the search box, specified in Ångstroms.
+- Should be based on the binding site of the receptor (obtained from experimental data or visual inspection).
+
+#### `size_x`, `size_y`, `size_z`
+
+- Dimensions of the search box along each axis (in Ångstroms).
+- Determines the search region’s size. A larger box covers more area but increases computation time.
+- In most cases, 20 to 30 Ångstroms per side is typical for a flexible ligand.
+
+### Docking Parameters
+
+#### `energy_range`
+
+- The energy difference (in kcal/mol) between the best pose and the worst acceptable pose.
+- Smaller values prioritize only low-energy poses; larger values allow more diverse poses.
+
+#### `exhaustiveness`
+
+- Determines the thoroughness of the search.
+- Higher values increase the number of sampling attempts, improving accuracy but requiring more computational time.
+- Default is 8; lower values (e.g., 1-4) are faster but less exhaustive.
+
+#### `num_modes`
+
+- The maximum number of docking poses to generate.
+- Vina will output up to this many unique poses for analysis.
+
+## Running Vina on Deep Origin
+
+To run AutoDock Vina on Deep Origin, follow these steps:
+
+
+### 1. Create a database to store input and output files
+
+### 2. Start a tool run on Deep Origin
+
+For this tool run, we will use the following parameters:
+
+```python
+search_space = {
+    "center_x": 15.190,
+    "center_y": 53.903,
+    "center_z": 16.917,
+    "size_x": 1.1,
+    "size_y": 1.1,
+    "size_z": 1.1,
+}
+
+docking = {
+    "energy_range": 0.3,
+    "exhaustiveness": 1,
+    "num_modes": 9,
+}
+```
+
+To start a tool run, use:
+
+```python
+from deeporigin.tools import autodock_vina
+
+job_id = autodock_vina.start_run(
+    database_id="<your-db-name>",
+    row_id="<row-name>",
+    search_space=search_space,
+    docking=docking,
+    output_column_name="<output-column-name>",
+    receptor_column_name="<receptor-column-name>",
+    ligand_column_name="<ligand-column-name>",
+)
+```
+
+`start_run` returns the ID of the tool run, which can be used to monitor the status of the run and to terminate it if needed.
+
+`start_run` prints a message that looks like:
+
+```text
+🧬 Job started with ID: 9f7a3741-e392-45fb-a349-804b7fca07d7
+```
+
+To monitor the status of the tool run, use:
+
+```python
+from deeporigin.tools.utils import query_run_status
+query_run_status("9f7a3741-e392-45fb-a349-804b7fca07d7")
+```
+
+To wait for the tool run to finish, use:
+
+```python
+from deeporigin.tools.utils import wait_for_job
+wait_for_job("9f7a3741-e392-45fb-a349-804b7fca07d7")
+```
+
+
+## Outputs
+
+### Docked Poses
+
+- Format: PDBQT.
+- Contains the ligand poses docked in the receptor.
+- Each pose includes the coordinates, torsional flexibility, and orientation of the ligand.
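
As a rough illustration of working with this output, the sketch below extracts the binding affinity of each pose from PDBQT text. It assumes the file follows the standard AutoDock Vina layout, in which each pose sits between `MODEL` and `ENDMDL` records and carries a `REMARK VINA RESULT:` line whose first number is the affinity in kcal/mol. The function name `parse_vina_affinities` and the sample text are hypothetical and are not part of the Deep Origin client.

```python
def parse_vina_affinities(pdbqt_text: str) -> list[float]:
    """Return the affinity (kcal/mol) of every pose, in file order."""
    affinities = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # fields after the colon: affinity, RMSD lower bound, RMSD upper bound
            fields = line.split(":", 1)[1].split()
            affinities.append(float(fields[0]))
    return affinities


# hypothetical two-pose output, trimmed to the lines that matter here
sample = """MODEL 1
REMARK VINA RESULT:      -7.3      0.000      0.000
ENDMDL
MODEL 2
REMARK VINA RESULT:      -6.9      1.812      2.447
ENDMDL
"""

affinities = parse_vina_affinities(sample)
print(affinities)
```

Because Vina writes poses in order of increasing energy, the first entry would normally correspond to the best-scoring pose.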
@@ -69,6 +69,13 @@ nav:
   - How to:
     - Install variables and secrets: how-to/compute-hub/variables.md
     - Get info about your workstation: how-to/compute-hub/workstation-info.md
+- Tools: 
+  - tools/index.md
+  - How to:
+    - List tools: how-to/tools/list-tools.md
+    - Run tools: how-to/tools/run-tools.md
+  - Available tools:
+    - AutoDock Vina: tools/vina.md
 - Changelog: changelog.md
 - Support: https://www.support.deeporigin.com/servicedesk/customer/portals
 

@@ -36,7 +36,7 @@ dependencies = [
     "pyjwt",
     "cryptography",
     "python-box",
-    "do-sdk-platform==1.0.0",
+    "do-sdk-platform==1.1.0",
 ]
 dynamic = ["version"]
 

@@ -0,0 +1,12 @@
+"""bridge module to interact with the platform tools api"""
+
+import sys
+
+from deeporigin.platform.utils import add_functions_to_module
+
+methods = add_functions_to_module(
+    module=sys.modules[__name__],
+    api_name="ClustersApi",
+)
+
+__all__ = list(methods)
@@ -0,0 +1,105 @@
+"""module to run AutoDock Vina on Deep Origin"""
+
+from beartype import beartype
+from deeporigin.config import get_value
+from deeporigin.data_hub import api
+from deeporigin.platform import tools
+from deeporigin.tools.utils import _get_cluster_id, _resolve_column_name
+
+
+@beartype
+def start_run(
+    *,
+    database_id: str,
+    row_id: str,
+    search_space: dict,
+    docking: dict,
+    output_column_name: str,
+    receptor_column_name: str,
+    ligand_column_name: str,
+) -> str:
+    """starts an AutoDock Vina run
+
+    Args:
+        database_id (str): database ID or name of the database to source inputs from and write outputs to
+        row_id (str): row ID or name of the row to source inputs from and write outputs to
+        search_space (dict): search space parameters. Must include keys 'center_x', 'center_y', 'center_z', 'size_x', 'size_y', and 'size_z'
+        docking (dict): docking parameters. Must include keys 'energy_range', 'exhaustiveness', and 'num_modes'
+        output_column_name (str): name of the column to write output file to
+        receptor_column_name (str): name of the column to source the receptor file from
+        ligand_column_name (str): name of the column to source the ligand file from
+
+    Returns:
+        str: job ID of the run
+
+
+    """
+
+    if docking.keys() != {"energy_range", "exhaustiveness", "num_modes"}:
+        raise ValueError(
+            "docking must be a dictionary with keys 'energy_range', 'exhaustiveness', and 'num_modes'"
+        )
+
+    if search_space.keys() != {
+        "center_x",
+        "center_y",
+        "center_z",
+        "size_x",
+        "size_y",
+        "size_z",
+    }:
+        raise ValueError(
+            "search_space must be a dictionary with keys 'center_x', 'center_y', 'center_z', 'size_x', 'size_y', and 'size_z'"
+        )
+
+    if not database_id.startswith("_database"):
+        data = api.convert_id_format(hids=[database_id])
+        database_id = data[0].id
+
+    db = api.describe_database(database_id=database_id)
+    cols = db.cols
+
+    # resolve columns
+    output_column_id = _resolve_column_name(output_column_name, cols)
+    receptor_column_id = _resolve_column_name(receptor_column_name, cols)
+    ligand_column_id = _resolve_column_name(ligand_column_name, cols)
+
+    if not row_id.startswith("_row"):
+        data = api.convert_id_format(hids=[row_id])
+        row_id = data[0].id
+
+    json_data = {
+        "toolId": "deeporigin/autodock-vina",
+        "inputs": {
+            "receptor": {
+                "rowId": row_id,
+                "columnId": receptor_column_id,
+                "databaseId": database_id,
+            },
+            "ligand": {
+                "rowId": row_id,
+                "columnId": ligand_column_id,
+                "databaseId": database_id,
+            },
+            "searchSpace": search_space,
+            "docking": docking,
+        },
+        "outputs": {
+            "output_file": {
+                "rowId": row_id,
+                "columnId": output_column_id,
+                "databaseId": database_id,
+            },
+        },
+        "clusterId": _get_cluster_id(),
+    }
+
+    response = tools.execute_tool(
+        org_friendly_id=get_value()["organization_id"],
+        execute_tool_dto=json_data,
+    )
+
+    job_id = response.id
+
+    print(f"🧬 Job started with ID: {job_id}")
+    return job_id
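
The two key checks in `start_run` raise the same message whether a key is missing or unexpected. A minimal sketch of a helper that reports exactly which keys differ is shown below. The name `check_keys` is hypothetical and is not part of this PR.

```python
def check_keys(name: str, params: dict, required: set[str]) -> None:
    """Raise ValueError naming the missing and unexpected keys in params."""
    provided = set(params)
    missing = required - provided
    unexpected = provided - required
    if missing or unexpected:
        raise ValueError(
            f"{name}: missing keys {sorted(missing)}, "
            f"unexpected keys {sorted(unexpected)}"
        )


DOCKING_KEYS = {"energy_range", "exhaustiveness", "num_modes"}

# a complete dictionary passes silently
check_keys(
    "docking",
    {"energy_range": 0.3, "exhaustiveness": 1, "num_modes": 9},
    DOCKING_KEYS,
)

# an incomplete dictionary with a stray key produces a specific message
try:
    check_keys("docking", {"energy_range": 0.3, "modes": 9}, DOCKING_KEYS)
except ValueError as error:
    message = str(error)
    print(message)
```

The same helper could be reused for `search_space` by passing its six required keys.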
@@ -0,0 +1,87 @@
+"""this module contains utility functions used by tool execution"""
+
+import functools
+import time
+
+from beartype import beartype
+from deeporigin.config import get_value
+from deeporigin.platform import clusters, tools
+
+
+def query_run_status(job_id: str):
+    """determine the status of a run, identified by job ID
+
+    Args:
+        job_id (str): job ID
+
+    Returns:
+        One of "Created", "Queued", "Running", "Succeeded", or "Failed"
+
+    """
+
+    data = tools.get_tool_execution(
+        org_friendly_id=get_value()["organization_id"], resource_id=job_id
+    )
+
+    return data.attributes.status
+
+
+def wait_for_job(job_id: str, *, poll_interval: int = 4):
+    """Poll the status of a job until it reaches a terminal state
+
+    Args:
+        job_id (str): job ID
+        poll_interval (int): seconds to wait between status checks
+    """
+
+    printed_length = 0
+    while True:
+        status = query_run_status(job_id)
+        # backspace over the previous status, padding with spaces so that a
+        # shorter status fully overwrites a longer one
+        text = status.ljust(printed_length)
+        print("\b" * printed_length + text, end="", flush=True)
+        printed_length = len(text)
+        if status in ("Succeeded", "Failed"):
+            break
+        time.sleep(poll_interval)
+
+
+@beartype
+def _resolve_column_name(column_name: str, cols: list) -> str:
+    """resolve column IDs from column name
+
+    Args:
+        column_name (str): column name
+        cols (list): list of columns
+
+    Returns:
+        str: column ID
+    """
+
+    column_ids = [col.id for col in cols]
+    column_names = [col.name for col in cols]
+
+    if column_name not in column_names and column_name not in column_ids:
+        raise ValueError(f"column_name must be one of {column_names} or {column_ids}")
+    elif column_name in column_names:
+        column_id = [col.id for col in cols if col.name == column_name][0]
+    else:
+        column_id = column_name
+
+    return column_id
+
+
+@functools.cache
+@beartype
+def _get_cluster_id() -> str:
+    """gets a valid cluster ID to run tools on
+
+    this defaults to pulling us-west-2"""
+
+    available_clusters = clusters.list_clusters(
+        org_friendly_id=get_value()["organization_id"]
+    )
+
+    matching_clusters = [
+        cluster
+        for cluster in available_clusters
+        if "us-west-2" in cluster.attributes.name
+    ]
+    if not matching_clusters:
+        raise RuntimeError("no cluster with 'us-west-2' in its name was found")
+    return matching_clusters[0].id
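
`wait_for_job` polls until the job succeeds or fails, with no upper bound on how long it waits. The sketch below shows one way to add a timeout to the same polling pattern. `wait_with_timeout` and the stubbed status source are hypothetical and exist only for illustration.

```python
import time
from typing import Callable

TERMINAL_STATES = {"Succeeded", "Failed"}


def wait_with_timeout(
    get_status: Callable[[], str],
    *,
    poll_interval: float = 4.0,
    timeout: float = 600.0,
) -> str:
    """Poll get_status until a terminal state is reached or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        time.sleep(poll_interval)


# stub standing in for a real status query (an assumption for illustration)
fake_statuses = iter(["Queued", "Running", "Succeeded"])
final_status = wait_with_timeout(lambda: next(fake_statuses), poll_interval=0.01)
print(final_status)
```

In real use, the callable would presumably wrap the status query from this module, for example `lambda: query_run_status(job_id)`.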