feat: tooling to run autodock vina on deep origin #131

Merged · 7 commits · Dec 16, 2024
6 changes: 6 additions & 0 deletions docs/how-to/tools/list-tools.md
@@ -0,0 +1,6 @@
This document describes how to discover and list tools on the Deep Origin platform.

Available tools are listed in the panel on the left, along with documentation on each tool and how to run it.

!!! info "Coming soon"
    More tools will be added soon, as will the ability to list tools from the API.
1 change: 1 addition & 0 deletions docs/how-to/tools/run-tools.md
@@ -0,0 +1 @@
This document describes how to run tools on the Deep Origin platform.
12 changes: 12 additions & 0 deletions docs/tools/index.md
@@ -0,0 +1,12 @@
# Tools on Deep Origin

This page describes how you can run tools on Deep Origin.

## What are tools?

Tools are containerized scientific applications that can be run on the Deep Origin platform. The Deep Origin API (and this Python client) makes it easy to run tools on the platform and to wire up their inputs and outputs.


## Can I make my own tools?

This ability is coming soon.
139 changes: 139 additions & 0 deletions docs/tools/vina.md
@@ -0,0 +1,139 @@
# AutoDock Vina

[AutoDock Vina](https://vina.scripps.edu/) is a molecular docking tool widely used in computational drug discovery. It predicts how a small molecule (the ligand) binds to a macromolecular target (the receptor), typically a protein, by modeling their interactions. Vina is known for its speed, accuracy, and ease of use. It estimates the binding affinity (in kcal/mol) and outputs docking poses of the ligand in the receptor's binding site.

Vina is commonly used to screen potential drug candidates, study protein-ligand interactions, and explore binding mechanisms. It uses a scoring function to estimate binding strength and an efficient optimization algorithm to search for the best docking configuration.


## File Inputs



### 1. Receptor File

- Format: [PDBQT](https://userguide.mdanalysis.org/2.6.0/formats/reference/pdbqt.html) (Protein Data Bank format extended with partial charges and AutoDock atom types).
- This file represents the target (e.g., a protein or DNA structure).

Upload this file to the Data Hub, typically to a database dedicated to runs of this tool, in a column named `receptor`.

### 2. Ligand File

- Format: [PDBQT](https://userguide.mdanalysis.org/2.6.0/formats/reference/pdbqt.html).
- Represents the small molecule (e.g., a drug or compound).
- Typically prepared by defining rotatable bonds (torsions) and assigning partial charges, for example with AutoDockTools (part of MGLTools).
- Other input formats (e.g., PDB or MOL2) must be converted to PDBQT using preparation tools before use in AutoDock Vina.

Upload this file to the Data Hub, typically to a database dedicated to runs of this tool, in a column named `ligand`.

## Parameters

This section describes the parameters passed to the `start_run` function for a tool run.


### Search Space

Defines the area of the receptor where docking will occur. This is critical for focusing on the active site or binding pocket.

#### `center_x`, `center_y`, `center_z`

- Coordinates of the center of the search box, specified in Ångstroms.
- Should be based on the binding site of the receptor (obtained from experimental data or visual inspection).

#### `size_x`, `size_y`, `size_z`

- Dimensions of the search box along each axis (in Ångstroms).
- Determines the search region’s size. A larger box covers more area but increases computation time.
- In most cases, sizes of 20 to 30 Ångstroms per side are typical for a flexible ligand.
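
If a reference ligand (for example, one from a co-crystal structure) is available, the box can be centered on it. The sketch below assumes the reference ligand is a PDBQT file whose coordinates sit in the standard fixed-width PDB columns; `search_space_from_pdbqt` and its 5 Å default padding are hypothetical and not part of the `deeporigin` package.

```python
def search_space_from_pdbqt(pdbqt_text: str, padding: float = 5.0) -> dict:
    """Build a search_space dict around a reference ligand (hypothetical helper).

    Reads x, y, z from ATOM/HETATM records and pads the bounding box
    by `padding` Ångstroms on every side.
    """
    coords = []
    for line in pdbqt_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # PDB fixed-width columns (1-indexed): x = 31-38, y = 39-46, z = 47-54
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    if not coords:
        raise ValueError("no ATOM/HETATM records found")

    search_space = {}
    # zip(*coords) yields all x values, then all y values, then all z values
    for axis, values in zip("xyz", zip(*coords)):
        low, high = min(values), max(values)
        search_space[f"center_{axis}"] = round((low + high) / 2, 3)
        search_space[f"size_{axis}"] = round(high - low + 2 * padding, 3)
    return search_space
```

The returned dictionary has exactly the six keys that `start_run` expects for `search_space`. Padding the bounding box, rather than using it as-is, leaves room for the ligand to rotate and shift during the search.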

### Docking Parameters

#### `energy_range`

- The maximum energy difference (in kcal/mol) between the best pose and the worst pose reported.
- Smaller values keep only poses close in energy to the best one; larger values allow more diverse poses.
- Vina's default is 3.

#### `exhaustiveness`

- Determines how thoroughly the search space is explored.
- Higher values increase the number of sampling attempts, improving the chance of finding the best pose but requiring more computation time.
- Vina's default is 8; lower values (e.g., 1 to 4) are faster but less exhaustive.

#### `num_modes`

- The maximum number of docking poses to generate.
- Vina outputs up to this many unique poses for analysis.
- Vina's default is 9.
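
To make the interplay of `energy_range` and `num_modes` concrete, here is a simplified sketch of the filtering they describe: keep poses within `energy_range` kcal/mol of the best one, then cap the count at `num_modes`. `select_poses` is hypothetical and ignores details such as how Vina removes near-duplicate poses; it is an illustration, not Vina's implementation.

```python
def select_poses(affinities: list[float], energy_range: float, num_modes: int) -> list[float]:
    """Illustrate how energy_range and num_modes bound the reported poses.

    Affinities are in kcal/mol; lower (more negative) means stronger binding.
    """
    if not affinities:
        return []
    ranked = sorted(affinities)  # best (most negative) pose first
    best = ranked[0]
    # keep only poses whose energy is within energy_range of the best pose
    within_range = [a for a in ranked if a - best <= energy_range]
    # then keep at most num_modes of them
    return within_range[:num_modes]
```

For example, with affinities of -8.1, -7.9, -7.2 and -5.0 kcal/mol and `energy_range=1.0`, the -5.0 pose is dropped because it is 3.1 kcal/mol worse than the best; setting `num_modes=2` would additionally drop the -7.2 pose.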

## Running Vina on Deep Origin

To run AutoDock Vina on Deep Origin, follow these steps:


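### 1. Create a database to store input and output files

In the Data Hub, create a database with columns for the receptor file, the ligand file, and the output file, then upload the receptor and ligand files to a row of that database.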

### 2. Start a tool run on Deep Origin

For this tool run, we will use the following parameters:

```python
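search_space = {
    "center_x": 15.190,
    "center_y": 53.903,
    "center_z": 16.917,
    # note: a 1.1 Å box is much smaller than the 20-30 Å per side
    # typical for docking a flexible ligand
    "size_x": 1.1,
    "size_y": 1.1,
    "size_z": 1.1,
}

docking = {
    "energy_range": 0.3,
    "exhaustiveness": 1,  # fast but less exhaustive; Vina's default is 8
    "num_modes": 9,
}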
```
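
`start_run` raises a `ValueError` unless each dictionary has exactly the keys listed above. A hypothetical pre-flight check like the following mirrors that rule locally and names the missing or unexpected keys before any request is sent; `check_vina_params` is not part of the `deeporigin` package.

```python
# Key sets mirroring the checks in start_run (names here are hypothetical)
SEARCH_SPACE_KEYS = {"center_x", "center_y", "center_z", "size_x", "size_y", "size_z"}
DOCKING_KEYS = {"energy_range", "exhaustiveness", "num_modes"}


def check_vina_params(search_space: dict, docking: dict) -> None:
    """Raise ValueError naming any missing or unexpected keys."""
    for name, params, required in (
        ("search_space", search_space, SEARCH_SPACE_KEYS),
        ("docking", docking, DOCKING_KEYS),
    ):
        missing = required - set(params)
        extra = set(params) - required
        if missing or extra:
            raise ValueError(
                f"{name}: missing keys {sorted(missing)}, unexpected keys {sorted(extra)}"
            )
```

A typo such as `"num_mode"` is then reported as both a missing key (`num_modes`) and an unexpected key (`num_mode`), which is easier to act on than a generic message.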

To start a tool run, use:

```python
from deeporigin.tools import autodock_vina

job_id = autodock_vina.start_run(
    database_id="<your-db-name>",
    row_id="<row-name>",
    search_space=search_space,
    docking=docking,
    output_column_name="<output-column-name>",
    receptor_column_name="<receptor-column-name>",
    ligand_column_name="<ligand-column-name>",
)
```

`start_run` returns the ID of the tool run, which can be used to monitor the status of the run and to terminate it if needed.

`start_run` also prints a message like:

```text
🧬 Job started with ID: 9f7a3741-e392-45fb-a349-804b7fca07d7
```

To monitor the status of the tool run, use:

```python
from deeporigin.tools.utils import query_run_status
query_run_status("9f7a3741-e392-45fb-a349-804b7fca07d7")
```

To wait for the tool run to finish, use:

```python
from deeporigin.tools.utils import wait_for_job
wait_for_job("9f7a3741-e392-45fb-a349-804b7fca07d7")
```
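
`wait_for_job` polls until the run reaches `Succeeded` or `Failed`, with no upper bound on how long it waits. If you need a timeout, a sketch like the one below wraps any zero-argument status function; `wait_with_timeout` is hypothetical and assumes the status strings returned by `query_run_status`.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"Succeeded", "Failed"}


def wait_with_timeout(
    get_status: Callable[[], str],
    *,
    poll_interval: float = 4.0,
    timeout: float = 3600.0,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        time.sleep(poll_interval)
```

For example, `wait_with_timeout(lambda: query_run_status(job_id), timeout=1800)` returns the final status, or raises `TimeoutError` after 30 minutes. Passing a callable keeps the helper independent of the API client, so it can be tested with fake statuses.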


## Outputs

### Docked Poses

- Format: PDBQT.
- Contains the ligand poses docked in the receptor.
- Each pose includes the ligand's atom coordinates, which capture its position, orientation, and conformation (torsion angles), along with its predicted binding affinity.
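
For a quick look at the results, the predicted affinities can be read from the `REMARK VINA RESULT` line that Vina writes at the start of each pose (`MODEL` block). The line holds the affinity (kcal/mol) and lower and upper bounds on the RMSD from the best pose. A minimal sketch, assuming the output file has already been downloaded from the Data Hub as text; `parse_vina_results` is hypothetical.

```python
def parse_vina_results(pdbqt_text: str) -> list[dict]:
    """Collect affinity and RMSD values from Vina's REMARK VINA RESULT lines."""
    poses = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # three numbers follow the colon: affinity, RMSD lower bound, RMSD upper bound
            affinity, rmsd_lb, rmsd_ub = (float(v) for v in line.split(":", 1)[1].split()[:3])
            poses.append(
                {"mode": len(poses) + 1, "affinity": affinity, "rmsd_lb": rmsd_lb, "rmsd_ub": rmsd_ub}
            )
    return poses
```

The strongest predicted binder is then `min(poses, key=lambda p: p["affinity"])`, since more negative affinities indicate stronger binding.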
7 changes: 7 additions & 0 deletions mkdocs.yaml
@@ -69,6 +69,13 @@ nav:
    - How to:
      - Install variables and secrets: how-to/compute-hub/variables.md
      - Get info about your workstation: how-to/compute-hub/workstation-info.md
  - Tools:
    - tools/index.md
    - How to:
      - List tools: how-to/tools/list-tools.md
      - Run tools: how-to/tools/run-tools.md
    - Available tools:
      - AutoDock Vina: tools/vina.md
  - Changelog: changelog.md
  - Support: https://www.support.deeporigin.com/servicedesk/customer/portals

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -36,7 +36,7 @@ dependencies = [
     "pyjwt",
     "cryptography",
     "python-box",
-    "do-sdk-platform==1.0.0",
+    "do-sdk-platform==1.1.0",
 ]
 dynamic = ["version"]

12 changes: 12 additions & 0 deletions src/platform/clusters.py
@@ -0,0 +1,12 @@
"""bridge module to interact with the platform clusters API"""

import sys

from deeporigin.platform.utils import add_functions_to_module

methods = add_functions_to_module(
    module=sys.modules[__name__],
    api_name="ClustersApi",
)

__all__ = list(methods)
Empty file added src/tools/__init__.py
105 changes: 105 additions & 0 deletions src/tools/autodock_vina.py
@@ -0,0 +1,105 @@
"""module to run AutoDock Vina on Deep Origin"""

from beartype import beartype
from deeporigin.config import get_value
from deeporigin.data_hub import api
from deeporigin.platform import tools
from deeporigin.tools.utils import _get_cluster_id, _resolve_column_name


@beartype
def start_run(
    *,
    database_id: str,
    row_id: str,
    search_space: dict,
    docking: dict,
    output_column_name: str,
    receptor_column_name: str,
    ligand_column_name: str,
) -> str:
    """starts an AutoDock Vina run

    Args:
        database_id (str): database ID or name of the database to source inputs from and write outputs to
        row_id (str): row ID or name of the row to source inputs from and write outputs to
        search_space (dict): search space parameters. Must contain exactly the keys 'center_x', 'center_y', 'center_z', 'size_x', 'size_y', and 'size_z'
        docking (dict): docking parameters. Must contain exactly the keys 'energy_range', 'exhaustiveness', and 'num_modes'
        output_column_name (str): name (or ID) of the column to write the output file to
        receptor_column_name (str): name (or ID) of the column to source the receptor file from
        ligand_column_name (str): name (or ID) of the column to source the ligand file from

    Returns:
        str: job ID of the run
    """

    if docking.keys() != {"energy_range", "exhaustiveness", "num_modes"}:
        raise ValueError(
            "docking must be a dictionary with keys 'energy_range', 'exhaustiveness', and 'num_modes'"
        )

    if search_space.keys() != {
        "center_x",
        "center_y",
        "center_z",
        "size_x",
        "size_y",
        "size_z",
    }:
        raise ValueError(
            "search_space must be a dictionary with keys 'center_x', 'center_y', 'center_z', 'size_x', 'size_y', and 'size_z'"
        )

    # resolve a human-readable database name to its ID
    if not database_id.startswith("_database"):
        data = api.convert_id_format(hids=[database_id])
        database_id = data[0].id

    db = api.describe_database(database_id=database_id)
    cols = db.cols

    # resolve column names (or IDs) to column IDs
    output_column_id = _resolve_column_name(output_column_name, cols)
    receptor_column_id = _resolve_column_name(receptor_column_name, cols)
    ligand_column_id = _resolve_column_name(ligand_column_name, cols)

    # resolve a human-readable row name to its ID
    if not row_id.startswith("_row"):
        data = api.convert_id_format(hids=[row_id])
        row_id = data[0].id

    json_data = {
        "toolId": "deeporigin/autodock-vina",
        "inputs": {
            "receptor": {
                "rowId": row_id,
                "columnId": receptor_column_id,
                "databaseId": database_id,
            },
            "ligand": {
                "rowId": row_id,
                "columnId": ligand_column_id,
                "databaseId": database_id,
            },
            "searchSpace": search_space,
            "docking": docking,
        },
        "outputs": {
            "output_file": {
                "rowId": row_id,
                "columnId": output_column_id,
                "databaseId": database_id,
            },
        },
        "clusterId": _get_cluster_id(),
    }

    response = tools.execute_tool(
        org_friendly_id=get_value()["organization_id"],
        execute_tool_dto=json_data,
    )

    job_id = response.id

    print(f"🧬 Job started with ID: {job_id}")
    return job_id
87 changes: 87 additions & 0 deletions src/tools/utils.py
@@ -0,0 +1,87 @@
"""this module contains utility functions used by tool execution"""

import functools
import time

from beartype import beartype
from deeporigin.config import get_value
from deeporigin.platform import clusters, tools


def query_run_status(job_id: str):
    """determine the status of a run, identified by job ID

    Args:
        job_id (str): job ID

    Returns:
        One of "Created", "Queued", "Running", "Succeeded", or "Failed"

    """

    data = tools.get_tool_execution(
        org_friendly_id=get_value()["organization_id"], resource_id=job_id
    )

    return data.attributes.status


def wait_for_job(job_id: str, *, poll_interval: int = 4):
    """Block while the job is in a non-terminal state, polling its status repeatedly"""

    terminal_states = ("Succeeded", "Failed")
    status = None
    txt_length = 0  # number of characters currently shown on the status line
    while status not in terminal_states:
        status = query_run_status(job_id)
        # move back over the previous status and pad with spaces, so that a
        # shorter status (e.g. "Failed" after "Running") fully overwrites it
        print("\b" * txt_length + status.ljust(txt_length), end="", flush=True)
        txt_length = max(txt_length, len(status))
        if status not in terminal_states:
            time.sleep(poll_interval)
    print()


@beartype
def _resolve_column_name(column_name: str, cols: list) -> str:
    """resolve column IDs from column name

    Args:
        column_name (str): column name
        cols (list): list of columns

    Returns:
        str: column ID
    """

    column_ids = [col.id for col in cols]
    column_names = [col.name for col in cols]

    if column_name not in column_names and column_name not in column_ids:
        raise ValueError(f"column_name must be one of {column_names} or {column_ids}")
    elif column_name in column_names:
        column_id = [col.id for col in cols if col.name == column_name][0]
    else:
        column_id = column_name

    return column_id


@functools.cache
@beartype
def _get_cluster_id() -> str:
    """gets a valid cluster ID to run tools on

    picks the first available cluster whose name contains "us-west-2\""""

    available_clusters = clusters.list_clusters(
        org_friendly_id=get_value()["organization_id"]
    )

    cluster = [
        cluster
        for cluster in available_clusters
        if "us-west-2" in cluster.attributes.name
    ]
    if not cluster:
        raise RuntimeError("no us-west-2 cluster is available to this organization")
    cluster = cluster[0]
    cluster_id = cluster.id
    return cluster_id