Commit
feat: added ability to download files in parallel (#88)
## changes

- [x] ability to download multiple files in parallel using ~~asyncio~~ concurrent.futures
- [x] docs improvements
- [x] added changelog to docs
- [x] added docs on how to download files
- [x] added methods to download files from the CLI
- [x] added docs on platform API features
- [x] ~~download_database should use download_files, not download_file~~ next sprint
- [x] download_files should allow users to download to a specific folder -- no need to specify all names
- [x] download_files should work with no args -- and it will download all files
- [x] dropped support for python 3.9

## technical discussion

there are two ways we can go about downloading files. one is to use asyncio and the other is to use concurrent.futures

i initially implemented everything with asyncio, but decided to switch to concurrent.futures because

- mixing async with sync code leads to a lot of boilerplate and repeated code
- asyncio code doesn't work natively in jupyter notebooks. we have to use a 3rd party package called nest_asyncio to get it to work, which is more overhead
- technically asyncio should be lighter and faster, but i didn't see this in practice
- using asyncio also requires us to use the low level async client, which leads to more doubling of work and code
- using the low-level asyncio client makes testing more complex, because we can't hot swap clients

### rejected asyncio implementation:

```python
async def _create_file_download_urls_async(file_ids: list[str]) -> list[str]:
    """async method to create file download URLs for a list of files.

    Do not use this method. This is called internally by the
    `download_files` method."""

    async_client = _api._get_default_client(use_async=True)

    tasks = []
    for file_id in file_ids:
        tasks.append(
            _api.create_file_download_url(file_id=file_id, client=async_client)
        )
    data = await asyncio.gather(*tasks)
    urls = [item.data.download_url for item in data]
    return urls
```

and

```python
@beartype
def download_files(
    file_ids: Optional[list[str]] = None,
    names: Optional[list[str]] = None,
):
    """download multiple files in parallel using asyncio

    If you want to download a single file, use download_file
    as it has lower overhead.

    Args:
        file_ids: IDs of the files on Deep Origin
        names: Names of the files. Optional. If None, names will be retrieved from Deep Origin

    Returns:
        None
    """

    # we need nest_asyncio to allow this to work
    # in a jupyter kernel
    import nest_asyncio

    nest_asyncio.apply()

    if names is None:
        # names not provided, determine names from Deep Origin
        files = list_files(file_ids=file_ids)
        names = [item.file.name for item in files]

    # create presigned URLs for all files in parallel
    urls = asyncio.run(_create_file_download_urls_async(file_ids))

    # download files in parallel
    asyncio.run(_download_files_async(urls, names))

    return urls
```

and

```python
async def _download_async(session, url, save_path) -> None:
    """Downloads a single file asynchronously and saves it to the specified path.

    Do not use this. Use the synchronous wrapper function download_files instead."""

    async with session.get(url) as response:
        with open(save_path, "wb") as file:
            async for chunk in response.content.iter_chunked(8192):
                if chunk:  # Filter out keep-alive chunks
                    file.write(chunk)


async def _download_files_async(urls: list[str], save_paths: list[str]) -> None:
    """Downloads multiple files asynchronously.

    Do not use this. Use the synchronous wrapper function download_files instead."""

    async with aiohttp.ClientSession() as session:
        tasks = []
        for url, save_path in zip(urls, save_paths):
            tasks.append(_download_async(session, url, save_path))
        await asyncio.gather(*tasks)
```
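
For comparison with the rejected asyncio code above, here is a minimal sketch of what a concurrent.futures approach can look like. It is not the code from this commit: `fetch_to_disk` and `download_many` are hypothetical names, the standard-library `urlopen` stands in for whatever HTTP client the package actually uses, and the default of 8 workers is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def fetch_to_disk(url: str, save_path: Path) -> Path:
    """Stream one URL to disk in 8 KiB chunks (blocking call)."""
    with urlopen(url) as response, open(save_path, "wb") as file:
        while chunk := response.read(8192):
            file.write(chunk)
    return save_path


def download_many(
    urls: list[str],
    save_paths: list[Path],
    fetch: Callable[[str, Path], Path] = fetch_to_disk,
    max_workers: int = 8,
) -> list[Path]:
    """Run `fetch` for every (url, path) pair on a thread pool.

    `fetch` is injectable so tests can swap in a fake without touching
    the network (hypothetical design, not the package's API).
    """
    if len(urls) != len(save_paths):
        raise ValueError("urls and save_paths must have the same length")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch, url, path) for url, path in zip(urls, save_paths)]
        # result() blocks until the task finishes and re-raises any
        # exception raised inside the worker thread.
        return [future.result() for future in futures]
```

Because everything here is ordinary synchronous code, it can be called directly from a script or a Jupyter cell without an event loop, which is the main trade-off the discussion above describes.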
Showing 22 changed files with 372 additions and 462 deletions.
@@ -0,0 +1,44 @@
# Download files

This page describes how to download files from Deep Origin to your local computer.

## Download one or many files from the Data hub

To download file(s) from the Deep Origin data hub, run the following commands:

=== "CLI"

    ```bash
    deeporigin data download-files
    ```

    This will download all files on Deep Origin to the current folder.

    To download files that have been assigned to particular rows, use:

    ```bash
    deeporigin data download-files --assigned-row-ids <row-id-1> <row-id-2> ...
    ```

    To download specific files, pass the file IDs using:

    ```bash
    deeporigin data download-files --file-ids <file-1> <file-2> ...
    ```

=== "Python"

    ```py
    from deeporigin.data_hub import api
    api.download_files(files)
    ```

    `files` is a list of `ListFilesResponse` objects to download. To obtain this list, use `api.list_files()`; its output can be passed directly to `download_files`.

    !!! Tip "Download all files"
        To download all files, call `api.list_files()` and pass the output to `download_files`.

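
If you want to script the CLI commands above from Python, one option is to assemble the argument list programmatically and hand it to `subprocess.run`. The sketch below is illustrative only: `build_download_command` is a hypothetical helper, not part of the `deeporigin` package, and whether the CLI accepts `--assigned-row-ids` and `--file-ids` in the same invocation is an assumption that has not been verified.

```python
from typing import Optional, Sequence


def build_download_command(
    file_ids: Optional[Sequence[str]] = None,
    row_ids: Optional[Sequence[str]] = None,
) -> list[str]:
    """Build the argument list for `deeporigin data download-files`.

    With no arguments, the command downloads all files to the current folder.
    """
    command = ["deeporigin", "data", "download-files"]
    if row_ids:
        command += ["--assigned-row-ids", *row_ids]
    if file_ids:
        command += ["--file-ids", *file_ids]
    return command


# Usage (requires the deeporigin CLI to be installed and authenticated;
# the IDs are placeholders):
#   import subprocess
#   subprocess.run(build_download_command(file_ids=["<file-1>", "<file-2>"]), check=True)
```

Passing a list rather than a single shell string avoids quoting problems if an ID contains unusual characters.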
@@ -0,0 +1,62 @@
# Get user information

## Get info about current user

To get information about the currently logged in user, including the user ID, use:

```python
from deeporigin.platform import api
api.whoami()
```

This returns information about the current user. A typical response is:

```json
{
  "data": {
    "attributes": {
      "company": null,
      "expertise": null,
      "industries": null,
      "pendingInvites": [],
      "platform": "OS",
      "title": null
    },
    "id": "google-apps|[email protected]",
    "type": "User"
  },
  "links": {
    "self": "https://os.deeporigin.io/users/me"
  }
}
```

## Get information about a user

To get information about a user, use:

```python
from deeporigin.platform import api
api.resolve_user("user-id")
```

where `user-id` is the ID of the user, in the format returned by `api.whoami()`. A typical response looks like:

```json
{
  "data": {
    "attributes": {
      "avatar": "https://...",
      "email": "[email protected]",
      "name": "User Name"
    },
    "id": "918ddd25-ab97-4400-9a14-7a8be1216754",
    "type": "User"
  },
  "links": {
    "self": "https://..."
  }
}
```
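
The two responses above can be read with ordinary dictionary access once they are available as plain dicts. The following is a minimal sketch under that assumption; the client may instead return objects with attribute access, in which case the lookups would need adapting. `extract_user_id` and `describe_user` are hypothetical helpers, and the ID and email values in the sample payloads are placeholders.

```python
def extract_user_id(whoami_response: dict) -> str:
    """Return the user ID from a whoami-style response."""
    return whoami_response["data"]["id"]


def describe_user(user_response: dict) -> str:
    """Format a resolve_user-style response as 'Name <email>'."""
    attributes = user_response["data"]["attributes"]
    return f"{attributes['name']} <{attributes['email']}>"


# Sample payloads trimmed from the responses above; the ID and email
# values are hypothetical placeholders, not real accounts.
whoami_response = {"data": {"id": "google-apps|user@example.com", "type": "User"}}
user_response = {
    "data": {
        "attributes": {"email": "user@example.com", "name": "User Name"},
        "id": "918ddd25-ab97-4400-9a14-7a8be1216754",
        "type": "User",
    }
}

print(extract_user_id(whoami_response))
print(describe_user(user_response))
```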
@@ -0,0 +1,75 @@

# Get information about workstations

To list all workstations on Deep Origin, use:

```python
from deeporigin.platform import api
api.get_workstations()
```

This returns a list of objects, where each object corresponds to a workstation. A typical entry looks like:

```json
{
  "attributes": {
    "accessMethods": [
      {
        "icon": "/assets/icons/catalog-items/jupyterlab.svg",
        "id": "jupyterlab",
        "name": "JupyterLab"
      },
      {
        "icon": "/assets/icons/catalog-items/code-server.svg",
        "id": "code-server",
        "name": "VS Code (web)"
      }
    ],
    "accessSettings": {
      "publicKey": "ssh-ed25519 ... ",
      "ssh": true
    },
    "autoStopIdleCPUThreshold": 0,
    "autoStopIdleDuration": 30,
    "blueprint": "deeporigin/deeporigin-python:staging",
    "cloudProvider": {
      "region": "us-west-2",
      "vendor": "aws"
    },
    "clusterId": "3bb775e4-8be6-4936-a6b9",
    "created": "2024-10-05T17:01:06.840Z",
    "description": "dfd",
    "drn": "drn:...",
    "enableAutoStop": true,
    "name": "forthcoming-tyrannosaurus-8fd",
    "nextUserActions": [
      "DELETE"
    ],
    "orgHandle": "deeporigin-com",
    "requestedResources": {
      "cpu": 8,
      "gpu": 0,
      "gpuSize": "NONE",
      "memory": 32,
      "storage": 250
    },
    "state": {
      "error": "",
      "isError": false,
      "stage": "READY",
      "status": "TERMINATED"
    },
    "status": "TERMINATED",
    "summary": "",
    "templateVersion": "v0.1.0",
    "updated": "2024-10-07T12:46:46.511Z",
    "userHandle": "google-apps|[email protected]",
    "volumeDrns": [
      "..."
    ],
    "wasAutoStopped": false
  },
  "id": "...",
  "type": "ComputeBench"
}
```
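
A list of entries like the one above can be filtered and summarized with a few lines of Python. The sketch below assumes each entry is a plain dict shaped like the JSON shown; if the client returns objects instead, the lookups would need adapting. `summarize_workstations` is a hypothetical helper, and the second sample entry, including its `RUNNING` status value, is invented for illustration.

```python
from typing import Optional


def summarize_workstations(
    workstations: list[dict], status: Optional[str] = None
) -> list[str]:
    """Return one summary line per workstation, optionally filtered by status."""
    lines = []
    for workstation in workstations:
        attributes = workstation["attributes"]
        if status is not None and attributes["status"] != status:
            continue
        resources = attributes["requestedResources"]
        lines.append(
            f"{attributes['name']}: {attributes['status']}, "
            f"{resources['cpu']} CPU, {resources['memory']} memory, "
            f"{resources['gpu']} GPU"
        )
    return lines


# First entry is trimmed from the response above; the second is hypothetical.
sample = [
    {
        "attributes": {
            "name": "forthcoming-tyrannosaurus-8fd",
            "status": "TERMINATED",
            "requestedResources": {"cpu": 8, "gpu": 0, "memory": 32},
        }
    },
    {
        "attributes": {
            "name": "example-running-workstation",
            "status": "RUNNING",
            "requestedResources": {"cpu": 4, "gpu": 1, "memory": 16},
        }
    },
]

for line in summarize_workstations(sample, status="TERMINATED"):
    print(line)
```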
@@ -0,0 +1,3 @@
# Platform API

The Deep Origin CLI and Python client allow you to control and interact with the Deep Origin platform.