Merge pull request #13 from cedadev/development
Add init command and prepare for v1.0 release
jackleland authored Jan 26, 2024
2 parents 92977cc + 8466ac1 commit 732ae53
Showing 26 changed files with 1,798 additions and 53 deletions.
33 changes: 33 additions & 0 deletions .github/workflows/sphinx.yml
@@ -0,0 +1,33 @@
name: Sphinx build

on:
  push:
    branches: [ "main", "development" ]
  pull_request:
    branches: [ "main" ]


jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    # Checkout and build the docs with sphinx
    - uses: actions/checkout@v2
    - name: Build HTML
      uses: ammaraskar/sphinx-action@master
      with:
        docs-folder: "docs/user_guide"
        # pre-build-command: "mkdir /tmp/sphinx-log"
    - name: Upload artifacts
      uses: actions/upload-artifact@v1
      with:
        name: html-docs
        path: docs/user_guide/build/html/
    # Deploys to the gh-pages branch if the commit was made to main, the
    # gh-pages then takes over serving the html
    - name: Deploy
      uses: peaceiris/actions-gh-pages@v3
      if: github.ref == 'refs/heads/main'
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        publish_dir: docs/build/html
6 changes: 5 additions & 1 deletion .gitignore
@@ -2,4 +2,8 @@
.pytest_cache/
*egg-info
.venv/
tests/test_dir*
tests/test_dir*
**/.DS_Store

build/
docs/user_guide/build/
67 changes: 52 additions & 15 deletions README.md
@@ -1,6 +1,8 @@
CEDA Near-Line Data Store Client
================================

For more information please see the [documentation](https://cedadev.github.io/nlds-client/index.html).

This is the client used for interacting with the CEDA Near-Line Data Store
([NLDS](https://github.com/cedadev/nlds)).

@@ -9,32 +11,67 @@ It consists of a library part (clientlib) and a command line client

NLDS client is built upon [Requests](https://docs.python-requests.org/en/master/index.html) and [Click](https://click.palletsprojects.com/en/8.0.x/)

NLDS client requires Python 3. It has been tested with Python 3.8 and Python 3.9.
NLDS client requires Python 3. It has been tested with Python 3.8, 3.9, 3.10
and 3.11.

Installation
------------

1. Create a Python virtual environment:
`python3 -m venv ~/nlds-client`

2. Activate the nlds-client:
`source ~/nlds-client/bin/activate`
1. Create a python virtual environment:
``` bash
python3 -m venv ~/nlds-client
```

3. Install the nlds-client package with editing capability:
`pip install -e ~/Coding/nlds-client`
2. Activate your new virtual environment:
``` bash
source ~/nlds-client/bin/activate
```

Using
-----
3. Install the nlds-client package directly from this github repo:
``` bash
pip install git+https://github.com/cedadev/nlds-client.git
```

Using the pip install method, an alias for the NLDS client is created: `nlds`.
This can be used to `PUT` and `GET` files and filelists to the NLDS server.
If installing this repository for development we recommend you install with the
editable flag (`-e`).

Config
------
NLDS client requires a config file in the user's home directory: `~/.nlds-config`. This file contains information about the JASMIN infrastructure and so is not included in the GitHub repository. A Jinja-2 template is included in the `nlds_client/templates/nlds-config.j2` file.
This config file requires information about the JASMIN OAuth2 authentication server, including the `oauth_client_id`, `oauth_client_secret` and various URLs.
These can be populated by using the command
```
nlds init
```
The command needs to be pointed at an appropriate NLDS server URL, but defaults to the one hosted on JASMIN. See the [relevant section](https://cedadev.github.io/nlds-client/configuration.html) of the docs for more details.
When the NLDS client is first used, the user will be prompted to enter their username and password. This is the username and password for the JASMIN accounts portal.
The token is stored at a location defined in the `~/.nlds-config` file with the key `oauth_token_file_location`.
To use the NLDS you will also need to provide a `token` and `secret_key` to access the object store you would like to use as an NLDS cache.
Details for how to do this are in the [documentation](https://cedadev.github.io/nlds-client/configuration.html).
Usage
-----
Upon installation, an alias for the NLDS client is created: `nlds`. This can be used to `PUT` and `GET` files and filelists to and from an NLDS server, as well as a whole host of other commands. See the [documentation](https://cedadev.github.io/nlds-client/tutorial.html) for an extensive tutorial.
A [reference](https://cedadev.github.io/nlds-client/command_ref.html) of all available commands is also available.
Tests
-----
Automatic unit-testing is run with pytest. To manually run the tests in a local development environment, first ensure the appropriate version of pytest is installed in your venv. From the root of the repository, run:
``` bash
pip install -r tests/requirements.txt
```
and then, similarly from the root of the repo, run the tests with:
```bash
pytest
```
Empty file added docs/.nojekyll
Empty file.
8 changes: 8 additions & 0 deletions docs/index.html
@@ -0,0 +1,8 @@

<!DOCTYPE html>

<html lang="en">
<head>
<meta http-equiv="Refresh" content="0; url='user_guide/build/html/index.html'" />
</head>
</html>
4 changes: 4 additions & 0 deletions docs/user_guide/requirements.txt
@@ -0,0 +1,4 @@
Sphinx==7.1.2
sphinx-click==5.1.0
sphinx-rtd-theme==2.0.0
-r ../../requirements.txt
175 changes: 175 additions & 0 deletions docs/user_guide/source/catalog_organisation.rst
@@ -0,0 +1,175 @@
.. _catalog_organisation:

Catalog Organisation
====================

When a user PUTs files to the NLDS, the files are recorded in a catalog on
behalf of the user. The user can then list which files they have in the catalog
and also search for files based on a regular expression. Additionally, users
can associate a label and tags, in the form of *key:value* pairs, with a file or
collection of files.

*Figure 1* shows a simplified version of the structure of the catalog, with just
the information relevant to the user remaining.

.. figure:: ./simple_catalog.png

   Figure 1: Simplified view of the NLDS data-catalog

The terms in figure 1 are explained below:

#. :ref:`Holdings<holding>`
#. :ref:`Transactions<transaction>`
#. :ref:`Tags<tags>`
#. :ref:`File<file>`
#. :ref:`Location<location>`

.. _holding:

Holdings
--------

**Holdings** are collections of files that the user has chosen to group
together under a label. A reason to collect files in a
holding might be that they are from the same experiment, climate model run
or measuring campaign. Users can give the holding a **label**, but if they do
not then a seemingly random **label** will be assigned to the holding. This is
actually the id of the **transaction** that created the holding. Users
can change the **label** that a holding has at any time.

**Holdings** are created when a user PUTs a file into the NLDS, using either the
``nlds put`` or ``nlds putlist`` command. These commands take a **label**
argument with the ``-l`` or ``--label`` option. The first time a user PUTs a
file, or list of files, into a **holding**, the **holding** will be created.
If a **label** is specified then the **holding** will be assigned that **label**.
If a **label** is not specified then the seemingly random **label** will be
assigned.

After this, if a user PUTs a file into the NLDS and specifies a **label** for a
**holding** that already exists, then the file will be added to that **holding**.
If the **holding** with the specified **label** does not exist then the file
will be added to a new **holding**. This leads to the behaviour that, if a
**label** is not specified when PUTting a file (or list of files) into the NLDS,
a new **holding** will be created for each file (or list of files).
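
The labelling behaviour described above can be sketched with a small in-memory model. This is only an illustration and not the NLDS implementation: the ``ToyCatalog`` class, its ``put`` method and the use of ``uuid4`` to stand in for a transaction id are all assumptions made for the example.

```python
import uuid


class ToyCatalog:
    """Hypothetical in-memory stand-in for the NLDS catalog (illustration only)."""

    def __init__(self):
        # Maps a holding label to the list of file paths PUT into that holding.
        self.holdings = {}

    def put(self, paths, label=None):
        # Every PUT creates a transaction; a random id stands in for the real one.
        transaction_id = uuid.uuid4().hex
        if label is None:
            # No label given: a new holding is created, labelled with the
            # id of the transaction that created it.
            label = transaction_id
        # An existing label reuses its holding; an unknown label creates one.
        self.holdings.setdefault(label, []).extend(paths)
        return label


catalog = ToyCatalog()
catalog.put(["/data/a.nc"], label="experiment_1")
catalog.put(["/data/b.nc"], label="experiment_1")  # added to the same holding
first = catalog.put(["/data/c.nc"])                 # unlabelled: new holding
second = catalog.put(["/data/d.nc"])                # unlabelled: another new holding
print(len(catalog.holdings))
```

In this sketch the two labelled PUTs share one holding, while each unlabelled PUT ends up in a holding of its own.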

Reading this, you may ask the question "What happens if I add a file that
already exists in the NLDS?". This is a good question, and a number of rules
cover it:

1. The ``original_path`` of a file must be unique within a **holding**. An
error is given if a user PUTs a file into an existing **holding** that already
contains a file with the same ``original_path``.

2. The ``original_path`` does not have to be unique across **holdings**.
Multiple files with the same ``original_path`` can exist in the NLDS, providing
that they belong to different **holdings**, with different **labels**.

3. Users can GET files without specifying which **holding** to get them from,
i.e. the ``-l`` or ``--label`` option is not given when ``nlds get`` or ``nlds
getlist`` commands are invoked. In this case, the newest file is returned.

Organising the catalog in this way means that users can use the NLDS as an
iterative backup solution, by PUTting files into differently labelled
**holdings** at different times. GETting the files will return the latest
files, while leaving the older files still accessible by specifying the
**holding** **label**.
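
The three rules and the iterative backup pattern can be illustrated with a toy model. Everything named here (``ToyHolding``, ``LabelledStore``, and the integer ingest times) is hypothetical and chosen for the example; the real catalog is a database managed by the NLDS server.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHolding:
    label: str
    # Maps original_path -> ingest time (any sortable value) for this holding.
    files: dict = field(default_factory=dict)


class LabelledStore:
    """Hypothetical model of the uniqueness and 'newest file' rules."""

    def __init__(self):
        self.holdings = {}

    def put(self, label, original_path, ingest_time):
        holding = self.holdings.setdefault(label, ToyHolding(label))
        # Rule 1: original_path must be unique within a holding.
        if original_path in holding.files:
            raise ValueError(f"{original_path} already exists in holding {label}")
        # Rule 2: the same path may still appear in other holdings.
        holding.files[original_path] = ingest_time

    def get(self, original_path, label=None):
        # Rule 3: with no label, the holding with the newest copy is chosen.
        candidates = [
            (h.files[original_path], h.label)
            for h in self.holdings.values()
            if original_path in h.files and label in (None, h.label)
        ]
        if not candidates:
            raise KeyError(original_path)
        return max(candidates)[1]


store = LabelledStore()
store.put("backup_1", "/data/a.nc", ingest_time=1)
store.put("backup_2", "/data/a.nc", ingest_time=2)
print(store.get("/data/a.nc"))                    # newest copy
print(store.get("/data/a.nc", label="backup_1"))  # older copy, by label
```

The ``get`` method returns the label of the holding that would supply the file, which keeps the sketch short while still showing how the older copy stays reachable.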

.. _transaction:

Transactions
------------

**Transactions** record the user's action when PUTting a file into the NLDS.

As alluded to above, in the :ref:`holding` section, each **holding** can contain
numerous **transactions**. A **transaction** is created every time a user PUTs
a single file, or list of files, into the NLDS. This **transaction** is assigned
to a holding based on the **label** supplied by the user. If a **label** is
specified for a number of PUT actions, then the **holding** with that label will
contain all the **transactions** arising from the PUT actions.

A **transaction** contains very little information itself, but its place in the
catalog hierarchy is important. As can be seen in figure 1, it contains a list
of **files** and it belongs to a **holding**. This is the mapping that allows
users to add files to **holdings** iteratively and at different times. For
example, a user may PUT the files ``file_1``, ``file_2`` and ``file_3`` into the
**holding** with the **label** ``backup_1`` on the 23rd Dec 2023. The user may then
PUT ``file_4``, ``file_5`` and ``file_6`` into the same **holding** on the 4th
Jan 2024, by specifying the label ``backup_1``. This will have the effect of
creating two **transactions** - one containing ``file_1``, ``file_2`` and ``file_3``
and the other containing ``file_4``, ``file_5`` and ``file_6``, with the
``backup_1`` **holding** containing both **transactions**. Therefore, all **files**
(``file_1`` through to ``file_6``) are associated with the ``backup_1``
**holding** at particular ``ingest_times``.

If, at a later time, the user puts ``file_1`` to ``file_6`` into
another **holding** with a **label** of ``backup_2`` then another
**transaction** will be created with a later ``ingest_time`` and the **files**
will be associated with the **transaction** and the ``backup_2`` **holding**.
The **files** may have changed in the interim and, therefore, the **files**
with the same filenames may be different in ``backup_2`` than they are in
``backup_1``. This is the mechanism by which NLDS allows users to perform
iterative backups and how users can get the latest files, via the ``ingest_time``.
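
The ``backup_1`` and ``backup_2`` example can be written out as data to make the mapping concrete. This is a sketch under stated assumptions: ``ToyTransaction`` and both helper functions are hypothetical, and the date used for ``backup_2`` is invented, since the text only says it happens "at a later time".

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ToyTransaction:
    holding_label: str
    ingest_time: datetime
    files: list


transactions = [
    ToyTransaction("backup_1", datetime(2023, 12, 23), ["file_1", "file_2", "file_3"]),
    ToyTransaction("backup_1", datetime(2024, 1, 4), ["file_4", "file_5", "file_6"]),
    # Assumed date: the text does not say when backup_2 was made.
    ToyTransaction("backup_2", datetime(2024, 2, 1), [f"file_{i}" for i in range(1, 7)]),
]


def files_in_holding(label):
    """Collect every file from every transaction belonging to one holding."""
    return sorted(
        name for t in transactions if t.holding_label == label for name in t.files
    )


def latest_holding_for(name):
    """Return the label of the holding whose transaction ingested `name` last."""
    matching = [t for t in transactions if name in t.files]
    return max(matching, key=lambda t: t.ingest_time).holding_label


print(files_in_holding("backup_1"))
print(latest_holding_for("file_1"))
```

Two transactions feed the ``backup_1`` holding, and the later ``ingest_time`` is what makes ``backup_2`` the source of the latest copy of each file.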

.. _tags:

Tags
----

NLDS allows the user to associate **tags** with a **holding**, in a
``key:value`` format. For example, a series of **holdings** could have **tags**
with the ``key`` as ``experiment`` and ``value`` as the experiment name or
number.

A **holding** can contain numerous **tags** and these are in addition to the
**holding's** **label**. **Tags** can be used for searching for files in the
``list`` and ``find`` commands.
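
As a rough illustration of how ``key:value`` tags narrow a search, the sketch below filters a list of holdings held as plain dictionaries. The data, the ``find_by_tags`` helper and the exact-match rule are assumptions for the example; they do not describe how the ``list`` and ``find`` commands are implemented.

```python
# Hypothetical holdings, each with a label and a dictionary of tags.
holdings = [
    {"label": "run_a", "tags": {"experiment": "historical", "model": "m1"}},
    {"label": "run_b", "tags": {"experiment": "historical", "model": "m2"}},
    {"label": "run_c", "tags": {"experiment": "future"}},
]


def find_by_tags(holdings, **wanted):
    """Return labels of holdings whose tags match every requested key:value pair."""
    return [
        h["label"]
        for h in holdings
        if all(h["tags"].get(key) == value for key, value in wanted.items())
    ]


print(find_by_tags(holdings, experiment="historical"))
print(find_by_tags(holdings, experiment="historical", model="m2"))
```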

.. _file:

File
----

The very purpose of NLDS is the long term storage of **files**, recording their
details in a data catalog and then accessing (GETting) them when they are
required. The **file** object in the data catalog records the details of a
single **file**, including the original path of the file, its size and the
ownership and permissions of the file. Users can GET files in a number of ways,
including by using just the ``original_path``, in which case the NLDS will
return the most recent file with that path.

Also associated with **files** is the checksum of the file. NLDS supports
different methods of calculating checksums, and so more than one checksum can
be associated with a single file.
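
To show what "more than one checksum per file" can look like in practice, the sketch below computes several digests in a single pass using Python's standard ``hashlib`` module. The source does not say which algorithms NLDS uses; SHA-256 and SHA-512 are chosen here purely for illustration, and ``file_checksums`` is a hypothetical helper.

```python
import hashlib
import os
import tempfile


def file_checksums(path, algorithms=("sha256", "sha512"), chunk_size=1 << 20):
    """Read the file once and return a {algorithm: hex digest} mapping."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Write a small temporary file to demonstrate the helper, then clean it up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example data")
checksums = file_checksums(tmp.name)
os.remove(tmp.name)
print(sorted(checksums))
```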

.. _location:

Location
--------

The user interacts with the NLDS by PUTting and GETting **files**, without knowing
(or caring) where those **files** are stored. From a user view, the **files** are
stored in the NLDS. In reality the NLDS first writes the **files** to *object
storage*. Later the **files** are backed up to *tape storage*. When the NLDS
*object storage* approaches capacity, **files** will be removed from the
*object storage* depending on a policy which takes into account several variables,
including when they were last accessed. If a user subsequently GETs a **file**
that has been removed from the *object storage* then the NLDS will first retrieve
the **file** from the *tape storage* to the *object storage* before copying it
to the user specified target.

The **location** object in the Catalog database is associated with a **file**, and
can have one of three states:

1. The **file** is held on the *object storage* only. It will be backed up
to the *tape storage* later.

2. The **file** is held on both the *object storage* and *tape storage*. Users
can access the file without any staging required by the NLDS.

3. The **file** is held on the *tape storage* only. If a user accesses the
**file** then the NLDS will *stage* it to the *object storage*, before completing
the GET on behalf of the user. The user does not need to concern themselves
with the details of this. However, accessing a file that is stored only on
*tape* will take longer than if it was held on *object storage*.
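
The three location states, and the extra staging step needed for a tape-only file, can be summarised in a short sketch. The ``ToyLocation`` enum and ``steps_for_get`` function are hypothetical names used only to restate the description above; they are not part of the NLDS code.

```python
from enum import Enum


class ToyLocation(Enum):
    OBJECT_ONLY = "object storage only"
    OBJECT_AND_TAPE = "object storage and tape storage"
    TAPE_ONLY = "tape storage only"


def steps_for_get(location):
    """List the steps needed to serve a GET for a file in the given state."""
    steps = []
    if location is ToyLocation.TAPE_ONLY:
        # The file must first be brought back from tape to object storage.
        steps.append("stage from tape storage to object storage")
    steps.append("copy from object storage to the user's target")
    return steps


for location in ToyLocation:
    print(location.value, "->", steps_for_get(location))
```

Only the tape-only state adds a step, which is why such a GET takes longer.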


33 changes: 33 additions & 0 deletions docs/user_guide/source/command_ref.rst
@@ -0,0 +1,33 @@
.. _command-ref:

Command Line Reference
======================

The primary method of interacting with the Near-Line Data Store is through a
command line client, which can be installed by following the installation
instructions.

Users must pass a command to the ``nlds`` client, along with any options and
arguments for that command.

``nlds [OPTIONS] COMMAND [ARGS]...``

As an overview the commands are:

Commands:
| ``find Find and list files.``
| ``get Get a single file.``
| ``getlist Get a number of files specified in a list.``
| ``init Set up the nlds client with a config file on first use.``
| ``list List holdings.``
| ``meta Alter metadata for a holding.``
| ``put Put a single file.``
| ``putlist Put a number of files specified in a list.``
| ``stat List transactions.``

Each command has its own specific options. The argument is generally the file
or filelist that the user wishes to operate on. The full command listing is
given below.

.. click:: nlds_client.nlds_client:nlds_client
   :prog: nlds
   :nested: full
10 changes: 5 additions & 5 deletions docs/user_guide/source/conf.py
@@ -5,11 +5,11 @@

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = 'Near-Line Data Store client'
copyright = '2023, Neil Massey and Jack Leland'
project = 'Near-Line Data Store'
copyright = '2023, Centre for Environmental Data Analysis, Science and Technologies Facilities Council, UK Research and Innovation'
author = 'Neil Massey and Jack Leland'
release = '0.1.0'
version = '0.1.1'
release = '0.1.1-RC1'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
@@ -24,5 +24,5 @@
# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = 'alabaster'
html_theme = 'sphinx_rtd_theme'
html_static_path = ['_static']