diff --git a/.github/workflows/sphinx.yml b/.github/workflows/sphinx.yml new file mode 100644 index 0000000..1212193 --- /dev/null +++ b/.github/workflows/sphinx.yml @@ -0,0 +1,33 @@ +name: Sphinx build + +on: + push: + branches: [ "main", "development" ] + pull_request: + branches: [ "main" ] + + +jobs: + build: + runs-on: ubuntu-latest + steps: + # Checkout and build the docs with sphinx + - uses: actions/checkout@v2 + - name: Build HTML + uses: ammaraskar/sphinx-action@master + with: + docs-folder: "docs/user_guide" + # pre-build-command: "mkdir /tmp/sphinx-log" + - name: Upload artifacts + uses: actions/upload-artifact@v1 + with: + name: html-docs + path: docs/user_guide/build/html/ + # Deploys to the gh-pages branch if the commit was made to main, the + # gh-pages then takes over serving the html + - name: Deploy + uses: peaceiris/actions-gh-pages@v3 + if: github.ref == 'refs/heads/main' + with: + github_token: ${{ secrets.GITHUB_TOKEN }} + publish_dir: docs/build/html \ No newline at end of file diff --git a/.gitignore b/.gitignore index f8ffcad..806b931 100644 --- a/.gitignore +++ b/.gitignore @@ -3,4 +3,7 @@ *egg-info .venv/ tests/test_dir* -build/ \ No newline at end of file +**/.DS_Store + +build/ +docs/user_guide/build/ \ No newline at end of file diff --git a/docs/.nojekyll b/docs/.nojekyll new file mode 100644 index 0000000..e69de29 diff --git a/docs/index.html b/docs/index.html new file mode 100644 index 0000000..0368041 --- /dev/null +++ b/docs/index.html @@ -0,0 +1,8 @@ + + + + + + + + \ No newline at end of file diff --git a/docs/user_guide/requirements.txt b/docs/user_guide/requirements.txt new file mode 100644 index 0000000..08bbf46 --- /dev/null +++ b/docs/user_guide/requirements.txt @@ -0,0 +1,4 @@ +Sphinx==7.2.6 +sphinx-click==5.1.0 +sphinx-rtd-theme==2.0.0 +-r ../../requirements.txt \ No newline at end of file diff --git a/docs/user_guide/source/catalog_organisation.rst b/docs/user_guide/source/catalog_organisation.rst new file mode 100644 
index 0000000..d63a73c --- /dev/null +++ b/docs/user_guide/source/catalog_organisation.rst @@ -0,0 +1,175 @@ +.. _catalog_organisation: + +Catalog Organisation +==================== + +When a user PUTs files to the NLDS, the files are recorded in a catalog on +behalf of the user. The user can then list which files they have in the catalog +and also search for files based on a regular expression. Additionally, users +can associate a label and tags, in the form of *key:value* pairs, with a file or +collection of files. + +*Figure 1* shows a simplified version of the structure of the catalog, with just +the information relevant to the user remaining. + +.. figure:: ./simple_catalog.png + + Figure 1: Simplified view of the NLDS data-catalog + +The terms in figure 1 are explained below: + +#. :ref:`Holdings` +#. :ref:`Transactions` +#. :ref:`Tags` +#. :ref:`File` +#. :ref:`Location` + +.. _holding: + +Holdings +-------- + +**Holdings** are collections of files that the user has chosen to group +together under a single **label**. A reason to collect files in a +holding might be that they are from the same experiment, or climate model run, +or measuring campaign. Users can give the holding a **label**, but if they do +not then a seemingly random **label** will be assigned to the holding. This is +actually the id of the first **transaction** that created the holding. Users +can change the **label** that a holding has at any time. + +**Holdings** are created when a user PUTs a file into the NLDS, using either the +``nlds put`` or ``nlds putlist`` command. These commands take a **label** +argument with the ``-l`` or ``--label`` option. The first time a user PUTs a +file, or list of files, into a **holding**, the **holding** will be created. +If a **label** is specified then the **holding** will be assigned that **label**. +If a **label** is not specified then a seemingly random **label** will be +assigned. 
+ +After this, if a user PUTs a file into the NLDS and specifies a **label** for a +**holding** that already exists, then the file will be added to that **holding**. +If the **holding** with the specified **label** does not exist then the file +will be added to a new **holding**. This leads to the behaviour that, if a +**label** is not specified when PUTting a file (or list of files) into the NLDS, +a new **holding** will be created for each file (or list of files). + +Reading this, you may ask the question "What happens if I add a file that +already exists in the NLDS?". This is a good question, and a number of rules +cover it: + +1. The ``original_path`` of a file must be unique within a **holding**. An +error is given if a user PUTs a file into a **holding** that already exists and +a file with the same ``original_path`` already exists in the **holding**. + +2. The ``original_path`` does not have to be unique across **holdings**. +Multiple files with the same ``original_path`` can exist in the NLDS, providing +that they belong to different **holdings**, with different **labels**. + +3. Users can GET files without specifying which **holding** to get them from, +i.e. the ``-l`` or ``--label`` option is not given when ``nlds get`` or ``nlds +getlist`` commands are invoked. In this case, the newest file is returned. + +Organising the catalog in this way means that users can use the NLDS as an +iterative backup solution, by PUTting files into differently labelled +**holdings** at different times. GETting the files will return the latest +files, while leaving the older files still accessible by specifying the +**holding** **label**. + +.. _transaction: + +Transactions +------------ + +**Transactions** record the user's action when PUTting a file into the NLDS. +As alluded to above, in the :ref:`holding` section, each **holding** can contain +numerous **transactions**. A **transaction** is created every time a user PUTs +a single file, or list of files, into the NLDS. 
This **transaction** is assigned +to a holding based on the **label** supplied by the user. If a **label** is +specified for a number of PUT actions, then the **holding** with that label will +contain all the **transactions** arising from the PUT actions. + +A **transaction** contains very little information itself, but its place in the +catalog hierarchy is important. As can be seen in figure 1, it contains a list +of **files** and it belongs to a **holding**. This is the mapping that allows +users to add files to **holdings** iteratively and at different times. For +example, a user may PUT the files ``file_1``, ``file_2`` and ``file_3`` into the +**holding** with ``backup_1`` **label** on the 23rd Dec 2023. The user may then +PUT ``file_4``, ``file_5`` and ``file_6`` into the same **holding** on the 4th +Jan 2024, by specifying the label ``backup_1``. This will have the effect of +creating two **transactions** - one containing ``file_1``, ``file_2`` and ``file_3`` +and the other containing ``file_4``, ``file_5`` and ``file_6``, with the +``backup_1`` **holding** containing both **transactions**. Therefore, all **files** +(``file_1`` through to ``file_6``) are associated with the ``backup_1`` +**holding** at particular ``ingest_times``. + +If, at a later time, the user puts ``file_1`` to ``file_6`` into +another **holding** with a **label** of ``backup_2`` then another +**transaction** will be created with a later ``ingest_time`` and the **files** +will be associated with the **transaction** and the ``backup_2`` **holding**. +The **files** may have changed in the interim and, therefore, the **files** +with the same filenames may be different in ``backup_2`` than they are in +``backup_1``. This is the mechanism by which NLDS allows users to perform +iterative backups and how users can get the latest files, via the ``ingest_time``. + +.. _tags: + +Tags +---- + +NLDS allows the user to associate **tags** with a **holding**, in a +``key:value`` format. 
For example, a series of **holdings** could have **tags** +with the ``key`` as ``experiment`` and ``value`` as the experiment name or +number. + +A **holding** can contain numerous **tags** and these are in addition to the +**holding's** **label**. **Tags** can be used for searching for files in the +``list`` and ``find`` commands. + +.. _file: + +File +---- + +The very purpose of NLDS is the long term storage of **files**, recording their +details in a data catalog and then accessing (GETting) them when they are +required. The **file** object in the data catalog records the details of a +single **file**, including the original path of the file, its size and the +ownership and permissions of the file. Users can GET files in a number of ways, +including by using just the ``original_path`` where the NLDS will return the +most recent file with that path. + +Also associated with **files** is the checksum of the file. NLDS supports +different methods of calculating checksums, and so more than one checksum can +be associated with a single file. + +.. _location: + +Location +-------- + +The user interacts with the NLDS by PUTting and GETting **files**, without knowing +(or caring) where those **files** are stored. From a user view, the **files** are +stored in the NLDS. In reality the NLDS first writes the **files** to *object +storage*. Later the **files** are backed up to *tape storage*. When the NLDS +*object storage* approaches capacity, **files** will be removed from the +*object storage* depending on a policy which takes into account several variables, +including when they were last accessed. If a user subsequently GETs a **file** +that has been removed from the *object storage* then the NLDS will first retrieve +the **file** from the *tape storage* to the *object storage* before copying it +to the user specified target. + +The **location** object in the Catalog database is associated with a file, and +can have one of three states: + +1. 
The **file** is held on the *object storage* only. It will be backed up +to the *tape storage* later. + +2. The **file** is held on both the *object storage* and *tape storage*. Users +can access the file without any staging required by the NLDS. + +3. The **file** is held on the *tape storage* only. If a user accesses the +**file** then the NLDS will *stage* it to the *object storage*, before completing +the GET on behalf of the user. The user does not need to concern themselves +with the details of this. However, accessing a file that is stored only on +*tape* will take longer than if it was held on *object storage*. + + diff --git a/docs/user_guide/source/command_ref.rst b/docs/user_guide/source/command_ref.rst new file mode 100644 index 0000000..f32f7e5 --- /dev/null +++ b/docs/user_guide/source/command_ref.rst @@ -0,0 +1,30 @@ +Command Line Reference +====================== + +The primary method of interacting with the Near-Line Data Store is through a +command line client, which can be installed using the installation instructions. + +Users must specify a command to ``nlds``, along with any options and arguments for that +command. + +``nlds [OPTIONS] COMMAND [ARGS]...`` + +As an overview the commands are: + +Commands: + | ``find Find and list files.`` + | ``get Get a single file.`` + | ``getlist Get a number of files specified in a list.`` + | ``list List holdings.`` + | ``meta Alter metadata for a holding.`` + | ``put Put a single file.`` + | ``putlist Put a number of files specified in a list.`` + | ``stat List transactions.`` + +Each command has its own specific options. The argument is generally the file +or filelist that the user wishes to operate on. The full command listing is +given below. + +.. 
click:: nlds_client.nlds_client:nlds_client + :prog: nlds + :nested: full \ No newline at end of file diff --git a/docs/user_guide/source/conf.py b/docs/user_guide/source/conf.py index a432b94..5dbce95 100644 --- a/docs/user_guide/source/conf.py +++ b/docs/user_guide/source/conf.py @@ -5,11 +5,11 @@ # -- Project information ----------------------------------------------------- # https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information - -project = 'Near-Line Data Store client' -copyright = '2023, Neil Massey and Jack Leland' +project = 'Near-Line Data Store' +copyright = '2023, Centre for Environmental Data Analysis, Science and Technologies Facilities Council, UK Research and Innovation' author = 'Neil Massey and Jack Leland' -release = '0.1.0' +version = '0.1.1' +release = '0.1.1-RC1' # -- General configuration --------------------------------------------------- # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration @@ -24,5 +24,5 @@ # -- Options for HTML output ------------------------------------------------- # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output -html_theme = 'alabaster' +html_theme = 'sphinx_rtd_theme' html_static_path = ['_static'] diff --git a/docs/user_guide/source/configuration.rst b/docs/user_guide/source/configuration.rst new file mode 100644 index 0000000..c09c8ab --- /dev/null +++ b/docs/user_guide/source/configuration.rst @@ -0,0 +1,56 @@ +.. 
_configuration: + +Configuration File +================== + +When the user first invokes ``nlds`` from the command line or issues a command +from the ``nlds_client.clientlib`` API, a configuration file is required in the +user's home directory with the path: + +``~/.nlds-config`` + +This configuration file is JSON formatted and contains the authentication +credentials required by: + + * The OAuth server + * The Object Storage + +It also contains the default user and group to use when issuing a request to the +NLDS. These can be overridden by the ``-u|--user`` and ``-g|--group`` command +line options. + +Finally, it contains the URL of the server and the API version, and the location +of the OAuth token file that is also created the first time the ``nlds`` command +is invoked. + +An example configuration file is shown below. Authentication details have been +redacted. You will have to contact the service provider to gain these +credentials. + +:: + + { + "server" : { + "url" : "{{ nlds_api_url }}", + "api" : "{{ nlds_api_version }}" + }, + "user" : { + "default_user" : "{{ user_name }}", + "default_group" : "{{ user_gws }}" + }, + "authentication" : { + "oauth_client_id" : "{{ oauth_client_id }}", + "oauth_client_secret" : "{{ oauth_client_secret }}", + "oauth_token_url" : "{{ oauth_token_url }}", + "oauth_scopes" : "{{ oauth_scopes }}", + "oauth_token_file_location" : "~/.nlds-token" + }, + "object_storage" : { + "access_key" : "{{ object_store_access_key }}", + "secret_key" : "{{ object_store_secret_key }}" + + }, + "option" : { + "resolve_filenames" : "false" + } + } diff --git a/docs/user_guide/source/index.rst b/docs/user_guide/source/index.rst new file mode 100644 index 0000000..30a1e81 --- /dev/null +++ b/docs/user_guide/source/index.rst @@ -0,0 +1,41 @@ +.. Near-Line Data Store client documentation master file, created by + sphinx-quickstart on Thu Feb 2 16:17:53 2023. 
+ You can adapt this file completely to your liking, but it should at least + contain the root `toctree` directive. + +Near-Line Data Store documentation +================================== + +The Near-Line Data Store (NLDS) is a multi-tiered storage solution that uses +Object Storage as a front end to a tape library. It catalogs the data as it is +ingested and permits multiple versions of files. It has a microservice +architecture using a message broker to communicate between the parts. +Interaction with NLDS is via an HTTP API, with a Python library and command-line +client provided to users for programmatic or interactive use. + +.. toctree:: + :maxdepth: 1 + :caption: Contents: + + installation.rst + configuration.rst + catalog_organisation.rst + tutorial.rst + status_codes.rst + command_ref.rst + license.rst + +Indices and tables +================== + +* :ref:`genindex` +* :ref:`modindex` +* :ref:`search` + +NLDS was developed at the `Centre for Environmental Data Analysis `_ +with support from the ESiWACE2 project. The project ESiWACE2 has received +funding from the European Union's Horizon 2020 research and innovation programme +under grant agreement No 823988. + +NLDS is Open-Source software with a BSD 2-Clause License. The license can be +read :ref:`here `. \ No newline at end of file diff --git a/docs/user_guide/source/installation.rst b/docs/user_guide/source/installation.rst new file mode 100644 index 0000000..79bb65d --- /dev/null +++ b/docs/user_guide/source/installation.rst @@ -0,0 +1,23 @@ +.. |br| raw:: html + +
+ +.. _installation: + +Installation +============ +To use NLDS, first you must install the client software. This guide will show +you how to install it into a Python virtual-environment (virtualenv) in your +user space or home directory. + +#. Log onto the machine where you wish to install the NLDS client into your + user space or home directory. + +#. Create a Python virtual environment: |br| + ``python3 -m venv ~/nlds-client`` + +#. Activate the nlds-client: |br| + ``source ~/nlds-client/bin/activate`` + +#. Install the nlds-client package from GitHub: |br| + ``pip install git+https://github.com/cedadev/nlds-client.git@0.1.1#egg=nlds-client`` diff --git a/docs/user_guide/source/license.rst b/docs/user_guide/source/license.rst new file mode 100644 index 0000000..c819e08 --- /dev/null +++ b/docs/user_guide/source/license.rst @@ -0,0 +1,35 @@ +.. _license: + +NLDS License +------------ + +NLDS is Open Source software made available under a BSD 2-Clause License. + +BSD 2-Clause License +==================== + +Copyright (c) 2019-2023, Centre of Environmental Data Analysis Developers, +Scientific and Technical Facilities Council (STFC), +UK Research and Innovation (UKRI). +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +* Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. \ No newline at end of file diff --git a/docs/user_guide/source/simple_catalog.png b/docs/user_guide/source/simple_catalog.png new file mode 100644 index 0000000..c225493 Binary files /dev/null and b/docs/user_guide/source/simple_catalog.png differ diff --git a/docs/user_guide/source/simple_catalog.puml b/docs/user_guide/source/simple_catalog.puml new file mode 100644 index 0000000..36bcf30 --- /dev/null +++ b/docs/user_guide/source/simple_catalog.puml @@ -0,0 +1,37 @@ +@startuml simple_catalog + +object "**Holding**" as holding { + label [STRING](unique with user) +} + +object "**Transaction**" as transaction { + ingest_time [DATETIME] +} + +object "**Tag**" as tag { + key [STRING] + value [STRING] +} + +object "**File**" as file { + original_path [STRING] + size [INT] + user [STRING] + group [STRING] + file_permissions [INT] +} + +object "**Location**" as location { + storage_type [OBJECT_STORAGE|TAPE] + access_time [DATETIME] +} + +object "**Checksum**" as checksum { +} + +transaction "1" *-- "many" file +holding "1" *-- "many" transaction +holding "1" *-- "many" tag +file "1" *-- "many" location +file "1" *-- "many" checksum +@enduml diff --git a/docs/user_guide/source/status_codes.rst b/docs/user_guide/source/status_codes.rst new file mode 100644 index 0000000..5b4295b --- /dev/null +++ b/docs/user_guide/source/status_codes.rst @@ -0,0 +1,54 @@ +.. 
_status_codes: + +Status Codes +============ + +When the NLDS is asked to PUT or GET some data, the transaction goes through a +number of states from initialisation to completion. +The state of a transaction can be queried by using the ``nlds stat`` command +in the ``nlds_client``. + +Transaction states +------------------ + ++------+----------------------------+------------------------------------------+ +|``-1``|``'INITIALISING'`` | Transaction is starting | +| | | | ++------+----------------------------+------------------------------------------+ +|``0`` |``'ROUTING'`` | Transaction is in the message queue | +| | | | ++------+----------------------------+------------------------------------------+ +|``1`` |``'SPLITTING'`` | Transaction is being split into smaller | +| | | transactions. The user doesn't have to | +| | | worry about this. | ++------+----------------------------+------------------------------------------+ +|``2`` |``'INDEXING'`` | Scanning the files and directories in | +| | | the PUT transaction. | ++------+----------------------------+------------------------------------------+ +|``3`` |``'CATALOG_PUTTING'`` | Recording the scanned files into the | +| | | catalog entry for the transaction. | ++------+----------------------------+------------------------------------------+ +|``4`` |``'TRANSFER_PUTTING'`` | Putting the files to the NLDS, actually | +| | | transferring the data. | ++------+----------------------------+------------------------------------------+ +|``5`` |``'CATALOG_ROLLBACK'`` | Remove, from the catalog, any | +| | | inaccessible files in the transaction. | ++------+----------------------------+------------------------------------------+ +|``6`` |``'CATALOG_GETTING'`` | Get a catalog entry. | +| | | | ++------+----------------------------+------------------------------------------+ +|``7`` |``'TRANSFER_GETTING'`` | Getting the files from the NLDS. 
| +| | | | ++------+----------------------------+------------------------------------------+ +|``8`` |``'COMPLETE'`` | Transaction has completed successfully. | +| | | | ++------+----------------------------+------------------------------------------+ +|``9`` |``'FAILED'`` | Transaction has failed completely. | +| | | | ++------+----------------------------+------------------------------------------+ +|``10``|``'COMPLETE_WITH_ERRORS'`` | Transaction has completed, but with some| +| | | errors. | ++------+----------------------------+------------------------------------------+ +|``11``|``'COMPLETE_WITH_WARNINGS'``| Transaction has completed, but with some| +| | | warnings | ++------+----------------------------+------------------------------------------+ \ No newline at end of file diff --git a/docs/user_guide/source/tutorial.rst b/docs/user_guide/source/tutorial.rst new file mode 100644 index 0000000..8c9d54f --- /dev/null +++ b/docs/user_guide/source/tutorial.rst @@ -0,0 +1,942 @@ +Tutorial +======== + +This page is a tutorial on NLDS covering: + +* :ref:`Introduction to the NLDS ` +* :ref:`Setting up the NLDS client ` +* :ref:`Running the NLDS client for the first time ` +* :ref:`How the NLDS data catalog is organised ` +* :ref:`Getting help on the NLDS commands ` +* :ref:`Copying a single file (PUT) to the NLDS ` +* :ref:`Copying a list of files (PUTLIST) to the NLDS ` +* :ref:`Querying the status of a transaction (STAT) ` +* :ref:`Querying the file collections the user holds on the NLDS (LIST) ` +* :ref:`Querying the files the user holds on the NLDS (FIND) ` +* :ref:`Changing the label of a file collection (META)