📝 update readmes (esp. new config variables)
bertsky committed Aug 28, 2024
1 parent abe069a commit 11f9264
Showing 2 changed files with 40 additions and 10 deletions.
40 changes: 31 additions & 9 deletions README.md
@@ -47,17 +47,12 @@ complete stack of OCR-D-related software.

The easiest way to install is via `pip`:

```sh
pip install ocrd
pip install ocrd

# or just the functionality you need, e.g.

pip install ocrd_modelfactory
```

All Python software released by [OCR-D](https://github.com/OCR-D) requires Python 3.8 or higher.

**NOTE** Some OCR-D-Tools (or even test cases) _might_ reveal an unintended behavior if you have specific environment modifications, like:
> **NOTE** Some OCR-D tools (or even test cases) _might_ reveal an unintended behavior if you have specific environment modifications, like:
* using a custom build of [ImageMagick](https://github.com/ImageMagick/ImageMagick), whose format delegates are different from what OCR-D supposes
* custom Python logging configurations in your personal account

@@ -82,7 +77,6 @@ Almost all behaviour of the OCR-D/core software is configured via CLI options an

Some parts of the software are configured via environment variables:

* `OCRD_METS_CACHING`: If set to `true`, access to the METS file is cached, speeding in-memory search and modification.
* `OCRD_PROFILE`: This variable configures the built-in CPU and memory profiling. If empty, no profiling is done. Otherwise expected to contain any of the following tokens:
  * `CPU`: Enable CPU profiling of processor runs
  * `RSS`: Enable RSS memory profiling
@@ -95,18 +89,46 @@ Some parts of the software are configured via environment variables:
* `XDG_CONFIG_HOME`: Directory to look for `./ocrd/resources.yml` (i.e. `ocrd resmgr` user database) – defaults to `$HOME/.config`.
* `XDG_DATA_HOME`: Directory to look for `./ocrd-resources/*` (i.e. `ocrd resmgr` data location) – defaults to `$HOME/.local/share`.

* `OCRD_DOWNLOAD_RETRIES`: Number of times to retry failed attempts for downloads of workspace files.
* `OCRD_DOWNLOAD_RETRIES`: Number of times to retry failed attempts for downloads of resources or workspace files.
* `OCRD_DOWNLOAD_TIMEOUT`: Timeout in seconds for connecting or reading (comma-separated) when downloading.

* `OCRD_MISSING_INPUT`: How to deal with missing input files (for some fileGrp/pageId) during processing:
  * `SKIP`: ignore and proceed with next page's input
  * `ABORT`: throw `MissingInputFile` exception

* `OCRD_MISSING_OUTPUT`: How to deal with missing output files (for some fileGrp/pageId) during processing:
  * `SKIP`: ignore and proceed processing next page
  * `COPY`: fall back to copying input PAGE to output fileGrp for page
  * `ABORT`: re-throw whatever caused processing to fail

* `OCRD_MAX_MISSING_OUTPUTS`: Maximal rate of skipped/fallback pages among all processed pages before aborting (decimal fraction, ignored if negative).

* `OCRD_EXISTING_OUTPUT`: How to deal with already existing output files (for some fileGrp/pageId) during processing:
  * `SKIP`: ignore and proceed processing next page
  * `OVERWRITE`: force writing result to output fileGrp for page
  * `ABORT`: re-throw `FileExistsError` exception


* `OCRD_METS_CACHING`: Whether to enable in-memory storage of OcrdMets data structures for speedup during processing or workspace operations.

* `OCRD_MAX_PROCESSOR_CACHE`: Maximum number of processor instances (for each set of parameters) to be kept in memory (including loaded models) for processing workers or processor servers.

* `OCRD_MAX_PARALLEL_PAGES`: Maximum number of processor threads for page-parallel processing (within each Processor's selected page range, independent of the number of Processing Workers or Processor Servers). If set `>1`, then a METS Server must be used for METS synchronisation.

* `OCRD_PROCESSING_PAGE_TIMEOUT`: Timeout in seconds for processing a single page. If set to a value `>0`, a page that exceeds it is handled according to `OCRD_MISSING_OUTPUT`.

* `OCRD_NETWORK_SERVER_ADDR_PROCESSING`: Default address of Processing Server to connect to (for `ocrd network client processing`).
* `OCRD_NETWORK_SERVER_ADDR_WORKFLOW`: Default address of Workflow Server to connect to (for `ocrd network client workflow`).
* `OCRD_NETWORK_SERVER_ADDR_WORKSPACE`: Default address of Workspace Server to connect to (for `ocrd network client workspace`).
* `OCRD_NETWORK_RABBITMQ_CLIENT_CONNECT_ATTEMPTS`: Number of attempts for a worker to create its queue. Helpful if the rabbitmq-server needs time to be fully started.

* `OCRD_NETWORK_CLIENT_POLLING_SLEEP`: How many seconds to sleep before trying `ocrd network client` again.
* `OCRD_NETWORK_CLIENT_POLLING_TIMEOUT`: Timeout for a blocking `ocrd network client` (in seconds).

* `OCRD_NETWORK_SOCKETS_ROOT_DIR`: The root directory where all mets server related socket files are created.
* `OCRD_NETWORK_LOGS_ROOT_DIR`: The root directory where all ocrd_network related file logs are stored.
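
As a rough illustration of how two of the variables above could be interpreted, here is a minimal Python sketch. It is not taken from OCR-D/core: the helper names `parse_download_timeout` and `too_many_missing` are hypothetical, the example values are made up, and two details are assumptions (a single timeout value applies to both connecting and reading; "exceeding" the rate means strictly greater).

```python
def parse_download_timeout(value):
    """Split 'connect,read' (or a single number) into a (connect, read) tuple of floats."""
    parts = [float(part) for part in value.split(",")]
    if len(parts) == 1:
        # Assumption: a single value applies to both connecting and reading.
        return (parts[0], parts[0])
    if len(parts) != 2:
        raise ValueError(f"expected 'connect,read', got {value!r}")
    return (parts[0], parts[1])


def too_many_missing(n_missing, n_processed, max_rate):
    """Return True if the share of skipped/fallback pages exceeds max_rate.

    A negative max_rate disables the check, as described for
    OCRD_MAX_MISSING_OUTPUTS. Strict '>' is an assumption of this sketch.
    """
    if max_rate < 0 or n_processed == 0:
        return False
    return n_missing / n_processed > max_rate


# Stand-in for os.environ with example values only (not OCR-D defaults).
env = {"OCRD_DOWNLOAD_TIMEOUT": "10,20", "OCRD_MAX_MISSING_OUTPUTS": "0.1"}

connect_timeout, read_timeout = parse_download_timeout(env["OCRD_DOWNLOAD_TIMEOUT"])
max_rate = float(env["OCRD_MAX_MISSING_OUTPUTS"])

print(connect_timeout, read_timeout)
print(too_many_missing(2, 10, max_rate))
```

A plain dictionary stands in for the process environment here so the sketch behaves the same regardless of what is exported in the calling shell.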



## Packages

10 changes: 9 additions & 1 deletion README_bashlib.md
@@ -21,6 +21,9 @@ For example:
* [`ocrd__log`](#ocrd__log)
* [`ocrd__minversion`](#ocrd__minversion)
* [`ocrd__dumpjson`](#ocrd__dumpjson)
* [`ocrd__resolve_resource`](#ocrd__resolve_resource)
* [`ocrd__show_resource`](#ocrd__show_resource)
* [`ocrd__list_resources`](#ocrd__list_resources)
* [`ocrd__usage`](#ocrd__usage)
* [`ocrd__parse_argv`](#ocrd__parse_argv)
<!-- END-MARKDOWN-TOC -->
@@ -56,6 +59,10 @@ export OCRD_TOOL_NAME=ocrd-foo-bar

(Which you automatically get from [`ocrd__wrap`](#ocrd__wrap).)

### `ocrd__resolve_resource`

Output given resource file's path.

### `ocrd__show_resource`

Output given resource file's content.
@@ -88,14 +95,15 @@ This will be filled by the parser along the following keys:
- `profile`: whether `--profile` is enabled
- `profile_file`: the argument of `--profile-file`
- `log_level`: the argument of `--log-level`
- `mets_server_url`: the argument of `--mets-server-url`
- `mets_file`: absolute path of the `--mets` argument
- `working_dir`: absolute path of the `--working-dir` argument or the parent of `mets_file`
- `page_id`: the argument of `--page-id`
- `input_file_grp`: the argument of `--input-file-grp`
- `output_file_grp`: the argument of `--output-file-grp`

Moreover, there will be an associative array **`params`**
with the fully expanded runtime values of the ocrd-tool.json parameters.
with the fully validated and default-expanded runtime values of the `ocrd-tool.json` parameters.
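
To make "validated and default-expanded" more concrete, the following Python sketch shows one way such an expansion could work. It is an illustration only, not the bashlib or OCR-D/core implementation: `expand_defaults` and the sample parameter description are hypothetical, and the validation covers only unknown and missing required keys, not types.

```python
def expand_defaults(tool_parameters, user_values):
    """Merge user-supplied values with declared defaults.

    Rejects parameters that are not declared, and required parameters
    that have neither a user value nor a default.
    """
    unknown = set(user_values) - set(tool_parameters)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    expanded = {}
    for name, spec in tool_parameters.items():
        if name in user_values:
            expanded[name] = user_values[name]
        elif "default" in spec:
            expanded[name] = spec["default"]
        elif spec.get("required", False):
            raise ValueError(f"missing required parameter: {name}")
    return expanded


# Hypothetical parameter description, loosely shaped like the
# "parameters" section of an ocrd-tool.json.
tool_parameters = {
    "dpi": {"type": "number", "default": 300},
    "level": {"type": "string", "default": "page"},
}

params = expand_defaults(tool_parameters, {"dpi": 600})
print(params)
```

In this sketch, `params` plays the role of the associative array: every declared parameter with a default ends up with a value, whether or not the caller supplied one.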

### `ocrd__wrap`

