From f4c08dc49f4451647c5170eadd35a18c4ecbd647 Mon Sep 17 00:00:00 2001
From: Josh Wong <23216828+josh-wong@users.noreply.github.com>
Date: Fri, 6 Sep 2024 17:46:22 +0900
Subject: [PATCH] Delete doc
---
docs/scalardb-data-loader/easy-to-start.mdx | 29 --
.../getting-started-export.mdx | 54 ----
.../getting-started-import.mdx | 285 ------------------
3 files changed, 368 deletions(-)
delete mode 100644 docs/scalardb-data-loader/easy-to-start.mdx
delete mode 100644 docs/scalardb-data-loader/getting-started-export.mdx
delete mode 100644 docs/scalardb-data-loader/getting-started-import.mdx
diff --git a/docs/scalardb-data-loader/easy-to-start.mdx b/docs/scalardb-data-loader/easy-to-start.mdx
deleted file mode 100644
index 5a42e2a9..00000000
--- a/docs/scalardb-data-loader/easy-to-start.mdx
+++ /dev/null
@@ -1,29 +0,0 @@
-# ScalarDB Data Loader API
-
-This document describes how to get started with the ScalarDB data loader API.
-
-## Start ScalarDB Data Loader API
-
-* Clone the `scalardb-data-loader` repository
-```
-git clone https://github.com/scalar-labs/scalardb-data-loader.git
-```
-
-* Navigate to the `scalardb-data-loader` directory
-
-* Update the following properties in the `fixtures/test/conf/application.yml` file with appropriate cloud credentials
- * storage.jclouds.provider
- * storage.jclouds.identity
- * storage.jclouds.credential
- * storage.jclouds.container
-
-* Build the `scalardb-data-loader-api` docker image
-```
-./gradlew :api:docker
-```
-
-* Start the `scalardb-data-loader-api`
-```
-cd fixtures/test
-docker-compose up
-```
diff --git a/docs/scalardb-data-loader/getting-started-export.mdx b/docs/scalardb-data-loader/getting-started-export.mdx
deleted file mode 100644
index 958670b7..00000000
--- a/docs/scalardb-data-loader/getting-started-export.mdx
+++ /dev/null
@@ -1,54 +0,0 @@
-# Getting started with Export
-
-This document explains how you can get started with the ScalarDB Data Loader Export function.
-
-## Features
-
-The ScalarDB Data Loader allows you to export data in the following formats:
-
-- JSON
-- JSONLines
-- CSV
-
-Each export runs a ScalarDB scan operation based on the CLI arguments provided when running the data loader.
-
-## Usage
-
-The data loader export function can be started with the following minimal configuration:
-
-```console
-./scalardb-data-loader export --config scalardb.properties --namespace namespace --table tableName
-```
-
-
-
-- --config: the path to the scalardb connection properties file
-- --namespace: the namespace of the table that contains the data
-- --table: name of the table that contains the data
-
-If the `--output-file` argument is omitted, the data loader creates the output file in the working directory.
-
-### Command-line flags
-
-Here is a list of flags (options) that can be used with the ScalarDB Data Loader.
-
-| Flag | Description | Usage |
-| ----------------- | ------------------------------------------------------------ | ------------------------------------------------------ |
-| --config | The path to the scalardb.properties file. If omitted the tool looks for a file named `scalardb.properties` in the current folder | `scalardb-data-loader --config scalardb.properties` |
-| --namespace | Namespace to export table data from. Required. | `scalardb-data-loader --namespace namespace` |
-| --table | Name of table to export data from. Required. | `scalardb-data-loader --table tableName` |
-| --key | Export data for a specific partition key. By default, all data from the specified table is exported. | `scalardb-data-loader --key columnName=value` |
-| --sort | Specify a column to sort on. The column needs to be a clustering key. The argument can be repeated to provide multiple sortings. This flag is only applicable to `--key`. | `scalardb-data-loader --sort columnName=desc` |
-| --projection | Limit the columns that are exported by providing a projection. The argument can be repeated to provide multiple projections. | `scalardb-data-loader --projection columnName` |
-| --start | Clustering key to mark scan start. This flag is only applicable to `--key`. | `scalardb-data-loader --start columnName=value` |
-| --start-exclusive | Is the scan start exclusive or not. If omitted, the default value is `false`. This flag is only applicable to `--key` | `scalardb-data-loader --start-exclusive` |
-| --end | Clustering key to mark scan end. This flag is only applicable to `--key`. | `scalardb-data-loader --end columnName=value` |
-| --end-exclusive | Is the scan end exclusive or not. If omitted, the default value is `false`. This flag is only applicable to `--key`. | `scalardb-data-loader --end-exclusive` |
-| --limit | Limit the results of the scan. If omitted, the default value is `0`, which means there is no limit. | `scalardb-data-loader --limit 1000` |
-| --output-file | The name and path of the output file. If omitted, the tool will save the file in the current folder with the following name format:<br />`export_namespace.tableName_timestamp.json` or `export_namespace.tableName_timestamp.csv`<br />The output folder needs to exist. The data loader does not create the output folder for you. | `scalardb-data-loader --output-file ./out/output.json` |
-| --format | The output format. By default `json` is selected. | `scalardb-data-loader --format json` |
-| --metadata | When set to true the transaction metadata is included in the export. By default this is set to `false` | `scalardb-data-loader --metadata` |
-| --delimiter | The delimiter used in CSV files. Default value is `;` | `scalardb-data-loader --delimiter ;` |
-| --no-headers | Exclude header row in CSV file. Default is `false` | `scalardb-data-loader --no-headers` |
-| --threads | Thread count for concurrent processing | `scalardb-data-loader --threads 500` |
-
diff --git a/docs/scalardb-data-loader/getting-started-import.mdx b/docs/scalardb-data-loader/getting-started-import.mdx
deleted file mode 100644
index 18914c94..00000000
--- a/docs/scalardb-data-loader/getting-started-import.mdx
+++ /dev/null
@@ -1,285 +0,0 @@
-# Getting started with Import
-
-This document explains how you can get started with the ScalarDB Data Loader Import function.
-
-## Features
-
-- Import data from JSON or JSONLines files
-- Automatic data mapping based on source field name mapping
-- Custom Data mapping via a JSON control file
-- Import data from one record or line into multiple tables
-- Support for INSERT, UPDATE and UPSERT
-
-## Usage
-
-The data loader import function can be started with the following minimal configuration:
-
-```console
-./scalardb-data-loader import --config scalardb.properties --namespace namespace --table tableName
-```
-
-The above configuration starts an import process where no control file is used and the data mapping is applied automatically.
-
-
-
-Execute the following steps to successfully import new or existing data:
-
-- Prepare a source file containing data that needs to be imported.
-
-- Choose the right import mode. By default, the import is done in `upsert` mode which means that data
- will be inserted if new or updated if the partition key and/or clustering key is found. Other
- options are `insert` mode or `update` mode.
-
-- Find the correct `namespace` and `table` name to import data to.
-
-- Determine if you want to run an `all required columns` check for each data row. If enabled, data
- rows with missing columns will be treated as failed and not imported.
-
-- Specify the pathnames for the `success` and `failed` output files. By default, the data loader
- creates the files in the working directory.
-
-- When dealing with JSON data, determine if you want the JSON output for the success or failed log files to
- be in `pretty print` or not. By default, this option is disabled for performance.
-
-- Optionally specify the `threads` argument to tweak performance.
-
-- Run the import from the command line to start importing your data. Make sure to run the ScalarDB Data
- Loader in the correct `storage` or `transaction` mode depending on your running ScalarDB instance.
-
-### Command-line flags
-
-Here is a list of flags (options) that can be used with the data loader:
-
-| Flag | Description | Usage |
-|---------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------|
-| --mode | The mode in which ScalarDB is running. If omitted, the default value is `storage` | `scalardb-data-loader --mode transaction` |
-| --config | The path to the scalardb.properties file. If omitted, the tool looks for a file named `scalardb.properties` in the current folder. | `scalardb-data-loader --config scalardb.properties` |
-| --namespace | Namespace of the table to import data into. Required when no control file is provided. | `scalardb-data-loader --namespace namespace` |
-| --table | Name of the table to import data into. Required when no control file is provided. | `scalardb-data-loader --table tableName` |
-| --import-mode | Mode to import the data into the ScalarDB table. Supported modes are `insert`, `update` and `upsert`. Optional. The default value is `upsert`. | `scalardb-data-loader --import-mode=upsert` |
-| --all-columns-required | If set, data rows cannot be imported if they are missing columns. Optional. By default, the check is not executed. | `scalardb-data-loader --all-columns-required` |
-| --file | Specify the path to the file that will be imported. Required | `scalardb-data-loader --file ` |
-| --success | The path to the file that is created to write the successful import results to. Successful and failed import results are written to separate files.<br />Optional. By default, a new file will be created in the current working directory.<br />Note: if the file already exists, it will be overwritten. | `scalardb-data-loader --success ` |
-| --failed | The path to the file that will be created to write the failed import results to.<br />Optional. By default, a new file will be created in the current working directory.<br />Note: if the file already exists, it will be overwritten. | `scalardb-data-loader --failed ` |
-| --threads | Thread count for concurrent processing | `scalardb-data-loader --threads 500` |
-| --format | The format of the import file. `json` and `jsonl` files are supported. Optional. The default value is `json`. | `scalardb-data-loader --format json` |
-| --ignore-null | The null values in the source file will be ignored, which means that the existing data will not be overwritten. Optional. The default value is `false`. | `scalardb-data-loader --ignore-null` |
-| --pretty | When set, the output to the success and failed files is done in `pretty print` mode. By default the option is not enabled. | `scalardb-data-loader --pretty` |
-| --control-file | The path to the JSON control file specifying the rules for the custom data mapping and/or multi-table import. | `scalardb-data-loader --control-file control.json` |
-| --control-file-validation-level | The validation level for the control file: `MAPPED`, `KEYS` or `FULL`.<br />Optional. By default, the level is set to `MAPPED`. | `scalardb-data-loader --control-file-validation-level FULL` |
-| --log-put-value | Whether the value that was used in the ScalarDB `PUT` operation is included in the log files or not.<br />Optional and disabled by default. | `scalardb-data-loader --log-put-value` |
-| --error-file-required | To export an optional error file of type JSON when the import file contains CSV data. By default, this option is disabled. | `scalardb-data-loader --error-file-required` |
-| --error | To specify an optional error file when the import file contains CSV data. | `scalardb-data-loader --error ` |
-| --delimiter | To specify a custom delimiter if the import file contains CSV data. | `scalardb-data-loader --delimiter ` |
-| --header | To specify the header row data if the import file contains CSV data and does not have a header row. | `scalardb-data-loader --header ` |
-
-## Import mode
-
-The data loader supports the following import modes:
-
-| Mode | Description |
-| ------ | ------------------------------------------------------------ |
-| INSERT | Each source record is treated as new data. If the data already exists in the ScalarDB table, based on the partition and clustering key, the import for this source data will fail. |
-| UPDATE | Each source record is treated as an update for existing data in the ScalarDB table. If the data does not exist in the table, based on the partition key and clustering key, the import for this source data will fail. |
-| UPSERT | If the target ScalarDB table already contains the data, the import will be done via an UPDATE. If the target data is missing, it will be treated as an INSERT. |
-
-*Note*:
-
- In the case of `INSERT`, it is required to have matching fields in the source files for each target column via automatic or custom mapping via the control file. This also applies to an `UPSERT` that turns into an `INSERT`.
-
-## Data mapping
-
-### Automatic mapping
-
-When no control file is provided, the data loader will automatically map the fields in the source JSON data to the available columns in the ScalarDB table. If the name does not match, and if all columns are required, it will be treated as a validation error. In this case, the import for this record will fail and the result will be added to the failed output log.
-
-### Custom mapping
-
-When the source fields do not match the target column name, it is necessary to use a control file. In this control file, you can specify the custom mapping rules for the field names.
-
-For example, the following control file maps the field `custom_id` in the source file to `id` in the target table.
-
-```json
-{
- "tables": [{
- "namespace": "sample",
- "table_name": "table1",
- "mappings": [{
- "source_field": "custom_id",
- "target_column": "id"
- }]
- }
- ]
-}
-```
-
-## Control file
-
-To allow for custom data mapping or multi-table importing, the data loader supports configuration via a JSON control file. This file needs to be passed in via the `--control-file` argument when starting the data loader.
-
-### Control file validation levels
-
-To enforce validation on the control file, the data loader allows you to specify the validation level. Based on the set level, the data loader will run a pre-check and validate the control file based on the level rules.
-
-The following levels are supported:
-
-| Level | Description |
-| ------ | ------------------------------------------------------------ |
-| FULL | This validation makes sure that the control file has mappings for each column in the target ScalarDB table. |
-| KEYS | This validation makes sure that mappings are available for each ScalarDB table partition and, if available, clustering keys columns in the control file mappings. |
-| MAPPED | The validation makes sure that the provided source fields and target columns exist for only the mappings that are provided in the control file.<br />No other fields are checked. |
-
-The validation level is optional and can be set via the `--control-file-validation-level` argument when starting the data loader.
-
-*Note*:
-
-This validation is run as a pre-check and does not mean the import process will automatically succeed.
-
-For example, if the level is set to `MAPPED` and the control file does not contain mappings for each column for an INSERT, the import process will still fail because all columns are required to be mapped for an INSERT.
-
-## Multi-table import
-
-The data loader supports multi-table target importing.
-
-A single row in a JSON or JSONLines file can be imported into multiple tables by specifying table mapping rules in the control file. Currently, multi-table import is not supported without a control file.
-
-When using multi-table import in ScalarDB transaction mode, a transaction is created for each table import. For example, if the source record is mapped to 3 tables in the control file, 3 separate transactions are created.
-
-For example, to import the following source record into `table1` and `table2`, execute the following steps:
-
-| Table1 | Table2 |
-| ------ | ------ |
-| Id | Id |
-| price | amount |
-
-**Source record**
-
-```json
-[{
- "custom_id": 1,
- "custom_price": 1000,
- "custom_amount": 100
-
-}]
-```
-
-**Control file**
-
-```json
-{
- "tables": [{
- "namespace": "sample",
- "table_name": "table1",
- "mappings": [{
- "source_field": "custom_id",
- "target_column": "id"
- }, {
- "source_field": "custom_price",
- "target_column": "price"
- }]
- },
- {
- "namespace": "sample",
- "table_name": "table2",
- "mappings": [{
- "source_field": "custom_id",
- "target_column": "id"
- }, {
- "source_field": "custom_amount",
- "target_column": "amount"
- }]
- }
- ]
-}
-```
-
-
-
-## Output logs
-
-When starting an import task, the data loader logs the import results in two files. One file contains the import data that is successfully imported and one file contains the data that cannot be imported. The failed data will contain an added field that explains why the data could not be imported. This field is called `data_loader_import_status`.
-
-The file containing the failed imports can be edited to correct the problems and used as the source file for a new import task as is. It is not required to first remove the `data_loader_import_status` field containing the error. This field will be ignored during the import process and the original value will not be included in the new version of the success or failed output file.
-
-The file with the successfully imported data also contains the `data_loader_import_status` field. In this file, each imported data row has a status message indicating whether a new row was created or existing data was updated.
-
-### Log format
-
-| Field | Description |
-| -------------- | ------------------------------------------------------------ |
-| action | The result of the import process for the data record. UPDATE, INSERT or FAILED_DURING_VALIDATION |
-| namespace | The name of the namespace of the table that the data is imported in |
-| tablename | The name of the table that the data is imported in |
-| is_data_mapped | Whether custom data mapping was applied or not based on an available control file. |
-| tx_id | The id of the transaction. Only available if the data loader is run in `transaction` mode. |
-| value | The final value, after optional data mapping, that the data loader uses in the `PUT` operation. |
-| row_number | The line number or record number of the source data. |
-| errors | A list of validation or other errors for things that went wrong during the import process. |
-
-Example of a JSON formatted success log file:
-
-```json
-[{
- "column_1": 1,
- "column_2": 2,
- "column_n": 3,
- "data_loader_import_status": {
- "results": [{
- "action": "UPDATE",
- "namespace": "namespace1",
- "tableName": "table1",
- "is_data_mapped": true,
- "tx_id": "value",
- "value": "value",
- "row_number": "value"
- }]
- }
-}]
-```
-
-
-
-Example of a JSON formatted failed log file:
-
-```json
-[{
- "column_1": 1,
- "column_2": 2,
- "column_n": 3,
- "data_loader_import_status": {
- "results": [{
- "action": "FAILED_DURING_VALIDATION",
- "namespace": "namespace1",
- "tableName": "table1",
- "is_data_mapped": false,
- "value": "value",
- "row_number": "value",
- "errors": [
- "missing columns found during validation"
- ]
- }]
- }
-}]
-```
-
-## Data duplicates
-
-The data loader does not handle duplicates by itself. In ScalarDB transaction mode, trying to
-update the same target data in fast succession will cause `No Mutation` errors and these are not
-handled by the data loader. These failed data rows will be added to the failed import result output
-file and can be re-tried for import later.
-
-However, it is recommended to make sure the import file does not contain updates or inserts on the
-same partition keys and/or clustering keys as the correct state cannot be guaranteed by the data
-loader.
-
-
-
-## Storage vs transaction mode
-
-ScalarDB supports both storage and transaction mode and this support is included in the data loader import process.
-
-When the loader is started in storage mode, each import is executed in a non-transactional way.
-
-Starting the loader in transaction mode will use transactions to import the data. Currently, each row is imported via a separate transaction. When importing a single record into multiple tables, a separate transaction is created for each table import.
-