doc:add lambda layout analysis docs

awslabs · Mar 24, 2024 · commit a06c02b
1 parent 76a12ad

Showing 4 changed files with 163 additions and 28 deletions.
diff --git a/docs/en/deploy-layout-analysis.md b/docs/en/deploy-layout-analysis.md
@@ -15,70 +15,72 @@ include "include-deploy-description.md"
 
 ## API reference
 
-### Text classification
-
 - HTTP request method: `POST`
 
 - Request body parameters
 
-| **Name**  | **Type**  | **Required** |  **Description**  |
-|----------|-----------|------------|------------|
-| url | *String* |Choose url or img.| Image URL address, which supports HTTP/HTTPS and S3 protocols. Supported image formats are jpg/jpeg/png/bmp, with the longest side not exceeding 4096px.|
-| img | *String* |Choose url or img.|Base64 encoded image data.|
-| output_type | *String* | |`json` or `markdown`, Whether the result is returned in json or converted to markdown.|
-| table_type | *String* | |`html` or `markdown`, Whether the table in the result is returned in html or converted to markdown.|
+| **Name**    | **Type** | **Required**       | **Description**                                                                                                                                          |
+| ----------- | -------- | ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| url         | _String_ | Choose url or img. | Image URL address, which supports HTTP/HTTPS and S3 protocols. Supported image formats are jpg/jpeg/png/bmp, with the longest side not exceeding 4096px. |
+| img         | _String_ | Choose url or img. | Base64 encoded image data.                                                                                                                               |
+| output_type | _String_ |                    | `json` or `markdown`: whether the result is returned as JSON or converted to Markdown.                                                                   |
+| table_type  | _String_ |                    | `html` or `markdown`: whether the table in the result is returned as HTML or converted to Markdown.                                                      |
 
 - Example Request
 
 **Example 1**
 
-``` json
+```json
 {
-  "url": "{{page.meta.sample_image}}"
+  "url": "{{page.meta.sample_image}}",
+  "output_type": "json"
 }
 ```
 
-``` json
+```json
 {
-  "img": "Base64-encoded image data"
+  "img": "Base64-encoded image data",
+  "output_type": "json"
 }
 ```
 
 **Example 2**
 
-``` json
+```json
 {
-  "url": "{{page.meta.sample_image}}"
+  "url": "{{page.meta.sample_image}}",
+  "output_type": "markdown"
 }
 ```
 
-``` json
+```json
 {
-  "img": "Base64-encoded image data"
+  "img": "Base64-encoded image data",
+  "output_type": "markdown"
 }
 ```
 
 - Response parameters
 
 When `output_type` is `json`, a list is returned, and an item in the list is a block in a document.
 
-| **Name** | **Type** | **Description**  |
-|----------|-----------|------------|
-|BlockType    |*String*   |These elements correspond to the different portions of the layout, and are: text, title, figure or table|
-|location |*Dict*     |location information about the location of detected items on a document page.|
-|Text    |*String*   |The text content of the current block. When BlockType is `table`, the html or markdown of the table will be returned depending on the `table_type`.|
+| **Name**  | **Type** | **Description**                                                                                                                                     |
+| --------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
+| BlockType | _String_ | Type of layout element the block represents: `text`, `title`, `figure`, or `table`.                                                                 |
+| Geometry  | _Dict_   | Location of the block on the document page, given as a `BoundingBox` with `Width`, `Height`, `Left`, and `Top`.                                     |
+| Text      | _String_ | The text content of the current block. When BlockType is `table`, the html or markdown of the table will be returned depending on the `table_type`. |
 
 When `output_type` is `markdown`, a dict is returned.
 
-| **Name** | **Type** | **Description**  |
-|----------|-----------|------------|
-|markdown    |*String*   |Converted images to markdown result.|
+| **Name** | **Type** | **Description**                      |
+| -------- | -------- | ------------------------------------ |
+| markdown | _String_ | Image content converted to Markdown. |
 
 - Example JSON response
 
 **Example 1 `output_type` is `json` response**
 
-``` json
+```json
 [
   {
     "BlockType": "text",
@@ -110,7 +112,7 @@ When `output_type` is `markdown`, a dict is returned.
 
 **Example 2 `output_type` is `markdown` response**
 
-``` json
+```json
 {
   "Markdown": "核准日期：xxx年xx月xx日 \n\n修改日期：xxx年xx月xx日...."
 }
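
For orientation, the request parameters documented in this file can be exercised with a short client. The following is a minimal Python sketch and not part of the commit: `API_URL`, `build_payload`, and `analyze` are hypothetical names, the invoke URL is a placeholder for whatever your own deployment exposes, and any authentication your API requires is left out.

```python
import base64
import json
import urllib.request

# Hypothetical placeholder; use the invoke URL of your own deployment.
API_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/layout_analysis"


def build_payload(url=None, img_bytes=None, output_type="json", table_type=None):
    """Build a request body containing exactly one of `url` or `img`."""
    if (url is None) == (img_bytes is None):
        raise ValueError("provide exactly one of url or img_bytes")
    if output_type not in ("json", "markdown"):
        raise ValueError("output_type must be 'json' or 'markdown'")
    if table_type is not None and table_type not in ("html", "markdown"):
        raise ValueError("table_type must be 'html' or 'markdown'")

    payload = {"output_type": output_type}
    if url is not None:
        payload["url"] = url
    else:
        # The API expects the raw image bytes as a Base64 string.
        payload["img"] = base64.b64encode(img_bytes).decode("ascii")
    if table_type is not None:
        payload["table_type"] = table_type
    return payload


def analyze(payload, api_url=API_URL, timeout=60):
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Only `build_payload` runs without network access; `analyze(build_payload(url=...))` would send the request to the placeholder URL above.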

diff --git a/docs/en/deployment.md b/docs/en/deployment.md
@@ -49,6 +49,7 @@ information, see [ICP Recordal](https://www.amazonaws.cn/en/support/icp/?nc1=h_l
 | **LiteOCR**                   | no      | Deploy [General OCR (Simplified Chinese)](deploy-general-ocr.md)              |
 | **GeneralOCRTraditional**     | no      | Deploy [General OCR (Traditional Chinese)](deploy-general-ocr-traditional.md) |
 | **AdvancedOCR**               | no      | Deploy [Advanced OCR](deploy-general-ocr-traditional.md)                      |
+| **DocumentLayoutAnalysis**    | no      | Deploy [Document Layout Analysis](deploy-layout-analysis.md)                  |
 | **CustomOCR**                 | no      | Deploy [Custom OCR](deploy-custom-ocr.md)                                     |
 | **CarLicensePlate**           | no      | Deploy [Car License Plate](deploy-car-license-plate.md)                       |
 | **FaceComparison**            | no      | Deploy [Face Comparison](deploy-face-comparison.md)                           |

diff --git a/docs/zh/deploy-layout-analysis.md b/docs/zh/deploy-layout-analysis.md
@@ -0,0 +1,131 @@
+---
+feature_id: DocumentLayoutAnalysis
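+feature_name: Document Layout Analysis
+feature_endpoint: layout_analysis
+deployment_time: 20 Minutes
+destroy_time: 15 Minutes
+sample_image: Image URL address
+feature_description: Converts document images to Markdown/JSON output and generates tables in Markdown/HTML format.
+feature_scenario: Suitable for converting paper documents to electronic format, document recognition, and content review, to improve information-processing efficiency.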
+---
+
+{%
+include "include-deploy-description.md"
+%}
+
+## API reference
+
+- HTTP request method: `POST`
+
+- Request body parameters
+
+| **Name** | **Type** | **Required** | **Description** |
+| -------- | -------- | ------------ | --------------- |
+| url | _String_ | Choose url or img. | Image URL address. Supports HTTP/HTTPS and S3 protocols. Supported image formats are jpg/jpeg/png/bmp. The recommended image size is no larger than 1920 × 1080, or 1280 × 720 when portrait enhancement is enabled. Because of performance limits, the AWS Lambda version recommends images no larger than 400 × 400. |
+| img | _String_ | Choose url or img. | Base64 encoded image data. |
+| output_type | _String_ | | `json` or `markdown`: whether the result is returned as JSON or converted to Markdown. |
+| table_type | _String_ | | `html` or `markdown`: whether the table in the result is returned as HTML or converted to Markdown. |
+
+- Example Request
+
+**Example 1**
+
+```json
+{
+  "url": "{{page.meta.sample_image}}",
+  "output_type": "json"
+}
+```
+
+```json
+{
+  "img": "Base64-encoded image data",
+  "output_type": "json"
+}
+```
+
+**Example 2**
+
+```json
+{
+  "url": "{{page.meta.sample_image}}",
+  "output_type": "markdown"
+}
+```
+
+```json
+{
+  "img": "Base64-encoded image data",
+  "output_type": "markdown"
+}
+```
+
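+- Response parameters
+
+When `output_type` is `json`, a list is returned; each item in the list is a block in the document.
+
+| **Name** | **Type** | **Description** |
+| --------- | -------- | ---------------------------------------------------------- |
+| BlockType | _String_ | Type of layout element the block represents: text, title, figure, or table. |
+| Geometry | _Dict_ | Location of the detected item on the document page. |
+| Text | _String_ | The text content of the current block. When BlockType is `table`, the HTML or Markdown of the table is returned, depending on `table_type`. |
+
+When `output_type` is `markdown`, a dict is returned.
+
+| **Name** | **Type** | **Description** |
+| -------- | -------- | ------------------------------------ |
+| markdown | _String_ | Image content converted to Markdown. |
+
+- Example JSON response
+
+**Example 1 `output_type` is `json` response**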
+
+```json
+[
+  {
+    "BlockType": "text",
+    "Geometry": {
+      "BoundingBox": {
+        "Width": 972,
+        "Height": 79,
+        "Left": 561,
+        "Top": 564
+      }
+    },
+    "Text": "核准日期：xxx年xx月xx日"
+  },
+  {
+    "BlockType": "text",
+    "Geometry": {
+      "BoundingBox": {
+        "Width": 1543,
+        "Height": 80,
+        "Left": 569,
+        "Top": 678
+      }
+    },
+    "Text": "修改日期：xxx年xx月xx日"
+  }
+    ...
+]
+```
+
+**Example 2 `output_type` is `markdown` response**
+
+```json
+{
+  "Markdown": "核准日期：xxx年xx月xx日 \n\n修改日期：xxx年xx月xx日...."
+}
+```
+
+{%
+include-markdown "include-deploy-code.md"
+%}
+
+{%
+include "include-deploy-cost-8GB.md"
+%}
+
+{%
+include-markdown "include-deploy-uninstall.md"
+%}
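
The JSON response shape shown in this file (a list of blocks, each with `BlockType`, `Geometry.BoundingBox`, and `Text`) can be post-processed along the following lines. This is a minimal Python sketch and not part of the commit: `block_boxes` and `reading_order` are hypothetical helpers, and the top-to-bottom sort is a naive single-column heuristic rather than an ordering the service is known to guarantee.

```python
def block_boxes(blocks):
    """Return (block_type, left, top, right, bottom) for each block."""
    boxes = []
    for block in blocks:
        box = block["Geometry"]["BoundingBox"]
        left, top = box["Left"], box["Top"]
        # Derive the right and bottom edges from the width and height.
        boxes.append(
            (block["BlockType"], left, top, left + box["Width"], top + box["Height"])
        )
    return boxes


def reading_order(blocks):
    """Sort blocks top-to-bottom, then left-to-right (single-column assumption)."""
    return sorted(
        blocks,
        key=lambda b: (
            b["Geometry"]["BoundingBox"]["Top"],
            b["Geometry"]["BoundingBox"]["Left"],
        ),
    )


# Values copied from the example response in this file, deliberately out of order.
SAMPLE = [
    {
        "BlockType": "text",
        "Geometry": {"BoundingBox": {"Width": 1543, "Height": 80, "Left": 569, "Top": 678}},
        "Text": "修改日期：xxx年xx月xx日",
    },
    {
        "BlockType": "text",
        "Geometry": {"BoundingBox": {"Width": 972, "Height": 79, "Left": 561, "Top": 564}},
        "Text": "核准日期：xxx年xx月xx日",
    },
]

if __name__ == "__main__":
    for block in reading_order(SAMPLE):
        print(block["BlockType"], block["Text"])
```

Multi-column pages would need a smarter ordering, for example grouping blocks by horizontal position before sorting.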
diff --git a/docs/zh/deployment.md b/docs/zh/deployment.md
@@ -37,10 +37,11 @@
       to deploy the features you need. The default value of every feature parameter is **no**.
 
       | Parameter                          | Default | Description |
-                        |-------------------|-------| --------  |
+      |-------------------|-------| --------  |
       | **Lite OCR - Simplified Chinese**  | no    | Deploy [Lite OCR (Simplified Chinese)](deploy-general-ocr.md) |
       | **General OCR - Traditional Chinese** | no    | Deploy [General OCR (Traditional Chinese)](deploy-general-ocr-traditional.md) |
-      | **Advanced OCR**     | no    | Deploy [Advanced OCR](deploy-advanced-ocr.md) |
+      | **Advanced OCR**                   | no    | Deploy [Advanced OCR](deploy-advanced-ocr.md) |
+      | **DocumentLayoutAnalysis**         | no    | Deploy [Document Layout Analysis](deploy-layout-analysis.md) |
       | **Custom Template OCR**            | no    | Deploy [Custom Template OCR](deploy-custom-ocr.md) |
       | **Car License Plate**              | no    | Deploy [Car License Plate](deploy-car-license-plate.md) |
       | **Face Comparison**                | no    | Deploy [Face Comparison](deploy-face-comparison.md) |