Re-check README.md

AlieZVzz · Sep 29, 2022 · b22466e
1 parent 88774f1

Showing 8 changed files with 148 additions and 203 deletions.
diff --git a/README.md b/README.md
@@ -1,174 +1,132 @@
-#  面向藏族传统节日的汉藏双语命名实体识别研究 
-#  A Study of Bilingual Chinese-Tibetan Named Entity Recognition for Traditional Tibetan Festivals
->  In this study, Chinese-Tibetan bilingual text data containing information on Tibetan traditional festivals from news websites such as People's Daily Online and People's Daily Tibetan Edition were collected and manually annotated. This study compares the performance of multiple pretrained models and word vectors for Tibetan traditional festival named entity recognition task in the Chinese-Tibetan bilingual scenario and analyzes the impact of two feature processing layers of the model, BiLSTM layer and CRF layer, on the experimental results.
-## File Index
-```
-D:.
-├───.idea
-│   ├───dataSources
-│   └───inspectionProfiles
-├───checkpoints
-│   └───01
-├───CINO text classification
-│   ├───data
-│   ├───log
-│   └───Tibetan News Classification Corpus
-├───config
-├───data_collect
-│   ├───Chinese data
-│   └───Tibetan data
-│       └───crawler_data
-├───log
-│   ├───bo_fasttext_bilstm_crf
-│   ├───bo_PLM_bilstm
-│   ├───bo_PLM_crf
-│   ├───cn_fasttext_bilstm_crf
-│   ├───cn_PLM_bilstm
-│   ├───cn_PLM_bilstm_crf
-│   └───cn_PLM_crf
-├───model
-│   ├───CINO_base
-│   ├───ernie
-│   ├───fasttext_bo
-│   ├───fasttext_cn
-│   ├───roberta-base-bo
-│   ├───roberta-chinese
-│   └───__pycache__
-├───output
-├───static
-│   ├───css
-│   ├───font
-│   ├───image
-│   ├───js
-│   └───picture
-├───templates
-├───utils
-├───visualization
-└───__pycache__
+#  A Study of Bilingual Chinese-Tibetan Named Entity Recognition for Traditional Tibetan Festivals（面向藏族传统节日的汉藏双语命名实体识别研究)
+## Overview
+This repository contains the official implementation of the paper "A Study of Bilingual Chinese-Tibetan Named Entity Recognition for Traditional Tibetan Festivals". Additionally, detailed implementation guides are provided.
+
+In this study, Chinese-Tibetan bilingual text data containing information on Tibetan traditional festivals from news websites such as People's Daily Online and People's Daily Tibetan Edition were collected and manually annotated. This study compares the performance of multiple pretrained models and word vectors on the Tibetan traditional festival named entity recognition task in the Chinese-Tibetan bilingual scenario and analyzes the impact of the model's two feature-processing layers, the BiLSTM layer and the CRF layer, on the experimental results.
+## Environment
+- Python 3.8
+- PyTorch 1.10
+- beautifulsoup4 4.11.1
+- Flask 2.0.2
+- html2text 2020.1.16
+- joeynmt 1.5.1
+- transformers 4.18.0
 
+To install the environment using Conda:
+```bash
+$ conda env create -f requirements.yml
+```
+## Running
+### Train
+To train the models in this study, run the command below. Detailed configs are in the config folder.
+```bash
+$ python main.py 
+```
+### Deployment
+To deploy a trained model to a server, run the command below.
+```bash
+$ python app.py
 ```
+
 ## Metrics
-<table class="21" border="1" cellspacing="0" cellpadding="0" align="left" style="border-collapse:collapse;border:none;margin-left:6.75pt;margin-right:
- 6.75pt">
- <tbody><tr style="height:2.85pt">
-  <td valign="top" style="border-top:solid #7F7F7F 1.0pt;border-left:none;
-  border-bottom:solid #7F7F7F 1.0pt;border-right:none;padding:0cm 5.4pt 0cm 5.4pt;
-  height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><a name="_Hlk107927379"></a><a name="OLE_LINK82"><span style="font-size:7.5pt;
-  font-family:宋体">语言</span></a></p>
-  </td>
-  <td valign="top" style="border-top:solid #7F7F7F 1.0pt;border-left:none;
-  border-bottom:solid #7F7F7F 1.0pt;border-right:none;padding:0cm 5.4pt 0cm 5.4pt;
-  height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><a name="_Hlk105266154"></a><a name="OLE_LINK42"><span style="font-size:7.5pt;
-  font-family:宋体">模型</span></a></p>
-  </td>
-  <td valign="top" style="border-top:solid #7F7F7F 1.0pt;border-left:none;
-  border-bottom:solid #7F7F7F 1.0pt;border-right:none;padding:0cm 5.4pt 0cm 5.4pt;
-  height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span style="font-size:7.5pt;font-family:宋体">准确率</span></span><span style="font-size:7.5pt;
-  font-family:宋体"></span></p>
-  </td>
-  <td valign="top" style="border-top:solid #7F7F7F 1.0pt;border-left:none;
-  border-bottom:solid #7F7F7F 1.0pt;border-right:none;padding:0cm 5.4pt 0cm 5.4pt;
-  height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span style="font-size:7.5pt;font-family:宋体">召回率</span><span style="font-size:7.5pt;
-  font-family:宋体"></span></p>
-  </td>
-  <td valign="top" style="border-top:solid #7F7F7F 1.0pt;border-left:none;
-  border-bottom:solid #7F7F7F 1.0pt;border-right:none;padding:0cm 5.4pt 0cm 5.4pt;
-  height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-size:7.5pt;font-family:&quot;Times New Roman&quot;,serif">F1</span></p>
-  </td>
- </tr>
- <tr style="height:2.85pt">
-  <td rowspan="2" style="border:none;padding:0cm 5.4pt 0cm 5.4pt;height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span style="font-size:7.5pt;font-family:宋体">汉语</span></p>
-  </td>
-  <td valign="top" style="border:none;border-bottom:solid #7F7F7F 1.0pt;
-  padding:0cm 5.4pt 0cm 5.4pt;height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><a name="OLE_LINK41"><span style="font-size:7.5pt;font-family:宋体">汉语</span></a><span lang="EN-US" style="font-size:7.5pt">fastText</span><span style="font-size:7.5pt;
-  font-family:宋体">词向量</span><span lang="EN-US" style="font-size:7.5pt">-BiLSTM-CRF</span><span style="font-size:7.5pt;font-family:宋体">模型</span></p>
-  </td>
-  <td valign="top" style="border:none;border-bottom:solid #7F7F7F 1.0pt;
-  padding:0cm 5.4pt 0cm 5.4pt;height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-size:7.5pt;color:black">94.65%</span></p>
-  </td>
-  <td valign="top" style="border:none;border-bottom:solid #7F7F7F 1.0pt;
-  padding:0cm 5.4pt 0cm 5.4pt;height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-size:7.5pt;color:black">89.64%</span></p>
-  </td>
-  <td valign="top" style="border:none;border-bottom:solid #7F7F7F 1.0pt;
-  padding:0cm 5.4pt 0cm 5.4pt;height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-size:7.5pt;color:black">91.97%</span></p>
-  </td>
- </tr>
- <tr style="height:2.85pt">
-  <td valign="top" style="border:none;padding:0cm 5.4pt 0cm 5.4pt;height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span style="font-size:7.5pt;font-family:宋体">汉语</span><span lang="EN-US" style="font-size:7.5pt">RoBERTa</span><span style="font-size:7.5pt;
-  font-family:宋体">预训练模型</span><span lang="EN-US" style="font-size:7.5pt">-BiLSTM-CRF</span><span style="font-size:7.5pt;font-family:宋体">模型</span></p>
-  </td>
-  <td valign="top" style="border:none;padding:0cm 5.4pt 0cm 5.4pt;height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-size:7.5pt;color:black">93.97%</span></p>
-  </td>
-  <td valign="top" style="border:none;padding:0cm 5.4pt 0cm 5.4pt;height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-size:7.5pt;color:black">92.32%</span></p>
-  </td>
-  <td valign="top" style="border:none;padding:0cm 5.4pt 0cm 5.4pt;height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-size:7.5pt;color:black">93.05%*</span></p>
-  </td>
- </tr>
- <tr style="height:2.85pt">
-  <td rowspan="2" style="border-top:solid #7F7F7F 1.0pt;border-left:none;
-  border-bottom:solid #7F7F7F 1.0pt;border-right:none;padding:0cm 5.4pt 0cm 5.4pt;
-  height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span style="font-size:7.5pt;font-family:宋体">藏语</span></p>
-  </td>
-  <td valign="top" style="border-top:solid #7F7F7F 1.0pt;border-left:none;
-  border-bottom:solid #7F7F7F 1.0pt;border-right:none;padding:0cm 5.4pt 0cm 5.4pt;
-  height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span style="font-size:7.5pt;font-family:宋体">藏语</span><span lang="EN-US" style="font-size:7.5pt">fastText</span><span style="font-size:7.5pt;
-  font-family:宋体">词向量</span><span lang="EN-US" style="font-size:7.5pt">-BiLSTM-CRF</span><span style="font-size:7.5pt;font-family:宋体">模型</span></p>
-  </td>
-  <td valign="top" style="border-top:solid #7F7F7F 1.0pt;border-left:none;
-  border-bottom:solid #7F7F7F 1.0pt;border-right:none;padding:0cm 5.4pt 0cm 5.4pt;
-  height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-size:7.5pt;color:black">84.97%</span></p>
-  </td>
-  <td valign="top" style="border-top:solid #7F7F7F 1.0pt;border-left:none;
-  border-bottom:solid #7F7F7F 1.0pt;border-right:none;padding:0cm 5.4pt 0cm 5.4pt;
-  height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-size:7.5pt;color:black">76.68%</span></p>
-  </td>
-  <td valign="top" style="border-top:solid #7F7F7F 1.0pt;border-left:none;
-  border-bottom:solid #7F7F7F 1.0pt;border-right:none;padding:0cm 5.4pt 0cm 5.4pt;
-  height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-size:7.5pt;color:black">80.37%</span></p>
-  </td>
- </tr>
- <tr style="height:2.85pt">
-
-  <td valign="top" style="border:none;border-bottom:solid #7F7F7F 1.0pt;
-  padding:0cm 5.4pt 0cm 5.4pt;height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span style="font-size:7.5pt;font-family:宋体">藏语</span><span lang="EN-US" style="font-size:7.5pt">RoBERTa</span><span style="font-size:7.5pt;
-  font-family:宋体">预训练模型</span><span lang="EN-US" style="font-size:7.5pt">-BiLSTM-CRF</span><span style="font-size:7.5pt;font-family:宋体">模型</span></p>
-  </td>
-  <td valign="top" style="border:none;border-bottom:solid #7F7F7F 1.0pt;
-  padding:0cm 5.4pt 0cm 5.4pt;height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-size:7.5pt;color:black">83.40%</span></p>
-  </td>
-  <td valign="top" style="border:none;border-bottom:solid #7F7F7F 1.0pt;
-  padding:0cm 5.4pt 0cm 5.4pt;height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-size:7.5pt;color:black">89.60%</span></p>
-  </td>
-  <td valign="top" style="border:none;border-bottom:solid #7F7F7F 1.0pt;
-  padding:0cm 5.4pt 0cm 5.4pt;height:2.85pt">
-  <p class="MsoNormal" align="center" style="text-align:center"><span lang="EN-US" style="font-size:7.5pt;color:black">86.27%*</span></p>
-  </td>
- </tr>
+<table>
+<tbody>
+  <tr>
+    <td>
+      <p>Language</p>
+    </td>
+    <td>
+      <p>Model</p>
+    </td>
+    <td>
+      <p>Precision</p>
+    </td>
+    <td>
+      <p>Recall</p>
+    </td>
+    <td>
+      <p>F1</p>
+    </td>
+  </tr>
+  <tr>
+    <td rowspan="2">
+      <p>Chinese</p>
+    </td>
+    <td>
+      <p>Chinese fastText word-vector BiLSTM-CRF model</p>
+    </td>
+    <td>
+      <p>94.65%</p>
+    </td>
+    <td>
+      <p>89.64%</p>
+    </td>
+    <td>
+      <p>91.97%</p>
+    </td>
+  </tr>
+  <tr>
+    <td>
+      <p>Chinese RoBERTa pretrained BiLSTM-CRF model</p>
+    </td>
+    <td>
+      <p>93.97%</p>
+    </td>
+    <td>
+      <p>92.32%</p>
+    </td>
+    <td>
+      <p>93.05%*</p>
+    </td>
+  </tr>
+  <tr>
+    <td rowspan="2">
+      <p>Tibetan</p>
+    </td>
+    <td>
+      <p>Tibetan fastText word-vector BiLSTM-CRF model</p>
+    </td>
+    <td>
+      <p>84.97%</p>
+    </td>
+    <td>
+      <p>76.68%</p>
+    </td>
+    <td>
+      <p>80.37%</p>
+    </td>
+  </tr>
+  <tr>
+    <td>
+      <p>Tibetan RoBERTa pretrained BiLSTM-CRF model</p>
+    </td>
+    <td>
+      <p>83.40%</p>
+    </td>
+    <td>
+      <p>89.60%</p>
+    </td>
+    <td>
+      <p>86.27%*</p>
+    </td>
+  </tr>
 </tbody></table>
-
 
-## Usage  
-Place the PyTorch model in the matching pretrained-model folder under model, put the txt files for the corresponding language in data_collect, and set train_method in main.py, utils.py, and crf.py.
-Then run main.py.
+
+## Citation
+If you make use of this code, please cite the following paper:
+```bibtex
+
+@article{__nodate,
+	title = {面向藏族传统节日的汉藏双语命名实体识别研究},
+	issn = {2096-3467},
+	url = {https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=CAPJ&dbname=CAPJLAST&filename=XDTQ20220919000&uniplatform=NZKPT&v=E0vy1-y12agMc69tk-2GtR8p4fYnElMzKx2NKvq_UAb22I3wPYyc87DvVxQNCyxI},
+	language = {中文},
+	urldate = {2022-09-29},
+	journal = {数据分析与知识发现},
+	author = {邓, 宇扬 and 吴, 丹},
+	keywords = {Named Entity Recognition, Pretrained Language Model, Tibetan Traditional Culture, 命名实体识别, 藏族传统文化, 预训练语言模型},
+	pages = {1--15},
+}
+
+```
diff --git a/core.py b/core.py
@@ -1,17 +1,3 @@
-import re
-import logging
-from subword_nmt import apply_bpe
-from sacremoses import MosesTokenizer, MosesDetokenizer
-import torch
-import json
-from utils import get_tags, format_result, tag2idx, TAGS, idx2tag
-from transformers import BertTokenizer, AutoTokenizer
-import langid
-from joeynmt.helpers import load_config, get_latest_checkpoint, \
-    load_checkpoint
-from joeynmt.vocabulary import build_vocab
-from joeynmt.model import build_model
-from joeynmt.prediction import validate_on_data
 import torch
 import json
 from utils import get_tags, format_result, tag2idx, TAGS, idx2tag

diff --git a/main.py b/main.py
@@ -91,8 +91,6 @@ def train(model, iterator, optimizer, scheduler, criterion, device, epoch):
 
         if i == 0:
             logger.info("=====sanity check======")
-            # print(words[0])
-            # print(type(words[0]))
             logger.info("words:%s", words[0])
             logger.info("x:%s", x.cpu().tolist()[0][:seqlens[0]])
             # logger.info("tokens:", tokenizer.convert_ids_to_tokens(x.cpu().numpy()[0])[:seqlens[0]])
@@ -175,7 +173,7 @@ def eval(model, iterator, f, device):
 
 if __name__ == "__main__":
 
-    train_method = "bo_PLM_bilstm"
+    train_method = "bo_PLM_crf"
     cfg = ConfigParser()
     cfg.read("config/Chinese_Tibetan_Config.ini", encoding='utf-8')
     batch_size = cfg.getint(train_method, "batch_size")
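
The `train_method` string selects a section of `config/Chinese_Tibetan_Config.ini`, and this commit edits the same hard-coded value in both main.py and model/crf.py. Below is a minimal sketch of one way to set it in a single place. It is not the repository's code: the `--train_method` flag, the `load_batch_size` helper, and the in-memory INI with `batch_size = 16` are all hypothetical and only illustrate the lookup.

```python
import argparse
from configparser import ConfigParser

# Illustrative stand-in for config/Chinese_Tibetan_Config.ini (value is made up).
SAMPLE_INI = """
[bo_PLM_crf]
batch_size = 16
"""


def load_batch_size(train_method: str, ini_text: str) -> int:
    """Read batch_size from the config section named by train_method."""
    cfg = ConfigParser()
    cfg.read_string(ini_text)
    return cfg.getint(train_method, "batch_size")


parser = argparse.ArgumentParser()
# Hypothetical flag: one place to choose the experiment instead of editing two files.
parser.add_argument("--train_method", default="bo_PLM_crf")
args = parser.parse_args([])  # empty list: use the default in this sketch

batch_size = load_batch_size(args.train_method, SAMPLE_INI)
print(batch_size)
```

With a setup like this, other modules could receive `train_method` as an argument rather than redefining it at import time.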

diff --git a/model/crf.py b/model/crf.py
@@ -12,7 +12,7 @@
 from transformers import XLMRobertaModel, BertModel, AutoModel, AutoModelForMaskedLM
 from configparser import ConfigParser
 
-train_method = "bo_PLM_bilstm"
+train_method = "bo_PLM_crf"
 cfg = ConfigParser()
 cfg.read("config/Chinese_Tibetan_Config.ini", encoding='utf-8')
 batch_size = cfg.getint(train_method, "batch_size")  # every parameter can also be read as text with get
@@ -116,7 +116,7 @@ def _forward_alg(self, feats):
         batch_size = feats.shape[0]
 
         # alpha_recursion,forward, alpha(zt)=p(zt,bar_x_1:t)
-        log_alpha = torch.Tensor(batch_size, 1, self.tagset_size).fill_(-10000.).to(torch.device("cpu"))  # [batch_size, 1, 16]
+        log_alpha = torch.Tensor(batch_size, 1, self.tagset_size).fill_(-10000.).to(torch.device("cuda"))  # [batch_size, 1, 16]
         # normal_alpha_0 : alpha[0]=Ot[0]*self.PIs
         # self.start_label has all of the score. it is log,0 is p=1
         log_alpha[:, 0, self.start_label_id] = 0
@@ -170,7 +170,7 @@ def _viterbi_decode(self, feats):
 
         # batch_transitions=self.transitions.expand(batch_size,self.tagset_size,self.tagset_size)
 
-        log_delta = torch.Tensor(batch_size, 1, self.tagset_size).fill_(-10000.).to(torch.device("cpu"))
+        log_delta = torch.Tensor(batch_size, 1, self.tagset_size).fill_(-10000.).to(torch.device("cuda"))
         log_delta[:, 0, self.start_label_id] = 0.
 
         # psi is for the vaule of the last latent that make P(this_latent) maximum.
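
The two hunks in model/crf.py move `log_alpha` and `log_delta` from a hard-coded `"cpu"` device to a hard-coded `"cuda"` device, which raises an error on a machine without a GPU. A hedged alternative, not the repository's code, is to allocate the tensor on whatever device `feats` already lives on. The helper name `init_log_scores` and the sizes used below are hypothetical.

```python
import torch


def init_log_scores(feats: torch.Tensor, tagset_size: int, start_label_id: int) -> torch.Tensor:
    """Create the initial log-score tensor on the same device and dtype as feats."""
    batch_size = feats.shape[0]
    # Every tag starts at a very low log score ...
    scores = torch.full(
        (batch_size, 1, tagset_size), -10000.0, device=feats.device, dtype=feats.dtype
    )
    # ... except the start label, which has log-probability 0 (p = 1).
    scores[:, 0, start_label_id] = 0.0
    return scores


# Illustrative shapes: batch of 2, sequence length 5, 16 tags.
feats = torch.zeros(2, 5, 16)
log_alpha = init_log_scores(feats, tagset_size=16, start_label_id=14)
print(log_alpha.shape, log_alpha.device)
```

The same helper would serve both `_forward_alg` and `_viterbi_decode`, since the two initializations in the diff are identical.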

diff --git a/output/predict.json b/output/predict.json
@@ -1 +1 @@
-[{"text": "1212月4日是工布新年的大年三十，这是西藏乃至全中国最早的新年。除夕之夜，大昭寺的工布藏族按照习俗进行驱鬼仪式，并把过年用的食物措、油“炸果子”、牛羊肉等摆在木盘中“请狗赴宴”。狗吃饱后，人们围坐一圈，喝青稞酒，还要吃结达。", "entities": [{"start": 6, "stop": 10, "entity": "工布新年", "type": "Festival", "color": "#3772FF"}, {"start": 67, "stop": 70, "entity": "炸果子", "type": "Item", "color": "#EF709D"}, {"start": 72, "stop": 75, "entity": "牛羊肉", "type": "Item", "color": "#EF709D"}, {"start": 101, "stop": 103, "entity": "青稞", "type": "Item", "color": "#EF709D"}, {"start": 108, "stop": 110, "entity": "结达", "type": "Item", "color": "#EF709D"}, {"start": 82, "stop": 86, "entity": "请狗赴宴", "type": "Event", "color": "#E2EF70"}, {"start": 36, "stop": 39, "entity": "大昭寺", "type": "Location", "color": "#FFEAAE"}], "tokenized": ["121", "##2", "月", "4", "日", "是", "工", "布", "新", "年", "的", "大", "年", "三", "十", "，", "这", "是", "西", "藏", "乃", "至", "全", "中", "国", "最", "早", "的", "新", "年", "。", "除", "夕", "之", "夜", "，", "大", "昭", "寺", "的", "工", "布", "藏", "族", "按", "照", "习", "俗", "进", "行", "驱", "鬼", "仪", "式", "，", "并", "把", "过", "年", "用", "的", "食", "物", "措", "、", "油", "“", "炸", "果", "子", "”", "、", "牛", "羊", "肉", "等", "摆", "在", "木", "盘", "中", "“", "请", "狗", "赴", "宴", "”", "。", "狗", "吃", "饱", "后", "，", "人", "们", "围", "坐", "一", "圈", "，", "喝", "青", "稞", "酒", "，", "还", "要", "吃", "结", "达", "。"]}]
+[{"text": "12月4日是工布新年的大年三十，这是西藏乃至全中国最早的新年。除夕之夜，大昭寺的工布藏族按照习俗进行驱鬼仪式，并把过年用的食物措、油“炸果子”、牛羊肉等摆在木盘中“请狗赴宴”。狗吃饱后，人们围坐一圈，喝青稞酒，还要吃结达。", "entities": [{"start": 5, "stop": 9, "entity": "工布新年", "type": "Festival", "color": "#3772FF"}, {"start": 66, "stop": 69, "entity": "炸果子", "type": "Item", "color": "#EF709D"}, {"start": 71, "stop": 74, "entity": "牛羊肉", "type": "Item", "color": "#EF709D"}, {"start": 100, "stop": 102, "entity": "青稞", "type": "Item", "color": "#EF709D"}, {"start": 107, "stop": 109, "entity": "结达", "type": "Item", "color": "#EF709D"}, {"start": 81, "stop": 85, "entity": "请狗赴宴", "type": "Event", "color": "#E2EF70"}, {"start": 35, "stop": 38, "entity": "大昭寺", "type": "Location", "color": "#FFEAAE"}], "tokenized": ["12", "月", "4", "日", "是", "工", "布", "新", "年", "的", "大", "年", "三", "十", "，", "这", "是", "西", "藏", "乃", "至", "全", "中", "国", "最", "早", "的", "新", "年", "。", "除", "夕", "之", "夜", "，", "大", "昭", "寺", "的", "工", "布", "藏", "族", "按", "照", "习", "俗", "进", "行", "驱", "鬼", "仪", "式", "，", "并", "把", "过", "年", "用", "的", "食", "物", "措", "、", "油", "“", "炸", "果", "子", "”", "、", "牛", "羊", "肉", "等", "摆", "在", "木", "盘", "中", "“", "请", "狗", "赴", "宴", "”", "。", "狗", "吃", "饱", "后", "，", "人", "们", "围", "坐", "一", "圈", "，", "喝", "青", "稞", "酒", "，", "还", "要", "吃", "结", "达", "。"]}]
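
In this output, `start` and `stop` appear to index into the `tokenized` list rather than the raw `text`: in the updated file, tokens 5 through 8 spell 工布新年. Below is a minimal sketch of a consistency check under that assumption. The `check_entities` helper is hypothetical, the record is a shortened excerpt of the file above, and stripping a leading `##` assumes WordPiece-style continuation tokens such as the `##2` seen in the old output.

```python
import json

# Shortened excerpt of output/predict.json (first entity, first ten tokens).
record_json = """
{"entities": [{"start": 5, "stop": 9, "entity": "工布新年", "type": "Festival"}],
 "tokenized": ["12", "月", "4", "日", "是", "工", "布", "新", "年", "的"]}
"""


def check_entities(record: dict) -> list:
    """Return the entities whose token span does not spell the entity text."""
    mismatches = []
    tokens = record["tokenized"]
    for ent in record["entities"]:
        span = tokens[ent["start"]:ent["stop"]]
        # Drop the continuation prefix before joining the pieces.
        pieces = [t[2:] if t.startswith("##") else t for t in span]
        if "".join(pieces) != ent["entity"]:
            mismatches.append(ent)
    return mismatches


record = json.loads(record_json)
mismatches = check_entities(record)
print(mismatches)
```

A check like this would flag an entity whose offsets drift from the token list, for example after a change to the input text or the tokenizer.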