CS224N-PROJECT

Thai is one of several languages written without explicit word boundaries, so Thai text cannot be fed directly to most word-based models. In this paper we tackle this problem by implementing BiLSTM-CRF and BiGRU-CRF based segmentation algorithms to segment Thai text. Our model achieves F1 scores of 94.78 (micro) and 96.26 (macro). It outperforms previous models on micro-averaged F1 and is comparable to them on macro-averaged F1. The model also works well on small datasets, but struggles with named entities.
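To make the difference between the micro and macro averages concrete, here is a minimal sketch of character-level F1 scoring for segmentation. It is not the code in `f1.py`. The two-label scheme (`B` for a character that begins a word, `I` for any other character) and the function names `boundary_labels` and `f1_scores` are assumptions made for illustration only.

```python
from collections import Counter


def boundary_labels(words):
    """Label each character 'B' if it starts a word, else 'I' (assumed scheme)."""
    labels = []
    for word in words:
        if not word:
            continue  # skip empty tokens so they do not produce a stray 'B'
        labels.extend(["B"] + ["I"] * (len(word) - 1))
    return labels


def f1_scores(gold, pred):
    """Return (micro_f1, macro_f1) over the labels seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must label the same characters")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was wrong
            fn[g] += 1  # missed the gold label g
    classes = sorted(set(gold) | set(pred))

    def f1(true_pos, false_pos, false_neg):
        denom = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denom if denom else 0.0

    # Micro: pool the counts across labels, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro


# Placeholder Latin strings stand in for Thai text here.
gold = boundary_labels(["ab", "cde"])   # B I B I I
pred = boundary_labels(["abc", "de"])   # B I I B I
micro, macro = f1_scores(gold, pred)
```

In this toy case three of the five characters are labeled correctly, so the micro average is 0.6, while the macro average weights the rarer `B` label equally with `I`. This is why the two numbers can diverge when one label is much more frequent than the other.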

Model

Results

Below is a summarized table of the character-level F1 metrics for our project



abanuelo/CS224N-PROJECT
