A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
-
Updated
Nov 29, 2024 - Python
Natural language processing (NLP) is a field of computer science that studies how computers and humans interact. In the 1950s, Alan Turing published an article that proposed a measure of intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced results in the fields of language modeling, parsing, and natural-language tasks.
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Portuguese pre-trained BERT models
The hands-on NLTK tutorial for NLP in Python
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Projects and useful articles / links
A curated list of beginner resources in Natural Language Processing
chinese NLP corpus of chinese science fiction,chinese science fiction corpus : About 4675 Chinese science fiction novels 大约有4675本科幻小说,中文科幻小说自然语言处理语料库,中文科幻小说文本语料库,中文科幻小说文本数据库,科幻小说语料
This is a continuously updated handbook for readers to easily track the latest NL2SQL (Text-to-SQL) techniques in the literature and provide practical guidance for researchers and practitioners. If we missed any interesting work, feel free to contact us.
This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).
A lexicon for Sudachi
A curated list of NLP resources for Hungarian
A Dutch RoBERTa-based language model
TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)
summaries of all the papers I read
chinese NLP corpus of chinese science fiction, chinese science fiction corpus: Archive of the Ark Plan of Ula Science Fiction Website 乌拉科幻小说网方舟计划存档,中文科幻小说自然语言处理语料库,中文科幻小说文本语料库,中文科幻小说文本数据库,科幻小说语料
A modular annotation system that supports complex, interactive annotation graphs embedded on top of sequences of text.
An open information extraction system that provides compact extractions
Created by Alan Turing