FullTextKP

Keyphrase Generation Beyond the Boundaries of Title and Abstract

Create environment

conda create -n FKP_env python=3.6

conda activate FKP_env

conda install pytorch cudatoolkit=11.3 -c pytorch

pip install transformers==4.12.0
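
After installation, it can help to confirm which versions actually ended up in the environment. The snippet below is a minimal sketch and not part of the repository; the helper names `installed_version` and `environment_report` are hypothetical. It assumes only that the installed packages expose a `__version__` attribute, as torch and transformers do.

```python
import importlib
import sys


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def environment_report(modules=("torch", "transformers")):
    """Collect the Python version and the version of each requested module."""
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in modules:
        report[name] = installed_version(name)
    return report
```

Calling `environment_report()` inside the activated FKP_env should list Python 3.6 and transformers 4.12.0 if the commands above succeeded; a value of `None` means the package could not be imported.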

Run Commands

Preprocess

cd preprocess

# Stage1
python preprocess_ACM_stage1.py

# Stage2

## Title+Abstract
python preprocess_ACM_stage2_v2.py

## Citations
python preprocess_ACM_stage2_v4.py

## Non-Citations
python preprocess_ACM_stage2_v5.py

## Random
python preprocess_ACM_stage2_v6.py
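
The Stage2 variants are named after the kind of full-text sentences involved (citations, non-citations, random). The sketch below illustrates one way such a selection could work. It is an assumption, not the logic of the preprocessing scripts: the function `select_sentences`, the `CITATION_PATTERN` regex, and the parameters `k` and `seed` are all hypothetical.

```python
import random
import re

# Hypothetical pattern: numeric markers such as [12] or [3, 7],
# or author-year markers such as (Smith, 2019).
CITATION_PATTERN = re.compile(
    r"\[\d+(?:\s*,\s*\d+)*\]|\([A-Z][A-Za-z]+(?: et al\.)?,? \d{4}\)"
)


def select_sentences(sentences, mode, k=2, seed=0):
    """Pick up to k body sentences according to the chosen mode."""
    if mode == "citations":
        picked = [s for s in sentences if CITATION_PATTERN.search(s)]
    elif mode == "non_citations":
        picked = [s for s in sentences if not CITATION_PATTERN.search(s)]
    elif mode == "random":
        rng = random.Random(seed)  # fixed seed keeps the sample reproducible
        count = min(k, len(sentences))
        indices = sorted(rng.sample(range(len(sentences)), count))
        return [sentences[i] for i in indices]
    else:
        raise ValueError("unknown mode: %r" % mode)
    return picked[:k]
```

The selected sentences would then be appended to the title and abstract before training; how the actual scripts detect citations or order the sentences may differ.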

Summarization

The summarizer expects processed_data in the main directory and pacssum_models in the summarization folder.

Download the pretrained BERT models into pacssum_models from https://drive.google.com/file/d/1wbMlLmnbD_0j7Qs8YY8cSCh935WKKdsP/view?usp=sharing

cd summarization

# Run tfidf summarizer
python run.py --rep tfidf

# Run BERT Summarizer
python run.py --rep bert
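
To give an intuition for what the tfidf representation is used for, the sketch below ranks sentences by TF-IDF centrality: each sentence is scored by its summed cosine similarity to all other sentences, and the top `k` are kept in document order. This is a simplified illustration under stated assumptions, not the code in run.py. PacSum additionally weights edges by sentence position, which is omitted here, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector (dict: token -> weight) per sentence."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        term_freq = Counter(toks)
        vectors.append({
            # Smoothed inverse document frequency, so weights stay positive.
            tok: count * (math.log((1 + n) / (1 + doc_freq[tok])) + 1)
            for tok, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(sentences, k=2):
    """Return the k most central sentences, in their original order."""
    vectors = tfidf_vectors(sentences)
    scores = [
        sum(cosine(vi, vj) for j, vj in enumerate(vectors) if j != i)
        for i, vi in enumerate(vectors)
    ]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]
```

With `--rep bert`, the same ranking idea applies, but sentence vectors come from the pretrained BERT models instead of TF-IDF counts.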

Abstractive summarization

cd abstractive_summarization

# Stage1
python abs_sum.py

# Stage2
python preprocess_abs_sum.py

Retrieval Augmentation

cd specter

python preprocess_ACM.py

./embed.sh
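
Once each article has an embedding, retrieval augmentation amounts to finding the most similar articles for a query article. The sketch below shows the underlying computation as a brute-force cosine search in plain Python. It is only an illustration under assumptions: the repository credits FAISS for this step, the index type it uses is not stated here, and the names `normalize` and `top_k_neighbors` are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k_neighbors(query, corpus, k=2):
    """Return indices of the k corpus vectors most similar to the query."""
    q = normalize(query)
    scored = []
    for idx, doc in enumerate(corpus):
        d = normalize(doc)
        # On unit vectors, the inner product equals cosine similarity.
        scored.append((sum(a * b for a, b in zip(q, d)), idx))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]
```

An exact inner-product index over normalized vectors would return the same neighbors; a library such as FAISS is typically used because this loop does not scale to large collections.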

Train & Test

# Train
python train.py

# Train on limited data
python train.py --limit=100

# Load Checkpoint
python train.py --checkpoint=True

# Train for multiple runs after the initial run(s)
python train.py --times=3 --initial_time=1

# Test (assuming that saved weights are present)
python train.py --test=True
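
Flags such as `--checkpoint=True` and `--test=True` pass booleans as strings, which needs explicit conversion when using argparse (a plain `type=bool` would treat `"False"` as true). The sketch below shows one way flags of this shape could be parsed. It is hypothetical and does not reproduce the repository's own argument parser; the defaults and the `str2bool` helper are assumptions.

```python
import argparse


def str2bool(value):
    """Convert strings like 'True' or 'false' to a real boolean."""
    if isinstance(value, bool):
        return value
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def build_parser():
    """Hypothetical parser mirroring the flags shown above (defaults assumed)."""
    parser = argparse.ArgumentParser(description="Hypothetical train.py flags")
    parser.add_argument("--limit", type=int, default=-1)
    parser.add_argument("--checkpoint", type=str2bool, default=False)
    parser.add_argument("--times", type=int, default=1)
    parser.add_argument("--initial_time", type=int, default=0)
    parser.add_argument("--test", type=str2bool, default=False)
    return parser
```

For example, `build_parser().parse_args(["--times=3", "--initial_time=1"])` yields `times == 3` and `initial_time == 1`, matching the multi-run command above.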

Citation

Please consider citing our paper if you find this work useful:

@inproceedings{garg-etal-2022-keyphrase,
    title = "Keyphrase Generation Beyond the Boundaries of Title and Abstract",
    author = "Garg, Krishna  and
      Ray Chowdhury, Jishnu  and
      Caragea, Cornelia",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.427",
    pages = "5809--5821",
    abstract = "Keyphrase generation aims at generating important phrases (keyphrases) that best describe a given document. In scholarly domains, current approaches have largely used only the title and abstract of the articles to generate keyphrases. In this paper, we comprehensively explore whether the integration of additional information from the full text of a given article or from semantically similar articles can be helpful for a neural keyphrase generation model or not. We discover that adding sentences from the full text, particularly in the form of the extractive summary of the article can significantly improve the generation of both types of keyphrases that are either present or absent from the text. Experimental results with three widely used models for keyphrase generation along with one of the latest transformer models suitable for longer documents, Longformer Encoder-Decoder (LED) validate the observation. We also present a new large-scale scholarly dataset FullTextKP for keyphrase generation. Unlike prior large-scale datasets, FullTextKP includes the full text of the articles along with the title and abstract. We release the source code at https://github.com/kgarg8/FullTextKP.",
}

Credits

PacSum Repo for Summarization

Specter

FAISS

Questions

Please contact [email protected] for any questions related to this work.
