conda create -n FKP_env python=3.6
conda activate FKP_env
conda install pytorch cudatoolkit=11.3 -c pytorch
pip install transformers==4.12.0
cd preprocess
# Stage1
python preprocess_ACM_stage1.py
# Stage2
## Title+Abstract
python preprocess_ACM_stage2_v2.py
## Citations
python preprocess_ACM_stage2_v4.py
## Non-Citations
python preprocess_ACM_stage2_v5.py
## Random
python preprocess_ACM_stage2_v6.py
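The four Stage 2 variants above can also be run in one pass. A minimal sketch — the script names are from this repo, but the loop itself is an assumption, not a script the repo ships:

```shell
# Run every Stage 2 variant in sequence; stop on the first failure.
set -e
STAGE2_SCRIPTS="preprocess_ACM_stage2_v2.py preprocess_ACM_stage2_v4.py preprocess_ACM_stage2_v5.py preprocess_ACM_stage2_v6.py"
for script in $STAGE2_SCRIPTS; do
    echo "Running $script"
    # python "$script"   # uncomment when running from the preprocess/ directory
done
```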
The extractive summarization step expects processed_data in the main directory and pacssum_models in the summarization folder.
Download the pretrained BERT models into pacssum_models from https://drive.google.com/file/d/1wbMlLmnbD_0j7Qs8YY8cSCh935WKKdsP/view?usp=sharing
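If you prefer fetching the archive from the command line, the file ID embedded in the shared link can be extracted and passed to a downloader such as gdown. A sketch — the gdown usage and the output filename are assumptions, not part of this repo:

```shell
# Extract the Google Drive file ID from the shared link.
URL="https://drive.google.com/file/d/1wbMlLmnbD_0j7Qs8YY8cSCh935WKKdsP/view?usp=sharing"
FILE_ID=$(echo "$URL" | sed -E 's#.*/file/d/([^/]+)/.*#\1#')
echo "$FILE_ID"
# pip install gdown
# gdown "https://drive.google.com/uc?id=${FILE_ID}" -O pacssum_models.zip  # assumed filename
```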
cd summarization
# Run tfidf summarizer
python run.py --rep tfidf
# Run BERT Summarizer
python run.py --rep bert
cd abstractive_summarization
# Stage1
python abs_sum.py
# Stage2
python preprocess_abs_sum.py
cd specter
python preprocess_ACM.py
./embed.sh
# Train
python train.py
# Train on limited data
python train.py --limit=100
# Load Checkpoint
python train.py --checkpoint=True
# Train for multiple runs after the initial run(s)
python train.py --times=3 --initial_time=1
# Test (assuming that saved weights are present)
python train.py --test=True
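The flags above can be wrapped in a small helper that maps an experiment mode to the matching train.py invocation. The wrapper function is hypothetical; the flags are the ones listed above:

```shell
# Hypothetical convenience wrapper around the train.py flags shown above.
train_cmd() {
    case "$1" in
        limited) echo "python train.py --limit=100" ;;       # train on limited data
        resume)  echo "python train.py --checkpoint=True" ;; # load a checkpoint
        test)    echo "python train.py --test=True" ;;       # evaluate saved weights
        *)       echo "python train.py" ;;                   # default: full training
    esac
}
train_cmd resume   # prints: python train.py --checkpoint=True
```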
Please consider citing our paper if you find this work useful:
@inproceedings{garg-etal-2022-keyphrase,
    title = "Keyphrase Generation Beyond the Boundaries of Title and Abstract",
    author = "Garg, Krishna and
      Ray Chowdhury, Jishnu and
      Caragea, Cornelia",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.427",
    pages = "5809--5821",
    abstract = "Keyphrase generation aims at generating important phrases (keyphrases) that best describe a given document. In scholarly domains, current approaches have largely used only the title and abstract of the articles to generate keyphrases. In this paper, we comprehensively explore whether the integration of additional information from the full text of a given article or from semantically similar articles can be helpful for a neural keyphrase generation model or not. We discover that adding sentences from the full text, particularly in the form of the extractive summary of the article can significantly improve the generation of both types of keyphrases that are either present or absent from the text. Experimental results with three widely used models for keyphrase generation along with one of the latest transformer models suitable for longer documents, Longformer Encoder-Decoder (LED) validate the observation. We also present a new large-scale scholarly dataset FullTextKP for keyphrase generation. Unlike prior large-scale datasets, FullTextKP includes the full text of the articles along with the title and abstract. We release the source code at https://github.com/kgarg8/FullTextKP.",
}
Please contact [email protected] for any questions related to this work.