OptimusPrimus/dcase2023_task6b

Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets

Setup

Create environment

conda env create -f environment.yml
conda activate salsa
pip install -e 'git+https://github.com/kkoutini/[email protected]#egg=hear21passt'
CFLAGS='-O3 -march=native' pip install https://github.com/f0k/minimp3py/archive/master.zip

Setup data environment

The default data directory is ~/shared (this can be changed via the directories.data_dir flag). The data sets in this folder should follow this structure:

  • clotho_v2
    • clotho_captions_{development,evaluation,validation}.csv
    • clotho_metadata_{development,evaluation,validation}.csv
    • {development,evaluation,validation} folders with audio files
  • audioset
    • download custom repository
    • download the audiocaps files into the corresponding directory
  • audiocaps
  • wavecaps
  • tmp
    • this is where the compressed audio data sets will go
  • audio_retrieval (project name)
    • this is for model checkpoints, etc.
  • clotho_gpt
    • copy from this repository
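Before starting a training run, it can help to confirm that the data directory matches the layout listed above. The following is a minimal sketch and not part of this repository: the helper names `expected_paths` and `missing_paths` are hypothetical, and the script only checks the entries named in the list (it does not inspect the contents of audioset, audiocaps, wavecaps or clotho_gpt).

```python
from pathlib import Path

# Entries taken from the directory list above; the checker itself is a
# hypothetical helper, not something shipped with the repository.
SPLITS = ("development", "evaluation", "validation")
EXPECTED_DIRS = (
    "clotho_v2", "audioset", "audiocaps", "wavecaps",
    "tmp", "audio_retrieval", "clotho_gpt",
)


def expected_paths(data_dir):
    """Return every path the layout above says should exist under data_dir."""
    root = Path(data_dir).expanduser()
    paths = [root / name for name in EXPECTED_DIRS]
    clotho = root / "clotho_v2"
    for split in SPLITS:
        paths.append(clotho / f"clotho_captions_{split}.csv")
        paths.append(clotho / f"clotho_metadata_{split}.csv")
        paths.append(clotho / split)  # folder with the audio files
    return paths


def missing_paths(data_dir="~/shared"):
    """Return the expected paths that are not present on disk."""
    return [p for p in expected_paths(data_dir) if not p.exists()]


if __name__ == "__main__":
    for path in missing_paths():
        print("missing:", path)
```

If the data lives somewhere other than ~/shared, pass that location to `missing_paths` and set the same location via the directories.data_dir flag when training.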

Minimal Example

cd src
conda activate salsa

python -m experiments.audio_retrieval.train with \
    data_loader.batch_size=64 \
    data_loader.batch_size_eval=32 \
    audio_loader.max_audio_length=30 \
    audio_features.segment_length=10 \
    audio_features.name=passt \
    sentence_features.model=bert-base-uncased \
    initial_tau=0.01 \
    s_patchout_f=2 \
    s_patchout_t=15 \
    lr=2e-5 \
    min_lr=1e-7 \
    rampdown_type=cosine \
    max_epochs=16 \
    rampdown_stop=15 \
    warmup_length=1 \
    rampdown_start=1 \
    audio_features.adopt_n_layers=0 \
    sentence_features.adopt_n_layers=0 \
    train_on=clothov2 \
    load_parameters=None \
    gpt_augment_p=0.0

More examples can be found in the script folder.
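The schedule-related flags in the minimal example (lr, min_lr, warmup_length, rampdown_start, rampdown_stop, rampdown_type=cosine) suggest a warmup followed by a cosine ramp-down. The sketch below shows one plausible reading of those flags and is not taken from the repository: it assumes all lengths are measured in epochs, that warmup rises linearly from zero, and that the rate stays at min_lr after rampdown_stop. The function name `lr_at` is hypothetical.

```python
import math


def lr_at(epoch, lr=2e-5, min_lr=1e-7, warmup_length=1,
          rampdown_start=1, rampdown_stop=15):
    """Learning rate at a (possibly fractional) epoch.

    Assumed semantics, not confirmed by the repository:
    linear warmup from 0 to lr, constant lr until rampdown_start,
    cosine decay to min_lr until rampdown_stop, then constant min_lr.
    """
    if warmup_length > 0 and epoch < warmup_length:
        return lr * epoch / warmup_length
    if epoch < rampdown_start:
        return lr
    if epoch >= rampdown_stop:
        return min_lr
    progress = (epoch - rampdown_start) / (rampdown_stop - rampdown_start)
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the assumed schedule for the 16 epochs of the minimal example.
    for epoch in range(16):
        print(epoch, f"{lr_at(epoch):.3e}")
```

Under these assumptions the rate peaks at the end of the first epoch and reaches min_lr one epoch before max_epochs=16, so the final epoch runs at the minimum rate.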

About

CP-JKU's Task6b Submission to DCASE2023
