Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets

Setup

Create environment

conda env create -f environment.yml
conda activate salsa
pip install -e 'git+https://github.com/kkoutini/[email protected]#egg=hear21passt'
CFLAGS='-O3 -march=native' pip install https://github.com/f0k/minimp3py/archive/master.zip

Setup data environment

The default data directory is ~/shared (it can be changed via the directories.data_dir flag). The data sets in this folder should follow this structure:

clotho_v2
- clotho_captions_{development,evaluation,validation}.csv
- clotho_metadata_{development,evaluation,validation}.csv
- {development,evaluation,validation} folders with audio files
audioset
- download custom repository
- download the audiocaps files into the corresponding directory
audiocaps
- download the GitHub repository
wavecaps
- json_files
- download the audio files from Hugging Face into the corresponding folders
tmp
- this is where the compressed audio data sets will go
audio_retrieval (project name)
- this is for model checkpoints, etc.
clotho_gpt
- copy from this repository
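
Before starting a run, it can help to confirm that the folders and files named in the listing above are actually in place. The following is a minimal sketch, not part of the repository: the helper names `expected_paths` and `missing_paths` are hypothetical, and only entries that the listing names explicitly are checked (the contents of audioset, audiocaps, and the WavCaps audio folders are not verified).

```python
from pathlib import Path

# Clotho splits named in the listing above.
SPLITS = ("development", "evaluation", "validation")

# Top-level folders named in the listing above.
TOP_LEVEL = (
    "clotho_v2", "audioset", "audiocaps", "wavecaps",
    "tmp", "audio_retrieval", "clotho_gpt",
)


def expected_paths(data_dir):
    """Return the paths that the listing names explicitly (hypothetical helper)."""
    root = Path(data_dir).expanduser()
    paths = [root / name for name in TOP_LEVEL]
    clotho = root / "clotho_v2"
    for split in SPLITS:
        paths.append(clotho / f"clotho_captions_{split}.csv")
        paths.append(clotho / f"clotho_metadata_{split}.csv")
        paths.append(clotho / split)  # folder with the audio files
    paths.append(root / "wavecaps" / "json_files")
    return paths


def missing_paths(data_dir="~/shared"):
    """List the expected paths that do not exist yet (hypothetical helper)."""
    return [p for p in expected_paths(data_dir) if not p.exists()]
```

Calling `missing_paths()` with no argument checks the default ~/shared location; pass another path if directories.data_dir has been changed.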

Minimal Example

cd src
conda activate salsa

python -m experiments.audio_retrieval.train with \
    data_loader.batch_size=64 \
    data_loader.batch_size_eval=32 \
    audio_loader.max_audio_length=30 \
    audio_features.segment_length=10 \
    audio_features.name=passt \
    sentence_features.model=bert-base-uncased \
    initial_tau=0.01 \
    s_patchout_f=2 \
    s_patchout_t=15 \
    lr=2e-5 \
    min_lr=1e-7 \
    rampdown_type=cosine \
    max_epochs=16 \
    rampdown_stop=15 \
    warmup_length=1 \
    rampdown_start=1 \
    audio_features.adopt_n_layers=0 \
    sentence_features.adopt_n_layers=0 \
    train_on=clothov2 \
    load_parameters=None \
    gpt_augment_p=0.0

More examples can be found in the scripts folder.
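
The training command above passes its settings as `key=value` pairs after the word `with`, which looks like Sacred-style configuration overrides (an assumption; the README does not name the framework). When launching several variants, the argument list can be assembled from a dictionary. This is a rough sketch: `build_command` is a hypothetical helper, not part of the repository, and only a subset of the overrides from the command above is shown.

```python
import shlex


def build_command(overrides, module="experiments.audio_retrieval.train"):
    """Build the argument list for one training run (hypothetical helper)."""
    args = ["python", "-m", module, "with"]
    # Each override becomes one key=value token, in insertion order.
    args += [f"{key}={value}" for key, value in overrides.items()]
    return args


# A subset of the overrides from the minimal example.
# Values are kept as strings so they are passed through unchanged.
overrides = {
    "data_loader.batch_size": "64",
    "audio_features.name": "passt",
    "sentence_features.model": "bert-base-uncased",
    "lr": "2e-5",
    "max_epochs": "16",
    "train_on": "clothov2",
}

command = build_command(overrides)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, cwd="src", check=True)` from the repository root, which mirrors the `cd src` step of the minimal example; the conda environment still has to be active in the calling shell.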

OptimusPrimus/dcase2023_task6b