Icefall-based, Transducer ASR Model #2339
Conversation
Hi @HSTEHSTEHSTE, thank you for your pull request! Could you please target one of the dev branches, either …
CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.
```python
| Repository link: https://github.com/k2-fsa/icefall/tree/master
"""
import ast
```
Check notice · Code scanning / CodeQL: Unused import
```python
| Repository link: https://github.com/k2-fsa/icefall/tree/master
"""
import ast
from argparse import Namespace
```
Check notice · Code scanning / CodeQL: Unused import
```python
import numpy as np

from art import config
```
Check notice · Code scanning / CodeQL: Unused import
```python
features, _, _ = self.transform_model_input(x=masked_adv_input[sample_index])
x_lens = torch.tensor([features.shape[1]]).to(torch.int32).to(self.device)
y = k2.RaggedTensor(original_output[sample_index])
loss = self.transducer_model(x=features, x_lens=x_lens, y=y)
```
Check notice · Code scanning / CodeQL: Unused local variable
```python
raise ValueError("This estimator does not support `postprocessing_defences`.")

# Set cpu/gpu device
self._device = torch.device("cpu")
```
Check warning · Code scanning / CodeQL: Overwriting attribute in super-class or sub-class
```python
isnan = torch.isnan(x[i])
nisnan = torch.sum(isnan).item()
if nisnan > 0:
    logging.info("input isnan={}/{} {}".format(nisnan, x[i].shape, x[i][isnan], torch.max(torch.abs(x[i]))))
```
Check warning · Code scanning / CodeQL: Unused argument in a formatting call
```python
import ast
from argparse import Namespace
import logging
from typing import Dict, List, Optional, Tuple, TYPE_CHECKING, Union
```
Check notice · Code scanning / CodeQL: Unused import
```python
from art import config
from art.estimators.pytorch import PyTorchEstimator
from art.estimators.speech_recognition.speech_recognizer import SpeechRecognizerMixin, PytorchSpeechRecognizerMixin
from art.utils import get_file
```
Check notice · Code scanning / CodeQL: Unused import
```python
vtln_warp: float = 1.0

params = asdict(FbankConfig())
params.update({"sample_frequency": 16000, "snip_edges": False, "num_mel_bins": 23})
```
Check failure · Code scanning / CodeQL: Modification of parameter with default
```python
params = asdict(FbankConfig())
params.update({"sample_frequency": 16000, "snip_edges": False, "num_mel_bins": 23})
params["frame_shift"] *= 1000.0
```
Check failure · Code scanning / CodeQL: Modification of parameter with default
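CodeQL raises this check when a dict created as (or derived from) a default argument value is mutated, because the default object is built once and shared by every call that omits the argument. A minimal sketch of the anti-pattern and the usual fix; the function and parameter names here are hypothetical, not the PR's code:

```python
# Anti-pattern CodeQL flags: mutating a default-argument dict. The default
# object is created once at definition time and shared across calls.
def build_params(params={"num_mel_bins": 40}):
    params["num_mel_bins"] = 23  # this change leaks into all later calls
    return params


# Usual fix: default to None and build or copy the dict inside the body,
# so each call works on a fresh object.
def build_params_fixed(params=None):
    params = dict(params) if params is not None else {"num_mel_bins": 40}
    params["num_mel_bins"] = 23
    return params
```

The same reasoning applies to any `params.update(...)` or `params[...] *=` on a dict that can alias a shared default: copy first, then mutate.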
```python
for line in lexicon_file:
    if len(line.strip()) > 0:  # and '<UNK>' not in line and '<s>' not in line and '</s>' not in line:
        vocab_size += 1
except:
```
Check notice · Code scanning / CodeQL: Empty except
```python
if len(line.strip()) > 0:
    word2id[line.split()[0]] = id
    id += 1
except:
```
Check notice · Code scanning / CodeQL: Empty except
```python
for line in lexicon_file:
    if len(line.strip()) > 0:  # and '<UNK>' not in line and '<s>' not in line and '</s>' not in line:
        vocab_size += 1
except:
```
Check notice · Code scanning / CodeQL: Except block handles 'BaseException'
```python
if len(line.strip()) > 0:
    word2id[line.split()[0]] = id
    id += 1
except:
```
Check notice · Code scanning / CodeQL: Except block handles 'BaseException'
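Both groups of findings stem from bare `except:` clauses, which catch every `BaseException` subclass, including `KeyboardInterrupt` and `SystemExit`, and silently swallow real parse errors. A sketch of a narrowed handler for the lexicon-counting loop above; the function name and logging call are illustrative, not the PR's code:

```python
import logging


def count_lexicon_entries(path):
    # Count non-empty lexicon lines, catching only the errors that file
    # I/O can actually raise rather than using a bare `except:`.
    vocab_size = 0
    try:
        with open(path, encoding="utf-8") as lexicon_file:
            for line in lexicon_file:
                if len(line.strip()) > 0:
                    vocab_size += 1
    except OSError as err:
        logging.warning("Could not read lexicon file %s: %s", path, err)
    return vocab_size
```

Catching `OSError` covers missing files and permission problems while letting programming errors and interrupts propagate, and logging the failure avoids the "Empty except" notice as well.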
```python
# Set cpu/gpu device
self._device = torch.device("cpu")
if torch.cuda.is_available():
    self._device = torch.device("cuda", 0)
```
Check warning · Code scanning / CodeQL: Overwriting attribute in super-class or sub-class
Signed-off-by: Xinyuan Li <[email protected]>
…ple Jupyter notebook
This looks good, thanks for implementing the new ASR model. I left a few comments on things that should be addressed.
```python
# Set cpu/gpu device
self._device = torch.device("cpu")
if torch.cuda.is_available():
    self._device = torch.device("cuda", 0)
```
This is already defined in the `PyTorchEstimator` superclass and is redundant, hence the CodeQL warning. This segment should be removed.
```python
isnan = torch.isnan(x[i])
nisnan = torch.sum(isnan).item()
if nisnan > 0:
    logging.info("input isnan={}/{} {}".format(nisnan, x[i].shape, x[i][isnan], torch.max(torch.abs(x[i]))))
```
Suggested change:
```diff
-logging.info("input isnan={}/{} {}".format(nisnan, x[i].shape, x[i][isnan], torch.max(torch.abs(x[i]))))
+logging.info(f"input isnan={nisnan}/{x[i][isnan]} {torch.max(torch.abs(x[i]))}")
```
Use f-strings to avoid confusion with string formatting.
```python
@dataclass
class FbankConfig:
    # Spectrogram-related part
    dither: float = 0.0
    window_type: str = "povey"
    # Note that frame_length and frame_shift will be converted to milliseconds before torchaudio/Kaldi sees them
    frame_length: float = 0.025
    frame_shift: float = 0.01
    remove_dc_offset: bool = True
    round_to_power_of_two: bool = True
    energy_floor: float = 1e-10
    min_duration: float = 0.0
    preemphasis_coefficient: float = 0.97
    raw_energy: bool = True

    # Fbank-related part
    low_freq: float = 20.0
    high_freq: float = -400.0
    num_mel_bins: int = 40
    use_energy: bool = False
    vtln_low: float = 100.0
    vtln_high: float = -500.0
    vtln_warp: float = 1.0


params = asdict(FbankConfig())
params.update({"sample_frequency": 16000, "snip_edges": False, "num_mel_bins": 23})
```
The dataclass seems a bit extraneous. What's the issue with just using a dictionary?
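For reference, the plain-dict alternative this comment suggests might look like the following. The default values are copied from the diff above, but the module-level constant name and the copy-then-update pattern are a sketch, not the PR's code:

```python
# Plain-dict replacement for the FbankConfig dataclass; defaults taken
# from the fields shown in the diff.
DEFAULT_FBANK_PARAMS = {
    # Spectrogram-related part
    "dither": 0.0,
    "window_type": "povey",
    # frame_length and frame_shift are converted to milliseconds before
    # torchaudio/Kaldi sees them
    "frame_length": 0.025,
    "frame_shift": 0.01,
    "remove_dc_offset": True,
    "round_to_power_of_two": True,
    "energy_floor": 1e-10,
    "min_duration": 0.0,
    "preemphasis_coefficient": 0.97,
    "raw_energy": True,
    # Fbank-related part
    "low_freq": 20.0,
    "high_freq": -400.0,
    "num_mel_bins": 40,
    "use_energy": False,
    "vtln_low": 100.0,
    "vtln_high": -500.0,
    "vtln_warp": 1.0,
}

# Copy before mutating so the shared default stays pristine (this also
# sidesteps the "Modification of parameter with default" finding).
params = dict(DEFAULT_FBANK_PARAMS)
params.update({"sample_frequency": 16000, "snip_edges": False, "num_mel_bins": 23})
params["frame_shift"] *= 1000.0  # seconds -> milliseconds
```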
```python
features, _, _ = self.transform_model_input(x=masked_adv_input[sample_index])
x_lens = torch.tensor([features.shape[1]]).to(torch.int32).to(self.device)
y = k2.RaggedTensor(original_output[sample_index])
loss = self.transducer_model(x=features, x_lens=x_lens, y=y)
```
Should the loss not also be returned?
I initially assumed so too, but apparently not (if the other estimator implementations are anything to go by).
.github/workflows/ci-icefall.yml
Outdated
```shell
sudo apt-get update
sudo apt-get -y -q install ffmpeg libavcodec-extra
python -m pip install --upgrade pip setuptools wheel
pip3 install -r requirements_test.txt
apt-get update \
    && apt-get install -y \
    libgl1-mesa-glx \
    libx11-xcb1 \
    git \
    gcc \
    mono-mcs \
    libavcodec-extra \
    ffmpeg \
    curl \
    libsndfile-dev \
    libsndfile1 \
```
Is there a reason why this is done as two `apt install` steps? Can't they be combined into one?
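A combined workflow step might look like this. The package list is merged from the two snippets above; this is a sketch for the CI file, not the PR's final version:

```shell
sudo apt-get update
sudo apt-get install -y -q ffmpeg libavcodec-extra libgl1-mesa-glx \
    libx11-xcb1 git gcc mono-mcs curl libsndfile-dev libsndfile1
python -m pip install --upgrade pip setuptools wheel
pip3 install -r requirements_test.txt
```

A single `apt-get install` invocation resolves dependencies once and avoids a redundant `apt-get update`, which slightly shortens CI runs.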
```python
indices = torch.LongTensor(indices)
num_frames = torch.IntTensor([num_frames[idx] for idx in indices])
start_frames = torch.zeros(len(x), dtype=torch.int)
```
Suggested change:
```diff
-indices = torch.LongTensor(indices)
-num_frames = torch.IntTensor([num_frames[idx] for idx in indices])
-start_frames = torch.zeros(len(x), dtype=torch.int)
+indices = torch.tensor(indices, dtype=torch.int64, device=self._device)
+num_frames = torch.tensor([num_frames[idx] for idx in indices], dtype=torch.int32, device=self._device)
+start_frames = torch.zeros(len(x), dtype=torch.int32, device=self._device)
```
Let's use the modern torch tensor instantiation on the proper device for optimal performance.
```python
return self._input_shape  # type: ignore

@property
def model(self):
```
Suggested change:
```diff
-def model(self):
+def model(self) -> "torch.nn.Module":
```
Missing return type.
Signed-off-by: Xinyuan Li <[email protected]>
Signed-off-by: Xinyuan Li <[email protected]>
…link. Signed-off-by: Xinyuan Li <[email protected]>
This is a preliminary pull request containing the ART adaptation for Icefall-based ASR models.