round-off errors in generating spectra and other parameters #77
Hi,

Thanks for using LibXtract and reporting this issue. LibXtract already includes a check to ensure that log() is never called on zero or near-zero power:

    if ((temp = XTRACT_SQ(real) + XTRACT_SQ(imag)) > XTRACT_LOG_LIMIT)
        temp = log(temp / NxN);
    else
        temp = XTRACT_LOG_LIMIT_DB;

where … Actually, I think 2e-42 was set when LibXtract was still single-precision, and it is certainly a lot higher than the smallest normal double. So, before we start adding epsilon values, we need to look at why you're getting …

Can you post an example of your source code (and input) that reproduces the issue?

Thanks,
Jamie
I see. I had assumed that the log function was the source of the problem.

Thanks,
Would you mind sharing a minimal code fragment that reproduces the problem? Then I can have a go at debugging it.

Thanks,
Jamie
This is the feature extraction code. As you can see, it's pretty much the same as simpletest.cpp, except that I read the audio files from a list file in which each line holds an utterance ID and a WAV path separated by a space:

    #include <cmath>
    #include <cstdint>
    #include <cstdlib>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    #include "xtract/libxtract.h"
    #include "xtract/xtract_stateful.h"
    #include "xtract/xtract_scalar.h"
    #include "xtract/xtract_helper.h"
    #include "WaveFile.h"

    #define BLOCKSIZE 512
    #define HALF_BLOCKSIZE 256
    #define SAMPLERATE 16000
    #define MFCC_FREQ_BANDS 14
    #define MFCC_FREQ_MIN 20
    #define MFCC_FREQ_MAX 8000

    int main()
    {
        double argd[4] = {0};              // argument array for spectrum estimation
        double spectrum[BLOCKSIZE] = {0};  // first half: magnitudes, second half: bin frequencies
        double mfccs[MFCC_FREQ_BANDS] = {0};

        // Allocate and initialize the mel filter bank.
        xtract_mel_filter mel_filters;
        mel_filters.n_filters = MFCC_FREQ_BANDS;
        mel_filters.filters = (double **)malloc(MFCC_FREQ_BANDS * sizeof(double *));
        for (int k = 0; k < MFCC_FREQ_BANDS; ++k)
            mel_filters.filters[k] = (double *)malloc(BLOCKSIZE * sizeof(double));
        xtract_init_mfcc(HALF_BLOCKSIZE, SAMPLERATE / 2, XTRACT_EQUAL_GAIN, MFCC_FREQ_MIN, MFCC_FREQ_MAX,
                         mel_filters.n_filters, mel_filters.filters);

        // Window and FFT setup are done once, not once per frame.
        double *window = xtract_init_window(BLOCKSIZE, XTRACT_HANN);
        double windowed[BLOCKSIZE] = {0};
        xtract_init_fft(BLOCKSIZE, XTRACT_SPECTRUM);

        argd[0] = SAMPLERATE / (double)BLOCKSIZE;  // frequency resolution (Hz per bin)
        argd[1] = XTRACT_MAGNITUDE_SPECTRUM;
        argd[2] = 0.0;                             // 0: leave out the DC bin
        argd[3] = 0.0;                             // no normalisation

        // Each line of the list file is "<utt-ID> <wav path>".
        std::ifstream wavList("path/to/file/wav_test.txt");
        std::string line;
        while (std::getline(wavList, line)) {
            std::size_t spaceIdx = line.find(' ');
            if (spaceIdx == std::string::npos)
                continue;  // skip malformed lines
            std::string wavId = line.substr(0, spaceIdx);
            std::string wavPath = line.substr(spaceIdx + 1);

            // Read 16-bit signed PCM samples and scale them to [-1, 1).
            WaveFile wavFile(wavPath);
            const int16_t *wavData = (const int16_t *)wavFile.GetData();
            std::size_t wavSamples = wavFile.GetDataSize() / sizeof(int16_t);
            std::vector<double> data(wavSamples);  // heap buffer instead of a stack VLA
            for (std::size_t n = 0; n < wavSamples; ++n)
                data[n] = wavData[n] / 32768.0;
            std::cout << wavId << ": " << wavSamples << " samples" << std::endl;

            // Frames of BLOCKSIZE samples with 50% overlap.
            for (std::size_t n = 0; n + BLOCKSIZE <= wavSamples; n += HALF_BLOCKSIZE) {
                xtract_windowed(&data[n], BLOCKSIZE, window, windowed);
                xtract[XTRACT_SPECTRUM](windowed, BLOCKSIZE, argd, spectrum);

                // This is where I make the correction to the spectrum:
                // zero the last magnitude bin (index 255).
                spectrum[HALF_BLOCKSIZE - 1] = 0.0;

                xtract_mfcc(spectrum, HALF_BLOCKSIZE, &mel_filters, mfccs);

                // Check for NaNs.
                for (int d = 0; d < MFCC_FREQ_BANDS; ++d) {
                    if (std::isnan(mfccs[d])) {
                        // Set a breakpoint here, then step into the repeated call to debug.
                        xtract_mfcc(spectrum, HALF_BLOCKSIZE, &mel_filters, mfccs);
                        break;
                    }
                }
            }
        }

        // Release library and heap resources.
        xtract_free_fft();
        xtract_free_window(window);
        for (int k = 0; k < MFCC_FREQ_BANDS; ++k)
            free(mel_filters.filters[k]);
        free(mel_filters.filters);
        return 0;
    }

Thanks,
Hi Navid,

Sorry for the slow reply. Would you mind uploading one of the audio files that causes the problem you're experiencing? Then I can debug this and make a fix.

Jamie
Hello,
In addition to thanking you for providing this tool (I've been using your source code for a while now), I'd like to suggest a minor adjustment to how the source code calculates logarithms. When a double value becomes too small, floating-point arithmetic may round it to zero (underflow), and the logarithm of zero is -inf rather than a finite number.
This problem was particularly hard for me to track down, because it only comes up when the signal's noise level is very low and some double values occasionally get rounded to zero.
This could easily be resolved by adding a small epsilon value to the argument before computing each logarithm. The instance I found was the call log(temp) in vector.c, used in calculating the signal spectrum. I'm sure there are other instances as well.
Many thanks,
Dev