Looking to optimize precision of 'frame.off_start' #59

Open
mungewell opened this issue Jul 23, 2020 · 12 comments

Comments

@mungewell

I have been working with the sister project LTC-tools to feed an externally generated LTC signal into NTP/Chrony, with pretty good success. Here's some 30NDF analysed by Chrony:
[image: better3]

What is clear in the image is the horizontal banding, with the bands being ~20us apart (one sample period at the 48 kHz sample rate). It would seem that the integer nature of 'frame.off_start' is causing some jitter in the reported timing. Even though the timing is precise enough to record the variance, I am getting a swing of ~10 samples in the detected 'off_start'.
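
For scale, here is a minimal sketch of the time step implied by an integer sample offset. The helper names are illustrative and are not part of libltc.

```c
#include <assert.h>

/* Duration of one audio sample in microseconds. */
static double sample_period_us(double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    return 1e6 / sample_rate_hz;
}

/* Time span covered by a jitter of `samples` whole samples. */
static double jitter_span_us(int samples, double sample_rate_hz)
{
    return samples * sample_period_us(sample_rate_hz);
}
```

At 48 kHz one sample lasts about 20.8 us, so a 10-sample swing corresponds to roughly 208 us.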

I have looked at 'decoder.c' and it's not really clear to me how it 'finds' the start of the frame.

Dumping the biphase info I get:

00-00-00 +0000 03:22:31:13
20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 
20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 
20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 
20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 
20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 
20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 
20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 
20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 
20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 
20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 20.000000, 
:267, 842, 1600.000000

00-00-00 +0000 03:22:31:14
19.999060, 19.999060, 19.999296, 20.499472, 19.874603, 19.874603, 19.905952, 19.905952, 
19.929464, 19.929464, 19.947098, 19.947098, 19.960323, 19.960323, 19.970243, 19.970243, 
19.977682, 20.483261, 19.862446, 19.896835, 19.922626, 19.922626, 19.941969, 19.941969, 
19.956476, 19.956476, 19.967358, 19.967358, 19.975519, 19.975519, 19.981638, 19.981638, 
19.986229, 19.986229, 19.989672, 19.989672, 19.992254, 19.992254, 19.994190, 19.994190, 
19.995644, 19.995644, 19.996733, 19.996733, 19.997549, 19.997549, 19.998161, 19.998161, 
19.998621, 19.998621, 19.998966, 19.998966, 19.999224, 19.999418, 19.999563, 19.999672, 
19.999754, 20.499817, 19.874863, 19.906147, 19.929609, 19.947206, 19.960405, 19.970304, 
19.977728, 20.483295, 19.862473, 19.896854, 19.922640, 19.941980, 19.956486, 20.467363, 
19.850523, 19.887892, 19.915918, 19.936939, 19.952705, 19.952705, 19.964529, 20.473396, 
:843, 392, 1599.809204

The last line of each block shows 'off_start', 'off_end' and the sum of the biphase bit timings. Typically every bit reads '20.000000', but some frames show different values.
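
As a reference point for those sums, the sketch below computes the nominal samples per bit and adds up the per-bit periods of one frame. It relies only on an LTC frame being 80 bits long; the function names are hypothetical, not libltc API.

```c
#include <assert.h>
#include <stddef.h>

#define LTC_BITS_PER_FRAME 80

/* Nominal samples per LTC bit for a given sample rate and frame rate. */
static double nominal_samples_per_bit(double sample_rate_hz, double fps)
{
    assert(sample_rate_hz > 0.0 && fps > 0.0);
    return sample_rate_hz / (fps * LTC_BITS_PER_FRAME);
}

/* Sum the per-bit periods of one frame, like the last field of the dump. */
static double frame_length_samples(const double *bit_periods, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += bit_periods[i];
    return sum;
}
```

At 48 kHz and 30 fps the nominal figures are 20 samples per bit and 1600 samples per frame, which matches the first dump; the second dump sums to about 0.19 samples less.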

Does anyone have suggestions on how this might be improved (less variance), or where to look at the code?

@x42
Owner

x42 commented Jul 23, 2020

While it would be possible to get sub-sample accuracy to align each frame, it is not really useful in the case at hand. The absolute position information of frames is set by the sample clock (either the sample count in a file or the tick from the soundcard). That is an integer count at the given sample rate and the only valid time domain in this case.

I suggest using a DLL (delay-locked loop) if you need higher precision, or to filter time when aligning different clock domains.
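
A minimal sketch of what such a filter could look like, using the common second-order DLL recurrence for smoothing periodic timestamps. This is not code from libltc or Ardour; the struct, the function names and the bandwidth value in the usage note are assumptions for illustration.

```c
#include <assert.h>

/* Second-order delay-locked loop that smooths noisy periodic timestamps. */
typedef struct {
    double t0;  /* filtered time of the current period start */
    double t1;  /* predicted time of the next period start */
    double e2;  /* filtered period estimate (second-order state) */
    double b;   /* first-order loop gain */
    double c;   /* second-order loop gain */
} dll_t;

static void dll_init(dll_t *d, double now, double period, double bandwidth_hz)
{
    assert(period > 0.0 && bandwidth_hz > 0.0);
    /* omega = 2 * pi * bandwidth * period */
    const double omega = 2.0 * 3.14159265358979323846 * bandwidth_hz * period;
    d->b = 1.41421356237309504880 * omega;  /* sqrt(2) * omega */
    d->c = omega * omega;
    d->e2 = period;
    d->t0 = now;
    d->t1 = now + period;
}

/* Feed one raw timestamp; returns the filtered time for this period. */
static double dll_update(dll_t *d, double now)
{
    const double err = now - d->t1;  /* prediction error */
    d->t0 = d->t1;
    d->t1 += d->b * err + d->e2;
    d->e2 += d->c * err;
    return d->t0;
}
```

Called once per second with a bandwidth of, say, 0.05 Hz, the loop attenuates sample-sized jitter on the raw timestamps while still tracking slow drift between the two clocks. A lower bandwidth smooths more but reacts more slowly.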

To answer your other question: the start offset is calculated relative to the end of the previous frame:

if (FIRST_TIME)
  d->frame_start_off = posinfo - d->snd_to_biphase_period;
else
  d->frame_start_off = d->frame_start_prev;

...

d->frame_start_off += ceil(d->snd_to_biphase_period); // for each received bit
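
To illustrate the effect of rounding each bit period up before accumulating, the sketch below compares that against summing the fractional periods. It is a simplified model of the snippet above, not the actual decoder.c logic; `ceil_pos` stands in for `ceil()` from math.h and all names are hypothetical.

```c
#include <assert.h>

/* Smallest integer not less than x, for non-negative x. */
static long ceil_pos(double x)
{
    assert(x >= 0.0);
    long n = (long)x;
    return (x > (double)n) ? n + 1 : n;
}

/* Advance a start offset by one rounded-up bit period per received bit. */
static long advance_rounded(long start, const double *periods, int nbits)
{
    for (int i = 0; i < nbits; i++)
        start += ceil_pos(periods[i]);
    return start;
}

/* Advance by the fractional periods instead, without per-bit rounding. */
static double advance_exact(double start, const double *periods, int nbits)
{
    for (int i = 0; i < nbits; i++)
        start += periods[i];
    return start;
}
```

With 80 bit periods of 19.99 samples each, the rounded version advances by 1600 samples while the fractional sum is 1599.2. Whether this rounding accounts for the jitter reported in this thread is not established here.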

@x42
Owner

x42 commented Jul 23, 2020

Would you mind putting labels on the axis on your graph? What exactly am I looking at?

@mungewell
Author

Sorry, I should have been clearer and given more detail in my post. The question arises from the work I have been doing here:
x42/ltc-tools#17

My objective would be to reduce the variance/jitter in the reported 'off_start', which I use to calculate the 'received timestamp' which is sent to Chrony.

The plot shows data from Chrony's 'refclocks.log' file; the Y axis is the 'Cooked offset' column, plotted against sample number (effectively time). 'LTC' is the name of the RefClock input, which is being tracked by Chrony as the data is logged, so there should be no drift in the value.

For example:

$ grep 'LTC' refclocks.log | grep -v -e '- N -' > refclocks.dat
$ gnuplot

	G N U P L O T
	Version 5.2 patchlevel 2    last modified 2017-11-01 

gnuplot> set yrange [-0.00025:0.00025]
gnuplot> plot "refclocks.dat" using 0:8
gnuplot>
$ head refclocks.log 
===============================================================================
   Date (UTC) Time         Refid  DP L P  Raw offset   Cooked offset      Disp.
===============================================================================
2020-07-21 02:53:29.041315 LTC    12 N 0 -4.131500e-02 -4.131500e-02  5.000e-08
2020-07-21 02:53:30.003900 LTC    13 N 0 -3.900000e-03 -3.900000e-03  5.000e-08
2020-07-21 02:53:31.003832 LTC    14 N 0 -3.832000e-03 -3.832000e-03  5.000e-08
2020-07-21 02:53:32.003968 LTC    15 N 0 -3.968000e-03 -3.968000e-03  5.000e-08
2020-07-21 02:53:31.003933 LTC     - N -       -       -3.934000e-03  4.808e-05
2020-07-21 02:53:33.003877 LTC     0 N 0 -3.877000e-03 -3.877000e-03  5.000e-08
2020-07-21 02:53:34.003854 LTC     1 N 0 -3.854000e-03 -3.854000e-03  5.000e-08

@mungewell
Author

One thought I had in the early hours was that this question is specific to a continuous LTC stream, which should be of consistent speed/frequency. Perhaps LTCLib can have a mode where it knows speed is constant and tries hard to reduce jitter on timing measurements.

My hardware (DigiDesign SyncIO) does not set the 'date' bit within the stream, but others may, and this may be a way to enable the behaviour:

        int use_date = !no_date && frame.ltc.binary_group_flag_bit0 == 0
                                && frame.ltc.binary_group_flag_bit2 == 1;

@x42
Owner

x42 commented Jul 23, 2020

> Perhaps LTCLib can have a mode where it knows speed is constant and tries hard to reduce jitter on timing measurements.

This is not in the scope of the library. The purpose of libltc is en/decoding of the biphase encoded signal. The library itself does not concern itself with timing per se.

@mungewell
Author

I have been digging into some test data. One thing I note is that the reported volumes for packets change quite a bit (ranging from -0.454496 to -3.171338 in the course of a minute), which should not really happen for a 'constant setup'.

Looking at a previous recording of the 30fps (48 kHz) output of my SyncIO, I see that the levels are mostly constant and (as per the SMPTE spec) the transitions are slewed appropriately.
[image: sync_io_slew]

Looking at the code for the decoder:
https://github.com/x42/libltc/blob/master/src/decoder.c#L288

It does a few notable things:
1) Min/max points 'decay' toward CENTER with 15/16ths scaling. This is independent of sample frequency.
2) Min/max are assessed on a single sample, as the sample block is being decoded.
3) Min/max thresholds are set at 8/16ths of the way from CENTER to the min/max points, i.e. at 25% and 75% of the sample swing. But since the min/max points are decaying, the thresholds change while the level sits at high or low.

Hysteresis is good, but it delays the reference point/sample.
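
The three points above can be sketched as a small envelope tracker. This is a model written from the description in this comment, not a copy of decoder.c; the type and function names are hypothetical, and the mid-scale value of 128 assumes unsigned 8-bit samples.

```c
#include <assert.h>

#define CENTER 128  /* assumed mid-scale value for unsigned 8-bit samples */

typedef struct {
    int min, max;              /* decaying envelope extremes */
    int lo_thresh, hi_thresh;  /* trigger thresholds derived from them */
} envelope_t;

static void envelope_init(envelope_t *e)
{
    e->min = e->max = e->lo_thresh = e->hi_thresh = CENTER;
}

/* Process one sample: decay extremes toward CENTER, then re-derive thresholds. */
static void envelope_step(envelope_t *e, int sample)
{
    assert(sample >= 0 && sample <= 255);
    /* 1) decay by 15/16 on every sample, whatever the sample rate */
    e->min = CENTER - ((CENTER - e->min) * 15) / 16;
    e->max = CENTER + ((e->max - CENTER) * 15) / 16;
    /* 2) a single sample can push an extreme outward */
    if (sample < e->min) e->min = sample;
    if (sample > e->max) e->max = sample;
    /* 3) thresholds sit 8/16 of the way from CENTER to each extreme */
    e->lo_thresh = CENTER - ((CENTER - e->min) * 8) / 16;
    e->hi_thresh = CENTER + ((e->max - CENTER) * 8) / 16;
}
```

Feeding one low sample (28) and then one high sample (228) moves the low threshold from 78 to 82 within a single sample, which shows how the thresholds drift while the signal sits at the opposite level.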

Because of the (deliberate) slew in the waveform the trigger timing will be affected by the chosen trigger thresholds. Given my observation on changing volume, could it be that this is causing 'wobble' in the timing???
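
To make the dependence on threshold concrete, the sketch below computes where an idealized linear edge crosses a given threshold. The linear ramp is a simplifying assumption (a real band-limited edge is not linear) and the function name is hypothetical.

```c
#include <assert.h>

/*
 * Time, in samples from the start of the edge, at which a linear ramp from
 * -amplitude to +amplitude crosses `threshold`. `rise_samples` is the full
 * duration of the ramp.
 */
static double ramp_crossing(double amplitude, double rise_samples,
                            double threshold)
{
    assert(amplitude > 0.0 && rise_samples > 0.0);
    assert(threshold >= -amplitude && threshold <= amplitude);
    return rise_samples * (threshold + amplitude) / (2.0 * amplitude);
}
```

For a ramp lasting 4 samples with amplitude 1.0, a threshold at 0.5 is crossed at 3.0 samples and a threshold at 0.25 at 2.5 samples, so a drifting threshold moves the detected edge by a fraction of the rise time.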

I am away from my test hardware at the moment, but I would suggest that min/max samples be assessed on a rolling average, so that a single glitch does not disrupt the decoding, and that the decay be given a longer time constant (perhaps frequency dependent).

It may also be worthwhile to ensure that the sample is AC balanced.

@x42
Owner

x42 commented Jul 24, 2020

Manually interpreting a band-limited signal by looking at individual spaced samples is not very useful in general.
What is the eventual goal here? For audio/video sync libltc is already far more than sufficiently precise.

I don't know how Chrony comes into play but it seems you have two different time-domains (as opposed to just one like during A/V postprod). What is your use-case and how is the current precision not sufficient?

@mungewell
Author

Ack that LibLTC probably meets most application uses, to align video frame you need to be +-~15ms.

My interest is a DIY 'Tentacle Sync': a box which can align to an LTC stream and then maintain a frequency reference when disconnected. I set myself a target of < +-100us whilst connected to the reference, and a drift of < +-15ms for 8hr after disconnect, which would require clock accuracy in the order of 1ppm. [Plus I am nerd'ing out over stuff that I find interesting.]
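
The holdover figure can be checked with a one-line calculation; the function name below is illustrative.

```c
#include <assert.h>

/*
 * Largest frequency error, in parts per million, that keeps the accumulated
 * drift within `max_drift_s` seconds over `holdover_s` seconds.
 */
static double max_ppm(double max_drift_s, double holdover_s)
{
    assert(max_drift_s > 0.0 && holdover_s > 0.0);
    return max_drift_s / holdover_s * 1e6;
}
```

For 15 ms over 8 hours (28800 s) this gives about 0.52 ppm, consistent with the "order of 1ppm" estimate.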

I am not suggesting that you change code specifically for my particular stream/needs, but it does seem that there's something wrong in the code when the reported volume (frame.volume) oscillates so wildly when it should be somewhat constant.

I also understand concerns/fears over changing code and having it break somebody else's application....

@mungewell
Author

I spent the evening playing around with samples - as mentioned before I am away from my test setup. Anyway, the short story is that I think there may be a hardware fault with my SyncIO.

No matter what I tried with LTC generated files I could not recreate the 'changing volume' effect, however I could see this while playing back a recording of my SyncIO I made some time ago. These samples were recorded directly onto a hand held recorder, no PC in the mix.

It's hard to tell when looking at the 'bad' file in Audacity, but I think that it might have some digital noise on it. If I use Sox to deliberately clip the file (and then bring the levels back to a similar level, with some added slew), the 'changing volume' effect is greatly reduced.

$ sox 191214_0100_30nodrop.wav clipped.wav gain 50 gain -7.5 lowpass -2 8k
sox WARN gain: gain clipped 28966854 samples; decrease volume?

This plot shows the min/max levels reported by LibLTC before and after the 'repair'.
[image: filtered]

I'll run a full check on the hardware once I get back to it.

@x42
Owner

x42 commented Jul 25, 2020

> Ack that LibLTC probably meets most application uses, to align video frame you need to be +-~15ms.

Since it's a self-clocking signal, the accuracy is ~1/2kHz. With a 2nd order PLL to track the phase it can be even more accurate. If you use 2 different clock domains you have to do that anyway.

We've tested synchronizing with Ardour, using a sound card that is not word-clock synced, recovering the clock using a DLL in software, then re-generating the signal and comparing using an analog scope. The accuracy is around 25 usec:

[image: oscilloscope capture]

(yellow is the original analog signal, blue the re-generated one with the Gibbs effect being visible). Long term jitter measurements show a difference smaller than 2 audio samples.

@mungewell
Author

I tried multiple hardware combinations, nothing shows improvement. Your comment about clock domains 'hit a nerve' so now my SyncIO is driving LTC and 48K super-clock to a DigiDesign 888, which digitizes to SPDIF fed to a USB sound card.

Again the "Refclocks" plot shows multiple (6ish) bands of points which are 1 sample clock apart....

Taking a step back, I wanted to confirm that this is not an artifact of some code I added. The (temporary) patch below grabs each audio block from Jack and, for each timecode packet sent to NTP/Chrony, dumps the block to a file.
dump_file.patch.txt

This dumps files with 'timecode' and 'off_start' encoded in the filename. The zero-length ones occur because 'off_start' is larger than 'off_end', meaning that the original sample block has been replaced, so it is not correct to plot them.

$ ls -al *.bin
-rw-rw-r-- 1 simon simon 16384 Jul 27 18:52 03_17_43_43_001494.bin
-rw-rw-r-- 1 simon simon     0 Jul 27 18:52 03_17_43_43_003225.bin
-rw-rw-r-- 1 simon simon 16384 Jul 27 18:52 03_17_44_44_002069.bin
-rw-rw-r-- 1 simon simon 16384 Jul 27 18:52 03_17_45_45_000920.bin
-rw-rw-r-- 1 simon simon     0 Jul 27 18:52 03_17_46_46_003860.bin
-rw-rw-r-- 1 simon simon     0 Jul 27 18:53 03_17_47_47_002711.bin
-rw-rw-r-- 1 simon simon 16384 Jul 27 18:53 03_17_48_48_001555.bin
-rw-rw-r-- 1 simon simon 16384 Jul 27 18:53 03_17_49_49_000406.bin
-rw-rw-r-- 1 simon simon     0 Jul 27 18:53 03_17_50_50_003346.bin
-rw-rw-r-- 1 simon simon 16384 Jul 27 18:53 03_17_51_51_002197.bin

These can be plotted, I use gnuplot as follows:

$ gnuplot

	G N U P L O T
	Version 5.2 patchlevel 2    last modified 2017-11-01 

gnuplot> plot "perfect_ltc.ref" binary format="%f" using ($0-1024):1 with lines title "reference LTC"
gnuplot> replot "03_17_58_58_002318.bin" binary format="%f" using ($0-2318):1 with lines notitle
gnuplot> replot "03_17_56_56_000528.bin" binary format="%f" using ($0-528):($1*-1) with lines notitle
gnuplot> replot "03_17_55_55_001683.bin" binary format="%f" using ($0-1683):1 with lines notitle
gnuplot> set xrange [-50:50]
gnuplot> replot

This plot clearly confirms that the waveforms from the SyncIO are being interpreted slightly offset; the zero on the X axis is/should be the start of the first bit. The value of 'off_start' jitters around by a few samples.... I don't yet have an explanation :-(
[image: offset_start]

You might also notice that I inverted one of those, it seems that SyncIO does not drive the "Polarity correction bit". Don't know if that affects anything.

@mungewell
Author

Improved the debug tools, to plot as individual 'strips' as this is easier to read.
dump_files.zip

Usage

$ rm *.bin
$ ../jltcntp  -u 0 -v -f 30 jackplay:out_1 > dump.txt
$ python ../build_plots.py > plot.gp 
$ gnuplot plot.gp

This gives plots like the one below, where the "off_start" for "03-17-49-49" is several samples too early. Again I am seeing a ~7 sample spread with LTC from my SyncIO. This is WITHOUT the changes made in the patch for Bug 17.
[image: late_edge]

00-00-00 +0000 03:17:48:29
:3865, 1362, -2503
>97, 172, -10.629578

00-00-00 +0000 03:17:49:00 ==> Tue Jul 28 03:17:49 2020
Writing to file: 03-17-49-49.1363.bin
19.959995, 20.469997, 19.852497, 19.852497, 19.889374, 19.889374, 19.917030, 19.917030,
19.937773, 19.937773, 19.953329, 19.953329, 19.964996, 19.964996, 19.973747, 19.973747,
19.980310, 19.985233, 20.488924, 20.366693, 20.275021, 20.275021, 19.956264, 19.956264,
19.967199, 19.967199, 19.975399, 19.975399, 19.981550, 19.981550, 19.986162, 19.986162,
19.989622, 19.989622, 19.992216, 19.992216, 19.994162, 19.994162, 19.995621, 19.995621,
19.996716, 19.996716, 19.997538, 19.997538, 19.998154, 19.998154, 19.998615, 19.998615,
19.998960, 19.998960, 19.999220, 19.999220, 19.999416, 20.499561, 20.374672, 20.281004,
20.210752, 20.158064, 19.618547, 20.213911, 20.160433, 20.120325, 20.090244, 20.067682,
19.550762, 19.663071, 20.247303, 20.185478, 20.139109, 20.104332, 20.078249, 19.558687,
19.669014, 20.251760, 20.188820, 20.141615, 20.106213, 20.106213, 19.829659, 20.372244,
=1602.420410
:1363, 2962, 1599
>97, 171, -10.746169

00-00-00 +0000 03:17:49:01
:2963, 472, -2491
>85, 181, -8.485379
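
The negative differences in the ':' lines above appear to come from the offsets wrapping around the capture buffer. The sketch below unwraps them, assuming a buffer of 4096 samples. That size is an inference from the 16384-byte dump files being read as 4-byte floats, not something stated in the thread, and the function name is hypothetical.

```c
#include <assert.h>

/* Difference off_end - off_start when offsets wrap modulo `ring` samples. */
static int wrapped_length(int off_start, int off_end, int ring)
{
    assert(ring > 0);
    int len = (off_end - off_start) % ring;
    return (len < 0) ? len + ring : len;
}
```

Under that assumption the three frames above span 1593, 1599 and 1605 samples. Each 'off_start' is one past the previous 'off_end', so the inclusive lengths are one sample more, i.e. 1594, 1600 and 1606 against a nominal 1600.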
