
Controlling a sound delivery onset through the sd.OutputStream object for human research #555

Open
mscheltienne opened this issue Sep 3, 2024 · 4 comments

Comments

@mscheltienne

mscheltienne commented Sep 3, 2024

Hello @mgeier! First of all, thank you for the continuous development and maintenance of sounddevice, it's amazing 😃

Short version

How can I further improve the callback, the argument settings of OutputStream, etc., to precisely start delivering an auditory stimulus at a given time, for a sounddevice backend used to deliver auditory stimuli in human neuroscience research? Code at the end.


Long version

You might know of Psychtoolbox and PsychoPy, tools used to design psychological, behavioral, and neuroscience experiments. I work primarily with the latter, where delivering an auditory stimulus is very often used to elicit a neurological or behavioral response in a participant. Historically, Psychtoolbox offers an excellent interface to PortAudio with which I can deliver an auditory stimulus simultaneously with a trigger (an event marking the audio onset, usually delivered via parallel port) with excellent precision (delay between the event and the audio onset measured at < 1 ms).

In Python, this interface is accessible through PsychoPy and its SoundPTB object. PsychoPy also offers other backends, including SoundDeviceSound. Sadly, the other backends do not match the performance of SoundPTB (the delay between the event marking the sound onset and the actual sound onset is nonzero and variable). On top of that, PsychoPy development regularly breaks the most basic features, including sound delivery in the latest 2024.2.0 and 2024.2.1 releases.

Which brings me to today's affairs: I took a shot at implementing a Python audio delivery backend matching SoundPTB's performance using sounddevice. The objective is to package this backend independently from PsychoPy to avoid breakage, and to replace the existing PsychoPy SoundDeviceSound object with a wrapper around this new backend.

The key element that makes delivery precise in SoundPTB is a scheduling mechanism. Typically, this is how a sound and a trigger (event) would be delivered:

import psychtoolbox as ptb
from byte_triggers import ParallelPortTrigger
from psychopy.sound.backend_ptb import SoundPTB
from time import sleep  # stand-in; in practice a higher-precision sleep is used

# create a 440 Hz sinusoid lasting 200 ms
sound = SoundPTB(value=440.0, secs=0.2)
# create the parallel port trigger object
trigger = ParallelPortTrigger("arduino")

# schedule the sound in 200 ms, wait, deliver the trigger
now = ptb.GetSecs()  # psychtoolbox uses a separate clock
sound.play(when=now + 0.2)
sleep(0.2)  # usually not the default 'time.sleep' function which is too imprecise
trigger.signal(1)  # mark the onset

I tried replicating this scheduling idea with sounddevice, compensating for delays with the time.outputBufferDacTime and time.currentTime values obtained in the callback function, and I got very close. Compared to a blocking stream.write, this callback approach brings the delay between the event and the audio onset down to ~3 ms ± 1 ms. It is visibly slightly worse than the SoundPTB object, especially with a higher variability.

I'm a bit out of ideas on how to keep improving this attempt and would love to hear your input on how I could improve the MWE below and how I should set the different OutputStream arguments. Without further ado, here is the current working state yielding ~3 ms ± 1 ms.


import time

import numpy as np
import sounddevice as sd
from byte_triggers import ParallelPortTrigger


class Clock:
    def __init__(self) -> None:
        self._t0 = time.monotonic_ns()

    def get_time_ns(self) -> int:
        return time.monotonic_ns() - self._t0


class SoundSD:
    def __init__(self, data: np.ndarray, device: int, sample_rate: int) -> None:
        # store data, device and callback variables
        self._data = data if data.ndim == 2 else data[:, np.newaxis]
        self._clock = Clock()
        self._current_frame = 0
        self._target_time = None
        # create and open the output stream
        self._stream = sd.OutputStream(
            blocksize=0,  # 0, 4, 64, 128 -> no visible differences
            callback=self._callback,
            channels=data.shape[1] if data.ndim == 2 else 1,
            device=device,
            dtype=data.dtype,
            latency="low",
            samplerate=sample_rate,
        )
        self._stream.start()

    def _callback(self, outdata, frames, time_info, status) -> None:
        """Callback audio function."""  # noqa: D401
        if self._target_time is None:
            outdata.fill(0)
            return
        delta_ns = int((time_info.outputBufferDacTime - time_info.currentTime) * 1e9)
        if self._clock.get_time_ns() + delta_ns < self._target_time:
            outdata.fill(0)
            return
        end = self._current_frame + frames
        if end <= self._data.shape[0]:
            outdata[:frames, :] = self._data[self._current_frame : end, :]
            self._current_frame += frames
        else:
            data = self._data[self._current_frame :, :]
            data = np.vstack(
                (
                    data,
                    np.zeros((frames - data.shape[0], data.shape[1]), dtype=data.dtype),
                )
            )
            outdata[:frames, :] = data
            # reset
            self._current_frame = 0
            self._target_time = None

    def play(self, when: float | None = None) -> None:
        """Play the audio data.

        Parameters
        ----------
        when : float | None
            The relative time in seconds when to start playing the audio data. For
            instance, ``0.2`` will start playing in 200 ms. If ``None``, the audio data
            is played as soon as possible.
        """
        self._target_time = (
            self._clock.get_time_ns()
            if when is None
            else self._clock.get_time_ns() + int(when * 1e9)
        )


if __name__ == "__main__":
    trigger = ParallelPortTrigger("arduino")
    device = sd.query_devices(sd.default.device["output"])
    sample_rate = int(device["default_samplerate"])
    duration = 0.2
    frequency = 440
    times = np.linspace(0, duration, int(duration * sample_rate), endpoint=False)
    data = np.sin(2 * np.pi * frequency * times)
    data /= np.max(np.abs(data))  # normalize
    data = data.astype(np.float32) * 0.1
    sound = SoundSD(data, device["index"], sample_rate)
    sound.play(when=0.5)
    time.sleep(0.5)  # to be replaced with a higher-precision sleep function
    trigger.signal(1)
    time.sleep(0.3)

Note: I measure the delay between the sound onset and the trigger through an EEG amplifier at 1 kHz. I can increase the sampling rate to 16 kHz if needed.
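Both snippets above lean on a higher-precision replacement for `time.sleep`. As a point of reference only (this is a minimal sketch of one common approach, not the actual sleep function used in my setup; the name `precise_sleep` and the 2 ms margin are my own choices), one can sleep coarsely for most of the interval and busy-wait on `time.perf_counter` for the remainder:

```python
import time


def precise_sleep(duration: float, busy_margin: float = 0.002) -> None:
    """Sleep for `duration` seconds, busy-waiting the last `busy_margin` seconds."""
    target = time.perf_counter() + duration
    # coarse sleep, leaving a margin for the OS scheduler's wake-up jitter
    coarse = duration - busy_margin
    if coarse > 0:
        time.sleep(coarse)
    # spin on the high-resolution counter for the remainder
    while time.perf_counter() < target:
        pass
```

The margin trades CPU for accuracy: the busy-wait burns a core for up to `busy_margin` seconds, so it should sit just above the OS scheduler's typical wake-up jitter (roughly 1–2 ms on desktop OSes).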

@mscheltienne
Author

And for completeness, a version with RawOutputStream: https://github.com/mscheltienne/stimuli/blob/main/stimuli/audio/_backend/sounddevice.py
It's the same logic, and it is still visibly slightly worse than SoundPTB.

@mscheltienne
Author

Looks like I should look into https://github.com/spatialaudio/python-rtmixer

@mgeier
Member

mgeier commented Sep 7, 2024

You might know of Psychtoolbox, PsychoPy, tools to design psychological, behavioral, neurosciences experiments.

Yes, I know about them but I'm not an active user.

#37
#35 (comment)
#100 (comment)

psychopy/psychopy#1312
psychopy/psychopy#1327

psychopy/psychopy#2065
psychopy/psychopy#2529
psychopy/psychopy#4673

The sounddevice backend was added to PsychoPy around 2016, but apparently the latency wasn't good enough ("massive sound delay and jitter") and they added the PTB backend in 2019 which was much better.

I just saw that there was even a "mega-study": https://peerj.com/articles/9414/

compensating for delays with the time.outputBufferDacTime and time.currentTime value obtained in the callback function

I have the feeling that those values can be quite inaccurate, maybe depending on host API, drivers and physical hardware.

See also https://stackoverflow.com/questions/78869667/sounddevice-in-python-time-vector-of-input-and-output-signal-synchronized and #148 (comment).

This callback approach compared to a blocking stream.write brings down the delay

Yes, the blocking API is definitely bad for jitter, since it quantizes the start times to block boundaries (AFAIK).
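To put rough numbers on that quantization (my own back-of-the-envelope, not measurements from this thread): with the blocking API, a play request can land anywhere within a block, so the worst-case onset error is one block duration:

```python
sample_rate = 48_000  # Hz, assumed for illustration

for blocksize in (64, 256, 1024):
    # worst case: the play request arrives just after a block boundary,
    # so the sound cannot start until the next block
    jitter_ms = blocksize / sample_rate * 1e3
    print(f"blocksize {blocksize:4d} -> up to {jitter_ms:5.2f} ms onset jitter")
```

At 48 kHz, even a modest blocksize of 256 already allows over 5 ms of jitter, which is consistent with the callback approach (with `blocksize=0`) outperforming blocking writes.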

brings down the delay between the event and the audio onset to ~3 ms ± 1 ms. It is visibly slightly worse than the SoundPTB object, especially with a higher variability.

The variability might be due to the inaccurate time.outputBufferDacTime, but also due to the Python interpreter itself.

And for completion, a version with RawOutputStream

I'm not sure how much overhead is caused by the NumPy conversions, but if you don't need NumPy arrays, it's probably better to use the "raw" streams.
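To illustrate what a NumPy-free path could look like (a sketch with hypothetical helper names, not the code from the linked backend), the per-callback work reduces to slicing a pre-rendered `bytes` buffer and zero-padding the tail, which a `RawOutputStream` callback could copy into `outdata[:]`:

```python
import math
import struct

BYTES_PER_FRAME = 4  # float32, mono


def make_sine_bytes(
    frequency: float, duration: float, sample_rate: int, amplitude: float = 0.1
) -> bytes:
    """Render a float32 mono sine wave directly to bytes, without NumPy."""
    n = int(duration * sample_rate)
    samples = (
        amplitude * math.sin(2 * math.pi * frequency * i / sample_rate)
        for i in range(n)
    )
    return struct.pack(f"{n}f", *samples)


def fill_buffer(data: bytes, current_frame: int, frames: int) -> bytes:
    """Return exactly `frames` frames starting at `current_frame`, zero-padded
    past the end of `data`, ready to copy into a raw callback's outdata."""
    start = current_frame * BYTES_PER_FRAME
    chunk = data[start : start + frames * BYTES_PER_FRAME]
    return chunk + b"\x00" * (frames * BYTES_PER_FRAME - len(chunk))
```

In the callback this becomes `outdata[:] = fill_buffer(self._data, self._current_frame, frames)`; whether this actually beats the NumPy slicing would need profiling.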

Looks like I should look into https://github.com/spatialaudio/python-rtmixer

Yeah, that's what I was suggesting in my conversation with the PsychoPy devs back in the day, e.g. psychopy/psychopy#2065 (comment).

Note that it is still quite experimental (many TODOs: https://github.com/search?q=repo%3Aspatialaudio%2Fpython-rtmixer+TODO%3A&type=code) and there wasn't much development happening in recent times, but if it seems promising to you, I'm willing to put some more work into it.

@mgeier
Member

mgeier commented Sep 7, 2024

BTW, the original idea of the rtmixer module was to provide an example of how to use a callback function implemented in C that can communicate with Python code. The intention was not to provide a fully featured library (but it might still evolve into something like that).

What I mean is that maybe you should consider writing your own callback function in C (perhaps using the rtmixer code as inspiration), plus further C code that can interface with your parallel port trigger device. The goal would be to have no Python code between the trigger and whatever happens in the audio callback.
