All the mail mirrored from lore.kernel.org
* Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops
@ 2013-07-24  2:54 Smilen Dimitrov
  2013-07-24 13:03 ` Alan Horstmann
  2013-07-24 18:30 ` [Audacity-devel] Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops Richard Ash
  0 siblings, 2 replies; 16+ messages in thread
From: Smilen Dimitrov @ 2013-07-24  2:54 UTC
  To: alsa-devel; +Cc: portaudio, audacity-devel

Hi list(s),

Apologies in advance for the longish exposition - however, I have bumped into a problem with ALSA driver development, which raised some questions that I cannot really understand at the moment. I find it especially frustrating that I cannot formulate a simple question, but instead I have to resort to test code, captures and plots, to discuss my (mis)conceptions through those - so I hope at least someone will bear with this wall of text (some 23KB plaintext), and I'll be able to get some help with this.

I originally started working on a 16-bit, 44.1 kHz stereo ALSA driver for a device; and came to a point where I could achieve (what I thought was) full-duplex by running `arecord` and `aplay`, in separate shells, without any problems. However, if I tried to do the same from `audacity` - that is, have "record" running, while another track plays on the same card, with "Audacity Preferences/Recording/Overdub: Play other tracks while recording new one" enabled - then I'd experience some strange drops in the capture, which were not reported by any debug messages, even after I rebuilt and used PortAudio with debug messages on (my eventual goal is to have this experiment running in Audacity without drops).

(Therefore CC to audacity-devel, since I first started looking for "audacity full duplex drops" and couldn't find much; and CC to portaudio since some of my questions depend on correct understanding of PortAudio).


Anyway, I think I managed to reproduce the drop problem I get (with my device driver and `audacity`) by using a modified command-line program from the tests in the PortAudio distribution, and a modified `dummy` driver from the ALSA distribution. The complete code and test scripts are available in this directory:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/

I ran this on Ubuntu 11.04 Natty, Linux 2.6.38 and corresponding ALSA 1.0.24.2, audacity-1.3.13, PortAudio V19 - more about my machine and environment setup is in the Readme:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/Readme

In that directory there is a script, `run-alsa-pa-tests.sh`, which compiles the ALSA driver from `dummy-2.6.32-patest.c` and the PortAudio program from `patest_duplex_wire.c` with different compile-time defines, and then runs the program, obtaining verbose logs (some timestamped through `ftrace`), which end up as .csv files. The .csv files are finally plotted in the time domain by the gnuplot script `traceLogGraph.gp`, driven by the batch script `batch_traceLogFile.sh`. Note that you may have to change some hardcoded values in the source code if you want to run this collection on your machine - see the Readme for more.

The file `captures01-04.tar.gz` contains complete logs from four runs of `run-alsa-pa-tests.sh` (NB: over 70 MB when expanded), with respective `traceLogGraph.gp` gnuplot scripts that allow reconstruction of all .png images; the `captures_0*/` directories contain only some of the .pngs (those referenced here).


Back to the problem: once I started working with `patest_duplex_wire.c`, I realised there are actually two ways to implement "full-duplex", understood simply as "playback and record at the same time", in PortAudio:
* By using two separate `PaStream*`s, with two `Pa_OpenStream`s and `Pa_StartStream`s, started one right after the other - and using their separate "record" (capture) and "play" callbacks
* By using a single `PaStream*`, which correspondingly has only one callback ("wire" callback), which handles memory copying in both playback and capture directions

In Audacity, as far as I could see, there is only one PortAudio callback defined (audacity-1.3.13/src/AudioIO.cpp defines `audacityAudioCallback`), so Audacity basically has a single "wire" callback for full-duplex. Because of that, both Audacity and `patest_duplex_wire.c` (when using a single stream/"wire" callback) can enter this section of portaudio-v19/src/hostapi/alsa/pa_linux_alsa.c:

    if( !xrun )
    {
        /* Get the number of available frames for the pcms that are marked ready.
         * @concern FullDuplex If only one direction is marked ready (from poll), the number of frames available for
         * the other direction is returned. Output is normally preferred over capture however, so capture frames may be
         * discarded to avoid overrun unless paNeverDropInput is specified.
         */
        int captureReady = self->capture.pcm ? self->capture.ready : 0,
            playbackReady = self->playback.pcm ? self->playback.ready : 0;
        PA_ENSURE( PaAlsaStream_GetAvailableFrames( self, captureReady, playbackReady, framesAvail, &xrun ) );

        if( self->capture.pcm && self->playback.pcm )
        {
            if( !self->playback.ready && !self->neverDropInput )
            {
                /* Drop input, a period's worth */
                PA_MDEBUG(( "%s: full-duplex (not xrun): Drop input, a period's worth - fra:%lu \n", __FUNCTION__, *framesAvail )); //added
                assert( self->capture.ready );
                PaAlsaStreamComponent_EndProcessing( &self->capture, PA_MIN( self->capture.framesPerBuffer,
                            *framesAvail ), &xrun );
                *framesAvail = 0;
                self->capture.ready = 0;
            }
        }
        else if( self->capture.pcm )
            assert( self->capture.ready );
        else
            assert( self->playback.ready );
    }
...

... which is precisely the section that seems to be causing the full-duplex capture drops I experience. I think I tried `neverDropInput` with `patest_duplex_wire.c`, and it kept segfaulting - but in any case, I don't think Audacity (at least this version) has any UI to manipulate this parameter; so I'd rather code the ALSA driver properly, so PortAudio thinks it's getting a proper full-duplex stream from it.
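To make the boolean conditions above easier to map, here is a small user-space model of just the full-duplex branch of the excerpt. It is not PortAudio code: `struct duplex_state` and `frames_dropped()` are hypothetical names, and only the logic of the `if` chain is mirrored. Under this model, a drop needs nothing more than poll() having marked capture ready while playback was not, with `neverDropInput` off (that field corresponds to the `paNeverDropInput` stream flag; as far as I recall, portaudio.h documents that flag as valid only for full-duplex callback streams opened with paFramesPerBufferUnspecified). On the kernel side, ALSA's poll handler reports a running PCM as ready roughly when its `avail` reaches `avail_min`, so a driver can influence these flags only through the `hw_ptr` it reports.

```c
#include <assert.h>

/* Hypothetical model of the full-duplex branch in the pa_linux_alsa.c excerpt.
 * Only the boolean logic is mirrored; this is not PortAudio code. */
struct duplex_state {
    int has_capture;       /* self->capture.pcm != NULL  */
    int has_playback;      /* self->playback.pcm != NULL */
    int capture_ready;     /* self->capture.ready  (set from poll) */
    int playback_ready;    /* self->playback.ready (set from poll) */
    int never_drop_input;  /* self->neverDropInput (paNeverDropInput) */
};

/* Returns how many capture frames the "drop input" branch would discard,
 * or 0 when that branch is not taken. */
static unsigned long frames_dropped(const struct duplex_state *s,
                                    unsigned long capture_frames_per_buffer,
                                    unsigned long frames_avail)
{
    if (!(s->has_capture && s->has_playback))
        return 0;  /* half-duplex streams never take this branch */
    if (s->playback_ready || s->never_drop_input)
        return 0;  /* playback ready too, or dropping disabled */

    /* The excerpt asserts capture readiness when playback is not ready. */
    assert(s->capture_ready);

    /* PA_MIN(self->capture.framesPerBuffer, *framesAvail) */
    return capture_frames_per_buffer < frames_avail
               ? capture_frames_per_buffer
               : frames_avail;
}
```

For instance, with capture ready, playback not ready and 700 frames available, the model discards 512 frames for a 512-frame buffer, and nothing once `never_drop_input` is set.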

However, this is where I get stuck, because all I can see in the code snippet are boolean conditions. Thus I cannot see what the condition for entering this segment is in terms of variables visible from the ALSA driver (like dpcm->pcm_buf_pos, or snd_pcm_capture_avail/_hw_avail(substream->runtime)). In other words, I'd like to know what rule the ALSA driver variables should observe, so that the above "drop input" section is never entered when using a single PortAudio stream/"wire" callback. This is the reason why I tried adding some extra debugging statements to PortAudio (see `audacity-portaudio.patch` in the online folder), to obtain logs through `ftrace` - in the hope that I'd get accurate timestamps from both user- and kernel-space, against which variable values from the logs could be plotted. In brief, some of the variables on the plots are:

* frabCC/frabPC = framesAvail (Capture/Playback) in bytes from CallbackThreadFunc (pa_linux_alsa.c)
* frabCW/frabPW = framesAvail (Capture/Playback) in bytes from PaAlsaStream_WaitForFrames (pa_linux_alsa.c)
* frgbtc/frgbtp = framesGot (Capture/Playback) in bytes total (cumulative); CallbackThreadFunc (pa_linux_alsa.c)
* pav/cav/phwav/chwav = snd_pcm_playback/capture_avail/_hw_avail(substream->runtime) in bytes (dummy-2.6.32-patest.c)
* aptbC/aptbP/hptbC/hptbP = (Capture/Playback) substream->runtime->control->appl_ptr/hw_ptr in bytes (dummy-2.6.32-patest.c)

[ When a single PortAudio "wire" callback is used, only frabCC, frabCW and frgbtc appear (there are no separate playback variants then) - however, the driver still has separate timer functions/tasklets for each direction (playback or capture). ]
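For reference when reading the plots: the driver-side quantities `pav`/`cav`/`phwav`/`chwav` are all derived from `hw_ptr`, `appl_ptr` and the buffer size. The sketch below restates in plain user-space C how, as far as I know, the `snd_pcm_*_avail()` helpers in the kernel's `include/sound/pcm.h` compute them (in frames; the plots show bytes). `struct pcm_ptrs` is a hypothetical stand-in for the relevant `snd_pcm_runtime` fields. One consequence worth keeping in mind: `avail + hw_avail` always equals the buffer size, so `phwav` dropping below `pav` simply means that more than half of the playback buffer is free from the application's point of view.

```c
#include <assert.h>

/* Hypothetical mirror of the snd_pcm_runtime fields involved (all in frames). */
struct pcm_ptrs {
    long hw_ptr;       /* runtime->status->hw_ptr    */
    long appl_ptr;     /* runtime->control->appl_ptr */
    long buffer_size;  /* runtime->buffer_size       */
    long boundary;     /* runtime->boundary: wrap point of both pointers */
};

/* Frames the application may write (cf. snd_pcm_playback_avail). */
static long playback_avail(const struct pcm_ptrs *p)
{
    long avail = p->hw_ptr + p->buffer_size - p->appl_ptr;
    if (avail < 0)
        avail += p->boundary;
    else if (avail >= p->boundary)
        avail -= p->boundary;
    return avail;
}

/* Frames the application may read (cf. snd_pcm_capture_avail). */
static long capture_avail(const struct pcm_ptrs *p)
{
    long avail = p->hw_ptr - p->appl_ptr;
    if (avail < 0)
        avail += p->boundary;
    return avail;
}

/* The *_hw_avail variants are the complement within the buffer. */
static long playback_hw_avail(const struct pcm_ptrs *p)
{
    return p->buffer_size - playback_avail(p);
}

static long capture_hw_avail(const struct pcm_ptrs *p)
{
    return p->buffer_size - capture_avail(p);
}
```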

The original `dummy` driver has code for both Linux kernel system timers and high-resolution timers; on my system the driver builds automatically with hrtimers, which is what I'm interested in. Initially, I set the hrtimer functions to a period of one jiffy: since HZ in my OS is defined as 250, one jiffy is 1/HZ = 1/250 s = 4 ms. I then calculated bytes per period as the total bytes per second, 44100*2*2 = 176400, divided by HZ, which gives 176400/250 = 705.6; so I increased dpcm->pcm_buf_pos by 705 each time the hrtimer tasklet was called (those captures have folder names ending with `_jif`). That results in a behavior like this:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/captures_02_jif/trace_patest__14_DF_pr_512.csv_01.png

That is a capture with driver debug tracing (D), fixed bytes per period in the driver (F), playback+record callbacks in `patest_duplex_wire.c` (pr), and a request of 512 FRAMES_PER_BUFFER (512). What I find surprising here is that the driver's `chwav` (and `cav`) stay mostly constant, "resetting" themselves in just three "hits" of the timer/tasklet after 0.12 sec; while `pav` slowly rises (and `phwav` slowly falls) until a point after 0.12 sec, where they "reset" themselves in just one "hit" of the timer function. Furthermore, `frabPC`, `frabCC` and `frabPW` also follow the `pav` line (indicating that some frames were left 'avail' after the requested 512 frames were 'got' in PA's `CallbackThreadFunc`). For a while I thought that this slow "rise" was the cause of the drop - but it isn't, as that particular capture doesn't display a drop (and it cannot, because it doesn't use a single "wire" callback).

Then I thought that the reason for the rising slope of `pav` might be that I was taking too much time processing in the driver callbacks; but that can't be it, because the capture above is obtained with the "snd-dummy" driver - which does pretty much nothing in its hrtimer/tasklet function but increase the pcm_buf_pos counters (and memset a couple of bytes in the capture direction); I cannot really see that as a CPU hog?! Then I thought maybe it's a rounding problem, since I should be increasing the driver pointers by 705.6 bytes per timer tick, and I'm increasing them by 705 bytes instead - but then, that should result in _less_ data available for playback; while `pav` rising implies that _more_ data than needed is delivered to the PortAudio callback. Then I did another test, checking the delta between the kernel timestamps of the capture and playback driver tasklets separately; it turns out that even if the period is specified as 4000000 ns (4 ms), the deltas still gravitate around 3.99x ms. I also tried calculating cumulative bytes per second (based on the deltas and bytes per period) - and for different captures, they typically tended to stabilize at some +80 to +100 Bps above the required 176400 Bps. So, if the hrtimer indeed runs a bit faster, then we may indeed feed more data than needed from the driver to userspace (PA) - but then again, the slope of `pav` is at least 668 bytes per 12.6 ms (roughly 53000 Bps), which is >> 80 Bps; so I'm still not sure where exactly that slope comes from.
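The rate arithmetic in the last two paragraphs can be checked with a couple of trivial helpers (hypothetical names, plain C). With 705 bytes per 4 ms tick the driver nominally delivers 176250 Bps, i.e. 150 Bps less than the required 176400 Bps, while the observed `pav` slope of 668 bytes per 12.6 ms corresponds to roughly 53 kBps; neither the truncation nor a timer running some 80-100 Bps fast is of the same order of magnitude as that slope.

```c
#include <assert.h>

/* Nominal byte rate of an interleaved PCM stream. */
static long nominal_rate_bps(long rate, int channels, int bytes_per_sample)
{
    return rate * channels * bytes_per_sample;
}

/* Byte rate produced by a timer that advances the buffer position by
 * bytes_per_tick every period_ns nanoseconds (no fractional carry-over). */
static double delivered_rate_bps(double bytes_per_tick, double period_ns)
{
    assert(period_ns > 0.0);
    return bytes_per_tick * 1e9 / period_ns;
}
```

For example, `delivered_rate_bps(705.0, 4000000.0)` gives 176250, 150 Bps short of `nominal_rate_bps(44100, 2, 2)`, while `delivered_rate_bps(668.0, 12600000.0)` gives about 53016. Note also that 705 bytes is not a whole number of 4-byte stereo frames (it is 176.25 frames); whether that matters depends on how the driver converts `pcm_buf_pos` to frames in its `.pointer` callback.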

Then I noticed that the original `dummy.c` driver doesn't increase its counters by a fixed amount of bytes, but instead by what I'd call an "adaptive" amount of bytes: the delta to a base time is taken, and it can be used to calculate the expected (wrapped) buffer position for the requested rate (although `dummy.c` does this in frames, and does it in the `_pointer` callback). So I did the same in the hrtimer tasklet - and this seemingly does very little with respect to the `pav` slope:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/captures_02_jif/trace_patest__06_DA_pr_512.csv_01.png

... which is still present, with nearly the same slope as before. Unfortunately, passing "adaptive" amount of bytes/frames around is not the solution to the drops either - because here is a capture of a "drop input" (`frabDI`), with single "wire" callback requesting 512 FRAMES_PER_BUFFER - happening at approx 60 ms, even before the first correction happens (after 0.12 sec):

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/captures_02_jif/trace_patest__08_DA_w_512_drop.csv_01.png

The only thing I can notice here is that when the drop happens, `phwav` briefly "crosses" below the value of `pav` (which otherwise never happens; the same goes for cav/chwav) - but I'm not sure whether this is the cause or the effect of the drop. It can also be noted that for a PA single "wire" callback, the driver's playback and capture hrtimer tasklets hit very close to each other, and the corresponding variables are also very close - at the above resolution, the trace points for both nearly overlap.
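The "adaptive" pointer described earlier (delta to a base time, converted to a wrapped buffer position at the requested rate) can be sketched in user space as follows. The function name is hypothetical, and the microsecond resolution with rounding up is my assumption about how `dummy.c` does it; the point is only that each call recomputes the position from the base time, so no per-tick rounding error accumulates the way it does with a fixed 705-byte increment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: ring-buffer position (in frames) derived purely from
 * the time elapsed since base_us, at the nominal rate, rounded up, and
 * wrapped by the buffer size. */
static uint32_t adaptive_pos_frames(uint64_t now_us, uint64_t base_us,
                                    uint32_t rate, uint32_t buffer_frames)
{
    assert(now_us >= base_us && rate > 0 && buffer_frames > 0);

    /* elapsed microseconds -> frames, rounding up */
    uint64_t frames = ((now_us - base_us) * rate + 999999) / 1000000;

    return (uint32_t)(frames % buffer_frames);
}
```

For example, one second after the base time at 44100 Hz with a 4096-frame buffer the position is 44100 % 4096 = 3140, and 4 ms after it is 177 (176.4 frames rounded up).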

Another thing I suspected was FRAMES_PER_BUFFER in the PortAudio program, so the test script switches the setting in `patest_duplex_wire.c` between 512 and 0 (which is `paFramesPerBufferUnspecified`). Note that when using 0 frames per buffer:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/captures_02_jif/trace_patest__13_DF_pr_0.csv_01.png

... and playback+record callbacks, there is no longer a long slope of `pav`/`phwav`, and it self-corrects more quickly, while `cav`/`chwav` stay nearly constant. However, note also that the condition where `phwav` briefly "crosses" below the value of `pav` appears here again (but no "drop input" is detected, because we're not using a single "wire" callback).

However, even FRAMES_PER_BUFFER=0 does not seem to help, as it doesn't prevent the "drop input" for a single "wire" callback in the capture below:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/captures_02_jif/trace_patest__03_xA_w_0_drop.csv_01.png

Since I also suspected that the debug traces from the driver influence the drop, driver debug traces are disabled (x) in that capture - so driver tracing is evidently not the sole cause of the drop either.

Note that the original `dummy.c` kernel module calculates the hrtimer period in nanoseconds so that it matches the PCM period size; in that case, one does not have to check whether the current buffer position went over a period boundary, but can instead issue `snd_pcm_period_elapsed` directly at each call of the hrtimer tasklet. In this case, say for a pcm_period_size of 512 bytes (128 frames), we get an hrtimer period of 2902495 ns = 2.9 ms. This is controlled by the USE_JIFFY_PERIOD variable in the `dummy-2.6.32-patest.c` driver code, but not by `run-alsa-pa-tests.sh`, so it has to be changed manually; captures taken with USE_JIFFY_PERIOD have `_jif` appended to the folder name, while those matched to the period size have `_psz`.
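The period-matched timer described above amounts to converting one ALSA period from frames to nanoseconds. A minimal sketch (hypothetical function name; rounding up to the next nanosecond is an assumption about the original `dummy.c`) reproduces the 2902495 ns quoted for a 512-byte, i.e. 128-frame, period at 44.1 kHz:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: timer period in nanoseconds matching one ALSA period
 * of period_frames at the given rate, rounded up to the next nanosecond. */
static uint64_t period_ns(uint64_t period_frames, uint64_t rate)
{
    assert(rate > 0);
    return (period_frames * 1000000000ULL + rate - 1) / rate;
}
```

Here `period_ns(128, 44100)` yields 2902495, whereas a one-jiffy period at HZ=250 is a flat 4000000 ns regardless of the negotiated period size.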

Again, even this doesn't prevent the occurrence of full-duplex "drop input"s - in all, the following captures in the .tar.gz file feature drops:

    captures_01_psz/trace_patest__07_DA_w_0_drop.csv
    captures_01_psz/trace_patest__15_DF_w_0_drop.csv
    captures_02_jif/trace_patest__03_xA_w_0_drop.csv
    captures_02_jif/trace_patest__07_DA_w_0_drop.csv
    captures_02_jif/trace_patest__08_DA_w_512_drop.csv
    captures_02_jif/trace_patest__11_xF_w_0_drop.csv
    captures_03_psz/trace_patest__07_DA_w_0_drop.csv
    captures_03_psz/trace_patest__16_DF_w_512_drop.csv
    captures_04_jif/trace_patest__03_xA_w_0_drop.csv
    captures_04_jif/trace_patest__07_DA_w_0_drop.csv
    captures_04_jif/trace_patest__11_xF_w_0_drop.csv
    captures_04_jif/trace_patest__15_DF_w_0_drop.csv
    captures_04_jif/trace_patest__16_DF_w_512_drop.csv
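The capture filenames encode the test parameters: D/x for driver tracing on/off, A/F for adaptive/fixed byte count, pr/w for separate callbacks vs. a single "wire" callback, then FRAMES_PER_BUFFER, plus an optional `_drop` suffix. A small, hypothetical decoder makes it easy to tally which parameters the dropping runs share; it expects just the base name, without the `captures_0*/` directory prefix:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical decoded form of a name like "trace_patest__08_DA_w_512_drop.csv". */
struct capture_info {
    int run;                /* test number within run-alsa-pa-tests.sh  */
    int driver_trace;       /* 'D' = driver debug tracing on, 'x' = off */
    int adaptive;           /* 'A' = adaptive byte count, 'F' = fixed   */
    int wire;               /* "w" = single wire callback, "pr" = separate */
    int frames_per_buffer;  /* 512, or 0 for paFramesPerBufferUnspecified */
    int drop;               /* name carries the "_drop" suffix */
};

/* Returns 1 on success, 0 if the name does not follow the scheme. */
static int decode_capture_name(const char *name, struct capture_info *out)
{
    char trace, amount, mode[8];
    int run, fpb;

    assert(name != NULL && out != NULL);
    if (sscanf(name, "trace_patest__%d_%c%c_%7[^_]_%d",
               &run, &trace, &amount, mode, &fpb) != 5)
        return 0;
    if ((trace != 'D' && trace != 'x') || (amount != 'A' && amount != 'F'))
        return 0;
    if (strcmp(mode, "w") != 0 && strcmp(mode, "pr") != 0)
        return 0;

    out->run = run;
    out->driver_trace = (trace == 'D');
    out->adaptive = (amount == 'A');
    out->wire = (strcmp(mode, "w") == 0);
    out->frames_per_buffer = fpb;
    out->drop = (strstr(name, "_drop") != NULL);
    return 1;
}
```

Applied to the thirteen names listed above, every one decodes with `wire` set, while the other fields vary.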

As the filenames indicate:

* Regardless of whether the driver tunes its timer period to a jiffy (_jif) or to the period size (_psz);
* Regardless of whether I use debug trace statements in the driver (D), or not (x);
* Regardless of whether the driver "returns" adaptive (A) or fixed (F) amount of bytes;
* Regardless of whether we use paFramesPerBufferUnspecified=0 or 512 frames per buffer in PortAudio;

... as long as there is a single stream/"wire callback" used from PortAudio, a "drop input" is liable to happen when running `patest_duplex_wire.c` - and this, even with a virtual `dummy` driver, which has no communication with real hardware at all, and pretty much does nothing: its role of "reading"/"writing" or "returning" bytes is merely simulated by increasing the stream buffer position counters! The mind boggles - especially since, the way I see it, one should do nothing else but properly increase these buffer counters from the driver, in order to have PortAudio "believe" a "proper" full-duplex transfer is occurring?!... and yet, what this "proper" rule (of increasing driver buffer counters) should be, completely evades me.


Here I'd also like to note a few other observations. First, note that when the driver hrtimer period is scaled to the period size (_psz), `pav`/`phwav` became nearly constant, in comparison to the slopes still present with FRAMES_PER_BUFFER=0 (which confuses me, since both 705 bytes @ 4 ms and 512 bytes @ 2.9 ms should be about equally close to a rate of 44100 frames = 176400 bytes per second):

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/captures_03_psz/trace_patest__06_DA_pr_512.csv_01.png

... but then, as the filenames above indicate, drops happen even then - although somewhat less often, and only when debug tracing in the driver is enabled. Could it be that debug tracing directly from the driver has such an effect on performance as to influence the appearance of full-duplex "drop inputs" - even though precautions were taken, such as using `ftrace`'s `trace_printk` (instead of the usual syslog `printk`), and piping the debug output directly to RAM (by using `/dev/shm`)?

On the scale of a complete capture, I'd usually expect all cumulative pointers to essentially follow a single line; and when a capture goes well, that is usually how it appears - like in the image below:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/captures_04_jif/trace_patest__14_DF_pr_512.csv_.png

While it may be a bit difficult to follow all variables (recreate the plot in the interactive `wxt` terminal in `gnuplot`, to be able to turn off individual function plots), they all generally follow each other in the same general slope; also note that the capture ends at about 2 seconds (as programmed in `patest_duplex_wire.c`) - but playback lingers on for half a second more.

However, a full-duplex drop can seemingly happen even on a line that otherwise appears straight - like on the plot below (although the reason for that could be that the drop happens early in that capture):

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/captures_02_jif/trace_patest__08_DA_w_512_drop.csv_.png

... but usually, a "drop input" resets (at least) the driver's playback `hw_ptr` (and others), like on the capture below:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/captures_04_jif/trace_patest__07_DA_w_0_drop.csv_.png

Note that captures using separate playback/record PA callbacks (which cannot detect a "drop input") may show similar resets of playback `hw_ptr`:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/captures_03_psz/trace_patest__05_DA_pr_0.csv_.png

... but also the capture variables may reset:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/captures_03_psz/trace_patest__06_DA_pr_512.csv_.png

The above capture also shows `frgbtc` starting to deviate from `frgbtp` when these "resets" occur - but piecewise, `frgbtc` still seems to keep a slope parallel to `frgbtp` (and the rest of the cumulative position variables). However, this seems to be a case of a proper XRUN, as the respective log for that capture includes "PaAlsaStream_HandleXrun: restarting Alsa to recover from XRUN" (while the full-duplex "drop input" is specifically _not_ an xrun).

Note also that sometimes some variables turn out completely non-linear:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/captures_01_psz/trace_patest__05_DA_pr_0.csv_.png

... however, note also that `captures_01_psz` was taken with an older version of the scripts, where I may have possibly failed to increase `ftrace` buffering - so I think those interruptions in the capture data are actually an artifact of a tracing problem. Otherwise, is it possible that ALSA could "pause" the playback/capture hrtimer tasklets for a while? Interestingly, when I tried using `patest_duplex_wire.c` with my onboard HDA Intel soundcard (naturally, without any kernel driver variable tracing, only the PA part) - a lot of the times it ended up producing similar non-linear (and diverging) cumulative variable traces - however, it never triggered a full-duplex "drop input".


Finally, I suspect that these settings in the `dummy-2.6.32-patest.c` module may also play a role in the behavior that results in full-duplex drops (they are otherwise kept the same as in the original `dummy` module):

    static struct snd_pcm_hardware dummy_pcm_hardware = {
      .info =			(SNDRV_PCM_INFO_MMAP |
             SNDRV_PCM_INFO_INTERLEAVED |
             SNDRV_PCM_INFO_RESUME |
             SNDRV_PCM_INFO_MMAP_VALID),
    ...
      .buffer_bytes_max =	MAX_BUFFER_SIZE,
      .period_bytes_min =	64,
      .period_bytes_max =	MAX_PERIOD_SIZE, //(64*1024)
      .periods_min =		USE_PERIODS_MIN, //1
      .periods_max =		USE_PERIODS_MAX, //1024
      .fifo_size =		0,
    ...

... and my questions regarding them would be these (considering that the capture data should be available through the address substream->runtime->dma_area in the driver):

* Given that this is a virtual driver, there is no hardware with buffers whose status (how filled they are) could generate (hardware) interrupts; instead we "raise" hrtimer functions (softirq priority) directly from the kernel module - which further schedule a tasklet (with even lower priority), and where we have the opportunity to control the repetition period of the timer functions (and thus the expected transferred bytes per period, for a given audio rate). In this context, what is the meaning of `period_bytes_min` and `periods_min`? Can they be arbitrarily set - or should one set them in relation to the expected bytes per period achieved through the timer functions?
* Given that `dummy` is a virtual driver, and doesn't communicate with hardware - there is no actual DMA operation performed through it, right?
* If there is no actual DMA - is there actual MMAP? In other words, does the MMAP in SNDRV_PCM_INFO_MMAP refer to: memory mapping of a hardware card DMA bus address, to substream->runtime->dma_area; or to: memory mapping of kernel-space substream->runtime->dma_area, to whatever address user-space (e.g. snd_pcm_read()) uses to access it?
* If there is no actual DMA - should SNDRV_PCM_INFO_MMAP_VALID be then kept? (original `dummy` driver keeps it) What would be the meaning of SNDRV_PCM_INFO_MMAP_VALID in this context (of a virtual driver with hrtimer functions/tasklets?)
* What would be the meaning of SNDRV_PCM_INFO_JOINT_DUPLEX in this context? (I've seen [http://mailman.alsa-project.org/pipermail/alsa-devel/2007-December/004801.html "[alsa-devel] How do I use SNDRV_PCM_INFO_JOINT_DUPLEX?"], but I can't really relate it to this context) Could usage of SNDRV_PCM_INFO_JOINT_DUPLEX help avoid the PortAudio full-duplex drops?


And to summarize my general questions:

* Given the 'telecom' definition of full-duplex as "a point-to-point system composed of two connected parties or devices that can communicate with one another in both directions, simultaneously" (wiki), is there a stricter/more specific definition of "full-duplex":
** in terms of digital audio generally?
** specifically in PortAudio (I guess yes: "when a single callback is used for both playback and capture", but there's probably more in terms of buffer positions), and in ALSA (I guess no)?

* Given that none of the approaches used in the test (debug driver traces on vs. off; driver returning fixed vs. adaptive number of frames/bytes per period; driver tuning the period to pcm_period_size vs. to a jiffy; PA using fixed vs. using unspecified amount of frames per period) have explicitly prevented the appearance of single "wire" callback (full-duplex) drops; and given that the driver is a virtual one (merely updating buffer positions, which shouldn't be a CPU hog) - what could possibly be the reason for the full-duplex drops?

* Is it possible at all, to use the architecture of the `dummy` driver (hrtimer + tasklet just updating buffer positions) to code an ALSA driver, that will not generate full-duplex drops when used from PortAudio with single "wire" callback?
** If so, is there a metric/conditions in terms of ALSA driver variables, that should be followed to ensure proper full-duplex operation (under PortAudio with single callback)?

* Is it possible that the driver is OK - and instead I have a problem with coding of the single "wire" callback in PortAudio? (even though most of it is copied from examples)


Well, that was certainly a mouthful - but I hope at least I managed to provide a reproducible example, if not a coherent explanation of the problem...

Thanks in advance for any answers/pointers,
Cheers!


* Re: Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops
  2013-07-24  2:54 Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops Smilen Dimitrov
@ 2013-07-24 13:03 ` Alan Horstmann
  2013-07-25  0:29   ` Smilen Dimitrov
  2013-07-24 18:30 ` [Audacity-devel] Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops Richard Ash
  1 sibling, 1 reply; 16+ messages in thread
From: Alan Horstmann @ 2013-07-24 13:03 UTC
  To: alsa-devel; +Cc: portaudio, audacity-devel, Smilen Dimitrov

Hi Smilen,

Some comments below, since I am (probably the only person) on Alsa, Portaudio 
and Audacity lists (but no expert).  Evidently you have gone to some lengths 
on this.

On Wednesday 24 July 2013 03:54, Smilen Dimitrov wrote:
> Apologies in advance for the longish exposition - however, I have bumped
> into a problem with ALSA driver development, which raised some questions
> that I cannot really understand at the moment. I find it especially
> frustrating that I cannot formulate a simple question, but instead I have
> to resort to test code, captures and plots, to discuss my (mis)conceptions
> through those - so I hope at least someone will bear with this wall of text
> (some 23KB plaintext), and I'll be able to get some help with this.
>
> I originally started working on a 16-bit, 44.1 kHz stereo ALSA driver for a
> device; and came to a point where I could achieve (what I thought was)
> full-duplex by running `arecord` and `aplay`, in separate shells, without
> any problems. However, if I tried to do the same from `audacity` - that is,
> have "record" running, while another track plays on the same card, with
> "Audacity Preferences/Recording/Overdub: Play other tracks while recording
> new one" enabled - then I'd experience some strange drops in the capture,
> which were not reported by any debug messages, even after I rebuilt and
> used PortAudio with debug messages on (my eventual goal is to have this
> experiment running in Audacity without drops).

I think there are too many areas embroiled in this, and they need to be 
separated out.  If you have issues writing an Alsa driver, don't rely on 
Portaudio for testing, but use a range of audio players, or other means, and 
enlist help just from Alsa-devel, describing the hardware and ideally 
presenting your driver code.  Have you read Takashi Iwai's notes:

	http://www.alsa-project.org/~tiwai/writing-an-alsa-driver/index.html ?

I suspect the 'dummy device' may not be a good model and you are not correctly 
reporting the number of samples transferred to the hardware?

It is probably reasonable to consider Audacity not a focus of concern, since 
it simply uses Portaudio.  If you think there is a problem with Portaudio, 
run tests with standard sound hardware such as on-board, PCI or USB units 
known to work with Alsa, and take the focus off your own driver (since others 
will not be able to reproduce your custom results anyway).  There are few 
within Portaudio who are familiar with the Alsa code but we will try to 
assist if needed (I think that list has a 20K size limit IIRC).  Note that 
Audacity does not utilise the latest Portaudio code available, so consider 
getting a more recent version.

Hope this can help to move your problem forward.

Regards

Alan


* Re: [Audacity-devel] Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops
  2013-07-24  2:54 Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops Smilen Dimitrov
  2013-07-24 13:03 ` Alan Horstmann
@ 2013-07-24 18:30 ` Richard Ash
  1 sibling, 0 replies; 16+ messages in thread
From: Richard Ash @ 2013-07-24 18:30 UTC
  To: alsa-devel, portaudio, audacity-devel; +Cc: sd

On Wed, 24 Jul 2013 04:54:00 +0200
Smilen Dimitrov <sd@imi.aau.dk> wrote:

> I ran this on Ubuntu 11.04 Natty, Linux 2.6.38 and corresponding ALSA
> 1.0.24.2, audacity-1.3.13, PortAudio V19 - more about my machine and
> environment setup is in the Readme:
> 
>     http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/Readme

Please don't use such an old version of Audacity - we have updated our
Portaudio snapshot significantly since then. Bugs in obsolete code
really aren't interesting.

> In Audacity, as far as I could see, there is only one PortAudio
> callback defined (audacity-1.3.13/src/AudioIO.cpp has
> audacityAudioCallback;), so Audacity basically has a single "wire"
> callback for full-duplex.
Correct.

I can't help with anything else, sorry, except to agree with Alan that
you need to take as much as possible out of the equation to start with
- and that certainly means Audacity. If at all possible, use current
 versions of software, and a set-up that others can replicate as much
 as possible.

Richard


* Re: Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops
  2013-07-24 13:03 ` Alan Horstmann
@ 2013-07-25  0:29   ` Smilen Dimitrov
  2013-07-25  8:37     ` Clemens Ladisch
  0 siblings, 1 reply; 16+ messages in thread
From: Smilen Dimitrov @ 2013-07-25  0:29 UTC
  To: alsa-devel; +Cc: portaudio, audacity-devel

Hi Alan, Richard, 

Many thanks for your responses!


> 
> It is probably reasonable to consider Audacity not a focus of concern, since 
> it simply uses Portaudio.  [...]
> enlist help just from Alsa-devel, 

>  [...] agree with Alan that
> you need to take as much as possible out of the equation to start with
> - and that certainly means Audacity.

Fully agreed on that; the only reason I mentioned Audacity was (more or less) "search engine optimization": I initially met the problem there - and searching for "Audacity full duplex drop" didn't give me much at the time. And as it took me quite some time to dig up where the problem was coming from, I thought mentioning Audacity as a lead into the problem might save others looking up something similar in the future. Unfortunately, providing too much detail may sometimes shift the focus of the problem - it was absolutely not my intention to blame Audacity for anything [in fact, I know it is my own driver coding skills to blame :) ]; just to have a mention for future reference, and maybe to see if anyone from that community has had similar experiences. 

I am including audacity-dev in this reply, just so my response above is logged; however, I've modified the reply-to field in this post, to point only to portaudio and alsa-devel - so hopefully any future responses will be taken there.


> ... describing the hardware and ideally 
> presenting your driver code. 

I have already coded a virtual (that is, independent of any hardware) ALSA driver, and an example PortAudio program, that demonstrate the exact same problem I originally met with hardware and Audacity; the driver, program and test scripts are posted at:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/

... since I expect that attaching all those files to a mailing-list post would cause problems.


> 
> I think there are too many areas embroiled in this, and they need to be 
> separated out.  If you have issues writing an Alsa driver, don't rely on 
> Portaudio for testing, but use a range of audio players, or other means, [...]

I have used the default ALSA tools, running `arecord` in parallel with `aplay`, and full-duplex done that way works without any unexpected problems. It is only with PortAudio using a single/"wire" callback (and thus with Audacity) that I experience the full-duplex "drop input" problem.


> Have you read Takashi Iwai's notes:
> 
> 	http://www.alsa-project.org/~tiwai/writing-an-alsa-driver/index.html ?
>

Many times! :) My difficulty there is that the notes are written with respect to hypothetical devices, which I assume are PCI - and I don't really understand PCI. So, when reading those notes, I try to translate their meaning to virtual drivers; sometimes that works for me, and sometimes, unfortunately, it doesn't (as with this problem).

 
> I suspect the 'dummy device' may not be a good model [...] 

Thanks for mentioning this - so far, I haven't read any comment online on whether `dummy` is a good model of an actual hardware driver; judging from the way the timer functions are coded there, it seemed reasonable to me, so I assumed it is. Then again, I've tried to read the `snd-hda-intel` driver, and I have a hard time relating it to `dummy`, so it certainly doesn't model all details; but I hoped that at least the stream buffer positions (as eventually presented to user-space software) are correct. If anyone can confirm that `dummy` is not reliable in this respect, that would be great.


> [...] and you are not correctly 
> reporting the number of samples transferred to the hardware?

I agree that it must be the core of the problem - but I have a problem understanding why, given that, as far as I can tell, I'm doing everything right: I know I have a rate of 44100 frames per second; I either choose a period for the timer functions and calculate bytes per period to match the rate, or vice versa; and in each period, I increase the stream buffer position by that bytes-per-period amount (taking care of buffer wrapping). But PortAudio, when using a single callback for full-duplex, still detects a "drop input" - and so I'm at a loss figuring out what is being done wrong.
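To make the bookkeeping described above concrete, here is a minimal user-space sketch of it, assuming 16-bit stereo (4 bytes per frame), a 32-frame period and a 64-frame buffer; the struct and function names are hypothetical and are not taken from the actual driver code.

```c
#include <assert.h>

/* Hypothetical stream state, for illustration only. */
struct sim_stream {
    unsigned int frame_bytes;   /* bytes per frame: 2 channels * 2 bytes = 4 */
    unsigned int period_frames; /* frames per period, e.g. 32 */
    unsigned int buffer_frames; /* frames per buffer, e.g. 64 */
    unsigned int buf_pos;       /* current position, in bytes */
};

static unsigned int period_bytes(const struct sim_stream *s)
{
    return s->period_frames * s->frame_bytes;
}

static unsigned int buffer_bytes(const struct sim_stream *s)
{
    return s->buffer_frames * s->frame_bytes;
}

/* One simulated timer tick: advance by one period, wrapping at the buffer end. */
static void advance_one_period(struct sim_stream *s)
{
    assert(buffer_bytes(s) > 0);
    s->buf_pos = (s->buf_pos + period_bytes(s)) % buffer_bytes(s);
}
```

Starting from position 0, successive ticks give 128, 0, 128, ... bytes, i.e. the position simply alternates between the two period boundaries of the 256-byte buffer.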


> If you think there is a problem with Portaudio, 
> run tests with standard sound hardware such as on-board, PCI or USB units 
> known to work with Alsa, and take the focus off your own driver 

I have already tried the PortAudio program with the onboard HDA Intel card and its `snd-hda-intel` driver, and it doesn't produce a full-duplex drop. However, when I tried at the time to plot the PortAudio variables I captured, I remember getting some weird plots (e.g. the cumulative count of framesGot in the PortAudio callback made a curved line, not a straight one). I should probably try this test again, though.


> (since others 
> will not be able to reproduce your custom results anyway). 

> If at all possible, use current
>  versions of software, and a set-up that others can replicate as much
>  as possible.

Well, I was hoping - given that the problem in the OP is expressed in terms of an interaction between a virtual driver (no hardware) and a (relatively simple) PortAudio program - that it would be possible for others to reproduce the problem (at least on a matching kernel and PortAudio version), as it's all in software.

However, I'll also admit that my original post is somewhat dense - and it may not have been obvious on first reading that there is example code available, which should run on any typical Linux PC (without the need for additional card hardware).


> There are few 
> within Portaudio who are familiar with the Alsa code but we will try to 
> assist if needed (I think that list has a 20K size limit IIRC).  

Thanks for that, good to keep in mind. Well, I'm really hoping for a response from the alsa-devel folks, then - but given that this problem is triggered (as far as I know) only in a specific section of PortAudio (even if by my own faulty driver code), I hope it is OK to keep the `portaudio` list in CC. I was also hoping that some PortAudio folks may have encountered the "full duplex drop" in their own development (even if on another OS, and with actual soundcards), and gained some insight into it that they'd be willing to share.


> Please don't use such an old version of Audacity - we have updated our
> Portaudio snapshot significantly since then. Bugs in obsolete code
> really aren't interesting.

> Note that 
> Audacity does not utilise the latest Portaudio code available, so consider 
> getting a more recent version.
> 

The thing is, this work is part of a bigger project of mine, and in the name of reproducibility, I wanted to fix all my deliverables to the OSs I started working with (which is why I still use old versions). Given the "full duplex drop" is not a problem in PortAudio per se - but a problem with my ALSA driver code - I don't see a reason why the driver shouldn't work with these older versions, when/if it gets fixed. 


>> In Audacity, as far as I could see, there is only one PortAudio
>> callback defined (audacity-1.3.13/src/AudioIO.cpp has
>> audacityAudioCallback;), so Audacity basically has a single "wire"
>> callback for full-duplex.
> Correct.

Thanks for confirming that - it would mean (hopefully) that my PortAudio program `patest_duplex_wire.c`, when using a single "wire" callback, is a reasonably accurate approximation of Audacity's behavior; and therefore, if the driver problem gets solved with `patest_duplex_wire.c`, it should also be solved in Audacity.


> Hope this can help to move your problem forward.
> 

I think I definitely should try running the experiments on the onboard "HDA Intel" soundcard, and try to get some more insight into how its driver manages its stream buffer positions - but I still hope some from alsa-devel may recognize the problem I'm dealing with, and let me know where I am going wrong in the buffer position calculation in my virtual driver (`dummy-2.6.32-patest.c`).

Many thanks again for the responses, 
Cheers!


* Re: Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops
  2013-07-25  0:29   ` Smilen Dimitrov
@ 2013-07-25  8:37     ` Clemens Ladisch
  2013-08-04  0:05       ` Smilen Dimitrov
                         ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Clemens Ladisch @ 2013-07-25  8:37 UTC
  To: Smilen Dimitrov; +Cc: portaudio, alsa-devel, audacity-devel

Smilen Dimitrov wrote:
>> [...] and you are not correctly
>> reporting the number of samples transferred to the hardware?
>
> I agree that it must be the core of the problem - but I have problem
> understanding why, given I currently perceive that I'm doing
> everything right: I know I have a rate of 44100 frames per second; I
> choose either a period for timer functions, and calculate bytes per
> period to match the rate, or vice versa; and in each period, I
> increase stream buffer positions for that bytes per period amount
> (taking care of buffer wrapping).

Your driver's .pointer callback must report the *actual* position at
which the hardware has finished reading from the buffer.  You *must*
read some hardware register of your DMA controller for this.  It is not
possible to deduce this from the current time because the clocks do not
run at the same speed, and any kind of buffering will introduce more
errors.

The dummy driver uses a timer because there is no actual hardware.


Regards,
Clemens


* Re: Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops
  2013-07-25  8:37     ` Clemens Ladisch
@ 2013-08-04  0:05       ` Smilen Dimitrov
  2013-08-06 10:59         ` Clemens Ladisch
  2013-08-14 14:30       ` Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops (playback) Smilen Dimitrov
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Smilen Dimitrov @ 2013-08-04  0:05 UTC
  To: alsa-devel; +Cc: portaudio, Clemens Ladisch

Hi Clemens, 

Many thanks for your reply - and apologies it took me a while to write back (and for a longish email again). Since reading your reply, I've spent most of my time coding a new test case for discussion, now posted here (see Readme for more):

    http://sdaaubckp.sourceforge.net/post/alsa-capttest/
    http://sdaaubckp.sourceforge.net/post/alsa-capttest/Readme

I took the previous advice; this is an ALSA-only (and capture-only, to keep it simple) test, trying to explore what happens when two `snd_pcm_readi` calls:

    ret1 = snd_pcm_readi(capture_pcm_handle, audiobuf, period_32_frames);
    ret2 = snd_pcm_readi(capture_pcm_handle, audiobuf, period_32_frames);

... are run in succession, at 44100Hz/16b/stereo (with period_size=32 and buffer_size=64 frames, resulting in a period time of 725.6 microseconds), in two contexts: 1) with my onboard PCI 'snd_hda_intel' card; and 2) with the virtual 'snd_dummy' driver. Hopefully it will help me clarify some of my questions - and eventually result in a virtual ALSA driver that does not trigger the full-duplex drop in PortAudio.
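For reference, the period and buffer durations follow directly from frames divided by rate; a small sketch (the helper name is hypothetical):

```c
#include <assert.h>

/* Duration of 'frames' frames at 'rate' frames per second, in microseconds. */
static double frames_to_us(unsigned long frames, unsigned int rate)
{
    assert(rate > 0);
    return 1e6 * (double)frames / (double)rate;
}
```

With these settings, frames_to_us(32, 44100) is about 725.6 and frames_to_us(64, 44100) about 1451.2, so the whole buffer holds only roughly 1.45 ms of audio.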

On 2013-07-25 10:37, Clemens Ladisch wrote:
>>> [...] and you are not correctly
>>> reporting the number of samples transferred to the hardware?
>>
>> I agree that it must be the core of the problem - but I have problem
>> understanding why, given I currently perceive that I'm doing
>> everything right: I know I have a rate of 44100 frames per second; I
>> choose either a period for timer functions, and calculate bytes per
>> period to match the rate, or vice versa; and in each period, I
>> increase stream buffer positions for that bytes per period amount
>> (taking care of buffer wrapping).
>
> Your driver's .pointer callback must report the *actual* position at
> which the hardware has finished reading from the buffer.  You *must*
> read some hardware register of your DMA controller for this.


I understand this - and I would agree with it if I had a case where my driver talks to an actual hardware card. However, since here I'm interested in the operation of a virtual (platform) driver, which talks to _no_ soundcard hardware - how could I possibly read a hardware register of a card that doesn't exist? (Maybe it's the "...transferred to the hardware..." mention at the start of the quote that gave the wrong impression of my focus in this case? My statement in the quote refers to what I'm trying to do with the _virtual_ "dummy" driver.)

Anyway, now that it's mentioned, I wanted to make sure I've conceptually understood the reporting of the actual position at "which the hardware has finished reading from the buffer" in the context of actual soundcard hardware - so here is a diagram based on my onboard PCI 'hda-intel' card (replace ".png" with ".pdf" in the link to get a text-searchable PDF version):

    http://sdaaubckp.sourceforge.net/post/alsa-capttest/montage-hda-intel.png

Here's what I'm trying to show on it: I'm assuming that the card has its own internal capture buffer memory on board; and ALSA (the hda_intel driver, actually) manages the equivalent of `substream->runtime->dma_area` (the hda_intel driver actually manages its own `area` pointer, as part of a `chip` structure) as capture buffer memory in RAM. The main purpose of the ALSA driver, then, is to manage copying the data from the internal card capture buffer to the `dma_area` capture buffer in the PC's RAM; once the data is in the `dma_area`, the rest of the ALSA engine will make sure that the data ends up in `audiobuf` in user-space, upon a call to `snd_pcm_readi` as given above.

What the diagram is intended to show is: the card starts filling its internal capture buffer soon after `snd_pcm_start`; since the period is set to 32 frames, when the card reaches this boundary in its buffer, it generates an interrupt ("CardIRQ?"); the kernel handles this hardware interrupt by eventually calling the `azx_interrupt()` handler of the hda-intel driver. [[[~ Obviously, I cannot measure the actual interrupts generated by the card, so the "CardIRQ?" positions are interpolated - based on reported kernel interrupt entries, but only those where the `azx_interrupt` handler has been called (since it's also possible to capture interrupt entries for power, for instance); the shown filling of the buffer is then interpolated based on this. ~]]]

On the PC side, the driver's .pointer callback can be triggered both by a userspace call to `snd_pcm_readi` and (apparently) independently of it - but (surprisingly to me) it is not necessarily periodic! The `dma_area` filling shown is based on the actual position returned by the .pointer callback (as far as space allows). Seen from a distance, the returned .pointer position seems to track the (idealized) filling of the internal capture buffer well. [[[~ however, this may also be due to the .pointer callback (`azx_pcm_pointer()`) occasionally being called in quick succession (apparently in the context of the `snd_pcm_capture_ioctl()` function). ~]]]

Is the above understood correctly? And does the observation, that (apparently):
* the "filling" of the `dma_area` buffer on the PC side "tracks well" the "filling" of the internal capture buffer on the card side;
... correctly illustrate the nature of the .pointer "reporting of the actual position at which the hardware has finished reading from the buffer"?


> It is not
> possible to deduce this from the current time because the clocks do not
> run at the same speed, and any kind of buffering will introduce more
> errors.

Thanks for mentioning this - I had otherwise completely forgotten about clock domains; so this was the comment that got me working on this test round! First, I'd like to make sure I understand "the clocks do not run at the same speed" properly:

In the `montage-hda-intel.png` diagram above, there are three time axes, shown vertically. It is assumed that the card hardware has its own separate crystal oscillator (XOcard), in addition to the PC having its own crystal (XOpc), as clock sources; consequently, they have separate time axes, "(Card) Time" and "(PC) Time", given there will always be some mismatch between the frequencies they generate. The leftmost axis is what I've called "(Real) Time", and it serves only as a reminder; I guess it would represent the clock of an "independent observer", or the "developer clock" - or the "global date & time clock" (such as retrieved by `ntpdate`). By the way, when I see "wall clock" referred to in code, does that refer to what I've called "Real Time"?

The diagram takes the "(Real) Time" axis to have a time unit == 1; the "(Card) Time" is set to have a 1.5% smaller (faster) unit than 1; and the "(PC) Time" is set to have a 1.5% larger (slower) unit than 1. The value of 1.5% is chosen arbitrarily, so the mismatch is more obviously visible through the diagram axes' ticks. Does this illustrate the nature of "the clocks do not run at the same speed" properly (in general)?

Anyway - I do understand that it is impossible to deduce the (XOcard) timing based solely on algorithms running on (XOpc) timing. And I agree it would have been a problem if I were working with actual soundcard hardware.
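As a rough illustration of why a position derived from the PC clock cannot follow a real card, the sketch below computes how quickly two clocks drift apart at 44100 Hz. The 50 ppm figure used in the note is an assumed, illustrative crystal mismatch, not a measurement from either machine; the 1.5% used in the diagram corresponds to 15000 ppm. Function names are hypothetical.

```c
#include <assert.h>

/* Frames of drift per second between two clocks that differ by 'ppm' parts per million. */
static double drift_frames_per_s(unsigned int rate, double ppm)
{
    return (double)rate * ppm * 1e-6;
}

/* Seconds until the accumulated drift reaches 'frames' frames. */
static double seconds_to_drift(unsigned int rate, double ppm, double frames)
{
    double d = drift_frames_per_s(rate, ppm);
    assert(d > 0.0);
    return frames / d;
}
```

At an assumed 50 ppm the drift is about 2.2 frames per second, so a position computed from the PC clock would be a full 32-frame period off after roughly 14.5 seconds; at the diagram's exaggerated 1.5% it would be about 661.5 frames per second.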

However, since I'm inquiring about a virtual driver, there is no actual card hardware, and consequently no actual (XOcard) crystal oscillator/clock; as you've noted:


>
> The dummy driver uses a timer because there is no actual hardware.
>

Right - I have tried to visualise this on the diagram below (again, replace `.png` with `.pdf` for a PDF version):

    http://sdaaubckp.sourceforge.net/post/alsa-capttest/montage-dummy.png

That diagram should clearly show that, in the context of a virtual (platform) "dummy" driver, there is no actual card hardware targeted - nor a corresponding oscillator (with a corresponding independent clock domain). The large red arrows, which indicated the hardware IRQ in the `hda-intel` diagram, now indicate the timer function's softirq entry - and since these originate from the CPU itself, they now point from the other side (note also that I'm reusing the "CardIRQ?" engine to render these interrupt entries on the card axis as well, under the cross-out). Only one buffer filling (that of `dma_area`) is shown - and as there is no internal card buffer now, there is nothing to compare this `dma_area` filling process to; the filling shown is based solely on what the .pointer callback returns.

And herein lies the crux of my inquiry: given there is no hardware targeted with a virtual driver, I can in principle return whatever I want as a .pointer position. Logic would say that the value returned from .pointer should increase (in this case) by 32 frames each PCM period (726 microseconds) - and it is with this in mind that the timer functions are run in the dummy driver. Userspace, in principle, deals only with this layer of information - so I should be able to simulate proper operation to userspace just by increasing this .pointer value properly. However, even if I do that, I still somehow manage to trigger a full-duplex drop in the PortAudio userspace layer - which means I'm still going wrong somewhere with the .pointer position calculation, even if I believe I'm doing it right.

The full-duplex part, of course, is not handled in this test, since it is capture-only; however, I notice some things that may eventually have an influence:

* The .pointer callback can be called in quick succession, in context of `snd_pcm_capture_ioctl`, with both `dummy` and `hda-intel` drivers.
* In both cases, right after `snd_pcm_start` is called, .pointer is called, returning zero, BUT:
** with `hda-intel` driver, the card immediately raises an interrupt here - making for a total of 3 interrupts in a 2ms capture;
** with the `dummy` driver, the timer function is just scheduled at the start, but does not fire at the start; it first fires only after a period has expired - making for a total of 2 interrupts in a 2 ms capture.
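The interrupt counts above are consistent with an idealized model (no jitter at all) in which interrupts arrive exactly every 725.6 μs, with or without one at t = 0. A sketch of that model, where the function name is hypothetical:

```c
#include <assert.h>

/*
 * Count idealized period interrupts inside [0, window_us].
 * fires_at_start != 0: one interrupt at t = 0, as observed with hda-intel.
 * fires_at_start == 0: the first one comes after a full period, as with the dummy timer.
 */
static unsigned int count_irqs(double window_us, double period_us, int fires_at_start)
{
    unsigned int n = 0;
    double t = fires_at_start ? 0.0 : period_us;

    assert(period_us > 0.0);
    for (; t <= window_us; t += period_us)
        n++;
    return n;
}
```

With a 2000 μs window, counting from t = 0 gives 3 interrupts (at 0, ~725.6 and ~1451.2 μs), while starting one period later gives 2.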


Also, just by looking at `montage-hda-intel.png` vs. `montage-dummy.png`, one would gather that the hardware IRQ runs slightly faster than the expected 726 μs period, while the timer function softirq runs somewhat slower. However, that is inaccurate - both drivers' timing IRQs can jitter in either direction; this is especially obvious if several captures are made and the corresponding plots animated, as shown in the animated .gif here:

    http://sdaaubckp.sourceforge.net/post/alsa-capttest/capttest.gif

In that gif, the plots which were vertical in the montages are now shown horizontally; the 2 ms capture of `dummy` is shown on top, while the capture of `hda_intel` is shown below it. First, focusing on the "(Card) Time" track, it is noticeable that `hda-intel` fires an additional "start" interrupt, which has no counterpart in `dummy`. However, it is also noticeable that:

* Focusing on the "(Card) Time" track, the `hda-intel` hardware interrupts are much less jittery (and follow the expected 726 μs period more closely) than the timer interrupts of `dummy`;

I'm not sure if this is the expected behaviour. As far as I know, the priority order of interrupts in Linux is (crudely):

    hardware IRQ > softIRQ (software IRQ) > task switching/scheduling > everything else

The card, then, uses its own clock, which is not burdened with anything else but filling buffers, meaning we can expect tight timing here; and when it generates an IRQ, it is handled by the kernel with the highest priority - ergo, not so much jitter. Timer functions, on the other hand, run in softIRQ context - meaning they (and their scheduling) could be preempted by the hardware IRQ of any other device on the system; ergo, more jitter. Is this reasonable to assume?
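To compare the jitter of the two drivers numerically rather than by eye, one could reduce each capture's interrupt timestamps to a mean interval and a worst-case deviation from the nominal period. A minimal sketch, assuming the timestamps have already been extracted from a capture, in microseconds; the helper is hypothetical and not part of the posted test code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * From n interrupt timestamps (microseconds), compute the mean interval and
 * the largest absolute deviation of any single interval from nominal_us.
 */
static void interval_stats(const double *ts, size_t n, double nominal_us,
                           double *mean_us, double *max_dev_us)
{
    double sum = 0.0, max_dev = 0.0;
    size_t i;

    assert(n >= 2);
    for (i = 1; i < n; i++) {
        double iv = ts[i] - ts[i - 1];
        double dev = iv - nominal_us;

        if (dev < 0.0)
            dev = -dev;
        sum += iv;
        if (dev > max_dev)
            max_dev = dev;
    }
    *mean_us = sum / (double)(n - 1);
    *max_dev_us = max_dev;
}
```

Running this over several captures per driver would give one pair of numbers per capture to compare, instead of relying on the animated plots.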

* When .pointer runs in quick succession, that usually results in a "correction" for `hda-intel` - but `dummy`'s position remains the same in the same situation.

I just noticed this while writing this mail; otherwise I didn't pay much attention to it. But I just remembered that the original `dummy` driver calculates the delta, and returns the pointer position based on it, in the .pointer callback itself:

    dummy_hrtimer_pointer(struct snd_pcm_substream *substream)
    {
      ...
      delta = ktime_us_delta(hrtimer_cb_get_time(&dpcm->timer), dpcm->base_time);
      ...
      div_u64_rem(delta, runtime->buffer_size, &pos); // this sets pos
      return pos;
    }

... and in doing so, it could simulate the "correction" that `hda-intel` also makes when called in quick succession.
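A user-space re-creation of that time-derived pointer computation, assuming the step elided by "..." in the quoted code converts the microsecond delta to frames at the stream rate, rounding up; the function name is hypothetical and this is not kernel code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Elapsed microseconds since stream start -> frames (rounded up) -> position
 * wrapped to the buffer size. Assumes this matches the elided conversion step.
 */
static uint32_t pointer_from_elapsed(uint64_t elapsed_us, uint32_t rate,
                                     uint32_t buffer_size)
{
    uint64_t frames;

    assert(buffer_size > 0);
    frames = (elapsed_us * rate + 999999) / 1000000;
    return (uint32_t)(frames % buffer_size);
}
```

For example, calls at 725 μs and 750 μs after start return 32 and 34, so a pointer computed at call time keeps moving between timer ticks, much like the hda-intel "correction".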

However, in my version of the `dummy` driver, I'm also trying to write a few pulses to the `dma_area` - therefore I actually manage the position (returned by .pointer) in the hrtimer tasklet in a variable `pcm_buf_pos`:

    static void dummy_hrtimer_pcm_elapsed(unsigned long priv) // this is the tasklet
    {
      ...
      delta = ktime_us_delta(hrtimer_cb_get_time(&dpcm->timer), dpcm->base_time);
      ...
      div_u64_rem(delta, runtime->buffer_size, &pos); // this sets pos
      ...
      dpcm->pcm_buf_pos = frames_to_bytes(runtime, pos);
      ...
    }

... and in the .pointer, I simply return this number:

    dummy_hrtimer_pointer(struct snd_pcm_substream *substream)
    {
      ...
      pos = bytes_to_frames(runtime, dpcm->pcm_buf_pos);
      return pos;
    }

Now, note that the `capttest.gif` animation shows the jitter of the timer *(soft)IRQ entry*; however, the timer function itself just schedules the tasklet to run even later - and this is also visible in `montage-dummy.png`, where it can be seen that the tasklet `dummy_hrtimer_pcm_elapsed` usually runs up to some 100 μs *after* the timer IRQ entry! This probably has an influence on the .pointer position calculation - but can it be to such a degree as to cause a PortAudio drop in full-duplex mode?
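To get a feel for how far a cached position can trail a time-derived one, here is an idealized sketch: the timer fires exactly every 726 μs, and the tasklet refreshes the cached value a fixed latency later. The 100 μs latency is the rough figure mentioned above; everything else (names, exact periodicity, truncating conversion) is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Frames elapsed after t_us microseconds at 'rate', truncating. */
static uint64_t frames_at(uint64_t t_us, uint32_t rate)
{
    return t_us * rate / 1000000u;
}

/*
 * Frames a cached pointer would report at now_us, if the cache is refreshed
 * only by a tasklet running latency_us after each (perfectly periodic) timer
 * expiry, storing the time-derived position at that moment.
 * The result is not wrapped to the buffer size.
 */
static uint64_t cached_frames(uint64_t now_us, uint64_t period_us,
                              uint64_t latency_us, uint32_t rate)
{
    uint64_t runs, last_update_us;

    assert(period_us > 0);
    if (now_us < period_us + latency_us)
        return 0; /* the tasklet has not run yet */
    runs = (now_us - latency_us) / period_us; /* tasklet runs so far */
    last_update_us = runs * period_us + latency_us;
    return frames_at(last_update_us, rate);
}
```

With these numbers, at t = 1500 μs the cached value is still 36 frames while a time-derived one is 66, a lag of 30 frames; just before each refresh the lag approaches one full period. Whether that alone is enough to trigger the PortAudio drop is exactly the open question.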


Since I've mentioned writing to the `dma_area`: `hda-intel` probably sets up the DMA controller to transfer the data from the internal capture memory to the PC's RAM - and as such, the transfer/copy uses no CPU cycles (CPU time). Whereas in my `dummy`, just by trying to `memset` (not even copy) a few bytes, I'm using extra CPU cycles - could this also contribute to increased jitter?

And now that DMA is mentioned, I might as well ask again:

* What is the meaning of MMAP in context of SNDRV_PCM_INFO_MMAP? Is it:
** A memory map from the card's internal buffer, via DMA, to the `dma_area` in PC RAM; or
** A memory map from `dma_area` in kernel space, to whatever buffer is referred to in user space?


Well, I hope someone will be able to confirm whether I am right in my understanding so far - and point out where I am otherwise wrong...

Thanks in advance for any feedback,
Cheers!
_______________________________________________
Portaudio mailing list
Portaudio@music.columbia.edu
http://music.columbia.edu/mailman/listinfo/portaudio


* Re: Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops
  2013-08-04  0:05       ` Smilen Dimitrov
@ 2013-08-06 10:59         ` Clemens Ladisch
  2013-08-06 11:41           ` David Henningsson
  2013-08-08  2:50           ` Smilen Dimitrov
  0 siblings, 2 replies; 16+ messages in thread
From: Clemens Ladisch @ 2013-08-06 10:59 UTC
  To: Smilen Dimitrov; +Cc: alsa-devel

Smilen Dimitrov wrote:
> On 2013-07-25 10:37, Clemens Ladisch wrote:
>> Your driver's .pointer callback must report the *actual* position at
>> which the hardware has finished reading from the buffer

... for a playback stream, or finished writing, for a capture stream.

> I'm interested in the operation of a virtual (platform) driver,
> which talks to _no_ soundcard hardware

And what does this driver do?  What is your goal?

> I'm assuming that the card has it's own intern capture buffer memory
> on board;

No modern card has this.  All data is immediately read from/written to
main memory.

> and ALSA manages the equivalent of `substream->runtime->dma_area`
> as capture buffer memory in RAM. The main purpose of the ALSA driver,
> then, is to manage the copying the data from the intern card capture
> buffer, to the `dma_area` capture buffer in RAM of the PC;

This is handled by the hardware's DMA.

> once the data is in the `dma_area`, the rest of the ALSA engine will
> make sure that data ends up in `audiobuf` in user-space, upon a call
> to `snd_pcm_readi` as given above.

Yes.  (When using snd_pcm_mmap_*, the dma_area is mapped to userspace,
and ALSA itself will never access the contents of the buffer.)

> [...] where the `azx_interrupt` handler has been called (since it's
> also possible to capture interrupt entries for power, for instance);

The stream's SD_STS register tells whether this is an interrupt because
a period boundary has been crossed.
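A sketch of the check described above, assuming the HD Audio specification's layout of the per-stream status register (buffer completion at bit 2, FIFO error at bit 3, descriptor error at bit 4); the macro and function names here are hypothetical, not the ones used in the hda_intel driver:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SDnSTS bit layout (HD Audio spec); macro names are hypothetical. */
#define SDSTS_BCIS  0x04u /* buffer completion: a period boundary was crossed */
#define SDSTS_FIFOE 0x08u /* FIFO error */
#define SDSTS_DESE  0x10u /* descriptor error */

/* Nonzero if this stream's status byte reports a crossed period boundary. */
static int period_boundary_crossed(uint8_t sd_sts)
{
    return (sd_sts & SDSTS_BCIS) != 0;
}

/* Nonzero if this stream raised any of the interrupt causes listed above. */
static int stream_interrupt_pending(uint8_t sd_sts)
{
    return (sd_sts & (SDSTS_BCIS | SDSTS_FIFOE | SDSTS_DESE)) != 0;
}
```

An interrupt whose status byte lacks the completion bit was raised for some other reason and should not be counted as a period boundary.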

> On the PC side, the driver's .pointer callback can be triggered both
> by userspace call to `snd_pcm_readi`, and (apparently) independently
> of it - but (surprising for me) it is not necessarily periodic!

The .pointer callback is triggered by snd_pcm_period_elapsed() (because
some more data, or even another period, might have been transferred in
the meantime), whenever userspace writes or reads samples, or whenever
userspace feels like asking for the current position.

> when I see "wall clock" referred to in code, does that refer to
> this, which I've called "Real Time"?

It's called "wall clock" because Intel named the register this way;
actually, it's the device's sample clock.

> And herein lies the crux of my inquiery: given there is no hardware
> targetted with a virtual driver, I can in principle return whatever I
> want as a .pointer position.

If you are actually transferring samples from/to somewhere, you must
return the status of these transfers.

> I should be able to simulate a proper operation to userspace, just by
> increasing this .pointer value properly. However, even if I do that,
> I still manage to somehow trigger a full-duplex drop in the PortAudio
> userspace layer

A buffer length of about 1 ms is very likely to result in over/underruns,
regardless of what your .pointer callback does.

Better try with a 32000-frame buffer first.

> The card then, uses it's own clock, which is not burdened with
> anything else but filling buffers, meaning we can expect tight timing
> here; and when it generates an IRQ, it is handled by kernel with
> highest priority - ergo, not so much jitter. Timer functions, on the
> other hand, run in softIRQ context - meaning they (and their
> scheduling) could be preempted by the hardware IRQ of any other device
> on the system; ergo, more jitter. Is this reasonable to assume?

Yes, but other hardware interrupts interfere only if other devices are
used at the same time.


Regards,
Clemens


* Re: Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops
  2013-08-06 10:59         ` Clemens Ladisch
@ 2013-08-06 11:41           ` David Henningsson
  2013-08-06 13:04             ` Clemens Ladisch
  2013-08-08  2:50           ` Smilen Dimitrov
  1 sibling, 1 reply; 16+ messages in thread
From: David Henningsson @ 2013-08-06 11:41 UTC
  To: Clemens Ladisch; +Cc: alsa-devel, Smilen Dimitrov

On 08/06/2013 12:59 PM, Clemens Ladisch wrote:
> Smilen Dimitrov wrote:
>> On 2013-07-25 10:37, Clemens Ladisch wrote:
>>> Your driver's .pointer callback must report the *actual* position at
>>> which the hardware has finished reading from the buffer
> 
> ... for a playback stream, or finished writing, for a capture stream.

Hi Clemens,

I'm not involved with Smilen but still find the questions interesting,
so as always, thanks for sharing your knowledge :-)

What if the pointer granularity is very coarse? E.g., some hardware might
only be able to tell what period you're in (IIRC, I've seen this on the Tegra
platform), rather than the actual sample. Would you recommend to report
the latest period boundary in that case, or interpolating it with timers?

This is also interesting to PulseAudio which likes to rewind buffers and
so on, and relies on a good pointer granularity.

>> The card then, uses it's own clock, which is not burdened with
>> anything else but filling buffers, meaning we can expect tight timing
>> here; and when it generates an IRQ, it is handled by kernel with
>> highest priority - ergo, not so much jitter. Timer functions, on the
>> other hand, run in softIRQ context - meaning they (and their
>> scheduling) could be preempted by the hardware IRQ of any other device
>> on the system; ergo, more jitter. Is this reasonable to assume?
> 
> Yes, but other hardware interrupts interfere only if other devices are
> used at the same time.

Also, it applies to kernel-space only. If you want to process anything
in userspace, you can still be interrupted by any kernel process -
hardIRQ, softIRQ or even other kernel tasks.


-- 
David Henningsson, Canonical Ltd.
https://launchpad.net/~diwic


* Re: Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops
  2013-08-06 11:41           ` David Henningsson
@ 2013-08-06 13:04             ` Clemens Ladisch
  0 siblings, 0 replies; 16+ messages in thread
From: Clemens Ladisch @ 2013-08-06 13:04 UTC
  To: David Henningsson; +Cc: alsa-devel, Smilen Dimitrov

David Henningsson wrote:
> On 08/06/2013 12:59 PM, Clemens Ladisch wrote:
>>> On 2013-07-25 10:37, Clemens Ladisch wrote:
>>>> Your driver's .pointer callback must report the *actual* position at
>>>> which the hardware has finished reading from the buffer
>>
>> ... for a playback stream, or finished writing, for a capture stream.
>
> What if the pointer granularity is very coarse? E g, some hardware might
> only be able what period you're in (IIRC, I've seen this on the Tegra
> platform), rather than the actual sample. Would you recommend to report
> the latest period boundary in that case, or interpolating it with timers?

By reporting position x, the driver guarantees that the device has
finished reading (for a playback stream) before x, and that the
application is allowed to overwrite the buffer before x with new sample
data.

When the driver does not know the current position of the DMA
controller, it must report the last known 'safe' position (and set
SNDRV_PCM_INFO_BLOCK_TRANSFER).
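A minimal sketch of reporting the last known safe position for such coarse hardware, simulated in user space; the struct, its fields and the function name are hypothetical. A real driver would additionally advertise SNDRV_PCM_INFO_BLOCK_TRANSFER in the .info field of its snd_pcm_hardware description, as noted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state of a device that only signals completed periods. */
struct coarse_stream {
    uint32_t period_size;  /* frames */
    uint32_t buffer_size;  /* frames, a multiple of period_size */
    uint64_t periods_done; /* incremented by the period-complete interrupt */
};

/* Last known safe position: the most recently completed period boundary. */
static uint32_t coarse_pointer(const struct coarse_stream *s)
{
    assert(s->period_size > 0 && s->buffer_size % s->period_size == 0);
    return (uint32_t)((s->periods_done * s->period_size) % s->buffer_size);
}
```

With a 32-frame period and a 64-frame buffer, the reported position only ever takes the values 0 and 32, jumping at each completed period.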


Regards,
Clemens


* Re: Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops
  2013-08-06 10:59         ` Clemens Ladisch
  2013-08-06 11:41           ` David Henningsson
@ 2013-08-08  2:50           ` Smilen Dimitrov
  1 sibling, 0 replies; 16+ messages in thread
From: Smilen Dimitrov @ 2013-08-08  2:50 UTC
  To: alsa-devel; +Cc: Clemens Ladisch

Hi Clemens, David,

Many thanks for bearing with me, and for your feedback - I truly appreciate it! Apologies for a verbose response again; I have tried to organize it better this time, and pushed my more verbose snippets to the end of the mail; I hope that helps with readability.


On 2013-08-06 12:59, Clemens Ladisch wrote:
> Smilen Dimitrov wrote:
>> On 2013-07-25 10:37, Clemens Ladisch wrote:
>
>> I'm interested in the operation of a virtual (platform) driver,
>> which talks to _no_ soundcard hardware
>
> And what does this driver do?  What is your goal?
>

For now, my goal is to develop a virtual driver (or rather, make minimal modifications to the existing `dummy` driver) that will a) work at CD quality (44100/16b/2ch) without triggering the full-duplex drop in PortAudio, and b) write something trivial (like "pulses" at period and buffer boundaries) into the capture buffer. That way I can load the driver, fire up Audacity, load a playback file, press record (which starts full-duplex in overdub mode), and see something expected being captured (without drops, and repeatedly).

The driver's purpose, in my case, would be to 1) learn what it takes for a virtual driver to not trigger the full-duplex drop in PortAudio, and then 2) serve as a sort of comparison basis (or benchmark) against which I'd compare the operation of a similar driver (with timer functions). I have moved the more verbose explanation to comment (*c1) at the end of this mail.


>> I'm assuming that the card has its own intern capture buffer memory
>> on board;
>
> No modern card has this.  All data is immediately read from/written to
> main memory.
>

Thanks for noting this - I wish I knew better :) ( By the way, can anyone recommend any references that would get me up to speed on the evolution of soundcard hardware? )

My immediate question here is - why, then, do I observe IRQs at period boundaries with `hda-intel`? But then:

>> and ALSA manages the equivalent of `substream->runtime->dma_area`
>> as capture buffer memory in RAM. The main purpose of the ALSA driver,
>> then, is to manage copying the data from the intern card capture
>> buffer, to the `dma_area` capture buffer in RAM of the PC;
>
> This is handled by the hardware's DMA.
>

Aha - OK, let me speculate here, to make sure I now have a more acceptable model of how a proper DMA card operates. Based on the simplified DMA schematic in the previously mentioned http://sdaaubckp.sf.net/post/alsa-capttest/montage-hda-intel.png, and again discussing only capture at 44100/16b/2ch:

* Card uses its XO to derive a sampling clock as close as possible to 44100 Hz;
* When this clock ticks, the card performs an A/D conversion - this means it has to store 16 bits per channel somewhere (in this case, 4 bytes for the two channels)
* The card, having sampled, triggers a request on the DMA bus (possibly by "raising" DREQ)
** Since this signal doesn't utilize the CPU, it is **not** registered as an interrupt (IRQ)
** The DMA controller then makes sure the control and data buses are switched soon enough, so that these 4 bytes are stored in main memory, possibly asserting DACK0 for acknowledgment afterwards
** Upon ACK, the card internally adds 4 to its capture "buffer pointer" counter
** (Thus, this DMA "interrupt"/request (sort of) functions as the sampling rate "timer"/trigger in the context of the PC as a whole - but not in the context of the CPU directly)
* When the card realizes its "buffer pointer" counter has reached a period's worth of bytes, it triggers an IRQ proper, which interrupts the CPU - not to initiate a copy from "intern capture buffer memory" to main RAM, but to inform the OS that another period_size frames are now available in main RAM
** The driver reacts to the IRQ, then eventually calls `snd_pcm_period_elapsed` (and the rest of ALSA takes it from there)
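The speculative breakdown above can be modelled as a small userspace simulation. It assumes one DMA transfer of one 4-byte frame per sample clock tick, and a period IRQ only when a period boundary is crossed. All names are hypothetical; this illustrates the speculated model, not how any particular card (or `hda-intel`) actually works.

```c
#include <assert.h>
#include <string.h>

#define FRAME_BYTES 4 /* 16-bit stereo: 2 channels x 2 bytes */

/* Hypothetical model: per sample clock tick, the "card" produces one frame,
 * the DMA engine stores it directly in the ring buffer in main memory
 * (no CPU interrupt), and a period "IRQ" is counted only when the DMA
 * position crosses a period boundary. Returns the number of period IRQs. */
static unsigned int simulate_capture(unsigned char *dma_area,
                                     unsigned long buffer_frames,
                                     unsigned long period_frames,
                                     unsigned long ticks)
{
    unsigned long pos_bytes = 0;      /* card's internal DMA position */
    unsigned int irqs = 0;
    unsigned char frame[FRAME_BYTES]; /* the few bytes held on the card */

    assert(period_frames > 0 && buffer_frames % period_frames == 0);
    for (unsigned long t = 0; t < ticks; t++) {
        memset(frame, (int)(t & 0xff), sizeof frame); /* fake "ADC" result */
        memcpy(dma_area + pos_bytes, frame, FRAME_BYTES); /* DMA transfer */
        pos_bytes = (pos_bytes + FRAME_BYTES) % (buffer_frames * FRAME_BYTES);
        if (pos_bytes % (period_frames * FRAME_BYTES) == 0)
            irqs++; /* a driver would call snd_pcm_period_elapsed() here */
    }
    return irqs;
}
```

With an 8-frame buffer and 4-frame periods, 16 ticks fill the ring twice and raise 4 period IRQs, while the data itself never passes through the CPU.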

Now, I would call those 4 bytes I speculate about "intern capture memory", although it's definitely not a "buffer" - maybe the more proper term for them would be "registers" (as they are few in number, and likely fast)? Or do modern cards, for instance, hook the ADC output directly to the DMA bus, so that not even those 4 bytes are present as "intern capture memory" on the card?

In either case, could the speculative breakdown above be taken as a more accurate approximation of the capture process in a modern card? If so, then maybe the `montage-hda-intel.png` image can still be considered somewhat applicable to modern cards, in the sense that the period-boundary IRQ is raised _as if_ there were an internal card capture buffer of the given period/buffer size, signalling a need for a copy to main memory - except that there is no actual copy, since modern cards have no internal capture buffer (capture data goes directly to main RAM).


>> And herein lies the crux of my inquiry: given there is no hardware
>> targeted with a virtual driver, I can in principle return whatever I
>> want as a .pointer position.
>
> If you are actually transferring sample from/to somewhere, you must
> return the status of these transfers.
>

Agreed - and my bad for saying "whatever I want". I guess what I was trying to emphasize comes from my experience with AudioArduino (and, conversely, my lack of experience with actual PCI/DMA cards). Namely, with AudioArduino (see also (*c1)), I essentially only had to relate to one formula:

    bytes_per_period = (rate_bytes/1[s])*(period_time[s])

... which is what was being returned (as an increase) in .pointer there; I have moved the verbose explanation to comment (*c2) at the end of this mail.
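The formula can be checked numerically. This sketch assumes rate_bytes = rate * channels * bytes per sample and truncates the result, as in the 176.4 -> 176 example in comment (*c2). The function name is hypothetical, and the period time is given in microseconds to keep the arithmetic in integers.

```c
#include <assert.h>

/* bytes_per_period = (rate_bytes / 1 s) * period_time, truncated,
 * where rate_bytes = rate * channels * bytes_per_sample. */
static unsigned long bytes_per_period(unsigned long rate, unsigned int channels,
                                      unsigned int bytes_per_sample,
                                      unsigned long period_us)
{
    unsigned long long rate_bytes;

    assert(rate > 0 && channels > 0 && bytes_per_sample > 0);
    rate_bytes = (unsigned long long)rate * channels * bytes_per_sample;
    return (unsigned long)(rate_bytes * period_us / 1000000ULL);
}
```

For 44100/8b/mono with a 4 ms tick this gives 176 bytes; for 44100/16b/2ch it gives 705 bytes, although the exact value is 705.6 bytes = 176.4 frames. In both cases the fractional part is lost on every tick unless it is carried over to the next one.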

In other words, I am assuming that _solely_ by increasing the pointer by bytes_per_period every period_time, the driver should be able to persuade ALSA (and userspace) that transfers are going fine; that approach may cheat ALSA, but apparently it does not cheat full-duplex PortAudio :) However, it seems the PortAudio full-duplex drop is triggered by a `snd_pcm_delay` check, see comment (*c4); but even `snd_pcm_delay` depends only on appl_ptr and hw_ptr - and ultimately, on the .pointer position.


>> I should be able to simulate a proper operation to userspace, just by
>> increasing this .pointer value properly. However, even if I do that,
>> I still manage to somehow trigger a full-duplex drop in the PortAudio
>> userspace layer
>
> A buffer length of about 1 ms is very likely to result in over/underruns,
> regardless of what your .pointer callback does.
>
> Better try with a 32000-frame buffer first.
>

I think this is a key comment - I'll have to try to understand it better, because buffer sizes that big had not really occurred to me. And that is because, even if I multiply both sides of the above formula by a constant A > 1:

    A*bytes_per_period = (rate_bytes/1[s])*(A*period_time[s])

... the ratio doesn't change; so it's not immediately obvious to me why a very large buffer would help (I have tried with period_size = 1024 frames, and a buffer of apparently twice that - that still causes a full-duplex drop).

The only thing I can think of is the jitter (shown in the .gifs): the (hr)timer functions can easily be off by 30 or more microseconds, which is on the order of one sample (frame) period (1/44100 ~= 22.6e-06 s). So for a period on the order of 1 ms, the error would be 22e-6/1e-3 = 0.022 => 2.2%; while for a period of 16000 frames, the period time is 16000/44100 = 0.362812 s = 362.8 ms, so the error there is much smaller: 22e-6/362e-3 = 6.07735e-05 => 0.006%. Beyond this, is there any other reason why small buffer/period sizes are likely to result in over/underruns?
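The jitter arithmetic above fits in a one-line function: the relative error is the jitter divided by the period time (period_frames / rate). This is a sketch; the 22e-6 s value is the one-frame jitter figure used in the text.

```c
#include <assert.h>

/* Relative timing error of one period when each period boundary can be
 * off by jitter_s seconds: jitter_s / (period_frames / rate). */
static double relative_period_error(double jitter_s, unsigned long period_frames,
                                    unsigned long rate)
{
    assert(period_frames > 0 && rate > 0);
    return jitter_s / ((double)period_frames / (double)rate);
}
```

With 22e-6 s of jitter this gives about 0.022 for a 44-frame (~1 ms) period and about 6.06e-05 for a 16000-frame period, in line with the 2.2% and 0.006% estimates above.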


>>>>> Your driver's .pointer callback must report the *actual* position at
>>>>> which the hardware has finished reading from the buffer
>>>
>>> ... for a playback stream, or finished writing, for a capture stream.
>>
>> What if the pointer granularity is very coarse? E g, some hardware might
>> only be able what period you're in (IIRC, I've seen this on the Tegra
>> platform), rather than the actual sample. Would you recommend to report
>> the latest period boundary in that case, or interpolating it with timers?
>
> By reporting position x, the driver guarantees that the device has
> finished reading (for a playback stream) before x, and that the
> application is allowed to overwrite the buffer before x with new sample
> data.
>
> When the driver does not know the current position of the DMA
> controller, it must report the last known 'safe' position (and set
> SNDRV_PCM_INFO_BLOCK_TRANSFER).

An essential comment, much appreciated - I was not clearly aware of the "finished" aspect, and in particular of the "overwrite" aspect; will definitely take that into account from now on.

With that in mind, could the following be said? In a virtual driver context, given that there is no underlying hardware to speak of, reporting a .pointer position x is merely an _assertion_ from the driver to ALSA: "Hey, I checked the hardware, and position x has already been processed by the stream".


>> On the PC side, the driver's .pointer callback can be triggered both
>> by userspace call to `snd_pcm_readi`, and (apparently) independently
>> of it - but (surprising for me) it is not necessarily periodic!
>
> The .pointer callback is triggered by snd_pcm_period_elapsed() (because
> some more data, or even another period, might have been transferred in
> the meantime), whenever userspace writes or reads samples, or whenever
> userspace feels like asking for the current position.
>

Good to have this confirmed! By the way, I was experimenting with capturing the behavior of the original `dummy` in the meantime - and I think it nicely illustrates "some more data, ... might have been transferred in the meantime", but in a virtual driver. Here is an animated gif:

    http://sdaaubckp.sourceforge.net/post/alsa-capttest/capttest_03.gif
    http://sdaaubckp.sourceforge.net/post/alsa-capttest/_cappics03/ (source images/PDFs)

More verbose discussion in comment (*c3) at end of this mail.


>> once the data is in the `dma_area`, the rest of the ALSA engine will
>> make sure that data ends up in `audiobuf` in user-space, upon a call
>> to `snd_pcm_readi` as given above.
>
> Yes.  (When using snd_pcm_mmap_*, the dma_area is mapped to userspace,
> and ALSA itself will never access the contents of the buffer.)
>

Thanks for this - it's great to have confirmed, what is supposed to be mapped to what! :)

By the way, I'm still not sure whether mmap could also influence the occurrence of these full-duplex drops. While I can enforce the use of `snd_pcm_readi` (as in the `captmini.c` test), the `dummy` driver does declare SNDRV_PCM_INFO_MMAP | SNDRV_PCM_INFO_MMAP_VALID - and I'm not sure whether this signals PortAudio (when using the `patest_duplex_wire.c` test) to use the mmap'd versions of the snd_pcm_ functions; I should check. Since in my version, `dummy-2.6.32-patest.c`, I insist on memsetting the capture `dma_area` from the timer function tasklet, I guess this could well interfere with proper mmap operation - thus making that version of the driver less robust to "full-duplex drops" than the original `dummy` (as mentioned in comment (*c3))?


>>> The card then, uses [its] own clock, which is not burdened with
>>> anything else but filling buffers, meaning we can expect tight timing
>>> here; and when it generates an IRQ, it is handled by kernel with
>>> highest priority - ergo, not so much jitter. Timer functions, on the
>>> other hand, run in softIRQ context - meaning they (and their
>>> scheduling) could be preempted by the hardware IRQ of any other device
>>> on the system; ergo, more jitter. Is this reasonable to assume?
>>
>> Yes, but other hardware interrupts interfere only if other devices are
>> used at the same time.
>
> Also, it applies to kernel-space only. If you want to process anything
> in userspace, you can still be interrupted by any kernel process -
> hardIRQ, softIRQ or even other kernel tasks.

Got it. As for my development PC: it's a netbook with a touchpad, a USB mouse and an ethernet connection; I guess any of these could, in principle, cause hardware IRQ interference - however, I take that as not very likely (or at least, not crucial to the full-duplex drop problem).


>> [...] where the `azx_interrupt` handler has been called (since it's
>> also possible to capture interrupt entries for power, for instance);
>
> The stream's SD_STS register tells whether this is an interrupt because
> a period boundary has been crossed.
>

Thanks for this too - as I don't really understand `hda-intel.c`, the only reason I emphasized this function is that I saw `snd_pcm_period_elapsed` called from there (it's also called from `azx_irq_pending_work`, but judging by that name, I doubted it would show any periodic behavior on a plot). Reading the `azx_interrupt` function now makes a lot more sense.


>> when I see "wall clock" referred to in code, does that refer to
>> this, which I've called "Real Time"?
>
> It's called "wall clock" because Intel named the register this way;
> actually, it's the device's sample clock.
>

Heh - thanks for this, would never have guessed! :)

By the way, I just realized that `snd_pcm_update_hw_ptr0` in `sound/core/pcm_lib.c`, also refers to a SNDRV_PCM_INFO_HAS_WALL_CLOCK, (which in `include/uapi/sound/asound.h` has comment: "/* has audio wall clock for audio/system time sync */") and `substream->ops->wall_clock` (as part of `snd_pcm_ops` in `sound/pcm.h`). This part of the structure would again refer to a device's sample clock (if the device declares _HAS_WALL_CLOCK), right?


Before I wrap up, just to mention that I experimented a bit with PortAudio (`patest_duplex_wire.c`) and original `dummy` driver; and have a bit more on the conditions that trigger the full-duplex drop in comment (*c4) - it seems `snd_pcm_delay` is critical there, but even that can apparently be boiled down to .pointer positions.


Many thanks again for the excellent discussion; I really hope this mail will also attract the same level of scrutiny - the feedback on where I'm going wrong has really helped me clear up some of my misconceptions.

Cheers!


:: Verbose comments:

(*c1) I've been working on http://imi.aau.dk/~sd/phd/index.php?title=AudioArduino , which should be understood as an academic exercise, simplified enough to serve as a practical introduction to soundcard operation - even if one misses a lot of details, like a proper understanding of, say, PCI. That driver works using timer functions, and by basically trusting that `ftdi_write()` and `ftdi_process_packet()` will do their thing in time: when the timer function hits for playback, I `ftdi_write` the amount of bytes per period from `dma_area` and consider it "played"; `ftdi_process_packet` is fired as an interrupt from `ftdi_sio`, and I collect its data in an intermediary buffer - and when the timer hits for capture, I `memcpy` the amount of bytes per period from the intermediary buffer to `dma_area`. All I've had to pay attention to here were memory allocations, and proper buffer/period wrap pointer arithmetic. And even with this simplified understanding, I can program the Arduino to echo back every byte it receives from serial (what I call the "digital duplex test"), load up a file in Audacity, press record, and see the played file echoed back in the capture - and I never experienced any full-duplex drops (like here). And that driver doesn't even use hrtimers - it uses systimers, which can be off by up to a jiffy (4 ms in my case)! However, that driver works only with 44100/8bit/mono streams.

I now want to see if I can do this "digital duplex" test, but for 44100/16b/2ch (assuming the 2 Mbps bandwidth of `ftdi_sio` and Arduino will be enough). After realizing that I cannot use systimers anymore ( see http://stackoverflow.com/q/16920238/277826 ), and switching to hrtimers, I started getting these full-duplex drops from PortAudio - even though observing the stream through an analyzer (at the TX and RX points on the Arduino) would reveal that there are no interruptions in the stream, and that each byte in the played sequence is correctly "reflected"! So I thought - OK, something in my .pointer arithmetic must be wrong; let's compare to something that works. So I tried `hda-intel`, and it indeed doesn't cause a full-duplex drop - but since I don't really know what sort of hardware it is, reading its driver's source doesn't help me much. Then I thought - well, let's try `dummy` - given that I took the timer approach from there, analyzing it will hopefully reveal what is wrong with my driver. So I made the small modification described earlier (.pointer position calculated in the tasklet, and writing of pulses into the capture `dma_area`), and tried it. And imagine my surprise when `dummy` turned out to trigger these full-duplex drops in PortAudio as well! Today I tried the original `dummy` 2.6.32 (which doesn't manipulate `dma_area` at all, and calculates the .pointer position in .pointer) - with just an extra `trace_printk()` in .pointer - and while it is far more robust than my modified version (e.g. my version drops any time I switch workspace in Gnome with Ctrl-Alt-arrows; the original doesn't), it _still_ triggers a full-duplex drop!

So, now that my idea of using `dummy` as a comparison point has broken down, I'm genuinely interested in `dummy` for its own sake: given that it does nothing but increase .pointer (thus, very little CPU overhead which could influence things), and there is no hardware with respect to which we would calculate an (im)proper operation - how on earth can it trigger a full-duplex drop in PortAudio at all? And what is the condition that triggers it? Of course, I eventually hope that by understanding this, I'll be able to apply the conclusions to my 44100/16b/2ch AudioArduino case - but for now, I'd really like to better understand why this drop occurs in the context of a virtual driver to begin with. One problem could be that so far I've assumed that userspace has to decide whether operation is proper solely based on what .pointer reports - there are likely other variables at play here too; most importantly, `snd_pcm_delay` - see (*c4).


(*c2) I guess what I was trying to emphasize comes from my experience with AudioArduino (and, conversely, my lack of experience with actual PCI/DMA cards). Namely, with AudioArduino (see also (*c1)), I essentially only had to relate to one formula:

    bytes_per_period = (rate_bytes/1[s])*(period_time[s])

So, in that case, I knew I had 44100Hz/8b/mono, which translates to a rate of 44100 Bytes/s. I also knew I was going to use systimer functions, set at a period of one jiffy (on my platform, 4 ms) -> so, bytes_per_period = 44100*4e-3 = 176.4 ~= 176. So, at each timer function tick:

* for playback - I `ftdi_write` 176 bytes from `dma_area`, and increase `->pcm_buf_pos` by 176
* for capture - I memcpy 176 bytes from intermediate buffer to `dma_area`, and increase `->pcm_buf_pos` by 176

... and in .pointer, I return `bytes_to_frames(->pcm_buf_pos)` for either direction.
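A minimal userspace model of the timer approach described above: each tick advances a byte position by a fixed amount, wrapping at the end of the buffer, and the .pointer stand-in converts it to frames (as bytes_to_frames() would). The struct and function names are hypothetical; the actual `ftdi_write`/`memcpy` transfers (including copies that straddle the buffer end) are left out.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-substream driver state. */
struct sim_stream {
    unsigned long pcm_buf_pos;  /* bytes, always < buffer_bytes */
    unsigned long buffer_bytes; /* size of dma_area */
    unsigned int frame_bytes;   /* channels * bytes per sample */
};

/* Timer tick: the ftdi_write() (playback) or memcpy() into dma_area
 * (capture) of bytes_per_tick bytes would happen here; the model only
 * advances the position, wrapping at the end of the ring buffer. */
static void sim_timer_tick(struct sim_stream *s, unsigned long bytes_per_tick)
{
    assert(s->buffer_bytes > 0);
    s->pcm_buf_pos = (s->pcm_buf_pos + bytes_per_tick) % s->buffer_bytes;
}

/* .pointer stand-in: report the current position in frames. */
static unsigned long sim_pointer(const struct sim_stream *s)
{
    assert(s->frame_bytes > 0);
    return s->pcm_buf_pos / s->frame_bytes;
}
```

With 16-bit stereo frames, the per-tick byte count should be a multiple of 4 (e.g. 704 rather than 705), otherwise the byte position can land in the middle of a frame.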

So - as I saw it at that point - there is no other information about the status of the transfers than the increase of the .pointer positions! Now, it is still an open question whether this is the _best_ way to write this driver - for instance, maybe I should have used the .copy/.silence callbacks (instead of dealing with `dma_area` inside the timer functions). However, it was _good enough_, in that I never experienced a full-duplex drop (nor other problems) in Audacity with it. So I reasoned: ALSA must keep track of time; then the only thing it needs, in order to check whether a constant rate is held, is to see that the pointer positions increase by the right amount. And since I explicitly write that amount in each direction (playback or capture) in the driver, ALSA is kept "happy", and that propagates to userspace.

Note that from this to a pure virtual driver there is only one step - one simply stops using the `ftdi_` functions, and stops manipulating the `dma_area`s (not surprising, given that I first saw the whole timer-function technique in `dummy`). That is what made me believe that nothing but the .pointer positions has an influence on proper streaming operation (as ALSA would see it); when I said I can "return whatever I want", I really meant that I can explicitly return a position increase that should match the requested transfer rate, as per the above formula (which, as I see it, should "cheat" ALSA into believing that all is fine).

However, when I make a virtual driver with timer functions and set it to 44100Hz/16b/stereo (= 176400 Bytes/s, so four times the transfer rate of AudioArduino), I observe a full-duplex drop in PortAudio. So, obviously the model where the only thing that matters is the above formula, in the context of timer functions, breaks down at this rate (on my platform at least) - apparently violating some timing constraints, maybe not necessarily in ALSA, but definitely in PortAudio. And I'd like to learn more precisely which condition causes this breakage. In (*c4) I can see that `snd_pcm_delay` has an influence - but it also seems to rely in great part on the reported .pointer position.


(*c3) A capture of the original `dummy` driver, with `captmini.c`:

    http://sdaaubckp.sourceforge.net/post/alsa-capttest/capttest_03.gif
    http://sdaaubckp.sourceforge.net/post/alsa-capttest/_cappics03/ (source images/PDFs)

Basically (in simplified terms, given that `snd_pcm_update_hw_ptr0` is more complex, and does delay and XRUN calculations): when .pointer is called from `snd_pcm_update_hw_ptr0`, it seems it is called repeatedly in quick succession, until at least `hw_ptr` for the substream matches the .pointer position. Since the original `dummy` driver calculates the elapsed time (and thus the position) directly in the .pointer callback, its values can keep on increasing by 1 or more frames while the rest of the code is updating `hw_ptr`; so .pointer is called again to update, and this goes on (apparently) until `snd_pcm_update_hw_ptr0` is satisfied that it got enough frames according to the time elapsed - which can take 5 or more calls in quick succession. My modification, `dummy-2.6.32-patest.c`, calculates the .pointer position in the timer tasklet, so the position typically stays unchanged when `snd_pcm_update_hw_ptr0` inquires, and .pointer is called from there at most 2 times in quick succession ( e.g. as in http://sdaaubckp.sf.net/post/alsa-capttest/montage-dummy.png ).

But then, there is something strange again - if you focus on the top part of capttest_03.gif, which shows the behavior of the original `dummy` driver, it shows jitter somewhat comparable to that of `hda-intel` - which is apparently much better than what my modified `dummy-2.6.32-patest.c` showed (on http://sdaaubckp.sourceforge.net/post/alsa-capttest/capttest.gif top part). I find this surprising, because regardless of a) whether the .pointer position is calculated in the tasklet or in .pointer, or b) whether the tasklet has anything more to do than just call `snd_pcm_period_elapsed`, this should not influence when the timer softirq first fires, since _both_ drivers re-schedule the timer function (`dummy_hrtimer_callback`) as the first thing when it is entered, and only then schedule a tasklet?

Having noted this, and looking back at http://sdaaubckp.sf.net/post/alsa-capttest/montage-hda-intel.png , I would also find the triplet of `.pointer` calls at about 1.20 ms somewhat strange - namely, the .pointer position has increased from 41 to 49 (causing the third call to .pointer from `snd_pcm_update_hw_ptr0`, until it ultimately syncs) - but there is no card IRQ running at the time? Then what could have changed the .pointer position, which for `hda-intel` could be something like `azx_dev->posbuf`? This, of course, ties in with my earlier (wrong) understanding that only the card IRQ initiates a copy to the main capture buffer; but if this "copy" happens transparently via DMA (as in my speculative breakdown above), then I see where this update - "more data transferred in the meantime" - could come from.


(*c4) Having experimented with `patest_duplex_wire.c` (set to 512 frames per period), a debug version of PortAudio, and the original `dummy` 2.6.32 driver (with but a `trace_printk` in its .pointer function); I realized that when a full-duplex drop occurs, the PortAudio log looks like this:

    ...
    Pa_IsStreamActive returned:
      PaError: 1 ( Invalid error code (value greater than zero) )
    ContinuePoll: Trying to poll again for playback frames, pollTimeout: 6 dly:507 mrg:251  <<<<<<<<
    ContinuePoll: Trying to poll again for playback frames, pollTimeout: 6 dly:498 mrg:242
    ContinuePoll: Trying to poll again for playback frames, pollTimeout: 6 dly:495 mrg:239
    ContinuePoll: Trying to poll again for playback frames, pollTimeout: 6 dly:492 mrg:236
    ContinuePoll: Trying to poll again for playback frames, pollTimeout: 6 dly:490 mrg:234
    ContinuePoll: Trying to poll again for playback frames, pollTimeout: 6 dly:487 mrg:231
    ContinuePoll: Trying to poll again for playback frames, pollTimeout: 6 dly:484 mrg:228
    ContinuePoll: Trying to poll again for playback frames, pollTimeout: 6 dly:481 mrg:225
    ContinuePoll: Trying to poll again for playback frames, pollTimeout: 6 dly:478 mrg:222
    ContinuePoll: Trying to poll again for playback frames, pollTimeout: 5 dly:475 mrg:219
    ContinuePoll: Trying to poll again for playback frames, pollTimeout: 5 dly:472 mrg:216
    ...
    ContinuePoll: Trying to poll again for playback frames, pollTimeout: 1 dly:261 mrg:5
    ContinuePoll: Trying to poll again for playback frames, pollTimeout: 1 dly:258 mrg:2
    ContinuePoll: Stopping poll for playback
    PaAlsaStream_WaitForFrames: full-duplex (not xrun): Drop input, a period's worth - fra:769
    ContinuePoll: Stopping poll for capture
    CallbackThreadFunc: Input underflow fra:777 urn:0 orn:0
    CallbackThreadFunc: Input underflow fra:265 urn:0 orn:0                                 >>>>>>>>
    play index = 45056 ; rec/capt index = 88064
    Pa_IsStreamActive called:
      PaStream* stream: 0x0x99037e0
    Pa_IsStreamActive returned:
      PaError: 1 ( Invalid error code (value greater than zero) )
    ...

When `patest_duplex_wire.c` completes successfully (without a drop), it turns out the section between the "<<<" and ">>>" markers never appears in the logs - meaning the "ContinuePoll" messages, in addition to the "Drop input" and input underflow messages, are also a sign of a drop. (NB: There also seems to be such a correlation between `ContinuePoll` and the full-duplex drop in the logs from the starting post of this thread, although `ContinuePoll` there can also appear in the context of an XRUN.)

So I looked into `src/hostapi/alsa/pa_linux_alsa.c`, and `ContinuePoll` does this:

    ... snd_pcm_delay( otherComponent->pcm, &delay )  ...
    ...
    if( StreamDirection_Out == streamDir ) {
      /* Number of eligible frames before capture overrun */
      delay = otherComponent->bufferSize - delay;
    }
    margin = delay - otherComponent->framesPerBuffer / 2;
    if( margin < 0 ) { ...
      PA_DEBUG(( "%s: Stopping poll ....
      *continuePoll = 0;
    } else if( margin < otherComponent->framesPerBuffer ) {
      *pollTimeout = CalculatePollTimeout( stream, margin );
      PA_DEBUG(( "%s: Trying to poll again for %s frames, pollTimeout: %d dly:%d mrg:%d\n",
                  __FUNCTION__, StreamDirection_In == streamDir ? "capture" : "playback", *pollTimeout , delay, margin )); // modded
    }

So, apparently `snd_pcm_delay` of the other pcm is used to calculate `delay` and `margin` - and should `margin` drop below zero, the poll is stopped. This polling is apparently started, ended and continued from `PaAlsaStream_WaitForFrames`, which contains the following loop:
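The margin logic in the `ContinuePoll` excerpt can be isolated into a small function, to check it against the logged numbers. This is a simplified model, not PortAudio's code: the timeout calculation is omitted, the "unchanged" result for the case where neither branch applies is an assumption, and the 1024-frame capture buffer size used in the usage example is assumed, since the log does not show it.

```c
#include <assert.h>

/* Simplified model of the margin test in ContinuePoll() as excerpted above.
 * other_delay is what snd_pcm_delay() returned for the *other* pcm. */
enum poll_decision { POLL_STOP, POLL_AGAIN_SHORT, POLL_UNCHANGED };

static enum poll_decision continue_poll_model(int polling_playback,
                                              long other_delay,
                                              long other_buffer_size,
                                              long frames_per_buffer,
                                              long *margin_out)
{
    long delay = other_delay;
    long margin;

    assert(frames_per_buffer > 0);
    if (polling_playback) /* other pcm is capture: frames left before overrun */
        delay = other_buffer_size - delay;
    margin = delay - frames_per_buffer / 2;
    *margin_out = margin;
    if (margin < 0)
        return POLL_STOP;        /* "Stopping poll" */
    if (margin < frames_per_buffer)
        return POLL_AGAIN_SHORT; /* "Trying to poll again", shorter timeout */
    return POLL_UNCHANGED;
}
```

With framesPerBuffer = 512 the threshold is 256 frames, so the logged pairs dly:507 mrg:251 through dly:258 mrg:2 all follow margin = delay - 256, and the first delay below 256 ends the poll.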

    while( pollPlayback || pollCapture ) {
      ...
      /* @concern FullDuplex If only one of two pcms is ready we may want to compromise between the two.
       * If there is less than half a period's worth of samples left of frames in the other pcm's buffer we will
       * stop polling.
       */
      if( self->capture.pcm && self->playback.pcm ) {
        if( pollCapture && !pollPlayback ) {
          PA_ENSURE( ContinuePoll( self, StreamDirection_In, &pollTimeout, &pollCapture ) );
        } else if( pollPlayback && !pollCapture ) {
          PA_ENSURE( ContinuePoll( self, StreamDirection_Out, &pollTimeout, &pollPlayback ) );
        }
      }
    } // end while

So, `ContinuePoll` can set `pollCapture` or `pollPlayback` (via `*continuePoll`) to 0, which will end the while loop - and right after this while loop is the `if( !xrun ) ...` check, cited in the starting post of this thread, which determines the full-duplex drop in PortAudio.

In other words, it now looks like a `snd_pcm_delay` check failure is what triggers the full-duplex drop in PortAudio. And as a reminder, this check fails for the "otherComponent" stream - specifically, the full-duplex drop happens if we're in full-duplex mode (so both capture and playback are running), and **playback** is not ready, since the condition for the full-duplex drop is: `if( self->capture.pcm && self->playback.pcm ) { if( !self->playback.ready && !self->neverDropInput ) ...`. (( I'm still not clear on what sets `playback.ready` to 0 - `ContinuePoll` apparently doesn't ))

So, given I haven't dealt with `snd_pcm_delay` until now, I think I should look more into it. The docs say: "For playback ... It is as such the overall latency from the write call to the final DAC. For capture ... It is as such the overall latency from the initial ADC to the read call.", which I have trouble translating to a virtual driver context (given there are no actual ADCs or DACs). However, I found this thread:

    "[alsa-devel] What does snd_pcm_delay() actually return?"
    http://mailman.alsa-project.org/pipermail/alsa-devel/2008-June/008421.html
    > In the driver implementation level, snd_pcm_delay() simply returns the
    > difference between appl_ptr and hw_ptr.  It means how many samples are
    > ahead on the buffer from the point currently being played.
    > However, if you stop feeding samples now, snd_pcm_delay() returns the
    > least time XRUN occurs. [...]
    > The implementation of snd_pcm_delay() (at least in the driver level)
    > purely depends on the accuracy of PCM pointer callback of each
    > driver.  So, if the driver returns more accurate hw_ptr via pointer
    > callback, you'll get more accurate value of snd_pcm_delay().  In the
    > worst case, it may be bigger up to one period size than the real
    > delay.

... so one way or another, it boils down to appl_ptr, hw_ptr - and what is being returned as the .pointer position. I think the next thing is to see what triggers "ContinuePoll" altogether - since it seems its presence in PortAudio debug logs is not really a good sign.
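Following the quoted explanation, the driver-level part of snd_pcm_delay() can be sketched as a wrapped difference between appl_ptr and hw_ptr. This is a simplified model under stated assumptions: the pointers are taken to wrap at a "boundary" value (in ALSA, a large multiple of the buffer size), and any extra delay a driver may report (runtime->delay) is ignored. The function names are hypothetical.

```c
#include <assert.h>

/* Distance from b forward to a, for frame counters wrapping at boundary. */
static unsigned long ptr_diff(unsigned long a, unsigned long b,
                              unsigned long boundary)
{
    assert(a < boundary && b < boundary);
    return a >= b ? a - b : a + boundary - b;
}

/* Playback: frames written by the application but not yet played. */
static unsigned long playback_delay(unsigned long appl_ptr,
                                    unsigned long hw_ptr,
                                    unsigned long boundary)
{
    return ptr_diff(appl_ptr, hw_ptr, boundary);
}

/* Capture: frames captured by the hardware but not yet read. */
static unsigned long capture_delay(unsigned long appl_ptr,
                                   unsigned long hw_ptr,
                                   unsigned long boundary)
{
    return ptr_diff(hw_ptr, appl_ptr, boundary);
}
```

Since hw_ptr is only refreshed from the .pointer callback, the accuracy of either result depends on how accurately .pointer reports the position, as the quoted thread says.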


* Re: Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops (playback)
  2013-07-25  8:37     ` Clemens Ladisch
  2013-08-04  0:05       ` Smilen Dimitrov
@ 2013-08-14 14:30       ` Smilen Dimitrov
  2013-08-15  4:17         ` Raymond Yau
  2013-09-13  6:23       ` Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops (full-duplex: latency.c) Smilen Dimitrov
  2013-10-21 14:48       ` [Solved] Questions about virtual ALSA driver (dummy), PortAudio and full-duplex Smilen Dimitrov
  3 siblings, 1 reply; 16+ messages in thread
From: Smilen Dimitrov @ 2013-08-14 14:30 UTC (permalink / raw
  To: alsa-devel

Hi all,

Since the PortAudio full-duplex problem I have depends on both the capture and playback directions, I thought I'd also look into playback, again by comparing the `hda_intel` and `dummy` ALSA drivers, and post a writeup on it (given that only the capture direction has been discussed so far). It would definitely be nice to get some feedback on it, so I know where I am going wrong with this - and apologies again for the verbosity.

This was greatly motivated by the following essential comment by Clemens, earlier in the thread:

>>> On 2013-07-25 10:37, Clemens Ladisch wrote:
>>>> Your driver's .pointer callback must report the *actual* position at
>>>> which the hardware has finished reading from the buffer
>> ... for a playback stream, or finished writing, for a capture stream.

I would restate that comment in slightly stronger language: the playback and capture operations, although similar in many respects, are **fundamentally** different. Unfortunately, I couldn't fully appreciate the comment until I had some test code, and resulting pictures to look at :) The test scripts and images referred to below are again posted at this link (see the slightly updated Readme there):

    http://sdaaubckp.sourceforge.net/post/alsa-capttest/


One of the misconceptions I had earlier was that in ALSA, proper streaming operation essentially boils down to _only_ what the .pointer function returns for each streaming direction. That is probably still correct in a sense - but I guess it is more accurate to say that proper streaming operation in ALSA is essentially determined by three variables/parameters/properties:

* the value returned by the .pointer function (the .pointer position);
* the hw_ptr; and
* the appl_ptr

... of each stream; all of them expressed in units of frames. I think I implicitly took them to mean the same thing in both directions, because they are named the same - however, since the playback and capture directions are *fundamentally* different, so is the meaning of these variables, depending on which stream direction they belong to:

* capture:
** .pointer - actual position at which the hardware has completed capturing (finished writing captured frames into the buffer)
** hw_ptr   - follows .pointer - but late: only after a call to `snd_pcm_update_hw_ptr0`
** appl_ptr - follows hw_ptr - but later: only after `snd_pcm_readi` (or its ioctl) has returned the given amount of frames to userspace
* playback:
** appl_ptr - how many frames the application has already written to ALSA (after `snd_pcm_writei` returns)
** .pointer - actual position at which the hardware has finished reading from ALSA playback buffer (a.k.a. number of samples already played back)
** hw_ptr   - follows .pointer - but late: only after a call to `snd_pcm_update_hw_ptr0`
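To make the "follows .pointer" relation a bit more concrete, here is a minimal sketch in C of how I understand the simplest case of `snd_pcm_update_hw_ptr0`: the wrapping .pointer position is added to a base, and the base advances by buffer_size whenever the position is seen to go backwards (i.e. the ring wrapped). The struct and function names are hypothetical, and the real kernel function does a lot more (sanity checks against elapsed time, lost-interrupt handling, wrapping at `boundary`).

```c
#include <assert.h>

/* Hypothetical model of one stream's pointers, all in frames. */
struct pcm_model {
    unsigned long buffer_size; /* ring size, e.g. 64 */
    unsigned long hw_ptr_base; /* cumulative frames at the start of the current ring lap */
    unsigned long hw_ptr;      /* cumulative hardware position */
    unsigned long appl_ptr;    /* cumulative application position */
};

/* Fold a wrapping .pointer value (0 .. buffer_size-1) into the cumulative
 * hw_ptr. A position that would move hw_ptr backwards is taken to mean
 * that the hardware wrapped around the ring once. */
void model_update_hw_ptr(struct pcm_model *m, unsigned long pos)
{
    unsigned long new_hw_ptr;

    assert(pos < m->buffer_size);
    new_hw_ptr = m->hw_ptr_base + pos;
    if (new_hw_ptr < m->hw_ptr) {
        m->hw_ptr_base += m->buffer_size; /* next lap of the ring */
        new_hw_ptr = m->hw_ptr_base + pos;
    }
    m->hw_ptr = new_hw_ptr;
}
```

Fed with the three .pointer values from the successful `hda-intel` playback run discussed further down (17, 33, then 1), this yields hw_ptr = 17, 33 and finally 65.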

Also, the .pointer value wraps at the PCM buffer_size (in frames), while hw_ptr and appl_ptr are cumulative (although they wrap too, at the stream's `boundary` value: a large multiple of buffer_size chosen to fit in the unsigned long they are stored in). Additionally, for either direction, these kernel/driver variables are exposed to userspace via the `snd_pcm_delay` and `snd_pcm_avail` functions (although it seems possible to retrieve hw_ptr and appl_ptr more directly by calling `snd_pcm_status()`, and reading the resulting `snd_pcm_status_t`/`snd_pcm_status` structure).
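For reference, here is a sketch of how avail and delay follow from hw_ptr and appl_ptr, modelled on my reading of the kernel helpers `snd_pcm_playback_avail()` and `snd_pcm_capture_avail()`. The function names are hypothetical, `boundary` stands for the wrap point of the cumulative pointers, and the delay here ignores any extra hardware/FIFO delay a driver may report.

```c
#include <assert.h>

/* Playback: room left in the ring for userspace to write into. */
unsigned long model_playback_avail(unsigned long hw_ptr, unsigned long appl_ptr,
                                   unsigned long buffer_size, unsigned long boundary)
{
    long avail;

    assert(buffer_size <= boundary);
    avail = (long)hw_ptr + (long)buffer_size - (long)appl_ptr;
    if (avail < 0)
        avail += (long)boundary;          /* normalise into [0, boundary) */
    else if ((unsigned long)avail >= boundary)
        avail -= (long)boundary;          /* appl_ptr already wrapped at boundary */
    return (unsigned long)avail;
}

/* Playback: frames written but not yet played (hardware delay ignored). */
unsigned long model_playback_delay(unsigned long hw_ptr, unsigned long appl_ptr,
                                   unsigned long buffer_size, unsigned long boundary)
{
    return buffer_size - model_playback_avail(hw_ptr, appl_ptr, buffer_size, boundary);
}

/* Capture: frames captured but not yet read by userspace. */
unsigned long model_capture_avail(unsigned long hw_ptr, unsigned long appl_ptr,
                                  unsigned long boundary)
{
    long avail = (long)hw_ptr - (long)appl_ptr;

    if (avail < 0)
        avail += (long)boundary;          /* hw_ptr already wrapped at boundary */
    return (unsigned long)avail;
}

/* Capture: delay equals avail (hardware delay ignored). */
unsigned long model_capture_delay(unsigned long hw_ptr, unsigned long appl_ptr,
                                  unsigned long boundary)
{
    return model_capture_avail(hw_ptr, appl_ptr, boundary);
}
```

For example, with buffer_size = 64, hw_ptr = 17 and appl_ptr = 32, playback avail comes out as 49 and delay as 15.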


It should be mentioned here that using `snd_pcm_readi`/`snd_pcm_writei` is just one way of interacting with ALSA from userspace; as far as I can see, there are at least five. I first became aware of this when I stumbled upon http://alumnos.elo.utfsm.cl/~yanez/alsa-sample-programs/ , which mentions METHOD_DIRECT_RW, METHOD_DIRECT_MMAP, METHOD_ASYNC_RW, METHOD_ASYNC_MMAP, METHOD_RW_AND_POLL. But then I realized there is a userspace program in `alsa-lib/test/pcm.c`, which refers to the methods "write", "write_and_poll", "async", "async_direct", "direct_interleaved", "direct_noninterleaved", and "direct_write"; all but the first three are mmap-based. I have chosen `snd_pcm_readi`/`snd_pcm_writei` for the tests, in order to keep the kernel debug log acquisitions as short and simple as possible (so as to make their filtering for plotting easier) - however, it's notable that PortAudio uses a poll-based approach instead.

Ignoring these other approaches for now - when using `snd_pcm_readi`/`snd_pcm_writei`, I guess the only hint the application gets of .pointer/hw_ptr/appl_ptr as a whole is the number of frames returned for each request. So, say we have this for capture:

    ret_frames = snd_pcm_readi(capture_pcm_handle, audiobuf, 32);

With this, userspace has requested 32 capture/record frames from ALSA. When the function returns, the returned `ret_frames` can be:

* ret_frames = 32       - exactly the amount which was requested; which means no problem
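* 0 <= ret_frames < 32  - less than the requested 32; in blocking mode this happens only if the call was interrupted by a signal or an xrun occurred (or, in non-blocking mode, if fewer frames were available) - so the rest may have to be requested again later
* ret_frames < 0        - a negative error code; e.g. -EPIPE ("Broken pipe") means an xrun has happened, after which the stream must be recovered (e.g. via `snd_pcm_prepare` or `snd_pcm_recover`) before it can continue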

When we look at playback:

    ret_frames = snd_pcm_writei(playback_pcm_handle, audiobuf, 32);

... the meaning of the returned `ret_frames` is nearly the same as in the capture case - except here, for playback, 0 <= ret_frames < 32 means that:

* the userspace app requests to write 32 frames to ALSA's playback buffer;
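* ALSA managed to write, say, 16 frames of the 32, and the buffer ended up full - so ALSA returns 16 as `ret_frames` (in blocking mode the call would normally wait for room instead, so such a short count points to non-blocking mode, a signal, or an xrun);
* since in this case userspace offered _more_ frames than ALSA could accept at that moment, this is an output overflow from the application's point of view (note, though, that ALSA's own term for a playback xrun is "underrun": the hardware running out of frames to play).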

For completeness, 0 <= ret_frames < 32 in the capture case means that:

* the userspace app requests to read 32 frames from ALSA's capture buffer;
* ALSA managed to read only, say, 16 frames before running out of captured data in the buffer - so ALSA returns 16 as `ret_frames`;
* since in this case userspace got _fewer_ frames from ALSA than it requested, this is an input underflow from the application's point of view (whereas ALSA's own term for a capture xrun is "overrun": the hardware filling the buffer faster than the application reads it).
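Either way, when fewer frames than requested come back, an application that wants all of its frames to get through has to retry with the remainder, and appl_ptr only advances by what was actually transferred. Here is a self-contained sketch of that loop for playback; `fake_writei()` and `struct fake_ring` are hypothetical stand-ins for `snd_pcm_writei()` and the PCM (a ring that frees a fixed number of frames before each call), so the sketch runs without ALSA. A real program would wait (e.g. via poll or `snd_pcm_wait`) on -EAGAIN and call `snd_pcm_recover()` on -EPIPE; this sketch just reports the error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a playback PCM, NOT the ALSA API: before each
 * call, `room_per_call` frames are "played", freeing that much room. */
struct fake_ring {
    long room;              /* frames currently free in the ring */
    long room_per_call;     /* frames freed (played) before each call */
    long capacity;          /* ring size (buffer_size) */
    unsigned long appl_ptr; /* cumulative frames accepted so far */
    int calls;              /* number of fake_writei() calls made */
};

/* Accept as many frames as there is room for, mimicking a short count. */
long fake_writei(struct fake_ring *r, long frames)
{
    long n;

    r->calls++;
    r->room += r->room_per_call;
    if (r->room > r->capacity)
        r->room = r->capacity;
    if (r->room == 0)
        return -EAGAIN;              /* nothing could be written right now */
    n = frames < r->room ? frames : r->room;
    r->room -= n;
    r->appl_ptr += (unsigned long)n; /* appl_ptr advances by what was accepted */
    return n;
}

/* Write `frames` frames, retrying after short counts.
 * Returns 0 on success, or the first negative error code. */
long write_all(struct fake_ring *r, long frames)
{
    while (frames > 0) {
        long ret = fake_writei(r, frames);

        if (ret < 0)
            return ret;  /* real code: wait on -EAGAIN, snd_pcm_recover() on -EPIPE */
        assert(ret > 0 && ret <= frames);
        frames -= ret;   /* short count: retry with the remainder */
    }
    return 0;
}
```

With 16 frames of room initially and 8 more freed per call, a 32-frame request goes through in two calls (24 + 8 frames), and appl_ptr ends at 32.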

Note that here, ALSA likely decides what to return as `ret_frames` based on the previous appl_ptr and .pointer/hw_ptr; in turn, appl_ptr soon advances by the returned `ret_frames`, for the given stream direction (playback or capture).


With this in mind, here is my experience trying to profile the playback operation. I used the `playmini.c` file in the `alsa-capttest` folder, again run by `run-alsa-capttest.sh` to obtain kernel debug logs and plots. I first started the same way as I did in `captmini.c` - that is (essentially):

  ... // enable ftrace logging
  ret1 = snd_pcm_writei(playback_pcm_handle, audiobuf, period_chunksize_frames);
  ret2 = snd_pcm_writei(playback_pcm_handle, audiobuf, period_chunksize_frames);
  ... // disable ftrace logging

But that resulted in rather short acquisitions, in which only a single .pointer firing would be captured, e.g. as in:

    http://sdaaubckp.sf.net/post/alsa-capttest/_cappics04/captures-2013-08-10-02-25-46-shp-trace-both.pdf

The blue lines, showing the calls from userspace, indicate that `snd_pcm_writei` returned within 150 μs (for `dummy-patest`, my version of `dummy`) to 290 μs (for `hda-intel`); this is far shorter than the expected period of 32/44100 ~= 726 μs - which otherwise seems to have been the (average, approximate) time taken by `snd_pcm_readi` to return in the capture direction! Now, what is confusing to me here is the issue of blocking; note that the docs say:

    ALSA project - the C library reference: PCM (digital audio) interface
    http://www.alsa-project.org/alsa-doc/alsa-lib/pcm.html

    > In blocked behaviour, these I/O functions stop and wait until there is
    > a room in the ring buffer (playback) or until there are a new samples
    > (capture).
    >
    > The ALSA PCM API uses a different behaviour when the device is opened
    > with blocked or non-blocked mode. The mode can be specified with mode
    > argument in snd_pcm_open() function. The blocked mode is the default
    > (without SND_PCM_NONBLOCK mode).

On first reading, I took this to mean that both `snd_pcm_writei` and `snd_pcm_readi` would block until their request for N (say, period_size, here 32) frames (either for writing or for reading) has been honored; in other words, that they should both block for approximately the period time (726 μs for 32 frames @ CD quality). However, the debug logs show that `snd_pcm_writei` can return much sooner - in as little as about a fifth of the period; so quite obviously, `snd_pcm_writei` doesn't block for the entire period time.

On closer reading, it does say "until there is room in the ring buffer (playback)". This "ring buffer" probably refers to the `dma_area` of the playback stream; so what we're talking about here is blocking until data from userspace is transferred to the ALSA driver's `dma_area`. And this `dma_area` is ultimately kernel memory on the same PC, so a copy from userspace to kernelspace is indeed likely to complete relatively fast - since for this direction (playback), it doesn't need to involve the card (external hardware) at all! In other words:

* Userspace starts with `snd_pcm_writei`
* `snd_pcm_writei` starts blocking
** Kernel receives the `snd_pcm_writei` / playback ioctl, checks and sees that (say) the playback `dma_area` is empty,
** thus kernel accepts the bytes from userspace, writing them into `dma_area`
** Given that `dma_area` is part of the same kernel, this copy completes relatively fast
* `snd_pcm_writei` stops blocking, and returns the copied amount of frames

In contrast, in the capture direction, `snd_pcm_readi` blocks until N (say 32) frames are available - but whether those frames are available, depends ultimately on the card hardware; and since we have a sampling rate specification, those frames cannot be delivered by the card any faster than period_time=period_size/rate; thus blocking for at least period_time in the capture direction is implied by default.
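The period time arithmetic used throughout (about 726 μs for 32 frames at 44.1 kHz, and roughly 363 μs for half a period) is just period_size/rate; a trivial helper, with a hypothetical name:

```c
#include <assert.h>

/* Duration of `frames` frames at `rate` Hz, in microseconds. */
double frames_to_us(unsigned long frames, unsigned int rate)
{
    assert(rate > 0);
    return 1e6 * (double)frames / (double)rate;
}
```

`frames_to_us(32, 44100)` gives about 725.6 μs, and `frames_to_us(16, 44100)` about 362.8 μs.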


So, assuming the above is correct, I first tried a bit with `snd_pcm_wait`, for which the docs say:

    http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html
    Wait for a PCM to become ready.

But apparently, "PCM to become ready" for the playback direction again refers to whether there is space in `dma_area`. So if you start with a buffer_size of 64 frames, and do a first successful `snd_pcm_writei` of 32 frames, there is still space for 32 more frames in the playback buffer. So if you run `snd_pcm_wait` here, it returns fast - because there is space in the buffer; it doesn't wait for the buffer to finish playing! So this still didn't help me get a better debug acquisition - this is (somewhat) documented in the `doPlayback_v01()` function in `playmini.c`, which is the source for this .gif:

    http://sdaaubckp.sourceforge.net/post/alsa-capttest/capttest_04_shp.gif
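As far as I can tell, "ready" here means that avail has reached the `avail_min` software parameter, which I believe defaults to period_size. Under that assumption, a minimal model of the readiness check for playback (hypothetical names, no boundary wrapping) shows why `snd_pcm_wait` returns right after the first write:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a playback PCM counts as "ready" once the free room
 * in the ring (avail) has reached avail_min. Cumulative pointers, no
 * boundary wrapping. */
bool playback_ready(unsigned long hw_ptr, unsigned long appl_ptr,
                    unsigned long buffer_size, unsigned long avail_min)
{
    unsigned long queued;

    assert(appl_ptr >= hw_ptr && appl_ptr - hw_ptr <= buffer_size);
    queued = appl_ptr - hw_ptr;               /* written but not yet played */
    return buffer_size - queued >= avail_min; /* avail >= avail_min */
}
```

With buffer_size = 64 and avail_min = 32, the PCM is already "ready" after the first 32-frame write (32 frames of room left), and only stops being ready once the second write fills the ring.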

So, the typical response I'd get from the `playmini.c` program, for two consecutive `snd_pcm_writei`s, would be that the first 32 frames of `writei` were OK, and the second were "Broken pipe" (-32); this being for the `hda_intel` driver (note that on the gifs/pdfs, the `dummy` diagram is on top and `hda_intel` on the bottom - but in the text overlay it's the opposite: the first two lines are for `hda_intel`, and the next two for `dummy`). During these tests, the `dummy_patest` driver would return 32/OK for both `_writei` calls - but even this produced captures so short that I cannot see the timer function run even once (which made things really confusing for me, in the sense that: if both `writei`s returned successfully before the timer function even had a chance to increase .pointer - what made them complete with success then?). Note that more complex userspace ALSA code (like the one in PortAudio) usually performs a write, followed by a poll of the file descriptor - never two writes immediately one after another, as attempted here.

It's also notable that the playback stream gets started "for real" only upon the first `writei` command - only then does the _kernel_ `snd_pcm_start()` function get called. If we call `snd_pcm_start(playback_pcm_handle)` from userspace before the first `writei`, it will simply call something like `snd_pcm_pre_start` in the kernel (see the .csv source for e.g.:

    http://sdaaubckp.sf.net/post/alsa-capttest/_cappics04/captures-2013-08-10-09-56-34-shp-adsws-trace-both.pdf

) - and then wait until the first `writei`, to call the kernel `snd_pcm_start()`.


By trial and error, I eventually realized that a specific delay (first introduced by multiple `fprintf's`) between the two `snd_pcm_writei`s, makes the `playmini.c` much more likely to complete both writes with success. So I thought I'd look into that - and while at first I thought I'd have to somehow "trigger" .pointer updates from userspace (say by calling `snd_pcm_avail_delay`), it turns out that just a delay - here done via `nanosleep` - is enough. So I used the function `doPlayback_v02()` in `playmini.c`, and additionally used the script `playdelay.sh` to re-run 100 tests of playmini.c and log how many of them fail - for each delay step of 10 μs up to 1 ms; and the corresponding image looks like this:

    http://sdaaubckp.sf.net/post/alsa-capttest/playdelay-hda-intel_v02.png

This image also shows the mean and median of the playback `avail` frames; the median pb_avail values are: 37, 33, 41, 49, 57, [0] - and the mean roughly tracks this sequence too. A "sweet spot" for the `nanosleep` between the two `snd_pcm_writei` commands, where the number of errors is minimal (here 1 error in 100 runs), is visible - and it "moves" slightly depending on whether `snd_pcm_avail_delay` was used (280 μs - as above) or not (310 μs - notably close to 363 μs, which is half the period time of 726 μs for this case):

    http://sdaaubckp.sf.net/post/alsa-capttest/playdelay-hda-intel_v03.png

... which, in turn, implies that `snd_pcm_avail_delay` costs about 30 μs on this platform. I also noticed here, that every time something like `snd_pcm_avail_delay` or `snd_pcm_status` is used either before or after the "sweet spot" zone, it will trigger an XRUN, which afterwards propagates to all subsequent ALSA calls; the only command that doesn't do this is `snd_pcm_avail_update`, which is described in the docs as "light": "The position is not synced with hardware (driver) position in the sound ring buffer in this function. This function is a light version of snd_pcm_avail()." In relation to this, I also found this (massive) alsa-devel thread from 2008:

    "Re: What does snd_pcm_delay() actually return?"
    http://thread.gmane.org/gmane.linux.alsa.devel/53841/focus=54050

    > The period-based refresh makes it hard to use the fifo effectively.  If
    > the card fifo is allowed to 'suck' all the data from the ringbuffer then
    > it makes it look like an underrun. Also it makes time appear to run fast
    > until the fifo is filled up.
    >
    > The 'fast time' creates problems for ALSA on playback start, because
    > alsa assumes that it will take a whole period for a period of data to be
    > consumed, while the driver is capable of consuming multiple periods
    > almost instantly.  In my driver I have to throttle the rate that data is
    > transferred to the card fifos.

    http://thread.gmane.org/gmane.linux.alsa.devel/53841/focus=53984

    > Yes, this is exactly what I am experiencing. At stream start my
    > estimations (based on update_avail) are way off. Afterwards everything
    > is fine. As a dirty workaround to fix this I halve the initial sleep
    > time always so that I can make sure I don't sleep for too long and get
    > an xrun. But that's really ugly, because halving it is just a wild
    > guess and it isn't even necessary on PCI hardware.

It seems these quotes refer to something similar to what I'm seeing with two `writei`s in a row (the need for a specific sleep, possibly half a period, to get to the "sweet spot"); but I cannot tell for sure right now.


Anyways, knowing this delay, I finally came to this piece of (here, pseudo) code:

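  ... // enable ftrace logging
  const struct timespec gap = { .tv_sec = 0, .tv_nsec = 310000 }; // 310 us, the "sweet spot"
  ret1 = snd_pcm_writei(playback_pcm_handle, audiobuf, period_chunksize_frames); // 32 frames
  nanosleep(&gap, NULL);
  ret2 = snd_pcm_writei(playback_pcm_handle, audiobuf, period_chunksize_frames); // 32 frames
  snd_pcm_drain(playback_pcm_handle); // block until all queued frames are played
  ... // disable ftrace logging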

... which represents the essence of the `doPlayback_v03()` function in `playmini.c`, used to obtain most of the other playback related acquisitions/plots. To begin with, here is an animation of all those `playmini` test runs, which completed successfully for both `dummy-patest` and `hda-intel` drivers:

    http://sdaaubckp.sf.net/post/alsa-capttest/capttest_04.gif

Basically, in the above code, we know that the first `writei` call will succeed for sure - because it's the first call, and at that time the playback `dma_area` is empty - and will return quickly. With the `nanosleep` we ensure that the second `writei` call lands in the "sweet spot" - and hopefully also succeeds, and thus returns quickly. Since we have no intention of writing any further data, we let ALSA know that by calling `snd_pcm_drain`, for which the docs note: "For playback wait for all pending frames to be played and then stop the PCM.". It was also called in the code before - but here it is specifically placed so that it blocks until the card has finished playback, before ftrace logging is stopped - so we can obtain a complete debug log acquisition of the playback process. And it does indeed work - at least, we start getting timers/interrupts firing in the debug logs, as shown in `capttest_04.gif`.

I found the behavior of playback for `hda-intel` somewhat surprising at first, because it is quite different from the capture case; compare the cases of:

    http://sdaaubckp.sf.net/post/alsa-capttest/capttest.gif     (capture  - `hda-intel`: bottom plot)
    http://sdaaubckp.sf.net/post/alsa-capttest/capttest_04.gif  (playback - `hda-intel`: bottom plot)

In the capture case, three IRQs fire; starting almost immediately after `readi`, they basically delineate the time taken by two periods. In the playback case, four IRQs fire: again the first one fires soon after the first `writei` - but the second fires after only _half_ a period, not after a full period as in the capture case! From this point on, however, the 3rd and 4th IRQs _do_ fire a full period apart!


So, one of these playback test run acquisitions/logs (captures-2013-08-11-05-15-21) was taken to be the source of yet another annotated montage; first, for the `hda-intel` driver it is:

    http://sdaaubckp.sf.net/post/alsa-capttest/montage-hda-intel-p.png  (also .pdf)

While I have learned that modern cards do not have on-board buffers, I have still drawn an "intern playback" buffer on the "Card Time" axis, because I think it could be a useful tool for understanding what should happen. Here's my speculative breakdown of what (I think) happens here:

* The first `snd_pcm_writei` fires; right before it, the playback `dma_area` is "empty"
* ALSA then starts the process via kernel `snd_pcm_start` soon after
* Approx 50 μs after that (or about 100 μs after `writei` first fired), card responds with an IRQ
** Strangely enough, this first IRQ does *not* trigger a .pointer !
* At about the same time, `snd_pcm_writei` probably returned with 32 frames written;
** so already at about this time, we can count on appl_ptr being set to 32 (or `dma_area` is "half full"; hence another drawing of the "buffer")
** Also about this time, the `nanosleep` (not drawn) in userspace should start
* Some 332 μs (approx half a period time, which is 726/2 = 363 μs) after the first card IRQ, the second card IRQ fires
** this one apparently informs ALSA that playback has started (so .pointer would be at 0 here) - because also here, .pointer is _not_ fired in the context of the IRQ handler
* Time goes by, `nanosleep` has expired, and the second `snd_pcm_writei` is fired in userspace
* Soon after that, .pointer is called for the first time, in the context of the playback ioctl handler
** The values seen by the first pointer are (in frames): hw_ptr = 0, .pointer = 17, appl_ptr = 32; engine sees hw_ptr < .pointer < appl_ptr = 32
** hw_ptr would become = .pointer (=17) very soon after
** so at this point, engine sees 0 < hw_ptr=17 < appl_ptr=32 - which is probably seen as a good sign: hw_ptr is where it's supposed to be after approx half a period; it still hasn't gone over appl_ptr yet, so playback is still active
** and since there is still space in the playback `dma_area` buffer, the ioctl allows `_writei` to complete successfully in userspace
* `writei` completes in userspace, returning 32 more frames; now appl_ptr should be at 64 (the `dma_area` buffer is currently full - meaning if there was a next write, it would wrap)
* `snd_pcm_drain` fires afterward in userspace - triggering again the playback ioctl
* `snd_pcm_drain` is fired in kernel space soon after, apparently waiting for the playback to complete
* Some 738 μs (approx the period time of 726 μs) after the second IRQ, the third card IRQ fires
* Soon after, .pointer is called for the second time, in context of this third card IRQ
** The values seen by the second pointer are (in frames): hw_ptr = 17, .pointer = 33, appl_ptr = 64
** Engine again sees hw_ptr < .pointer < appl_ptr - and ultimately, 0 < hw_ptr = 33 < appl_ptr = 64 - again probably seen as a good sign
* Time goes by - Some 763 μs (again approx the period time of 726 μs) after the third IRQ, the fourth card IRQ fires
* Soon after, .pointer is called for the third time, in context of this fourth card IRQ
** The values seen by the third pointer are (in frames): hw_ptr = 33, .pointer = 1, appl_ptr = 64
** this means hw_ptr has wrapped - so all 64 requested (appl_ptr) frames have finished playing
** engine thus determines `snd_pcm_drain_done()` in kernelspace
* Soon after, `snd_pcm_drain()` exits in userspace - and the debug acquisition completes
* ((there is a "ghost buffer" at the end of capture in "Card Time", to indicate where .pointer would have to be - at approx quarter buffer size - _had_ the playback continued; which it doesn't in this case))


So this tells me there is probably some condition - either (hw_ptr < appl_ptr) after a .pointer call, or (hw_ptr < .pointer < appl_ptr) right before/during a .pointer call (with wrapping handled in both cases) - which needs to be satisfied for the ALSA engine to determine that a playback stream is proceeding as expected. I'm not really sure which of these would be the stronger condition. We can also look at some acquisitions where `hda_intel` fails (vs. `dummy-patest`, which doesn't):

    http://sdaaubckp.sf.net/post/alsa-capttest/capttest_04_bhda.gif (playback - `hda-intel`: bottom plot)

In most of these, .pointer fails to be fired after the second card IRQ (although one of them didn't acquire any card IRQs at all). When .pointer is fired between the second and third card IRQ, e.g. as in:

    http://sdaaubckp.sf.net/post/alsa-capttest/_cappics04/captures-2013-08-11-11-56-38-bhda-trace-both.pdf

... at that point .pointer reads: hw_ptr = 0, .pointer = 1, appl_ptr = 32; .pointer here should be 17. The second .pointer we have: hw_ptr = 1, .pointer = 33, appl_ptr = 32 (vs. hw_ptr = 17, .pointer = 33, appl_ptr = 64); this is apparently a cause for a call to `azx_pcm_trigger` to stop stream (which otherwise happens first at `snd_pcm_drain_done()`); and after that, the `snd_pcm_drain()` call exits quickly (given debug acquisition finishes soon after, and no further events are reported on that plot). Now, the second .pointer certainly doesn't satisfy (hw_ptr < .pointer < appl_ptr), nor are the values of .pointer there where they should be according to time expired since start of playback - but I still cannot tell for sure, if this is the exact condition that causes the failure of the second userspace `writei` call.
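Looking at the kernel sources, as far as I can tell, after each hw_ptr update the engine computes the playback avail (hw_ptr + buffer_size - appl_ptr) and stops the stream (xrun) once avail >= stop_threshold - or, while draining, calls `snd_pcm_drain_done()` once avail >= buffer_size. With stop_threshold at what I believe is its default, buffer_size, both reduce to "hw_ptr has caught up with appl_ptr". Here is a sketch of that check, with hypothetical names and boundary wrapping ignored, that can be fed the values recorded above:

```c
#include <assert.h>

enum check_result { STREAM_OK, STREAM_STOP };

/* Hypothetical model of the check applied after each .pointer read.
 * hw_ptr and appl_ptr are cumulative; pos is the wrapping .pointer value.
 * Assumes stop_threshold == buffer_size. */
enum check_result check_after_pointer(unsigned long *hw_ptr_base,
                                      unsigned long *hw_ptr,
                                      unsigned long appl_ptr,
                                      unsigned long pos,
                                      unsigned long buffer_size)
{
    unsigned long new_hw_ptr;
    long avail;

    assert(pos < buffer_size);
    new_hw_ptr = *hw_ptr_base + pos;
    if (new_hw_ptr < *hw_ptr) {            /* .pointer wrapped around the ring */
        *hw_ptr_base += buffer_size;
        new_hw_ptr = *hw_ptr_base + pos;
    }
    *hw_ptr = new_hw_ptr;

    avail = (long)*hw_ptr + (long)buffer_size - (long)appl_ptr;
    /* Nothing queued is left to play: xrun, or drain done when draining. */
    return avail >= (long)buffer_size ? STREAM_STOP : STREAM_OK;
}
```

Replaying the successful `hda-intel` run (appl_ptr 32, .pointer 17; then appl_ptr 64, .pointer 33 and 1) gives OK, OK, STOP (the drain completing); the failed run (.pointer 1, then 33, with appl_ptr 32) gives OK, STOP; and the failed `dummy-patest` run (.pointer 37, appl_ptr 32) gives STOP right away.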


Anyways, we can now take a look at the `dummy-patest` driver in the playback direction, whose montage of successful run is at:

    http://sdaaubckp.sf.net/post/alsa-capttest/montage-dummy-p.png  (also .pdf)

This image sometimes contains a "ghost copy" of the buffer in the CPU1 lane, because there was a bit of free space there; it is simply meant as a visual tool, to see what ALSA would "think" about the "card playback buffer" position (which in this case, for a virtual driver with no hardware, is simulated by the values returned by .pointer, calculated from the time delta in the timer tasklet). Anyways, a brief speculative breakdown would be:

* The first `snd_pcm_writei` fires; right before it, the playback `dma_area` is "empty"
* ALSA then starts the process via `snd_pcm_start` soon after
** Within this, the timer function is scheduled to fire after a period_size time - but there is no "first" timer firing corresponding to the first IRQ in the `hda_intel` case
* Soon after, `snd_pcm_writei` probably returned with 32 frames written;
** so already at about this time, we can count on appl_ptr being set to 32 (or `dma_area` is "half full"; hence another drawing of the "buffer")
** Also about this time, the `nanosleep` (not drawn) in userspace should start
* Some time goes by - and the second `snd_pcm_writei` manages to fire in userspace _before_ the timer function even fires
* but then, the timer function interrupts on CPU0, right before...
* ... the playback_ioctl handler is raised on CPU1!
** The first .pointer is called in context of the playback_ioctl;
** The values seen by the first pointer are (in frames): hw_ptr = 0, .pointer = 0, appl_ptr = 32; this is apparently seen as a good sign by the engine, as `_writei` is allowed to complete successfully.
* `writei` completes in userspace, returning 32 more frames; now appl_ptr should be at 64 (the `dma_area` buffer is currently full - meaning if there was a next write, it would wrap)
* `snd_pcm_drain` fires afterward in userspace - triggering again the playback ioctl
* `snd_pcm_drain` is fired in kernel space soon after, apparently waiting for the playback to complete
* Soon after, .pointer is called for the second time, in context of the _drain playback_ioctl handler
** The values seen by the second pointer are (in frames): hw_ptr = 0, .pointer = 38, appl_ptr = 64; this is apparently still good
* Soon after, the second timer function is called
* Soon after, .pointer is called for the third time, in the context of the second timer function
** The values seen by the third pointer are (in frames): hw_ptr = 38, .pointer = 10, appl_ptr = 64; this is apparently good - indicating .pointer has wrapped... but then, it wrapped 10 frames over, meaning the "card" played _more_ samples than were written; but that seems not to be a cause for concern
** engine thus determines `snd_pcm_drain_done()` in kernelspace
** `dummy_pcm_trigger` is called soon after to stop the stream;
* Soon after, `snd_pcm_drain()` exits in userspace - and the debug acquisition completes

We can also look at some acquisitions where `dummy-patest` fails (vs. `hda_intel`, which doesn't):

    http://sdaaubckp.sf.net/post/alsa-capttest/capttest_04_bdum.gif

A quick scan of the top of that animated plot, tells us that in those `dummy-patest` acquisitions, the second timer function doesn't even fire; implying that the stream was stopped already at the first firing of pointer (or the second `writei`). One of those acquisitions is:

    http://sdaaubckp.sf.net/post/alsa-capttest/_cappics04/captures-2013-08-11-11-40-44-bdum-trace-both.pdf

Here we can see that .pointer also fires only once, and it sees the values hw_ptr = 0, .pointer = 37, appl_ptr = 32; since we cannot have played more frames than were written after a single `_writei`, the engine rightly decides something is wrong here - and issues a `_trigger` to stop immediately afterwards.


Also, we can have a brief look at the original dummy driver. First, recall that when we compare the capture operation in the original `dummy` vs. `dummy-patest`:

    http://sdaaubckp.sf.net/post/alsa-capttest/capttest.gif     (capture  - `dummy-patest`: top plot)
    http://sdaaubckp.sf.net/post/alsa-capttest/capttest_03.gif  (capture  -  orig `dummy` : top plot)

... the original `dummy`, being able to provide a .pointer that increases every frame, can trigger `snd_pcm_update_hw_ptr0` (and thus the .pointer function) repeatedly, multiple times in a row; `dummy-patest`, which calculates the .pointer position only once in the timer tasklet, doesn't trigger a `snd_pcm_update_hw_ptr0` (and the corresponding .pointer) update more than twice in a row.

The interesting thing is, that in the playback direction, there is no such distinction:

    http://sdaaubckp.sf.net/post/alsa-capttest/capttest_04.gif     (playback - `dummy-patest`: top plot)
    http://sdaaubckp.sf.net/post/alsa-capttest/capttest_04_or.gif  (playback -  orig `dummy` : top plot)

In both cases, the .pointer in the context of `snd_pcm_update_hw_ptr0` is called at pretty much the same times. I would guess that this is because of the fundamental difference between the capture and playback directions - in the capture direction, the card is the initiator of delivering frames to the PC, and .pointer indicates the position the card has reached in capturing - and it's in ALSA's best interest to have the latest .pointer position stored in hw_ptr; thus if ALSA keeps getting new values from .pointer, it will repeatedly try to update to them. But in the playback direction, userspace is the initiator of delivering frames to the card, and as such ALSA doesn't need to keep re-reading .pointer to track its latest value - it can apparently make do simply by checking .pointer "once in a while", and making sure the card keeps up with playback as demanded by userspace.


Before I wrap up, here is a small (and crude) ASCII table, summarizing the difference in behavior between the `hda-intel` and `dummy` drivers (here `dummy` refers both to the original and `dummy-patest`, since they both schedule their timer functions the same way) in the context of `captmini`/`playmini` tests, as I see it so far:

                 hda-intel          |        dummy
    frames  capture   playback |  capture   playback
    0       readi     writei   |  readi     writei
    1       IRQ.p/0   IRQ      |
    16                IRQ/0    |
    32      IRQ.p/32           |  Tmr.p/32  Tmr.p/32
    48                IRQ.p/32 |
    64      IRQ.p              |  Tmr.p     Tmr.p
    16                IRQ.p    |
    ...

Here time is shown in frames, assuming period_size is 32 (so half a period is 16, and two periods is 64 - which is also buffer_size). The table compares the firing of "period pulses": for `hda-intel` provided by a card IRQ; for the virtual `dummy` driver provided by timer functions. The `.p`, where present, means that .pointer is expected to be called in the context of that callback. The IRQ at "1" for `hda-intel` means that "acknowledgment" interrupts fire immediately after the first command is issued - which doesn't happen in `dummy`. The slash with a number (/0, /32), where present, refers to the .pointer position expected to be reported at that time. This should help make visible that the playback stream for `hda-intel` is "offset" by half a period with respect to the capture one - which again doesn't happen for `dummy`. I think it will be possible to add some variables to `dummy`, and force it to fire its timer functions with the same asymmetric capture/playback pattern as `hda-intel` - whether this will fix the PortAudio full-duplex drop remains to be seen...


Well, that is as much as I can fit into an email this time :)
Many thanks for any comments - especially if anyone sees anything wrong in this analysis,
Cheers!
_______________________________________________
Alsa-devel mailing list
Alsa-devel@alsa-project.org
http://mailman.alsa-project.org/mailman/listinfo/alsa-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops (playback)
  2013-08-14 14:30       ` Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops (playback) Smilen Dimitrov
@ 2013-08-15  4:17         ` Raymond Yau
  2013-08-16  5:20           ` Smilen Dimitrov
  0 siblings, 1 reply; 16+ messages in thread
From: Raymond Yau @ 2013-08-15  4:17 UTC (permalink / raw
  To: ALSA Development Mailing List

>
> While I have learned that modern cards do not have on-board buffers,



Take a look at the Intel® High Definition Audio Specification Document Change
Notification, section 4.6, "Energy Efficient HD Audio (EEAudio) Mechanism".

There is a controller (HW) buffer, the <local fifo>, in Figure 14, "HD Audio DMA
and buffering".

* Re: Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops (playback)
  2013-08-15  4:17         ` Raymond Yau
@ 2013-08-16  5:20           ` Smilen Dimitrov
  0 siblings, 0 replies; 16+ messages in thread
From: Smilen Dimitrov @ 2013-08-16  5:20 UTC (permalink / raw
  To: ALSA Development Mailing List

Hi Raymond,

Many thanks for your response!


>>
>> While I have learned that modern cards do not have on-board buffers,
>
> Take a look at Intel® High Definition Audio Specification Document Change
> Notification
> 4.6 Energy Efficient HD Audio (EEAudio) Mechanism
>
> There is controller (HW) buffer <local fifo> in the Figure 14 HD Audio DMA
> and buffering

Thanks for spotting that - I wasn't aware of this document. I had a bit of trouble tracking it down using those terms, and ended up finding it on a page entitled "Intel® High Definition Audio Energy Efficient Buffering: Spec".

Now, that comment of mine, stems from this:

>>> I'm assuming that the card has it's own intern capture buffer memory
>>> on board;
>>
>> No modern card has this.  All data is immediately read from/written to
>> main memory.
>>
>
> Thanks for noting this - I wish I knew better :) [...]

... and, come to think of it, I did start the discussion referring to "capture" there. Reading the Intel spec DCN, "capture" is never mentioned, while "played" is mentioned once. Does this mean that the FIFO referred to for hda-intel is just for playback, with no such counterpart for capture? Or do hda-intel on-board cards have FIFOs in both directions, while other modern cards don't necessarily? Or maybe I introduced the problem by carelessly referring to "buffer memory", which might suggest true random-access memory, while modern cards may have FIFOs (which I'd guess are electrically simpler)?


Thanks for any clarifications,
Cheers!


* Re: Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops (full-duplex: latency.c)
  2013-07-25  8:37     ` Clemens Ladisch
  2013-08-04  0:05       ` Smilen Dimitrov
  2013-08-14 14:30       ` Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops (playback) Smilen Dimitrov
@ 2013-09-13  6:23       ` Smilen Dimitrov
  2013-09-17 16:07         ` Smilen Dimitrov
  2013-10-21 14:48       ` [Solved] Questions about virtual ALSA driver (dummy), PortAudio and full-duplex Smilen Dimitrov
  3 siblings, 1 reply; 16+ messages in thread
From: Smilen Dimitrov @ 2013-09-13  6:23 UTC (permalink / raw
  To: alsa-devel

Hi all,

Since the PortAudio full-duplex problem I have depends on both the capture and playback directions, I thought I'd also look into full-duplex, at least from the perspective of ALSA, and post a writeup on that. This is again done by comparing the `hda_intel` and `dummy` ALSA drivers, but run under a different set of userspace programs. Even though I'll have specific questions, please let me know if anything strikes you as wrong in this email; and, as always, apologies for the verbosity.


First of all, I wondered if there is a "plain ALSA" program which would be an equivalent of PortAudio's `patest(_duplex)_wire.c`, and could be used to do a full-duplex test. I never would have guessed from the name alone, but it seems there is - in the form of `alsa-lib`'s `test/latency.c`. I had quite a hard time figuring out exactly how this `latency.c` is supposed to behave - so after playing with it a bit, I took the liberty of posting a page about it on the wiki:

    http://www.alsa-project.org/main/index.php/Test_latency.c

... which also includes some valuable snippets from `alsa-devel` discussions - hope that's OK. While the snippets go a long way in explaining a lot (such as the latency being visible as the playback stream being a number of frames ahead of the capture one), what confuses me is this:

* I specify `-m 128 -M 128` as arguments, which forces latency=128 (frames) in the code
* The program confirms that, saying: "Trying latency 128 frames ..."
* Before the read/write loop, `writebuf(phandle, buffer, latency ...)` is called **twice**, and yet:
** the very first `appl_ptr` obtained from a playback .pointer is 128 (in most of my captures)
** At "Success" end, we may have Playback: *** frames = 44288 *** and Capture: *** frames = 44160 *** (like in the output under #Usage on the wiki page); and their difference is 44288-44160 = 128

So my question is: how is it possible to do a write of 128 frames *twice*; and yet still get both the first `appl_ptr`, and p/c stream difference to be 128? Shouldn't it be 2*128 = 256, if we started with writing 128 frames twice?


Then, I wondered if there is an ALSA equivalent to PortAudio's use of a single/"wire" callback to specify "proper"/"hard" full-duplex operation. To me, it seems to be the function `snd_pcm_link`, for which the docs say: "The two PCMs will start/stop/prepare in sync". Is this what defines proper full-duplex operation - that the playback and capture streams run in sync? Some confirmation of this seems to come from portaudio-v19/src/hostapi/alsa/pa_linux_alsa.c:


    /* this will cause the two streams to automatically start/stop/prepare in sync.
     * We only need to execute these operations on one of the pair.
     * A: We don't want to do this on a blocking stream.
     */
    if( self->callbackMode && self->capture.pcm && self->playback.pcm )
    {
        int err = snd_pcm_link( self->capture.pcm, self->playback.pcm );
        if( err == 0 )
            self->pcmsSynced = 1;
        else
            PA_DEBUG(( "%s: Unable to sync pcms: %s\n", __FUNCTION__, snd_strerror( err ) ));
    }

... which is the only instance of `snd_pcm_link`, used only when capture and playback streams both exist in the same context (and callbackMode is set) - which, I gather, is only possible when a single/"wire" callback is used; is this correct? But then, why wouldn't we want to do that on a blocking stream? Also, does "blocking" here refer only to actual blocking file descriptors - or can it also be understood to include polling waits?


Anyways, I have again posted the scripts for, and debug log acquisitions + visualisations of, the `latency` tests in:

    http://sdaaubckp.sourceforge.net/post/alsa-capttest/

There is a modified version of `latency.c` used there, called `latency-mod.c`; the drivers are supposed to have `trace_printk`s as per `latency-drv.patch`; the related scripts are `run-alsa-lattest.sh`, `traceFGLatLogfile2Csv.py`, `traceFGLatLogGraph.gp` and `lat-anim.pl`. The debug logs are in `captures-alsa-captlat.tar.xz`, and their PDF plots are in the `_cappics05/` subfolder. Please see the end of the updated `Readme`, in the same directory, for more information about these.


To begin with, I tried acquiring the operation of `hda-intel` under `latency-mod.c` with various parameters, to see how it would behave - these parameters are encoded in the respective file/directory names. First, here is `hda-intel` driver (and card), under `latency-mod.c` without poll (`np`), without Round Robin scheduling (`ns`) and without blocking (`nb`), with latency 128 frames, and duration of test 256 frames:

    http://sdaaubckp.sf.net/post/alsa-capttest/captlat-hda-128-256-np-0-ns-nb.gif

(( Note that all of the provided acquisitions are without Round Robin scheduling (`ns`), with latency 128 frames, and duration of test 256 frames. Also most of the acquisitions represent successfully completed `latency` runs - except for the last two, which are indicated with an `-f` in their names ))

The .gif, as usual, is not very legible - but you can view the PDFs of the individual frames in the `_cappics05` subfolder. What is clear, though, is that .pointer is being called very frequently - and the reason for this is the very frequent calls to `snd_pcm_readi`, which are frequent because neither blocking nor polling is used (and so `readbuf` in `latency-mod.c` spins in its loop as fast as it can).

Once we start using polling (`yp`) with `latency`, the situation settles down; note that "polling" in `latency.c` uses `snd_pcm_wait` to wait a given time (in milliseconds) on the capture handle. As I had some problems with polling at first, I thought they were due to the poll waiting time, which was fixed at 1000 ms in the original `latency.c` - so I added a `--polltime` option in `latency-mod.c`. After some reboots, I couldn't reproduce that problem any more, and it looks like the operation with polling is more or less the same with a 1 ms wait time as with a 1000 ms wait time:

    http://sdaaubckp.sf.net/post/alsa-capttest/captlat-hda-128-256-yp-1-ns-nb.gif
    http://sdaaubckp.sf.net/post/alsa-capttest/captlat-hda-128-256-yp-1000-ns-nb.gif

Finally, I tried using blocking (`yb`) with `latency` instead - and it turns out, it doesn't look all that much different from polling with `hda-intel`:

    http://sdaaubckp.sf.net/post/alsa-capttest/captlat-hda-128-256-np-0-ns-yb.gif

... and so I finally settled on using these `latency` settings (128-256-np-0-ns-yb) for later acquisition of `dummy` operation as well. Anyways, in terms of `hda-intel`, I gather the following can be said generally for the operation under `latency(-mod).c`:

* The operation starts with two or three violet "CardIRQ?"s, spanning approx a quarter of the buffer period (the time corresponding to period_size)
** (violet "CardIRQ?" means that an audio driver interrupt handler function, here `azx_interrupt`, has been detected - but no .pointer was called in its context)
* Approx a whole buffer period later from the very first violet "CardIRQ?", the first "proper" "CardIRQ?" is fired, and it is capture (blue)
* Approx a quarter of the buffer period later from the first capture "CardIRQ?", the first playback (red) "CardIRQ?" fires
* The capture/playback "CardIRQ?" then repeat after a buffer period respectively (keeping the quarter buffer period phase between them) until the end of the program

This seems to match the behavior of the separate capture and playback "montage"s for `hda-intel`, discussed earlier in the thread. The interesting thing for me here is: even though the `latency` program starts by doing two `snd_pcm_writei`s, the driver operation still starts with a capture IRQ, and the playback series is delayed with respect to capture by a quarter buffer period; is this correctly surmised?
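Whether that phase really is a fixed fraction of the period, or a fixed amount of time, could be checked numerically from the traces. A minimal sketch, assuming the capture and playback IRQ timestamps (in seconds) have already been extracted from the `.csv` logs into two arrays paired by period index (the extraction step and the function name are hypothetical):

```c
#include <stddef.h>

/* Mean offset, in frames, of playback IRQs relative to the capture IRQs
 * of the same period index. Timestamps in seconds, rate in Hz.
 * Positive result: playback fires after capture. Returns 0 if n == 0. */
static double mean_irq_offset_frames(const double *capt_ts, const double *play_ts,
                                     size_t n, unsigned int rate)
{
    double sum = 0.0;
    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        sum += play_ts[i] - capt_ts[i];
    return sum / (double)n * (double)rate;
}
```

Running this over acquisitions with different period sizes would show whether the offset scales with period_size (a fraction of the period) or stays constant in frames.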


For a more detailed analysis, the `lat-anim.pl` script can be used to render a single debug acquisition as an animation; here is one such example for `hda-intel` under `latency`:

    http://sdaaubckp.sf.net/post/alsa-capttest/captlat-2013-09-02-23-30-38-hda-128-256-np-0-ns-yb.mpg

... try it with:

    vlc --repeat --rate 3.0 captlat-2013-09-02-23-30-38-hda-128-256-np-0-ns-yb.mpg

Again, the supplied `.mpg`s are not very legible - however, you can use `lat-anim.pl` and the provided debug logs to generate the high-res animation frames (and videos) yourself; although note that rendering frames from a debug acquisition like the above takes about an hour on my machine (see the `Readme` for more). By default, an animation frame is 800x600, and in that format looks like this:

    http://sdaaubckp.sf.net/post/alsa-capttest/captlat-2013-09-11-00-36-17-duM-128-256-np-0-ns-yb_frame_00269.png


On the left hand side, there is visualization of playback (left block) and capture (right block) pointers - from top to bottom:

* For playback, `appl_ptr` changes first - `_pointer` and then `hw_ptr` try to follow it
* For capture, `_pointer` changes first - `hw_ptr` and then `appl_ptr` try to follow it
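This ordering (the application leads in playback, the hardware leads in capture) is what ALSA's "available frames" arithmetic encodes. Below is a simplified userspace re-statement, modelled on the kernel's `snd_pcm_playback_avail()`/`snd_pcm_capture_avail()` helpers; the `pcm_ptrs` struct is illustrative and not the kernel's `snd_pcm_runtime`:

```c
#include <stdint.h>

/* Pointers are running frame counters that wrap at 'boundary'
 * (a large multiple of buffer_size), not offsets into the buffer. */
typedef struct {
    int64_t hw_ptr;      /* frames consumed/produced by the "card" */
    int64_t appl_ptr;    /* frames written/read by the application */
    int64_t buffer_size;
    int64_t boundary;
} pcm_ptrs;

/* Playback: free space the application may still write into. */
static int64_t playback_avail(const pcm_ptrs *p)
{
    int64_t avail = p->hw_ptr + p->buffer_size - p->appl_ptr;
    if (avail < 0)
        avail += p->boundary;
    else if (avail >= p->boundary)
        avail -= p->boundary;
    return avail;
}

/* Capture: frames captured by the "card" but not yet read. */
static int64_t capture_avail(const pcm_ptrs *p)
{
    int64_t avail = p->hw_ptr - p->appl_ptr;
    if (avail < 0)
        avail += p->boundary;
    return avail;
}
```

For example, after writing a full 256-frame buffer with hw_ptr still at 0, playback_avail() is 0; once hw_ptr advances by 128, it becomes 128.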

On the right-hand side, there is the "Card Time" lane overlay, showing a repeat of the `_pointer` values (violet), and interpolated playback (red) and capture (blue) positions (again, see the `Readme` for more). The interpolated positions start with the first proper capture or playback "cardIRQ?", and then grow at the sampling rate (one frame every 1/44100 s here), so they simulate the card sampling clock. The animation in this lane shows that:

* The capture and playback streams indeed seem to be close to synchronized - as expected from `snd_pcm_link` - by looking at their interpolated positions (the small delta between those values is likely due to calculation errors)
* The respective `_pointers` do seem to track closely the interpolated positions - and they typically cross about half a buffer (in this case, a period_size), when they update (close to their respective "cardIRQ?"s)
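The interpolation itself is simple. The scripts (`lat-anim.pl`) do it in Perl; the following C sketch only illustrates the idea, and the truncating rounding is an assumption:

```c
#include <stdint.h>

/* "Card time" interpolation: start at the timestamp t0 of the first
 * proper cardIRQ?, advance one frame per 1/rate seconds, and wrap into
 * the buffer. Times in microseconds. */
static uint64_t interpolated_pos(uint64_t t_us, uint64_t t0_us,
                                 unsigned int rate, unsigned int buffer_size)
{
    if (t_us < t0_us)
        return 0;                                        /* not started yet */
    uint64_t frames = (t_us - t0_us) * rate / 1000000;  /* truncates */
    return frames % buffer_size;
}
```

Comparing these values against the logged `_pointer` values at the same timestamps is what makes the "close to synchronized" observation measurable.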

The interesting thing here for me is the fact that the _pointer/interpolated positions indeed seem to be synchronized - even though the capture and playback IRQ sequences are a quarter buffer period apart! At first, I expected that since the c/p IRQ sequences are a quarter buffer period apart, the respective interpolated/_pointer values would also have to be a quarter buffer_size (half period_size) apart!? But seen like this, it seems that at the very first (violet) cardIRQ, the sampling clock is started synchronously for both playback and capture on the card - and so, _even_ if the playback IRQs hit a little bit later, the PC is still informed of relatively accurate card values (esp. given that, as surmised earlier in the thread, the `_pointer` values seem to propagate from the card to the PC via DMA, and thus do not rely on interrupts "reporting" them). Is this somewhat correctly understood?

[[ This confused me greatly, especially since one of the very first captures (which I also used for developing the scripts) was this:

    http://sdaaubckp.sf.net/post/alsa-capttest/_cappics05/captlat-2013-08-20-09-33-12-trace-hda-intel.pdf

There the very first playback _pointer has a value of 48 - barely 200 μs (duration of some 8 frames) after the `snd_pcm_start`! However, it is about 48/128 = (1/2.66667)th of the buffer_size, whereas a quarter buffer_size is 32 frames - and since I got this in no other debug log, I guess this was an anomaly/error of sorts (even if `latency` reported successful completion). ]]


Moving on to the `dummy` driver: the original version (with the extra `trace_printk`) behaves like this for a blocking `latency` test:

    http://sdaaubckp.sf.net/post/alsa-capttest/captlat-dum-128-256-np-0-ns-yb.gif

Here, even with blocking, .pointer is called quite frequently - however, as noted earlier in this thread, that is because this driver version re-calculates the _pointer value in the .pointer function, and so the value of _pointer has already changed by the time `hw_ptr` is set to _pointer's last value. And since in the capture direction ALSA insists on having `hw_ptr` set to _pointer, it repeats this process.
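The on-demand recalculation can be illustrated with roughly the arithmetic of snd-dummy's hrtimer backend (`dummy_hrtimer_pointer()`): elapsed microseconds times the rate, rounded up, modulo buffer_size (check the actual driver source for the exact form). In this userspace sketch the current time is passed in explicitly, an assumption made for testability; the driver reads its hrtimer clock itself:

```c
#include <stdint.h>

/* Position computed on demand from the time elapsed since the stream
 * started (base_us): elapsed frames rounded up, wrapped into the buffer.
 * The result depends on "now", so two back-to-back calls can disagree. */
static uint32_t dummy_pointer_now(uint64_t now_us, uint64_t base_us,
                                  unsigned int rate, uint32_t buffer_size)
{
    uint64_t delta = now_us - base_us;
    delta = (delta * rate + 999999) / 1000000;   /* frames, rounded up */
    return (uint32_t)(delta % buffer_size);
}
```

At 44.1 kHz, calls 1000 µs and 1023 µs after start already return 45 and 46, which is why `hw_ptr` keeps lagging the freshly recomputed value.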


The behavior of the modified `dummy` driver (`dummy-2.6.32-patest.c` with a respective `trace_printk`) looks like this:

    http://sdaaubckp.sf.net/post/alsa-capttest/captlat-duM-128-256-np-0-ns-yb.gif

Since `dummy-2.6.32-patest` recalculates _pointer values in the hrtimer callbacks (and the .pointer function simply returns them), the ALSA engine can quickly update `hw_ptr` to the _pointer value without many repetitions; and so it resembles the behavior of `hda-intel` a bit more. Apparently, due to the `snd_pcm_link`, the hrtimer callbacks for playback and capture are scheduled very close to one another - causing the future callbacks to be answered in the context of a single IRQ; however, here too, the capture callback seems to be answered first. Looking at one of its debug logs as an animation:
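For contrast, the callback-cached approach can be sketched as follows. The struct and function names are illustrative, not taken from `dummy-2.6.32-patest.c`, and the "timer" is simply a function the caller invokes once per period:

```c
#include <stdint.h>

/* Position advanced only in the (simulated) period timer callback;
 * .pointer just returns the cached value. Names are illustrative. */
typedef struct {
    uint32_t pos;          /* cached position within the buffer, frames */
    uint32_t period_size;
    uint32_t buffer_size;
} cached_stream;

/* Called once per period from the timer callback. */
static void timer_callback(cached_stream *s)
{
    s->pos = (s->pos + s->period_size) % s->buffer_size;
}

/* .pointer: cheap, and stable between callbacks. */
static uint32_t cached_pointer(const cached_stream *s)
{
    return s->pos;
}
```

Because the value only moves in period-sized steps, `hw_ptr` can match it in a single update, at the cost of the position looking frozen between callbacks.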

    http://sdaaubckp.sf.net/post/alsa-capttest/captlat-2013-09-11-02-21-08-duM-128-256-np-0-ns-yb.mpg

... also the update behavior of the pointers and the interpolated positions seems close to the behavior of `hda-intel`.


While `hda-intel` can also fail (with an XRUN) in a `latency` test like the above, I was more interested in a failure of the modified `dummy-2.6.32-patest` - so here is one such debug plot:

    http://sdaaubckp.sf.net/post/alsa-capttest/_cappics05/captlat-2013-09-11-13-46-43-duM-128-256-np-0-ns-yb-f-trace-dummy.pdf

... and as an animation:

    http://sdaaubckp.sf.net/post/alsa-capttest/captlat-2013-09-11-13-46-43-duM-128-256-np-0-ns-yb-f.mpg

At the end of this run, `latency-mod.c` reported the playback/capture frame counts, and an array logging the status of each `readbuf`/`writebuf` call in `latency-mod.c`:

* 192/128 xrun: [rd wr] 0: [64 64] 1: [64 -32] 2: [0 0] 3: [0 0]   ...

... which would mean that the second `writebuf` failed. However, in the PDF plot, we can see the first `_readi` called right at the start; after the first "cardIRQ?", the first `_writei` and the second `_readi` are called. In the successful acquisitions, this pattern repeats to the end - but in this failing one, the failing `_writei` is called after an `xrun()` is decided in the context of the third playback "cardIRQ?" (this is not shown on the PDF - you'll have to look at the `.csv` file to see the `xrun()`). This means that - for some reason - both `_writei` and `_readi` are *skipped* after the second "cardIRQ?", and I cannot really tell what the reason for it would be. I may try to look into this further - but as I remember, the PortAudio full-duplex problem was not a typical XRUN, so maybe I'll go back directly to PortAudio from this point.
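The `-32` in the status array is most likely a negated errno: `snd_pcm_writei()`/`snd_pcm_readi()` return a frame count on success and a negative error code on failure, and ALSA signals an xrun with -EPIPE, which is 32 on Linux. A small decoder, assuming (not confirmed from `latency-mod.c`'s source) that the array stores those raw return values:

```c
#include <errno.h>
#include <string.h>

/* Interpret one rd/wr entry of the status array, assuming it holds the
 * raw return value of snd_pcm_readi()/snd_pcm_writei(). */
static const char *describe_status(long ret)
{
    if (ret > 0)
        return "frames transferred";
    if (ret == 0)
        return "no frames (or call not made)";
    if (ret == -EPIPE)
        return "xrun (-EPIPE)";
    if (ret == -EAGAIN)
        return "would block (-EAGAIN)";
    return strerror((int)-ret);           /* any other negated errno */
}
```

Read this way, `1: [64 -32]` says the second read got 64 frames while the second write hit an xrun, and the later `[0 0]` entries were never filled.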


Thanks for bearing with me so far - and I'm looking forward to any answers/corrections,
Cheers!



* Re: Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops (full-duplex: latency.c)
  2013-09-13  6:23       ` Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops (full-duplex: latency.c) Smilen Dimitrov
@ 2013-09-17 16:07         ` Smilen Dimitrov
  0 siblings, 0 replies; 16+ messages in thread
From: Smilen Dimitrov @ 2013-09-17 16:07 UTC (permalink / raw
  To: ALSA Development Mailing List

I just tried to look a bit more into this:

> .... This means that - for
> some reason - both `_writei and `_readi` are *skipped* after the
> second "cardIRQ?"; and I cannot really tell what would be the reason
> for it. I may try to look into this further - but as I remember, the
> PortAudio full-duplex problem was not a typical XRUN, so maybe I'll
> go back directly to PortAudio from this point.
> 

... so I came up with another script, `comparecsv.pl`, posted again here:

    http://sdaaubckp.sourceforge.net/post/alsa-capttest/

The idea was to use it to compare that section (after the second "cardIRQ?") in a failing and non-failing acquisition of the modified `dummy` (*duM*) driver; so I've tried it as:

    perl comparecsv.pl \
      -i captlat-2013-09-11-00-37-48-duM-128-256-np-0-ns-yb/trace-dummy.csv -o 1580 \
      -i captlat-2013-09-11-13-46-43-duM-128-256-np-0-ns-yb-f/trace-dummy.csv -o 1054 \
    -l 844 > compcsv.txt

This generates a text-only side-by-side comparison of the two input files at given line number offsets and for the given length in lines:

    http://sdaaubckp.sourceforge.net/post/alsa-capttest/_cappics05/compcsv.txt

... and can be used to generate a more visual, HTML side-by-side comparison:

    http://sdaaubckp.sourceforge.net/post/alsa-capttest/_cappics05/compcsvLR.html

... although also `meld` could be used (see the `Readme` for more details). 

Unfortunately, these side-by-side displays are still too difficult to read to allow for any clear conclusions.

Then I thought to compare a given failed capture to all the other (six) successful captures, by quickly traversing the .csv files with `awk` and finding which kernel functions in the failed one do not appear in the successful ones. However:

    $ awk -F, '
        FNR==NR { if (!($8 in a)) { a[$8]=0; b[$8]=sprintf("%s %s",$1,$4) } }  # failed run: record function names (col 8)
        FNR!=NR { if ($8 in a) a[$8]++ }                                     # successful runs: count matches
        END     { for (i in a) if (a[i]==0) print i, b[i] }                  # print names never matched
      ' captlat-2013-09-11-17-14-02-duM-128-256-np-0-ns-yb-f/trace-dummy.csv captlat-2013-0*duM*b/*.csv

    snd_pcm_do_stop();  0.005481 lt-latency
    xrun()              0.004957 rsyslogd

... unfortunately, with this failed acquisition (2013-09-11-17-14-02*-f), I only get the effect (`xrun`, `snd_pcm_do_stop`) detected - not anything I can attribute to the cause. The other failed .csv acquisition is a bit more revealing:

    $ awk -F, \
      'FNR==NR{if(!($8 in a)){a[$8]=0;b[$8]=sprintf("%s %s",$1,$4)}};FNR!=NR{if($8 in a){a[$8]++;}};END{for(i in a){if(a[i]==0){print i,b[i];}}}' \
      captlat-2013-09-11-13-46-43-duM-128-256-np-0-ns-yb-f/trace-dummy.csv captlat-2013-0*duM*b/*.csv
    
    account_idle_ticks()    0.002878 <idle>       o
    raise_softirq()         0.000547 <idle>       m
    snd_pcm_do_stop();      0.005192 lt-latency
    kthread_should_stop();  0.000873 ksoftirqd/0  o
    rcu_needs_cpu()         0.000530 <idle>       m
    xrun()                  0.004868 rsyslogd
    wakeup_softirqd()       0.000548 <idle>       o

... where I've manually added whether the functions appear "o"nce or "m"ultiple times in the failed (2013-09-11-13-46-43*-f) acquisition `.csv`. RCU, I learned, refers to "read-copy-update" (a type of locking/synchronization mechanism, apparently), and the only thing I gather from this is that the references to `rcu_needs_cpu` and `account_idle_ticks` probably mean that Linux, at acquisition time, decided to do some housekeeping, which possibly preempted the scheduled execution of writei/readi from userspace.

Looking back at the above *.html side-by-side comparison (keeping in mind that it can only reveal differences between those particular acquisitions and regions), one visible thing is that the `sys_ioctl`s (which should eventually call `snd_pcm_lib_read1/write1`) are not called at all in the failed one - and neither are `snd_pcm_update_state`, `pick_next_task_fair` nor `schedule`. Also,

* successful has: check_preempt_curr -> resched_task (twice, but not that far away from each other)
*     failed has: check_preempt_curr -> check_preempt_wakeup -> update_curr -> wakeup_preempt_entity.clone.88

... which looks like it confirms the theory that preemption happened in the failed one - but does not reveal what caused it. Then again, as seen previously, `check_preempt_wakeup` can also occur in successful captures. In the failed capture, I can also see the process `eog` doing a `_pollwait` quite a bit, but I'm not sure if that could be the reason for triggering the xrun.

Another possibly weird thing can be seen in the *.html file, at start:

    ( successful )                   ( failed )
    1  snd_pcm_update_hw_ptr0()      1  snd_pcm_update_hw_ptr0()
    2  dummy_hrtimer_pointer()
    3  dummy_pcm_pointer()           2  dummy_pcm_pointer()
                                     3  dummy_hrtimer_pointer()

In the code, `dummy_pcm_pointer` calls `dummy_hrtimer_pointer` - however here, in the successful capture the order is inverted; but that seems to be an artifact of sorting - in the `.csv` capture, both *_hrtimer and *_pcm_pointer have the same timestamp of 0.003225. So that doesn't say much either :/


So, unfortunately, a quick look seems not to be enough to properly determine the reason for an xrun() from kernel trace acquisitions - if anyone can point me to a more proper way, I'd really appreciate it.


Thanks in advance for any comments,  
Cheers!


* Re: [Solved] Questions about virtual ALSA driver (dummy), PortAudio and full-duplex
  2013-07-25  8:37     ` Clemens Ladisch
                         ` (2 preceding siblings ...)
  2013-09-13  6:23       ` Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops (full-duplex: latency.c) Smilen Dimitrov
@ 2013-10-21 14:48       ` Smilen Dimitrov
  3 siblings, 0 replies; 16+ messages in thread
From: Smilen Dimitrov @ 2013-10-21 14:48 UTC (permalink / raw
  To: portaudio, alsa-devel, audacity-devel

Hi list(s),

I hope I'll be forgiven for bumping all the lists again - I just wanted to confirm that the problem I stated at the start of this thread was indeed not with Audacity, PortAudio or ALSA as such (even with the older versions I've used); the problem was my modification of the ALSA `dummy` driver, submitted previously as `dummy-2.6.32-patest.c` (or "dummy-mod" for short).

I'd also like to announce that it seems the driver `dummy-2.6.32-patest-fix.c` (or "dummy-fix"), uploaded in this directory:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/fix/

... fixes the problems with the full-duplex "drop input" I experienced in Audacity - but I'll still use this opportunity to ask some questions. I'll try to be as brief as I can here, some more info is in the `Readme` in the `fix` directory.

As a reminder, the full-duplex "drop input" looked like this for me:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/fix/dummy-mod-fddrop.png

... or, as the screenshot shows, very soon after a start of capture in full-duplex mode, `dummy-mod` would trigger a full-duplex "drop input" in PortAudio, which would propagate to Audacity.

The behavior of `dummy-fix` now is like this:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/fix/dummy-fix-ok.png

... or, as the screenshot shows, Audacity can now run for 10 mins in full-duplex capture mode with `dummy-fix`, without a full-duplex "drop input" being triggered; which is as good as I need it, I guess.


First, though, I have a question on the "nature" of full-duplex. The way I see it now, there are two distinct contexts for full-duplex use, which I'd call:

* monitoring - you want to listen on the speakers, what is recorded on the microphone (playback of the capture stream)
* studio/overlay recording - you want to play a background track, and you want to record the singer singing to that track "in sync" with the played track

In the monitoring case, I guess one doesn't care much about stream synchronization - the input stream will arrive when it arrives (after the inherent latencies of the system); in the meantime you can just play silence - and as soon as input data is available, you can play that too; it is "full-duplex" only in the sense that the playback and capture streams are generally running "at the same time". On the other hand, for "overlay" recording, one would probably want the recorded stream to be as closely synchronized as possible to the playback stream. Is this correctly understood?

From this, I guess that ALSA's `latency.c` achieves full-duplex synchronization (of the "overlay" kind) by calling `snd_pcm_link`, *and* by writing 2*period_size frames' worth of playback data (let's call this the playback pre-buffer) *before* the full-duplex operation starts. However, I couldn't see anything like this "playback pre-buffer" in PortAudio, even though `pa_linux_alsa.c` does call `snd_pcm_link`. I couldn't see a "playback pre-buffer" in PortAudio's `patest_duplex.c` either, but I thought maybe this program is meant to demonstrate full-duplex of the monitoring kind (and thus doesn't need such pre-buffering). So my question is - does PortAudio do this kind of playback pre-buffering, which I may have missed; and if it doesn't, does Audacity do it?


Back to topic - so, while I suspected anything from the massive printouts from ALSA/PortAudio debugs and kernel message printouts to (un)reliability of hrtimers in the Linux kernel as the cause of trouble, it turns out that isn't the problem - the issue got solved as soon as I managed to simulate the IRQ .pointer behavior of `hda-intel`, as timer callback .pointer behavior within `dummy-fix`.

First of all, the full-duplex "drop input" seems to be triggered, initially, by a polling error of the playback stream in PortAudio. I'm still not exactly clear on which stream it is (due to the PortAudio code using "thisComponent" and "otherComponent"), but both the error condition of `snd_pcm_playback_poll` in ALSA and the further behavior of the `margin` variable in `ContinuePoll` in the PortAudio code seem to indicate that a hw_ptr is not increasing. While the original `snd-dummy` always recalculates the .pointer position in the .pointer function, I had moved that calculation into the timer callback in `dummy-mod`, and the .pointer function then simply returned the last calculated value. So in `dummy-fix`, the .pointer function again recalculates the position on (almost) every call - but this was not the entirety of the fix.

The fix is in simulation of this behavior of `hda-intel`:

    For period sizes > 64 frames, the period IRQ (or timer function) for the playback stream should fire early, by some 48 frames (at CD quality, 48/44100 s = 1.088 ms); however, it should still return the proper expected .pointer position (at period boundaries, that is typically N*period_size+1 in frames, where N=0,1,(2..))
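That rule can be sketched as a schedule: the playback timer fires 48 frames before each period boundary, but the position it reports is the boundary itself. This is an illustration of the rule as stated, not the code of `dummy-fix`; the "+1" mentioned above and the "late" case for small periods are not modelled, and the names are hypothetical:

```c
#include <stdint.h>

#define EARLY_FRAMES 48u   /* observed hda-intel lead, ~1.088 ms at 44.1 kHz */

/* Frame time at which the n-th (n >= 1) playback period timer fires:
 * EARLY_FRAMES before the period boundary when period_size > 64,
 * otherwise exactly at the boundary. */
static uint64_t playback_fire_frame(unsigned int n, unsigned int period_size)
{
    uint64_t boundary = (uint64_t)n * period_size;
    if (period_size > 64 && boundary >= EARLY_FRAMES)
        return boundary - EARLY_FRAMES;
    return boundary;
}

/* Position reported by .pointer in that callback: the period boundary
 * the timer stands for, wrapped into the buffer, not the firing time. */
static uint32_t playback_reported_pos(unsigned int n, unsigned int period_size,
                                      unsigned int buffer_size)
{
    return (uint32_t)(((uint64_t)n * period_size) % buffer_size);
}
```

For period_size 256 and buffer_size 512, the first playback timer would fire at frame 208 while reporting position 256.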

Now this is what puzzles me most: _why_ should the playback stream (in particular) fire early? I noticed this behavior by first analyzing the period IRQ positions of `hda-intel` (the plot shows use of both ALSA `latency-mod.c` and PortAudio `patest_duplex_wire.c` as user-space programs):

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/fix/collectmirq_hda.png

Note here that, even as period sizes increase, the playback (red) fires early relative to the capture (blue) by approximately the same amount of time. As this plots (as closely as possible) the IRQs the card issues, it means that the card hardware actually issues the playback interrupts early. Why?

In comparison, in `dummy-mod` there were no discernible time offsets between capture and playback timer callbacks:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/fix/collectmirq_duM.png

... and seemingly, this is what caused the full-duplex drop. Now, `dummy-fix` behaves rather similarly to `hda-intel` in that respect:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/fix/collectmirq_duF.png

... and here is a plot, that shows how `dummy-fix` approximates `hda-intel` a bit more closely - for period_size 256 and buffer_size 512 frames:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/fix/cmirq_hda_duF_512_256.png


Another interesting thing is that `hda-intel` does not behave the same for period_size <= 64 frames; in that case, the playback fires late, not early. In earlier mails in this thread, I tried to analyze smaller period sizes (so as to limit the amount of kernel data to be analyzed and plotted) - and this made me interpret the offsets as a "quarter period"; obviously that approach failed. (Other problems I had were 16UL*100000000UL not actually fitting in unsigned long, but requiring unsigned long long; and a bug in Gnuplot when using palette, which inverted the capture and playback colors in the plots, making me code the wrong offsets.) The interesting thing, though, is what happens when I try to simulate that behavior in `dummy-fix`:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/fix/cmirq_hda_duF_128_PERIOD64F.png

... even if the behavior is quite close (left half is `hda-intel`, right half is `dummy-fix`), the `dummy-fix` tends to XRUN a _lot_ in that case; however, it should be said that `hda-intel` also tends to XRUN quite a bit (though not as much) for period_size 64 frames. Going back to running the periods (timer callbacks) of capture and playback streams without significant offsets (close to each other):

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/fix/cmirq_hda_duF_128_64.png

... seems to make `dummy-fix` much more reliable (very few XRUNs). Why?


Finally, there is also some test code that allows acquiring .pointer positions while running Audacity. This code is somewhat simpler, though. It renders using the timestamps of the .pointer printouts, not the timestamps of the IRQs/timers that caused them (which would have happened a bit earlier), but I guess it still looks good enough. This is the comparison of `hda-intel` and `dummy-fix` in full-duplex mode:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/fix/collectmirq_acity_dup.png

... but interestingly, Audacity (or PortAudio) seems to settle on slightly different period_sizes for `hda-intel` vs. `dummy-fix` in capture-only mode:

    http://sdaaubckp.sourceforge.net/post/alsa-patest-duplex/fix/collectmirq_acity_cap.png

... or, as a table of period_size / buffer_size in frames (with periods per buffer in parentheses), for my dev platform at least:

    | driver    | capture-only    | full-duplex     |
    |-----------|-----------------|-----------------|
    | dummy-fix | 1102 / 4408 (4) | 2048 / 4096 (2) |
    | hda-intel | 1088 / 4352 (4) | 2048 / 4096 (2) |

Would anyone have an idea why Audacity (or PortAudio?) would choose the same settings for the two drivers in full-duplex mode, but different settings in capture-only mode?


To summarize, I haven't really found the exact conditions that trigger PortAudio's full-duplex input-drop detection. It seems, though, that I've fixed the problem by replicating the `hda-intel` behavior where the playback timers fire early relative to the capture timers. I hope it's robust enough that I won't come back crying to the list(s) about significant new bugs `:)` However, I'd still love to hear if anyone has answers to my questions above, or a simpler explanation of what condition actually triggers this drop (or, indeed, any comments `:)`).

Many thanks for all the responses in this thread so far (most of them on alsa-devel). I doubt I would have arrived at this point without that help; much appreciated,
Cheers!


Thread overview: 16+ messages
2013-07-24  2:54 Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops Smilen Dimitrov
2013-07-24 13:03 ` Alan Horstmann
2013-07-25  0:29   ` Smilen Dimitrov
2013-07-25  8:37     ` Clemens Ladisch
2013-08-04  0:05       ` Smilen Dimitrov
2013-08-06 10:59         ` Clemens Ladisch
2013-08-06 11:41           ` David Henningsson
2013-08-06 13:04             ` Clemens Ladisch
2013-08-08  2:50           ` Smilen Dimitrov
2013-08-14 14:30       ` Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops (playback) Smilen Dimitrov
2013-08-15  4:17         ` Raymond Yau
2013-08-16  5:20           ` Smilen Dimitrov
2013-09-13  6:23       ` Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops (full-duplex: latency.c) Smilen Dimitrov
2013-09-17 16:07         ` Smilen Dimitrov
2013-10-21 14:48       ` [Solved] Questions about virtual ALSA driver (dummy), PortAudio and full-duplex Smilen Dimitrov
2013-07-24 18:30 ` [Audacity-devel] Questions about virtual ALSA driver (dummy), PortAudio and full-duplex drops Richard Ash
