max_buffered is inconsistent #321
Comments
I don't have time right now to modify the source code and check what the
minimum number of samples in the buffer can be---probably 1 is ok but
what might happen is that the connection will break off and have to
reconnect. I don't know ASIO well enough to make an informed guess,
though. Regardless, this buffer must be greater than 0.
It would be a drag to have to know the sampling rate for this parameter,
so I think changing it to samples is not the best option. Perhaps the
correct thing to do is to overload the constructor so that it could be
specified as an int or a float. That way if people really wanted to,
they could get the less-than-1-second buffers that they want. There
should at least be a warning if <=0 is specified, because such an inlet
will never work.
My two cents, though, is not to bother changing this at all. LSL always
pushes data in as timely a manner as it can. This parameter doesn't
affect throughput or performance. So, if samples are getting backlogged,
that means something that isn't LSL's 'fault' is blocking
transmission. In that case your options are either to have old data or
no data. I would always opt for old. The workaround is to program a
timeout capability into the data acquisition loop. This isn't easy, but
this might be a case where the burden should be put on the user rather
than the developer of the library. 1 second of data is not that much
except in extreme cases that are probably too distant to warrant much
development effort/support.
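For illustration, here is a minimal sketch of the timeout-based acquisition loop described above, written with pylsl. The stream name, the 0.5 s timeout, and the keep-the-previous-sample policy are assumptions for the example, not anything liblsl prescribes.

    import pylsl

    # Resolve a stream by name; 'Test' and the 5 s resolve timeout are
    # illustrative. resolve_byprop returns a (possibly empty) list.
    streams = pylsl.resolve_byprop('name', 'Test', timeout=5)
    inlet = pylsl.StreamInlet(streams[0])

    last_sample = None
    while True:
        # Wait at most 0.5 s for a fresh sample instead of blocking forever.
        sample, ts = inlet.pull_sample(timeout=0.5)
        if ts is None:
            # Timeout: nothing new arrived. The policy here is simply to keep
            # the previous sample; a real app might flag a dropout instead.
            print("no new data within 0.5 s; keeping last_sample")
            continue
        last_sample = sample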
On 7/23/2018 7:51 PM, Chadwick Boulay wrote:
outlet
<https://github.com/sccn/labstreaminglayer/blob/master/LSL/liblsl/include/lsl_c.h#L468-L474>
inlet
<https://github.com/sccn/labstreaminglayer/blob/master/LSL/liblsl/include/lsl_c.h#L614-L628>
When creating an outlet or inlet, the user can specify the maximum
amount of data that can be buffered, in seconds. The data type for
this argument is int32_t.
In the non-typed wrappers (Python, Matlab), users can specify floats
and these will get typecast and rounded, creating problems. We can
make these wrappers check the argument to make sure it is an int > 0,
but this intersects with a separate issue.
In some use-cases, especially when a stream is carrying a processed
signal, only the latest sample is desired and anything older is to be
dropped. In these cases it is convenient to have a buffer of only 1
sample, otherwise the user is forced to fetch until there are no
samples remaining then use whatever the last sample was. This has come
up several times recently.
The docstring for |data_receiver| here
<https://github.com/sccn/labstreaminglayer/blob/master/LSL/liblsl/src/data_receiver.cpp#L20-L21>
suggests that |max_buflen| is number of seconds, as above, and indeed
the seconds-version is passed directly to the data_receiver
constructor during inlet creation, but if you follow how it's used
then it ends up being number of samples. (Gets written to a string,
then read in from the string by tcp_server as number of samples).
This is probably a bug. Can other devs please check this to make sure
I didn't miss something?
@tstenner <https://github.com/tstenner> @mgrivich
<https://github.com/mgrivich>
Is there a minimum number of samples that must be buffered? I think it
is 1, but some of the checks I see look to see if it is >= 0... and 0
causes the stream to be useless because there's no buffer.
Should |max_buflen|/|max_buffered| be changed to a float type so < 1
second can be used?
Or, probably better, can it be made to represent sample number instead
of seconds?
In either case, this will require most apps to be changed.
Pull everything in a chunk and take only the interesting samples?
If that's the case, I'd correct the docs and be happy that the (imho) more sane interpretation of the parameter is the correct one.
+1 for number of samples. Changing the type would require a recompilation of all apps; otherwise they would push an int which would then get converted to about 0.
I think either all samples get transmitted or they don't even reach the network level, so asio shouldn't care.
That will cause more problems than it solves, and float …
But number of samples works in all cases and is a lot easier to reason about in terms of speed and resource usage. Also, the buffer doesn't work in seconds but rather in number of samples, so it would have to be translated somewhere anyway.
Or a 'drop everything older than the last XY samples' function, which would be quite easy.
For the outlet this is easy because it (should!) know its sampling rate, if any. I think @dmedine was referring to the consumer/inlet, where you don't necessarily care what the sampling rate is at inlet creation time, and where, when you think about insurance against dropped connections, you're more likely thinking in time, not samples. But I think that anyone who actually specifies this value will also be able to query the sampling rate. It really bothers me that 'time' is an int, especially when this int (should!) always gets multiplied by the sampling rate. It also bothers me that it is called …
That is only guaranteed to work if the max chunk size is greater than the buffer size; otherwise you have to pull chunks until you come up empty, then use the last N samples of the last non-empty chunk. This is a pretty common anti-pattern that real-time users are forced into.
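To make that concrete, here is a sketch of the pull-until-empty pattern in pylsl terms; pull_newest_n is a hypothetical user-side helper, not part of any LSL API, and the chunking details are assumptions for the example.

    def pull_newest_n(inlet, n):
        # Drain the inlet in chunks of n, keeping only the two most recent
        # non-empty chunks, then stitch the newest n samples back together.
        # E.g. with 13 samples backlogged and n = 6, the pulls return 6, 6
        # and 1 samples; the result is the last chunk plus the newest 5
        # samples of the chunk before it.
        prev, cur = ([], []), ([], [])      # (samples, timestamps) pairs
        while True:
            chunk, stamps = inlet.pull_chunk(timeout=0.0, max_samples=n)
            if not stamps:                  # came up empty: backlog drained
                break
            prev, cur = cur, (chunk, stamps)
        samples = prev[0] + cur[0]
        stamps = prev[1] + cur[1]
        return samples[-n:], stamps[-n:]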
Even if we don't change the type, the outlet apps that have specified a buffer duration did so with the expectation that it would guard against connection interruptions... but if we go from duration to n-samples then the buffer would end up being much, much shorter than expected and would probably underrun at the slightest interruption. That being said, most apps that use the C++ API opt not to provide this argument, so it defaults to 360 seconds, and we can simply change the default to 0, which gets caught during creation and is modified to int(360*fs), which is functionally equivalent to what happens now. So it's quite possible that any apps that didn't explicitly set the buffer duration will not have to be recompiled. I think he's still on vacation, but I will try to get @chkothe 's input when he gets back.
Full agreement here, but that's the easiest way to do it at the moment.
Default arguments only change at compile time, so for version n+1 we could output a warning if the parameter is 360, and remove the warning in n+2 in the hope that everyone has caught up.
BTW, I just tested in Python creating an outlet with max_buffered=1 using either FS=0.9 or FS=1.0. The 0.9 never becomes visible to the inlet but the 1.0 works perfectly.

    import pylsl
    import time
    import numpy as np

    FS = 0.9
    NCHANNELS = 8

    info = pylsl.StreamInfo(name='Test', type='EEG', channel_count=NCHANNELS,
                            nominal_srate=FS, channel_format='float32',
                            source_id='myuid34234')
    # next make an outlet
    outlet = pylsl.StreamOutlet(info, max_buffered=1)

    print("now sending data...")
    n_pushed = 0
    t_start = time.time()
    while True:
        time.sleep(max([0, (n_pushed / FS) - time.time() + t_start]))
        outlet.push_sample(np.random.rand(NCHANNELS))
        n_pushed += 1
BTW, I just tested in Python creating an outlet with max_buffered=1
using either FS=0.9 or FS=1.0. The 0.9 never becomes visible to the
inlet but the 1.0 works perfectly.
Because .9 gets rounded to 0 and then when the tcp_server is constructed
it has a buffer size of 0 samples.
Oops! The above statement is incorrect. If this parameter is 0, the tcp_server is simply never created (https://github.com/sccn/labstreaminglayer/blob/master/LSL/liblsl/src/tcp_server.cpp#L396-L398).
Let me also clarify something that appears to be confused (though maybe
it's just me that's confused here). The docs are correct. The time is
specified in seconds and is converted to samples here:
https://github.com/sccn/labstreaminglayer/blob/master/LSL/liblsl/src/lsl_inlet_c.cpp#L27.
Chadwick is correct that this gets turned into a string so that the
information can be piped into the creation of the tcp_server later down
the line, which is where the allocation of this memory is actually
performed (on the sending side, by the way).
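For illustration only (the exact expression in the source may differ), the seconds-to-samples conversion is a truncating multiply, which is why the 0.9 Hz example above silently ends up with no buffer at all; buffered_samples below is just that arithmetic, not a liblsl function.

    def buffered_samples(max_buffered_seconds, nominal_srate):
        # Truncating conversion: seconds * nominal rate, cast to int.
        return int(max_buffered_seconds * nominal_srate)

    print(buffered_samples(1, 1.0))   # 1 -> works
    print(buffered_samples(1, 0.9))   # 0 -> no buffer, tcp_server never created
    print(buffered_samples(2, 0.9))   # 1 -> works, but only one sample deep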
By the way, this has been bugging me all day, so I changed my mind about
not having enough time and quickly checked this. I changed the line
cited above so that it doesn't multiply by the nominal sampling rate
when the inlet is created. I tested it using the cpp example programs
and you can indeed set the buffer size to 1. I only ran it for a few
minutes, but it does not appear to affect performance in terms of memory
or CPU usage. Of course there are many, many implications for ways
that doing this can hurt you---as I'm sure @Chadwick and @tstenner are
quite well aware :)
I also like @tstenner's suggestion that we add a function to somehow take the newest N samples on demand if requested. This is tricky to implement, and I don't want to get in the habit of adding a whole bunch of patchy features to address many different special cases, and I feel quite hesitant to change this---even though, in truth, I would guess that the majority of people who are actually specifying a non-default value here are people who want a 1-sample buffer length. If this is in fact true, a simpler (but kludgier) solution is to add an optional argument to the inlet constructor to override the buffer length and force it to be 1 sample.
RE the Python example, I guess my point was that this is a mistake that a user could conceivably make, and there are currently no warnings or safeguards against it. IMO this needs to be fixed in the core, not the wrapper. If we're modifying the core lib anyway, then let's fix it the right way, as soon as we agree what that is. If we leave it as int-seconds, then we need to fix and clarify some docstrings, raise errors when the buffer is 0, and fix the problem with data_receiver (which gets initialized with max_buflen seconds but, as noted above, ends up being used as a number of samples). If we go with int-samples, then there are a bunch of externally-facing things to change, but it makes the details of the implementation much simpler. And as you were saying, if anyone is interested in using a non-default value then they should be proficient enough to do other things like query the sampling rate. Edit: Not quite what you said, but I was referring to the proficiency of these power users.
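For reference, the wrapper-side check mentioned earlier in the thread would only be a few lines; the function name and messages below are hypothetical, and whether such a guard belongs in the wrapper or the core is exactly what is being debated here.

    def check_max_buffered(max_buffered, nominal_srate):
        # Hypothetical wrapper-side validation, run before the value is
        # handed to liblsl: reject non-integers, non-positive values, and
        # values that would truncate to an empty buffer at the nominal rate.
        if not isinstance(max_buffered, int) or max_buffered <= 0:
            raise ValueError("max_buffered must be an integer number of "
                             "seconds greater than 0, got %r" % (max_buffered,))
        if nominal_srate > 0 and int(max_buffered * nominal_srate) < 1:
            raise ValueError("max_buffered=%d s at %g Hz rounds down to an "
                             "empty buffer" % (max_buffered, nominal_srate))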
Also, optional arguments aren't a thing in C functions, so anything that uses the C API will need to provide these arguments explicitly. That includes at least the Python and Matlab wrappers.
On 7/23/2018 10:51 AM, Chadwick Boulay wrote:
In some use-cases, especially when a stream is carrying a processed signal,
only the latest sample is desired and anything older is to be dropped.
In these cases it is convenient to have a buffer of only 1 sample,
otherwise the user is forced to fetch until there are no samples
remaining then use whatever the last sample was. This has come up
several times recently.
Setting a buffer to only 1 sample seems like a very bad idea. If LSL
throws away data to make room for new data, I would call that a critical
bug, not a feature that should be used.
I don't see the problem with the standard method of pulling till
timestamp is zero.
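In pylsl terms, that standard method looks roughly like the sketch below; latest_sample is just an illustrative helper, and note that pylsl reports an empty pull as a None timestamp where the C API returns 0.0.

    def latest_sample(inlet):
        # Drain the backlog one sample at a time and keep whatever came last.
        # A falsy timestamp (0.0 in the C API, None in pylsl) means we have
        # caught up and there is nothing left to pull.
        newest, newest_ts = None, None
        while True:
            sample, ts = inlet.pull_sample(timeout=0.0)
            if not ts:
                break
            newest, newest_ts = sample, ts
        return newest, newest_ts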
I'd say any buffer of less than a second is unwise, because networks can
hiccup. And before we encourage/allow very short buffers, we would need
to set up tests in toxic situations to see how LSL fails with small
buffers and lag spikes.
Right now, I'd say just fix the wrappers so that they don't allow the
user to request non-integer values for the buffer.
On 7/24/2018 5:49 AM, Tristan Stenner wrote:
We already need to break the API for the next version,
We are not really breaking the API. We are fixing functions that have
always been buggy. This doesn't mean that we should go crazy and change
everything that is annoying us and end up with a Python 2 / Python 3
type split.
so we might as well think about other solutions, i.e. adding a
function to drop all samples except for the newest N.
Again, I don't see why this kind of functionality is desirable. Data
loss is the worst, and adding functions to implement data loss seems to
be asking for trouble. If the user really doesn't care about old data,
they are free to pull it and not use it.
One of the key features of LSL is data synchronization. At the very least we need some very strong hints in the docstrings and examples about how to avoid this problem.
But it is designed to do that. If this is a critical bug then LSL should error-out whenever there is a buffer overflow and then people like me won't be misinformed about how LSL should be used.
In my Python example above, I did request an integer value for the buffer (=1) on the outlet end, but the sampling rate was < 1.0 so it still failed silently. If I set the buffer to 2 (2 seconds at 0.9 Hz = 1.8 samples, rounded down to 1 sample) then it works again, but if I set the inlet buffer to 1 second (still an int) then you get …
The solution here is better documentation. Perhaps a FAQ and/or examples for people who care about real-time latency. If you can't pull till timestamp == 0, you haven't caught up. You can also check lsl_local_clock() vs the timestamps of incoming data to measure lag more explicitly.
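For example, that lag check is only a few lines in pylsl (report_lag is just an illustrative helper, not an LSL function; the time_correction call only matters when the outlet runs on a different machine):

    import pylsl

    def report_lag(inlet, timeout=1.0):
        # Compare the local LSL clock against the timestamp of the next
        # sample. time_correction() maps the remote timestamp onto the local
        # clock; for a same-host stream it is approximately zero.
        sample, ts = inlet.pull_sample(timeout=timeout)
        if ts is None:
            return None               # nothing arrived within the timeout
        return pylsl.local_clock() - (ts + inlet.time_correction(timeout=timeout))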
I'd be fine with having the core library round up rather than down, so that this kind of error doesn't happen. When Christian wrote it, it appears that he didn't anticipate regular sampling rates of < 1 Hz.
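In plain arithmetic (a sketch of the proposed behaviour, not the actual liblsl code), rounding up and flooring at one sample would look like:

    import math

    def buffered_samples_rounded_up(seconds, nominal_srate):
        # Round up instead of truncating, and never allow an empty buffer.
        return max(1, int(math.ceil(seconds * nominal_srate)))

    print(buffered_samples_rounded_up(1, 0.9))   # 1 instead of 0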
Which still crashes for combinations of programs compiled against the wrong library version, so it's by definition an ABI break; and since the types change, the API changes as well, even though most compilers will only warn about it.
With a queue implementation that has a queryable element count, it's (in pseudocode) a simple

    queue.drop(std::max(0, queue.count() - sample_count)); return queue.pop_all();

This is in principle the same as pulling everything and discarding the older samples, but it leaves out potentially costly copies and conversions. I'd rather implement it as one drop_and_keep_newest_n_samples method so we don't get another 8 overloads for all pull functions.
This requires several buffers to read the data, discard the oldest data and put the newest samples back together. Simple example: there are 13 samples available and you want the newest 6, so you have to pull chunks of 6 samples until there are no more samples available, rotating the current and last buffer, then copy the newest 1 sample together with the older 5 samples. It's a common use case, so I'd prefer the more error-proof "drop all except the latest 6 samples, pull 6 samples, done".
With a queue implementation that has a queryable element count, it's
(in pseudocode) a simple
|queue.drop(std::max(0, queue.count() - sample_count)); return queue.pop_all();|
This is in principle the same as pulling everything and discarding the
older samples, but it leaves out potentially costly copies and
conversions. I'd rather implement it as one
|drop_and_keep_newest_n_samples| method so we don't get another 8
overloads for all pull functions.
Right, it's tricky to implement. LSL was designed assuming that no one
would ever want to do this, so actually implementing something like this
would require a lot of effort. It is not trivial to get a sample count
field into the consumer_queue object.
This requires several buffers to read the data, discard the oldest
data and put the newest samples back together. Simple example: there
are 13 samples available and you want the newest 6, so you have to
pull 6 samples until there are no more samples available while
rotating the current and last buffer, then copy the newest 1 sample
together with the older 5 samples. It's a common use case so I'd
prefer the more error proof "drop all except the latest 6 samples,
pull 6 samples, done".
Providing a 'helper' method to do this for a user is actually what I had
envisioned. In the end, though, I am still of the mind that this should
be left to the user.
Not with the Boost queue; see spsc_queue::read_available().
I would be surprised if the current implementation has significant performance overhead. If we discover excessive copies being made, those should be replaced with pass-by-reference.