overridden `res.write()` has altered flushing semantics #61
Comments
Unfortunately we won't be making such a change. You're completely welcome to publish a version to npm.
@dougwilson Hey there, sorry for posting a non-relevant question here. Is it a correct assumption that if nginx is gzipping the static files, I don't need to use the `compression` module? Also, if you've got 3 minutes to run over this gist, I would really appreciate it. 👍
Hi Doug. Can you substantiate why you believe that this is not worth fixing? (I know from personal experience that it sucks when someone comes along and points out that you're doing something incorrectly according to an overall design, user expectations, or what not. That said, I'd much appreciate a logical discussion of the issue at hand.) It seems to me that the current implementation breaks applications unexpectedly when the compression module is enabled because of this issue.
Hi @usernameisalreadytaken2014, the reason we're not going to change it is because you are perfectly capable of implementing this as a user-land module.
This is not true and can happen with anything buffering. Just because the buffering is in memory does not change this fact. If you have a reverse proxy fronting your Node.js server doing the buffering, your application would have the same issue. If you are trying to stream out, you need to disable the compression, for example by setting the `Cache-Control: no-transform` header. But in the end, this was really a question, asking for us to implement these changes, for which we don't see the need. Perhaps your case will be more compelling if you can provide a pull request implementing the changes in this module, complete with full documentation and tests with full coverage. That would make a much more compelling case to add this feature, and would also allow us to discuss the implications of the given implementation.
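For context, a minimal sketch of the workaround being described here, assuming a standard express + compression setup (the route and payload are hypothetical):

```js
const express = require('express');
const compression = require('compression');

const app = express();
app.use(compression());

// Hypothetical SSE route: the no-transform directive tells the module
// (and intermediate proxies) not to transform, i.e. compress/buffer,
// this response, so each write() reaches the client promptly.
app.get('/events', (req, res) => {
  res.setHeader('Cache-Control', 'no-transform');
  res.setHeader('Content-Type', 'text/event-stream');

  const timer = setInterval(() => res.write('data: tick\n\n'), 1000);
  req.on('close', () => clearInterval(timer));
});

app.listen(3000);
```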
I beg to differ. The reason that I opened this issue is that my application broke the moment I enabled the compression module. I expect anyone else using both SSE and compression to run into the same issue.
Then I'd argue your application was written incorrectly. Regardless, please provide the aforementioned pull request. Also, would you be interested in simply taking over this module? This would be the best way to make progress on your issue and provide the compression expertise the community needs from you.
SSE will have this same issue on intermediate servers. Why are you not setting the `Cache-Control: no-transform` header?
Your entire argument of fixing SSE behavior with a magical 40ms write timeout is even worse than how it is today, not only because it will hurt the compression for all other applications, like those reading data slowly from disk, but also because it will cause people to scratch their heads over why their event takes 40ms longer to reach the client with compression than without. Regardless, if you are passionate about pursuing this change, please provide a pull request with the necessary implementation, complete with full documentation and tests with full coverage.
Would that cause the compression module to behave any differently? If using this header gets rid of the stalls that adding the compression module causes, then this is a nice workaround. (Will compression still work then? The first couple thousand events are sent in bulk and are highly compressible; the rest arrive in drips and are reasonably compressible.)
Yes, as is documented in the README.
Yes, it absolutely would.
No, this header is acknowledged by the module, and it no longer transforms the response (compressing is a transformation of the response, as defined by the HTTP RFCs). The biggest problem with not including that header is that any intermediate proxy between your server and the client can buffer the response in the same way this module is doing, and kill your SSE functionality. There are many such servers out there, like Amazon Silk's speed boost; Opera and Google have similar public services. Not including that header is rolling the dice on your users being able to receive the events in a timely fashion.
Yeah, that makes sense. The algorithm is basically Nagle and should only be used on network streams.
The default Nagle timeout is 40ms, so the tail segment in a non-compressed stream (stream = no `response.end()` to flush) that doesn't fill the segment size fully should have the exact same behaviour. As far as I can tell, no difference?
I don't think you understand. For example, the system reads a chunk of 4kb from disk, and gzip compresses it down to 600b. The next 4kb buffer does not get read off the disk until 50ms later, but since that was longer than 40ms, this module will have sent that tiny, incomplete gzip window to the client, when it could have compressed more of that second 4kb buffer into the same gzip window, resulting in very sub-optimal compression for slow file systems. In fact, this can likely cause the response size to go up, due to the overhead of the gzip window framing and dictionaries, if the system reads are consistently below this 40ms flush window.

While we were talking here, I wondered how you can do SSE responses in other languages like Java and use compression. They all require the user to indicate when you want to actually flush out your event, aka call `flush()`.

If you don't want to go and make that pull request, can you at least point me to some other compression/SSE systems like this in other languages and stuff that implement the solution you are describing?
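The explicit-flush pattern described here maps onto the `res.flush()` that this module adds to the response; a sketch, with a hypothetical route:

```js
// Mounted after app.use(compression()); res.flush() is added by the module.
app.get('/events', (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');

  const timer = setInterval(() => {
    res.write('data: tick\n\n');
    // Explicitly push the event through zlib's internal buffer, the
    // equivalent of the flush() call required in the Java examples.
    res.flush();
  }, 1000);

  req.on('close', () => clearInterval(timer));
});
```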
Yep, I misunderstood. Thanks for the explanation.

There's a similar effect when not using compression. For example, if the disk stalls for 50ms and 600 bytes have been read, the network stack pushes those 600 bytes out, including additional overhead (TCP + IP headers, ethernet framing), after 40ms. The link between client and server is probably a lot faster than 600 bytes per 10 msec (faster than ~0.5 Mbps), so the extra overhead has likely long since disappeared onto the wire once the disk returns with more data.

On a more massive scale, if a server with a slow disk is handling many simultaneous connections, that extra overhead could tip the total link bandwidth towards causing congestion. There are a couple of options to circumvent that, for example the TCP_CORK option on the server-side socket, which holds back transmission until buffers are filled entirely.

I guess my point is that the default on a network stream is that the kernel guarantees that what you send() will reach the other end in a timely fashion. It does not guarantee to make optimal use of bandwidth to achieve this. When the compression module is enabled, this is reversed: optimal use of bandwidth is guaranteed, but your connection may stall.
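Node exposes the Nagle toggle discussed here (though not TCP_CORK itself); a small sketch using `socket.setNoDelay()`:

```js
const net = require('net');

const server = net.createServer((socket) => {
  // true disables Nagle: every write() goes to the wire immediately,
  // trading bandwidth efficiency for latency.
  socket.setNoDelay(true);
  socket.write('sent immediately\n');

  // false re-enables Nagle: small writes may be coalesced until an
  // ACK arrives or a full segment accumulates.
  socket.setNoDelay(false);
  socket.write('may be coalesced\n');
  socket.end();
});

server.listen(8080);
```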
Good question. I haven't touched Java for years...
If you mean reverse proxies doing compression, then I would have to test different products to say something conclusive. Best guess: simple and dumb products like Varnish would not be able to compress a slow-dripping stream, while advanced products with a hefty price tag, such as BIG-IPs, probably handle it just fine.
Ah, but I'm not averse to doing a pull request at all. I'm probably a shitty programmer and don't want to insist on polluting your very nice codebase with any of my gunk. Also, I don't want to spend time writing something that you do not want to include anyway. I prefer to have the discussion first. I submitted this issue because I thought that maybe there was an edge case here where the quality of the compression module might be improved.
Hi @usernameisalreadytaken2014,
So in my example, the code was always writing 4kb chunks to `res.write()`.
Yes, this is true in a way. This module will turn your stream into one optimized to fill up the configured gzip chunks. You can actually configure this with the `chunkSize` option.
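Assuming this refers to the zlib options the module forwards (the README documents a `chunkSize` option, defaulting to 16384 bytes), tuning it would look roughly like:

```js
const compression = require('compression');

// chunkSize is forwarded to zlib; a smaller value (illustrative, not a
// recommendation) means the stream fills up gzip chunks sooner.
app.use(compression({ chunkSize: 4 * 1024 }));
```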
That's OK. I really don't have any use-case for this feature, and there won't be any available time for a very, very long time to get something like this added to the module, which means that it's unlikely to move without a provided contribution, due to (1) lack of time to spend implementing it and (2) no real need seen by the current maintainers, resulting in even lower priority. If it helps, perhaps this could be solved in some other way in the future, who knows? There is a problem right now with …
Thank you!! :-)
OK. You are probably correct that … Might be worthwhile to mention in the documentation for this module that … Eg.: …
Ugh.
FWIW, it seems that …
Missed this one:
Hmm, that would be interesting. I know compression and HTTP well enough, but I don't think my NodeJS skills are of the necessary caliber. I haven't released any open source projects since CPAN and Perl were hot fuzz. I would like to get to know GitHub first by releasing some smaller projects, before taking over something that is used en masse such as compression... (Also: need to come up with a better username, this one is just embarrassing.)
When writing via the regular `res.write()`, under normal circumstances, data always reaches the browser.

This is also the case when the connection is never-ending, and data is pushed to `write()` as it becomes available, for example when data comes in small portions interspersed with pauses. The Nagle algorithm makes sure that data reaches the browser in this case. The default setting on Linux is to wait 40 milliseconds, and if no further writes have happened, flush the data to the network.
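As a minimal sketch of that scenario (a never-ending response fed in small portions; the route and interval are hypothetical):

```js
const http = require('http');

http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/event-stream' });

  // Small portions interspersed with pauses; without compression in
  // the path, Nagle flushes each portion to the network shortly after
  // the write.
  const timer = setInterval(() => {
    res.write('data: ' + Date.now() + '\n\n');
  }, 5000);

  req.on('close', () => clearInterval(timer));
}).listen(3000);
```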
When the `compression` module is included via `app.use()`, this no longer happens. There is a buffer in the `zlib` library which holds on to the data forever, and it never reaches the browser.

This feature request is to implement the following:

- When a `write()` happens, set or reset a 40 millisecond timer.
- When the timer expires, `flush()` the data to the browser.

That way, we retain the semantics of the original `res.write()`.

Implementation ideas below.
I imagine that one way to implement this would be a "nagle" writable stream running on top of the gzip stream. When the "nagle" stream's `pipe()` method is called, it looks if the passed stream is `instanceof Gzip`. If so, set an internal property to reference the underlying stream, so that we can later call `flush()` on it when a timer runs out.
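For what it's worth, a user-land sketch of this proposal (in line with the maintainer's suggestion to implement it outside this module); the wrapper shape, names, and the 40 ms default are illustrative only:

```js
// Hypothetical middleware: debounce-flush after each res.write(), mirroring
// the 40 ms Nagle-style timer proposed above. Mount it after compression()
// so that res.flush() (added by the module) is available.
function nagleFlush(timeout = 40) {
  return function (req, res, next) {
    const write = res.write;
    let timer = null;

    res.write = function (...args) {
      const ret = write.apply(this, args);
      // Set or reset the timer on every write.
      if (timer !== null) clearTimeout(timer);
      timer = setTimeout(() => {
        timer = null;
        if (typeof res.flush === 'function') res.flush();
      }, timeout);
      return ret;
    };

    // Don't fire a flush after the response has ended.
    res.on('finish', () => {
      if (timer !== null) clearTimeout(timer);
    });

    next();
  };
}

// Usage (hypothetical):
// app.use(compression());
// app.use(nagleFlush());
```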