# Denial of Service attack factor #324
Thank you @Power2All. I've just tested it, and it seems there is no timeout, but I guess there must be a way to do it with Axum. I'll check it. Do you have more info now?

---
There is, but it's half-tested.

---
Thank you @Power2All!!

**Timeout**

It seems the middleware always returns 408, which means:

> The HyperText Transfer Protocol (HTTP) 408 Request Timeout response status code means that the server would like to shut down this unused connection. It is sent on an idle connection by some servers, even without any previous request by the client. A server should send the "close" Connection header field in the response, since 408 implies that the server has decided to close the connection rather than continue waiting. This response is used much more since some browsers, like Chrome, Firefox 27+, and IE9, use HTTP pre-connection mechanisms to speed up surfing.

It seems there could be some cases where a 504 would be the right response, meaning the cause of the timeout was the server and not an inactive client. They are discussing how to handle this ambiguity: tower-rs/tower-http#300. Is that what you mean by "half-tested"? In our particular case, I think that's not a problem because we do not have a proxy: the tracker does not use one. It could be the opposite; the tracker could be behind a proxy. Anyway, if we confirm that it could be a problem, I would prefer to implement our own middleware and change the response status code (see the sketch after this comment), or to fully test it, instead of migrating back to actix-web for a single feature. Or, at least, I will postpone it because, as you mentioned, we have other DoS risks anyway, like the missing rate limit.

**Rate limit**

I agree, a rate limit is something we need. But maybe, for the time being, you can handle that with a proxy.

**RwLock**

Can you elaborate on which cases
Does that make sense to you, @Power2All? Hey @da2ce7 @WarmBeer, what do you think?

**Extra**

If you are considering changing your fork back to actix-web, and that's a big effort for you, I would recommend reconsidering and forking the current tracker version instead, since now:
We could even:
---
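As a side note on the "implement our own middleware" option above: a minimal sketch of a hand-rolled Axum timeout middleware where the status code is ours to choose, assuming axum 0.6's `middleware::from_fn` API; `timeout_middleware` and the 5-second value are made up for illustration. Note that this only bounds handler time — it does not close idle connections, which is the harder problem discussed below.

```rust
use std::time::Duration;

use axum::{
    body::Body,
    http::{Request, StatusCode},
    middleware::{self, Next},
    response::{IntoResponse, Response},
    routing::get,
    Router,
};

// Hypothetical middleware: race the downstream service against a timer
// and pick the response status code ourselves (408, 503, ...).
async fn timeout_middleware(req: Request<Body>, next: Next<Body>) -> Response {
    match tokio::time::timeout(Duration::from_secs(5), next.run(req)).await {
        Ok(response) => response,
        // tower-http's Timeout returns 408 here; swap in whatever fits.
        Err(_elapsed) => StatusCode::REQUEST_TIMEOUT.into_response(),
    }
}

fn app() -> Router {
    Router::new()
        .route("/", get(|| async { "ok" }))
        .layer(middleware::from_fn(timeout_middleware))
}
```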
That discussion was based on my investigation of an unusually high number of Tokio threads. Here is the example: the acceptor wraps both the stream and the service, the service and the response body send timer signals over a channel, and the stream returns a timeout error if the timer elapses without a reset.

```rust
use anyhow::anyhow;
use axum::{routing::get, Router};
use axum_server::{accept::Accept, Handle};
use futures_util::{ready, Future};
use http_body::Body;
use hyper::{client::conn::handshake, Request, Response};
use pin_project_lite::pin_project;
use std::{
future::Ready,
io::ErrorKind,
net::SocketAddr,
pin::Pin,
task::{Context, Poll},
time::Duration,
};
use tokio::{
io::{AsyncRead, AsyncWrite, ReadBuf},
net::TcpStream,
sync::mpsc::{self, UnboundedReceiver, UnboundedSender},
time::{Instant, Sleep},
};
use tower::{Service, ServiceExt};
const TIMEOUT: Duration = Duration::from_secs(5);
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let addr = SocketAddr::from(([127, 0, 0, 1], 3000));
let handle = Handle::new();
tokio::spawn(start_server(addr, handle.clone()));
handle
.listening()
.await
.ok_or_else(|| anyhow!("server failed to bind"))?;
let stream = TcpStream::connect(addr).await?;
let (mut send_request, connection) = handshake(stream).await?;
let conn_task = tokio::spawn(connection);
let request = Request::new(hyper::Body::empty());
tokio::time::sleep(Duration::from_secs(3)).await;
send_request.ready().await?.call(request).await?;
conn_task.await??;
println!("Connection closed.");
Ok(())
}
async fn start_server(addr: SocketAddr, handle: Handle) -> anyhow::Result<()> {
let app = Router::new().route("/", get(handler));
axum_server::bind(addr)
.acceptor(TimeoutAcceptor)
.handle(handle)
.serve(app.into_make_service())
.await?;
Ok(())
}
async fn handler() {}
#[derive(Clone)]
struct TimeoutAcceptor;
impl<I, S> Accept<I, S> for TimeoutAcceptor {
type Stream = TimeoutStream<I>;
type Service = TimeoutService<S>;
type Future = Ready<std::io::Result<(Self::Stream, Self::Service)>>;
fn accept(&self, stream: I, service: S) -> Self::Future {
let (tx, rx) = mpsc::unbounded_channel();
let stream = TimeoutStream::new(stream, TIMEOUT, rx);
let service = TimeoutService::new(service, tx);
std::future::ready(Ok((stream, service)))
}
}
#[derive(Clone)]
struct TimeoutService<S> {
inner: S,
sender: UnboundedSender<TimerSignal>,
}
impl<S> TimeoutService<S> {
fn new(inner: S, sender: UnboundedSender<TimerSignal>) -> Self {
Self { inner, sender }
}
}
impl<S, B, Request> Service<Request> for TimeoutService<S>
where
S: Service<Request, Response = Response<B>>,
{
type Response = Response<TimeoutBody<B>>;
type Error = S::Error;
type Future = TimeoutServiceFuture<S::Future>;
fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
self.inner.poll_ready(cx)
}
fn call(&mut self, req: Request) -> Self::Future {
// send timer wait signal
let _ = self.sender.send(TimerSignal::Wait);
TimeoutServiceFuture::new(self.inner.call(req), self.sender.clone())
}
}
pin_project! {
struct TimeoutServiceFuture<F> {
#[pin]
inner: F,
sender: Option<UnboundedSender<TimerSignal>>,
}
}
impl<F> TimeoutServiceFuture<F> {
fn new(inner: F, sender: UnboundedSender<TimerSignal>) -> Self {
Self {
inner,
sender: Some(sender),
}
}
}
impl<F, B, E> Future for TimeoutServiceFuture<F>
where
F: Future<Output = Result<Response<B>, E>>,
{
type Output = Result<Response<TimeoutBody<B>>, E>;
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
let this = self.project();
this.inner.poll(cx).map(|result| {
result.map(|response| {
response.map(|body| {
TimeoutBody::new(body, this.sender.take().expect("future polled after ready"))
})
})
})
}
}
enum TimerSignal {
Wait,
Reset,
}
pin_project! {
struct TimeoutBody<B> {
#[pin]
inner: B,
sender: UnboundedSender<TimerSignal>,
}
}
impl<B> TimeoutBody<B> {
fn new(inner: B, sender: UnboundedSender<TimerSignal>) -> Self {
Self { inner, sender }
}
}
impl<B: Body> Body for TimeoutBody<B> {
type Data = B::Data;
type Error = B::Error;
fn poll_data(
self: Pin<&mut Self>,
cx: &mut Context<'_>,
) -> Poll<Option<Result<Self::Data, Self::Error>>> {
let this = self.project();
let option = ready!(this.inner.poll_data(cx));
if option.is_none() {
let _ = this.sender.send(TimerSignal::Reset);
}
Poll::Ready(option)
}
fn poll_trailers(
self: Pin<&mut Self>,
cx: &mut Context<'_>,
) -> Poll<Result<Option<hyper::HeaderMap>, Self::Error>> {
self.project().inner.poll_trailers(cx)
}
fn is_end_stream(&self) -> bool {
let is_end_stream = self.inner.is_end_stream();
if is_end_stream {
let _ = self.sender.send(TimerSignal::Reset);
}
is_end_stream
}
fn size_hint(&self) -> http_body::SizeHint {
self.inner.size_hint()
}
}
struct TimeoutStream<IO> {
inner: IO,
// hyper requires unpin
sleep: Pin<Box<Sleep>>,
duration: Duration,
waiting: bool,
receiver: UnboundedReceiver<TimerSignal>,
finished: bool,
}
impl<IO> TimeoutStream<IO> {
fn new(inner: IO, duration: Duration, receiver: UnboundedReceiver<TimerSignal>) -> Self {
Self {
inner,
sleep: Box::pin(tokio::time::sleep(duration)),
duration,
waiting: false,
receiver,
finished: false,
}
}
}
impl<IO: AsyncRead + Unpin> AsyncRead for TimeoutStream<IO> {
fn poll_read(
mut self: Pin<&mut Self>,
cx: &mut Context<'_>,
buf: &mut ReadBuf<'_>,
) -> Poll<std::io::Result<()>> {
if !self.finished {
match Pin::new(&mut self.receiver).poll_recv(cx) {
// reset the timer
Poll::Ready(Some(TimerSignal::Reset)) => {
self.waiting = false;
let deadline = Instant::now() + self.duration;
self.sleep.as_mut().reset(deadline);
}
// enter waiting mode (for response body last chunk)
Poll::Ready(Some(TimerSignal::Wait)) => self.waiting = true,
Poll::Ready(None) => self.finished = true,
Poll::Pending => (),
}
}
if !self.waiting {
// return error if timer is elapsed
if let Poll::Ready(()) = self.sleep.as_mut().poll(cx) {
return Poll::Ready(Err(std::io::Error::new(
ErrorKind::TimedOut,
"request header read timed out",
)));
}
}
Pin::new(&mut self.inner).poll_read(cx, buf)
}
}
impl<IO: AsyncWrite + Unpin> AsyncWrite for TimeoutStream<IO> {
fn poll_write(
mut self: Pin<&mut Self>,
cx: &mut Context<'_>,
buf: &[u8],
) -> Poll<std::io::Result<usize>> {
Pin::new(&mut self.inner).poll_write(cx, buf)
}
fn poll_flush(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<std::io::Result<()>> {
Pin::new(&mut self.inner).poll_flush(cx)
}
fn poll_shutdown(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<std::io::Result<()>> {
Pin::new(&mut self.inner).poll_shutdown(cx)
}
fn poll_write_vectored(
mut self: Pin<&mut Self>,
cx: &mut Context<'_>,
bufs: &[std::io::IoSlice<'_>],
) -> Poll<Result<usize, std::io::Error>> {
Pin::new(&mut self.inner).poll_write_vectored(cx, bufs)
}
fn is_write_vectored(&self) -> bool {
self.inner.is_write_vectored()
}
}
```

---
It doesn't really matter: normally the proxy end would disconnect it after a certain time. However, if for some reason it doesn't, the connection is kept in use; keep-alive, for example, could be such an issue.
I've not fully figured out why it creates deadlocks; I also tried to figure it out using lockbud to scan for potential deadlocks. However, I'm pretty sure it has to do with over-saturating the read/write locks, to the point that it overwhelms and kills the system. An HTTP/HTTPS/UDP rate limiter should be able to tame the overwhelming amount of connections. I'm currently testing it on Actix, and so far it seems to fare well, but I will let you know after I've tested it thoroughly. Keep in mind, I've been seeing 8000 connections per second on average on HTTP/HTTPS, and 3000+ connections per second on UDP, so that's roughly 12k connections per second, excluding the obvious bursts, so a rate limiter would be a very, very good idea ;)
It's an overall issue: the RwLock is simply overwhelmed by the amount of incoming connections it has to deal with; it doesn't matter whether it's a single IP. I want the RwLock to be able to keep up with the resources it has. Tokio's RwLock eventually hits its limit, and I reach it pretty quickly around midday, when the most connections come in. About your extra: the thing is, Rust is still something I'm learning, and I find out something new every time, using my project as a playground, so to speak. I've looked into your code, and it's going to be a lot of work to make it optimal, since I'm still not happy with how SQLx is performing (it needs less memory overhead; I know where I go wrong, but haven't gotten around to improving it with the connectivity issues right now). I've also been pondering replacing the whole RwLock with a full-blown Crossbeam channel system (sketched below), which has a lock mechanism that doesn't burn CPU cycles and might be a real big performance patch-up. I like the idea, but I need to get used to Torrust-Tracker if I want to be useful, since a lot of things are scattered across folders and files, and that makes it a hassle to figure out where everything resides.

---
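A minimal sketch of the channel-owner pattern described above, assuming the `crossbeam-channel` crate: one thread owns the `BTreeMap` outright and everyone else sends it messages, so no `RwLock` is ever taken. The `Command` enum and its fields are hypothetical names for illustration.

```rust
use std::collections::BTreeMap;
use std::thread;

use crossbeam_channel::{unbounded, Sender};

// Hypothetical commands for a torrent map owned by a single thread.
enum Command {
    Upsert { info_hash: [u8; 20], peers: u32 },
    Get { info_hash: [u8; 20], reply: Sender<Option<u32>> },
}

// Spawn the single owner thread; callers communicate only via messages.
fn spawn_owner() -> Sender<Command> {
    let (tx, rx) = unbounded::<Command>();
    thread::spawn(move || {
        let mut torrents: BTreeMap<[u8; 20], u32> = BTreeMap::new();
        // Iterates until every sender is dropped.
        for cmd in rx {
            match cmd {
                Command::Upsert { info_hash, peers } => {
                    torrents.insert(info_hash, peers);
                }
                Command::Get { info_hash, reply } => {
                    let _ = reply.send(torrents.get(&info_hash).copied());
                }
            }
        }
    });
    tx
}

fn main() {
    let tx = spawn_owner();
    tx.send(Command::Upsert { info_hash: [0u8; 20], peers: 42 }).unwrap();

    let (reply_tx, reply_rx) = unbounded();
    tx.send(Command::Get { info_hash: [0u8; 20], reply: reply_tx }).unwrap();
    assert_eq!(reply_rx.recv().unwrap(), Some(42));
}
```

From async code, the blocking `send`/`recv` calls would need care (for example `tokio::task::spawn_blocking`, or tokio's own channels), which is part of the work mentioned above.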
Hi @Power2All, thank you for taking the time to reply. Your feedback is beneficial for me and the project. I guess you are the only known contributor who is actually using the tracker (your forked tracker, which has diverged quite a bit) under heavy-load conditions. First of all, I have to say I'm also a newbie to Rust, BitTorrent and "systems" programming.

**Clarifications for the previous comment**

When I said there was not a problem with the actual timeout middleware implementation from Axum (because we do not use a proxy), I meant only for the 408 response status code. For me, the HTTP 408 response status code is OK in our context. But I see that we have to solve the problem anyway if the tracker does not close connections that have not been used for a long time by the client.

**Deadlocks**

Is this the tool you are using? I'm interested in knowing a concrete case where you could potentially replace RwLock with a Crossbeam channel system. By the way, would this be the package to use?

**Intentional vs unintentional denial-of-service**

I just want to clarify how I see the problem of saturating the system.
You seem to be focused on the "unintentional" ones, especially on deadlocks.
Regarding "intentional" DoS attacks, I'm not a security expert, but I suppose for those cases, you should try to block only the attacker and not only try to avoid the system to reach a saturated state. The only method I know it's banning an IP if it goes over a request rate limit. It would be fun to explore other mechanisms like Proof-of-Work, for example, Hashcash. What do you think @da2ce7? Architecture and performanceI'm sorry to hear that "a lot of the things are scattered in folders and files". That's probably my fault. If that helps, I've opened a new discussion about the: I think performance is a key aspect of this kind of software, but I would like to have other things like:
Maybe a more transactional architecture could improve the performance, but in my opinion it is going to be harder to get those attributes. I'm willing to pay a 5% performance loss to get the other things. As I see it, even if your application is really good at performance, at some point you are going to reach the machine's limit. For me, bad performance means:
The second case could make the deployment harder, leading to the loss of users because of the complexity of running the app. The problem is that we do not know what kind of users are interested in running the tracker:
---
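Regarding the ban-over-a-rate-limit idea above, a minimal fixed-window sketch; all names are hypothetical, and a real deployment would also need eviction of stale entries and a shared, thread-safe map.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

// Hypothetical fixed-window limiter: at most `max_requests` per IP per window.
struct RateLimiter {
    max_requests: u32,
    window: Duration,
    counters: HashMap<IpAddr, (Instant, u32)>,
}

impl RateLimiter {
    fn new(max_requests: u32, window: Duration) -> Self {
        Self { max_requests, window, counters: HashMap::new() }
    }

    /// Returns true if the request from `ip` is still within the limit.
    fn check(&mut self, ip: IpAddr) -> bool {
        let now = Instant::now();
        let entry = self.counters.entry(ip).or_insert((now, 0));
        if now.duration_since(entry.0) > self.window {
            *entry = (now, 0); // window expired: reset the counter
        }
        entry.1 += 1;
        entry.1 <= self.max_requests
    }
}

fn main() {
    let mut limiter = RateLimiter::new(100, Duration::from_secs(1));
    let ip: IpAddr = "203.0.113.7".parse().unwrap();
    for i in 0..150 {
        if !limiter.check(ip) {
            println!("request {i} is over the limit: candidate for a ban");
            break;
        }
    }
}
```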
No problem, I knew what you meant. But you need to take into account that, these days, security is a must; exploiters and hackers will try anything to break the code, so an openly exploitable DoS factor is a no-no in my books ;)
Yes, and yes. I'm going to work on a simple separate .rs file that contains the channel handler. It works almost the same as in Go, where I implemented it the same way.
Both. First, I noticed the Axum issue: it didn't have connection, read, write and disconnect timeouts correctly implemented. I spoke to that David guy from the Tokio development team, and it seems nobody was aware this would be such an issue, so there is some pressure on it now as well, since other users chimed in about it too.
Those can be alleviated by using Crossbeam channels as a replacement for all the RwLock Arcs, but that's going to be some work on my end as well. I've done it in Go pretty well; Rust is another story, which makes it a nice and interesting side project.
It's best to have everything dealt with in a single thread, since moving pointers between threads is pretty much a hassle. With Crossbeam channels, instead of moving around a shared big-ass BTreeMap and such, you simply throw bits between threads to the main thread that handles it, which was my main focus in the former Go app I made.
I've not yet figured out how to profile correctly within CLion, but it would be pretty neat to see memory consumption and whatnot shown on the fly, just like C# has in Visual Studio. If you know something about that, let me know :)
Trust me, I've dealt with a shit ton of DDoS in the past when I was hosting and administrating a controversial IRC network, so I know what kinds of stuff could be thrown at it, including DNS reflection attacks (which are a bitch to deal with).
It was just an observation. If there is a logical reason for this scattering, and you could explain the logic behind it to me, I could get used to it. Right now I prefer less scattered structures; for example, I had to get used to the PHP Symfony structure, but now I know why it's like that and love it.
I like the way you explain things, so it's pretty awesome to respond to these messages, hence why I also helped WarmBeer, awesome dude.
I see it more like a general-purpose server: it could be used for public trackers, private trackers, or semi-private ones that also monitor user accounts' bandwidth usage and such (which I'm working on for version 3.2.0). If you are on Discord, you can add me through this handle, so we could have some more in-depth discussion if you like: Cheers

---
I'm trying to fix this problem in Axum, but I can't find a patch. I have not tried the solution proposed by @Power2All yet; I've been waiting to see if the axum-server crate responds to my issue. I thought it was solved after the migration to hyper 1.0, but I only see an option to set a timeout for sending the request headers (https://github.com/josecelano/axum-server-timeout). I can also confirm that it works in ActixWeb: I've added three web frameworks to the example (Axum, ActixWeb and Rocket), and it seems that it only works in ActixWeb.

---
That's why I moved away from Axum and went with Actix for quite a while, with good progress ;)

---
If you are using a proxy (Nginx or Apache), it seems you can mitigate this attack with the proxy configuration (see the example after this comment): Slowloris DoS Attack
---
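For reference, a sketch of the kind of proxy configuration meant above, assuming Nginx; the directives are real, but the values are illustrative, not recommendations.

```nginx
# (Inside the http block.)
# Shrink the window a slow client can hold a connection open.
client_header_timeout 10s;
client_body_timeout   10s;
send_timeout          10s;
keepalive_timeout     15s;

# Cap concurrent connections per client IP.
limit_conn_zone $binary_remote_addr zone=per_ip:10m;

server {
    listen 80;
    limit_conn per_ip 10;
    # ... proxy_pass to the tracker ...
}
```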
Correct, but that's just a band-aid.

---
I've updated the example repo with the patch. I had to update the patch because the …

---
Hi @Power2All, I've applied the patch (from @programatik29) you mentioned in the two Axum servers we are using:
However, it does not send a 408 Request Timeout to the clients like ActixWeb does. But it seems not all servers do that: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/408. I'm going to close this, but feel free to re-open it if you think we are still vulnerable to this attack.

---
The patch does not work when TLS is enabled.

---
The new hyper release 1.4.0 changes the … We are already using this new version, but it seems it has not fixed the problem. When you run the tracker and connect to the API via telnet without making any request, the connection is not closed.

**How to reproduce**

Run the tracker. Make sure the API is enabled.
Connect to the API:
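Presumably with telnet, assuming the API is on the default port from the tracker configuration (the address below is a hypothetical value; check your `tracker.toml`):

```console
telnet 127.0.0.1 1212
```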
**Hyper 1.4.0**

Although it seems it's not working with Axum, I did a test using hyper directly, and it works. You can use the echo example: https://github.com/hyperium/hyper/blob/0eb1b6cf4d914ce9c3f8e92a8b43754eba27a327/examples/echo.rs

If you run the example with:
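Presumably, from a checkout of the hyper repository:

```console
cargo run --example echo
```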
And then you run:
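Presumably a bare connection again; the example binds `127.0.0.1:3000`:

```console
telnet 127.0.0.1 3000
```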
You need to set the timeout in the main function with `header_read_timeout`. A self-contained version of the snippet (imports added; the `echo` handler here is a minimal stand-in for the one in the linked example):

```rust
use std::net::SocketAddr;
use std::time::Duration;

use http_body_util::Full;
use hyper::body::{Bytes, Incoming};
use hyper::server::conn::http1;
use hyper::service::service_fn;
use hyper::{Request, Response};
use hyper_util::rt::{TokioIo, TokioTimer};
use tokio::net::TcpListener;

// Minimal stand-in for the `echo` handler from the linked hyper example.
async fn echo(_req: Request<Incoming>) -> Result<Response<Full<Bytes>>, hyper::Error> {
    Ok(Response::new(Full::new(Bytes::from("echo"))))
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let addr = SocketAddr::from(([127, 0, 0, 1], 3000));
    let listener = TcpListener::bind(addr).await?;
    println!("Listening on http://{}", addr);
    loop {
        let (stream, _) = listener.accept().await?;
        let io = TokioIo::new(stream);
        tokio::task::spawn(async move {
            if let Err(err) = http1::Builder::new()
                // A timer must be set for `header_read_timeout` to fire.
                .timer(TokioTimer::new())
                .header_read_timeout(Duration::from_secs(5))
                .serve_connection(io, service_fn(echo))
                .await
            {
                println!("Error serving connection: {:?}", err);
            }
        });
    }
}
```

---
Yah, I left Axum for a while, and Actix v4 is doing everything I need, stable and fast.

---
Since I've been using Axum in my own project based on Torrust-Tracker, I'm moving to the Actix framework again, but now with a proper implementation.
As far as I'm aware, this server still uses the Hyper/Warp framework, which is missing the basic functionality for client connection and client disconnection timeouts.
You can verify whether the code is affected by opening a command prompt (or, on Linux, a terminal) and telnetting to the API or HTTP port of Torrust-Tracker. If it works correctly, this should disconnect within a few seconds, or longer if you have customized this timeout; by default it should normally be no more than 5 seconds at most (reading the Warp documentation, there is a default timeout of 30 seconds).
If this is not killing the connection, it could be seen as a critical security flaw and should be addressed. I haven't tested it with the current code, but I could in a minute, modifying this post.