Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting HTTP protocol request body #204

Open
paulofaria opened this issue Mar 29, 2020 · 6 comments
Open

Getting HTTP protocol request body #204

paulofaria opened this issue Mar 29, 2020 · 6 comments

Comments

@paulofaria
Copy link
Contributor

I'm having trouble changing the webapp example to allow reading the request body. My idea was to detach after receiving the fields, receive the body, attach again to send the status, detach one more time to send the response body. Everything works fine until I try to detach the second time. The program simply blocks and never returns. I'm not sure if this is a bug or a mistake in my understanding of how the library should be used.

coroutine void html_worker(int s) {
    s = http_attach(s);
    assert(s >= 0);

    char command[256];
    char resource[256];
    int rc = http_recvrequest(s, command, sizeof(command),
        resource, sizeof(resource), -1);
    assert(rc == 0);

    printf("%s %s HTTP/1.1\n", command, resource);
    /* This will hold the content length */
    int sz = 0;
    
    while(1) {
        char name[256];
        char value[256];
        rc = http_recvfield(s, name, sizeof(name), value, sizeof(value), -1);
        if(rc < 0 && errno == EPIPE) break;
        printf("%s: %s\n", name, value);
        if(!strcmp(name, "Content-Length")) sz = atoi(value);
        assert(rc == 0);
    }
    
    /* Detach so we can read the request body. */
    s = http_detach(s, -1);
    assert(s >= 0);
    
    /* Read the request body */
    char buf[sz + 1];
    rc = brecv(s, buf, sz, -1);
    assert(rc == 0);
    buf[sz] = '\0';
    printf("\n%s\n", buf);
    
    /* Attach again so we can send the reply. */
    s = http_attach(s);
    assert(s >= 0);
    
    /* Send an HTTP reply. */
    rc = http_sendstatus(s, 200, "OK", -1);
    assert(rc == 0);
    
    /* Perform HTTP terminal handshake. */
    s = http_detach(s, -1); /* <--------- This is where the program blocks forever.
    assert(s >= 0);

    /* Send the HTML to the browser. */
    rc = bsend(s, html, strlen(html), -1);
    assert(rc == 0);

    /* Close the underlying TCP connection. */
    rc = tcp_close(s, -1);
    assert(rc == 0);
}
@paulofaria
Copy link
Contributor Author

To be more specific it gets blocked in this line:

libdill/term.c

Line 168 in de7a917

ssize_t sz = dill_mrecvl(self->u, first, last, deadline);

From what I understand it waits for the client to send more data, which is strange since I already read the whole request body.

@jakgra
Copy link

jakgra commented Apr 1, 2020

I have the same problem. I didn't find any official way in the documentation and am probably misunderstanding some parts of how the library should be used. I also tried the way described above but failed with the same problem. It worked when I would only http_detach() once and afterwards read the body, but this is very unpractical, because I first need to parse the body to know if I need to respond with 200 or 500 using http_sendstatus() and http_sendstatus() doesn't work after calling http_detach(). Calling http_attach() and http_detach() 2 times doesn't work.

I think the problem is that the TERM protocol expects at least one message to be received, so it waits for it before closing the socket in term_detach() called from http_detach() and doesn't receive it because we have already read all the headers and all the body, when calling http_detach() the second time. Another problem is that http_detach() adds an empty line and curl hangs up if no content-length is specified. So my custom solution for now is adding a few functions to libdill that work around the above described problems. But there is probably some nicer official solution.

Here is the code to my temporary solution: jakgra@9d5d13d

The way you use it is the first time you call http_pause() instead of http_detach(). All the other stuff is the same as in the above solution:

  deadline = now() + 2000;

  s = http_attach(s);
  check(s >= 0, http_cleanup);
  rc = http_recvrequest(s, command, sizeof(command), resource, sizeof(resource),
                        deadline);
  check(rc == 0, http_cleanup);
  body_len = -1;
  while (1) {
    char name[256];
    char value[256];
    rc = http_recvfield(s, name, sizeof(name), value, sizeof(value), deadline);
    if (rc < 0 && errno == EPIPE)
      break;
    check(rc == 0, name_cleanup);
    if (strcasecmp(name, "Content-Length") == 0) {
      errno = 0;
      body_len = strtol(value, &endptr, 10);
      check(body_len >= 0, name_cleanup);
      check(errno == 0 && *endptr == '\0', name_cleanup);
    }
  }
  s = http_pause(s, deadline);
  check(s >= 0, pause_cleanup);
  if (body_len > 0) {
    body = bfromcstr("");
    check(body, pause_cleanup);
    // TODO check for max body length :), to prevent malicious RAM exhaustion
    rc = balloc(body, body_len + 1);
    check(rc == BSTR_OK, body_cleanup);
    rc = brecv(s, body->data, body_len, deadline);
    check(rc == 0, body_cleanup);
    body->slen = body_len;
    body->data[body->slen] = '\0';
  } else {
    body = NULL;
  }
  s = http_attach(s);
  check(s >= 0, body2_cleanup);

  // ... handle all the bussines logic and set the response_body string
  
  /* Send an HTTP reply. */
  rc = http_sendstatus(s, status_code, status, deadline);
  check(rc == 0, final_cleanup);

  rc = http_sendfield(s, "Content-Type", "application/json; charset=utf-8",
                      deadline);
  check(rc == 0, final_cleanup);

  body_len = strlen(response_body);
  length = bformat("%d", strlen(response_body));
  check(length, final_cleanup);
  rc = http_sendfield(s, "Content-Length", bdata(length), deadline);
  bdestroy(length);
  check(rc == 0, final_cleanup);

  /* Perform HTTP terminal handshake. */
  s = http_detach(s, deadline);
  check(s >= 0, final_cleanup);

  /* Send the HTML to the browser. */
  rc = bsend(s, response_body, strlen(response_body), deadline);
  check(rc == 0, final_cleanup);

  /* Close the underlying TCP connection. */
  rc = tcp_close(s, deadline);
  check(rc == 0, final_cleanup);

@MobiusHorizons
Copy link

I also have this issue. It's a huge issue that you can't read the body of an http request. Is there really no supported way to do this?

@paulofaria
Copy link
Contributor Author

@sustrik I know you're very busy, but I kindly ask you to at least reply with a "this is a bug I'm open to PRs" or "I don't expect it to be supported in libdill, create a fork or a new protocol in your application". Just so we have a direction.

@sustrik
Copy link
Owner

sustrik commented Jul 31, 2020

Yes, it seems to be a bug and I am open to PRs.

@PythonJedi
Copy link

Seems like I'm getting to experiment with libdill by fixing stuff right off the bat. I haven't tested the client code here against #209 just yet (would make a good candidate for extra test cases for HTTP, I think), it's gotten far later than I intended and I need to get some sleep before work.

The root cause of the issue is that HTTP/1.1 (and presumably 2.0) considers the optional content body that follows the CRLF that terminates the header as part of the message, but we're trying to drop back to the underlying bytestream socket to read that body. TERM expects an empty line for the terminal handshake every time it is detached, so detaching the second time hangs waiting for the client to send another empty line, which it won't if it's only sending one request. Nor will the client close the socket, as it is waiting for the body the server promised would follow.

Regardless of how the HTTP module exposes the message body to application code, it will need to detach the SUFFIX protocol in order to correctly pass through arbitrary octets in the message body (it's not necessarily text), and so anything attached above it also needs to be detached. I added a variant of TERM called HUP (short for hang-up) that allows a peer to detach without receiving a terminal message from the other peer, but ensures that the terminal message is sent if any other messages have been sent. I'm not sure that that's the best approach, since it's easy to read messages as the wrong protocol by detaching early, but at least I think I grok the basics of libdill after implementing a new micro-protocol. It might be better to just drop the use of TERM from HTTP altogether, have it work directly with a SUFFIX socket, and handle the empty lines explicitly. However, I suspect this isn't the only scenario where this kind of behavior is desirable.

I'm not sure that I'd want libdill to add much more functionality to HTTP. It appears to get quite hairy to handle all the edge cases appropriately if we wanted to, say, provide a wrapped socket to read off the body octets in a controlled manner instead of completely detaching the HTTP protocol to read/write the message body. That could easily start rather aggressive feature creeping towards a full-blown REST framework. Not that I wouldn't want such a thing to exist (I think it'd actually be rather fantastic to have a structured concurrency REST backend), but that should be tackled by a separate project, in my opinion. I suspect other protocols like SMTP/POP3/IMAP/SIP/XMPP/etc. would fall into the same category. As soon as we hit the application layer in the networking stack, there's just too much variation to try and have them all live under the same project, and it's arguable whether HTTP should be included in libdill, as it's otherwise just providing transport layers (since SOCKS5 is a proxy). That discussion should probably be in a separate issue, however.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants