Gateway + Letsencrypt changes for app , notebook-api #5244
Conversation
…ains. Letsencrypt: add app and notebook-api to domain list used in generating TLS cert
gateway/hail.nginx.conf.in
Outdated
@@ -35,6 +35,9 @@ server {
server {
    server_name scorecard.@domain@;

    gzip on;
    gzip_types text/plain text/html application/json;
gzip is on by default (see /etc/nginx/nginx.conf), but gzip_types is empty. I think you should edit that file to enable the types we want globally.
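A minimal sketch of what that global change might look like, assuming the stock /etc/nginx/nginx.conf layout (the exact type list is taken from this diff, not from the actual file):

```nginx
http {
    # gzip is already on by default in this build; the global type
    # list is what needs extending.
    gzip on;
    # text/html is always compressed when gzip is on, so only the
    # additional types need listing here.
    gzip_types text/plain application/json;
}
```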
Got it. Do we want a single global types definition, or application-controlled definitions? For instance, notebook-api never serves HTML.
Global, I would assume. I don't see a problem with having types in gzip_types that aren't served. The question is whether one service wants to compress a type that another one does not. I don't see why that would be the case.
Two reasons: to avoid incurring the cost of compression/decompression when you know your responses are small, and to avoid having to modify a global config every time you want to add a new type (for instance a font, or .ico). In our modular nginx world, I can modify the server block for the nginx instance that sits in front of my service, rather than modifying the nginx server that sits at the public/private boundary. Also, for small items, gzipping increases size (e.g. 0 bytes -> ~600 bytes).
Then set gzip_min_length.
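For reference, a sketch of that directive (the 1000-byte threshold is an assumption, not from this thread; nginx's default is 20 bytes):

```nginx
# Skip compression for small responses; bodies shorter than this
# are sent uncompressed even when gzip is on.
gzip_min_length 1000;
```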
"to avoid having to modify a global config"

I don't see what's wrong with that. In fact, I see it as preferable to editing every server block that might use that modification. And, as you know, I'd like to make gateway service-generic, so there won't be a per-service block.
I don't have a strong opinion on this. A global option will be set in the next commit, and overridden in server blocks if needed.
Why would it need to be overridden?
location ~ /instance/([^/]+)/(.*) {
    set $svc_name $1;
    #set $auth_request_uri "$is_args$args";
    #auth_request /auth-notebook;
Remove.
The auth_request for the notebook redirect will be re-enabled when I modify it to properly handle not only the initial redirect, but also the subsequent requests issued by Jupyter.
Ah, I see. Leaving this out means, essentially, anyone with the right URL can get execution access to our cluster. This seems bad, even temporarily. What's involved in fixing the endpoint to support the other requests?
Right, yes, not safe. For an initial demo I don't think it's very problematic, since attackers need to guess a svc_name and a long randomly generated string, for resources that live only for the length of the demo, and whose names are transmitted over HTTPS and not stored browser-side (in-memory only). Potential strategies:
- In the notebook-api /verify response, set a cookie containing the access_token.
- Allow all .css/.js files, and for all other requests use the Referer header, which contains the access_token.
  - The .js files should contain no state; they're just application/UI libraries. Even if they were resources served only to authenticated users, they will (most likely) not be compiled for each user specifically.
  - The requests with Referer headers appear to be Jupyter-issued redirects to protected resources; those possess the access_token, and we can/will check them in this mode.

We could also save a Flask-issued session cookie, encrypted with a secret, and check that to avoid re-validating the access_token.
gateway/Dockerfile.in
Outdated
RUN apt-get update -y && \
    apt-get install -y nginx && \
    rm -rf /var/lib/apt/lists/*
FROM nginx:1.15-alpine
I'd prefer not to do this at this point. Alpine is much harder to work with if you're trying to debug anything (and we're at a stage where we often are). Thoughts, @danking?
Why is it harder to debug? To debug nginx requests I view the logs; error.log and access.log are symlinked to stderr/stdout.
Sometimes you want to log in and do stuff and Alpine has none of the tooling that makes that easy (which is why it is so small).
I can remove the change if there's something we're doing that Alpine can't cover. The other motivation was to move us to nginx 1.15.N from 1.10.3, which is years old (if for no other reason than patching vulnerabilities: https://www.cvedetails.com/vulnerability-list/vendor_id-10048/product_id-17956/version_id-198730/Nginx-Nginx-1.10.0.html , nixawk/labs#15).
I'm not talking about the normal behavior; I'm talking about the developer experience when you log into the pod to muck around, e.g. you get bash instead of sh. I'm just trying to make things easy for us. I don't think image size matters here, and if all the images are based on Debian, it will be cached on every node and there is very little overhead. I'd even prefer a standard image built on Debian with useful stuff installed (e.g. emacs, which I immediately install every time I log into a pod).
I'm all for upgrading. You can do that without switching to Alpine. Let's make that a separate PR.
Ok, I'll roll back the change.
    proxy_pass http://auth-gateway.default.svc.cluster.local/verify/$auth_request_uri;
}

location = /auth-notebook {
Is this used? The auth_request below is commented out.
This will be used; it's a placeholder until I re-enable it. It works as provided, just not on subsequent requests issued by Jupyter, which drop the ?access_token. I can remove it if you prefer.
}

location / {
    if ($request_method = 'OPTIONS') {
Is there a particular reason to handle this here and not in the notebook api?
I saw nginx as the controller of ingress, not notebook-api. It should also be more resource-efficient than Flask. That said, I don't have a strong preference; both Flask and Sanic (which I prefer over Flask) have CORS extensions that are very simple to use.
add_header 'Access-Control-Allow-Origin' '*';
add_header 'Access-Control-Allow-Credentials' 'true';
add_header 'Access-Control-Allow-Methods' 'GET, POST, DELETE, OPTIONS';
add_header 'Access-Control-Allow-Headers' 'Authorization,Keep-Alive,User-Agent,Cache-Control,Content-Type';
I thought we were using Flask-CORS for this. Is that not the case?
We certainly can; I mostly wanted to separate notebook business logic from public/private ingress concerns.
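For context, a hedged sketch of how the OPTIONS branch in nginx typically completes (the `return 204` and the `Access-Control-Max-Age` value are conventional choices I'm assuming here, not part of this diff):

```nginx
location / {
    if ($request_method = 'OPTIONS') {
        add_header 'Access-Control-Allow-Origin' '*';
        add_header 'Access-Control-Allow-Methods' 'GET, POST, DELETE, OPTIONS';
        add_header 'Access-Control-Allow-Headers' 'Authorization,Content-Type';
        # Let clients cache the preflight result for a day (assumed value).
        add_header 'Access-Control-Max-Age' 86400;
        # End the preflight here with an empty response.
        return 204;
    }
}
```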
# Support large URIs stemming from access tokens
client_header_buffer_size 64k;
large_client_header_buffers 4 64k;
We had a brief discussion the other day about passing the token when redirecting to the notebook. Would making the token a cookie solve this? Is that better than the header?
Yeah, in my response to your question about /auth-notebook I mention this as a potential strategy. This is maybe the preferred way for the 2nd ... Nth access of /instance/<svc_name>.
The initial request needs to either have an access_token GET parameter, or, if we're OK with not having a portable URL, we can have app.hail.is do something more complicated, where the listed notebook isn't simply an anchor tag <a href="https://notebook-api.hail.is/instance/svc-... but some click handler that sets a cookie and then redirects the user.
Edit: I think a simpler version of the latter approach is to specify domain=hail.is when setting the access token cookie. It should then be usable with the anchor tag. In that case, the /instance... URL still loses portability: it will only be usable when clicked from a browser that has had this cookie set by app.hail.is.
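A sketch of the domain-scoped cookie idea in nginx terms ($token is a hypothetical variable standing in for however the authorized access_token is obtained; the attribute set is an assumption):

```nginx
# Hypothetical: scope the cookie to hail.is so notebook-api.hail.is
# (and any other subdomain) receives it on subsequent requests.
add_header Set-Cookie "access_token=$token; Domain=hail.is; Path=/; Secure";
```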
I found that response way too specific to the details of Jupyter (discussion of file types, etc.)
Won't making the JWT a cookie instead of putting it in the header completely solve this problem? Then all accesses (1st and Nth) check the user in the JWT and are the same, you don't need a second cookie, we get a portable URL and the link to the notebook is a regular anchor, etc. No?
If the JWT is a cookie, one of our services needs to set it. That happens in one of two scenarios: app.hail.is sets it when it receives the token from auth0 after authentication, or the nginx gateway sets it after auth_request reads/authorizes an ?access_token.
You of course can't pin a cookie to a URL, since the URL is fully defined by its string domain/path/to/resource?query_variables, which I don't think you're suggesting. So when I say "portable" I mean that the URL will only work from a browser that already has the cookie set, unless the URL contains access_token and, after auth_request, nginx or Flask sets the cookie (such that all subsequent requests have it).
I realize I might be swinging in the dark here because I don't understand this stuff super well, but can't you set the cookie in JavaScript after the auth0 authentication?
Like document.cookie = newCookie; (see https://developer.mozilla.org/en-US/docs/Web/API/document/cookie)
That's correct.
To elaborate, if we set a domain for the cookie, all subdomains of that domain will have access to it. However, the nginx gateway will not see that cookie if the /instance/... URL is accessed from a browser in which that cookie wasn't already set.
So if we want the instance URL to be copy-pastable this won't work; only having the access_token in the URL on the first request will work.
We probably prefer to require a cookie, since the cookie represents one added assurance (that the URL was accessed from app.hail.is), although we should not feel too confident in this as an access control, because the access token can, and over enough time and use will, be leaked. In that case, the only remaining access control is the access token's short lifespan.
I'm losing track of all the threads. Trying to summarize:

Got it.
@cseed, do you mind if we use nginx:1.15.8? It's based on debian:stretch-slim. This should have no issues, and is the easiest way to get 1.15.8. It's also the easiest way to pin to a version, so we understand what we're running by glancing at the Dockerfile.
I'm happy to switch to debian:9.6 (that's the same as debian:stretch). But as far as I can tell, it has 1.10.3.
Great. Dismiss the review when it's ready for another look.
Why not use the official Docker image nginx:1.15.8? It does everything our custom Dockerfile does, pins the version, and removes 50% of the lines in our custom config. The only thing I see it not doing that we may want is the JWT auth_request module, but that isn't needed currently.
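A sketch of what that proposal might reduce the Dockerfile to (the conf filename and destination path are assumptions, not from this PR):

```dockerfile
# The official image pins the nginx version and is based on debian:stretch-slim.
FROM nginx:1.15.8
# Drop in our server blocks; the default image already wires
# access.log/error.log to stdout/stderr.
COPY hail.nginx.conf /etc/nginx/conf.d/hail.conf
```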
Oh, I misunderstood; I thought you were suggesting changing our FROM to stretch/9.6. I think that should be fine, but can we do it as a separate PR, since it seems orthogonal to this change, which we're trying to get in for the demo tomorrow? (And in general, orthogonal changes should be separate PRs so discussion on one part doesn't hold up the other parts.)
Yes, although the gzip settings issued in this PR will differ between the two versions; 1.10.3 doesn't have gzip on by default. I understand the value of conservative updates before public demonstrations, so I'll do what you ask. Btw, the full config when relying on nginx:1.15.8 goes from:

to

kind of neat.
I don't see this as conservative, and I don't think it has to do with the demo. It's about making incremental changes and disentangling unrelated changes. You should think of a PR as lightweight; there's no reason not to create lots of them. In fact, by creating lots of them, you increase your velocity by removing false dependencies.
This PR just adds app.hail.is and notebook-api.hail.is server blocks for the two corresponding features, and modifies the letsencrypt domain definition to allow us to listen on 443 / use HTTPS.
This has already been pushed to the cluster and works. The auth_request for the notebook redirect will be re-enabled when I modify it to properly handle not only the initial redirect, but also the subsequent ~30 requests issued by the Jupyter web app (various CSS/JS files, and some Ajax requests).