Slow Response with byte-range Requests #188

Closed
gimse opened this issue Nov 4, 2023 · 17 comments

Comments

gimse commented Nov 4, 2023

When using byte-range requests on large files, the first request is slow because the gateway does not forward the Range header and it caches the entire file first.

I want the ability to forward the Range header and disable the cache with env variables.

4141done (Collaborator) commented Dec 9, 2023

Hi @gimse , thank you for bringing this issue up and I apologize for the delay in my response. Do you have any sample configurations/requests that you could provide to help me reproduce the issue?

xquek commented Feb 16, 2024

Hi, I am also experiencing this issue.

I have a 3.1 GB file on S3 and I use the default configuration.

When I make a range request, even one for only the first 100 bytes of the file, I noticed that the cache grows large, sometimes up to 3 GB.

I am running this behind AWS Fargate (not sure if that matters).
Image: nginxinc/nginx-s3-gateway:latest@sha256:5fb4855864e23506e649a74ecdf42689e1e30477e485a0271e0191ba1fca9405

Are Range request headers being passed down to AWS?

Is there a way to avoid writing to the s3 proxy cache?

            - Name: ALLOW_DIRECTORY_LIST
              Value: 'true'
            - Name: AWS_SIGS_VERSION
              Value: '4'
            - Name: S3_BUCKET_NAME
              Value: !Ref 'NewBucketName'
            - Name: S3_REGION
              Value: !Ref 'AWS::Region'
            - Name: S3_SERVER_PORT
              Value: '443'
            - Name: S3_SERVER_PROTO
              Value: https
            - Name: S3_SERVER
              Value: !Join
                - .
                - - s3
                  - !Ref 'AWS::Region'
                  - amazonaws.com
            - Name: S3_STYLE
              Value: default
            - Name: DEBUG
              Value: 'true'
            - Name: CORS_ENABLED
              Value: 'true'

4141done (Collaborator) commented Feb 16, 2024

Thank you for the additional information. I'm going to push this up in priority given the multiple requests.

I think what is happening here is described in this NGINX blog post (although it's not obvious, the suggested fixes are available in both the paid and free versions of NGINX): https://www.nginx.com/blog/smart-efficient-byte-range-caching-nginx/#cache-slice

> Once NGINX has cached an entire resource, it services byte‑range requests directly from the cached copy on disk.
>
> What happens when content is not cached? When NGINX receives a byte‑range request for uncached content, it requests the entire file (not a byte range) from the origin server and begins streaming the response to temporary storage.
>
> As soon as NGINX receives the data required to satisfy the client’s original byte‑range request, NGINX sends the data to the client. In the background, NGINX continues streaming the full response to a file in temporary storage. When the transfer is complete, NGINX moves the file to the cache.

So NGINX is pulling the whole file and doesn't start serving until it reaches the byte range you asked for. Then it continues to pull down the whole file to cache afterwards, which is likely what is causing your memory issues.

I need to do a bit more research in terms of how this will integrate with the normal usage of the gateway, but I think we could either introduce configuration options for the ngx_http_slice_module or just move to using slices generally if it can be configured to work well for non-byte range requests.

I will experiment and report back. While I think about what's best for the project, modifying your version to use cache slices would be a workaround.
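
As a rough sketch only, slice-based caching along the lines of the blog post could look something like the snippet below; the 1m slice size and cache settings are illustrative, not the gateway's current configuration:

    # Illustrative sketch: fetch the object from S3 in fixed-size byte ranges
    # instead of as one whole file (requires ngx_http_slice_module).
    slice              1m;

    # Each slice must be cached under its own key.
    proxy_cache_key    "$request_uri$slice_range";

    # Ask the upstream (S3) for only the current slice.
    proxy_set_header   Range $slice_range;

    # Cache the 206 Partial Content responses returned for each slice.
    proxy_cache_valid  200 206 1h;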

xquek commented Feb 16, 2024

I think I have found the issue with my configuration.

The Range header was not getting passed to the S3 request because of

proxy_pass_request_headers off;

But I think we can manually set the request Range header (we need to be careful here: S3 does not support multi-range requests):

proxy_pass_request_headers off;
proxy_set_header Range $http_range;

This sets the Range header on the request made to S3.

Now this is lightning fast! Thanks!!
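
As a sketch of how the multi-range caveat above could be guarded against, the header could be passed through a map that only forwards single-range values; the $single_range variable and the regex are illustrative, not part of the gateway's actual config:

    # Illustrative sketch: only forward Range headers containing a single
    # byte range, since S3 rejects multi-range requests.
    map $http_range $single_range {
        ~^bytes=\d*-\d*$    $http_range;   # e.g. bytes=0-99, bytes=100-, bytes=-500
        default             "";            # drop multi-range or malformed values
    }

    server {
        location / {
            proxy_pass_request_headers off;
            proxy_set_header Range $single_range;
            # ... existing S3 signing and proxying directives ...
        }
    }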

xquek commented Feb 16, 2024

But @4141done, you are right: passing the Range header through makes the caching much less effective.

4141done (Collaborator) commented:

@xquek Glad to hear you've found a workaround. I'm going to try to make a change that incorporates your learnings as well as fixes the caching situation. If that's OK, I may contact you to test. I'll leave this issue open for now.

xquek commented Feb 22, 2024

@4141done Sure, I will be more than happy to test that! That will be awesome!

I was also wondering if using an if directive would be useful here; I would still like to cache small files and requests without a Range header.

4141done (Collaborator) commented:

> I was also wondering if using an if directive would be useful here; I would still like to cache small files and requests without a Range header.

That's good to know. Currently I'm looking into whether byte slice caching should just be the default for the project, or whether we will need to introduce conditionals as you suggest or additional configuration options. It's good to understand the desire to serve both. I'm new to byte slice caching, so I'll do some learning and let you know.

4141done (Collaborator) commented:

Hey @xquek I have what I think is a potential fix for you. I'm still running some of my own tests, but I'd love it if you could give me some early feedback.

PR #215 gives a detailed description of the change. If you don't specify PROXY_CACHE_SLICE_SIZE, the cache slice size defaults to 1 MB.

A test container is prebuilt for you here: https://github.com/nginxinc/nginx-s3-gateway/pkgs/container/nginx-s3-gateway%2Fnginx-oss-s3-gateway-dev/183815019?tag=issue-188-byte-range-request

Your feedback on the behavior and the code is welcome.

xquek commented Feb 28, 2024

Wow, thanks @4141done, let me test it and get back to you!

xquek commented Feb 28, 2024

Hi @4141done, I tested briefly and:

  1. The Range header is getting passed on, and that is awesome.
  2. I think it is using the slice cache.
  3. I like the fact that if you don't pass in a Range header, it will cache the whole object. This helps with our use case.

xquek commented Feb 28, 2024

Let me know if there are specific tests or metrics that you are looking for.

4141done (Collaborator) commented:

> Let me know if there are specific tests or metrics that you are looking for.

Nope! Thank you for taking the time to test - I appreciate it.

Based on your use case, I'd love your thoughts on these questions:

  1. Do you think a default slice size of 1 megabyte (the nginx default) is fine? Going lower increases the granularity of the cache but can lead to performance issues.
  2. Do you think I need to add separate configuration for the slice-based caching? Basically, adding versions of PROXY_CACHE_MAX_SIZE and PROXY_CACHE_INACTIVE for the slice-based cache (which is separate), so something like PROXY_CACHE_SLICE_INACTIVE and PROXY_CACHE_SLICE_MAX_SIZE (see the sketch below).
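
For context, a rough sketch of what those hypothetical variables would map to in the generated NGINX config, assuming a separate cache zone for slices; the zone name, path, and values shown are illustrative:

    # Illustrative sketch: a separate cache zone for sliced objects, showing
    # the parameters the hypothetical PROXY_CACHE_SLICE_MAX_SIZE and
    # PROXY_CACHE_SLICE_INACTIVE variables would feed into.
    proxy_cache_path /var/cache/nginx/s3_proxy_slices
                     levels=1:2
                     keys_zone=s3_cache_slices:10m
                     max_size=10g       # <- PROXY_CACHE_SLICE_MAX_SIZE
                     inactive=60m       # <- PROXY_CACHE_SLICE_INACTIVE
                     use_temp_path=off;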

xquek commented Feb 29, 2024

I think since there are many different use cases, sticking to the default may be the best option here.

I think since it is possible to add custom configuration via templates, it might be fine not to expose these configs as env vars. I think it was really thoughtful engineering to have the templates; it works so well with config maps too.

Thanks again for this implementation! I am looking forward to seeing this released!

4141done (Collaborator) commented:

Thank you for the additional feedback. I'm going to make some final adjustments to the PR then will comment back here when it is merged.

4141done (Collaborator) commented Mar 6, 2024

Thank you both for your patience on this issue. I've merged the fix, and new Docker containers should be built and available shortly. Feel free to reopen this issue or file a new one if you have any additional issues related to byte-range requests.

gimse (Author) commented May 9, 2024

Thank you so much @4141done!! It worked really well now!
