Piping and delegating requests #18

Open
scripting opened this issue Jun 21, 2021 · 2 comments

@scripting (Owner) commented Jun 21, 2021

To @danmactough and other Node mentors of mine,

I call your attention to delegateRequest, a function inside PagePark.

https://github.com/scripting/pagePark/blob/master/pagepark.js#L648

It works and it's deployed so I have to be careful about not breaking it.
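
For context, the heart of delegateRequest is delegation by piping. Here's a rough sketch of the idea using Node's built-in https module -- an approximation only, not the deployed code in pagepark.js:

```js
// Approximation of delegation-by-piping -- not the actual delegateRequest code.
const https = require("https");

function delegateRequestSketch(urlRemote, res) {
  https.get(urlRemote, (upstreamResponse) => {
    // copy the upstream status and headers, then stream the body through untouched
    res.writeHead(upstreamResponse.statusCode, upstreamResponse.headers);
    upstreamResponse.pipe(res);
  }).on("error", (err) => {
    res.writeHead(502, { "Content-Type": "text/plain" });
    res.end(err.message);
  });
}
```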

But there's a problem -- PagePark has a set of renderings it does based on the file extension.

For example, when a request comes in for a .opml file, it renders it like this.

When I make a request via xxx, I get the raw unprocessed OPML.

I don't want to just send back the result; I want to first put it through the filters and return that.

I think what I have to do is request the source file, and then return it exactly as if I had read it from the local hard disk. For the processing I'm doing, I don't think I can use a straight pipe.
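
As a sketch of that approach, assuming a hypothetical renderFileContents that applies the same extension-based filters used for local files (it is not a real PagePark function):

```js
const https = require("https");

// Buffer the remote file completely, then hand it to the same rendering path
// a local file would take. renderFileContents is a hypothetical stand-in.
function delegateAndRender(urlRemote, res) {
  https.get(urlRemote, (upstreamResponse) => {
    const chunks = [];
    upstreamResponse.on("data", (chunk) => chunks.push(chunk));
    upstreamResponse.on("end", () => {
      const wholeFile = Buffer.concat(chunks); // the entire file is now in memory
      renderFileContents(urlRemote, wholeFile, res); // exactly as if it were read from the local disk
    });
  }).on("error", (err) => {
    res.writeHead(502, { "Content-Type": "text/plain" });
    res.end(err.message);
  });
}
```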

I understand that this would increase the memory overhead of PagePark, but that's consistent with its approach -- I want to create a higher-level server, inching toward the server we have in Frontier...

@danmactough commented

@scripting If I understand correctly, your ideal solution would retain the non-buffering piping for files other than OPML files, but because you want to buffer and transform OPML files, you feel like you need to buffer ALL files.

You do have a couple of alternatives that would allow you to continue streaming non-OPML files, but they're kind of complex.

  1. One would be to change this naive piping to code that inspects the response headers (or even the first chunk of data) and then decides whether to buffer and transform the response or instead return it raw, a chunk at a time. (There's a rough sketch of this right after the list.)
  2. Another would be to write a streaming OPML transformer -- instead of buffering the entire file and then transforming it, transform a chunk at a time. But not only is that complex, it may not be practicable -- it's possible that you need the entire contents to do that rich HTML transformation. And you would still need a way to decide which responses to pass through your transformer.
  3. Yet another (but less precise) alternative would be to rely on the request url -- if the url path ends in .opml, you could branch your delegation logic and buffer+transform detected OPML files while piping the rest.
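
Roughly, Option 1 could look like this -- a sketch only, where looksLikeOpml and bufferAndTransformOpml are placeholder helpers, not PagePark code:

```js
const https = require("https");

// Decide from the upstream response headers whether this looks like OPML.
// (Inspecting the first chunk of data would be more reliable, since OPML is
// often served as plain text/xml.)
function looksLikeOpml(upstreamResponse) {
  const contentType = upstreamResponse.headers["content-type"] || "";
  return contentType.includes("opml"); // e.g. text/x-opml
}

function delegateRequestInspecting(urlRemote, res) {
  https.get(urlRemote, (upstreamResponse) => {
    if (looksLikeOpml(upstreamResponse)) {
      bufferAndTransformOpml(upstreamResponse, res); // hypothetical: buffer the body, render, respond
    } else {
      res.writeHead(upstreamResponse.statusCode, upstreamResponse.headers);
      upstreamResponse.pipe(res); // everything else still streams a chunk at a time
    }
  });
}
```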

Option 3 would be the easiest to implement but also the least complete. If Option 3 would work for your current use case, I would give that a try while working on Option 1. I would only pursue Option 2 if (a) it's workable (chunk-based transformation would yield the output you want) and (b) you really want to be able to handle arbitrarily large OPML files. Because that's the biggest concern with buffering -- if the file is too big, you can run out of memory. (Although the other big concern with buffering is nothing to ignore -- if you need to wait for the entire file to buffer before sending back the first byte, your latency increases with the size of the file you're proxying.)
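
And a minimal sketch of the Option 3 branch, where renderOpmlString stands in for the OPML-to-HTML rendering (again, not an actual PagePark call):

```js
const https = require("https");
const path = require("path");

function delegateByExtension(urlRemote, res) {
  const ext = path.extname(new URL(urlRemote).pathname).toLowerCase();
  https.get(urlRemote, (upstreamResponse) => {
    if (ext === ".opml") {
      // buffer and transform OPML files
      const chunks = [];
      upstreamResponse.on("data", (chunk) => chunks.push(chunk));
      upstreamResponse.on("end", () => {
        const opmltext = Buffer.concat(chunks).toString("utf8");
        res.writeHead(200, { "Content-Type": "text/html" });
        res.end(renderOpmlString(opmltext)); // hypothetical stand-in for the OPML renderer
      });
    } else {
      // non-OPML files keep the cheap streaming path
      res.writeHead(upstreamResponse.statusCode, upstreamResponse.headers);
      upstreamResponse.pipe(res);
    }
  });
}
```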

@scripting (Owner, Author) commented

@danmactough -- thanks for the thoughtful response.

I just wanted to be absolutely sure I was thinking about it correctly, not wanting to take any chance on breaking already-deployed applications.

After giving it some thought I think I have the ideal solution.

The piping happens when there's an option in a domain's config.json file that says to redirect from this url to another. There are several options that do that.

I can create a new option that mirrors content from another url but trades off performance for a complete rendering. PagePark already does something like this for S3 content serving; this will be a generalization of that.

Thanks for your help and being a sounding board.
