Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(etag): allow for custom hashing methods to be used to etag #3832

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

EdamAme-x
Copy link
Contributor

@EdamAme-x EdamAme-x commented Jan 15, 2025

resolve: #3829

const app = new Hono()

app.use(
  '/etag/*',
  etag({
    generateDigest: (body: Uint8Array) =>
      crypto.subtle.digest({
          name: 'SHA-256',
        },
        body
      ),
  })
)

The author should do the following, if applicable

  • Add tests
  • Run tests
  • bun run format:fix && bun run lint:fix to format the code
  • Add TSDoc/JSDoc to document the code

@EdamAme-x
Copy link
Contributor Author

Can you review this?
@Gobd @usualoma

Copy link

codecov bot commented Jan 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.32%. Comparing base (2a68c59) to head (a48d054).
Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3832      +/-   ##
==========================================
- Coverage   91.71%   91.32%   -0.39%     
==========================================
  Files         160      161       +1     
  Lines       10195    10247      +52     
  Branches     2925     2905      -20     
==========================================
+ Hits         9350     9358       +8     
- Misses        844      888      +44     
  Partials        1        1              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Gobd
Copy link

Gobd commented Jan 15, 2025

I'll try it out later today

@EdamAme-x
Copy link
Contributor Author

@Gobd Sorry for ping, how?

@usualoma
Copy link
Member

Hi @EdamAme-x!

Thank you for creating the PR. I'm sorry for the very late response. Please wait a little while as I check it.

@usualoma
Copy link
Member

@EdamAme-x
This method of PR is very good because users can generate the etag value simply by passing a custom hash function. Users don't need to be aware of the stream, and it's very simple.

However, the current etag of hono is limited by the fact that crypto.subtle does not support incremental generation, so it is in a situation where it is forced to repeat the generation of the hash value. If a custom function supports incremental generation, it would be more advantageous in terms of performance to use that. (Or, depending on the app, it may be the case that generating a hash value from the entire body is not necessary, and that generating a hash value from the first chunk is sufficient.) Considering this, I think there is a possibility that a generator that receives a stream would be better.
For example, it could be done in the following way.
main...usualoma:hono:feat/etag-generator-option

However, if you are going to be this flexible, it seems better to handle it in the middleware for each user rather than as an option.

diff --git a/src/middleware/etag/index.test.ts b/src/middleware/etag/index.test.ts
index a1c67c60..f2a90f7f 100644
--- a/src/middleware/etag/index.test.ts
+++ b/src/middleware/etag/index.test.ts
@@ -1,3 +1,4 @@
+import { createHash } from 'crypto'
 import { Hono } from '../../hono'
 import { RETAINED_304_HEADERS, etag } from '.'
 
@@ -261,4 +262,74 @@ describe('Etag Middleware', () => {
       expect(res.headers.get('ETag')).toBeNull()
     })
   })
+
+  it('Should generate etag from custom generator middleware', async () => {
+    const customGenerator = async (stream: ReadableStream) => {
+      // create sha1 hash incrementally by Node.js crypto
+      const hash = createHash('sha1')
+      const reader = stream.getReader()
+
+      while (true) {
+        const { done, value } = await reader.read()
+        if (done) {
+          break
+        }
+        hash.update(value)
+      }
+
+      return hash.digest('hex')
+    }
+
+    const app = new Hono()
+    app.use('/etag/*', etag(), async (c, next) => {
+      await next()
+
+      if (c.res.headers.get('ETag')) {
+        return
+      }
+
+      const stream = c.res.clone().body
+      if (!stream) {
+        return
+      }
+
+      const hash = await customGenerator(stream)
+      c.res.headers.set('ETag', `"${hash}"`)
+    })
+
+    app.get('/etag/stream1', (c) => {
+      return c.body(
+        new ReadableStream({
+          start(controller) {
+            controller.enqueue(new TextEncoder().encode('Hello '))
+            controller.enqueue(new TextEncoder().encode('World'))
+            controller.close()
+          },
+        })
+      )
+    })
+
+    app.get('/etag/stream2', (c) => {
+      return c.body(
+        new ReadableStream({
+          start(controller) {
+            controller.enqueue(new TextEncoder().encode('Hello'))
+            controller.enqueue(new TextEncoder().encode(' '))
+            controller.enqueue(new TextEncoder().encode('World'))
+            controller.close()
+          },
+        })
+      )
+    })
+
+    app.get('/etag/text', (c) => {
+      return c.text('Hello World')
+    })
+
+    for (const path of ['/etag/stream1', '/etag/stream2', '/etag/text']) {
+      const res = await app.request(path)
+      expect(res.status).toBe(200)
+      expect(res.headers.get('ETag')).toBe('"0a4d55a8d778e5022fab701977c5d840bbc486d0"')
+    }
+  })
 })

What do you think?

I think this PR approach is fine with a simple API. However, I am a little concerned about the fact that it involves going through a loop of repeatedly applying a hash function, which is being done because (the code I wrote) crypto.subtle does not support incremental generation.

for (;;) {
const { value, done } = await reader.read()
if (done) {
break
}
result = await crypto.subtle.digest(
{
name: 'SHA-1',
},
mergeBuffers(result, value)
)
}

@EdamAme-x
Copy link
Contributor Author

EdamAme-x commented Jan 25, 2025

Hmmm...
Either one sounds good.

@EdamAme-x
Copy link
Contributor Author

I think to provide a simple api is fine in this context.

If the user is concerned about maximizing performance, it may be better to have them consider creating their own.

@usualoma
Copy link
Member

@EdamAme-x Thank you for your response.

What kind of use case is this function actually used for?

Since etag does not require a cryptographic hash function, I think the use case is ‘I want to make it faster to generate.’

However, in the first place, in cases where speed is an issue, ‘loading the entire response body and calculating the hash value at the time of response’ is not a good approach, so I think it is difficult to aim for ultra-fast with the current hono etag design.
If speed is essential, I think it is better to use the last modified date of the resource or calculate it in advance.
https://httpd.apache.org/docs/2.4/mod/core.html#fileetag

There may be cases where you want to replace the hash generation function to speed up the process while keeping the design of the hono etag, but if you go through the loop of non-incremental generation, it can't be said to be ultra fast. I feel that this would never be used in production and would be nothing more than a ‘try’.

I have a question about how much demand there is for ‘replacing the hash function to speed up hono's etag’. If it's not that much, I think it's not necessary to add the function, and it would be enough to guide people to the following approach in the documentation, etc.

app.use('/etag/*', etag(), async (c, next) => {
  await next()
  const etag = await customETagGenerator(c.res.clone().body)
  c.res.headers.set('ETag', `"${etag}"`)
})

What do you think, @yusukebe?

@usualoma
Copy link
Member

If we want to neatly organize it as an API, I think there are also ways to introduce APIs like Cloudflare's crypto.DigestStream. However, I don't think we have the motivation to do that.

https://developers.cloudflare.com/workers/runtime-apis/web-crypto/#constructors

@EdamAme-x
Copy link
Contributor Author

EdamAme-x commented Jan 26, 2025

That is certainly true.
I considered designing a system that receives a ReadbleStream and performs incremental generation, but it would be too complicated.

@Gobd What do you think?

@Gobd
Copy link

Gobd commented Jan 26, 2025

After filing that issue what I ended up doing was copying the middleware code into a pre-existing wrapper function gets the body response object, JSON stringifies it, then runs Bun.hash.wyhash(body).toString(16) and uses c.body to send the response. Based on my testing that is faster than what was happening here but I am not a JS dev so I'm not sure. In reality I think anyone going for performance would probably do something similar and actually look at the middleware code, see how simple it is, and copy/paste if and modify it for theit needs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow for custom hashing methods to be used to etag
3 participants