-
Notifications
You must be signed in to change notification settings - Fork 1.3k
[WIP] feat: support chunked add requests #1540
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my thoughts at ipfs-inactive/js-ipfs-http-client#851 (review) first, then the inline comment below.
src/http/api/routes/files.js
Outdated
|
||
const streams = [] | ||
const filesDir = tempy.directory() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so that we don't forget: we should remove temporary files after successful upload, but also have a plan for what happens with temp files produced by failed/aborted ones. I don't think we can assume temp dir will be cleared by the OS.
Leaving them forever may be a self-administered DoS attack (eventually running out of disk space)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes definitely should go like this
- success adding ? delete file
- errors during session ? define a way to remove only the last chunk from the file ( or have one file per chunk ) this would allow for resumable add's
- on new session start async check the temp folder and delete all files older than 1 hour
- also we need to handle open streams, too many can become a problem
src/http/api/routes/files.js
Outdated
if (err) { | ||
pushStream.push({ | ||
Message: err.toString(), | ||
Code: 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I've tracked down the source of values used in Code
field in go-lang implementation (at go-ipfs-cmdkit/error.go#L9-L25).
Translating from iota notation, it is an enum:
ErrNormal // ErrorType = 0
ErrClient // ErrorType = 1
ErrImplementation // ErrorType = 2
ErrNotFound // ErrorType = 3
ErrFatal // ErrorType = 3
Not sure if that is a problem, probably hardcoding 0
here is fine.
payload: { | ||
parse: false, | ||
output: 'stream', | ||
maxBytes: 1000 * 1024 * 1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- How do we pick what is the default max chunk size? Eg. why 1GB and not 256MB ?
- Is this going to be a hardcoded limit, or a configuration option that can be passed to js-ipfs constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we pick what is the default max chunk size? Eg. why 1GB and not 256MB ?
we need this value to be high for the non chunked path
Is this going to be a hardcoded limit, or a configuration option that can be passed to js-ipfs constructor?
don't know. what you think ?
from the hapi documentation when output=stream, maxBytes doesn't matter but this doesnt seem to be the case. Needs further investigation, maybe it's a matter of upgrading hapi dunno
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't think of anything right now.
Unless you see good use case for customizing it, lets go with hardcoded value for now.
Exposing it as a config property can be contributed later in separate PR.
d25cc7d
to
e5c1d80
Compare
@hugomrdias is this ready for review or does this need feedback for you to proceed? If yes then please in future assign me as a reviewer and remove the "[WIP]" label from the title when it's ready so I don't skim down the PR list and miss it! Thank you ❤️ |
Yes I'm gonna do that, just getting this ready for you and others. I have a couple more commits to push with tests and a big fix for a bug I found today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great start 😄 🚀 thanks for working on this - I'm excited to be able to upload big data to IPFS from the browser!
Just a note while I'm thinking about it - the API will need documentation in the README for this and for bonus points an example with a big file!?
There's some review comments for specific tests but this PR needs a suite of tests to ensure the chunking works correctly. Note it doesn't have to include super big files!
Some test cases I'm interested in:
- Do the temp files get cleaned up on success/error or incomplete request (final chunk not sent)?
- Do the chunking headers get validated properly?
- Can I send a single chunk upload?
- If I make the chunk size really small does it still work?
- Does progress reporting still work?
const fs = require('fs') | ||
fs.open(file, 'r', (err, fd) => { | ||
if (err) { | ||
cb(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing return
|
||
fs.fstat(fd, (err, stats) => { | ||
if (err) { | ||
cb(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing return
} | ||
|
||
fs.read(fd, buffer, 0, buffer.length, stats.size - 58, function (e, l, b) { | ||
cb(null, b.toString().includes(boundary)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please handle error and use more descriptive variable names!
fs.read(fd, buffer, 0, buffer.length, stats.size - 58, function (e, l, b) { | ||
cb(null, b.toString().includes(boundary)) | ||
}) | ||
fs.close(fd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move close to read callback?
} | ||
|
||
const matchMultipartEnd = (file, boundary, cb) => { | ||
const buffer = Buffer.alloc(56) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the magic numbers here 56 & 58? Would you mind adding comments to explain or refactor to be more obvious?
reply, | ||
(err) => { | ||
if (err) { | ||
return reply(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reply
will be called twice if there's an error mid stream. Does Hapi somehow deal with this? In createMultipartReply
there's error listeners that call callback but they don't do anything to end the replyStream
. Does this leave the connection open?
I think we need a test to ensure the correct thing is being done.
parser.on('end', () => { | ||
addStream.end() | ||
}) | ||
parser.on('error', cb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be an abstraction around this that allows us to just pipe it to our add stream.
Name: file.path, | ||
Hash: file.hash, | ||
Size: file.size | ||
})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should just be a transform, we should just be able to pipe readStream -> parser -> addStream -> replyStream
wrapWithDirectory: query['wrap-with-directory'], | ||
pin: query.pin, | ||
chunker: query.chunker | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should extract this logic into a function e.g. queryToAddOptions
in http/api/resouces/files.js
and use it in add
and addExperimental
and just pass the result of calling it to this function. It'll save us having to keep track of another place where these options are constructed.
return [match[1], Number(match[2])] | ||
} | ||
|
||
const createMultipartReply = (readStream, request, reply, cb) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO this should just construct and return a stream that can be passed to reply in the handler.
@hugomrdias any plans to continue this PR or shall we close? |
let it be open i would like to come back to this |
Closing as this is quite stale. |
This PR supports this ipfs-inactive/js-ipfs-http-client#851 in the ipfs-api
todo:
try not using files and add directlynext(in a new PR):
These need more work so we don't add corrupted data.