-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add log abstraction to store records in disk #15
base: master
Are you sure you want to change the base?
Conversation
import "encoding/json" | ||
|
||
type Record struct { | ||
Rev int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to have rev number persisted on disk.
} | ||
|
||
func newEncoder(w io.Writer) *encoder { | ||
return &encoder{bufio.NewWriter(w)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No needs to have bufio here since OS already buffers IO for you. Double buffer is evil in database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's interesting. We need to investigate more regarding buffering behavior.
It currently uses bufio to write a single request: (length | record).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Buffer should only be done in kernel space.
In kernel space, OS's page cache will buffer any incoming writes until you fsync on the file or until the flusher thread kicks in. There is no point buffering them in user space unless you want to speed up the reads for buffered data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't about buffering. It's just a way to construct a message to be written into disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can still construct the message with io.writer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we don't buffer stuff until we know there is a performance issue without buffering? : ) The code will be much simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. You can keep a downstream branch without buffer. I don't understand this part in OS so I will keep it in upstream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not upstream, this is just a pull request towards upstream. I'm just pointing out things that doesn't make sense in the pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a follow up on this. According to KAFKA doc, "As a result of these factors using the filesystem and relying on pagecache is superior to maintaining an in-memory cache or other structure". It sounds OS page-cache reliance is a better option. And thanks for the discussion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing the due diligence before adding a feature :)
To clarify, most of stuff we were discussing here is buffering instead of caching. Caching is a super set of buffering in the sense that you can read from a cache but cannot read from a bufio writer.
bea727b
to
5fc79ea
Compare
@xiangli-cmu
This adds log and message abstraction.
I haven't thought about how to batch encoding yet. It had better be considered in snapshotting.