Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add log abstraction to store records in disk #15

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

hongchaodeng
Copy link

@xiangli-cmu
This adds log and message abstraction.
I haven't thought about how to batch encoding yet. It had better be considered in snapshotting.

import "encoding/json"

type Record struct {
Rev int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to have rev number persisted on disk.

}

func newEncoder(w io.Writer) *encoder {
return &encoder{bufio.NewWriter(w)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No needs to have bufio here since OS already buffers IO for you. Double buffer is evil in database.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting. We need to investigate more regarding buffering behavior.

It currently uses bufio to write a single request: (length | record).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Buffer should only be done in kernel space.

In kernel space, OS's page cache will buffer any incoming writes until you fsync on the file or until the flusher thread kicks in. There is no point buffering them in user space unless you want to speed up the reads for buffered data.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't about buffering. It's just a way to construct a message to be written into disk.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can still construct the message with io.writer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we don't buffer stuff until we know there is a performance issue without buffering? : ) The code will be much simpler.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. You can keep a downstream branch without buffer. I don't understand this part in OS so I will keep it in upstream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not upstream, this is just a pull request towards upstream. I'm just pointing out things that doesn't make sense in the pull request.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a follow up on this. According to KAFKA doc, "As a result of these factors using the filesystem and relying on pagecache is superior to maintaining an in-memory cache or other structure". It sounds OS page-cache reliance is a better option. And thanks for the discussion here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing the due diligence before adding a feature :)

To clarify, most of stuff we were discussing here is buffering instead of caching. Caching is a super set of buffering in the sense that you can read from a cache but cannot read from a bufio writer.

@hongchaodeng hongchaodeng force-pushed the master branch 3 times, most recently from bea727b to 5fc79ea Compare November 8, 2014 00:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants