Offset management #65

Open
cddr opened this issue Sep 15, 2015 · 6 comments
@cddr

cddr commented Sep 15, 2015

Hey folks,

What are your thoughts about the new method of managing offsets in Kafka? There's some documentation (in the form of example code) here:

https://cwiki.apache.org/confluence/display/KAFKA/Committing+and+fetching+consumer+offsets+in+Kafka

The TL;DR is that there's quite a bit of overhead in maintaining offsets in ZooKeeper, so there's another approach: commits are written to a topic, and an in-memory cache of the current offset is kept. That way consumers with high throughput, or lots of consumer groups (or both), can still commit after processing each message rather than trying to limit the frequency of commits. Would you like clj-kafka to provide something like this?
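
For readers who want the idea in concrete terms, here is a minimal Python sketch of the "write to a topic, keep an in-memory cache" approach described above. It is a toy model only: the `OffsetStore` class, its method names, and the plain list standing in for the offsets topic are all assumptions made for illustration, not Kafka's or clj-kafka's actual API.

```python
from typing import Dict, List, Optional, Tuple

# (consumer group, topic, partition) identifies one committed position.
Key = Tuple[str, str, int]


class OffsetStore:
    """Toy model: an append-only commit log plus an in-memory cache.

    Hypothetical names throughout; a real broker does far more
    (replication, log compaction, retention).
    """

    def __init__(self) -> None:
        self.log: List[Tuple[Key, int]] = []  # stands in for the offsets topic
        self.cache: Dict[Key, int] = {}       # latest committed offset per key

    def commit(self, group: str, topic: str, partition: int, offset: int) -> None:
        key = (group, topic, partition)
        self.log.append((key, offset))  # sequential append to the log
        self.cache[key] = offset        # keep the cache current

    def fetch(self, group: str, topic: str, partition: int) -> Optional[int]:
        # Served from memory; None means nothing has been committed yet.
        return self.cache.get((group, topic, partition))

    @classmethod
    def replay(cls, log: List[Tuple[Key, int]]) -> "OffsetStore":
        """Rebuild the cache by replaying a log, e.g. after a restart."""
        store = cls()
        for (group, topic, partition), offset in log:
            store.commit(group, topic, partition, offset)
        return store


# Commit after every message, as the approach is meant to allow.
store = OffsetStore()
for offset in range(5):
    store.commit("group-a", "events", 0, offset)
```

In this model a commit is one list append plus one dict write, and a fetch never touches the log, which is the property that makes per-message commits affordable. `OffsetStore.replay(store.log)` shows how the cache could be recovered from the log alone.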

@pingles
Owner

pingles commented Sep 15, 2015

It's definitely interesting, although I'd probably lean towards this being an add-on lib that people could pull in, I guess as a kind of offset strategy.

Having said that, I'm not overly familiar with the development, but I think upcoming releases of Kafka will have a broker API suitable for centrally managing offsets:
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommit/FetchAPI

Again, I'd probably err on the side of letting people choose, as far as possible, whichever offset strategy they like.
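
One possible reading of a pluggable "offset strategy" is a policy the caller passes in that decides when to commit. The Python sketch below illustrates that reading only; `every_message`, `every_n`, and `consume` are hypothetical names, and nothing here reflects an existing clj-kafka interface.

```python
from typing import Callable, Iterable, List

# A policy receives the number of messages processed since the last
# commit and says whether to commit now. (Hypothetical interface.)
CommitPolicy = Callable[[int], bool]


def every_message(uncommitted: int) -> bool:
    """Commit after each processed message."""
    return uncommitted >= 1


def every_n(n: int) -> CommitPolicy:
    """Commit once n messages have been processed since the last commit."""
    def policy(uncommitted: int) -> bool:
        return uncommitted >= n
    return policy


def consume(offsets: Iterable[int], should_commit: CommitPolicy) -> List[int]:
    """Process messages by offset and return the offsets that were committed."""
    commits: List[int] = []
    uncommitted = 0
    last = None
    for offset in offsets:
        uncommitted += 1  # stands in for real message processing
        last = offset
        if should_commit(uncommitted):
            commits.append(offset)
            uncommitted = 0
    if uncommitted:
        commits.append(last)  # flush the final position on shutdown
    return commits


per_message = consume(range(6), every_message)
batched = consume(range(6), every_n(4))
```

Here `per_message` records a commit at every offset, while `batched` commits at offset 3 and then flushes offset 5 at the end. Keeping the policy as a plain function means a library would not have to pick one behaviour for everybody.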

What do you think? Would you be up for developing a clj-kafka equivalent to the Confluence code you posted?


@pingles
Owner

pingles commented Sep 15, 2015

D'oh. I've just realised your suggestion uses the API I found :)

Haha. Yep, definitely up for adding support. I'll see if I can get some time this week to have a look. Of course, pull requests are always welcome!

@cddr
Author

cddr commented Sep 15, 2015

Cool!

I think we will need this either way so if you don't get to it, we'll get to it soon enough. Just wanted to check before digging in. Thanks for this library. It's been working great for us so far.

@cddr
Author

cddr commented Oct 9, 2015

Hey @pingles. Just letting you know, I probably won't get to this any time soon, as my company appears to be leaning towards using Samza, which handles this stuff itself.

@ottbot
Contributor

ottbot commented Oct 20, 2015

This looks like it was done in open PR #64

@pingles
Owner

pingles commented Oct 21, 2015

Thanks for the reminder. We'll try to take a look this week and get it merged. Apologies for the delay; I've been busy with some unrelated stuff at work.
