Can we use a connection pool for the multiple Producers across topics #43

Closed
Raynos opened this issue Feb 10, 2014 · 9 comments

Comments

@Raynos
Collaborator

Raynos commented Feb 10, 2014

We want to use many small topics and the current implementation of one TCP socket per topic doesn't scale for our usage.

Is there a technical, Kafka-related reason that stops us from implementing a Producer that uses a connection pool?

@elee
Collaborator

elee commented Feb 10, 2014

At work right now, but two off-the-cuff thoughts:

  1. Each topic in Kafka creates a log file used for the append-only writes. Creating many topics may be prohibitively expensive. If you're trying to control parallelism with your Kafka queue, you may want to investigate using multiple partitions per topic instead.
  2. Why not create a pool of Producer objects instead? Each will claim its own connection information and not block on .write() -- this allows you to produce messages unconstrained by blocking IO. The Kafka 0.7 Prozess doesn't use persistent connections, so having a pool of connections is just a memory allocation issue.
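
As a rough illustration of the second suggestion, a pool of producer-like objects handed out in round-robin order could be sketched as follows. The `ProducerPool` class and the `createFakeProducer` factory are hypothetical names invented for this sketch; they are not part of the prozess API, and the stand-in producer records messages instead of writing to a socket.

```javascript
// Hypothetical sketch: a round-robin pool of producer-like objects.
// `createProducer` is an assumed factory, not a prozess function.
class ProducerPool {
  constructor(size, createProducer) {
    if (size < 1) throw new Error('pool size must be at least 1');
    this.producers = Array.from({ length: size }, (_, i) => createProducer(i));
    this.next = 0;
  }

  // Return the next producer in round-robin order.
  acquire() {
    const producer = this.producers[this.next];
    this.next = (this.next + 1) % this.producers.length;
    return producer;
  }
}

// Stand-in producer that records messages instead of touching the network.
function createFakeProducer(id) {
  return {
    id,
    sent: [],
    send(message) {
      this.sent.push(message);
    },
  };
}

const pool = new ProducerPool(3, createFakeProducer);
for (const message of ['a', 'b', 'c', 'd']) {
  pool.acquire().send(message);
}
```

With three producers and four messages, the fourth message wraps around to the first producer. A real pool would also need to decide how connection errors and reconnects are handled, which this sketch leaves out.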

Or maybe I miss your point entirely because I'm trying to wrangle an init.d script while answering this ticket. What's your use case?

@Raynos
Collaborator Author

Raynos commented Feb 10, 2014

@elee one of our boxes is opening 400 sockets to kafka-leaf: that's 40 workers * 10 topics. This number increased significantly today because we added 5 new topics.
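
To make the arithmetic concrete, here is a small sketch comparing the socket count under the current one-socket-per-topic model with a model where each worker shares one connection per broker. The function names and the `brokers` parameter are assumptions for illustration; the single-broker figure is hypothetical, not something measured.

```javascript
// Sockets opened from one box when every worker holds one socket per topic.
function socketsPerTopicModel(workers, topics) {
  return workers * topics;
}

// Sockets opened if each worker instead shared one connection per broker.
// `brokers` is an assumed parameter; the broker count is not stated here.
function socketsSharedModel(workers, brokers) {
  return workers * brokers;
}

const workers = 40;
console.log(socketsPerTopicModel(workers, 10)); // 400, the figure reported above
console.log(socketsSharedModel(workers, 1));    // 40, assuming a single broker
```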

We used to send all messages to one topic, but that became far too CPU intensive to consume, so we are now partitioning our messages by topic so that other parts of the system can consume a subset of the messages from Kafka without using a lot of CPU. There may be a better way of doing this.

Basically, the issue is that we don't see a good reason for our workers to open 10 TCP sockets to Kafka for writing.

The problem currently lies here: kafka.Producer is bound to a topic, which it uses to construct the proper request packet. However, it seems to create a new TCP connection for every topic, which seems too heavy.
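
One way to picture what we are asking for is a connection cache keyed by `host:port`, so that producers bound to different topics reuse the same socket. The sketch below is only an illustration under stated assumptions: `ConnectionCache`, `TopicProducer`, and `fakeConnect` are hypothetical names, not the prozess API, the port and topic names are made up, and the fake connection records frames rather than opening a TCP socket.

```javascript
// Hypothetical sketch; none of these names come from prozess.
class ConnectionCache {
  constructor(connect) {
    this.connect = connect;        // assumed factory: (host, port) => connection
    this.connections = new Map();  // "host:port" -> connection
  }

  // Return the cached connection for this broker, creating it on first use.
  get(host, port) {
    const key = `${host}:${port}`;
    if (!this.connections.has(key)) {
      this.connections.set(key, this.connect(host, port));
    }
    return this.connections.get(key);
  }
}

// A producer stays bound to one topic but borrows a shared connection.
class TopicProducer {
  constructor(topic, broker, cache) {
    this.topic = topic;
    this.connection = cache.get(broker.host, broker.port);
  }

  send(message) {
    // A real producer would encode a Kafka produce request here.
    this.connection.write({ topic: this.topic, message });
  }
}

// Stand-in for a socket: counts how many were opened and records writes.
let opened = 0;
function fakeConnect(host, port) {
  opened += 1;
  return {
    host,
    port,
    written: [],
    write(frame) {
      this.written.push(frame);
    },
  };
}

const cache = new ConnectionCache(fakeConnect);
const broker = { host: 'kafka-leaf', port: 9092 }; // port is illustrative
const topics = ['topic-a', 'topic-b', 'topic-c'];  // made-up topic names
const producers = topics.map((topic) => new TopicProducer(topic, broker, cache));
producers.forEach((producer) => producer.send('hello'));
```

Here three topic-bound producers share a single connection, because the topic only affects how each request is built and not which broker it goes to. Keeping the per-topic Producer interface unchanged is what would let such a cache stay backward compatible.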

@elee
Collaborator

elee commented Feb 11, 2014

@Raynos I see your issue and that's a valid concern. Right now I think I can channel @cainus and say we can't implement this feature in any timely fashion because we're not working on this driver as actively as we would like. We will label this issue as an Enhancement for now.

EDIT: err, 'Enhancement' in the GitHub issues parlance

@Raynos
Collaborator Author

Raynos commented Feb 11, 2014

@elee we can make a PR on prozess to implement connection pooling of some fashion.

@cainus
Owner

cainus commented Feb 11, 2014

@Raynos I'm sure this goes without saying, but if you're going to tackle this, it'd be nice if you could do it in a backward-compatible fashion, if possible. Thanks in advance if you get this working though! Give us a shout if you want to talk about specifics before you get to a full-blown PR too.

@Raynos
Collaborator Author

Raynos commented May 2, 2014

@elee @cainus

We made a pull request for a connection cache (#45).

It's backwards compatible and we are running it in production so it seems solid.

@iproctor
Collaborator

iproctor commented May 3, 2014

See #46 instead.

@cainus
Owner

cainus commented May 25, 2014

Closing as this was just fixed in 0.7.1.

@cainus closed this as completed May 25, 2014

@Raynos
Collaborator Author

Raynos commented May 25, 2014

nice. :)
