
Data loss when flume-ng-redis stops or restarts #13

Open
lovemelovemycode opened this issue Jun 9, 2015 · 5 comments

Comments

@lovemelovemycode

publisher ----> Redis ----> flume-ng-redis (source)
When the flume-ng-redis source stops for some reason, the data published during that period is lost.
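The loss described above is inherent to Redis pub/sub: messages are delivered only to subscribers attached at publish time and are never buffered. A minimal in-memory sketch (a stand-in broker, not real Redis) of that behavior:

```python
# In-memory stand-in for a pub/sub broker. Like Redis pub/sub, it has no
# buffering: a message published while no subscriber is attached is gone.
class FakePubSubBroker:
    def __init__(self):
        self.subscribers = []  # queues of currently attached subscribers

    def publish(self, message):
        # Delivered only to subscribers attached *right now*.
        for queue in self.subscribers:
            queue.append(message)

    def subscribe(self):
        queue = []
        self.subscribers.append(queue)
        return queue

    def unsubscribe(self, queue):
        self.subscribers.remove(queue)

received = []
broker = FakePubSubBroker()
inbox = broker.subscribe()
broker.publish("event-1")      # source is up: delivered
received.extend(inbox)
broker.unsubscribe(inbox)      # flume-ng-redis source stops
broker.publish("event-2")      # lost: nobody is listening
inbox = broker.subscribe()     # source restarts
broker.publish("event-3")      # delivered again
received.extend(inbox)
print(received)                # ['event-1', 'event-3'] -- event-2 is gone
```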

@chiwanpark
Owner

Currently, there is no perfect solution for this case. Flume can be configured with multiplexing, but the data will be replicated under that configuration. Once an implementation using the Redis list structure is available, we can solve this problem.
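For reference, the replication mentioned above can be expressed in Flume with a replicating channel selector, which copies every event from the source into each configured channel. A hedged sketch with hypothetical agent and channel names (`a1`, `c1`, `c2`):

```properties
# Hypothetical names; a replicating selector duplicates each event
# into both channels, so every downstream sink sees a full copy.
a1.sources = r1
a1.channels = c1 c2
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating
```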

@lovemelovemycode
Author

https://github.com/fengpeiyuan/flumeng-plugins-redis
1. This example solves the data-loss problem.
2. But I don't know which is faster compared to the publish/subscribe approach.
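The list-based approach avoids the loss because a Redis list buffers events while the consumer is down. A minimal in-memory sketch (a deque standing in for the Redis list, with LPUSH/RPOP analogs):

```python
from collections import deque

# In-memory stand-in for a Redis list: LPUSH at the head, RPOP at the
# tail. Unlike pub/sub, the list keeps buffering events while the
# consumer is down, so nothing is lost across a restart.
class FakeRedisList:
    def __init__(self):
        self.items = deque()

    def lpush(self, value):
        self.items.appendleft(value)

    def rpop(self):
        return self.items.pop() if self.items else None

queue = FakeRedisList()
queue.lpush("event-1")
drained = [queue.rpop()]       # consumer is up: takes event-1
queue.lpush("event-2")         # consumer is down: events just wait
queue.lpush("event-3")
drained.append(queue.rpop())   # consumer restarts and catches up
drained.append(queue.rpop())
print(drained)                 # ['event-1', 'event-2', 'event-3']
```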

@chiwanpark
Owner

Yes, using the Redis list structure solves the problem. I have just implemented the plugin using the list structure as well; you can use it from the master branch.

You can also address the problem by running multiple subscribers with the pub/sub implementation, but then the collected data contains duplicated records. You can deal with the duplication in the stage before using the collected data, i.e. the ETL stage.

I think pub/sub is faster than the list, but the list structure is good enough in the common case. I attached an article about this: https://davidmarquis.wordpress.com/2013/01/03/reliable-delivery-message-queues-with-redis/
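The deduplication step mentioned above can be as simple as keying on a unique record id. A hedged sketch, assuming each record carries a hypothetical `id` field (not something the plugin guarantees):

```python
# Drop replicated records downstream (the "ETL stage" mentioned above),
# assuming each record carries a unique "id" field.
def deduplicate(records):
    seen = set()
    unique = []
    for record in records:
        if record["id"] not in seen:
            seen.add(record["id"])
            unique.append(record)
    return unique

collected = [
    {"id": 1, "body": "a"},
    {"id": 2, "body": "b"},
    {"id": 1, "body": "a"},  # same record, seen via a second subscriber
]
result = deduplicate(collected)
print(result)                # two unique records remain, ids 1 and 2
```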

@lovemelovemycode
Author

I have tested the plugin using the list structure: about 2 MB/s, 11,700 records/s. Maybe we should make it faster. My suggestion:

lpush ----> list named event-current ----> lpop
                 | every 5 minutes
                 v
rename event-current to event-yyyyddMMHHmm ----> lrange + del
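The rotation scheme above can be sketched as follows, with a plain dict of deques standing in for Redis (RENAME becomes a dict-key move, LRANGE + DEL becomes a single drain). The function names mirror the Redis commands but are stand-ins, not real client calls:

```python
import time
from collections import deque

# In-memory sketch of the suggested batching scheme. Producers LPUSH into
# "event-current"; periodically the list is RENAMEd to a timestamped name,
# after which the consumer reads it with LRANGE and DELetes it in one pass.
lists = {"event-current": deque()}

def lpush(name, value):
    lists.setdefault(name, deque()).appendleft(value)

def rotate():
    """RENAME event-current to an event-yyyyddMMHHmm-style name."""
    batch_name = "event-" + time.strftime("%Y%d%m%H%M")
    lists[batch_name] = lists.pop("event-current", deque())
    lists["event-current"] = deque()  # fresh list for new events
    return batch_name

def drain(batch_name):
    """LRANGE the whole batch, then DEL the list."""
    return list(lists.pop(batch_name, deque()))

lpush("event-current", "e1")
lpush("event-current", "e2")
name = rotate()
print(drain(name))  # ['e2', 'e1'] -- LRANGE order, newest (head) first
```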

@chiwanpark
Owner

Hi! Thanks for your effort in testing. :) But I cannot understand your suggestion completely.
Do you mean that processing the events in small batches would be faster? It sounds reasonable, but in some cases sending the events immediately is better than small batches. (Also, I think using the current time as part of the list name is not a good idea; using an atomic counter in Redis as the list number would be better.)
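The atomic-counter naming suggested above would look roughly like this; a dict increment stands in for the Redis INCR command, which is atomic server-side so two rotations can never collide on the same batch name (unlike two rotations within the same timestamp minute):

```python
# Stand-in for Redis INCR: fetch-and-increment a counter and use the
# value as the batch list's number, so names are unique by construction.
counter = {"value": 0}

def incr():
    counter["value"] += 1  # real Redis INCR does this atomically
    return counter["value"]

batch_name = "event-batch-%d" % incr()
print(batch_name)          # event-batch-1
```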

I'll add this feature as an option. But I'm preparing for my final exams at school now, so I can probably implement this feature within 2-3 weeks. Thank you.
