HA configuration performs incorrectly #67

Open
wolf31o2 opened this issue Jun 12, 2018 · 4 comments

Comments

@wolf31o2

Problem:
I am running HDFS within my Mesos cluster. It is fully HA. I have configured a matcher to point to both NameNodes. However, when the first listed NameNode is in standby mode, the standby_namenode is never used.

Expected behavior:
The connection to the NameNode listed as namenode succeeds, the plugin finds that it is in standby mode, and it then attempts to send to standby_namenode, which is now the active NameNode.
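
To make the expected behavior concrete, here is a minimal Ruby sketch of the failover I have in mind. It is not the plugin's code: StandbyError and send_with_failover are hypothetical names made up for illustration, and the block stands in for whatever request the plugin actually sends.

```ruby
# Hypothetical error class standing in for "this NameNode reported standby".
class StandbyError < StandardError; end

# Hypothetical helper: try each NameNode in the order given and move on to
# the next one whenever the current one reports that it is in standby.
def send_with_failover(namenodes)
  namenodes.each do |nn|
    begin
      return yield(nn)   # first NameNode that accepts the request wins
    rescue StandbyError
      next               # this one is standby, try the next in the list
    end
  end
  raise StandbyError, "no active NameNode among #{namenodes.join(', ')}"
end

# Usage: the first listed NameNode is standby, the second is active.
active = "name-1-node.hdfs.mesos:9002"
result = send_with_failover(["name-0-node.hdfs.mesos:9002", active]) do |nn|
  raise StandbyError, "#{nn} is standby" unless nn == active
  "written via #{nn}"
end
puts result
```

The point of the sketch is only the ordering: a standby response from the first NameNode should lead to a try against the second, without an operator forcing a failover.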

Actual results:

2018-06-12 19:28:48 +0000 [warn]: #0 [out_webhdfs] webhdfs check request failed. (namenode: name-0-node.hdfs.mesos:9002, error: {"RemoteException":{"exception":"StandbyException","javaClassName":"org.apache.hadoop.ipc.StandbyException","message":"Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error"}})

This is using td-agent 3.1.1 (fluentd 1.0.2) with the shipped fluent-plugin-webhdfs 1.2.2 plugin.

Forcing a NameNode failover caused logs to start flowing again. However, this required manual intervention, and I think the driver should do the correct thing in this state.

@wolf31o2
Author

Digging around, I ran into the failures_before_use_standby setting. It looks like is_standby_exception isn't detecting the exception correctly.

@repeatedly
Member

Here is a check:

e.is_a?(WebHDFS::IOError) && e.message.match(/org\.apache\.hadoop\.ipc\.StandbyException/)

So the problem is that the StandbyException is raised as something other than WebHDFS::IOError?

@wolf31o2
Author

Correct.

2018-07-20 18:44:27 +0000 [warn]: #0 [out_webhdfs] webhdfs check request failed. (namenode: name-1-node.hdfs.mesos:9002, error: {"RemoteException":{"exception":"StandbyException","javaClassName":"org.apache.hadoop.ipc.StandbyException","message":"Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error"}})
2018-07-20 18:44:27 +0000 [warn]: #0 [out_webhdfs] failed to flush the buffer. retry_time=9 next_retry_seconds=2018-07-20 18:44:27 +0000 chunk="57170302677042b73fcd566e929f9311" error_class=WebHDFS::ServerError error="{\"RemoteException\":{\"exception\":\"ArrayIndexOutOfBoundsException\",\"javaClassName\":\"java.lang.ArrayIndexOutOfBoundsException\",\"message\":null}}"
  2018-07-20 18:44:27 +0000 [warn]: #0 suppressed same stacktrace
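
For illustration, here is a small Ruby sketch of why a check that requires both the WebHDFS::IOError class and the StandbyException message would miss the error if it arrives as a different class. The WebHDFS classes below are stand-ins defined locally so the snippet runs without the webhdfs gem, and relaxed_standby? is a hypothetical variant, not a proposed patch. That the StandbyException reaches the plugin as WebHDFS::ServerError is an assumption here: the log above only shows that class for the flush failure.

```ruby
# Stand-in error classes so the sketch runs without the webhdfs gem.
module WebHDFS
  class Error < StandardError; end
  class IOError < Error; end
  class ServerError < Error; end
end

STANDBY_PATTERN = /org\.apache\.hadoop\.ipc\.StandbyException/

# Response body shaped like the one in the warning (message shortened).
STANDBY_BODY = '{"RemoteException":{"exception":"StandbyException",' \
               '"javaClassName":"org.apache.hadoop.ipc.StandbyException",' \
               '"message":"Operation category READ is not supported in state standby."}}'

# Same condition as the check quoted earlier: class AND message must match.
def strict_standby?(e)
  e.is_a?(WebHDFS::IOError) && !!e.message.match(STANDBY_PATTERN)
end

# Hypothetical relaxed variant: any WebHDFS error whose message names
# StandbyException counts as a standby response.
def relaxed_standby?(e)
  e.is_a?(WebHDFS::Error) && !!e.message.match(STANDBY_PATTERN)
end

as_io_error     = WebHDFS::IOError.new(STANDBY_BODY)
as_server_error = WebHDFS::ServerError.new(STANDBY_BODY)

puts strict_standby?(as_io_error)       # true
puts strict_standby?(as_server_error)   # false
puts relaxed_standby?(as_server_error)  # true
```

If the class really is the mismatch, the strict check returns false, and the standby NameNode would not be selected on that basis.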

@wolf31o2
Author

Versions:

fluent-plugin-webhdfs (1.2.3)
webhdfs (0.8.0)
