
Attempted to resurrect connection to dead ES instance, but got an error. #160

Open
xebia-progress opened this issue Jul 3, 2020 · 15 comments



xebia-progress commented Jul 3, 2020

Hi,
I have a problem connecting to the AWS ES service from Logstash installed on AWS EC2. I have two instances [logstash01, logstash02] with the same configuration:

  • logstash 6.8.10
  • logstash-output-amazon_es (6.4.2)
  • AWS ES version 6.2

The first instance works fine, but on the second one there are many warnings:

```
[2020-07-03T09:52:46,924][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>https://{my-aws-es-service}.{aws-region}.es.amazonaws.com:443/, :path=>"/"}
[2020-07-03T09:52:46,927][WARN ][logstash.outputs.elasticsearch] Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"https://{my-aws-es-service}.{aws-region}.es.amazonaws.com:443/", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::BadResponseCodeError, :error=>"Got response code '403' contacting Elasticsearch at URL 'https://{my-aws-es-service}.{aws-region}.es.amazonaws.com:443/'"}
```

Both EC2 instances have been assigned to the same IAM Role with full access to ES, they are both in the same VPC, subnet and their security groups have the same rules. I am able to curl ES service from both EC2s.

The configuration of amazon_es output plugin is:
```
output {
  amazon_es {
    hosts  => ["{my-aws-es-service}.{aws-region}.es.amazonaws.com"]
    region => "{aws-region}"
    index  => "logstash-%{[index]}-%{+YYYY.MM.dd}"
  }
}
```

I have had this issue for a couple of days and have not been able to resolve it. Any help would be appreciated.
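One way to narrow down a 403 like this (an editorial sketch, not from the thread; the endpoint and region below are placeholders) is to confirm which IAM identity each instance actually resolves, and to make a SigV4-signed request rather than a plain curl, which only proves network reachability:

```shell
# Which role credentials is this instance actually picking up?
# If the two hosts print different ARNs, that explains the asymmetric 403.
aws sts get-caller-identity

# awscurl (a third-party tool, `pip install awscurl`) signs the request with
# SigV4 using the instance credentials; an unsigned curl does not exercise IAM.
awscurl --service es --region eu-west-1 \
  https://my-domain.eu-west-1.es.amazonaws.com/
```

If the signed request also returns 403, check the domain's resource-based access policy as well as the role's identity policy.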


riemann89 commented Jul 22, 2020

Having the same issue. Have you solved it?

@xebia-progress
Author

I created a new EC2 instance and recreated the configuration on it, but I still don't know why the problem was occurring on the old EC2.


riemann89 commented Jul 22, 2020

Specs of the EC2 instance, please?

@xebia-progress
Author

  • Both EC2 instances (t3.medium - CentOS Linux 7 x86_64 HVM EBS 1708_11.01) - created with Terraform
  • Logstash configuration - created with Ansible


riemann89 commented Jul 22, 2020

Thanks, I am using an Amazon Linux one (medium). Still need to figure out the issue.

@AdrienBigot

Same issue for me. I suspect throttling on the Elasticsearch side, because this error appears near another error about the request size being exceeded.

```
[2020-09-03T10:18:30,104][ERROR][logstash.outputs.amazonelasticsearch][main][7fe0e0a34b3fb83c50f4196e10aea37404a5be5bdef7037eb295a4700045ac92] Encountered a retryable error. Will Retry with exponential backoff {:code=>413, :url=>"https://search-elk-nonprod-es-xxxxxxxxxxxxxxxxxxxxxxxx.eu-west-1.es.amazonaws.com:443/_bulk"}
[2020-09-03T10:18:30,430][WARN ][logstash.outputs.amazonelasticsearch][main][7fe0e0a34b3fb83c50f4196e10aea37404a5be5bdef7037eb295a4700045ac92] Marking url as dead. Last error: [LogStash::Outputs::AmazonElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [https://search-elk-nonprod-esxxxxxxxxxxxxxxxxxxxxxxxx.eu-west-1.es.amazonaws.com:443/][Manticore::ClientProtocolException] search-elk-nonprod-es-xxxxxxxxxxxxxxxxxxxxxxxx.eu-west-1.es.amazonaws.com:443 failed to respond {:url=>https://search-elk-nonprod-es-xxxxxxxxxxxxxxxxxxxxxxxx.eu-west-1.es.amazonaws.com:443/, :error_message=>"Elasticsearch Unreachable: [https://search-elk-nonprod-es-xxxxxxxxxxxxxxxxxxxxxxxx.eu-west-1.es.amazonaws.com:443/][Manticore::ClientProtocolException] search-elk-nonprod-es-xxxxxxxxxxxxxxxxxxxxxxxx.eu-west-1.es.amazonaws.com:443 failed to respond", :error_class=>"LogStash::Outputs::AmazonElasticSearch::HttpClient::Pool::HostUnreachableError"}
```

Even with max_bulk_bytes set to 100.000 I still get these errors.
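For reference, max_bulk_bytes is set inside the amazon_es output block. A minimal sketch (the endpoint, region, index, and byte value here are placeholders, not taken from this thread); AWS documents a 10 MiB maximum HTTP request payload on some of the smaller Elasticsearch instance types, so bulk requests must stay under whatever limit the domain enforces:

```
output {
  amazon_es {
    hosts  => ["search-my-domain.eu-west-1.es.amazonaws.com"]
    region => "eu-west-1"
    index  => "logstash-%{+YYYY.MM.dd}"
    # Flush a bulk request once its body reaches this many bytes,
    # keeping it under the domain's HTTP payload limit.
    max_bulk_bytes => 1000000
  }
}
```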

@NeckBeardPrince

@pgs-progress Did you ever figure this out?

@xebia-progress
Author

@NeckBeardPrince Unfortunately not. I recreated the EC2 instance, and on the new one the problem disappeared.

@NeckBeardPrince
> @NeckBeardPrince Unfortunately not. I recreated an EC2 instance and on the new one the problem disappeared.

*sigh* Not a lot of info in the error for me to troubleshoot with either. Thanks.


nicon89 commented Nov 13, 2020

Got the same issue. Any advice?


Rnxxx commented Nov 22, 2020

I got the same error and found out I had mismatched the Elasticsearch endpoint with the Kibana endpoint.
Setting the proper Elasticsearch endpoint solved the issue.

@TechieGenie

Anyone got the fix?


kriss332 commented Jun 27, 2021

I've also got the same problem.

```
[2021-06-27T15:35:30,864][WARN ][logstash.outputs.elasticsearch][main] Attempted to resurrect connection to dead ES instance, but got an error {:url=>"http://locahost:9200/", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :message=>"Elasticsearch Unreachable: [http://locahost:9200/][Manticore::ResolutionFailure] locahost"}
```

Whereas Elasticsearch is available:

```
curl http://localhost:9200
{
  "name" : "ELK",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "i90k-uQlQjyERkduOLy6Jw",
  "version" : {
    "number" : "7.13.2",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "4d960a0733be83dd2543ca018aa4ddc42e956800",
    "build_date" : "2021-06-10T21:01:55.251515791Z",
    "build_snapshot" : false,
    "lucene_version" : "8.8.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
```

Strangely, when I try to stop the Logstash service, I get this in logstash-plain.log:

```
[2021-06-27T15:38:21,684][WARN ][org.logstash.execution.ShutdownWatcherExt] {"inflight_count"=>0, "stalling_threads_info"=>{"other"=>[{"thread_id"=>32, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/stud-0.0.23/lib/stud/interval.rb:95:in `sleep'"}, {"thread_id"=>33, "name"=>"[main]>worker1", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/stud-0.0.23/lib/stud/interval.rb:95:in `sleep'"}, {"thread_id"=>34, "name"=>"[main]>worker2", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/stud-0.0.23/lib/stud/interval.rb:95:in `sleep'"}, {"thread_id"=>35, "name"=>"[main]>worker3", "current_call"=>"[...]/vendor/bundle/jruby/2.5.0/gems/stud-0.0.23/lib/stud/interval.rb:95:in `sleep'"}]}}
```

Then I have to kill the Logstash process. I am, however, able to upload a CSV using a custom config file with the command below:

```
/usr/share/logstash/bin/logstash -f paloAlto.config
```

Please let me know if more logs are required.

@kriss332

UPDATE: I had only been parsing local CSV files for triage until now, so this was the first time I was receiving a syslog file.
Uncommenting the line `http.port: 9200` in /etc/elasticsearch/elasticsearch.yml worked for me. No more logs about a dead Elasticsearch instance, and logs are flowing in.
Additionally, make sure that the line `network.host: localhost` is also uncommented.
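The relevant fragment of /etc/elasticsearch/elasticsearch.yml after uncommenting would look like this (these are the stock values from a default install; adjust the host binding for your environment):

```yaml
# Bind HTTP to loopback and serve on the default port, so a Logstash
# output pointed at http://localhost:9200 can reach the node.
network.host: localhost
http.port: 9200
```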

@tuurek

tuurek commented Sep 2, 2022

I got this issue as well; it was due to the user lacking privileges on the root path (`GET /`). After adding cluster-level monitor privileges the problem was gone. Not necessarily the same issue, but the error message was the same.
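With Elasticsearch's own security enabled (rather than AWS IAM), a role carrying the cluster-level `monitor` privilege can be created through the role API. This is an editorial sketch; the role name, index pattern, and credentials are placeholders, not taken from the thread:

```shell
# Create a role that can answer GET / (cluster monitor) and write to its
# own indices; assign it to the user Logstash authenticates as.
curl -u elastic -X PUT "http://localhost:9200/_security/role/logstash_writer" \
  -H 'Content-Type: application/json' -d '
{
  "cluster": ["monitor"],
  "indices": [
    { "names": ["logstash-*"], "privileges": ["write", "create_index"] }
  ]
}'
```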

9 participants