Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smithy-go's HTTP metrics are prone to data races #548

Open
alexandrupitis1 opened this issue Oct 27, 2024 · 4 comments · May be fixed by #555
Open

Smithy-go's HTTP metrics are prone to data races #548

alexandrupitis1 opened this issue Oct 27, 2024 · 4 comments · May be fixed by #555

Comments

@alexandrupitis1
Copy link

Smithy-go's HTTP metrics use httptrace.ClientTrace. To quote the docs:

ClientTrace is a set of hooks to run at various stages of an outgoing HTTP request. Any particular hook may be nil. Functions may be called concurrently from different goroutines and some may be called after the request has completed or failed.

However, current implementation is not safe to call concurrently, as there are write-write data races on various methods ( for example ConnectStart ). This in turn trips data race detection.

@abraithwaite
Copy link

Just ran into this myself:

==================
WARNING: DATA RACE
Write at 0x00c000578390 by goroutine 59:
  github.com/aws/smithy-go/transport/http.(*httpMetrics).ConnectStart()
      /Users/abraithwaite/Projects/golang/pkg/mod/github.com/aws/[email protected]/transport/http/metrics.go:122 +0x6c
  github.com/aws/smithy-go/transport/http.(*httpMetrics).ConnectStart-fm()
      <autogenerated>:1 +0x20
  net.(*sysDialer).dialSingle()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/dial.go:657 +0x108
  net.(*sysDialer).dialSerial()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/dial.go:635 +0x1bc
  net.(*sysDialer).dialParallel.func1()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/dial.go:555 +0xb4
  net.(*sysDialer).dialParallel.gowrap4()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/dial.go:581 +0x58

Previous write at 0x00c000578390 by goroutine 56:
  github.com/aws/smithy-go/transport/http.(*httpMetrics).ConnectStart()
      /Users/abraithwaite/Projects/golang/pkg/mod/github.com/aws/[email protected]/transport/http/metrics.go:122 +0x6c
  github.com/aws/smithy-go/transport/http.(*httpMetrics).ConnectStart-fm()
      <autogenerated>:1 +0x20
  net.(*sysDialer).dialSingle()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/dial.go:657 +0x108
  net.(*sysDialer).dialSerial()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/dial.go:635 +0x1bc
  net.(*sysDialer).dialParallel.func1()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/dial.go:555 +0xb4
  net.(*sysDialer).dialParallel.gowrap2()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/dial.go:570 +0x58

Goroutine 59 (running) created at:
  net.(*sysDialer).dialParallel()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/dial.go:581 +0x6dc
  net.(*Dialer).DialContext()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/dial.go:527 +0x878
  net.(*Dialer).DialContext-fm()
      <autogenerated>:1 +0x64
  github.com/aws/aws-sdk-go-v2/aws/transport/http.(*BuildableClient).GetTransport.defaultHTTPTransport.traceDialContext.func1()
      /Users/abraithwaite/Projects/golang/pkg/mod/github.com/aws/[email protected]/aws/transport/http/client.go:207 +0xb4
  net/http.(*Transport).dial()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/http/transport.go:1226 +0x90
  net/http.(*Transport).dialConn()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/http/transport.go:1728 +0xa3c
  net/http.(*Transport).dialConnFor()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/http/transport.go:1563 +0xb8
  net/http.(*Transport).startDialConnForLocked.func1()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/http/transport.go:1545 +0x38

Goroutine 56 (running) created at:
  net.(*sysDialer).dialParallel()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/dial.go:570 +0x310
  net.(*Dialer).DialContext()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/dial.go:527 +0x878
  net.(*Dialer).DialContext-fm()
      <autogenerated>:1 +0x64
  github.com/aws/aws-sdk-go-v2/aws/transport/http.(*BuildableClient).GetTransport.defaultHTTPTransport.traceDialContext.func1()
      /Users/abraithwaite/Projects/golang/pkg/mod/github.com/aws/[email protected]/aws/transport/http/client.go:207 +0xb4
  net/http.(*Transport).dial()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/http/transport.go:1226 +0x90
  net/http.(*Transport).dialConn()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/http/transport.go:1728 +0xa3c
  net/http.(*Transport).dialConnFor()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/http/transport.go:1563 +0xb8
  net/http.(*Transport).startDialConnForLocked.func1()
      /opt/homebrew/Cellar/go/1.23.3/libexec/src/net/http/transport.go:1545 +0x38
==================

@rjeczalik
Copy link

Just did a quick research, and found that this:

func (m *httpMetrics) ConnectStart(string, string) {
m.connectStart = now()
}

Can be called concurrently if transport is dual-stack:

https://github.com/golang/go/blob/6293a0683171e3a08e275884fddeaa2b57a861f6/src/internal/nettrace/nettrace.go#L36-L45

Due to this:

https://github.com/golang/go/blob/6293a0683171e3a08e275884fddeaa2b57a861f6/src/net/http/httptrace/trace.go#L43-L46

rjeczalik added a commit to rjeczalik/smithy-go that referenced this issue Dec 9, 2024
This PR fixes a race condition reported by aws#548,
by introducing a simple wrapper for time.Time.

While by default aws-sdk-go-v2 uses NopMeterProvider,
the transport/http internals are still instrumenting
the http requests (the withMetrics func).

Maybe there is a room for improvement, to not call
httptrace.WithClientTrace when the meter is nop,
however this is out of scope for this PR.
@rjeczalik rjeczalik linked a pull request Dec 9, 2024 that will close this issue
@rjeczalik
Copy link

#555

@mhr3
Copy link

mhr3 commented Jan 7, 2025

We're seeing similar issue, though a bit different caller:

WARNING: DATA RACE
Write at 0x00c0c55baee8 by goroutine 933403:
  github.com/aws/smithy-go/transport/http.(*httpMetrics).TLSHandshakeStart()
      github.com/aws/[email protected]/transport/http/metrics.go:126 +0x73
  github.com/aws/smithy-go/transport/http.(*httpMetrics).TLSHandshakeStart-fm()
      <autogenerated>:1 +0x1f
  net/http.(*persistConn).addTLS.func2()
      net/http/transport.go:1649 +0x92
<empty string>
Previous write at 0x00c0c55baee8 by goroutine 933381:
  github.com/aws/smithy-go/transport/http.(*httpMetrics).TLSHandshakeStart()
      github.com/aws/[email protected]/transport/http/metrics.go:126 +0x73
  github.com/aws/smithy-go/transport/http.(*httpMetrics).TLSHandshakeStart-fm()
      <autogenerated>:1 +0x1f
  net/http.(*persistConn).addTLS.func2()
      net/http/transport.go:1649 +0x92
<empty string>
Goroutine 933403 (running) created at:
  net/http.(*persistConn).addTLS()

as well as:

WARNING: DATA RACE
Write at 0x00c000a6ddb8 by goroutine 933323:
  github.com/aws/smithy-go/transport/http.(*httpMetrics).DNSStart()
      github.com/aws/[email protected]/transport/http/metrics.go:118 +0x70
  github.com/aws/smithy-go/transport/http.(*httpMetrics).DNSStart-fm()
      <autogenerated>:1 +0x1f
  net/http/httptrace.WithClientTrace.func1()
      net/http/httptrace/trace.go:49 +0x55
  net.(*Resolver).lookupIPAddr()
      net/lookup.go:314 +0x10a
  net.(*Resolver).internetAddrList()
      net/ipsock.go:289 +0x807
  net.(*Resolver).resolveAddrList()
      net/dial.go:302 +0x544
  net.(*Dialer).DialContext()
      net/dial.go:509 +0x726
  net.(*Dialer).DialContext-fm()
      <autogenerated>:1 +0x8f
  github.com/aws/aws-sdk-go-v2/aws/transport/http.(*BuildableClient).GetTransport.defaultHTTPTransport.traceDialContext.func1()
      github.com/aws/[email protected]/aws/transport/http/client.go:207 +0xfa
  net/http.(*Transport).dial()
      net/http/transport.go:1226 +0xef
  net/http.(*Transport).dialConn()
      net/http/transport.go:1728 +0xe04
  net/http.(*Transport).dialConnFor()
      net/http/transport.go:1563 +0x11d
  net/http.(*Transport).startDialConnForLocked.func1()
      net/http/transport.go:1545 +0x3c
<empty string>
Previous write at 0x00c000a6ddb8 by goroutine 933316:
  github.com/aws/smithy-go/transport/http.(*httpMetrics).DNSStart()
      github.com/aws/[email protected]/transport/http/metrics.go:118 +0x70
  github.com/aws/smithy-go/transport/http.(*httpMetrics).DNSStart-fm()
      <autogenerated>:1 +0x1f
  net/http/httptrace.WithClientTrace.func1()
      net/http/httptrace/trace.go:49 +0x55
  net.(*Resolver).lookupIPAddr()
      net/lookup.go:314 +0x10a
  net.(*Resolver).internetAddrList()
      net/ipsock.go:289 +0x807
  net.(*Resolver).resolveAddrList()
      net/dial.go:302 +0x544
  net.(*Dialer).DialContext()
      net/dial.go:509 +0x726
  net.(*Dialer).DialContext-fm()
      <autogenerated>:1 +0x8f
  github.com/aws/aws-sdk-go-v2/aws/transport/http.(*BuildableClient).GetTransport.defaultHTTPTransport.traceDialContext.func1()
      github.com/aws/[email protected]/aws/transport/http/client.go:207 +0xfa
  net/http.(*Transport).dial()
      net/http/transport.go:1226 +0xef
  net/http.(*Transport).dialConn()
      net/http/transport.go:1728 +0xe04
  net/http.(*Transport).dialConnFor()
      net/http/transport.go:1563 +0x11d
  net/http.(*Transport).startDialConnForLocked.func1()
      net/http/transport.go:1545 +0x3c

@lucix-aws lucix-aws linked a pull request Jan 8, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants