Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using xray.SQLContext with SetMaxOpenConns/SetConnMaxLifetime sometimes "failed to begin subsegment". #391

Open
vbqz opened this issue Jan 11, 2023 · 1 comment
Assignees

Comments

@vbqz
Copy link

vbqz commented Jan 11, 2023

Hi,

My team bumped into failed to begin subsegment panics when using SQLContext together with setting limits on connection numbers and durations.

With aws-xray-sdk-go v1.8.0 the problem still occurs, but logs instead of panics (which is great!). But I wanted to check if we are misusing the SDK in anyway, and give you a heads up if there is some kind of race condition here.

Issue

When using a database connection created like:

instance, err = xray.SQLContext("postgres", url)
...
instance.SetMaxOpenConns(10)
instance.SetConnMaxLifetime(30 * time.Second)

Executing queries with instrumented contexts occasionally caused panics like:

"LOG panic: failed to begin subsegment named 'xyz@localhost': segment cannot be found."

From what I can tell, there is a small chance that database/sql will try to open a fresh conn using context.Background(), instead of using the caller context, or re-using an existing connection.

Reprod

I managed to come up with this slightly contrived program (tons of connections and very low lifetimes) that reliably triggers the failure on my machine:

package main

import (
	"context"
	"database/sql"
	"log"
	"net"
	"time"

	"github.com/aws/aws-xray-sdk-go/xray"

	// "sqlite" database driver.
	_ "modernc.org/sqlite"
)

func initDB() (*sql.DB, error) {
	db, err := xray.SQLContext("sqlite", "tmp.sqlite")
	if err != nil {
		return nil, err
	}

	db.SetMaxOpenConns(10)
	db.SetConnMaxLifetime(time.Millisecond)

	return db, nil

}

func oneRequest(db *sql.DB) error {
	var err error

	ctx, seg := xray.BeginSegment(context.Background(), "main")
	defer seg.Close(err)

	_, err = db.ExecContext(ctx, "select sqlite_version();")
	if err != nil {
		return err
	}

	return nil
}

func xraySink() {
	pc, err := net.ListenPacket("udp", ":2000")
	if err != nil {
		log.Fatalf("listen: %s", err)
	}
	defer pc.Close()

	buf := make([]byte, 1024)
	for {
		pc.ReadFrom(buf)
	}
}

func main() {

	// Give X-ray something to log to.
	go xraySink()

	db, err := initDB()
	if err != nil {
		log.Fatalf("open DB: %s", err)
	}

	start := make(chan struct{})

	for i := 0; i < 10_000; i++ {
		go func(i int) {
			<-start
			err := oneRequest(db)
			if err != nil {
				log.Fatalf("oneRequest failed: %s", err)
			}
		}(i)
	}
	close(start)
	time.Sleep(time.Second)
}

The stacktrace at the failed subsegment is:

0  0x00000001029a3e90 in github.com/aws/aws-xray-sdk-go/xray.BeginSubsegment
   at /Users/viktor/go/pkg/mod/github.com/aws/[email protected]/xray/segment.go:283
1  0x000000010299db84 in github.com/aws/aws-xray-sdk-go/xray.Capture
   at /Users/viktor/go/pkg/mod/github.com/aws/[email protected]/xray/capture.go:19
2  0x00000001029ae85c in github.com/aws/aws-xray-sdk-go/xray.(*driverConnector).Connect
   at /Users/viktor/go/pkg/mod/github.com/aws/[email protected]/xray/sql_go110.go:59
3  0x0000000102267f58 in database/sql.(*DB).openNewConnection
   at /Users/viktor/local/go/src/database/sql/sql.go:1238
4  0x0000000102267ec0 in database/sql.(*DB).connectionOpener
   at /Users/viktor/local/go/src/database/sql/sql.go:1228
5  0x0000000102265a00 in database/sql.OpenDB.func1
   at /Users/viktor/local/go/src/database/sql/sql.go:792
6  0x00000001021a47e4 in runtime.goexit
   at /Users/viktor/local/go/src/runtime/asm_arm64.s:1172

So its the connectionOpener goroutine that gets a request for another connection, and tries to do it with context.Background().

@vbqz vbqz changed the title Using xray.SQLContext with SetMaxOpenConns/SetConnMaxLifetime can initialize connections without a segment. Using xray.SQLContext with SetMaxOpenConns/SetConnMaxLifetime sometimes "failed to begin subsegment". Jan 11, 2023
@wangzlei
Copy link
Contributor

_, err = db.ExecContext(ctx, "select sqlite_version();")

If db.ExecContext is an async method, xray sdk would create subsegment after the parent segment closed, then cause xray context missing panic.

Alternatively we recommend user move to opentelemetry, that SDK supports asynchronous better, has no context missing error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants