runtime: for/range with iterator results in program that never terminates #69434

vsivsi · 2024-09-12T23:40:11Z

Go version

go version go1.23.1 darwin/amd64

Output of `go env` in your module/workspace:

go env
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/Users/user/Library/Caches/go-build'
GOENV='/Users/user/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/user/go/pkg/mod'
GONOPROXY='github.com/vsivsi'
GONOSUMDB='github.com/vsivsi'
GOOS='darwin'
GOPATH='/Users/user/go'
GOPRIVATE='github.com/vsivsi'
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/Cellar/go/1.23.1/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/usr/local/Cellar/go/1.23.1/libexec/pkg/tool/darwin_amd64'
GOVCS=''
GOVERSION='go1.23.1'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/user/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/Users/user/wordcounter/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/h5/p_k0bq5d55g3r_v_sz5ghz5h0000gn/T/go-build693439929=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

Modified an existing "pull" style io.Reader loop to instead use a go 1.23 iterator, with no other changes to the following logic in the loop body.

So this:

for {
	n, err := wordReader.Read(buf)
	if err == io.EOF {
		break
	} else if err != nil {
		panic(err)
	}
	word := string(buf[:n])
		
	// rest of loop logic
}

Became this:

for word := range wordReader.All() {

 	// rest of loop logic (same as case above)

}

Where new wordReader.All() iterator implementation is based on wordReader.Read():

// All returns an iterator allowing the caller to iterate over the WordReader using for/range.
func (wr *WordReader) All() iter.Seq[string] {
	word := make([]byte, 1024)
	return func(yield func(string) bool) {
		var err error
		var n int
		for n, err = wr.Read(word); err == nil; n, err = wr.Read(word) {
			if !yield(string(word[:n])) {
				return
			}
		}
		if err != io.EOF {
			fmt.Fprintf(os.Stderr, "error reading word: %v\n", err)
		}
	}
}

A full reproduction including both the original "pull" and new "iterator" versions of the loop is available here:
https://go.dev/play/p/ENULYowKCLS

What did you see happen?

Executing with the new range/iterator loop does not terminate. In the go playground link above this manifests as the original "pull" style loop completing immediately (in EstimateUniqueWordsPull()) , and then the program times-out the playground while running EstimateUniqueWordsIter().

If run locally as a CLI, the program does not terminate.

Note! Frustratingly there are many cases that rescue this non-termination behavior. Running under a debug build or using the race detector works as expected.

As do even seemingly trivial changes to the code, e.g. in the Playground repro:

     // fmt.Println("Rounds", rounds) // ######### Uncomment line to rescue

Simply adding a single Println reverts to the expected behavior. This feels very racy to me, but there are no goroutine calls in the reproducing code, so this is a pretty clear indication that something bad is happening out of sight.

What did you expect to see?

I expect the two loops documented above as implemented in EstimateUniqueWordsPull() and EstimateUniqueWordsIter() to behave essentially identically, each running nearly instantaneously on a small test input, and the program should return a result for each and properly terminate.

Note, in the reproduction code in the playground, it would be normal for either of the above functions to return values resulting in either the "PASS" case or the "expected X, got Y" mismatch. This code implements a probabilistic counting estimation algorithm that uses randomness, as described here:
https://www.quantamagazine.org/computer-scientists-invent-an-efficient-new-way-to-count-20240516/

The text was updated successfully, but these errors were encountered:

gabyhelp · 2024-09-12T23:40:30Z

Related Issues and Documentation

_{(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)}

adonovan · 2024-09-13T00:55:58Z

When I run this program it spends a lot of time in refreshCounter trying to count to three hundred billion:

		for i := 0; i < pr.n-1; i++ {
			pr.bitmask &= rand.Uint64()
		}

I don't think the problem is in the compiler or runtime.

vsivsi · 2024-09-13T02:37:00Z

@adonovan Thanks for taking a look. Good sleuthing. I agree with the symptom that you uncovered, but continue to suspect something fishy is going on here. I sincerely wish I had a simpler reproduction of this, but well, I assume that this new iterator feature is not blatantly broken... And so a simpler reproduction may not be in the cards. Anyhow, based on your clue, here is some possible firmer evidence that something is amiss:

I've added a check method to the struct holding the value of pr.n used in the loop you highlighted above. Because this issue seemingly can't be reproduced in the debugger, we'll resort to debugging with panics.

// check panics if pr.n is not the expected value
func (pr *ProbabilisticSkipper) check(n int) {
	if pr.n != n {
		panic(fmt.Sprintf("check: pr.n != n  %d != %d", pr.n, n))
	}
}

If I place a check call at the top and bottom of the range loop with the iterator, it appears that the value of pr.n is getting clobbered between the end of one iteration of the loop and the start of another, because the panic always appears to happen at the top.

So, the loop now looks like this:

wordSkipper.check(rounds)     // No panic

for word := range wordReader.All() {
	wordSkipper.check(rounds)       //  Panic always occurs here

	// ... all other loop logic...

	wordSkipper.check(rounds)      //  Never panics here
}

So the mystery for me is, why is this struct member value seemingly "randomly" changing between iterations of the "iterator" loop (but not the "pull" loop)? And why not for debug builds?

My iterator function doesn't have anything to do with this wordSkipper value, it literally makes no reference to that value or any value of its type in any way, so I don't understand how wordSkipper is getting modified by any code I can see.

Here's the new reproduction code with the above changes: https://go.dev/play/p/iEMVpdhy3go

Thanks again for taking a look. I've been programming for along time, so I've been humbled enough times that I'm certainly not beyond considering that there's some subtle issue with my code. But I have spent a bunch of time looking at this now, and boy is it not obvious what's going on here. Like I said above, this feels super racy no me, but I'm not doing anything overtly concurrent...

randall77 · 2024-09-13T03:48:41Z

This does look like a compiler bug to me. The variable wordSkipper should live in the parent frame and be captured by address in the loop body closure so the body can update it appropriately. But looking at the assembly the function body allocates its own copy of wordSkipper on the stack. It isn't at the same location as the outer one (and isn't copied back when done).

@drchase

xieyuschen · 2024-09-13T06:17:18Z

It's likely a compiler issue, but it's not the issue of range-over-func blocks forever, instead, it's caused by the generated code doesn't handle the parameter properly, which triggers a long-time running for-loop inside users code. I tried to clear the code in this go playground. Probably you can focus on this code snippet:

	if len(words) >= memorySize {
		rounds++ // generated store ssa instruction doesn't take effect, rounds value is wrong as 309237645415
		// fmt.Println(rounds) // generated load ssa instruction take effect
		wordSkipper = NewProbabilisticSkipper(rounds)
		for w := range words {
			if roundRemover.ShouldSkip() {
				delete(words, w)
			}
		}
	}

case1: use assignment of rounds rounds++ and comment the fmt.Println

	rounds++ // generated store ssa instruction doesn't take effect, ...
	// fmt.Println(rounds) // generated load ssa instruction take effect

case2: comment assignment of rounds and use fmt.Println

	// rounds++ // generated store ssa instruction doesn't take effect, ...
	fmt.Println(rounds) // generated load ssa instruction take effect

more cases:

rand.Intn(rounds) // 3. cannot work, used to check whether std lib matters
load(rounds) // 4. cannot work, when it's custom function
readInterface(rounds) // 5. work, which receives an interface 

func readInterface(r interface{}) {
	fmt.Println(r)
}
func load(r int) {
	fmt.Println(r)
	_ = r
}

The output is shown below when I use gotip run main.go on my local. Amazingly, the output of case1 is the same at the remote go playground. So I wonder is there something special for value 309237645415, do you have some ideas about it? @randall77

// output for case1
do bit operation 0 times when refresh counter in refreshCounter
do bit operation 1 times when refresh counter in refreshCounter
do bit operation 309237645415 times when refresh counter in refreshCounter

// output for case2
do bit operation 0 times when refresh counter in refreshCounter
0
Iter: expected 6 unique words, got 5

Looks like the compiler hasn't passed the allocation to the function call properly. Besides, currently if there is a load instruction and the parameter is passed as an interface before function call it works well. But the store instruction doesn't take effects. Not sure the fix is to respect the store instruction here as well.

adonovan · 2024-09-13T14:27:36Z

(Apologies for my dismissive response. If Keith says it's a compiler bug, it's a compiler bug.)

gopherbot · 2024-09-18T15:44:43Z

Change https://go.dev/cl/614096 mentions this issue: cmd/compile: fix wrong esacpe analysis for rangefunc

cuonglm · 2024-09-18T16:25:57Z

This issue does not need a backport, #69511 will do the job.

gopherbot · 2024-09-18T16:28:00Z

Change https://go.dev/cl/614195 mentions this issue: [release-branch.go1.23] cmd/compile: fix wrong esacpe analysis for rangefunc

gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Sep 12, 2024

randall77 added the release-blocker label Sep 13, 2024

dmitshur added this to the Go1.24 milestone Sep 13, 2024

timothy-king added the NeedsFix The path to resolution is known, but the work has not been done. label Sep 13, 2024

randall77 mentioned this issue Sep 17, 2024

runtime: mysterious crashes and non-determinism with range over func #69507

Closed

cuonglm self-assigned this Sep 18, 2024

cherrymui added the NextMinor A critical problem that affects the availability or correctness of production systems built using Go label Sep 18, 2024

gopherbot closed this as completed in db40d1a Sep 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: for/range with iterator results in program that never terminates #69434

runtime: for/range with iterator results in program that never terminates #69434

vsivsi commented Sep 12, 2024

gabyhelp commented Sep 12, 2024

adonovan commented Sep 13, 2024

vsivsi commented Sep 13, 2024

randall77 commented Sep 13, 2024

xieyuschen commented Sep 13, 2024 •

edited

Loading

adonovan commented Sep 13, 2024

gopherbot commented Sep 18, 2024

cuonglm commented Sep 18, 2024

gopherbot commented Sep 18, 2024

runtime: for/range with iterator results in program that never terminates #69434

runtime: for/range with iterator results in program that never terminates #69434

Comments

vsivsi commented Sep 12, 2024

Go version

Output of go env in your module/workspace:

What did you do?

What did you see happen?

What did you expect to see?

gabyhelp commented Sep 12, 2024

adonovan commented Sep 13, 2024

vsivsi commented Sep 13, 2024

randall77 commented Sep 13, 2024

xieyuschen commented Sep 13, 2024 • edited Loading

adonovan commented Sep 13, 2024

gopherbot commented Sep 18, 2024

cuonglm commented Sep 18, 2024

gopherbot commented Sep 18, 2024

Output of `go env` in your module/workspace:

xieyuschen commented Sep 13, 2024 •

edited

Loading