Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data race causes flaky test in "clone" #276

Closed
matiasinsaurralde opened this issue Oct 15, 2022 · 8 comments · Fixed by #393
Closed

Data race causes flaky test in "clone" #276

matiasinsaurralde opened this issue Oct 15, 2022 · 8 comments · Fixed by #393
Assignees
Labels
bug Something isn't working high priority High priority - something is broken or missing that is critical for users or developers. intermediate Will take some time to fix
Milestone

Comments

@matiasinsaurralde
Copy link
Contributor

matiasinsaurralde commented Oct 15, 2022

Describe the bug

When running clone tests in a repeated way, TestSignalKilledGoldenGate randomly fails. I believe this might be caused by data races in the clone package, as reported by go test -race ./clone:

% go test -race ./clone
==================
WARNING: DATA RACE
Write at 0x00c0000ba0f0 by goroutine 45:
  github.com/TimothyStiles/poly/clone.recurseLigate()
      /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:272 +0x3c0
  github.com/TimothyStiles/poly/clone.recurseLigate.func2()
      /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0xd8

Previous write at 0x00c0000ba0f0 by goroutine 40:
  github.com/TimothyStiles/poly/clone.recurseLigate()
      /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:272 +0x3c0
  github.com/TimothyStiles/poly/clone.recurseLigate.func2()
      /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0xd8

Goroutine 45 (running) created at:
  github.com/TimothyStiles/poly/clone.recurseLigate()
      /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0x5f4
  github.com/TimothyStiles/poly/clone.recurseLigate.func2()
      /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0xd8

Goroutine 40 (finished) created at:
  github.com/TimothyStiles/poly/clone.recurseLigate()
      /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0x5f4
  github.com/TimothyStiles/poly/clone.recurseLigate.func2()
      /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0xd8
==================
==================
WARNING: DATA RACE
Read at 0x00c000268218 by goroutine 68:
  runtime.growslice()
      /usr/local/go/src/runtime/slice.go:178 +0x0
  github.com/TimothyStiles/poly/clone.recurseLigate()
      /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:272 +0x39c
  github.com/TimothyStiles/poly/clone.recurseLigate.func2()
      /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0xd8

Previous write at 0x00c000268218 by goroutine 34:
  github.com/TimothyStiles/poly/clone.recurseLigate()
      /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:272 +0x3c0
  github.com/TimothyStiles/poly/clone.recurseLigate.func2()
      /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0xd8

Goroutine 68 (running) created at:
  github.com/TimothyStiles/poly/clone.recurseLigate()
      /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0x5f4
  github.com/TimothyStiles/poly/clone.recurseLigate.func2()
      /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0xd8

Goroutine 34 (finished) created at:
  github.com/TimothyStiles/poly/clone.recurseLigate()
      /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0x5f4
  github.com/TimothyStiles/poly/clone.recurseLigate.func2()
      /Users/matias/go/src/github.com/TimothyStiles/poly/clone/clone.go:273 +0xd8
==================
--- FAIL: TestSignalKilledGoldenGate (0.02s)
    clone_test.go:169: Should be only 4 looping sequences. Got: 7
    testing.go:1319: race detected during execution of test
FAIL
FAIL	github.com/TimothyStiles/poly/clone	1.317s
FAIL

To Reproduce
Steps to reproduce the behavior:

  1. To run tests in a repeated way: go test -count 10 -v ./clone/
  2. Then run the race detector: go test -race ./clone on the main directory.

Expected behavior
Data race should be fixed.

@Koeng101
Copy link
Contributor

I was thinking about just making it iterative, but then wouldn't almost by definition, if there is a data race that occasionally happens, it be because of the ordering of the sequences?

@Koeng101
Copy link
Contributor

This'll be in the ligation function btw, almost 100% sure.

@micrictor
Copy link

Is the expectation for a single clone site in the test based on independent analysis of the fragments, or was it the value that passed? I ask because a naive approach to avoiding the race (creating a new slice for every recursion, then writing in the values) consistently fails as it returns 2 clones.

diff --git a/clone/clone.go b/clone/clone.go
index 4e25971..0d103a2 100644
--- a/clone/clone.go
+++ b/clone/clone.go
@@ -265,8 +265,10 @@ func recurseLigate(wg *sync.WaitGroup, constructs chan string, infiniteLoopingCo
                                }
                                wg.Add(1)
                                // If everything is clear, append fragment to usedFragments and recurse.
-                               usedFragments = append(usedFragments, newFragment)
-                               go recurseLigate(wg, constructs, infiniteLoopingConstructs, newSeed, fragmentList, usedFragments)
+                               usedFragmentsCopy := make([]Fragment, len(usedFragments)+1)
+                               copy(usedFragmentsCopy, usedFragments)
+                               usedFragmentsCopy = append(usedFragmentsCopy, newFragment)
+                               go recurseLigate(wg, constructs, infiniteLoopingConstructs, newSeed, fragmentList, usedFragmentsCopy)
                        }
                }
        }

@TimothyStiles
Copy link
Collaborator

TimothyStiles commented Nov 14, 2022

^@Koeng101 do you know the answer to this question from @micrictor?

@Koeng101
Copy link
Contributor

I'm not sure I entirely understand the question - but if I'm getting it right: I passed in values for the tests that I know the output of. Is the naive version on a branch or something? I'd love to see it.

I'm also thinking this is something that needs to be fixed soon, since I'm using Ligate a lot more for my work, and the flaky-ness is causing me some problems. I'm thinking I might have to go for a full rewrite with this one...

@carreter carreter added bug Something isn't working high priority High priority - something is broken or missing that is critical for users or developers. intermediate Will take some time to fix labels Sep 16, 2023
@micrictor
Copy link

My question is whether the test results are known to be accurate chemistry, namely whether it's impossible that there's two valid clones of the input fragments.

I have the diff for the change above.

@Koeng101
Copy link
Contributor

My question is whether the test results are known to be accurate chemistry, namely whether it's impossible that there's two valid clones of the input fragments.

I have the diff for the change above.

It is possible to have many, many valid clones based on input fragments. I think I might want to change this, because it can be a little confusing and combinatorially explodes very quickly.

In particular you might have a cloning reaction where you are switching out different protein sequences (let's say, to change the color of GFP). You'd clone the same promoter+rbs+terminator, but might have a lot of variations of the CDS.

@carreter carreter added this to the v1.0 milestone Sep 23, 2023
@TwFlem
Copy link
Contributor

TwFlem commented Nov 4, 2023

@Koeng101 if I had to guess, there might be some missing or additional infinite constructs coming back depending on what happened during the race since those slices are share some of the same underlying data.

With that being said, it might be advantageous to remove the concurrency from clone to solve the issue. I opened up a draft with some measurements here: #393. Removing the concurrency makes GoldenGate consistent and maybe a good step towards the clone refactor and improving performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working high priority High priority - something is broken or missing that is critical for users or developers. intermediate Will take some time to fix
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants