diff --git a/CHANGELOG.md b/CHANGELOG.md index 78bfc0a..502c4e3 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -24,6 +24,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - `fastq` now handles optionals correctly [(#323)](https://github.com/TimothyStiles/poly/issues/323) - Adds functional test and fix for [(#313)](https://github.com/TimothyStiles/poly/issues/313). - In addition to expanding the set of genbank files which can be validly parsed, the parser is more vocal when it encounters unusual syntax in the "feature" section. This "fail fast" approach is better as there were cases where inputs triggered a codepath which would neither return a valid Genbank object nor an error, and should help with debugging. +- Fixed bug that produced wrong overhang in linear, non-directional, single cut reactions. #408 ## [0.26.0] - 2023-07-22 Oops, we weren't keeping a changelog before this tag! diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md index 904fe22..e866e01 100644 --- a/CODE_OF_CONDUCT.md +++ b/CODE_OF_CONDUCT.md @@ -61,7 +61,7 @@ representative at an online or offline event. Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at -[INSERT CONTACT METHOD]. +poly.maintainers@bebop.bio All complaints will be reviewed and investigated promptly and fairly. All community leaders are obligated to respect the privacy and security of the diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index b29dc2a..f48cdaa 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -60,8 +60,6 @@ As one final guideline please be welcoming to newcomers and encourage new contri Unsure where to begin contributing to Poly? You can start by looking through these beginner and help-wanted issues: -[Beginner issues](https://github.com/TimothyStiles/poly/issues?q=is%3Aissue+is%3Aopen+label%3A%22beginner%22+) - issues which should only require a few lines of code, and a test or two. 
- [Good first issues](https://github.com/TimothyStiles/poly/contribute) - issues which are good for first time contributors. [Help wanted issues](https://github.com/TimothyStiles/poly/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22+) - issues which should be a bit more involved than beginner issues. @@ -109,14 +107,14 @@ Additionally, you may want to [install](https://golangci-lint.run/usage/install/ ### Security disclosures -If you find a security vulnerability, do NOT open an issue. I've yet to set up a security email for this so please in the interim DM me on twitter for my email [@timothystiles](https://twitter.com/TimothyStiles). +If you find a security vulnerability, do NOT open an issue. Instead, email poly-collaborators@googlegroups.com with a description of the vulnerability and we will get in contact with you ASAP. In order to determine whether you are dealing with a security issue, ask yourself these two questions: * Can I access something that's not mine, or something I shouldn't have access to? * Can I disable something for other people? -If the answer to either of those two questions are "yes", then you're probably dealing with a security issue. Note that even if you answer "no" to both questions, you may still be dealing with a security issue, so if you're unsure, just DM me [@timothystiles](https://twitter.com/TimothyStiles) for my personal email until I can set up a security related email. +If the answer to either of those two questions is "yes", then you're probably dealing with a security issue. Note that even if you answer "no" to both questions, you may still be dealing with a security issue, so if you're unsure, shoot an email to poly.maintainers@bebop.bio. 
### Non-security related bugs diff --git a/README.md b/README.md index b704eea..688a734 100644 --- a/README.md +++ b/README.md @@ -1,11 +1,10 @@ # (Poly)merase -[![PkgGoDev](https://pkg.go.dev/badge/github.com/TimothyStiles/poly)](https://pkg.go.dev/github.com/TimothyStiles/poly) -[![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/TimothyStiles/poly/blob/main/LICENSE) -![Tests](https://github.com/TimothyStiles/poly/workflows/Test/badge.svg) -![Test Coverage](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/TimothyStiles/e58f265655ac0acacdd1a38376ccd32a/raw/coverage.json) +[![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/koeng101/poly/blob/main/LICENSE) +![Tests](https://github.com/koeng101/poly/workflows/Test/badge.svg) +![Test Coverage](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/koeng101/e58f265655ac0acacdd1a38376ccd32a/raw/coverage.json) -Poly is a Go package for engineering organisms. +Poly is a Go package for engineering organisms. This is a fork of the main poly project incorporating more features and bug fixes. * **Fast:** Poly is fast and scalable. @@ -13,25 +12,19 @@ Poly is a Go package for engineering organisms. * **Reproducible:** Poly is well tested and designed to be used in industrial, academic, and hobbyist settings. No more copy and pasting strings into random websites to process the data you need. -* **Ambitious:** Poly's goal is to be the most complete, open, and well used collection of computational synthetic biology tools ever assembled. If you like our dream and want to support us please star this repo, request a feature, open a pull request, or [sponsor the project](https://github.com/sponsors/TimothyStiles). +* **Ambitious:** Poly's goal is to be the most complete, open, and well used collection of computational synthetic biology tools ever assembled. 
If you like our dream and want to support us, please star this repo, request a feature, or open a pull request. ## Install -`go get github.com/TimothyStiles/poly@latest` +`go get github.com/koeng101/poly@latest` ## Documentation -* **[Library](https://pkg.go.dev/github.com/TimothyStiles/poly#pkg-examples)** +* **[Library](https://pkg.go.dev/github.com/koeng101/poly#pkg-examples)** -* **[Tutorials](https://github.com/TimothyStiles/poly/tree/main/tutorials): ([live](https://gitpod.io/#tutorial=true/https://github.com/TimothyStiles/poly) | [github](https://github.com/TimothyStiles/poly/tree/main/tutorials))** - -* **[Learning Synbio](https://github.com/TimothyStiles/how-to-synbio)** - -## Community - -* **[Discord](https://discord.gg/Hc8Ncwt):** Chat about Poly and join us for game nights on our discord server! +* **[Tutorials](https://github.com/koeng101/poly/tree/main/tutorials)** ## Contributing @@ -39,12 +32,8 @@ Poly is a Go package for engineering organisms. * **[Contributor's guide](CONTRIBUTING.md):** Please read through it before you start hacking away and pushing contributions to this fine codebase. 
-## Sponsor - -* **[Sponsor](https://github.com/sponsors/TimothyStiles):** 🤘 Thanks for your support 🤘 - ## License * [MIT](LICENSE) -* Copyright (c) 2023 Timothy Stiles +* Copyright (c) 2023 Keoni Gandall, Timothy Stiles diff --git a/clone/clone.go b/clone/clone.go index e2cbf41..e7f1c3e 100644 --- a/clone/clone.go +++ b/clone/clone.go @@ -46,7 +46,6 @@ import ( "regexp" "sort" "strings" - "sync" "github.com/TimothyStiles/poly/checks" "github.com/TimothyStiles/poly/seqhash" @@ -83,15 +82,27 @@ type Enzyme struct { RegexpFor *regexp.Regexp RegexpRev *regexp.Regexp Skip int - OverhangLen int + OverhangLength int RecognitionSite string } -// Eventually, we want to get the data for this map from ftp://ftp.neb.com/pub/rebase -var enzymeMap = map[string]Enzyme{ - "BsaI": {"BsaI", regexp.MustCompile("GGTCTC"), regexp.MustCompile("GAGACC"), 1, 4, "GGTCTC"}, - "BbsI": {"BbsI", regexp.MustCompile("GAAGAC"), regexp.MustCompile("GTCTTC"), 2, 4, "GAAGAC"}, - "BtgZI": {"BtgZI", regexp.MustCompile("GCGATG"), regexp.MustCompile("CATCGC"), 10, 4, "GCGATG"}, +// EnzymeManager manager for Enzymes. Allows for management of enzymes throughout the lifecycle of your +// program. EnzymeManager is not safe for concurrent use. +type EnzymeManager struct { + // enzymeMap Map of enzymes that exist for the lifetime of the manager. Not safe for concurrent use. + enzymeMap map[string]Enzyme +} + +// NewEnzymeManager creates a new EnzymeManager given some enzymes. +func NewEnzymeManager(enzymes []Enzyme) EnzymeManager { + enzymeMap := make(map[string]Enzyme) + for enzymeIndex := range enzymes { + enzymeMap[enzymes[enzymeIndex].Name] = enzymes[enzymeIndex] + } + + return EnzymeManager{ + enzymeMap: enzymeMap, + } } /****************************************************************************** @@ -100,30 +111,37 @@ Base cloning functions begin here. 
******************************************************************************/ -func getBaseRestrictionEnzymes() map[string]Enzyme { - return enzymeMap -} - // CutWithEnzymeByName cuts a given sequence with an enzyme represented by the // enzyme's name. It is a convenience wrapper around CutWithEnzyme that // allows us to specify the enzyme by name. -func CutWithEnzymeByName(seq Part, directional bool, enzymeStr string) ([]Fragment, error) { - enzymeMap := getBaseRestrictionEnzymes() - if _, ok := enzymeMap[enzymeStr]; !ok { - return []Fragment{}, errors.New("Enzyme " + enzymeStr + " not found in enzymeMap") +func (enzymeManager EnzymeManager) CutWithEnzymeByName(part Part, directional bool, name string) ([]Fragment, error) { + // Get the enzyme from the enzyme map + enzyme, err := enzymeManager.GetEnzymeByName(name) + if err != nil { + // Return the error if the enzyme was not found + return []Fragment{}, err + } + // Cut the sequence with the enzyme + return CutWithEnzyme(part, directional, enzyme), nil +} + +// GetEnzymeByName gets the enzyme by its name. If the enzyme manager does not +// contain an enzyme with the provided name, an error will be returned. +func (enzymeManager EnzymeManager) GetEnzymeByName(name string) (Enzyme, error) { + if enzyme, ok := enzymeManager.enzymeMap[name]; ok { + return enzyme, nil } - enzyme := enzymeMap[enzymeStr] - return CutWithEnzyme(seq, directional, enzyme), nil + return Enzyme{}, errors.New("Enzyme " + name + " not found") } // CutWithEnzyme cuts a given sequence with an enzyme represented by an Enzyme struct. 
-func CutWithEnzyme(seq Part, directional bool, enzyme Enzyme) []Fragment { - var fragmentSeqs []string +func CutWithEnzyme(part Part, directional bool, enzyme Enzyme) []Fragment { + var fragmentSequences []string var sequence string - if seq.Circular { - sequence = strings.ToUpper(seq.Sequence + seq.Sequence) + if part.Circular { + sequence = strings.ToUpper(part.Sequence + part.Sequence) } else { - sequence = strings.ToUpper(seq.Sequence) + sequence = strings.ToUpper(part.Sequence) } // Check for palindromes @@ -135,20 +153,20 @@ func CutWithEnzyme(seq Part, directional bool, enzyme Enzyme) []Fragment { var reverseOverhangs []Overhang forwardCuts := enzyme.RegexpFor.FindAllStringIndex(sequence, -1) for _, forwardCut := range forwardCuts { - forwardOverhangs = append(forwardOverhangs, Overhang{Length: enzyme.OverhangLen, Position: forwardCut[1] + enzyme.Skip, Forward: true, RecognitionSitePlusSkipLength: len(enzyme.RecognitionSite) + enzyme.Skip}) + forwardOverhangs = append(forwardOverhangs, Overhang{Length: enzyme.OverhangLength, Position: forwardCut[1] + enzyme.Skip, Forward: true, RecognitionSitePlusSkipLength: len(enzyme.RecognitionSite) + enzyme.Skip}) } // Palindromic enzymes won't need reverseCuts if !palindromic { reverseCuts := enzyme.RegexpRev.FindAllStringIndex(sequence, -1) for _, reverseCut := range reverseCuts { - reverseOverhangs = append(reverseOverhangs, Overhang{Length: enzyme.OverhangLen, Position: reverseCut[0] - enzyme.Skip, Forward: false, RecognitionSitePlusSkipLength: len(enzyme.RecognitionSite) + enzyme.Skip}) + reverseOverhangs = append(reverseOverhangs, Overhang{Length: enzyme.OverhangLength, Position: reverseCut[0] - enzyme.Skip, Forward: false, RecognitionSitePlusSkipLength: len(enzyme.RecognitionSite) + enzyme.Skip}) } } - // If, on a linear sequence, the last overhang's position + EnzymeSkip + EnzymeOverhangLen is over the length of the sequence, remove that overhang. 
+ // If, on a linear sequence, the last overhang's position + EnzymeSkip + EnzymeOverhangLength is over the length of the sequence, remove that overhang. for _, overhangSet := range [][]Overhang{forwardOverhangs, reverseOverhangs} { if len(overhangSet) > 0 { - if !seq.Circular && (overhangSet[len(overhangSet)-1].Position+enzyme.Skip+enzyme.OverhangLen > len(sequence)) { + if !part.Circular && (overhangSet[len(overhangSet)-1].Position+enzyme.Skip+enzyme.OverhangLength > len(sequence)) { overhangSet = overhangSet[:len(overhangSet)-1] } } @@ -166,26 +184,40 @@ func CutWithEnzyme(seq Part, directional bool, enzyme Enzyme) []Fragment { var nextOverhang Overhang // Linear fragments with 1 cut that are no directional will always give a // 2 fragments - if len(overhangs) == 1 && !directional && !seq.Circular { // Check the case of a single cut + if len(overhangs) == 1 && !directional && !part.Circular { // Check the case of a single cut // In the case of a single cut in a linear sequence, we get two fragments with only 1 stick end - fragmentSeq1 := sequence[overhangs[0].Position+overhangs[0].Length:] - fragmentSeq2 := sequence[:overhangs[0].Position] - overhangSeq := sequence[overhangs[0].Position : overhangs[0].Position+overhangs[0].Length] - fragments = append(fragments, Fragment{fragmentSeq1, overhangSeq, ""}) - fragments = append(fragments, Fragment{fragmentSeq2, "", overhangSeq}) + + var fragmentSequence1 string + var fragmentSequence2 string + var overhangSequence string + + if len(forwardOverhangs) > 0 { + fragmentSequence1 = sequence[overhangs[0].Position+overhangs[0].Length:] + fragmentSequence2 = sequence[:overhangs[0].Position] + overhangSequence = sequence[overhangs[0].Position : overhangs[0].Position+overhangs[0].Length] + fragments = append(fragments, Fragment{fragmentSequence1, overhangSequence, ""}) + fragments = append(fragments, Fragment{fragmentSequence2, "", overhangSequence}) + } else { + fragmentSequence1 = sequence[overhangs[0].Position:] + 
fragmentSequence2 = sequence[:overhangs[0].Position-overhangs[0].Length] + overhangSequence = sequence[overhangs[0].Position-overhangs[0].Length : overhangs[0].Position] + fragments = append(fragments, Fragment{fragmentSequence2, "", overhangSequence}) + fragments = append(fragments, Fragment{fragmentSequence1, overhangSequence, ""}) + } + return fragments } // Circular fragments with 1 cut will always have 2 overhangs (because of the // concat earlier). If we don't require directionality, this will always get // cut into a single fragment - if len(overhangs) == 2 && !directional && seq.Circular { + if len(overhangs) == 2 && !directional && part.Circular { // In the case of a single cut in a circular sequence, we get one fragment out with sticky overhangs - fragmentSeq1 := sequence[overhangs[0].Position+overhangs[0].Length : len(seq.Sequence)] - fragmentSeq2 := sequence[:overhangs[0].Position] - fragmentSeq := fragmentSeq1 + fragmentSeq2 - overhangSeq := sequence[overhangs[0].Position : overhangs[0].Position+overhangs[0].Length] - fragments = append(fragments, Fragment{fragmentSeq, overhangSeq, overhangSeq}) + fragmentSequence1 := sequence[overhangs[0].Position+overhangs[0].Length : len(part.Sequence)] + fragmentSequence2 := sequence[:overhangs[0].Position] + fragmentSequence := fragmentSequence1 + fragmentSequence2 + overhangSequence := sequence[overhangs[0].Position : overhangs[0].Position+overhangs[0].Length] + fragments = append(fragments, Fragment{fragmentSequence, overhangSequence, overhangSequence}) return fragments } @@ -205,28 +237,28 @@ func CutWithEnzyme(seq Part, directional bool, enzyme Enzyme) []Fragment { // the basis of GoldenGate assembly. 
if directional && !palindromic { if currentOverhang.Forward && !nextOverhang.Forward { - fragmentSeqs = append(fragmentSeqs, sequence[currentOverhang.Position:nextOverhang.Position]) + fragmentSequences = append(fragmentSequences, sequence[currentOverhang.Position:nextOverhang.Position]) } // We have to subtract RecognitionSitePlusSkipLength in case we have a recognition site on // one side of the origin of a circular sequence and the cut site on the other side of the origin - if nextOverhang.Position-nextOverhang.RecognitionSitePlusSkipLength > len(seq.Sequence) { + if nextOverhang.Position-nextOverhang.RecognitionSitePlusSkipLength > len(part.Sequence) { break } } else { - fragmentSeqs = append(fragmentSeqs, sequence[currentOverhang.Position:nextOverhang.Position]) - if nextOverhang.Position-nextOverhang.RecognitionSitePlusSkipLength > len(seq.Sequence) { + fragmentSequences = append(fragmentSequences, sequence[currentOverhang.Position:nextOverhang.Position]) + if nextOverhang.Position-nextOverhang.RecognitionSitePlusSkipLength > len(part.Sequence) { break } } } // Convert fragment sequences into fragments - for _, fragment := range fragmentSeqs { + for _, fragmentsequence := range fragmentSequences { // Minimum lengths (given oligos) for assembly is 8 base pairs // https://doi.org/10.1186/1756-0500-3-291 - if len(fragment) > 8 { - fragmentSequence := fragment[enzyme.OverhangLen : len(fragment)-enzyme.OverhangLen] - forwardOverhang := fragment[:enzyme.OverhangLen] - reverseOverhang := fragment[len(fragment)-enzyme.OverhangLen:] + if len(fragmentsequence) > 8 { + fragmentSequence := fragmentsequence[enzyme.OverhangLength : len(fragmentsequence)-enzyme.OverhangLength] + forwardOverhang := fragmentsequence[:enzyme.OverhangLength] + reverseOverhang := fragmentsequence[len(fragmentsequence)-enzyme.OverhangLength:] + fragments = append(fragments, Fragment{Sequence: fragmentSequence, ForwardOverhang: forwardOverhang, ReverseOverhang: reverseOverhang}) } } @@ -235,94 
+267,73 @@ func CutWithEnzyme(seq Part, directional bool, enzyme Enzyme) []Fragment { return fragments } -func recurseLigate(wg *sync.WaitGroup, constructs chan string, infiniteLoopingConstructs chan string, seedFragment Fragment, fragmentList []Fragment, usedFragments []Fragment) { +func recurseLigate(seedFragment Fragment, fragmentList []Fragment, usedFragments []Fragment, existingSeqhashes map[string]struct{}) (openConstructs []string, infiniteConstructs []string) { // Recurse ligate simulates all possible ligations of a series of fragments. Each possible combination begins with a "seed" that fragments from the pool can be added to. - defer wg.Done() // If the seed ligates to itself, we can call it done with a successful circularization! if seedFragment.ForwardOverhang == seedFragment.ReverseOverhang { - constructs <- seedFragment.ForwardOverhang + seedFragment.Sequence - } else { - for _, newFragment := range fragmentList { - // If the seedFragment's reverse overhang is ligates to a fragment's forward overhang, we can ligate those together and seed another ligation reaction - var newSeed Fragment - var fragmentAttached bool - if seedFragment.ReverseOverhang == newFragment.ForwardOverhang { - fragmentAttached = true - newSeed = Fragment{seedFragment.Sequence + seedFragment.ReverseOverhang + newFragment.Sequence, seedFragment.ForwardOverhang, newFragment.ReverseOverhang} - } - // This checks if we can ligate the next fragment in its reverse direction. We have to be careful though - if our seed has a palindrome, it will ligate to itself - // like [-> <- -> <- -> ...] infinitely. We check for that case here as well. 
- if (seedFragment.ReverseOverhang == transform.ReverseComplement(newFragment.ReverseOverhang)) && (seedFragment.ReverseOverhang != transform.ReverseComplement(seedFragment.ReverseOverhang)) { // If the second statement isn't there, program will crash on palindromes - fragmentAttached = true - newSeed = Fragment{seedFragment.Sequence + seedFragment.ReverseOverhang + transform.ReverseComplement(newFragment.Sequence), seedFragment.ForwardOverhang, transform.ReverseComplement(newFragment.ForwardOverhang)} - } - - // If fragment is actually attached, move to some checks - if fragmentAttached { - // If the newFragment's reverse complement already exists in the used fragment list, we need to cancel the recursion. - for _, usedFragment := range usedFragments { - if usedFragment.Sequence == newFragment.Sequence { - infiniteLoopingConstructs <- usedFragment.ForwardOverhang + usedFragment.Sequence + usedFragment.ReverseOverhang - return - } - } - wg.Add(1) - // If everything is clear, append fragment to usedFragments and recurse. 
- usedFragments = append(usedFragments, newFragment) - go recurseLigate(wg, constructs, infiniteLoopingConstructs, newSeed, fragmentList, usedFragments) - } + construct := seedFragment.ForwardOverhang + seedFragment.Sequence + seqhash, _ := seqhash.Hash(construct, "DNA", true, true) + if _, ok := existingSeqhashes[seqhash]; ok { + return nil, nil } + existingSeqhashes[seqhash] = struct{}{} + return []string{construct}, nil } -} -func getConstructs(c chan string, constructSequences chan []string, circular bool) { - var constructs []string - var exists bool - var existingSeqhashes []string - for { - construct, more := <-c - if more { - exists = false - seqhashConstruct, _ := seqhash.Hash(construct, "DNA", circular, true) - // Check if this construct is unique - for _, existingSeqhash := range existingSeqhashes { - if existingSeqhash == seqhashConstruct { - exists = true + // If the seed ligates to another fragment, we can recurse and add that fragment to the seed + for _, newFragment := range fragmentList { + // If the seedFragment's reverse overhang ligates to a fragment's forward overhang, we can ligate those together and seed another ligation reaction + var newSeed Fragment + var fragmentAttached bool + if seedFragment.ReverseOverhang == newFragment.ForwardOverhang { + fragmentAttached = true + newSeed = Fragment{seedFragment.Sequence + seedFragment.ReverseOverhang + newFragment.Sequence, seedFragment.ForwardOverhang, newFragment.ReverseOverhang} + } + // This checks if we can ligate the next fragment in its reverse direction. We have to be careful though - if our seed has a palindrome, it will ligate to itself + // like [-> <- -> <- -> ...] infinitely. We check for that case here as well. 
+ if (seedFragment.ReverseOverhang == transform.ReverseComplement(newFragment.ReverseOverhang)) && (seedFragment.ReverseOverhang != transform.ReverseComplement(seedFragment.ReverseOverhang)) { // If the second statement isn't there, program will crash on palindromes + fragmentAttached = true + newSeed = Fragment{seedFragment.Sequence + seedFragment.ReverseOverhang + transform.ReverseComplement(newFragment.Sequence), seedFragment.ForwardOverhang, transform.ReverseComplement(newFragment.ForwardOverhang)} + } + + // If fragment is actually attached, move to some checks + if fragmentAttached { + // If the newFragment's reverse complement already exists in the used fragment list, we need to cancel the recursion. + for _, usedFragment := range usedFragments { + if usedFragment.Sequence == newFragment.Sequence { + infiniteConstruct := usedFragment.ForwardOverhang + usedFragment.Sequence + usedFragment.ReverseOverhang + seqhash, _ := seqhash.Hash(infiniteConstruct, "DNA", false, true) + if _, ok := existingSeqhashes[seqhash]; ok { + return nil, nil + } + existingSeqhashes[seqhash] = struct{}{} + return nil, []string{infiniteConstruct} } } - if !exists { - constructs = append(constructs, construct) - existingSeqhashes = append(existingSeqhashes, seqhashConstruct) - } - } else { - constructSequences <- constructs - close(constructSequences) - return + // If everything is clear, append fragment to usedFragments and recurse. + usedFragments = append(usedFragments, newFragment) + openconstructs, infiniteconstructs := recurseLigate(newSeed, fragmentList, usedFragments, existingSeqhashes) + + openConstructs = append(openConstructs, openconstructs...) + infiniteConstructs = append(infiniteConstructs, infiniteconstructs...) } } + + return openConstructs, infiniteConstructs } // CircularLigate simulates ligation of all possible fragment combinations into circular plasmids. 
-func CircularLigate(fragments []Fragment) ([]string, []string, error) { - var wg sync.WaitGroup +func CircularLigate(fragments []Fragment) ([]string, []string) { var outputConstructs []string var outputInfiniteLoopingConstructs []string - constructs := make(chan string) - infiniteLoopingConstructs := make(chan string) // sometimes we will get stuck in infinite loops. These are sequences with a recursion break - constructSequences := make(chan []string) - infiniteLoopingConstructSequences := make(chan []string) + existingSeqhashes := make(map[string]struct{}) for _, fragment := range fragments { - wg.Add(1) - go recurseLigate(&wg, constructs, infiniteLoopingConstructs, fragment, fragments, []Fragment{}) + openConstructs, infiniteConstructs := recurseLigate(fragment, fragments, []Fragment{}, existingSeqhashes) + + outputConstructs = append(outputConstructs, openConstructs...) + outputInfiniteLoopingConstructs = append(outputInfiniteLoopingConstructs, infiniteConstructs...) } - go getConstructs(constructs, constructSequences, true) - go getConstructs(infiniteLoopingConstructs, infiniteLoopingConstructSequences, false) - wg.Wait() - close(constructs) - close(infiniteLoopingConstructs) - outputConstructs = <-constructSequences - outputInfiniteLoopingConstructs = <-infiniteLoopingConstructSequences - return outputConstructs, outputInfiniteLoopingConstructs, nil + return outputConstructs, outputInfiniteLoopingConstructs } /****************************************************************************** @@ -333,14 +344,21 @@ Specific cloning functions begin here. // GoldenGate simulates a GoldenGate cloning reaction. As of right now, we only // support BsaI, BbsI, BtgZI, and BsmBI. 
-func GoldenGate(sequences []Part, enzymeStr string) ([]string, []string, error) { +func GoldenGate(sequences []Part, cuttingEnzyme Enzyme) (openConstructs []string, infiniteLoops []string) { var fragments []Fragment for _, sequence := range sequences { - newFragments, err := CutWithEnzymeByName(sequence, true, enzymeStr) - if err != nil { - return []string{}, []string{}, err - } + newFragments := CutWithEnzyme(sequence, true, cuttingEnzyme) fragments = append(fragments, newFragments...) } - return CircularLigate(fragments) + openconstructs, infiniteloops := CircularLigate(fragments) + return openconstructs, infiniteloops +} + +// GetBaseRestrictionEnzymes returns a basic slice of common enzymes used in Golden Gate Assembly. Eventually, we want to get the data for this list from ftp://ftp.neb.com/pub/rebase +func GetBaseRestrictionEnzymes() []Enzyme { + return []Enzyme{ + {"BsaI", regexp.MustCompile("GGTCTC"), regexp.MustCompile("GAGACC"), 1, 4, "GGTCTC"}, + {"BbsI", regexp.MustCompile("GAAGAC"), regexp.MustCompile("GTCTTC"), 2, 4, "GAAGAC"}, + {"BtgZI", regexp.MustCompile("GCGATG"), regexp.MustCompile("CATCGC"), 10, 4, "GCGATG"}, + } } diff --git a/clone/clone_test.go b/clone/clone_test.go index 2cfb576..ee0ca78 100644 --- a/clone/clone_test.go +++ b/clone/clone_test.go @@ -1,55 +1,53 @@ -package clone_test +package clone import ( - "fmt" "testing" - - "github.com/TimothyStiles/poly/clone" - "github.com/TimothyStiles/poly/seqhash" ) // pOpen plasmid series (https://stanford.freegenes.org/collections/open-genes/products/open-plasmids#description). I use it for essentially all my cloning. 
-Keoni -var popen = clone.Part{"TAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGGCCTACTATTAGCAACAACGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGAACCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACCTGCACCAGTCAGTAAAACGACGGCCAGTAGTCAAAAGCCTCCGACCGGAGGCTTTTGACTTGGTTCAGGTGGAGTGGGAGTAgtcttcGCcatcgCtACTAAAagccagataacagtatgcgtatttgcgcgctgatttttgcggtataagaatatatactgatatgtatacccgaagtatgtcaaaaagaggtatgctatgaagcagcgtattacagtgacagttgacagcgacagctatcagttgctcaaggcatatatgatgtcaatatctccggtctggtaagcacaaccatgcagaatgaagcccgtcgtctgcgtgccgaacgctggaaagcggaaaatcaggaagggatggctgaggtcgcccggtttattgaaatgaacggctcttttgctgacgagaacagggGCTGGTGAAATGCAGTTTAAGGTTTACACCTATAAAAGAGAGAGCCGTTATCGTCTGTTTGTGGATGTACAGAGTGATATTATTGACACGCCCGGGCGACGGAT
GGTGATCCCCCTGGCCAGTGCACGTCTGCTGTCAGATAAAGTCTCCCGTGAACTTTACCCGGTGGTGCATATCGGGGATGAAAGCTGGCGCATGATGACCACCGATATGGCCAGTGTGCCGGTCTCCGTTATCGGGGAAGAAGTGGCTGATCTCAGCCACCGCGAAAATGACATCAAAAACGCCATTAACCTGATGTTCTGGGGAATATAAATGTCAGGCTCCCTTATACACAGgcgatgttgaagaccaCGCTGAGGTGTCAATCGTCGGAGCCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCATGGTCATAGCTGTTTCCTGAGAGCTTGGCAGGTGATGACACACATTAACAAATTTCGTGAGGAGTCTCCAGAAGAATGCCATTAATTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGG", true} +var popen = Part{"TAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGGCCTACTATTAGCAACAACGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGAACCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTAC
TTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACCTGCACCAGTCAGTAAAACGACGGCCAGTAGTCAAAAGCCTCCGACCGGAGGCTTTTGACTTGGTTCAGGTGGAGTGGGAGTAgtcttcGCcatcgCtACTAAAagccagataacagtatgcgtatttgcgcgctgatttttgcggtataagaatatatactgatatgtatacccgaagtatgtcaaaaagaggtatgctatgaagcagcgtattacagtgacagttgacagcgacagctatcagttgctcaaggcatatatgatgtcaatatctccggtctggtaagcacaaccatgcagaatgaagcccgtcgtctgcgtgccgaacgctggaaagcggaaaatcaggaagggatggctgaggtcgcccggtttattgaaatgaacggctcttttgctgacgagaacagggGCTGGTGAAATGCAGTTTAAGGTTTACACCTATAAAAGAGAGAGCCGTTATCGTCTGTTTGTGGATGTACAGAGTGATATTATTGACACGCCCGGGCGACGGATGGTGATCCCCCTGGCCAGTGCACGTCTGCTGTCAGATAAAGTCTCCCGTGAACTTTACCCGGTGGTGCATATCGGGGATGAAAGCTGGCGCATGATGACCACCGATATGGCCAGTGTGCCGGTCTCCGTTATCGGGGAAGAAGTGGCTGATCTCAGCCACCGCGAAAATGACATCAAAAACGCCATTAACCTGATGTTCTGGGGAATATAAATGTCAGGCTCCCTTATACACAGgcgatgttgaagaccaCGCTGAGGTGTCAATCGTCGGAGCCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCATGGTCATAGCTGTTTCCTGAGAGCTTGGCAGGTGATGACACACATTAACAAATTTCGTGAGGAGTCTCCAGAAGAATGCCATTAATTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGG", true} func TestCutWithEnzymeByName(t *testing.T) { - _, err := clone.CutWithEnzymeByName(popen, true, "EcoFake") + enzymeManager := NewEnzymeManager(GetBaseRestrictionEnzymes()) + _, err := enzymeManager.CutWithEnzymeByName(popen, true, "EcoFake") if err == nil { t.Errorf("CutWithEnzymeByName should have failed when looking for fake restriction enzyme EcoFake") } } func TestCutWithEnzyme(t *testing.T) { - var seq clone.Part + enzymeManager := NewEnzymeManager(GetBaseRestrictionEnzymes()) + var sequence Part bsai := "GGTCTCAATGC" bsaiComplement := 
"ATGCAGAGACC" // test(1) // Test case of `<-bsaiComplement bsai-> <-bsaiComplement bsai->` where bsaI cuts off of a linear sequence. This tests the line: - // if !seq.Circular && (overhangSet[len(overhangSet)-1].Position+enzyme.EnzymeSkip+enzyme.EnzymeOverhangLen > len(sequence)) - seq = clone.Part{"ATATATA" + bsaiComplement + bsai + "ATGCATCGATCGACTAGCATG" + bsaiComplement + bsai[:8], false} - frag, err := clone.CutWithEnzymeByName(seq, true, "BsaI") + // if !sequence.Circular && (overhangSet[len(overhangSet)-1].Position+enzyme.EnzymeSkip+enzyme.EnzymeOverhangLen > len(sequence)) + sequence = Part{"ATATATA" + bsaiComplement + bsai + "ATGCATCGATCGACTAGCATG" + bsaiComplement + bsai[:8], false} + fragment, err := enzymeManager.CutWithEnzymeByName(sequence, true, "BsaI") if err != nil { t.Errorf("CutWithEnzyme should not have failed on test(1). Got error: %s", err) } - if len(frag) != 1 { + if len(fragment) != 1 { t.Errorf("CutWithEnzyme in test(1) should be 1 fragment in length") } - if frag[0].Sequence != "ATGCATCGATCGACTAGCATG" { - t.Errorf("CutWithEnzyme in test(1) should give fragment with sequence ATGCATCGATCGACTAGCATG . Got sequence: %s", frag[0].Sequence) + if fragment[0].Sequence != "ATGCATCGATCGACTAGCATG" { + t.Errorf("CutWithEnzyme in test(1) should give fragment with sequence ATGCATCGATCGACTAGCATG . Got sequence: %s", fragment[0].Sequence) } // test(2) // Now if we take the same sequence and circularize it, we get a different result - seq.Circular = true - frag, err = clone.CutWithEnzymeByName(seq, true, "BsaI") + sequence.Circular = true + fragment, err = enzymeManager.CutWithEnzymeByName(sequence, true, "BsaI") if err != nil { t.Errorf("CutWithEnzyme should not have failed on test(2). 
Got error: %s", err) } - if len(frag) != 2 { + if len(fragment) != 2 { - t.Errorf("CutWithEnzyme in test(2) should be 1 fragment in length") + t.Errorf("CutWithEnzyme in test(2) should be 2 fragments in length") } - if frag[0].Sequence != "ATGCATCGATCGACTAGCATG" || frag[1].Sequence != "TATA" { - t.Errorf("CutWithEnzyme in test(2) should give fragment with sequence ATGCATCGATCGACTAGCATG and TATA. Got sequence: %s and %s", frag[0].Sequence, frag[1].Sequence) + if fragment[0].Sequence != "ATGCATCGATCGACTAGCATG" || fragment[1].Sequence != "TATA" { + t.Errorf("CutWithEnzyme in test(2) should give fragment with sequence ATGCATCGATCGACTAGCATG and TATA. Got sequence: %s and %s", fragment[0].Sequence, fragment[1].Sequence) } // test(3) @@ -57,56 +55,96 @@ func TestCutWithEnzyme(t *testing.T) { // different results if we have a linear or circular DNA. Since single cuts // will give no fragments if you test for directionality, we set the // directionality flag to false. This tests the line: - // if len(overhangs) == 1 && !directional && !seq.Circular - seq = clone.Part{"ATATATATATATATAT" + bsai + "GCGCGCGCGCGCGCGCGCGC", false} - frag, err = clone.CutWithEnzymeByName(seq, false, "BsaI") + // if len(overhangs) == 1 && !directional && !sequence.Circular + sequence = Part{"ATATATATATATATAT" + bsai + "GCGCGCGCGCGCGCGCGCGC", false} + fragment, err = enzymeManager.CutWithEnzymeByName(sequence, false, "BsaI") if err != nil { t.Errorf("CutWithEnzyme should not have failed on test(3). Got error: %s", err) } - if len(frag) != 2 { + if len(fragment) != 2 { t.Errorf("Cutting a linear fragment with a single cut site should give 2 fragments") } - if frag[0].Sequence != "GCGCGCGCGCGCGCGCGCGC" || frag[1].Sequence != "ATATATATATATATATGGTCTCA" { - t.Errorf("CutWithEnzyme in test(3) should give fragment with sequence GCGCGCGCGCGCGCGCGCGC and ATATATATATATATATGGTCTCA. 
Got sequence: %s and %s", frag[0].Sequence, frag[1].Sequence) + if fragment[0].Sequence != "GCGCGCGCGCGCGCGCGCGC" || fragment[1].Sequence != "ATATATATATATATATGGTCTCA" { + t.Errorf("CutWithEnzyme in test(3) should give fragment with sequence GCGCGCGCGCGCGCGCGCGC and ATATATATATATATATGGTCTCA. Got sequence: %s and %s", fragment[0].Sequence, fragment[1].Sequence) } // test(4) // This tests for the above except with a circular fragment. Specifically, it // tests the line: - // if len(overhangs) == 2 && !directional && seq.Circular - seq.Circular = true - frag, err = clone.CutWithEnzymeByName(seq, false, "BsaI") + // if len(overhangs) == 2 && !directional && sequence.Circular + sequence.Circular = true + fragment, err = enzymeManager.CutWithEnzymeByName(sequence, false, "BsaI") if err != nil { t.Errorf("CutWithEnzyme should not have failed on test(4). Got error: %s", err) } - if len(frag) != 1 { + if len(fragment) != 1 { - t.Errorf("Cutting a circular fragment with a single cut site should give 1 fragments") + t.Errorf("Cutting a circular fragment with a single cut site should give 1 fragment") } - if frag[0].Sequence != "GCGCGCGCGCGCGCGCGCGCATATATATATATATATGGTCTCA" { - t.Errorf("CutWithEnzyme in test(4) should give fragment with sequence ATATATATATATATATGGTCTCA. Got Sequence: %s", frag[0].Sequence) + if fragment[0].Sequence != "GCGCGCGCGCGCGCGCGCGCATATATATATATATATGGTCTCA" { + t.Errorf("CutWithEnzyme in test(4) should give fragment with sequence GCGCGCGCGCGCGCGCGCGCATATATATATATATATGGTCTCA. Got sequence: %s", fragment[0].Sequence) } // test(5) // This tests if we have a fragment where we do not care about directionality // but have more than 1 cut site in our fragment. We can use pOpen for this. - frag, err = clone.CutWithEnzymeByName(popen, false, "BbsI") + fragment, err = enzymeManager.CutWithEnzymeByName(popen, false, "BbsI") if err != nil { t.Errorf("CutWithEnzyme should not have failed on test(5). 
Got error: %s", err) } - if len(frag) != 2 { + if len(fragment) != 2 { t.Errorf("Cutting pOpen without a direction should yield 2 fragments") } } +func TestCutWithEnzymeRegression(t *testing.T) { + sequence := "AGCTGCTGTTTAAAGCTATTACTTTGAGACC" // this is a real sequence I came across that was causing problems + + part := Part{sequence, false} + + // get enzymes with enzyme manager + enzymeManager := NewEnzymeManager(GetBaseRestrictionEnzymes()) + bsaI, err := enzymeManager.GetEnzymeByName("BsaI") + if err != nil { + t.Fatalf("Error when getting Enzyme. Got error: %s", err) + } + + // cut with BsaI + fragments := CutWithEnzyme(part, false, bsaI) + + // check that the fragments are correct + if len(fragments) != 2 { + t.Fatalf("Expected 2 fragments, got: %d", len(fragments)) + } + + if fragments[0].ForwardOverhang != "" { + t.Errorf("Expected forward overhang of fragment 0 to be empty, got: %s", fragments[0].ForwardOverhang) + } + + if fragments[0].ReverseOverhang != "ACTT" { + t.Errorf("Expected reverse overhang of fragment 0 to be ACTT, got: %s", fragments[0].ReverseOverhang) + } + + if fragments[1].ForwardOverhang != "ACTT" { + t.Errorf("Expected forward overhang of fragment 1 to be ACTT, got: %s", fragments[1].ForwardOverhang) + } + + if fragments[1].ReverseOverhang != "" { + t.Errorf("Expected reverse overhang of fragment 1 to be empty, got: %s", fragments[1].ReverseOverhang) + } + + // assemble the fragments back together + assembly := fragments[0].Sequence + fragments[0].ReverseOverhang + fragments[1].Sequence + if assembly != sequence { + t.Errorf("Expected assembly to be %s, got: %s", sequence, assembly) + } +} + func TestCircularLigate(t *testing.T) { // The following tests for complementing overhangs. 
Specific, this line: // newSeed := Fragment{seedFragment.Sequence + seedFragment.ReverseOverhang + ReverseComplement(newFragment.Sequence), seedFragment.ForwardOverhang, ReverseComplement(newFragment.ForwardOverhang)} - fragment1 := clone.Fragment{"AAAAAA", "GTTG", "CTAT"} - fragment2 := clone.Fragment{"AAAAAA", "CAAC", "ATAG"} - outputConstructs, infiniteLoops, err := clone.CircularLigate([]clone.Fragment{fragment1, fragment2}) - if err != nil { - t.Errorf("Failed circular ligation with error: %s", err) - } + fragment1 := Fragment{"AAAAAA", "GTTG", "CTAT"} + fragment2 := Fragment{"AAAAAA", "CAAC", "ATAG"} + outputConstructs, infiniteLoops := CircularLigate([]Fragment{fragment1, fragment2}) if len(outputConstructs) != 1 { t.Errorf("Circular ligation with complementing overhangs should only output 1 valid rotated sequence.") } @@ -115,53 +153,39 @@ func TestCircularLigate(t *testing.T) { } } -func TestGoldenGate(t *testing.T) { - // Here we test if the enzyme we want to use in a GoldenGate reaction does not exist in our enzyme pool - fragment1 := clone.Part{"GAAGTGCCATTCCGCCTGACCTGAAGACCAGGAGAAACACGTGGCAAACATTCCGGTCTCAAATGGAAAAGAGCAACGAAACCAACGGCTACCTTGACAGCGCTCAAGCCGGCCCTGCAGCTGGCCCGGGCGCTCCGGGTACCGCCGCGGGTCGTGCACGTCGTTGCGCGGGCTTCCTGCGGCGCCAAGCGCTGGTGCTGCTCACGGTGTCTGGTGTTCTGGCAGGCGCCGGTTTGGGCGCGGCACTGCGTGGGCTCAGCCTGAGCCGCACCCAGGTCACCTACCTGGCCTTCCCCGGCGAGATGCTGCTCCGCATGCTGCGCATGATCATCCTGCCGCTGGTGGTCTGCAGCCTGGTGTCGGGCGCCGCCTCCCTCGATGCCAGCTGCCTCGGGCGTCTGGGCGGTATCGCTGTCGCCTACTTTGGCCTCACCACACTGAGTGCCTCGGCGCTCGCCGTGGCCTTGGCGTTCATCATCAAGCCAGGATCCGGTGCGCAGACCCTTCAGTCCAGCGACCTGGGGCTGGAGGACTCGGGGCCTCCTCCTGTCCCCAAAGAAACGGTGGACTCTTTCCTCGACCTGGCCAGAAACCTGTTTCCCTCCAATCTTGTGGTTGCAGCTTTCCGTACGTATGCAACCGATTATAAAGTCGTGACCCAGAACAGCAGCTCTGGAAATGTAACCCATGAAAAGATCCCCATAGGCACTGAGATAGAAGGGATGAACATTTTAGGATTGGTCCTGTTTGCTCTGGTGTTAGGAGTGGCCTTAAAGAAACTAGGCTCCGAAGGAGAGGACCTCATCCGTTTCTTCAATTCCCTCAACGAGGCGACGATGGTGCTGGTGTCCTGGATTATGTGGTACGCGTCTTCAGGCTAGGTGGAGGCTCAGTG", false} - fragment2 := 
clone.Part{"GAAGTGCCATTCCGCCTGACCTGAAGACCAGTACGTACCTGTGGGCATCATGTTCCTTGTTGGAAGCAAGATCGTGGAAATGAAAGACATCATCGTGCTGGTGACCAGCCTGGGGAAATACATCTTCGCATCTATATTGGGCCACGTCATTCATGGTGGTATCGTCCTGCCGCTGATTTATTTTGTTTTCACACGAAAAAACCCATTCAGATTCCTCCTGGGCCTCCTCGCCCCATTTGCGACAGCATTTGCTACGTGCTCCAGCTCAGCGACCCTTCCCTCTATGATGAAGTGCATTGAAGAGAACAATGGTGTGGACAAGAGGATCTCCAGGTTTATTCTCCCCATCGGGGCCACCGTGAACATGGACGGAGCAGCCATCTTCCAGTGTGTGGCCGCGGTGTTCATTGCGCAACTCAACAACGTAGAGCTCAACGCAGGACAGATTTTCACCATTCTAGTGACTGCCACAGCGTCCAGTGTTGGAGCAGCAGGCGTGCCAGCTGGAGGGGTCCTCACCATTGCCATTATCCTGGAGGCCATTGGGCTGCCTACTCATGATCTGCCTCTGATCCTGGCTGTGGACTGGATTGTGGACCGGACCACCACGGTGGTGAATGTGGAAGGGGATGCCCTGGGTGCAGGCATTCTCCACCACCTGAATCAGAAGGCAACAAAGAAAGGCGAGCAGGAACTTGCTGAGGTGAAAGTGGAAGCCATCCCCAACTGCAAGTCTGAGGAGGAAACCTCGCCCCTGGTGACACACCAGAACCCCGCTGGCCCCGTGGCCAGTGCCCCAGAACTGGAATCCAAGGAGTCGGTTCTGTGAAGAGCTTAGAGACCGACGACTGCCTAAGGACATTCGCTGCGTCTTCAGGCTAGGTGGAGGCTCAGTG", false} - - _, _, err := clone.GoldenGate([]clone.Part{fragment1, fragment2, popen}, "EcoRFake") +func TestEnzymeManager_GetEnzymeByName_NotFound(t *testing.T) { + enzymeManager := NewEnzymeManager(GetBaseRestrictionEnzymes()) + _, err := enzymeManager.GetEnzymeByName("EcoRFake") if err == nil { - t.Errorf("GoldenGate should fail when using enzyme EcoRFake") + t.Fatalf("GetEnzymeByName should fail when looking up the nonexistent enzyme EcoRFake") } - if err.Error() != "Enzyme EcoRFake not found in enzymeMap" { + if err.Error() != "Enzyme EcoRFake not found" { - t.Errorf("Failure of GoldenGate on incorrect enzyme should follow the exact string `Enzyme EcoRFake not found in enzymeMap`. Got: %s", err.Error()) + t.Errorf("Failure of GetEnzymeByName on a missing enzyme should return the exact string `Enzyme EcoRFake not found`. Got: %s", err.Error()) } } -func ExampleGoldenGate() { - // Fragment 1 has a palindrome at its start. This isn't very common but - // can occur. These two fragments are real DNA fragments used in the - // FreeGenes Project. 
They are used because they were on my computer - // - Keoni - fragment1 := clone.Part{"GAAGTGCCATTCCGCCTGACCTGAAGACCAGGAGAAACACGTGGCAAACATTCCGGTCTCAAATGGAAAAGAGCAACGAAACCAACGGCTACCTTGACAGCGCTCAAGCCGGCCCTGCAGCTGGCCCGGGCGCTCCGGGTACCGCCGCGGGTCGTGCACGTCGTTGCGCGGGCTTCCTGCGGCGCCAAGCGCTGGTGCTGCTCACGGTGTCTGGTGTTCTGGCAGGCGCCGGTTTGGGCGCGGCACTGCGTGGGCTCAGCCTGAGCCGCACCCAGGTCACCTACCTGGCCTTCCCCGGCGAGATGCTGCTCCGCATGCTGCGCATGATCATCCTGCCGCTGGTGGTCTGCAGCCTGGTGTCGGGCGCCGCCTCCCTCGATGCCAGCTGCCTCGGGCGTCTGGGCGGTATCGCTGTCGCCTACTTTGGCCTCACCACACTGAGTGCCTCGGCGCTCGCCGTGGCCTTGGCGTTCATCATCAAGCCAGGATCCGGTGCGCAGACCCTTCAGTCCAGCGACCTGGGGCTGGAGGACTCGGGGCCTCCTCCTGTCCCCAAAGAAACGGTGGACTCTTTCCTCGACCTGGCCAGAAACCTGTTTCCCTCCAATCTTGTGGTTGCAGCTTTCCGTACGTATGCAACCGATTATAAAGTCGTGACCCAGAACAGCAGCTCTGGAAATGTAACCCATGAAAAGATCCCCATAGGCACTGAGATAGAAGGGATGAACATTTTAGGATTGGTCCTGTTTGCTCTGGTGTTAGGAGTGGCCTTAAAGAAACTAGGCTCCGAAGGAGAGGACCTCATCCGTTTCTTCAATTCCCTCAACGAGGCGACGATGGTGCTGGTGTCCTGGATTATGTGGTACGCGTCTTCAGGCTAGGTGGAGGCTCAGTG", false} - fragment2 := clone.Part{"GAAGTGCCATTCCGCCTGACCTGAAGACCAGTACGTACCTGTGGGCATCATGTTCCTTGTTGGAAGCAAGATCGTGGAAATGAAAGACATCATCGTGCTGGTGACCAGCCTGGGGAAATACATCTTCGCATCTATATTGGGCCACGTCATTCATGGTGGTATCGTCCTGCCGCTGATTTATTTTGTTTTCACACGAAAAAACCCATTCAGATTCCTCCTGGGCCTCCTCGCCCCATTTGCGACAGCATTTGCTACGTGCTCCAGCTCAGCGACCCTTCCCTCTATGATGAAGTGCATTGAAGAGAACAATGGTGTGGACAAGAGGATCTCCAGGTTTATTCTCCCCATCGGGGCCACCGTGAACATGGACGGAGCAGCCATCTTCCAGTGTGTGGCCGCGGTGTTCATTGCGCAACTCAACAACGTAGAGCTCAACGCAGGACAGATTTTCACCATTCTAGTGACTGCCACAGCGTCCAGTGTTGGAGCAGCAGGCGTGCCAGCTGGAGGGGTCCTCACCATTGCCATTATCCTGGAGGCCATTGGGCTGCCTACTCATGATCTGCCTCTGATCCTGGCTGTGGACTGGATTGTGGACCGGACCACCACGGTGGTGAATGTGGAAGGGGATGCCCTGGGTGCAGGCATTCTCCACCACCTGAATCAGAAGGCAACAAAGAAAGGCGAGCAGGAACTTGCTGAGGTGAAAGTGGAAGCCATCCCCAACTGCAAGTCTGAGGAGGAAACCTCGCCCCTGGTGACACACCAGAACCCCGCTGGCCCCGTGGCCAGTGCCCCAGAACTGGAATCCAAGGAGTCGGTTCTGTGAAGAGCTTAGAGACCGACGACTGCCTAAGGACATTCGCTGCGTCTTCAGGCTAGGTGGAGGCTCAGTG", false} - - Clones, _, _ := clone.GoldenGate([]clone.Part{fragment1, fragment2, 
popen}, "BbsI") - - fmt.Println(seqhash.RotateSequence(Clones[0])) - // Output: AAAAAAAGGATCTCAAGAAGGCCTACTATTAGCAACAACGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGAACCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACCTGCACCAGTCAGTAAAACGACGGCCAGTAGTCAAAAGCCTCCGACCGGAGGCTTTTGACTTGGTTCAGGTGGAGTGGGAGAAACACGTGGCAAACATTCCGGTCTCAAATGGAAAAGAGCAACGAAACCAACGGCTACCTTGACAGCGCTCAAGCCGGCCCTGCAGCTGGCCCGGGCGCTCCGGGTACCGCCGCGGGTCGTGCACGTCGTTGCGCGGGCTTCCTGCGGCGCCAAGCGCTGGTGCTGCTCACGGTGTCTGGTGTTCTGGCAGGCGCCGGTTTGGGCGCGGCACTGCGTGGGCTCAGCCTGAGCCGCACCCAGGTCACCTACCTGGCCTTCCCCGGCGAGATGCTGCTCCGCATGCTGCGCATGATCATCCTGCCGCTGGTGGTCTGCAGCCTGGTGTCGGGCGCCGCCTCCCTCGATGCCAGCTGCCTCGGGCGTCTGGGCGGTATCGCTGTCGCCTACTTTGGCCTCACCACACTGAGTGCCTCGGCGCTCGCCGTGGCCTTGGCGTTCATCATCAAGCCAGGATCCGGTGCGCAGACCCTTCAGTCCAGCGACCTGGGGCTGGAGGACTCGGGGCCTCCTCCTGTCCCCAAAGAAACGGTGGACTCTTTCCTCGACCTGGCCAGAAACCTGTTTCCCTCCAATCTTGTGGTTGCAGCTTTCCGTACGTATGCAACCGATTATAAAGTCGTGACCCAGAACAGCAGCTCTGGAAATGTAACCCATGAAAAGATC
CCCATAGGCACTGAGATAGAAGGGATGAACATTTTAGGATTGGTCCTGTTTGCTCTGGTGTTAGGAGTGGCCTTAAAGAAACTAGGCTCCGAAGGAGAGGACCTCATCCGTTTCTTCAATTCCCTCAACGAGGCGACGATGGTGCTGGTGTCCTGGATTATGTGGTACGTACCTGTGGGCATCATGTTCCTTGTTGGAAGCAAGATCGTGGAAATGAAAGACATCATCGTGCTGGTGACCAGCCTGGGGAAATACATCTTCGCATCTATATTGGGCCACGTCATTCATGGTGGTATCGTCCTGCCGCTGATTTATTTTGTTTTCACACGAAAAAACCCATTCAGATTCCTCCTGGGCCTCCTCGCCCCATTTGCGACAGCATTTGCTACGTGCTCCAGCTCAGCGACCCTTCCCTCTATGATGAAGTGCATTGAAGAGAACAATGGTGTGGACAAGAGGATCTCCAGGTTTATTCTCCCCATCGGGGCCACCGTGAACATGGACGGAGCAGCCATCTTCCAGTGTGTGGCCGCGGTGTTCATTGCGCAACTCAACAACGTAGAGCTCAACGCAGGACAGATTTTCACCATTCTAGTGACTGCCACAGCGTCCAGTGTTGGAGCAGCAGGCGTGCCAGCTGGAGGGGTCCTCACCATTGCCATTATCCTGGAGGCCATTGGGCTGCCTACTCATGATCTGCCTCTGATCCTGGCTGTGGACTGGATTGTGGACCGGACCACCACGGTGGTGAATGTGGAAGGGGATGCCCTGGGTGCAGGCATTCTCCACCACCTGAATCAGAAGGCAACAAAGAAAGGCGAGCAGGAACTTGCTGAGGTGAAAGTGGAAGCCATCCCCAACTGCAAGTCTGAGGAGGAAACCTCGCCCCTGGTGACACACCAGAACCCCGCTGGCCCCGTGGCCAGTGCCCCAGAACTGGAATCCAAGGAGTCGGTTCTGTGAAGAGCTTAGAGACCGACGACTGCCTAAGGACATTCGCTGAGGTGTCAATCGTCGGAGCCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCATGGTCATAGCTGTTTCCTGAGAGCTTGGCAGGTGATGACACACATTAACAAATTTCGTGAGGAGTCTCCAGAAGAATGCCATTAATTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAG -} - func TestSignalKilledGoldenGate(t *testing.T) { + enzymeManager := NewEnzymeManager(GetBaseRestrictionEnzymes()) // This previously would crash from using too much RAM. 
- frag1 := clone.Part{"AAAGCACTCTTAGGCCTCTGGAAGACATGGAGGGTCTCAAGGTGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTTTTGCCCTGTAAACGAAAAAACCACCTGGGTAGTCTTCGCATTTCTTAATCGGTGCCC", false} - frag2 := clone.Part{"AAAGCACTCTTAGGCCTCTGGAAGACATTGGGGAGGTGGTTTGATCGAAGGTTAAGTCAGTTGGGGAACTGCTTAACCGTGGTAACTGGCTTTCGCAGAGCACAGCAACCAAATCTGTTAGTCTTCGCATTTCTTAATCGGTGCCC", false} - frag3 := clone.Part{"AAAGCACTCTTAGGCCTCTGGAAGACATCTGTCCTTCCAGTGTAGCCGGACTTTGGCGCACACTTCAAGAGCAACCGCGTGTTTAGCTAAACAAATCCTCTGCGAACTCCCAGTTACCTAGTCTTCGCATTTCTTAATCGGTGCCC", false} - frag4 := clone.Part{"AAAGCACTCTTAGGCCTCTGGAAGACATTACCAATGGCTGCTGCCAGTGGCGTTTTACCGTGCTTTTCCGGGTTGGACTCAAGTGAACAGTTACCGGATAAGGCGCAGCAGTCGGGCTTAGTCTTCGCATTTCTTAATCGGTGCCC", false} - frag5 := clone.Part{"AAAGCACTCTTAGGCCTCTGGAAGACATGGCTGAACGGGGAGTTCTTGCTTACAGCCCAGCTTGGAGCGAACGACCTACACCGAGCCGAGATACCAGTGTGTGAGCTATGAGAAAGCGTAGTCTTCGCATTTCTTAATCGGTGCCC", false} - frag6 := clone.Part{"AAAGCACTCTTAGGCCTCTGGAAGACATAGCGCCACACTTCCCGTAAGGGAGAAAGGCGGAACAGGTATCCGGTAAACGGCAGGGTCGGAACAGGAGAGCGCAAGAGGGAGCGACCCGTAGTCTTCGCATTTCTTAATCGGTGCCC", false} - frag7 := clone.Part{"AAAGCACTCTTAGGCCTCTGGAAGACATCCCGCCGGAAACGGTGGGGATCTTTAAGTCCTGTCGGGTTTCGCCCGTACTGTCAGATTCATGGTTGAGCCTCACGGCTCCCACAGATGTAGTCTTCGCATTTCTTAATCGGTGCCC", false} - frag8 := clone.Part{"AAAGCACTCTTAGGCCTCTGGAAGACATGATGCACCGGAAAAGCGTCTGTTTATGTGAACTCTGGCAGGAGGGCGGAGCCTATGGAAAAACGCCACCGGCGCGGCCCTGCTGTTTTGCCTCACATGTTAGTCTTCGCATTTCTTAATCGGTGCCC", false} - frag9 := clone.Part{"AAAGCACTCTTAGGCCTCTGGAAGACATATGTTAGTCCCCTGCTTATCCACGGAATCTGTGGGTAACTTTGTATGTGTCCGCAGCGCAAAAAGAGACCCGCTTAGTCTTCGCATTTCTTAATCGGTGCCC", false} - fragments := []clone.Part{popen, frag1, frag2, frag3, frag4, frag5, frag6, frag7, frag8, frag9} - - clones, loopingClones, err := clone.GoldenGate(fragments, "BbsI") + fragment1 := Part{"AAAGCACTCTTAGGCCTCTGGAAGACATGGAGGGTCTCAAGGTGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTTTTGCCCTGTAAACGAAAAAACCACCTGGGTAGTCTTCGCATTTCTTAATCGGTGCCC", false} + fragment2 := 
Part{"AAAGCACTCTTAGGCCTCTGGAAGACATTGGGGAGGTGGTTTGATCGAAGGTTAAGTCAGTTGGGGAACTGCTTAACCGTGGTAACTGGCTTTCGCAGAGCACAGCAACCAAATCTGTTAGTCTTCGCATTTCTTAATCGGTGCCC", false} + fragment3 := Part{"AAAGCACTCTTAGGCCTCTGGAAGACATCTGTCCTTCCAGTGTAGCCGGACTTTGGCGCACACTTCAAGAGCAACCGCGTGTTTAGCTAAACAAATCCTCTGCGAACTCCCAGTTACCTAGTCTTCGCATTTCTTAATCGGTGCCC", false} + fragment4 := Part{"AAAGCACTCTTAGGCCTCTGGAAGACATTACCAATGGCTGCTGCCAGTGGCGTTTTACCGTGCTTTTCCGGGTTGGACTCAAGTGAACAGTTACCGGATAAGGCGCAGCAGTCGGGCTTAGTCTTCGCATTTCTTAATCGGTGCCC", false} + fragment5 := Part{"AAAGCACTCTTAGGCCTCTGGAAGACATGGCTGAACGGGGAGTTCTTGCTTACAGCCCAGCTTGGAGCGAACGACCTACACCGAGCCGAGATACCAGTGTGTGAGCTATGAGAAAGCGTAGTCTTCGCATTTCTTAATCGGTGCCC", false} + fragment6 := Part{"AAAGCACTCTTAGGCCTCTGGAAGACATAGCGCCACACTTCCCGTAAGGGAGAAAGGCGGAACAGGTATCCGGTAAACGGCAGGGTCGGAACAGGAGAGCGCAAGAGGGAGCGACCCGTAGTCTTCGCATTTCTTAATCGGTGCCC", false} + fragment7 := Part{"AAAGCACTCTTAGGCCTCTGGAAGACATCCCGCCGGAAACGGTGGGGATCTTTAAGTCCTGTCGGGTTTCGCCCGTACTGTCAGATTCATGGTTGAGCCTCACGGCTCCCACAGATGTAGTCTTCGCATTTCTTAATCGGTGCCC", false} + fragment8 := Part{"AAAGCACTCTTAGGCCTCTGGAAGACATGATGCACCGGAAAAGCGTCTGTTTATGTGAACTCTGGCAGGAGGGCGGAGCCTATGGAAAAACGCCACCGGCGCGGCCCTGCTGTTTTGCCTCACATGTTAGTCTTCGCATTTCTTAATCGGTGCCC", false} + fragment9 := Part{"AAAGCACTCTTAGGCCTCTGGAAGACATATGTTAGTCCCCTGCTTATCCACGGAATCTGTGGGTAACTTTGTATGTGTCCGCAGCGCAAAAAGAGACCCGCTTAGTCTTCGCATTTCTTAATCGGTGCCC", false} + fragments := []Part{popen, fragment1, fragment2, fragment3, fragment4, fragment5, fragment6, fragment7, fragment8, fragment9} + + bbsI, err := enzymeManager.GetEnzymeByName("BbsI") if err != nil { - t.Errorf("GoldenGate should not fail with these fragments. Got error: %s", err) + t.Errorf("Error when getting Enzyme. Got error: %s", err) } + + clones, loopingClones := GoldenGate(fragments, bbsI) if len(clones) != 1 { - t.Errorf("There should be 1 output clone. 
Got: %d", len(clones)) + t.Errorf("There should be 1 output clone. Got: %d", len(clones)) } // This should be changed later when we have a better way of informing user of reused overhangs if len(loopingClones) != 4 { @@ -170,27 +194,31 @@ func TestSignalKilledGoldenGate(t *testing.T) { } func TestPanicGoldenGate(t *testing.T) { + enzymeManager := NewEnzymeManager(GetBaseRestrictionEnzymes()) // This used to panic with the message: // panic: runtime error: slice bounds out of range [:-2] [recovered] // It was from the following sequence: GAAGACATAATGGTCTTC . There are 2 intercepting BbsI sites. - frag1 := clone.Part{"AAACCGGAGCCATACAGTACGAAGACATGGAGGGTCTCAAATGAAAAAAATCATCGAAACCCAGCGTGCACCGGGAGCAATCGGACCGTACGTCCAGGGAGTCGACCTAGGATCAATGTAGTCTTCGCACTTGGCTTAGATGCAAC", false} - frag2 := clone.Part{"AAACCGGAGCCATACAGTACGAAGACATAATGGTCTTCACCTCAGGACAGATCCCGGTCTGCCCGCAGACCGGAGAAATCCCGGCAGACGTCCAGGACCAGGCACGTCTATCACTAGATAGTCTTCGCACTTGGCTTAGATGCAAC", false} - frag3 := clone.Part{"AAACCGGAGCCATACAGTACGAAGACATTAGAAAACGTCAAAGCAATCGTCGTCGCAGCAGGACTATCAGTCGGAGACATCATCAAAATGACCGTCTTCATCACCGACCTAAACGACTTAGTCTTCGCACTTGGCTTAGATGCAAC", false} - frag4 := clone.Part{"AAACCGGAGCCATACAGTACGAAGACATGACTTCGCAACCATCAACGAAGTCTACAAACAGTTCTTCGACGAACACCAGGCAACCTACCCGACCCGTTCATGCGTCCAGGTCGCACGTCTACTAGTCTTCGCACTTGGCTTAGATGCAAC", false} - frag5 := clone.Part{"AAACCGGAGCCATACAGTACGAAGACATCTACCGAAAGACGTCAAACTAGAAATCGAAGCAATCGCAGTCCGTTCAGCAAGAGCTTAGAGACCCGCTTAGTCTTCGCACTTGGCTTAGATGCAAC", false} - fragments := []clone.Part{popen, frag1, frag2, frag3, frag4, frag5} - - _, _, err := clone.GoldenGate(fragments, "BbsI") + fragment1 := Part{"AAACCGGAGCCATACAGTACGAAGACATGGAGGGTCTCAAATGAAAAAAATCATCGAAACCCAGCGTGCACCGGGAGCAATCGGACCGTACGTCCAGGGAGTCGACCTAGGATCAATGTAGTCTTCGCACTTGGCTTAGATGCAAC", false} + fragment2 := Part{"AAACCGGAGCCATACAGTACGAAGACATAATGGTCTTCACCTCAGGACAGATCCCGGTCTGCCCGCAGACCGGAGAAATCCCGGCAGACGTCCAGGACCAGGCACGTCTATCACTAGATAGTCTTCGCACTTGGCTTAGATGCAAC", false} + fragment3 := 
Part{"AAACCGGAGCCATACAGTACGAAGACATTAGAAAACGTCAAAGCAATCGTCGTCGCAGCAGGACTATCAGTCGGAGACATCATCAAAATGACCGTCTTCATCACCGACCTAAACGACTTAGTCTTCGCACTTGGCTTAGATGCAAC", false} + fragment4 := Part{"AAACCGGAGCCATACAGTACGAAGACATGACTTCGCAACCATCAACGAAGTCTACAAACAGTTCTTCGACGAACACCAGGCAACCTACCCGACCCGTTCATGCGTCCAGGTCGCACGTCTACTAGTCTTCGCACTTGGCTTAGATGCAAC", false} + fragment5 := Part{"AAACCGGAGCCATACAGTACGAAGACATCTACCGAAAGACGTCAAACTAGAAATCGAAGCAATCGCAGTCCGTTCAGCAAGAGCTTAGAGACCCGCTTAGTCTTCGCACTTGGCTTAGATGCAAC", false} + fragments := []Part{popen, fragment1, fragment2, fragment3, fragment4, fragment5} + + bbsI, err := enzymeManager.GetEnzymeByName("BbsI") if err != nil { - t.Errorf("GoldenGate should not fail with these fragments. Got error: %s", err) + t.Fatalf("Error when getting Enzyme. Got error: %s", err) } + + _, _ = GoldenGate(fragments, bbsI) } func TestCircularCutRegression(t *testing.T) { + enzymeManager := NewEnzymeManager(GetBaseRestrictionEnzymes()) // This used to error with 0 fragments since the BsaI cut site is on the other // side of the origin from its recognition site. 
- plasmid1 := clone.Part{"AAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCCGAGaccaagtcgcggccgcgaggtgtcaatcgtcggagtagggataacagggtaatccgctgagcaataactagcataaccccttggggcctctaaacgggtcttgaggggttttttgcatggtcatagctgtttcctgttacgccccgccctgccactcgtcgcagtactgttgtaattcattaagcattctgccgacatggaagccatcacaaacggcatgatgaacctgaatcgccagcggcatcagcaccttgtcgccttgcgtataatatttgcccatggtgaaaacgggggcgaagaagttgtccatattggccacgtttaaatcaaaactggtgaaactcacccagggattggctgacacgaaaaacatattctcaataaaccctttagggaaataggccaggttttcaccgtaacacgccacatcttgcgaatatatgtgtagaaactgccggaaatcgtcgtggtattcactccagagggatgaaaacgtttcagtttgctcatggaaaacggtgtaacaagggtgaacactatcccatatcaccagctcaccatccttcattgccatacgaaattccggatgagcattcatcaggcgggcaagaatgtgaataaaggccggataaaacttgtgcttatttttctttacggtctttaaaaaggccgtaatatccagctgaacggtctggttataggtacattgagcaactgactgaaatgcctcaaaatgttctttacgatgccattgggatatatcaacggtggtatatccagtgatttttttctccattttagcttccttagctcctgaaaatctcgataactcaaaaaatacgcccggtagtgatcttatttcattatggtgaaagttggaacctcttacgtgccgatcatttccataggctccgcccccctgacgagcatcacaaaaatcgacgctcaagtcagaggtggcgaaacccgacaggactataaagataccaggcgtttccccctggaagctccctcgtgcgctctcctgttccgaccctgccgcttaccggatacctgtccgcctttctcccttcgggaagcgtggcgctttctcatagctcacgctgtaggtatctcagttcggtgtaggtcgttcgctccaagctgggctgtgtgcacgaaccccccgttcagcccgaccgctgcgccttatccggtaactatcgtcttgagtccaacccggtaagacacgacttatcgccactggcagcagccactggtaacaggattagcagagcgaggtatgtaggcggtgctacagagttcttgaagtggtggcctaactacggctacactagaaggacagtatttggtatctgcgctctgctgaagccagttaccttcggaaaaagagttggtagctcttgatccggcaaacaaaccaccgctggtagcggtggtttttttgtttgcaagcagcagattacgcgcagaaaaaaaggatctcaagtaaaacgacggccagtagtcaaaagcctccgaccggaggcttttgacttggttcaggtggagtggcggccgcgacttgGTCTC", true} - newFragments, err := clone.CutWithEnzymeByName(plasmid1, true, "BsaI") + plasmid1 := 
Part{"AAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCCGAGaccaagtcgcggccgcgaggtgtcaatcgtcggagtagggataacagggtaatccgctgagcaataactagcataaccccttggggcctctaaacgggtcttgaggggttttttgcatggtcatagctgtttcctgttacgccccgccctgccactcgtcgcagtactgttgtaattcattaagcattctgccgacatggaagccatcacaaacggcatgatgaacctgaatcgccagcggcatcagcaccttgtcgccttgcgtataatatttgcccatggtgaaaacgggggcgaagaagttgtccatattggccacgtttaaatcaaaactggtgaaactcacccagggattggctgacacgaaaaacatattctcaataaaccctttagggaaataggccaggttttcaccgtaacacgccacatcttgcgaatatatgtgtagaaactgccggaaatcgtcgtggtattcactccagagggatgaaaacgtttcagtttgctcatggaaaacggtgtaacaagggtgaacactatcccatatcaccagctcaccatccttcattgccatacgaaattccggatgagcattcatcaggcgggcaagaatgtgaataaaggccggataaaacttgtgcttatttttctttacggtctttaaaaaggccgtaatatccagctgaacggtctggttataggtacattgagcaactgactgaaatgcctcaaaatgttctttacgatgccattgggatatatcaacggtggtatatccagtgatttttttctccattttagcttccttagctcctgaaaatctcgataactcaaaaaatacgcccggtagtgatcttatttcattatggtgaaagttggaacctcttacgtgccgatcatttccataggctccgcccccctgacgagcatcacaaaaatcgacgctcaagtcagaggtggcgaaacccgacaggactataaagataccaggcgtttccccctggaagctccctcgtgcgctctcctgttccgaccctgccgcttaccggatacctgtccgcctttctcccttcgggaagcgtggcgctttctcatagctcacgctgtaggtatctcagttcggtgtaggtcgttcgctccaagctgggctgtgtgcacgaaccccccgttcagcccgaccgctgcgccttatccggtaactatcgtcttgagtccaacccggtaagacacgacttatcgccactggcagcagccactggtaacaggattagcagagcgaggtatgtaggcggtgctacagagttcttgaagtggtggcctaactacggctacactagaaggacagtatttggtatctgcgctctgctgaagccagttaccttcggaaaaagagttggtagctcttgatccggcaaacaaaccaccgctggtagcggtggtttttttgtttgcaagcagcagattacgcgcagaaaaaaaggatctcaagtaaaacgacggccagtagtcaaaagcctccgaccggaggcttttgacttggttcaggtggagtggcggccgcgacttgGTCTC", true} + newFragments, err := enzymeManager.CutWithEnzymeByName(plasmid1, true, "BsaI") if err != nil { t.Errorf("Failed to cut: %s", err) } @@ -198,3 +226,21 @@ func TestCircularCutRegression(t 
*testing.T) { t.Errorf("Expected 1 new fragment, got: %d", len(newFragments)) } } + +func benchmarkGoldenGate(b *testing.B, enzymeManager EnzymeManager, parts []Part) { + bbsI, err := enzymeManager.GetEnzymeByName("BbsI") + if err != nil { + b.Fatalf("Error when getting Enzyme. Got error: %s", err) + } + for n := 0; n < b.N; n++ { + _, _ = GoldenGate(parts, bbsI) + } +} + +func BenchmarkGoldenGate3Parts(b *testing.B) { + enzymeManager := NewEnzymeManager(GetBaseRestrictionEnzymes()) + fragment1 := Part{"GAAGTGCCATTCCGCCTGACCTGAAGACCAGGAGAAACACGTGGCAAACATTCCGGTCTCAAATGGAAAAGAGCAACGAAACCAACGGCTACCTTGACAGCGCTCAAGCCGGCCCTGCAGCTGGCCCGGGCGCTCCGGGTACCGCCGCGGGTCGTGCACGTCGTTGCGCGGGCTTCCTGCGGCGCCAAGCGCTGGTGCTGCTCACGGTGTCTGGTGTTCTGGCAGGCGCCGGTTTGGGCGCGGCACTGCGTGGGCTCAGCCTGAGCCGCACCCAGGTCACCTACCTGGCCTTCCCCGGCGAGATGCTGCTCCGCATGCTGCGCATGATCATCCTGCCGCTGGTGGTCTGCAGCCTGGTGTCGGGCGCCGCCTCCCTCGATGCCAGCTGCCTCGGGCGTCTGGGCGGTATCGCTGTCGCCTACTTTGGCCTCACCACACTGAGTGCCTCGGCGCTCGCCGTGGCCTTGGCGTTCATCATCAAGCCAGGATCCGGTGCGCAGACCCTTCAGTCCAGCGACCTGGGGCTGGAGGACTCGGGGCCTCCTCCTGTCCCCAAAGAAACGGTGGACTCTTTCCTCGACCTGGCCAGAAACCTGTTTCCCTCCAATCTTGTGGTTGCAGCTTTCCGTACGTATGCAACCGATTATAAAGTCGTGACCCAGAACAGCAGCTCTGGAAATGTAACCCATGAAAAGATCCCCATAGGCACTGAGATAGAAGGGATGAACATTTTAGGATTGGTCCTGTTTGCTCTGGTGTTAGGAGTGGCCTTAAAGAAACTAGGCTCCGAAGGAGAGGACCTCATCCGTTTCTTCAATTCCCTCAACGAGGCGACGATGGTGCTGGTGTCCTGGATTATGTGGTACGCGTCTTCAGGCTAGGTGGAGGCTCAGTG", false} + fragment2 := 
Part{"GAAGTGCCATTCCGCCTGACCTGAAGACCAGTACGTACCTGTGGGCATCATGTTCCTTGTTGGAAGCAAGATCGTGGAAATGAAAGACATCATCGTGCTGGTGACCAGCCTGGGGAAATACATCTTCGCATCTATATTGGGCCACGTCATTCATGGTGGTATCGTCCTGCCGCTGATTTATTTTGTTTTCACACGAAAAAACCCATTCAGATTCCTCCTGGGCCTCCTCGCCCCATTTGCGACAGCATTTGCTACGTGCTCCAGCTCAGCGACCCTTCCCTCTATGATGAAGTGCATTGAAGAGAACAATGGTGTGGACAAGAGGATCTCCAGGTTTATTCTCCCCATCGGGGCCACCGTGAACATGGACGGAGCAGCCATCTTCCAGTGTGTGGCCGCGGTGTTCATTGCGCAACTCAACAACGTAGAGCTCAACGCAGGACAGATTTTCACCATTCTAGTGACTGCCACAGCGTCCAGTGTTGGAGCAGCAGGCGTGCCAGCTGGAGGGGTCCTCACCATTGCCATTATCCTGGAGGCCATTGGGCTGCCTACTCATGATCTGCCTCTGATCCTGGCTGTGGACTGGATTGTGGACCGGACCACCACGGTGGTGAATGTGGAAGGGGATGCCCTGGGTGCAGGCATTCTCCACCACCTGAATCAGAAGGCAACAAAGAAAGGCGAGCAGGAACTTGCTGAGGTGAAAGTGGAAGCCATCCCCAACTGCAAGTCTGAGGAGGAAACCTCGCCCCTGGTGACACACCAGAACCCCGCTGGCCCCGTGGCCAGTGCCCCAGAACTGGAATCCAAGGAGTCGGTTCTGTGAAGAGCTTAGAGACCGACGACTGCCTAAGGACATTCGCTGCGTCTTCAGGCTAGGTGGAGGCTCAGTG", false} + + benchmarkGoldenGate(b, enzymeManager, []Part{fragment1, fragment2, popen}) +} diff --git a/clone/example_test.go b/clone/example_test.go new file mode 100644 index 0000000..85d12dc --- /dev/null +++ b/clone/example_test.go @@ -0,0 +1,31 @@ +package clone_test + +import ( + "fmt" + "log" + + "github.com/TimothyStiles/poly/clone" + "github.com/TimothyStiles/poly/seqhash" +) + +func ExampleGoldenGate() { + enzymeManager := clone.NewEnzymeManager(clone.GetBaseRestrictionEnzymes()) + // Fragment 1 has a palindrome at its start. This isn't very common but + // can occur. These two fragments are real DNA fragments used in the + // FreeGenes Project. 
They are used because they were on my computer + // - Keoni + fragment1 := clone.Part{"GAAGTGCCATTCCGCCTGACCTGAAGACCAGGAGAAACACGTGGCAAACATTCCGGTCTCAAATGGAAAAGAGCAACGAAACCAACGGCTACCTTGACAGCGCTCAAGCCGGCCCTGCAGCTGGCCCGGGCGCTCCGGGTACCGCCGCGGGTCGTGCACGTCGTTGCGCGGGCTTCCTGCGGCGCCAAGCGCTGGTGCTGCTCACGGTGTCTGGTGTTCTGGCAGGCGCCGGTTTGGGCGCGGCACTGCGTGGGCTCAGCCTGAGCCGCACCCAGGTCACCTACCTGGCCTTCCCCGGCGAGATGCTGCTCCGCATGCTGCGCATGATCATCCTGCCGCTGGTGGTCTGCAGCCTGGTGTCGGGCGCCGCCTCCCTCGATGCCAGCTGCCTCGGGCGTCTGGGCGGTATCGCTGTCGCCTACTTTGGCCTCACCACACTGAGTGCCTCGGCGCTCGCCGTGGCCTTGGCGTTCATCATCAAGCCAGGATCCGGTGCGCAGACCCTTCAGTCCAGCGACCTGGGGCTGGAGGACTCGGGGCCTCCTCCTGTCCCCAAAGAAACGGTGGACTCTTTCCTCGACCTGGCCAGAAACCTGTTTCCCTCCAATCTTGTGGTTGCAGCTTTCCGTACGTATGCAACCGATTATAAAGTCGTGACCCAGAACAGCAGCTCTGGAAATGTAACCCATGAAAAGATCCCCATAGGCACTGAGATAGAAGGGATGAACATTTTAGGATTGGTCCTGTTTGCTCTGGTGTTAGGAGTGGCCTTAAAGAAACTAGGCTCCGAAGGAGAGGACCTCATCCGTTTCTTCAATTCCCTCAACGAGGCGACGATGGTGCTGGTGTCCTGGATTATGTGGTACGCGTCTTCAGGCTAGGTGGAGGCTCAGTG", false} + fragment2 := clone.Part{"GAAGTGCCATTCCGCCTGACCTGAAGACCAGTACGTACCTGTGGGCATCATGTTCCTTGTTGGAAGCAAGATCGTGGAAATGAAAGACATCATCGTGCTGGTGACCAGCCTGGGGAAATACATCTTCGCATCTATATTGGGCCACGTCATTCATGGTGGTATCGTCCTGCCGCTGATTTATTTTGTTTTCACACGAAAAAACCCATTCAGATTCCTCCTGGGCCTCCTCGCCCCATTTGCGACAGCATTTGCTACGTGCTCCAGCTCAGCGACCCTTCCCTCTATGATGAAGTGCATTGAAGAGAACAATGGTGTGGACAAGAGGATCTCCAGGTTTATTCTCCCCATCGGGGCCACCGTGAACATGGACGGAGCAGCCATCTTCCAGTGTGTGGCCGCGGTGTTCATTGCGCAACTCAACAACGTAGAGCTCAACGCAGGACAGATTTTCACCATTCTAGTGACTGCCACAGCGTCCAGTGTTGGAGCAGCAGGCGTGCCAGCTGGAGGGGTCCTCACCATTGCCATTATCCTGGAGGCCATTGGGCTGCCTACTCATGATCTGCCTCTGATCCTGGCTGTGGACTGGATTGTGGACCGGACCACCACGGTGGTGAATGTGGAAGGGGATGCCCTGGGTGCAGGCATTCTCCACCACCTGAATCAGAAGGCAACAAAGAAAGGCGAGCAGGAACTTGCTGAGGTGAAAGTGGAAGCCATCCCCAACTGCAAGTCTGAGGAGGAAACCTCGCCCCTGGTGACACACCAGAACCCCGCTGGCCCCGTGGCCAGTGCCCCAGAACTGGAATCCAAGGAGTCGGTTCTGTGAAGAGCTTAGAGACCGACGACTGCCTAAGGACATTCGCTGCGTCTTCAGGCTAGGTGGAGGCTCAGTG", false} + + // pOpen plasmid series 
(https://stanford.freegenes.org/collections/open-genes/products/open-plasmids#description). I use it for essentially all my cloning. -Keoni + var popen = clone.Part{"TAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGGCCTACTATTAGCAACAACGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGAACCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACCTGCACCAGTCAGTAAAACGACGGCCAGTAGTCAAAAGCCTCCGACCGGAGGCTTTTGACTTGGTTCAGGTGGAGTGGGAGTAgtcttcGCcatcgCtACTAAAagccagataacagtatgcgtatttgcgcgctgatttttgcggtataagaatatatactgatatgtatacccgaagtatgtcaaaaagaggtatgctatgaagcagcgtattacagtgacagttgacagcgacagctatcagttgctcaaggcatatatgatgtcaatatctccggtctggtaagcacaaccatgcagaatgaagcccgtcgtctgcgtgccgaacgctggaaagcggaaaatcaggaagggatggctgaggtcgcccggtttattgaaa
tgaacggctcttttgctgacgagaacagggGCTGGTGAAATGCAGTTTAAGGTTTACACCTATAAAAGAGAGAGCCGTTATCGTCTGTTTGTGGATGTACAGAGTGATATTATTGACACGCCCGGGCGACGGATGGTGATCCCCCTGGCCAGTGCACGTCTGCTGTCAGATAAAGTCTCCCGTGAACTTTACCCGGTGGTGCATATCGGGGATGAAAGCTGGCGCATGATGACCACCGATATGGCCAGTGTGCCGGTCTCCGTTATCGGGGAAGAAGTGGCTGATCTCAGCCACCGCGAAAATGACATCAAAAACGCCATTAACCTGATGTTCTGGGGAATATAAATGTCAGGCTCCCTTATACACAGgcgatgttgaagaccaCGCTGAGGTGTCAATCGTCGGAGCCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCATGGTCATAGCTGTTTCCTGAGAGCTTGGCAGGTGATGACACACATTAACAAATTTCGTGAGGAGTCTCCAGAAGAATGCCATTAATTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGG", true} + + bbsI, err := enzymeManager.GetEnzymeByName("BbsI") + if err != nil { + log.Fatalf("Something went wrong when trying to get the enzyme. Got error: %s", err) + } + Clones, _ := clone.GoldenGate([]clone.Part{fragment1, fragment2, popen}, bbsI) + + fmt.Println(seqhash.RotateSequence(Clones[0])) + // Output: 
AAAAAAAGGATCTCAAGAAGGCCTACTATTAGCAACAACGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGAACCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACCTGCACCAGTCAGTAAAACGACGGCCAGTAGTCAAAAGCCTCCGACCGGAGGCTTTTGACTTGGTTCAGGTGGAGTGGGAGAAACACGTGGCAAACATTCCGGTCTCAAATGGAAAAGAGCAACGAAACCAACGGCTACCTTGACAGCGCTCAAGCCGGCCCTGCAGCTGGCCCGGGCGCTCCGGGTACCGCCGCGGGTCGTGCACGTCGTTGCGCGGGCTTCCTGCGGCGCCAAGCGCTGGTGCTGCTCACGGTGTCTGGTGTTCTGGCAGGCGCCGGTTTGGGCGCGGCACTGCGTGGGCTCAGCCTGAGCCGCACCCAGGTCACCTACCTGGCCTTCCCCGGCGAGATGCTGCTCCGCATGCTGCGCATGATCATCCTGCCGCTGGTGGTCTGCAGCCTGGTGTCGGGCGCCGCCTCCCTCGATGCCAGCTGCCTCGGGCGTCTGGGCGGTATCGCTGTCGCCTACTTTGGCCTCACCACACTGAGTGCCTCGGCGCTCGCCGTGGCCTTGGCGTTCATCATCAAGCCAGGATCCGGTGCGCAGACCCTTCAGTCCAGCGACCTGGGGCTGGAGGACTCGGGGCCTCCTCCTGTCCCCAAAGAAACGGTGGACTCTTTCCTCGACCTGGCCAGAAACCTGTTTCCCTCCAATCTTGTGGTTGCAGCTTTCCGTACGTATGCAACCGATTATAAAGTCGTGACCCAGAACAGCAGCTCTGGAAATGTAACCCATGAAAAGATCCCCATAGGCACTGAGATAGAAGGGATGAACATTTTAGGATTGGTCCTGTTTGCTCTGGTGTTAGGAGTGGCCTTAAAGAA
ACTAGGCTCCGAAGGAGAGGACCTCATCCGTTTCTTCAATTCCCTCAACGAGGCGACGATGGTGCTGGTGTCCTGGATTATGTGGTACGTACCTGTGGGCATCATGTTCCTTGTTGGAAGCAAGATCGTGGAAATGAAAGACATCATCGTGCTGGTGACCAGCCTGGGGAAATACATCTTCGCATCTATATTGGGCCACGTCATTCATGGTGGTATCGTCCTGCCGCTGATTTATTTTGTTTTCACACGAAAAAACCCATTCAGATTCCTCCTGGGCCTCCTCGCCCCATTTGCGACAGCATTTGCTACGTGCTCCAGCTCAGCGACCCTTCCCTCTATGATGAAGTGCATTGAAGAGAACAATGGTGTGGACAAGAGGATCTCCAGGTTTATTCTCCCCATCGGGGCCACCGTGAACATGGACGGAGCAGCCATCTTCCAGTGTGTGGCCGCGGTGTTCATTGCGCAACTCAACAACGTAGAGCTCAACGCAGGACAGATTTTCACCATTCTAGTGACTGCCACAGCGTCCAGTGTTGGAGCAGCAGGCGTGCCAGCTGGAGGGGTCCTCACCATTGCCATTATCCTGGAGGCCATTGGGCTGCCTACTCATGATCTGCCTCTGATCCTGGCTGTGGACTGGATTGTGGACCGGACCACCACGGTGGTGAATGTGGAAGGGGATGCCCTGGGTGCAGGCATTCTCCACCACCTGAATCAGAAGGCAACAAAGAAAGGCGAGCAGGAACTTGCTGAGGTGAAAGTGGAAGCCATCCCCAACTGCAAGTCTGAGGAGGAAACCTCGCCCCTGGTGACACACCAGAACCCCGCTGGCCCCGTGGCCAGTGCCCCAGAACTGGAATCCAAGGAGTCGGTTCTGTGAAGAGCTTAGAGACCGACGACTGCCTAAGGACATTCGCTGAGGTGTCAATCGTCGGAGCCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCATGGTCATAGCTGTTTCCTGAGAGCTTGGCAGGTGATGACACACATTAACAAATTTCGTGAGGAGTCTCCAGAAGAATGCCATTAATTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAG +} diff --git a/data/phix174.gb b/data/phix174.gb new file mode 100644 index 0000000..9fa1ed2 --- /dev/null +++ b/data/phix174.gb @@ -0,0 +1,267 @@ +LOCUS CP004084 5386 bp DNA circular PHG 04-MAR-2015 +DEFINITION Enterobacteria phage phiX174, complete genome. 
+ACCESSION CP004084 +VERSION CP004084.1 +DBLINK BioProject: PRJNA182589 + BioSample: SAMN03379850 +KEYWORDS . +SOURCE Escherichia virus phiX174 + ORGANISM Escherichia virus phiX174 + Viruses; Monodnaviria; Sangervirae; Phixviricota; + Malgrandaviricetes; Petitvirales; Microviridae; Bullavirinae; + Sinsheimervirus. +REFERENCE 1 (bases 1 to 5386) + AUTHORS Tian,B. and Moran,N.A. + TITLE Direct Submission + JOURNAL Submitted (21-JAN-2013) Department of Integrative Biology, + University of Texas, 2506 Speedway A5000, Austin, TX 78712, USA +COMMENT Source DNA/bacteria from Nancy Moran, University of Texas at + Austin. +FEATURES Location/Qualifiers + source 1..5386 + /organism="Escherichia virus phiX174" + /mol_type="genomic DNA" + /strain="bta3-1" + /isolation_source="honeybee gut" + /host="Enterobacteriaceae bacterium bta3-1" + /db_xref="taxon:10847" + /country="USA" + gene join(3981..5386,1..136) + /locus_tag="F652_4273" + CDS join(3981..5386,1..136) + /locus_tag="F652_4273" + /note="DNA replication initiation protein" + /codon_start=1 + /transl_table=11 + /product="protein A" + /protein_id="AJR02264.1" + /translation="MVRSYYPSECHADYFDFERIEALKPAIEACGISTLSQSPMLGFH + KQMDNRIKLLEEILSFRMQGVEFDNGDMYVDGHKAASDVRDEFVSVTEKLMDELAQCY + NVLPQLDINNTIDHRPEGDEKWFLENEKTVTQFCRKLAAERPLKDIRDEYNYPKKKGI + KDECSRLLEASTMKSRRGFAIQRLMNAMRQAHADGWFIVFDTLTLADDRLEAFYDNPN + ALRDYFRDIGRMVLAAEGRKANDSHADCYQYFCVPEYGTANGRLHFHAVHFMRTLPTG + SVDPNFGRRVRNRRQLNSLQNTWPYGYSMPIAVRYTQDAFSRSGWLWPVDAKGEPLKA + TSYMAVGFYVAKYVNKKSDMDLAAKGLGAKEWNNSLKTKLSLLPKKLFRIRMSRNFGM + KMLTMTNLSTECLIQLTKLGYDATPFNQILKQNAKREMRLRLGKVTVADVLAAQPVTT + NLLKFMRASIKMIGVSNLQSFIASMTQKLTLSDISDESKNYLDKAGITTACLRIKSKW + TAGGK" + gene join(4497..5386,1..136) + /locus_tag="F652_4274" + CDS join(4497..5386,1..136) + /locus_tag="F652_4274" + /note="replication initiation protein" + /codon_start=1 + /transl_table=11 + /product="protein A*" + /protein_id="AJR02262.1" + /translation="MKSRRGFAIQRLMNAMRQAHADGWFIVFDTLTLADDRLEAFYDN + 
PNALRDYFRDIGRMVLAAEGRKANDSHADCYQYFCVPEYGTANGRLHFHAVHFMRTLP + TGSVDPNFGRRVRNRRQLNSLQNTWPYGYSMPIAVRYTQDAFSRSGWLWPVDAKGEPL + KATSYMAVGFYVAKYVNKKSDMDLAAKGLGAKEWNNSLKTKLSLLPKKLFRIRMSRNF + GMKMLTMTNLSTECLIQLTKLGYDATPFNQILKQNAKREMRLRLGKVTVADVLAAQPV + TTNLLKFMRASIKMIGVSNLQSFIASMTQKLTLSDISDESKNYLDKAGITTACLRIKS + KWTAGGK" + gene join(5075..5386,1..51) + /locus_tag="F652_4275" + CDS join(5075..5386,1..51) + /locus_tag="F652_4275" + /note="internal scaffolding protein" + /codon_start=1 + /transl_table=11 + /product="protein B" + /protein_id="AJR02263.1" + /translation="MEQLTKNQAVATSQEAVQNQNEPQLRDENAHNDKSVHGVLNPTY + QAGLRRDAVQPDIEAERKKRDEIEAGKSYCSRRFGGATCDDKSAQIYARFDKNDWRIQ + PAEFYRFHDAEVNTFGYF" + gene 51..221 + /locus_tag="F652_4278" + CDS 51..221 + /locus_tag="F652_4278" + /note="unknown protein" + /codon_start=1 + /transl_table=11 + /product="protein K" + /protein_id="AJR02265.1" + /translation="MSRKIILIKQELLLLVYELNRSGLLAENEKIRPILAQLEKLLLC + DLSPSTNDSVKN" + gene 133..393 + /locus_tag="F652_4283" + CDS 133..393 + /locus_tag="F652_4283" + /note="DNA maturation" + /codon_start=1 + /transl_table=11 + /product="protein C" + /protein_id="AJR02266.1" + /translation="MRKFDLSLRSSRSSYFATFRHQLTILSKTDALDEEKWLNMLGTF + VKDWFRYESHFVHGRDSLVDILKERGLLSESDAVQPLIGKKS" + gene 390..848 + /locus_tag="F652_4288" + CDS 390..848 + /locus_tag="F652_4288" + /note="external scaffolding protein" + /codon_start=1 + /transl_table=11 + /product="protein D" + /protein_id="AJR02267.1" + /translation="MSQVTEQSVRFQTALASIKLIQASAVLDLTEDDFDFLTSNKVWI + ATDRSRARRCVEACVYGTLDFVGYPRFPAPVEFIAAVIAYYVHPVNIQTACLIMEGAE + FTENIINGVERPVKAAELFAFTLRVRAGNTDVLTDAEENVRQKLRAEGVM" + gene 643..843 + /locus_tag="F652_4293" + CDS 643..843 + /locus_tag="F652_4293" + /note="cell lysis protein" + /codon_start=1 + /transl_table=11 + /product="protein E" + /protein_id="AJR02268.1" + /translation="MFIPSTFKRPVSSWKALNLRKTLLMASSVRLKPLNCSRLPCVYA + QETLTFLLTQKKTCVKNYVQKE" + gene 848..964 + /locus_tag="F652_4298" + CDS 848..964 + 
/locus_tag="F652_4298" + /note="core protein" + /codon_start=1 + /transl_table=11 + /product="protein J" + /protein_id="AJR02269.1" + /translation="MSKGKKRSGARPGRPQPLRGTKGKRKGARLWYVGGQQF" + gene 1001..2284 + /locus_tag="F652_4303" + CDS 1001..2284 + /locus_tag="F652_4303" + /note="capsid protein" + /codon_start=1 + /transl_table=11 + /product="protein F" + /protein_id="AJR02270.1" + /translation="MSNIQTGAERMPHDLSHLGFLAGQIGRLITISTTPVIAGDSFEM + DAVGALRLSPLRRGLAIDSTVDIFTFYVPHRHVYGEQWIKFMKDGVNATPLPTVNTTG + YIDHAAFLGTINPDTNKIPKHLFQGYLNIYNNYFKAPWMPDRTEANPNELNQDDARYG + FRCCHLKNIWTAPLPPETELSRQMTTSTTSIDIMGLQAAYANLHTDQERDYFMQRYHD + VISSFGGKTSYDADNRPLLVMRSNLWASGYDVDGTDQTSLGQFSGRVQQTYKHSVPRF + FVPEHGTMFTLALVRFPPTATKEIQYLNAKGALTYTDIAGDPVLYGNLPPREISMKDV + FRSGDSSKKFKIAEGQWYRYAPSYVSPAYHLLEGFPFIQEPPSGDLQERVLIRHHDYD + QCFQSVQLLQWNSQVKFNVTVYRNLPTTRDSIMTS" + gene 2395..2922 + /locus_tag="F652_4308" + CDS 2395..2922 + /locus_tag="F652_4308" + /note="major spike protein" + /codon_start=1 + /transl_table=11 + /product="protein G" + /protein_id="AJR02271.1" + /translation="MFQTFISRHNSNFFSDKLVLTSVTPASSAPVLQTPKATSSTLYF + DSLTVNAGNGGFLHCIQMDTSVNAANQVVSVGADIAFDADPKFFACLVRFESSSVPTT + LPTAYDVYPLDGRHDGGYYTVKDCVTIDVLPRTPGNNVYVGFMVWSNFTATKCRGLVS + LNQVIKEIICLQPLK" + gene 2931..3917 + /locus_tag="F652_4313" + CDS 2931..3917 + /locus_tag="F652_4313" + /note="minor spike protein" + /codon_start=1 + /transl_table=11 + /product="protein H" + /protein_id="AJR02272.1" + /translation="MFGAIAGGIASALAGGAMSKLFGGGQKAASGGIQGDVLATDNNT + VGMGDAGIKSAIQGSNVPNPDEAVPSFVSGAMAKAGKGLLEGTLQAGTSAVSDKLLDL + VGLGGKSAADKGKDTRDYLAAAFPELNAWERAGADASSAGMVDAGFENQKELTKMQLD + NQKEIAEMQNETQKEIAGIQSATSRQNTKDQVYAQNEMLAYQQKESTARVASIMENTN + LSKQQQVSEIMRQMLTQAQTAGQYFTNDQIKEMTRKVSAEVDLVHQQTQNQRYGSSHI + GATAKDISNVVTDAASGVVDIFHGIDKAVADTWNNFWKDGKADGIGSNLSRK" +ORIGIN + 1 gagttttatc gcttccatga cgcagaagtt aacactttcg gatatttctg atgagtcgaa + 61 aaattatctt gataaagcag gaattactac tgcttgttta cgaattaaat cgaagtggac + 121 tgctggcgga aaatgagaaa 
attcgaccta tccttgcgca gctcgagaag ctcttacttt + 181 gcgacctttc gccatcaact aacgattctg tcaaaaactg acgcgttgga tgaggagaag + 241 tggcttaata tgcttggcac gttcgtcaag gactggttta gatatgagtc acattttgtt + 301 catggtagag attctcttgt tgacatttta aaagagcgtg gattactatc tgagtccgat + 361 gctgttcaac cactaatagg taagaaatca tgagtcaagt tactgaacaa tccgtacgtt + 421 tccagaccgc tttggcctct attaagctca ttcaggcttc tgccgttttg gatttaaccg + 481 aagatgattt cgattttctg acgagtaaca aagtttggat tgctactgac cgctctcgtg + 541 ctcgtcgctg cgttgaggct tgcgtttatg gtacgctgga ctttgtagga taccctcgct + 601 ttcctgctcc tgttgagttt attgctgccg tcattgctta ttatgttcat cccgtcaaca + 661 ttcaaacggc ctgtctcatc atggaaggcg ctgaatttac ggaaaacatt attaatggcg + 721 tcgagcgtcc ggttaaagcc gctgaattgt tcgcgtttac cttgcgtgta cgcgcaggaa + 781 acactgacgt tcttactgac gcagaagaaa acgtgcgtca aaaattacgt gcagaaggag + 841 tgatgtaatg tctaaaggta aaaaacgttc tggcgctcgc cctggtcgtc cgcagccgtt + 901 gcgaggtact aaaggcaagc gtaaaggcgc tcgtctttgg tatgtaggtg gtcaacaatt + 961 ttaattgcag gggcttcggc cccttacttg aggataaatt atgtctaata ttcaaactgg + 1021 cgccgagcgt atgccgcatg acctttccca tcttggcttc cttgctggtc agattggtcg + 1081 tcttattacc atttcaacta ctccggttat cgctggcgac tccttcgaga tggacgccgt + 1141 tggcgctctc cgtctttctc cattgcgtcg tggccttgct attgactcta ctgtagacat + 1201 ttttactttt tatgtccctc atcgtcacgt ttatggtgaa cagtggatta agttcatgaa + 1261 ggatggtgtt aatgccactc ctctcccgac tgttaacact actggttata ttgaccatgc + 1321 cgcttttctt ggcacgatta accctgatac caataaaatc cctaagcatt tgtttcaggg + 1381 ttatttgaat atctataaca actattttaa agcgccgtgg atgcctgacc gtaccgaggc + 1441 taaccctaat gagcttaatc aagatgatgc tcgttatggt ttccgttgct gccatctcaa + 1501 aaacatttgg actgctccgc ttcctcctga gactgagctt tctcgccaaa tgacgacttc + 1561 taccacatct attgacatta tgggtctgca agctgcttat gctaatttgc atactgacca + 1621 agaacgtgat tacttcatgc agcgttacca tgatgttatt tcttcatttg gaggtaaaac + 1681 ctcttatgac gctgacaacc gtcctttact tgtcatgcgc tctaatctct gggcatctgg + 1741 ctatgatgtt gatggaactg accaaacgtc gttaggccag ttttctggtc 
gtgttcaaca + 1801 gacctataaa cattctgtgc cgcgtttctt tgttcctgag catggcacta tgtttactct + 1861 tgcgcttgtt cgttttccgc ctactgcgac taaagagatt cagtacctta acgctaaagg + 1921 tgctttgact tataccgata ttgctggcga ccctgttttg tatggcaact tgccgccgcg + 1981 tgaaatttct atgaaggatg ttttccgttc tggtgattcg tctaagaagt ttaagattgc + 2041 tgagggtcag tggtatcgtt atgcgccttc gtatgtttct cctgcttatc accttcttga + 2101 aggcttccca ttcattcagg aaccgccttc tggtgatttg caagaacgcg tacttattcg + 2161 ccaccatgat tatgaccagt gtttccagtc cgttcagttg ttgcagtgga atagtcaggt + 2221 taaatttaat gtgaccgttt atcgcaatct gccgaccact cgcgattcaa tcatgacttc + 2281 gtgataaaag attgagtgtg aggttataac gccgaagcgg taaaaatttt aatttttgcc + 2341 gctgaggggt tgaccaagcg aagcgcggta ggttttctgc ttaggagttt aatcatgttt + 2401 cagactttta tttctcgcca taattcaaac tttttttctg ataagctggt tctcacttct + 2461 gttactccag cttcttcggc acctgtttta cagacaccta aagctacatc gtcaacgtta + 2521 tattttgata gtttgacggt taatgctggt aatggtggtt ttcttcattg cattcagatg + 2581 gatacatctg tcaacgccgc taatcaggtt gtttctgttg gtgctgatat tgcttttgat + 2641 gccgacccta aattttttgc ctgtttggtt cgctttgagt cttcttcggt tccgactacc + 2701 ctcccgactg cctatgatgt ttatcctttg gatggtcgcc atgatggtgg ttattatacc + 2761 gtcaaggact gtgtgactat tgacgtcctt ccccgtacgc cgggcaataa tgtttatgtt + 2821 ggtttcatgg tttggtctaa ctttaccgct actaaatgcc gcggattggt ttcgctgaat + 2881 caggttatta aagagattat ttgtctccag ccacttaagt gaggtgattt atgtttggtg + 2941 ctattgctgg cggtattgct tctgctcttg ctggtggcgc catgtctaaa ttgtttggag + 3001 gcggtcaaaa agccgcctcc ggtggcattc aaggtgatgt gcttgctacc gataacaata + 3061 ctgtaggcat gggtgatgct ggtattaaat ctgccattca aggctctaat gttcctaacc + 3121 ctgatgaggc cgtccctagt tttgtttctg gtgctatggc taaagctggt aaaggacttc + 3181 ttgaaggtac gttgcaggct ggcacttctg ccgtttctga taagttgctt gatttggttg + 3241 gacttggtgg caagtctgcc gctgataaag gaaaggatac tcgtgattat cttgctgctg + 3301 catttcctga gcttaatgct tgggagcgtg ctggtgctga tgcttcctct gctggtatgg + 3361 ttgacgccgg atttgagaat caaaaagagc ttactaaaat gcaactggac aatcagaaag + 3421 agattgccga 
gatgcaaaat gagactcaaa aagagattgc tggcattcag tcggcgactt + 3481 cacgccagaa tacgaaagac caggtatatg cacaaaatga gatgcttgct tatcaacaga + 3541 aggagtctac tgctcgcgtt gcgtctatta tggaaaacac caatctttcc aagcaacagc + 3601 aggtttccga gattatgcgc caaatgctta ctcaagctca aacggctggt cagtatttta + 3661 ccaatgacca aatcaaagaa atgactcgca aggttagtgc tgaggttgac ttagttcatc + 3721 agcaaacgca gaatcagcgg tatggctctt ctcatattgg cgctactgca aaggatattt + 3781 ctaatgtcgt cactgatgct gcttctggtg tggttgatat ttttcatggt attgataaag + 3841 ctgttgccga tacttggaac aatttctgga aagacggtaa agctgatggt attggctcta + 3901 atttgtctag gaaataaccg tcaggattga caccctccca attgtatgtt ttcatgcctc + 3961 caaatcttgg aggctttttt atggttcgtt cttattaccc ttctgaatgt cacgctgatt + 4021 attttgactt tgagcgtatc gaggctctta aacctgctat tgaggcttgt ggcatttcta + 4081 ctctttctca atccccaatg cttggcttcc ataagcagat ggataaccgc atcaagctct + 4141 tggaagagat tctgtctttt cgtatgcagg gcgttgagtt cgataatggt gatatgtatg + 4201 ttgacggcca taaggctgct tctgacgttc gtgatgagtt tgtatctgtt actgagaagt + 4261 taatggatga attggcacaa tgctacaatg tgctccccca acttgatatt aataacacta + 4321 tagaccaccg ccccgaaggg gacgaaaaat ggtttttaga gaacgagaag acggttacgc + 4381 agttttgccg caagctggct gctgaacgcc ctcttaagga tattcgcgat gagtataatt + 4441 accccaaaaa gaaaggtatt aaggatgagt gttcaagatt gctggaggcc tccactatga + 4501 aatcgcgtag aggctttgct attcagcgtt tgatgaatgc aatgcgacag gctcatgctg + 4561 atggttggtt tatcgttttt gacactctca cgttggctga cgaccgatta gaggcgtttt + 4621 atgataatcc caatgctttg cgtgactatt ttcgtgatat tggtcgtatg gttcttgctg + 4681 ccgagggtcg caaggctaat gattcacacg ccgactgcta tcagtatttt tgtgtgcctg + 4741 agtatggtac agctaatggc cgtcttcatt tccatgcggt gcactttatg cggacacttc + 4801 ctacaggtag cgttgaccct aattttggtc gtcgggtacg caatcgccgc cagttaaata + 4861 gcttgcaaaa tacgtggcct tatggttaca gtatgcccat cgcagttcgc tacacgcagg + 4921 acgctttttc acgttctggt tggttgtggc ctgttgatgc taaaggtgag ccgcttaaag + 4981 ctaccagtta tatggctgtt ggtttctatg tggctaaata cgttaacaaa aagtcagata + 5041 tggaccttgc tgctaaaggt ctaggagcta 
aagaatggaa caactcacta aaaaccaagc + 5101 tgtcgctact tcccaagaag ctgttcagaa tcagaatgag ccgcaacttc gggatgaaaa + 5161 tgctcacaat gacaaatctg tccacggagt gcttaatcca acttaccaag ctgggttacg + 5221 acgcgacgcc gttcaaccag atattgaagc agaacgcaaa aagagagatg agattgaggc + 5281 tgggaaaagt tactgtagcc gacgttttgg cggcgcaacc tgtgacgaca aatctgctca + 5341 aatttatgcg cgcttcgata aaaatgattg gcgtatccaa cctgca +// diff --git a/mash/example_test.go b/mash/example_test.go new file mode 100644 index 0000000..48743a7 --- /dev/null +++ b/mash/example_test.go @@ -0,0 +1,22 @@ +package mash_test + +import ( + "fmt" + + "github.com/TimothyStiles/poly/mash" +) + +func ExampleMash() { + fingerprint1 := mash.New(17, 10) + fingerprint1.Sketch("ATGCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGA") + + fingerprint2 := mash.New(17, 9) + fingerprint2.Sketch("ATGCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGA") + + distance := fingerprint1.Distance(fingerprint2) + + fmt.Println(distance) + + // Output: + // 0 +} diff --git a/mash/mash.go b/mash/mash.go index 550d5e2..0c10955 100644 --- a/mash/mash.go +++ b/mash/mash.go @@ -106,35 +106,28 @@ func (mash *Mash) Sketch(sequence string) { // Similarity returns the Jaccard similarity between two sketches (number of matching hashes / sketch size) func (mash *Mash) Similarity(other *Mash) float64 { var sameHashes int + largerSketch := mash + smallerSketch := other - var largerSketch *Mash - var smallerSketch *Mash - - if mash.SketchSize > other.SketchSize { - largerSketch = mash - smallerSketch = other - } else { + if mash.SketchSize < other.SketchSize { largerSketch = other smallerSketch = mash } - largerSketchSizeShifted := largerSketch.SketchSize - 1 - smallerSketchSizeShifted := smallerSketch.SketchSize - 1 - - // if the largest hash in the larger sketch is smaller than the smallest hash in the smaller sketch, the distance is 1 - if largerSketch.Sketches[largerSketchSizeShifted] < smallerSketch.Sketches[0] { - return 0 - } - - // if the 
largest hash in the smaller sketch is smaller than the smallest hash in the larger sketch, the distance is 1 - if smallerSketch.Sketches[smallerSketchSizeShifted] < largerSketch.Sketches[0] { + if largerSketch.Sketches[largerSketch.SketchSize-1] < smallerSketch.Sketches[0] || smallerSketch.Sketches[smallerSketch.SketchSize-1] < largerSketch.Sketches[0] { return 0 } - for _, hash := range smallerSketch.Sketches { - ind := sort.Search(largerSketchSizeShifted, func(ind int) bool { return largerSketch.Sketches[ind] <= hash }) - if largerSketch.Sketches[ind] == hash { + smallSketchIndex, largeSketchIndex := 0, 0 + for smallSketchIndex < smallerSketch.SketchSize && largeSketchIndex < largerSketch.SketchSize { + if smallerSketch.Sketches[smallSketchIndex] == largerSketch.Sketches[largeSketchIndex] { sameHashes++ + smallSketchIndex++ + largeSketchIndex++ + } else if smallerSketch.Sketches[smallSketchIndex] < largerSketch.Sketches[largeSketchIndex] { + smallSketchIndex++ + } else { + largeSketchIndex++ } } diff --git a/mash/mash_test.go b/mash/mash_test.go index 6ba9a66..ccf20ce 100644 --- a/mash/mash_test.go +++ b/mash/mash_test.go @@ -37,4 +37,38 @@ func TestMash(t *testing.T) { if distance != 1 { t.Errorf("Expected distance to be 1, got %f", distance) } + + fingerprint1 = mash.New(17, 10) + fingerprint1.Sketch("ATGCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGA") + + fingerprint2 = mash.New(17, 5) + fingerprint2.Sketch("ATCGATCGATCGATCGATCGATCGATCGATCGATCGAATGCGATCGATCGATCGATCGATCG") + + distance = fingerprint1.Distance(fingerprint2) + if !(distance > 0.19 && distance < 0.21) { + t.Errorf("Expected distance to be 0.19999999999999996, got %f", distance) + } + + fingerprint1 = mash.New(17, 10) + fingerprint1.Sketch("ATCGATCGATCGATCGATCGATCGATCGATCGATCGAATGCGATCGATCGATCGATCGATCG") + + fingerprint2 = mash.New(17, 5) + fingerprint2.Sketch("ATGCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGA") + + distance = fingerprint1.Distance(fingerprint2) + if 
distance != 0 { + t.Errorf("Expected distance to be 0, got %f", distance) + } +} + +func BenchmarkMashDistance(b *testing.B) { + fingerprint1 := mash.New(17, 10) + fingerprint1.Sketch("ATGCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGA") + + fingerprint2 := mash.New(17, 9) + fingerprint2.Sketch("ATGCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGA") + + for i := 0; i < b.N; i++ { + fingerprint1.Distance(fingerprint2) + } } diff --git a/synthesis/codon/codon.go index 9a44b69..85958c5 100644 --- a/synthesis/codon/codon.go +++ b/synthesis/codon/codon.go @@ -26,37 +26,35 @@ import ( "strings" "time" + "github.com/TimothyStiles/poly/bio/genbank" weightedRand "github.com/mroth/weightedrand" ) /****************************************************************************** -Oct, 15, 2020 - File is structured as so: - Interfaces: - Table - specifies the functions that all table types must implement + Interfaces: + Table - An interface encompassing what a potentially codon optimized Translation table can do + + Structs: + TranslationTable - contains a weighted codon table, which is used when translating and optimizing sequences. The weights can be updated through the codon frequencies we observe in given DNA sequences. - Structs: - codonTable - holds all information mapping codons <-> amino acids during transformations. AminoAcid - holds amino acid related info for codonTable struct - Codon - holds codon related info for AminoAcid struct - Big functions that everything else is related to: + Codon - holds codon related info for AminoAcid struct - Translate - given a nucleic sequence string and codon table it translates sequences - to UPPERCASE amino acid sequences. + Key functions: + TranslationTable.Translate - given a nucleic sequence string and codon table it translates sequences to UPPERCASE amino acid sequences.
- Optimize - given an amino acid sequence string and codon table it translates - sequences to UPPERCASE nucleic acid sequences. + TranslationTable.Optimize - will return a set of codons which can be used to encode the given amino acid sequence. The codons picked are weighted according to the computed translation table's weights -Anywho, most of this file and codonTable's struct methods are meant to help overcome -this codon bias. There's a default codonTable generator near the bottom of this file -with a whole section on how it works and why it's gotta be that way. + TranslationTable.UpdateWeightsWithSequence - will look at the coding regions in the given genbank data, and use those to generate new weights for the codons in the translation table. The next time a sequence is optimised, it will use those updated weights. + + TranslationTable.Stats - a set of statistics we maintain throughout the translation table's lifetime. For example we track the start codons observed when we update the codon table's weights with other DNA sequences ******************************************************************************/ var ( - errEmptyCodonTable = errors.New("empty codon table") + errNoCodingRegions = errors.New("no coding regions found") errEmptyAminoAcidString = errors.New("empty amino acid string") errEmptySequenceString = errors.New("empty sequence string") newChooserFn = weightedRand.NewChooser @@ -83,74 +81,69 @@ type AminoAcid struct { Codons []Codon `json:"codons"` } -// Table is an interface that specifies the functions that all table types must implement +// Table is an interface encompassing what a potentially codon optimized Translation table can do type Table interface { - Chooser() (map[string]weightedRand.Chooser, error) - GenerateTranslationTable() map[string]string - GenerateStartCodonTable() map[string]string - GetAminoAcids() []AminoAcid - GetStartCodons() []string - GetStopCodons() []string - IsEmpty() bool - OptimizeTable(string) Table + 
GetWeightedAminoAcids() []AminoAcid + Optimize(aminoAcids string, randomState ...int) (string, error) + Translate(dnaSeq string) (string, error) +} + +// Stats denotes a set of statistics we maintain throughout the translation table's lifetime. For example we track +// the start codons observed when we update the codon table's weights with other DNA sequences +type Stats struct { + StartCodonCount map[string]int + GeneCount int } -// codonTable holds information for a codon table. -type codonTable struct { +// NewStats returns a new instance of codon statistics (a set of statistics we maintain throughout a translation table's lifetime) +func NewStats() *Stats { + return &Stats{ + StartCodonCount: map[string]int{}, + } +} + +// TranslationTable contains a weighted codon table, which is used when translating and optimizing sequences. The +// weights can be updated through the codon frequencies we observe in given DNA sequences. +type TranslationTable struct { StartCodons []string `json:"start_codons"` StopCodons []string `json:"stop_codons"` AminoAcids []AminoAcid `json:"amino_acids"` -} -// Translate translates a codon sequence to an amino acid sequence -func Translate(sequence string, codonTable Table) (string, error) { - if codonTable.IsEmpty() { - return "", errEmptyCodonTable - } - if len(sequence) == 0 { - return "", errEmptySequenceString - } + TranslationMap map[string]string + StartCodonTable map[string]string + Choosers map[string]weightedRand.Chooser - var aminoAcids strings.Builder - var currentCodon strings.Builder - translationTable := codonTable.GenerateTranslationTable() - startCodonTable := codonTable.GenerateStartCodonTable() + Stats *Stats +} - startCodonReached := false - for _, letter := range sequence { - // add current nucleotide to currentCodon - currentCodon.WriteRune(letter) +// Copy returns a deep copy of the translation table. 
This is to prevent an unintended update of data used in another +// process, since the tables are generated at build time. +func (table *TranslationTable) Copy() *TranslationTable { + return &TranslationTable{ + StartCodons: table.StartCodons, + StopCodons: table.StopCodons, + AminoAcids: table.AminoAcids, - // if current nucleotide is the third in a codon translate to aminoAcid write to aminoAcids and reset currentCodon. - // use start codon table for the first codon only, erroring out if an invalid start codon is provided - if currentCodon.Len() == 3 { - if startCodonReached { - aminoAcids.WriteString(translationTable[strings.ToUpper(currentCodon.String())]) - } else { - aminoAcid, ok := startCodonTable[strings.ToUpper(currentCodon.String())] - if !ok { - return "", fmt.Errorf("start codon %q is not in start codon table %v", currentCodon.String(), startCodonTable) - } - aminoAcids.WriteString(aminoAcid) - startCodonReached = true - } + StartCodonTable: table.StartCodonTable, + TranslationMap: table.TranslationMap, + Choosers: table.Choosers, - // reset codon string builder for next codon. - currentCodon.Reset() - } + Stats: table.Stats, } - return aminoAcids.String(), nil } -// Optimize takes an amino acid sequence and codonTable and returns an optimized codon sequence. Takes an optional random seed as last argument. -func Optimize(aminoAcids string, codonTable Table, randomState ...int) (string, error) { +// GetWeightedAminoAcids returns the amino acids along with their associated codon weights +func (table *TranslationTable) GetWeightedAminoAcids() []AminoAcid { + return table.AminoAcids +} + +// Optimize will return a set of codons which can be used to encode the given amino acid sequence. 
The codons +// picked are weighted according to the computed translation table's weights +func (table *TranslationTable) Optimize(aminoAcids string, randomState ...int) (string, error) { // Finding any given aminoAcid is dependent upon it being capitalized, so // we do that here. aminoAcids = strings.ToUpper(aminoAcids) - if codonTable.IsEmpty() { - return "", errEmptyCodonTable - } if len(aminoAcids) == 0 { return "", errEmptyAminoAcidString } @@ -163,45 +156,149 @@ func Optimize(aminoAcids string, codonTable Table, randomState ...int) (string, } var codons strings.Builder - codonChooser, err := codonTable.Chooser() - if err != nil { - return "", err - } + codonChooser := table.Choosers for _, aminoAcid := range aminoAcids { chooser, ok := codonChooser[string(aminoAcid)] if !ok { return "", invalidAminoAcidError{aminoAcid} } + codons.WriteString(chooser.Pick().(string)) } + return codons.String(), nil } -// OptimizeTable weights each codon in a codon table according to input string codon frequency. -// This function actually mutates the codonTable struct itself. 
-func (table codonTable) OptimizeTable(sequence string) Table { +// UpdateWeights will update the translation table's codon pickers with the given amino acid codon weights +func (table *TranslationTable) UpdateWeights(aminoAcids []AminoAcid) error { + // regenerate a map of codons -> amino acid + + var updatedTranslationMap = make(map[string]string) + for _, aminoAcid := range aminoAcids { + for _, codon := range aminoAcid.Codons { + updatedTranslationMap[codon.Triplet] = aminoAcid.Letter + } + } + + table.TranslationMap = updatedTranslationMap + + // Update Chooser from the newly supplied weights, not the table's stale ones + updatedChoosers, err := newAminoAcidChoosers(aminoAcids) + if err != nil { + return err + } + + table.Choosers = updatedChoosers + table.AminoAcids = aminoAcids + + return nil +} + +// UpdateWeightsWithSequence will look at the coding regions in the given genbank data, and use those to generate new +// weights for the codons in the translation table. The next time a sequence is optimized, it will use those updated +// weights.
+//
+// This can be used, for example, to figure out which DNA sequence is needed to give the best yield of protein when
+// trying to express a protein across different species.
+func (table *TranslationTable) UpdateWeightsWithSequence(data genbank.Genbank) error {
+	codingRegions, err := extractCodingRegion(data)
+	if err != nil {
+		return err
+	}
+
+	if len(codingRegions) == 0 {
+		return errNoCodingRegions
+	}
+
+	table.Stats.GeneCount = len(codingRegions)
+	for _, sequence := range codingRegions {
+		table.Stats.StartCodonCount[sequence[:3]]++
+	}
+
+	// weight our codon optimization table using the regions we collected from the genbank file above
+	newWeights := weightAminoAcids(strings.Join(codingRegions, ""), table.AminoAcids)
+
+	return table.UpdateWeights(newWeights)
+}
+
+// Translate will return the amino acid sequence which the given DNA sequence yields
+func (table *TranslationTable) Translate(dnaSeq string) (string, error) {
+	if dnaSeq == "" {
+		return "", errEmptySequenceString
+	}
+
+	var aminoAcids strings.Builder
+	var currentCodon strings.Builder
+	translationTable := table.TranslationMap
+	startCodonTable := table.StartCodonTable
+
+	startCodonReached := false
+	for _, letter := range dnaSeq {
+		// add the current nucleotide to currentCodon
+		currentCodon.WriteRune(letter)
+
+		// if the current nucleotide is the third in a codon, translate it to an amino acid, write that to aminoAcids, and reset currentCodon.
+		// use the start codon table for the first codon only, erroring out if an invalid start codon is provided
+		if currentCodon.Len() == 3 {
+			if startCodonReached {
+				aminoAcids.WriteString(translationTable[strings.ToUpper(currentCodon.String())])
+			} else {
+				aminoAcid, ok := startCodonTable[strings.ToUpper(currentCodon.String())]
+				if !ok {
+					return "", fmt.Errorf("start codon %q is not in start codon table %v", currentCodon.String(), startCodonTable)
+				}
+				aminoAcids.WriteString(aminoAcid)
+				startCodonReached = true
+			}
+
+			// reset codon string builder for the next codon.
+ currentCodon.Reset() + } + } + return aminoAcids.String(), nil +} + +// weightAminoAcids weights each codon in a codon table according to input string codon frequency, adding weight to +// the given NCBI base codon table +func weightAminoAcids(sequence string, aminoAcids []AminoAcid) []AminoAcid { sequence = strings.ToUpper(sequence) codonFrequencyMap := getCodonFrequency(sequence) - for aminoAcidIndex, aminoAcid := range table.AminoAcids { + for aminoAcidIndex, aminoAcid := range aminoAcids { // apply weights to codonTable for codonIndex, codon := range aminoAcid.Codons { - table.AminoAcids[aminoAcidIndex].Codons[codonIndex].Weight = codonFrequencyMap[codon.Triplet] + aminoAcids[aminoAcidIndex].Codons[codonIndex].Weight = codonFrequencyMap[codon.Triplet] } } - return table + + return aminoAcids } -// GenerateStartCodonTable returns a mapping from the start codons of a Table to their associated amino acids. -// For our codonTable implementation, assumes that we always map to Met. -func (table codonTable) GenerateStartCodonTable() map[string]string { - result := make(map[string]string) - for _, codon := range table.StartCodons { - result[codon] = "M" +// extractCodingRegion loops through genbank data to find all CDS (coding sequences) +func extractCodingRegion(data genbank.Genbank) ([]string, error) { + codingRegions := []string{} + + // iterate through the features of the genbank file and if the feature is a coding region, append the sequence to the string builder + for _, feature := range data.Features { + if feature.Type == "CDS" { + sequence, err := feature.GetSequence() + if err != nil { + return nil, err + } + + // Note: sometimes, genbank files will have annotated CDSs that are pseudo genes (not having triplet codons). + // This will shift the entire codon table, messing up the end results. To fix this, make sure to do a modulo + // check. 
+ if len(sequence)%3 != 0 { + continue + } + + codingRegions = append(codingRegions, sequence) + } } - return result + return codingRegions, nil } // getCodonFrequency takes a DNA sequence and returns a hashmap of its codons and their frequencies. @@ -231,17 +328,13 @@ func getCodonFrequency(sequence string) map[string]int { return codonFrequencyHashMap } -func (table codonTable) IsEmpty() bool { - return len(table.StartCodons) == 0 && len(table.StopCodons) == 0 && len(table.AminoAcids) == 0 -} - -// Chooser is a codonTable method to convert a codon table to a chooser -func (table codonTable) Chooser() (map[string]weightedRand.Chooser, error) { +// newAminoAcidChoosers is a codonTable method to convert a codon table to a chooser +func newAminoAcidChoosers(aminoAcids []AminoAcid) (map[string]weightedRand.Chooser, error) { // This maps codon tables structure to weightRand.NewChooser structure codonChooser := make(map[string]weightedRand.Chooser) // iterate over every amino acid in the codonTable - for _, aminoAcid := range table.AminoAcids { + for _, aminoAcid := range aminoAcids { // create a list of codon choices for this specific amino acid codonChoices := make([]weightedRand.Choice, len(aminoAcid.Codons)) @@ -264,7 +357,7 @@ func (table codonTable) Chooser() (map[string]weightedRand.Chooser, error) { // add this chooser set to the codonChooser map under the name of the aminoAcid it represents. chooser, err := newChooserFn(codonChoices...) 
if err != nil { - return nil, fmt.Errorf("weightedRand.NewChooser() error: %s", err) + return nil, fmt.Errorf("weightedRand.NewChooser() error: %w", err) } codonChooser[aminoAcid.Letter] = *chooser @@ -272,29 +365,6 @@ func (table codonTable) Chooser() (map[string]weightedRand.Chooser, error) { return codonChooser, nil } -// GenerateTranslationTable generates a map of codons -> amino acid -func (table codonTable) GenerateTranslationTable() map[string]string { - var translationMap = make(map[string]string) - for _, aminoAcid := range table.AminoAcids { - for _, codon := range aminoAcid.Codons { - translationMap[codon.Triplet] = aminoAcid.Letter - } - } - return translationMap -} - -func (table codonTable) GetStartCodons() []string { - return table.StartCodons -} - -func (table codonTable) GetStopCodons() []string { - return table.StopCodons -} - -func (table codonTable) GetAminoAcids() []AminoAcid { - return table.AminoAcids -} - /****************************************************************************** Oct, 15, 2020 @@ -323,7 +393,7 @@ Tim ******************************************************************************/ // Function to generate default codon tables from NCBI https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi -func generateCodonTable(aminoAcids, starts string) codonTable { +func generateCodonTable(aminoAcids, starts string) *TranslationTable { base1 := "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG" base2 := "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG" base3 := "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG" @@ -349,16 +419,48 @@ func generateCodonTable(aminoAcids, starts string) codonTable { for k, v := range aminoAcidMap { aminoAcidSlice = append(aminoAcidSlice, AminoAcid{string(k), v}) } - return codonTable{startCodons, stopCodons, aminoAcidSlice} + + // generate a map of codons -> amino acid + + var translationMap = make(map[string]string) + for _, aminoAcid := range 
aminoAcidSlice { + for _, codon := range aminoAcid.Codons { + translationMap[codon.Triplet] = aminoAcid.Letter + } + } + + // GenerateStartCodonTable returns a mapping from the start codons of a Table to their associated amino acids. + // For our codonTable implementation, assumes that we always map to Met. + + startCodonsMap := make(map[string]string) + for _, codon := range startCodons { + startCodonsMap[codon] = "M" + } + + // This function is run at buildtime and failure here means we have an invalid codon table. + chooser, err := newAminoAcidChoosers(aminoAcidSlice) + if err != nil { + panic(fmt.Errorf("tried to generate an invalid codon table %w", err)) + } + + return &TranslationTable{ + StartCodons: startCodons, + StopCodons: stopCodons, + AminoAcids: aminoAcidSlice, + TranslationMap: translationMap, + StartCodonTable: startCodonsMap, + Choosers: chooser, + Stats: NewStats(), + } } -// GetCodonTable takes the index of desired NCBI codon table and returns it. -func GetCodonTable(index int) Table { - return defaultCodonTablesByNumber[index] +// NewTranslationTable takes the index of desired NCBI codon table and returns it. +func NewTranslationTable(index int) *TranslationTable { + return translationTablesByNumber[index].Copy() } -// defaultCodonTablesByNumber stores all codon tables published by NCBI https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi using numbered indices. -var defaultCodonTablesByNumber = map[int]codonTable{ +// translationTablesByNumber stores all codon tables published by NCBI https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi using numbered indices. 
+var translationTablesByNumber = map[int]*TranslationTable{ 1: generateCodonTable("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "---M------**--*----M---------------M----------------------------"), 2: generateCodonTable("FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG", "----------**--------------------MMMM----------**---M------------"), 3: generateCodonTable("FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "----------**----------------------MM---------------M------------"), @@ -442,21 +544,21 @@ Keoni ******************************************************************************/ // ParseCodonJSON parses a codonTable JSON file. -func ParseCodonJSON(file []byte) Table { - var codonTable codonTable +func ParseCodonJSON(file []byte) *TranslationTable { + var codonTable TranslationTable _ = json.Unmarshal(file, &codonTable) - return codonTable + return &codonTable } // ReadCodonJSON reads a codonTable JSON file. -func ReadCodonJSON(path string) Table { +func ReadCodonJSON(path string) *TranslationTable { file, _ := os.ReadFile(path) codonTable := ParseCodonJSON(file) return codonTable } // WriteCodonJSON writes a codonTable struct out to JSON. -func WriteCodonJSON(codonTable Table, path string) { +func WriteCodonJSON(codonTable *TranslationTable, path string) { file, _ := json.MarshalIndent(codonTable, "", " ") _ = os.WriteFile(path, file, 0644) } @@ -492,27 +594,26 @@ Keoni // CompromiseCodonTable takes 2 CodonTables and makes a new codonTable // that is an equal compromise between the two tables. 
-func CompromiseCodonTable(firstCodonTable, secondCodonTable Table, cutOff float64) (Table, error) { - // Initialize output codonTable, c - var c codonTable +func CompromiseCodonTable(firstCodonTable, secondCodonTable *TranslationTable, cutOff float64) (*TranslationTable, error) { + // Copy first table to base our merge on + // + // this take start and stop strings from first table + // and use them as start + stops in final codonTable + mergedTable := firstCodonTable.Copy() + // Check if cutOff is too high or low (this is converted to a percent) if cutOff < 0 { - return c, errors.New("cut off too low, cannot be less than 0") + return mergedTable, errors.New("cut off too low, cannot be less than 0") } if cutOff > 1 { - return c, errors.New("cut off too high, cannot be greater than 1") + return mergedTable, errors.New("cut off too high, cannot be greater than 1") } - // Take start and stop strings from first table - // and use them as start + stops in final codonTable - c.StartCodons = firstCodonTable.GetStartCodons() - c.StopCodons = firstCodonTable.GetStopCodons() - // Initialize the finalAminoAcid list for the output codonTable var finalAminoAcids []AminoAcid // Loop over all AminoAcids represented in the first codonTable - for _, firstAa := range firstCodonTable.GetAminoAcids() { + for _, firstAa := range firstCodonTable.AminoAcids { var firstTriplets []string var firstWeights []int var firstTotal int @@ -525,7 +626,7 @@ func CompromiseCodonTable(firstCodonTable, secondCodonTable Table, cutOff float6 firstTriplets = append(firstTriplets, firstCodon.Triplet) firstWeights = append(firstWeights, firstCodon.Weight) firstTotal = firstTotal + firstCodon.Weight - for _, secondAa := range secondCodonTable.GetAminoAcids() { + for _, secondAa := range secondCodonTable.AminoAcids { if secondAa.Letter == firstAa.Letter { for _, secondCodon := range secondAa.Codons { // For each codon from firstCodonTable, get the @@ -568,19 +669,24 @@ func 
CompromiseCodonTable(firstCodonTable, secondCodonTable Table, cutOff float6 // Append list of Codons to finalAminoAcids finalAminoAcids = append(finalAminoAcids, AminoAcid{firstAa.Letter, finalCodons}) } - c.AminoAcids = finalAminoAcids - return c, nil + + err := mergedTable.UpdateWeights(finalAminoAcids) + if err != nil { + return nil, err + } + + return mergedTable, nil } // AddCodonTable takes 2 CodonTables and adds them together to create // a new codonTable. -func AddCodonTable(firstCodonTable, secondCodonTable Table) Table { +func AddCodonTable(firstCodonTable, secondCodonTable *TranslationTable) (*TranslationTable, error) { // Add up codons var finalAminoAcids []AminoAcid - for _, firstAa := range firstCodonTable.GetAminoAcids() { + for _, firstAa := range firstCodonTable.AminoAcids { var finalCodons []Codon for _, firstCodon := range firstAa.Codons { - for _, secondAa := range secondCodonTable.GetAminoAcids() { + for _, secondAa := range secondCodonTable.AminoAcids { for _, secondCodon := range secondAa.Codons { if firstCodon.Triplet == secondCodon.Triplet { finalCodons = append(finalCodons, Codon{firstCodon.Triplet, firstCodon.Weight + secondCodon.Weight}) @@ -591,9 +697,12 @@ func AddCodonTable(firstCodonTable, secondCodonTable Table) Table { finalAminoAcids = append(finalAminoAcids, AminoAcid{firstAa.Letter, finalCodons}) } - return codonTable{ - StartCodons: firstCodonTable.GetStartCodons(), - StopCodons: firstCodonTable.GetStopCodons(), - AminoAcids: finalAminoAcids, + mergedTable := firstCodonTable.Copy() + + err := mergedTable.UpdateWeights(finalAminoAcids) + if err != nil { + return nil, err } + + return mergedTable, nil } diff --git a/synthesis/codon/codon_test.go b/synthesis/codon/codon_test.go index 0374e79..6638696 100644 --- a/synthesis/codon/codon_test.go +++ b/synthesis/codon/codon_test.go @@ -7,6 +7,7 @@ import ( "testing" "github.com/TimothyStiles/poly/bio" + "github.com/TimothyStiles/poly/bio/genbank" "github.com/google/go-cmp/cmp" 
weightedRand "github.com/mroth/weightedrand" "github.com/stretchr/testify/assert" @@ -18,7 +19,7 @@ func TestTranslation(t *testing.T) { gfpTranslation := "MASKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK*" gfpDnaSequence := "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCAGTGGAGAGGGTGAAGGTGATGCTACATACGGAAAGCTTACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCATGGCCAACACTTGTCACTACTTTCTCTTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCATATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGAACTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAAGGTATTGATTTTAAAGAAGATGGAAACATTCTCGGACACAAACTCGAGTACAACTATAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACATTGAAGATGGATCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGCCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAATAA" - if got, _ := Translate(gfpDnaSequence, GetCodonTable(11)); got != gfpTranslation { + if got, _ := NewTranslationTable(11).Translate(gfpDnaSequence); got != gfpTranslation { t.Errorf("TestTranslation has failed. 
Translate has returned %q, want %q", got, gfpTranslation) } } @@ -29,7 +30,7 @@ func TestTranslationAlwaysMapsStartCodonToMet(t *testing.T) { gfpTranslation := "MASKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK*" gfpDnaSequence := "TTGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCAGTGGAGAGGGTGAAGGTGATGCTACATACGGAAAGCTTACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCATGGCCAACACTTGTCACTACTTTCTCTTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCATATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGAACTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAAGGTATTGATTTTAAAGAAGATGGAAACATTCTCGGACACAAACTCGAGTACAACTATAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACATTGAAGATGGATCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGCCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAATAA" - if got, _ := Translate(gfpDnaSequence, GetCodonTable(11)); got != gfpTranslation { + if got, _ := NewTranslationTable(11).Translate(gfpDnaSequence); got != gfpTranslation { t.Errorf("TestTranslation has failed. 
Translate has returned %q, want %q", got, gfpTranslation) } } @@ -37,23 +38,14 @@ func TestTranslationAlwaysMapsStartCodonToMet(t *testing.T) { func TestTranslationErrorsOnIncorrectStartCodon(t *testing.T) { badSequence := "GGG" - if _, gotErr := Translate(badSequence, GetCodonTable(11)); gotErr == nil { + if _, gotErr := NewTranslationTable(11).Translate(badSequence); gotErr == nil { t.Errorf("Translation should return an error if given an incorrect start codon") } } -func TestTranslationErrorsOnEmptyCodonTable(t *testing.T) { - emtpyCodonTable := codonTable{} - _, err := Translate("A", emtpyCodonTable) - - if err != errEmptyCodonTable { - t.Error("Translation should return an error if given an empty codon table") - } -} - func TestTranslationErrorsOnEmptyAminoAcidString(t *testing.T) { - nonEmptyCodonTable := GetCodonTable(1) - _, err := Translate("", nonEmptyCodonTable) + nonEmptyCodonTable := NewTranslationTable(1) + _, err := nonEmptyCodonTable.Translate("") if err != errEmptySequenceString { t.Error("Translation should return an error if given an empty sequence string") @@ -63,7 +55,7 @@ func TestTranslationErrorsOnEmptyAminoAcidString(t *testing.T) { func TestTranslationMixedCase(t *testing.T) { gfpTranslation := "MASKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK*" gfpDnaSequence := 
"atggctagcaaaggagaagaacttttcactggagttgtcccaaTTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCAGTGGAGAGGGTGAAGGTGATGCTACATACGGAAAGCTTACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCATGGCCAACACTTGTCACTACTTTCTCTTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCATATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGAACTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAAGGTATTGATTTTAAAGAAGATGGAAACATTCTCGGACACAAACTCGAGTACAACTATAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACATTGAAGATGGATCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGCCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAATAA" - if got, _ := Translate(gfpDnaSequence, GetCodonTable(11)); got != gfpTranslation { + if got, _ := NewTranslationTable(11).Translate(gfpDnaSequence); got != gfpTranslation { t.Errorf("TestTranslationMixedCase has failed. Translate has returned %q, want %q", got, gfpTranslation) } } @@ -71,7 +63,7 @@ func TestTranslationMixedCase(t *testing.T) { func TestTranslationLowerCase(t *testing.T) { gfpTranslation := "MASKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK*" gfpDnaSequence := 
"atggctagcaaaggagaagaacttttcactggagttgtcccaattcttgttgaattagatggtgatgttaatgggcacaaattttctgtcagtggagagggtgaaggtgatgctacatacggaaagcttacccttaaatttatttgcactactggaaaactacctgttccatggccaacacttgtcactactttctcttatggtgttcaatgcttttcccgttatccggatcatatgaaacggcatgactttttcaagagtgccatgcccgaaggttatgtacaggaacgcactatatctttcaaagatgacgggaactacaagacgcgtgctgaagtcaagtttgaaggtgatacccttgttaatcgtatcgagttaaaaggtattgattttaaagaagatggaaacattctcggacacaaactcgagtacaactataactcacacaatgtatacatcacggcagacaaacaaaagaatggaatcaaagctaacttcaaaattcgccacaacattgaagatggatccgttcaactagcagaccattatcaacaaaatactccaattggcgatggccctgtccttttaccagacaaccattacctgtcgacacaatctgccctttcgaaagatcccaacgaaaagcgtgaccacatggtccttcttgagtttgtaactgctgctgggattacacatggcatggatgagctctacaaataa" - if got, _ := Translate(gfpDnaSequence, GetCodonTable(11)); got != gfpTranslation { + if got, _ := NewTranslationTable(11).Translate(gfpDnaSequence); got != gfpTranslation { t.Errorf("TestTranslationLowerCase has failed. Translate has returned %q, want %q", got, gfpTranslation) } } @@ -84,27 +76,16 @@ func TestOptimize(t *testing.T) { parser, _ := bio.NewGenbankParser(file) sequence, _ := parser.Next() - codonTable := GetCodonTable(11) - - // a string builder to build a single concatenated string of all coding regions - var codingRegionsBuilder strings.Builder - - // iterate through the features of the genbank file and if the feature is a coding region, append the sequence to the string builder - for _, feature := range sequence.Features { - if feature.Type == "CDS" { - sequence, _ := feature.GetSequence() - codingRegionsBuilder.WriteString(sequence) - } + table := NewTranslationTable(11) + err := table.UpdateWeightsWithSequence(*sequence) + if err != nil { + t.Error(err) } - // get the concatenated sequence string of the coding regions - codingRegions := codingRegionsBuilder.String() - - // weight our codon optimization table using the regions we collected from the genbank file above - optimizationTable := codonTable.OptimizeTable(codingRegions) + 
codonTable := NewTranslationTable(11) - optimizedSequence, _ := Optimize(gfpTranslation, optimizationTable) - optimizedSequenceTranslation, _ := Translate(optimizedSequence, optimizationTable) + optimizedSequence, _ := table.Optimize(gfpTranslation) + optimizedSequenceTranslation, _ := codonTable.Translate(optimizedSequence) if optimizedSequenceTranslation != gfpTranslation { t.Errorf("TestOptimize has failed. Translate has returned %q, want %q", optimizedSequenceTranslation, gfpTranslation) @@ -117,27 +98,17 @@ func TestOptimizeSameSeed(t *testing.T) { defer file.Close() parser, _ := bio.NewGenbankParser(file) sequence, _ := parser.Next() - var codonTable = GetCodonTable(11) - // a string builder to build a single concatenated string of all coding regions - var codingRegionsBuilder strings.Builder - - // iterate through the features of the genbank file and if the feature is a coding region, append the sequence to the string builder - for _, feature := range sequence.Features { - if feature.Type == "CDS" { - sequence, _ := feature.GetSequence() - codingRegionsBuilder.WriteString(sequence) - } + optimizationTable := NewTranslationTable(11) + err := optimizationTable.UpdateWeightsWithSequence(*sequence) + if err != nil { + t.Error(err) } - // get the concatenated sequence string of the coding regions - codingRegions := codingRegionsBuilder.String() - - var optimizationTable = codonTable.OptimizeTable(codingRegions) randomSeed := 10 - optimizedSequence, _ := Optimize(gfpTranslation, optimizationTable, randomSeed) - otherOptimizedSequence, _ := Optimize(gfpTranslation, optimizationTable, randomSeed) + optimizedSequence, _ := optimizationTable.Optimize(gfpTranslation, randomSeed) + otherOptimizedSequence, _ := optimizationTable.Optimize(gfpTranslation, randomSeed) if optimizedSequence != otherOptimizedSequence { t.Error("Optimized sequence with the same random seed are not the same") @@ -150,44 +121,24 @@ func TestOptimizeDifferentSeed(t *testing.T) { defer file.Close() 
parser, _ := bio.NewGenbankParser(file) sequence, _ := parser.Next() - var codonTable = GetCodonTable(11) - - // a string builder to build a single concatenated string of all coding regions - var codingRegionsBuilder strings.Builder - // iterate through the features of the genbank file and if the feature is a coding region, append the sequence to the string builder - for _, feature := range sequence.Features { - if feature.Type == "CDS" { - sequence, _ := feature.GetSequence() - codingRegionsBuilder.WriteString(sequence) - } + optimizationTable := NewTranslationTable(11) + err := optimizationTable.UpdateWeightsWithSequence(*sequence) + if err != nil { + t.Error(err) } - // get the concatenated sequence string of the coding regions - codingRegions := codingRegionsBuilder.String() - - var optimizationTable = codonTable.OptimizeTable(codingRegions) - - optimizedSequence, _ := Optimize(gfpTranslation, optimizationTable) - otherOptimizedSequence, _ := Optimize(gfpTranslation, optimizationTable) + optimizedSequence, _ := optimizationTable.Optimize(gfpTranslation) + otherOptimizedSequence, _ := optimizationTable.Optimize(gfpTranslation) if optimizedSequence == otherOptimizedSequence { t.Error("Optimized sequence with different random seed have the same result") } } -func TestOptimizeErrorsOnEmptyCodonTable(t *testing.T) { - emtpyCodonTable := codonTable{} - _, err := Optimize("A", emtpyCodonTable) - - if err != errEmptyCodonTable { - t.Error("Optimize should return an error if given an empty codon table") - } -} - func TestOptimizeErrorsOnEmptyAminoAcidString(t *testing.T) { - nonEmptyCodonTable := GetCodonTable(1) - _, err := Optimize("", nonEmptyCodonTable) + nonEmptyCodonTable := NewTranslationTable(1) + _, err := nonEmptyCodonTable.Optimize("") if err != errEmptyAminoAcidString { t.Error("Optimize should return an error if given an empty amino acid string") @@ -195,32 +146,14 @@ func TestOptimizeErrorsOnEmptyAminoAcidString(t *testing.T) { } func 
TestOptimizeErrorsOnInvalidAminoAcid(t *testing.T) { aminoAcids := "TOP" - table := GetCodonTable(1) // does not contain 'O' + table := NewTranslationTable(1) // does not contain 'O' - _, optimizeErr := Optimize(aminoAcids, table) + _, optimizeErr := table.Optimize(aminoAcids) assert.EqualError(t, optimizeErr, invalidAminoAcidError{'O'}.Error()) } -func TestOptimizeErrorsOnBrokenChooser(t *testing.T) { - gfpTranslation := "MASKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK*" - - chooserErr := errors.New("chooser rigged to fail") - - codonTable := &mockTable{ - ChooserFn: func() (map[string]weightedRand.Chooser, error) { - return nil, chooserErr - }, - IsEmptyFn: func() bool { - return false - }, - } - - _, err := Optimize(gfpTranslation, codonTable) - assert.EqualError(t, err, chooserErr.Error()) -} - func TestGetCodonFrequency(t *testing.T) { - translationTable := GetCodonTable(11).GenerateTranslationTable() + translationTable := NewTranslationTable(11).TranslationMap var codons strings.Builder @@ -263,21 +196,6 @@ func TestGetCodonFrequency(t *testing.T) { } } -func TestChooserError(t *testing.T) { - codonTable := GetCodonTable(11) - - oldChooserFn := newChooserFn - newChooserFn = func(choices ...weightedRand.Choice) (*weightedRand.Chooser, error) { - return nil, errors.New("new chooser rigged to fail") - } - defer func() { - newChooserFn = oldChooserFn - }() - - _, err := codonTable.Chooser() - assert.EqualError(t, err, "weightedRand.NewChooser() error: new chooser rigged to fail") -} - /****************************************************************************** JSON related tests begin here. 
@@ -309,49 +227,25 @@ func TestCompromiseCodonTable(t *testing.T) { defer file.Close() parser, _ := bio.NewGenbankParser(file) sequence, _ := parser.Next() - codonTable := GetCodonTable(11) - - // a string builder to build a single concatenated string of all coding regions - var codingRegionsBuilder strings.Builder - - // iterate through the features of the genbank file and if the feature is a coding region, append the sequence to the string builder - for _, feature := range sequence.Features { - if feature.Type == "CDS" { - sequence, _ := feature.GetSequence() - codingRegionsBuilder.WriteString(sequence) - } - } - - // get the concatenated sequence string of the coding regions - codingRegions := codingRegionsBuilder.String() // weight our codon optimization table using the regions we collected from the genbank file above - optimizationTable := codonTable.OptimizeTable(codingRegions) + optimizationTable := NewTranslationTable(11) + err := optimizationTable.UpdateWeightsWithSequence(*sequence) + if err != nil { + t.Error(err) + } file2, _ := os.Open("../../data/phix174.gb") defer file2.Close() parser2, _ := bio.NewGenbankParser(file2) sequence2, _ := parser2.Next() - codonTable2 := GetCodonTable(11) - - // a string builder to build a single concatenated string of all coding regions - var codingRegionsBuilder2 strings.Builder - - // iterate through the features of the genbank file and if the feature is a coding region, append the sequence to the string builder - for _, feature := range sequence2.Features { - if feature.Type == "CDS" { - sequence, _ := feature.GetSequence() - codingRegionsBuilder2.WriteString(sequence) - } + optimizationTable2 := NewTranslationTable(11) + err = optimizationTable2.UpdateWeightsWithSequence(*sequence2) + if err != nil { + t.Error(err) } - // get the concatenated sequence string of the coding regions - codingRegions2 := codingRegionsBuilder2.String() - - // weight our codon optimization table using the regions we collected from the 
genbank file above - optimizationTable2 := codonTable2.OptimizeTable(codingRegions2) - - _, err := CompromiseCodonTable(optimizationTable, optimizationTable2, -1.0) // Fails too low + _, err = CompromiseCodonTable(optimizationTable, optimizationTable2, -1.0) // Fails too low if err == nil { t.Errorf("Compromise table should fail on -1.0") } @@ -359,20 +253,59 @@ func TestCompromiseCodonTable(t *testing.T) { if err == nil { t.Errorf("Compromise table should fail on 10.0") } -} -type mockTable struct { - codonTable - ChooserFn func() (map[string]weightedRand.Chooser, error) - IsEmptyFn func() bool -} + // replace chooser fn with test one + newChooserFn = func(choices ...weightedRand.Choice) (*weightedRand.Chooser, error) { + return nil, errors.New("new chooser rigged to fail") + } + + defer func() { + newChooserFn = weightedRand.NewChooser + }() -func (t *mockTable) Chooser() (map[string]weightedRand.Chooser, error) { - return t.ChooserFn() + _, err = CompromiseCodonTable(optimizationTable, optimizationTable2, 0.1) + if err == nil { + t.Errorf("Compromise table should fail when new chooser func rigged") + } } -func (t *mockTable) IsEmpty() bool { - return t.IsEmptyFn() +func TestAddCodonTable(t *testing.T) { + file, _ := os.Open(puc19path) + defer file.Close() + parser, _ := bio.NewGenbankParser(file) + sequence, _ := parser.Next() + + // weight our codon optimization table using the regions we collected from the genbank file above + + optimizationTable := NewTranslationTable(11) + err := optimizationTable.UpdateWeightsWithSequence(*sequence) + if err != nil { + t.Error(err) + } + + file2, _ := os.Open("../../data/phix174.gb") + defer file2.Close() + parser2, _ := bio.NewGenbankParser(file2) + sequence2, _ := parser2.Next() + optimizationTable2 := NewTranslationTable(11) + err = optimizationTable2.UpdateWeightsWithSequence(*sequence2) + if err != nil { + t.Error(err) + } + + // replace chooser fn with test one + newChooserFn = func(choices ...weightedRand.Choice) 
(*weightedRand.Chooser, error) { + return nil, errors.New("new chooser rigged to fail") + } + + defer func() { + newChooserFn = weightedRand.NewChooser + }() + + _, err = AddCodonTable(optimizationTable, optimizationTable2) + if err == nil { + t.Errorf("Compromise table should fail when new chooser func rigged") + } } func TestCapitalizationRegression(t *testing.T) { @@ -383,29 +316,245 @@ func TestCapitalizationRegression(t *testing.T) { defer file.Close() parser, _ := bio.NewGenbankParser(file) sequence, _ := parser.Next() - codonTable := GetCodonTable(11) - // a string builder to build a single concatenated string of all coding regions - var codingRegionsBuilder strings.Builder + optimizationTable := NewTranslationTable(11) + err := optimizationTable.UpdateWeightsWithSequence(*sequence) + if err != nil { + t.Error(err) + } + + optimizedSequence, _ := optimizationTable.Optimize(gfpTranslation, 1) + optimizedSequenceTranslation, _ := optimizationTable.Translate(optimizedSequence) - // iterate through the features of the genbank file and if the feature is a coding region, append the sequence to the string builder - for _, feature := range sequence.Features { - if feature.Type == "CDS" { - sequence, _ := feature.GetSequence() - codingRegionsBuilder.WriteString(sequence) - } + if optimizedSequenceTranslation != strings.ToUpper(gfpTranslation) { + t.Errorf("TestOptimize has failed. 
Translate has returned %q, want %q", optimizedSequenceTranslation, gfpTranslation) } +} + +func TestOptimizeSequence(t *testing.T) { + t.Parallel() + + var ( + gfpTranslation = "MASKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK*" + optimisedGFP = "ATGGCAAGTAAGGGAGAAGAGCTTTTTACCGGCGTAGTACCAATTCTGGTAGAACTGGATGGTGATGTAAACGGTCACAAATTTAGTGTAAGCGGAGAAGGTGAGGGTGATGCTACCTATGGCAAACTGACCCTAAAGTTTATATGCACGACTGGAAAACTTCCGGTACCGTGGCCAACGTTAGTTACAACGTTTTCTTATGGAGTACAGTGCTTCAGCCGCTACCCAGATCATATGAAACGCCATGATTTCTTTAAGAGCGCCATGCCAGAGGGTTATGTTCAGGAGCGCACGATCTCGTTTAAGGATGATGGTAACTATAAGACTCGTGCTGAGGTGAAGTTCGAAGGCGATACCCTTGTAAATCGTATTGAATTGAAGGGTATAGACTTCAAGGAGGATGGAAATATTCTTGGACATAAGCTGGAATACAATTACAATTCACATAACGTTTATATAACTGCCGACAAGCAAAAAAACGGGATAAAAGCTAATTTTAAAATACGCCACAACATAGAGGACGGGTCGGTGCAACTAGCCGATCATTATCAACAAAACACACCAATCGGCGACGGACCAGTTCTGTTGCCCGATAATCATTACTTATCAACCCAAAGTGCCTTAAGTAAGGATCCGAACGAAAAGCGCGATCATATGGTACTTCTTGAGTTTGTTACCGCTGCAGGCATAACGCATGGCATGGACGAGCTATACAAATAA" + puc19 = func() genbank.Genbank { + file, _ := os.Open("../../bio/genbank/data/puc19.gbk") + defer file.Close() + parser, _ := bio.NewGenbankParser(file) + sequence, _ := parser.Next() + return *sequence + }() + ) + + tests := []struct { + name string + + sequenceToOptimise string + updateWeightsWith genbank.Genbank + wantOptimised string + + wantUpdateWeightsErr error + wantOptimiseErr error + }{ + { + name: "ok", + + sequenceToOptimise: gfpTranslation, + updateWeightsWith: puc19, + wantOptimised: optimisedGFP, + + wantUpdateWeightsErr: nil, + wantOptimiseErr: nil, + }, + { + name: "giving no sequence to optimise", - // get the concatenated sequence string of the coding regions - codingRegions := codingRegionsBuilder.String() + sequenceToOptimise: "", + updateWeightsWith: puc19, + wantOptimised: "", - // weight our codon 
optimization table using the regions we collected from the genbank file above - optimizationTable := codonTable.OptimizeTable(codingRegions) + wantUpdateWeightsErr: nil, + wantOptimiseErr: errEmptyAminoAcidString, + }, + { + name: "updating weights with a sequence with no CDS", - optimizedSequence, _ := Optimize(gfpTranslation, optimizationTable) - optimizedSequenceTranslation, _ := Translate(optimizedSequence, optimizationTable) + sequenceToOptimise: "", + updateWeightsWith: genbank.Genbank{}, + wantOptimised: "", - if optimizedSequenceTranslation != strings.ToUpper(gfpTranslation) { - t.Errorf("TestOptimize has failed. Translate has returned %q, want %q", optimizedSequenceTranslation, gfpTranslation) + wantUpdateWeightsErr: errNoCodingRegions, + wantOptimiseErr: errEmptyAminoAcidString, + }, + } + + for _, tt := range tests { + var tt = tt + t.Run(tt.name, func(t *testing.T) { + t.Parallel() + + optimizationTable := NewTranslationTable(11) + err := optimizationTable.UpdateWeightsWithSequence(tt.updateWeightsWith) + if !errors.Is(err, tt.wantUpdateWeightsErr) { + t.Errorf("got %v, want %v", err, tt.wantUpdateWeightsErr) + } + + got, err := optimizationTable.Optimize(tt.sequenceToOptimise, 1) + if !errors.Is(err, tt.wantOptimiseErr) { + t.Errorf("got %v, want %v", err, tt.wantOptimiseErr) + } + + if !cmp.Equal(got, tt.wantOptimised) { + t.Errorf("got and tt.wantOptimised didn't match %s", cmp.Diff(got, tt.wantOptimised)) + } + }) + } +} + +func TestNewAminoAcidChooser(t *testing.T) { + var ( + mockError = errors.New("new chooser rigged to fail") + ) + + tests := []struct { + name string + + aminoAcids []AminoAcid + + chooserFn func(choices ...weightedRand.Choice) (*weightedRand.Chooser, error) + + wantErr error + }{ + { + name: "ok", + + aminoAcids: []AminoAcid{ + { + Letter: "R", + Codons: []Codon{ + { + Triplet: "CGU", + Weight: 1, + }, + }, + }, + }, + + chooserFn: weightedRand.NewChooser, + + wantErr: nil, + }, + { + name: "chooser fn constructor error", + + 
aminoAcids: []AminoAcid{ + { + Letter: "R", + Codons: []Codon{ + { + Triplet: "CGU", + Weight: 1, + }, + }, + }, + }, + + chooserFn: func(choices ...weightedRand.Choice) (*weightedRand.Chooser, error) { + return nil, mockError + }, + + wantErr: mockError, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + // replace chooser fn with test one + newChooserFn = tt.chooserFn + + defer func() { + newChooserFn = weightedRand.NewChooser + }() + + _, err := newAminoAcidChoosers(tt.aminoAcids) + if !errors.Is(err, tt.wantErr) { + t.Errorf("got %v, want %v", err, tt.wantErr) + } + }) + } +} + +func TestUpdateWeights(t *testing.T) { + var ( + mockError = errors.New("new chooser rigged to fail") + ) + + tests := []struct { + name string + + aminoAcids []AminoAcid + + chooserFn func(choices ...weightedRand.Choice) (*weightedRand.Chooser, error) + + wantErr error + }{ + { + name: "ok", + + aminoAcids: []AminoAcid{ + { + Letter: "R", + Codons: []Codon{ + { + Triplet: "CGU", + Weight: 1, + }, + }, + }, + }, + + chooserFn: weightedRand.NewChooser, + + wantErr: nil, + }, + { + name: "chooser fn constructor error", + + aminoAcids: []AminoAcid{ + { + Letter: "R", + Codons: []Codon{ + { + Triplet: "CGU", + Weight: 1, + }, + }, + }, + }, + + chooserFn: func(choices ...weightedRand.Choice) (*weightedRand.Chooser, error) { + return nil, mockError + }, + + wantErr: mockError, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + // replace chooser fn with test one + newChooserFn = tt.chooserFn + + defer func() { + newChooserFn = weightedRand.NewChooser + }() + + optimizationTable := NewTranslationTable(11) + + err := optimizationTable.UpdateWeights(tt.aminoAcids) + if !errors.Is(err, tt.wantErr) { + t.Errorf("got %v, want %v", err, tt.wantErr) + } + }) } } diff --git a/synthesis/codon/example_test.go b/synthesis/codon/example_test.go index 213ce65..5896fcc 100644 --- a/synthesis/codon/example_test.go +++ 
b/synthesis/codon/example_test.go @@ -3,7 +3,6 @@ package codon_test import ( "fmt" "os" - "strings" "github.com/TimothyStiles/poly/bio" "github.com/TimothyStiles/poly/synthesis/codon" @@ -15,63 +14,84 @@ const phix174path = "../../bio/genbank/data/phix174.gb" func ExampleTranslate() { gfpTranslation := "MASKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK*" gfpDnaSequence := "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCAGTGGAGAGGGTGAAGGTGATGCTACATACGGAAAGCTTACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCATGGCCAACACTTGTCACTACTTTCTCTTATGGTGTTCAATGCTTTTCCCGTTATCCGGATCATATGAAACGGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTATGTACAGGAACGCACTATATCTTTCAAAGATGACGGGAACTACAAGACGCGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATCGTATCGAGTTAAAAGGTATTGATTTTAAAGAAGATGGAAACATTCTCGGACACAAACTCGAGTACAACTATAACTCACACAATGTATACATCACGGCAGACAAACAAAAGAATGGAATCAAAGCTAACTTCAAAATTCGCCACAACATTGAAGATGGATCCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTACCTGTCGACACAATCTGCCCTTTCGAAAGATCCCAACGAAAAGCGTGACCACATGGTCCTTCTTGAGTTTGTAACTGCTGCTGGGATTACACATGGCATGGATGAGCTCTACAAATAA" - testTranslation, _ := codon.Translate(gfpDnaSequence, codon.GetCodonTable(11)) // need to specify which codons map to which amino acids per NCBI table + testTranslation, _ := codon.NewTranslationTable(11).Translate(gfpDnaSequence) // need to specify which codons map to which amino acids per NCBI table fmt.Println(gfpTranslation == testTranslation) // output: true } -func ExampleOptimize() { +func ExampleTranslationTable_UpdateWeights() { + gfpTranslation := "MASKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK*" + 
sequenceWithCustomWeights := "ATGGCAAGTAAGGGAGAAGAGCTTTTTACCGGCGTAGTACCAATTCTGGTAGAACTGGATGGTGATGTAAACGGTCACAAATTTAGTGTAAGCGGAGAAGGTGAGGGTGATGCTACCTATGGCAAACTGACCCTAAAGTTTATATGCACGACTGGAAAACTTCCGGTACCGTGGCCAACGTTAGTTACAACGTTTTCTTATGGAGTACAGTGCTTCAGCCGCTACCCAGATCATATGAAACGCCATGATTTCTTTAAGAGCGCCATGCCAGAGGGTTATGTTCAGGAGCGCACGATCTCGTTTAAGGATGATGGTAACTATAAGACTCGTGCTGAGGTGAAGTTCGAAGGCGATACCCTTGTAAATCGTATTGAATTGAAGGGTATAGACTTCAAGGAGGATGGAAATATTCTTGGACATAAGCTGGAATACAATTACAATTCACATAACGTTTATATAACTGCCGACAAGCAAAAAAACGGGATAAAAGCTAATTTTAAAATACGCCACAACATAGAGGACGGGTCGGTGCAACTAGCCGATCATTATCAACAAAACACACCAATCGGCGACGGACCAGTTCTGTTGCCCGATAATCATTACTTATCAACCCAAAGTGCCTTAAGTAAGGATCCGAACGAAAAGCGCGATCATATGGTACTTCTTGAGTTTGTTACCGCTGCAGGCATAACGCATGGCATGGACGAGCTATACAAATAA" + + table := codon.NewTranslationTable(11) + + // this example is using custom weights for different codons for Arginine. Use this if you would rather use your own + // codon weights, they can also be computed for you with `UpdateWeightsWithSequence`. 
+ + err := table.UpdateWeights([]codon.AminoAcid{ + { + Letter: "R", + Codons: []codon.Codon{ + { + Triplet: "CGU", + Weight: 1, + }, + { + Triplet: "CGA", + Weight: 2, + }, + { + Triplet: "CGG", + Weight: 4, + }, + { + Triplet: "AGA", + Weight: 6, + }, + { + Triplet: "AGG", + Weight: 2, + }, + }, + }, + }) + if err != nil { + fmt.Println("Could not update weights in example") + } + + optimizedSequence, _ := table.Optimize(gfpTranslation, 1) + + fmt.Println(optimizedSequence == sequenceWithCustomWeights) + // output: true +} + +func ExampleTranslationTable_Optimize() { gfpTranslation := "MASKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKRHDFFKSAMPEGYVQERTISFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK*" file, _ := os.Open(puc19path) defer file.Close() parser, _ := bio.NewGenbankParser(file) sequence, _ := parser.Next() - codonTable := codon.GetCodonTable(11) - - // a string builder to build a single concatenated string of all coding regions - var codingRegionsBuilder strings.Builder - - // initiate genes - genes := 0 - // iterate through the features of the genbank file and if the feature is a coding region, append the sequence to the string builder - for _, feature := range sequence.Features { - if feature.Type == "CDS" { - sequence, _ := feature.GetSequence() - // Note: sometimes, genbank files will have annotated CDSs that are pseudo genes (not having triplet codons). - // This will shift the entire codon table, messing up the end results. To fix this, make sure to do a modulo - // check. - if len(sequence)%3 == 0 { - codingRegionsBuilder.WriteString(sequence) - - // Another good double check is to count genes, then count stop codons. 
- genes++ - } - } - } - - // get the concatenated sequence string of the coding regions - codingRegions := codingRegionsBuilder.String() - - // weight our codon optimization table using the regions we collected from the genbank file above - optimizationTable := codonTable.OptimizeTable(codingRegions) + codonTable := codon.NewTranslationTable(11) + _ = codonTable.UpdateWeightsWithSequence(*sequence) // Here, we double check if the number of genes is equal to the number of stop codons stopCodonCount := 0 - for _, aa := range optimizationTable.GetAminoAcids() { + for _, aa := range codonTable.AminoAcids { if aa.Letter == "*" { for _, codon := range aa.Codons { stopCodonCount = stopCodonCount + codon.Weight } } } - if stopCodonCount != genes { + + if stopCodonCount != codonTable.Stats.GeneCount { fmt.Println("Stop codons don't equal number of genes!") } - optimizedSequence, _ := codon.Optimize(gfpTranslation, optimizationTable) - optimizedSequenceTranslation, _ := codon.Translate(optimizedSequence, optimizationTable) + optimizedSequence, _ := codonTable.Optimize(gfpTranslation) + optimizedSequenceTranslation, _ := codonTable.Translate(optimizedSequence) fmt.Println(optimizedSequenceTranslation == gfpTranslation) // output: true @@ -80,7 +100,7 @@ func ExampleOptimize() { func ExampleReadCodonJSON() { codontable := codon.ReadCodonJSON("../../data/bsub_codon_test.json") - fmt.Println(codontable.GetAminoAcids()[0].Codons[0].Weight) + fmt.Println(codontable.GetWeightedAminoAcids()[0].Codons[0].Weight) //output: 28327 } @@ -88,7 +108,7 @@ func ExampleParseCodonJSON() { file, _ := os.ReadFile("../../data/bsub_codon_test.json") codontable := codon.ParseCodonJSON(file) - fmt.Println(codontable.GetAminoAcids()[0].Codons[0].Weight) + fmt.Println(codontable.GetWeightedAminoAcids()[0].Codons[0].Weight) //output: 28327 } @@ -100,7 +120,7 @@ func ExampleWriteCodonJSON() { // cleaning up test data os.Remove("../../data/codon_test.json") - 
fmt.Println(testCodonTable.GetAminoAcids()[0].Codons[0].Weight) + fmt.Println(testCodonTable.GetWeightedAminoAcids()[0].Codons[0].Weight) //output: 28327 } @@ -109,50 +129,27 @@ func ExampleCompromiseCodonTable() { defer file.Close() parser, _ := bio.NewGenbankParser(file) sequence, _ := parser.Next() - codonTable := codon.GetCodonTable(11) - - // a string builder to build a single concatenated string of all coding regions - var codingRegionsBuilder strings.Builder - - // iterate through the features of the genbank file and if the feature is a coding region, append the sequence to the string builder - for _, feature := range sequence.Features { - if feature.Type == "CDS" { - sequence, _ := feature.GetSequence() - codingRegionsBuilder.WriteString(sequence) - } - } - - // get the concatenated sequence string of the coding regions - codingRegions := codingRegionsBuilder.String() // weight our codon optimization table using the regions we collected from the genbank file above - optimizationTable := codonTable.OptimizeTable(codingRegions) + optimizationTable := codon.NewTranslationTable(11) + err := optimizationTable.UpdateWeightsWithSequence(*sequence) + if err != nil { + panic(fmt.Errorf("got unexpected error in an example: %w", err)) + } file2, _ := os.Open(phix174path) defer file2.Close() parser2, _ := bio.NewGenbankParser(file2) sequence2, _ := parser2.Next() - codonTable2 := codon.GetCodonTable(11) - // a string builder to build a single concatenated string of all coding regions - var codingRegionsBuilder2 strings.Builder - - // iterate through the features of the genbank file and if the feature is a coding region, append the sequence to the string builder - for _, feature := range sequence2.Features { - if feature.Type == "CDS" { - sequence, _ := feature.GetSequence() - codingRegionsBuilder2.WriteString(sequence) - } + optimizationTable2 := codon.NewTranslationTable(11) + err = optimizationTable2.UpdateWeightsWithSequence(*sequence2) + if err != nil { + 
panic(fmt.Errorf("got unexpected error in an example: %w", err)) } - // get the concatenated sequence string of the coding regions - codingRegions2 := codingRegionsBuilder2.String() - - // weight our codon optimization table using the regions we collected from the genbank file above - optimizationTable2 := codonTable2.OptimizeTable(codingRegions2) - finalTable, _ := codon.CompromiseCodonTable(optimizationTable, optimizationTable2, 0.1) - for _, aa := range finalTable.GetAminoAcids() { + for _, aa := range finalTable.GetWeightedAminoAcids() { for _, codon := range aa.Codons { if codon.Triplet == "TAA" { fmt.Println(codon.Weight) @@ -167,50 +164,31 @@ func ExampleAddCodonTable() { defer file.Close() parser, _ := bio.NewGenbankParser(file) sequence, _ := parser.Next() - codonTable := codon.GetCodonTable(11) - - // a string builder to build a single concatenated string of all coding regions - var codingRegionsBuilder strings.Builder - - // iterate through the features of the genbank file and if the feature is a coding region, append the sequence to the string builder - for _, feature := range sequence.Features { - if feature.Type == "CDS" { - sequence, _ := feature.GetSequence() - codingRegionsBuilder.WriteString(sequence) - } - } - - // get the concatenated sequence string of the coding regions - codingRegions := codingRegionsBuilder.String() // weight our codon optimization table using the regions we collected from the genbank file above - optimizationTable := codonTable.OptimizeTable(codingRegions) + optimizationTable := codon.NewTranslationTable(11) + err := optimizationTable.UpdateWeightsWithSequence(*sequence) + if err != nil { + panic(fmt.Errorf("got unexpected error in an example: %w", err)) + } file2, _ := os.Open(phix174path) defer file2.Close() parser2, _ := bio.NewGenbankParser(file2) sequence2, _ := parser2.Next() - codonTable2 := codon.GetCodonTable(11) - - // a string builder to build a single concatenated string of all coding regions - var 
codingRegionsBuilder2 strings.Builder - // iterate through the features of the genbank file and if the feature is a coding region, append the sequence to the string builder - for _, feature := range sequence2.Features { - if feature.Type == "CDS" { - sequence, _ := feature.GetSequence() - codingRegionsBuilder2.WriteString(sequence) - } + optimizationTable2 := codon.NewTranslationTable(11) + err = optimizationTable2.UpdateWeightsWithSequence(*sequence2) + if err != nil { + panic(fmt.Errorf("got unexpected error in an example: %w", err)) } - // get the concatenated sequence string of the coding regions - codingRegions2 := codingRegionsBuilder2.String() - - // weight our codon optimization table using the regions we collected from the genbank file above - optimizationTable2 := codonTable2.OptimizeTable(codingRegions2) + finalTable, err := codon.AddCodonTable(optimizationTable, optimizationTable2) + if err != nil { + panic(fmt.Errorf("got error in adding codon table example: %w", err)) + } - finalTable := codon.AddCodonTable(optimizationTable, optimizationTable2) - for _, aa := range finalTable.GetAminoAcids() { + for _, aa := range finalTable.AminoAcids { for _, codon := range aa.Codons { if codon.Triplet == "GGC" { fmt.Println(codon.Weight) diff --git a/synthesis/fix/synthesis.go b/synthesis/fix/synthesis.go index 4aa5076..ad4e166 100644 --- a/synthesis/fix/synthesis.go +++ b/synthesis/fix/synthesis.go @@ -238,7 +238,7 @@ func Cds(sequence string, codontable codon.Table, problematicSequenceFuncs []fun // Build historical maps and full amino acid weights aminoAcidWeightTable := make(map[string]int) - for _, aminoAcid := range codontable.GetAminoAcids() { + for _, aminoAcid := range codontable.GetWeightedAminoAcids() { var aminoAcidTotal int for _, codon := range aminoAcid.Codons { // Get the total weights of all the codons for a given amino acid. @@ -271,7 +271,7 @@ func Cds(sequence string, codontable codon.Table, problematicSequenceFuncs []fun // Build weight map. 
The weight map gives the relative normalized weight of // any given codon triplet. - for _, aminoAcid := range codontable.GetAminoAcids() { + for _, aminoAcid := range codontable.GetWeightedAminoAcids() { for _, codon := range aminoAcid.Codons { codonWeightRatio := float64(codon.Weight) / float64(aminoAcidWeightTable[aminoAcid.Letter]) normalizedCodonWeight := 100 * codonWeightRatio diff --git a/synthesis/fix/synthesis_test.go b/synthesis/fix/synthesis_test.go index 9e274d9..e86deb8 100644 --- a/synthesis/fix/synthesis_test.go +++ b/synthesis/fix/synthesis_test.go @@ -40,7 +40,7 @@ func BenchmarkCds(b *testing.B) { var functions []func(string, chan DnaSuggestion, *sync.WaitGroup) functions = append(functions, RemoveSequence([]string{"GAAGAC", "GGTCTC", "GCGATG", "CGTCTC", "GCTCTTC", "CACCTGC"}, "TypeIIS restriction enzyme site.")) for i := 0; i < b.N; i++ { - seq, _ := codon.Optimize(phusion, codonTable) + seq, _ := codonTable.Optimize(phusion) optimizedSeq, changes, err := Cds(seq, codonTable, functions) if err != nil { b.Errorf("Failed to fix phusion with error: %s", err) @@ -76,7 +76,7 @@ func TestCds(t *testing.T) { phusion := 
"MGHHHHHHHHHHSSGILDVDYITEEGKPVIRLFKKENGKFKIEHDRTFRPYIYALLRDDSKIEEVKKITGERHGKIVRIVDVEKVEKKFLGKPITVWKLYLEHPQDVPTIREKVREHPAVVDIFEYDIPFAKRYLIDKGLIPMEGEEELKILAFDIETLYHEGEEFGKGPIIMISYADENEAKVITWKNIDLPYVEVVSSEREMIKRFLRIIREKDPDIIVTYNGDSFDFPYLAKRAEKLGIKLTIGRDGSEPKMQRIGDMTAVEVKGRIHFDLYHVITRTINLPTYTLEAVYEAIFGKPKEKVYADEIAKAWESGENLERVAKYSMEDAKATYELGKEFLPMEIQLSRLVGQPLWDVSRSSTGNLVEWFLLRKAYERNEVAPNKPSEEEYQRRLRESYTGGFVKEPEKGLWENIVYLDFRALYPSIIITHNVSPDTLNLEGCKNYDIAPQVGHKFCKDIPGFIPSLLGHLLEERQKIKTKMKETQDPIEKILLDYRQKAIKLLANSFYGYYGYAKARWYCKECAESVTAWGRKYIELVWKELEEKFGFKVLYIDTDGLYATIPGGESEEIKKKALEFVKYINSKLPGLLELEYEGFYKRGFFVTKKRYAVIDEEGKVITRGLEIVRRDWSEIAKETQARVLETILKHGDVEEAVRIVKEVIQKLANYEIPPEKLAIYEQITRPLHEYKAIGPHVAVAKKLAAKGVKIKPGMVIGYIVLRGDGPISNRAILAEEYDPKKHKYDAEYYIENQVLPAVLRILEGFGYRKEDLRYQKTRQVGLTSWLNIKKSGTGGGGATVKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSEKDAPKELLQMLEKQKK*" var functions []func(string, chan DnaSuggestion, *sync.WaitGroup) functions = append(functions, RemoveSequence([]string{"GAAGAC", "GGTCTC", "GCGATG", "CGTCTC", "GCTCTTC", "CACCTGC"}, "TypeIIS restriction enzyme site.")) - seq, _ := codon.Optimize(phusion, codonTable) + seq, _ := codonTable.Optimize(phusion) optimizedSeq, _, err := Cds(seq, codonTable, functions) if err != nil { t.Errorf("Failed with error: %s", err) diff --git a/synthesis/fragment/fragment.go b/synthesis/fragment/fragment.go index 23a59b2..ab01d06 100644 --- a/synthesis/fragment/fragment.go +++ b/synthesis/fragment/fragment.go @@ -98,11 +98,11 @@ func NextOverhang(currentOverhangs []string) string { } // optimizeOverhangIteration takes in a sequence and optimally fragments it. 
-func optimizeOverhangIteration(sequence string, minFragmentSize int, maxFragmentSize int, existingFragments []string, existingOverhangs []string) ([]string, float64, error) { +func optimizeOverhangIteration(sequence string, minFragmentSize int, maxFragmentSize int, existingFragments []string, excludeOverhangs []string, includeOverhangs []string) ([]string, float64, error) { // If the sequence is smaller than maxFragment size, stop iteration. if len(sequence) < maxFragmentSize { existingFragments = append(existingFragments, sequence) - return existingFragments, SetEfficiency(existingOverhangs), nil + return existingFragments, SetEfficiency(excludeOverhangs), nil } // Make sure minFragmentSize > maxFragmentSize @@ -136,6 +136,7 @@ func optimizeOverhangIteration(sequence string, minFragmentSize int, maxFragment var bestOverhangEfficiency float64 var bestOverhangPosition int var alreadyExists bool + var buildAvailable bool for overhangOffset := 0; overhangOffset <= maxFragmentSize-minFragmentSize; overhangOffset++ { // We go from max -> min, so we can maximize the size of our fragments overhangPosition := maxFragmentSize - overhangOffset @@ -143,16 +144,27 @@ func optimizeOverhangIteration(sequence string, minFragmentSize int, maxFragment // Make sure overhang isn't already in set alreadyExists = false - for _, existingOverhang := range existingOverhangs { - if existingOverhang == overhangToTest || transform.ReverseComplement(existingOverhang) == overhangToTest { + for _, excludeOverhang := range excludeOverhangs { + if excludeOverhang == overhangToTest || transform.ReverseComplement(excludeOverhang) == overhangToTest { alreadyExists = true } } - if !alreadyExists { + // Make sure overhang is in set of includeOverhangs. If includeOverhangs is + // blank, skip this check. 
+ buildAvailable = false + if len(includeOverhangs) == 0 { + buildAvailable = true + } + for _, includeOverhang := range includeOverhangs { + if includeOverhang == overhangToTest || transform.ReverseComplement(includeOverhang) == overhangToTest { + buildAvailable = true + } + } + if !alreadyExists && buildAvailable { // See if this overhang is a palindrome if !checks.IsPalindromic(overhangToTest) { // Get this overhang set's efficiency - setEfficiency := SetEfficiency(append(existingOverhangs, overhangToTest)) + setEfficiency := SetEfficiency(append(excludeOverhangs, overhangToTest)) // If this overhang is more efficient than any other found so far, set it as the best! if setEfficiency > bestOverhangEfficiency { @@ -167,16 +179,24 @@ func optimizeOverhangIteration(sequence string, minFragmentSize int, maxFragment return []string{}, float64(0), fmt.Errorf("bestOverhangPosition failed by equaling zero") } existingFragments = append(existingFragments, sequence[:bestOverhangPosition]) - existingOverhangs = append(existingOverhangs, sequence[bestOverhangPosition-4:bestOverhangPosition]) + excludeOverhangs = append(excludeOverhangs, sequence[bestOverhangPosition-4:bestOverhangPosition]) sequence = sequence[bestOverhangPosition-4:] - return optimizeOverhangIteration(sequence, minFragmentSize, maxFragmentSize, existingFragments, existingOverhangs) + return optimizeOverhangIteration(sequence, minFragmentSize, maxFragmentSize, existingFragments, excludeOverhangs, includeOverhangs) } // Fragment fragments a sequence into fragments between the min and max size, // choosing fragment ends for optimal assembly efficiency. Since fragments will // be inserted into either a vector or primer binding sites, the first 4 and // last 4 base pairs are the initial overhang set. 
-func Fragment(sequence string, minFragmentSize int, maxFragmentSize int, existingOverhangs []string) ([]string, float64, error) { +func Fragment(sequence string, minFragmentSize int, maxFragmentSize int, excludeOverhangs []string) ([]string, float64, error) { + sequence = strings.ToUpper(sequence) + return optimizeOverhangIteration(sequence, minFragmentSize, maxFragmentSize, []string{}, append([]string{sequence[:4], sequence[len(sequence)-4:]}, excludeOverhangs...), []string{}) +} + +// FragmentWithOverhangs fragments a sequence with only a certain overhang set. +// This is useful if you are constraining the set of possible overhangs when +// doing more advanced forms of cloning. +func FragmentWithOverhangs(sequence string, minFragmentSize int, maxFragmentSize int, excludeOverhangs []string, includeOverhangs []string) ([]string, float64, error) { sequence = strings.ToUpper(sequence) - return optimizeOverhangIteration(sequence, minFragmentSize, maxFragmentSize, []string{}, append([]string{sequence[:4], sequence[len(sequence)-4:]}, existingOverhangs...)) + return optimizeOverhangIteration(sequence, minFragmentSize, maxFragmentSize, []string{}, append([]string{sequence[:4], sequence[len(sequence)-4:]}, excludeOverhangs...), includeOverhangs) } diff --git a/synthesis/fragment/fragment_test.go b/synthesis/fragment/fragment_test.go index 0a71bfd..ab3f0c1 100644 --- a/synthesis/fragment/fragment_test.go +++ b/synthesis/fragment/fragment_test.go @@ -85,3 +85,13 @@ func TestRegressionTestMatching12(t *testing.T) { t.Errorf("Expected efficiency of .99 - approximately matches NEB ligase fidelity viewer of .97. 
Got: %g", efficiency) } } + +func TestFragmentWithOverhangs(t *testing.T) { + defaultOverhangs := []string{"CGAG", "GTCT", "GGGG", "AAAA", "AACT", "AATG", "ATCC", "CGCT", "TTCT", "AAGC", "ATAG", "ATTA", "ATGT", "ACTC", "ACGA", "TATC", "TAGG", "TACA", "TTAC", "TTGA", "TGGA", "GAAG", "GACC", "GCCG", "TCTG", "GTTG", "GTGC", "TGCC", "CTGG", "TAAA", "TGAG", "AAGA", "AGGT", "TTCG", "ACTA", "TTAG", "TCTC", "TCGG", "ATAA", "ATCA", "TTGC", "CACG", "AATA", "ACAA", "ATGG", "TATG", "AAAT", "TCAC"} + gene := "atgaaaaaatttaactggaagaaaatagtcgcgccaattgcaatgctaattattggcttactaggtggtttacttggtgcctttatcctactaacagcagccggggtatcttttaccaatacaacagatactggagtaaaaacggctaagaccgtctacaccaatataacagatacaactaaggctgttaagaaagtacaaaatgccgttgtttctgtcatcaattatcaagaaggttcatcttcagattctctaaatgacctttatggccgtatctttggcggaggggacagttctgattctagccaagaaaattcaaaagattcagatggtctacaggtcgctggtgaaggttctggagtcatctataaaaaagatggcaaagaagcctacatcgtaaccaataaccatgttgtcgatggggctaaaaaacttgaaatcatgctttcggatggttcgaaaattactggtgaacttgttggtaaagacacttactctgacctagcagttgtcaaagtatcttcagataaaataacaactgttgcagaatttgcagactcaaactcccttactgttggtgaaaaagcaattgctatcggtagcccacttggtaccgaatacgccaactcagtaacagaaggaatcgtttctagccttagccgtactataacgatgcaaaacgataatggtgaaactgtatcaacaaacgctatccaaacagatgcagccattaaccctggtaactctggtggtgccctagtcaatattgaaggacaagttatcggtattaattcaagtaaaatttcatcaacgtctgcagtcgctggtagtgctgttgaaggtatggggtttgccattccatcaaacgatgttgttgaaatcatcaatcaattagaaaaagatggtaaagttacacgaccagcactaggaatctcaatagcagatcttaatagcctttctagcagcgcaacttctaaattagatttaccagatgaggtcaaatccggtgttgttgtcggtagtgttcagaaaggtatgccagctgacggtaaacttcaagaatatgatgttatcactgagattgatggtaagaaaatcagctcaaaaactgatattcaaaccaatctttacagccatagtatcggagatactatcaaggtaaccttctatcgtggtaaagataagaaaactgtagatcttaaattaacaaaatctacagaagacatatctgattaa" + + _, _, err := FragmentWithOverhangs(gene, 90, 110, []string{}, defaultOverhangs) + if err != nil { + t.Errorf(err.Error()) + } +}
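The overhang filtering this patch adds to `optimizeOverhangIteration` (the `alreadyExists` and `buildAvailable` checks) can be sketched in isolation. This is a minimal stand-alone sketch, not poly's API: `overhangAllowed` and `reverseComplement` are illustrative helpers invented for this example (the real code uses `transform.ReverseComplement` and inline loops over `excludeOverhangs`/`includeOverhangs`), but the decision logic mirrors the diff: a candidate is rejected if it or its reverse complement is excluded, and, when an include set is given, it must appear in that set directly or as a reverse complement.

```go
package main

import "fmt"

// reverseComplement returns the reverse complement of a DNA sequence.
// (Illustrative stand-in for poly's transform.ReverseComplement.)
func reverseComplement(seq string) string {
	comp := map[byte]byte{'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
	out := make([]byte, len(seq))
	for i := 0; i < len(seq); i++ {
		out[len(seq)-1-i] = comp[seq[i]]
	}
	return string(out)
}

// overhangAllowed mirrors the checks added in optimizeOverhangIteration:
// reject the candidate if it (or its reverse complement) is in
// excludeOverhangs; if includeOverhangs is non-empty, also require the
// candidate (or its reverse complement) to be a member of it.
func overhangAllowed(candidate string, excludeOverhangs, includeOverhangs []string) bool {
	for _, exclude := range excludeOverhangs {
		if exclude == candidate || reverseComplement(exclude) == candidate {
			return false
		}
	}
	if len(includeOverhangs) == 0 {
		return true // empty include set means "no constraint", as in Fragment
	}
	for _, include := range includeOverhangs {
		if include == candidate || reverseComplement(include) == candidate {
			return true
		}
	}
	return false
}

func main() {
	include := []string{"CGAG", "GTCT"}
	fmt.Println(overhangAllowed("CGAG", nil, include))              // true: in the include set
	fmt.Println(overhangAllowed("AGAC", nil, include))              // true: reverse complement of GTCT
	fmt.Println(overhangAllowed("AAAA", nil, include))              // false: not in the include set
	fmt.Println(overhangAllowed("CGAG", []string{"CGAG"}, include)) // false: exclusion wins
}
```

Note the asymmetry this preserves from the patch: `Fragment` passes an empty include set, so only the exclusion check applies, while `FragmentWithOverhangs` constrains candidates to the supplied set, which is what the `TestFragmentWithOverhangs` default-overhang list exercises.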