diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md deleted file mode 100644 index f39b24c..0000000 --- a/.github/ISSUE_TEMPLATE.md +++ /dev/null @@ -1,40 +0,0 @@ - - -> _This template is rather extensive. Fill out all that you can, if are a new contributor or you're unsure about any section, leave it unchanged and a reviewer will help you_ :smile:. _This template is simply a tool to help everyone remember the BioJulia guidelines, if you feel anything in this template is not relevant, simply delete it._ - -## Expected Behavior - - - -## Current Behavior - - - -## Possible Solution / Implementation - - - -## Steps to Reproduce (for bugs) - -1. -2. -3. -4. - - - - - - -## Context - - - -## Your Environment - -- Package Version used: -- Julia Version used: -- Operating System and version (desktop or mobile): -- Link to your project: - - diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md deleted file mode 100644 index 3575d00..0000000 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ /dev/null @@ -1,47 +0,0 @@ -# A clear and descriptive title (No issue numbers please) - -> _This template is rather extensive. Fill out all that you can, if are a new contributor or you're unsure about any section, leave it unchanged and a reviewer will help you_ :smile:. _This template is simply a tool to help everyone remember the BioJulia guidelines, if you feel anything in this template is not relevant, simply delete it._ - -## Types of changes - -This PR implements the following changes: -_(Please tick any or all of the following that are applicable)_ - -* [ ] :sparkles: New feature (A non-breaking change which adds functionality). -* [ ] :bug: Bug fix (A non-breaking change, which fixes an issue). -* [ ] :boom: Breaking change (fix or feature that would cause existing functionality to change). 
- -## :clipboard: Additional detail - -- If you have implemented new features or behaviour - - **Provide a description of the addition** in as many details as possible. - - - **Provide justification of the addition**. - - - **Provide a runnable example of use of your addition**. This lets reviewers - and others try out the feature before it is merged or makes it's way to release. - -- If you have changed current behaviour... - - **Describe the behaviour prior to you changes** - - - **Describe the behaviour after your changes** and justify why you have made the changes, - Please describe any breakages you anticipate as a result of these changes. - - - **Does your change alter APIs or existing exposed methods/types?** - If so, this may cause dependency issues and breakages, so the maintainer - will need to consider this when versioning the next release. - - - If you are implementing changes that are intended to increase performance, you - should provide the results of a simple performance benchmark exercise - demonstrating the improvement. Especially if the changes make code less legible. - -## :ballot_box_with_check: Checklist - -- [ ] :art: The changes implemented is consistent with the [julia style guide](https://docs.julialang.org/en/stable/manual/style-guide/). -- [ ] :blue_book: I have updated and added relevant docstrings, in a manner consistent with the [documentation styleguide](https://docs.julialang.org/en/stable/manual/documentation/). -- [ ] :blue_book: I have added or updated relevant user and developer manuals/documentation in `docs/src/`. -- [ ] :ok: There are unit tests that cover the code changes I have made. -- [ ] :ok: The unit tests cover my code changes AND they pass. -- [ ] :pencil: I have added an entry to the `[UNRELEASED]` section of the manually curated `CHANGELOG.md` file for this repository. -- [ ] :ok: All changes should be compatible with the latest stable version of Julia. 
-- [ ] :thought_balloon: I have commented liberally for any complex pieces of internal code. diff --git a/.github/workflows/DocCleanup.yml b/.github/workflows/DocCleanup.yml new file mode 100644 index 0000000..f8e2302 --- /dev/null +++ b/.github/workflows/DocCleanup.yml @@ -0,0 +1,27 @@ +# https://documenter.juliadocs.org/stable/man/hosting/#gh-pages-Branch +name: Doc Preview Cleanup + +on: + pull_request: + types: [closed] + +jobs: + doc-preview-cleanup: + runs-on: ubuntu-latest + steps: + - name: Checkout gh-pages branch + uses: actions/checkout@v2 + with: + ref: gh-pages + - name: Delete preview and history + push changes + run: | + if [ -d "previews/PR$PRNUM" ]; then + git config user.name "Documenter.jl" + git config user.email "documenter@juliadocs.github.io" + git rm -rf "previews/PR$PRNUM" + git commit -m "delete preview" + git branch gh-pages-new $(echo "delete history" | git commit-tree HEAD^{tree}) + git push --force origin gh-pages-new:gh-pages + fi + env: + PRNUM: ${{ github.event.number }} \ No newline at end of file diff --git a/.github/workflows/documentation.yml b/.github/workflows/documentation.yml new file mode 100644 index 0000000..5061e81 --- /dev/null +++ b/.github/workflows/documentation.yml @@ -0,0 +1,51 @@ +# Sample workflow for building and deploying a VitePress site to GitHub Pages +# +name: Deploy documentation + +on: + # Runs on pushes targeting the `main` branch. Change this to `master` if you're + # using the `master` branch as the default branch. + push: + branches: + - main + tags: ['*'] + pull_request: + + # Allows you to run this workflow manually from the Actions tab + workflow_dispatch: + +# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages +permissions: + contents: write + pages: write + id-token: write + actions: write + + +# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued. 
+# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete. +concurrency: + group: pages + cancel-in-progress: false + +jobs: + # Build job + build: + runs-on: ubuntu-latest + steps: + - name: Checkout + uses: actions/checkout@v4 + with: # fetch-depth: 0 fetches the full history (all branches and tags), not just the last commit + fetch-depth: 0 + - name: Setup Julia + uses: julia-actions/setup-julia@v1 + - name: Pull Julia cache + uses: julia-actions/cache@v1 + - name: Deploy + uses: julia-actions/julia-docdeploy@v1 + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # For authentication with GitHub Actions token + DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }} # For authentication with SSH deploy key + GKSwstype: "100" # https://discourse.julialang.org/t/generation-of-documentation-fails-qt-qpa-xcb-could-not-connect-to-display/60988 + JULIA_DEBUG: "Documenter" + DATADEPS_ALWAYS_ACCEPT: true diff --git a/.github/workflows/pr_comment.yml b/.github/workflows/pr_comment.yml new file mode 100644 index 0000000..c7d29b6 --- /dev/null +++ b/.github/workflows/pr_comment.yml @@ -0,0 +1,14 @@ +name: pr_comment +on: + pull_request: + types: [opened, reopened] +jobs: + pr_comment: + runs-on: ubuntu-latest + steps: + - name: Create PR comment + if: github.event_name == 'pull_request' && github.repository == github.event.pull_request.head.repo.full_name # if this is a pull request build AND the pull request is NOT made from a fork + uses: thollander/actions-comment-pull-request@71efef56b184328c7ef1f213577c3a90edaa4aff + with: + message: 'Once the build has completed, you can preview your PR at this URL: https://biojulia.dev/BiojuliaDocs/previews/PR${{ github.event.number }}/' + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md deleted file mode 100644 index d0b21a7..0000000 --- a/CODE_OF_CONDUCT.md +++ /dev/null @@ -1,119 +0,0 @@ -# Etiquette and conduct in BioJulia - -As you interact with other members of the BioJulia group, or make contributions -you
may have revisions and suggestions on your pull request from BioJulia members -or others which they want to be implemented before they will merge your pull request. - -You may also have disagreements with people on the forums or chats maintained by -BioJulia. - -In order to keep BioJulia a civil and enjoyable place, where technical disagreements -and issues can be discussed and resolved in a mature and constructive way, we -outline three principles of etiquette we expect members and contributors to abide by. - -Anybody violating these principles in order to upset any member or contributor -may be flagged to the BioJulia admins who will decide on an appropriate -course of action. This includes locking conversations for cool-off periods, or -even bans of individuals. - -This statement on etiquette is not an exhaustive list of things that you can or can’t do. -Rather, it is a statement of our spirit and attitude towards interacting with each other. - -This statement applies in all spaces managed by the BioJulia organisation. -This includes any gitter, mailing lists, issue trackers, repositories, or any -other forums used by BioJulia for communication (such as Skype, Google Hangouts, etc). -It also applies in real-world events and spaces organised by BioJulia. - -## The principles of etiquette - -### 1. Be welcoming, friendly and patient. - -Be welcoming. We strive to welcome and support any individual participating in -BioJulia activities to any extent (from developing code, to support seeking -users). We have even been known to have a few members on our Gitter who are not -Biologists, but they enjoy the forum, like what we do, and stick around for the -programming chat. All are welcome (yes including _you_! :smile:). - -### 2. Be considerate. - -Your work will be used by other people, and you in turn will depend on the work -of others. From any code you make, to any support questions you ask or answer! 
-Any decision you take will affect users and colleagues, and you should take -those consequences into account when making decisions. - -Remember that we're a world-wide community, so you might not be communicating -in someone else's primary language. - -### 3. Be respectful. - -Not all of us will agree all the time, but disagreement is no excuse for poor -behaviour and poor manners. We might all experience some frustration now and then, -but we cannot allow that frustration to turn into a personal attack. -It’s important to remember that a community where people feel uncomfortable or -threatened is not a productive or fun community. -Members of the BioJulia community should be respectful when dealing with other -members as well as with people outside the BioJulia community. - -Please do not insult or put down other participants. -Harassment and other exclusionary behaviour is not acceptable. -This includes, but is not limited to: - - Violent threats or language directed against another person. - - Prejudiced, bigoted, or intolerant, jokes and language. - - Posting sexually explicit or violent material. - - Posting (or threatening to post) other people's personally identifying - information ("doxing"). - - Personal insults, especially those using racist or sexist terms. - - Unwelcome sexual attention. - - Advocating for, or encouraging, any of the above behaviour. - - Repeated harassment of others. In general, if someone asks you to stop, - then stop. - -When we disagree, try to understand why. -Disagreements, both social and technical, happen all the time and this -community is unlikely to be any exception! -It is important that we resolve disagreements and differing views constructively. -Different people have different perspectives on issues. -Being unable to understand why someone holds a viewpoint doesn’t mean that -they’re wrong. -Don’t forget that it is human to err and blaming each other doesn’t get us -anywhere. 
-Instead, focus on helping to resolve issues and learning from mistakes. - -Assume the person you have a disagreement with really does want the best for -BioJulia, just as you do. -Therefore, if you are ever unsure what the meaning or tone of a comment may be, -in the first instance, assume your fellow BioJulia member is acting in good -faith, this may well be a mistake in communication -(with the scientific community as diverse as it is, such mis-steps are likely!). -If you are comfortable doing so, ask them to clarify what they mean or to rephrase -their point. If you don't feel comfortable doing this, or if it is clear the -behaviour is hostile and not acceptable, please report it (see next section). - -## Is someone behaving inappropriately? - -If you are affected by the behaviour of a member or contributor of BioJulia, -we ask that you report it by contacting the -[BioJulia Admin Team](https://github.com/orgs/BioJulia/teams/admin/members) -collectively, by emailing [admin@biojulia.net](admin@biojulia.net). -They will get back to you and begin to resolve the situation. -In some cases we may determine that a public statement will need to be made. -If that's the case, the identities of all involved will remain -confidential unless those individuals instruct us otherwise. - -Ensure to include in your email: - -- Your contact info (so we can get in touch with you if we need to follow up). - -- Names (real, nicknames, or pseudonyms) of any individuals involved. - If there were other witnesses besides you, please try to include them as well. - -- When and where the incident occurred. Please be as specific as possible. - -- Your account of what occurred. If there is a publicly available record - (e.g. a mailing list archive or a public IRC logger) please include a link. - -- Any extra context you believe existed for the incident. - -- If you believe this incident is ongoing. 
- -- Any other information you believe we should have.s diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md deleted file mode 100644 index a69d759..0000000 --- a/CONTRIBUTING.md +++ /dev/null @@ -1,731 +0,0 @@ -# Contributing to BioJulia - -:+1::tada: First off, thanks for taking the time to contribute! :tada::+1: - -The following is a set of guidelines for contributing to BioJulia repositories, -which are hosted in the [BioJulia Organization](https://github.com/BioJulia) on -GitHub. - -These are mostly guidelines, not rules. -Use your best judgment, and feel free to propose changes to this document in a -pull request. - -## Table of contents - -[I don't want to read this whole thing, I just have a question!!!](#i-dont-want-to-read-this-whole-thing-i-just-have-a-question) - -[What should I know about BioJulia before I get started?](#what-should-i-know-about-biojulia-before-i-get-started) - - [BioJulia Package Maintainers](#biojulia-package-maintainers) - - [BioJulia Administrators](#biojulia-administrators) - - [Etiquette and conduct](#etiquette-and-conduct) - - [Package Conventions](#package-conventions) - -[How Can I Contribute?](#how-can-i-contribute) - - [Reporting Bugs](#reporting-bugs) - - [Suggesting an Enhancement](#suggest-an-enhancement) - - [Making Pull Requests](#pull-requests) - - [Become a BioJulia package maintainer](#become-a-biojulia-package-maintainer) - -[Styleguides](#styleguides) - - [Git Commit Messages](#git-commit-messages) - - [Additional julia style suggestions](#additional-julia-style-suggestions) - - [Documentation Styleguide](#documentation-styleguide) - -[Additional notes](#additional-notes) - - [A suggested branching model](#a-suggested-branching-model) - -## I don't want to read this whole thing I just have a question!!! - -We understand you are excited to get involved already! -But please don't file an issue to ask a question. -You'll get faster results by using the resources below. 
- -We have a Gitter message chat room where the community -chimes in with helpful advice if you have questions. -If you just have a question, or a problem that is not covered by this guide, -then come on over to the Gitter and we'll be happy to help. - -* [Gitter, BioJulia message board](https://gitter.im/BioJulia/Bio.jl) - -## What should I know about BioJulia **BEFORE** I get started? - -### BioJulia Package Maintainers - -In order to provide the best possible experience for new and existing users of -Julia from the life-sciences, a little bit of structure and organization is -necessary. - -Each package is dedicated to introducing a specific data type or algorithm, or -dealing with a specific biological problem or pipeline. - -Whilst there are some "meta-packages" such as Bio.jl, which bundle individual -packages together for convenience of installation and use, most of the BioJulia -software ecosystem is quite decentralized. - -Therefore, it made sense that maintenance of the packages should also be -fairly decentralized, to achieve this, we created the role of a "Package -Maintainer". - -The maintainer(s) for a given package are listed in the packages README.md file. - -The maintainers of a package are responsible for the following aspects of the -package they maintain. - -1. Deciding the branching model used and how branches are protected. -2. Reviewing pull requests, and issues for that package. -3. To tag releases of a package at suitable points in the lifetime of the package. -4. To be considerate and of assistance to new contributors, new community members and new maintainers. -5. To report potential incidents of antisocial to a BioJulia admin member. 
- -**See [HERE](#additional-notes) for extra -guidance and suggestions on branching models and tagging releases.** - -Package maintainers hold **admin** level access for any package(s) for which they -are listed as maintainer, and so new contributors to BioJulia should -rest assured they will not be 'giving up' any package they transfer to BioJulia, -they shall remain that package's administrator. Package maintainers also have -**push** (but not **admin**) access to all other code packages in the BioJulia -ecosystem. - -This allows for a community spirit where maintainers who are dedicated primarily -to other packages may step in to help other maintainers to resolve a PR or issue. -As such, newer maintainers and researchers contributing a package to the BioJulia -ecosystem can rest assured help will always be at hand from our community. - -However, if you are a maintainer stepping in to help the maintainer(s) dedicated -to another package, please respect them by first offering to step in and help, -before changing anything. Secondly, ask them before doing -advanced and potentially destructive git operations e.g forcing pushes to -branches (especially master), or re-writing history of branches. -Please defer to the judgement of the maintainers dedicated in the README of the -package. - -### BioJulia Administrators - -BioJulia has a select group of members in an Admin team. -This team has administrative access to all repositories in the BioJulia project. - -The admin team is expected to: - -1. Respond and resolve any disputes between any two BioJulia contributors. -2. Act as mentors to all other BioJulia maintainers. -3. Assist maintainers in the upkeep of packages when requested. Especially when - more difficult re-bases and history manipulation are required. -4. Some administrators maintain the BioJulia infrastructure. 
- This includes being responsible for the accounts and billing of any - platforms used by BioJulia, and the maintenance of any hardware like - servers owned and used by BioJulia. - -### Etiquette and conduct - -BioJulia outlines a [statement of etiquette and conduct](CODE_OF_CONDUCT.md) -that all members and contributors are expected to uphold. Please take the time -to read and understand this statement. - -### Package conventions - -First, be familiar with the official julia documentation on: - -* [Packages](https://docs.julialang.org/en/stable/manual/packages/) -* [Package Development](https://docs.julialang.org/en/stable/manual/packages/#Package-Development-1) -* [Modules](https://docs.julialang.org/en/stable/manual/modules/) - -Package names should be a simple and self explanatory as possible, avoiding -unneeded acronyms. - -Packages introducing some key type or method/algorithm should be named -accordingly. - -For example, the BioJulia package introducing biological sequence types and -functionality to process sequence data is called "BioSequences". -GitHub repository names of BioJulia packages should end in `.jl`, even though -the package name itself does not. -i.e. "BioSequences" is the name of the package, and the name of its GitHub -repository is "BioSequences.jl". - -Considerate and simple naming greatly assists people in finding the kind of -package or functionality they are looking for. - -File names of files containing julia code in packages should end in `.jl`. - -All user facing types and functions (i.e. all types and functions -exported from the module of a package), should be documented. -Documentation regarding specific implementation details that aren't relevant -to users should be in the form of comments. Please *DO* comment liberally for -complex pieces of code! - -We use [Documenter.jl](https://github.com/JuliaDocs/Documenter.jl), -to generate user and developer documentation and host it on the web. 
-The source markdown files for such manuals is kept in the `docs/src/` -folder of each BioJulia package/repository. - -The code in all BioJulia packages is unit tested. Such tests should be -organized into contexts, and into separate files based on module. - -Files for tests for a module go into an appropriately named folder, within the -`test` folder in the git repo. - -## How can I contribute? - -### Reporting Bugs - -Here we show you how to submit a bug report for a BioJulia repository. -If you follow the advice here, BioJulia maintainers and the community will -better understand your report :pencil:, be able to reproduce the behaviour -:computer: :computer:, and identify related problems :mag_right:. - -#### Before creating a bug report: - -Please do the following: - -1. Check the GitHub issue list for the package that is giving you problems. - -2. If you find an issue already open for your problem, add a comment to let - everyone know that you are experiencing the same issue. - -3. If no **currently open** issue already exists for your problem that has already been - then you should create a new issue. - - > **Note:** If you find a **Closed** issue that seems like it is the same thing - > that you're experiencing, open a new issue and include a link to the original - > issue in the body of your new one. - -#### How to create a (good) new bug report: - -Bugs are tracked as [GitHub issues](https://guides.github.com/features/issues/). -After you've determined [which repository](https://github.com/BioJulia) -your bug is related to, create an issue on that repository and provide the -following information by filling in [template](.github/ISSUE_TEMPLATE.md). -This template will help you to follow the guidance below. - -When you are creating a bug report, please do the following: - -1. **Explain the problem** - - - *Use a clear and descriptive title* for the issue to identify the problem. 
- - *Describe the exact steps which reproduce the problem* in as many details as possible. - - Which function / method exactly you used? - - What arguments or parameters were used? - - *Provide a specific example*. (Includes links to pastebin, gists and so on.) - If you're providing snippets in the issue, use - [Markdown code blocks](https://help.github.com/articles/markdown-basics/#multiple-lines). - - - *Describe the behaviour you observed after following the steps* - - Point out what exactly is the problem with that behaviour. - - *Explain which behaviour you expected to see instead and why.* - - *OPTIONALLY: Include screenshots and animated GIFs* which show you - following the described steps and clearly demonstrate the problem. - You can use [this tool](https://www.cockos.com/licecap/) to record GIFs on - macOS and Windows, or [this tool](https://github.com/colinkeenan/silentcast) - or [this tool](https://github.com/GNOME/byzanz) on Linux. - -2. **Provide additional context for the problem (some of these may not always apply)** - - - *Did the problem start happening recently* (e.g. after updating to a new version)? - - If the problem started recently, *can you reproduce the problem in older versions?* - - Do you know the most recent package version in which the problem doesn't happen? - - - *Can you reliably reproduce the issue?* If not... - - Provide details about how often the problem happens. - - Provide details about under which conditions it normally happens. - - - Is the problem is related to *working with files*? If so.... - - Does the problem happen for all files and projects or only some? - - Does the problem happen only when working with local or remote files? - - Does the problem happen for files of a specific type, size, or encoding? - - Is there anything else special about the files you are using? - -3. 
**Include details about your configuration and environment** - -- *Which version of the package are you using?* - -- *What's the name and version of the OS you're using?* - -- *Which julia packages do you have installed?* - -- Are you using local configuration files to customize julia behaviour? If so... - - Please provide the contents of those files, preferably in a - [code block](https://help.github.com/articles/markdown-basics/#multiple-lines) - or with a link to a [gist](https://gist.github.com/). - -*Note: All of the above guidance is included in the [template](.github/ISSUE_TEMPLATE.md) for your convenience.* - -### Suggest an Enhancement - -This section explains how to submit an enhancement proposal for a BioJulia -package. This includes completely new features, as well as minor improvements to -existing functionality. -Following these suggestions will help maintainers and the community understand -your suggestion :pencil: and find related suggestions :mag_right:. - -#### Before Submitting An Enhancement Proposal - -* **Check if there's already [a package](https://github.com/BioJulia) which provides that enhancement.** - -* **Determine which package the enhancement should be suggested in.** - -* **Perform a cursory issue search** to see if the enhancement has already been suggested. - * If it has not, open a new issue as per the guidance below. - * If it has... - * Add a comment to the existing issue instead of opening a new one. - * If it was closed, take the time to understand why this was so (it's ok to - ask! :) ), and consider whether anything has changed that makes the reason - outdated. If you can think of a convincing reason to reconsider the - enhancement, feel free to open a new issue as per the guidance below. - -#### How to submit a (good) new enhancement proposal - -Enhancement proposals are tracked as -[GitHub issues](https://guides.github.com/features/issues/). 
-After you've determined which package your enhancement proposals is related to, -create an issue on that repository and provide the following information by -filling in [template](.github/ISSUE_TEMPLATE.md). -This template will help you to follow the guidance below. - -1. **Explain the enhancement** - - *Use a clear and descriptive title* for the issue to identify the suggestion. - - *Provide a step-by-step description of the suggested enhancement* in as many details as possible. - - *Provide specific examples to demonstrate the steps*. - Include copy/pasteable snippets which you use in those examples, as - [Markdown code blocks](https://help.github.com/articles/markdown-basics/#multiple-lines). - - - If you want to change current behaviour... - - Describe the *current* behaviour. - - *Explain which behaviour you expected* to see instead and *why*. - - *Will the proposed change alter APIs or existing exposed methods/types?* - If so, this may cause dependency issues and breakages, so the maintainer - will need to consider this when versioning the next release. - - - *OPTIONALLY: Include screenshots and animated GIFs*. - You can use [this tool](https://www.cockos.com/licecap/) to record GIFs on - macOS and Windows, and [this tool](https://github.com/colinkeenan/silentcast) - or [this tool](https://github.com/GNOME/byzanz) on Linux. - -2. **Provide additional context for the enhancement** - - - *Explain why this enhancement would be useful* to most BioJulia users and - isn't something that can or should be implemented as a separate package. - - - *Do you know of other projects where this enhancement exists?* - -3. **Include details about your configuration and environment** - - - Specify which *version of the package* you're using. - - - Specify the *name and version of the OS* you're using. 
- -*Note: All of the above guidance is included in the [template](.github/ISSUE_TEMPLATE.md) for your convenience.* - -### Making Pull Requests - -BioJulia packages (and all julia packages) can be developed locally. -For information on how to do this, see this section of the julia -[documentation](https://docs.julialang.org/en/stable/manual/packages/#Package-Development-1). - -Before you start working on code, it is often a good idea to open an enhancement -[suggestion](#suggest-an-enhancement) - -Once you decide to start working on code, the first thing you should do is make -yourself an account on [Github](https://github.com). -The chances are you already have one if you've done coding before and wanted to -make any scripts or software from a science project public. - -The first step to contributing is to find the -[BioJulia repository](https://github.com/BioJulia) for the package. -Hit the 'Fork' button on the repositories page to create a forked copy of the -package for your own Github account. This is your blank slate to work on, and -will ensure your work and experiments won't hinder other users of the released -and stable package. - -From there you can clone your fork of the package and work on it on your -machine using git. -Here's an example of cloning, assuming you already forked the BioJulia package "BioSequences.jl": - -```sh -git clone https://github.com//BioSequences.jl.git -``` - -Git will download or "clone" your fork and put it in a folder called -BioSequences.jl it creates in your current directory. - -It is beyond the scope of this document to describe good git and github use in -more specific detail, as the folks at Git and GitHub have already done that wonderfully -on their own sites. If you have additional questions, simply ping a BioJulia -member or the [BioJulia Gitter](https://gitter.im/BioJulia/Bio.jl). - -#### How to make (good) code contributions and new Pull-Requests - -1. 
**In your code changes** - - - **Branch properly!** - - If you are making a bug-fix, then you need to checkout your bug-fix branch - from the last release tag. - - If you are making a feature addition or other enhancement, checkout your - branch from master. - - See [here](#a-suggested-branching-model) for more information (or ask a package maintainer :smile:). - - - Follow the [julia style guide](https://docs.julialang.org/en/stable/manual/style-guide/). - - - Follow the [additional style suggestions](#additional-julia-code-style-suggestions). - - - Follow the [julia performance tips](https://docs.julialang.org/en/stable/manual/performance-tips/). - - - Update and add docstrings for new code, consistent with the [documentation styleguide](https://docs.julialang.org/en/stable/manual/documentation/). - - - Update information in the documentation located in the `docs/src/` - folder of the package/repository if necessary. - - - Ensure that unit tests have been added which cover your code changes. - - - Ensure that you have added an entry to the `[UNRELEASED]` section of the - manually curated `CHANGELOG.md` file for the package. Use previous entries as - an example. Ensure the `CHANGELOG.md` is consistent with the - recommended [changelog style](EXAMPLE_CHANGELOG.md). - - - All changes should be compatible with the latest stable version of - Julia. - - - Please comment liberally for complex pieces of internal code to facilitate comprehension. - -2. **In your pull request** - - - **Use the [pull request template](.github/PULL_REQUEST_TEMPLATE.md)** - - - *Describe* the changes in the pull request - - - Provide a *clear, simple, descriptive title*. - - - Do not include issue numbers in the PR title. - - - If you have implemented *new features* or behaviour - - *Provide a description of the addition* in as many details as possible. - - *Provide justification of the addition*. - - *Provide a runnable example of use of your addition*. 
This lets reviewers - and others try out the feature before it is merged or makes its way to release. - - - If you have *changed current behaviour*... - - *Describe the behaviour prior to your changes* - - *Describe the behaviour after your changes* and justify why you have made the changes. - - *Does your change alter APIs or existing exposed methods/types?* - If so, this may cause dependency issues and breakages, so the maintainer - will need to consider this when versioning the next release. - - If you are implementing changes that are intended to increase performance, you - should provide the results of a simple performance benchmark exercise - demonstrating the improvement, especially if the changes make code less legible. - -*Note: All of the above guidance is included in the [template](.github/PULL_REQUEST_TEMPLATE.md) for your convenience.* - -#### Reviews and merging - -You can open a pull request early on and push changes to it until it is ready, -or you can do all your editing locally and make a pull request only when it is -finished - it is up to you. - -When your pull request is ready on Github, mention one of the maintainers of the repo -in a comment e.g. `@Ward9250` and ask them to review it. You can also use Github's -review feature. They will review the code and documentation in the pull request, -and will assess it. - -Your pull request will be accepted and merged if: - -1. The dedicated package maintainers approve the pull request for merging. -2. The automated build system confirms that all unit tests pass without any issues. - -There may be package-specific requirements or guidelines for contributors with -some of BioJulia's packages. Most of the time there will not be; the maintainers -will let you know if there are. - -It may also be that the reviewers or package maintainers will want you to make -changes to your pull request before they will merge it.
Take the time to -understand why any such request has been made, and freely discuss it with the -reviewers. Feedback you receive should be constructive and considerate -(also see [here](#etiquette-and-conduct)). - -### Submitting a package to BioJulia - -If you have written a package, and would like to have it listed under - -and endorsed by - the BioJulia organization, you're agreeing to the following: - -1. Allowing BioJulia to have joint ownership of the package. - This is so that the members can help you review and merge pull requests and - other contributions, and also help you to develop new features. - This policy ensures that you (as the package author and current maintainer) - will have good support in maintaining your package to the highest possible - quality. - -2. Going through a joint review/decision on a suitable package name. - This is usually the original package name. However, package authors may be asked - to rename their package to something more official and discoverable (by - search engines and such) if it is contentious or non-standard. - -To submit your package, follow these steps: - -1. Introduce yourself and your package on the BioJulia Gitter channel. -2. At this point maintainers will reach out to mentor and vouch for you and your package. They will: - 1. Discuss with you a suitable name. - 2. Help you ensure the package is up to standard, and meets the code and contribution guidelines described on this site. - 3. Add you to the BioJulia organisation if you wish to become a BioJulia maintainer. - 4. Transfer ownership of the package. - -### Become a BioJulia package maintainer - -You may ask the current admin or maintainers of a BioJulia package to invite you. - -They will generally be willing to do so if you have done one or -more of the following to [contribute](#how-can-i-contribute) to BioJulia in the past: - -1. You have [submitted a new package](#submitting-a-package-to-biojulia) to BioJulia. -2. [Reported a bug](#reporting-bugs). -3.
[Suggested enhancements](#suggesting-enhancements). -4. [Made one or more pull requests](#pull-requests) implementing one or more... - - Fixed bugs. - - Improved performance. - - Added new functionality. - - Increased test coverage. - - Improved documentation. - -None of these requirements are set in stone, but we prefer you to have done one -or more of the above, as it gives good confidence that you are familiar with the -tasks and responsibilities of maintaining a package used by others, and are -willing to do so. -Any other avenue for demonstrating commitment to the community and the -GitHub organisation will also be considered. - -### BioJulia members can sometimes become administrators - -Members of the admin team have often been contributing to BioJulia for a long -time, and may even be founders present at the inception of the project. -In order to become an admin, one does not necessarily have to contribute large -amounts of code to the project. -Rather the decision to on-board a member to an admin position requires a history -of using and contributing to BioJulia, and a positive -interaction and involvement with the community. Any BioJulia member fulfilling -this, may offer to take on this [responsibility](#biojulia-administrators). - -## Styleguides - -### Git Commit messages - -* Use the present tense ("Add feature" not "Added feature"). -* Use the imperative mood ("Move cursor to..." not "Moves cursor to..."). -* Limit the first line to 72 characters or less. -* Reference issues and pull requests liberally after the first line. 
-* Consider starting the commit message with an applicable emoji: - * :art: `:art:` when improving the format/structure of the code - * :racehorse: `:racehorse:` when improving performance - * :memo: `:memo:` when writing docs - * :penguin: `:penguin:` when fixing something on Linux - * :apple: `:apple:` when fixing something on macOS - * :checkered_flag: `:checkered_flag:` when fixing something on Windows - * :bug: `:bug:` when fixing a bug - * :fire: `:fire:` when removing code or files - * :green_heart: `:green_heart:` when fixing the CI build - * :white_check_mark: `:white_check_mark:` when adding tests - * :arrow_up: `:arrow_up:` when upgrading dependencies - * :arrow_down: `:arrow_down:` when downgrading dependencies - * :exclamation: `:exclamation:` when removing warnings or deprecations - -### Additional julia style suggestions - -- Source code files should have the following style of header: - - ```julia - # Title - # ===== - # - # Short description. - # - # [Long description (optional)] - # - # This file is a part of BioJulia. License is MIT: - ``` - -- Indent with 4 spaces. - -- For functions that are not a single expression, it is preferred to use an explicit `return`. - Be aware that functions in julia implicitly return the result of the last - expression in the function, so plain `return` should be used to indicate that - the function returns `nothing`. - -- Type names are camel case, with the first letter capitalized. E.g. - `SomeVeryUsefulType`. - -- Module names should be camel case. - -- Separate logical blocks of code with one blank line, although it is common - and acceptable for short single-line functions to be defined together on - consecutive lines with no blank lines between them. - -- Function names, apart from constructors, are all lowercase. - Include underscores between words only if the name would be hard - to read without. - E.g. `start`, `stop`, `find_letter`, `find_last_digit`.
- It is good to separate concepts in a name with a `_`. - -- Generally try to keep lines below 100 columns, unless splitting a long line - onto multiple lines makes it harder to read. - -- Files that declare modules should only declare the module, and import any - modules that it requires. Any subsequent significant code should be included - from separate files. E.g. - -```julia -module AwesomeFeatures - -using IntervalTrees, JSON - -include("feature1.jl") -include("feature2.jl") - -end -``` - -- Files that declare modules should have the same name as the module. - E.g. the module `SomeModule` is declared under the file `SomeModule.jl`. - -- When extending method definitions, define the methods with a module name prefix. E.g. - -```julia -function Base.start(iter::YourType) - ... -end - -Base.done(iter::YourType, state) = ... -``` - -- Functions that get or set variables in a struct should not be - prefixed with 'get' or 'set'. - The getter should be named for the variable it gets, and the setter - should have the same name as the getter, with the suffix `!`. - For example, for the variable `name`: - -```julia -name(node) # get node name -name!(node, "somename") # set node name -``` - -- When using conditional branching, if code is statement-like, an - if-else block should be used. However, if the code is expression-like - then julia's ternary operator should be used. - ```julia - matches == sketchlen ? 1.0 : matches / (2 * sketchlen - matches) - ``` - Some simple checks and expressions are also expressed using the `&&` or `||` - operators instead of if-else syntax. For example: - ```julia - isvalid(foo) || throw(ArgumentError("$foo is not valid")) - ``` - -## Additional Notes - -### A suggested branching model - -If you are a [dedicated maintainer](#biojulia-package-maintainers) on a BioJulia -package, you may be wondering which branching model to choose for development -and maintenance of your code.
- -If you are a contributor, knowing the branching model of a package may help -you work more smoothly with the maintainer of the package. - -There are several options available, including git-flow. - -Below is a recommended branching model for your repo, but it is -only a suggestion. What is best for you as the -[dedicated maintainer(s)](#biojulia-package-maintainers), is best for _you_. - -The model below is a brief summary of the ['OneFlow model'](http://endoflineblog.com/oneflow-a-git-branching-model-and-workflow). -We describe it in summary here for convenience, but we recommend you check out -the blog article as a lot more justification and reasoning is presented on _why_ -this model is the way it is. - -#### During development - -1. There is only one main branch - you can call it anything, but usually it's - called `master`. - -2. Use temporary branches for features, releases, and bug-fixes. These temporary - branches are used as a convenience to share code with other developers and as a - backup measure. They are always removed once the changes present on them are - added to master. - -3. Features are integrated onto the master branch primarily in a way which keeps - the history linear and simple. A good compromise to the rebase vs. merge commit - debate for this step is to first do an interactive rebase of the feature branch - on master, and then do a non-fast-forward merge. - Github now does squashed commits when merging a PR and this is fine too. - -_Feature Example:_ - -```sh -git checkout -b feature/my-feature master - -... Make commits to feature/my-feature to finish the feature ... - -git rebase -i master -git checkout master -git merge --no-ff feature/my-feature -git push origin master -git branch -d feature/my-feature -``` - -#### :sparkles: Making new releases - -1. You create a new branch for a new release. It branches off from `master` at the - point that you decided `master` has all the necessary features. 
This is not - necessarily the tip of the `master` branch. - -2. From then on new work, aimed for the _next_ release, is pushed to `master` as - always, and any necessary changes for the _current_ release are pushed to the - release branch. Once the release is ready, you tag the top of the release branch. - -3. Once the release is ready, tag the top of the release branch with a version - number. Then do a typical merge of the release branch into `master`. - Any changes that were made during the release will now be part of `master`. - Delete the release branch. - -_Release Example:_ - -```sh -git checkout -b release/2.3.0 9efc5d - -... Make commits to release/2.3.0 to finish the release ... - -git tag 2.3.0 -git checkout master -git merge release/2.3.0 -git push --tags origin master -git branch -d release/2.3.0 -git push origin :release/2.3.0 -``` - -7. Do your pushes, and go to GitHub to make your release available. - -#### :bug: Hot-fixes and hot-fix releases - -1. When a hot-fix is needed, create a hot-fix branch, that branches from the - release tag that you want to apply the fix to. - -2. Push the needed fixes to the hot-fix branch. - -3. When the fix is ready, tag the top of the fix branch with a new release, - merge it into master, finally delete the hot-fix branch. - -_Hot-fix example:_ - -```sh -git checkout -b hotfix/2.3.1 2.3.0 - -... Add commits which fix the problem ... - -git tag 2.3.1 -git checkout master -git merge hotfix/2.3.1 -git push --tags origin master -git branch -d hotfix/2.3.1 -``` - -**IMPORTANT:** -There is one special case when finishing a hot-fix branch. -If a release branch has already been cut in preparation for the next release -before the hot-fix was finished, you need to merge the hot-fix branch not to -master, but to that release branch. 
diff --git a/Project.toml b/Project.toml index 663e473..0f2eea5 100644 --- a/Project.toml +++ b/Project.toml @@ -1,7 +1,9 @@ -title = "BioTutorials" +name = "BioTutorials" uuid = "33e7be4a-8e14-4baf-892c-424bb664d307" authors = ["Kevin Bonham (@kescobo)", "Kenta Sato (@bicycle1885)"] +title = "BioTutorials" version = "0.1.0" [deps] +BioSequences = "7e6ae17a-c86d-528c-b3b9-7f778a29fe59" Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306" diff --git a/Rosalind.info/1-Getting-started.jl b/Rosalind.info/1-Getting-started.jl deleted file mode 100644 index 80cdf84..0000000 --- a/Rosalind.info/1-Getting-started.jl +++ /dev/null @@ -1,971 +0,0 @@ -### A Pluto.jl notebook ### -# v0.19.22 - -using Markdown -using InteractiveUtils - -# This Pluto notebook uses @bind for interactivity. When running this notebook outside of Pluto, the following 'mock version' of @bind gives bound variables a default value (instead of an error). -macro bind(def, element) - quote - local iv = try Base.loaded_modules[Base.PkgId(Base.UUID("6e696c72-6542-2067-7265-42206c756150"), "AbstractPlutoDingetjes")].Bonds.initial_value catch; b -> missing; end - local el = $(esc(element)) - global $(esc(def)) = Core.applicable(Base.get, el) ? Base.get(el) : iv(el) - el - end -end - -# ╔═╡ 072db8c0-d3c1-11ed-18fa-bff69835f8cd -using PlutoUI - -# ╔═╡ 4f2e0acf-5ac8-400a-82da-c66c7b07c467 -using BioSequences - -# ╔═╡ 41defb0d-7c92-46af-a4e2-35edac47a690 -using BenchmarkTools - -# ╔═╡ 76df740a-a130-41f3-8c3e-1f24729cc41b -md""" -# Getting Started with Rosalind.info Problems - -If you're just learning bioinformatics, -or diving into a new programming language with an interest in biology, -[Rosalind.info](https://rosalind.info/) is a fantastic resource! -It has a series of problems that get progressively harder, -and introduce different concepts. 
- -These tutorial notebooks will take you through how to solve many of the problems, -both using functionality from the Base julia language, -as well as using functionality from the BioJulia family of packages. - -Once you've signed up for an account at rosalind.info, come on back here, -and we'll get started! -""" - -# ╔═╡ 82d06ce9-e588-4312-87c1-d1c97263958a -md""" -## 🧬 Problem 1: Counting DNA nucleotides - -🤔 [Problem link](https://rosalind.info/problems/dna/) - -!!! note - For each of these problems, you are strongly encouraged - to read through the problem descriptions, - especially if you're somewhat new to molecular biology. - We will mostly not repeat the background concepts in these notebooks, - except where they are relevant to the solutions. - -""" - -# ╔═╡ 943cd6d5-6339-4851-b663-c9c0ef77e7eb -PlutoUI.TableOfContents() - -# ╔═╡ fb80d512-6cbe-4b02-919e-6c557729bb42 -md""" -This section contains the code for counting each letter ('A', 'C', 'G', or 'T'), -and showing the counts, in that order. - -You can enter a DNA sequence (as long as it only contains those 4 letters) -here, and the counts will be displayed below. -Note, the current values represent the demo input provided in the problem: - -$(@bind input_dna TextField((50,3); default = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC")) -""" - -# ╔═╡ f810b26d-a7cb-44d4-af35-c82fc3791ff6 -md""" -Let's see how it's done! -""" - -# ╔═╡ fe8daef1-b3e9-4cca-94bb-8a1dec71f0f7 -md""" -### DNA sequences are `String`s of `Char`s - -In julia, single characters and strings, -which are made up of multiple characters, have different types: -`Char` and `String`, respectively. - -They are also written differently - single quotes for a `Char`, -and double quotes for a `String`.
-""" - -# ╔═╡ 05b5fb9f-d2d3-4917-8ac4-1b14d746fe9c -chr = 'a' - -# ╔═╡ 4173f591-4232-4918-ae09-a3de449d42fe -str = "A" - -# ╔═╡ 024cd6e2-23e4-4490-b482-641f490db2cc -typeof(chr) - -# ╔═╡ 75f92ac2-a747-470e-aec1-8f0c906248a7 -typeof(str) - -# ╔═╡ 62004c70-e76a-4da2-92a1-d7c4b8b22aa7 -md""" -In many ways, a `String` can be thought of as a vector of `Char`s, -and many julia functions that operate on collections like `Vector`s -will work on `String`s. -We can also loop over the contents of a string, -which will treat each `Char` separately. -""" - -# ╔═╡ 7ae2154e-c161-4d45-a476-d3fa79757f2d -for c in "banana" - @info c -end - -# ╔═╡ 755014bc-470f-469f-a508-df3f372f228b -md""" -### Approach 1: counting in loops - -One relatively straightforward way to approach this problem -is to set a variable to `0` for each base, -then loop through the sequence, adding `1` to the appropriate -variable at each character. - -I'll also stick this into a function, -so we can easily reuse the code. -""" - -# ╔═╡ c39e25c7-0071-4be0-8fe1-38eb5e2e4263 -function countbases(seq) # here `seq` is an "argument" for the function - a = 0 - c = 0 - g = 0 - t = 0 - for base in seq - if base == 'A' - a += 1 # this is equivalent to `a = a + 1` - elseif base == 'C' - c += 1 - elseif base == 'G' - g += 1 - elseif base == 'T' - t += 1 - else - # it is often a good idea to try to handle possible mistakes explicitly - error("Base $base is not supported") - end - end - return (a, c, g, t) -end - - -# ╔═╡ f0a203b4-7262-4fba-92b1-8bc535f1222a -md""" -✅ The answer is: **$(join(countbases(input_dna), ' '))**! 
-""" - -# ╔═╡ 8ab5eb1c-a991-4c1e-a3b9-f412ec2579a4 -countbases("AAA") - -# ╔═╡ 6db22a7e-3f50-4ad0-adc0-ee3a4f53dc91 -countbases(input_dna) # `input_dna` stores what was entered into the box above - -# ╔═╡ 2368c64a-3ce7-4194-8182-56b5fdefcc96 -md""" -### Approach 2: using `count()` - -Another approach is to use the built-in `count()` function, -which takes a "predicate" function as the first argument -and an iterable collection as the second argument. -The predicate function must take each element of the collection, -and return either `true` or `false`. -The `count()` function then returns the number of elements -for which the predicate returned `true`. - -For example, if I define the `lessthan5()` function -to return `true` if a value is less than 5, -I can then use it as a predicate to count the number of values -in a `Vector` of numbers that are less than 5. -""" - -# ╔═╡ 8146ba26-b7a2-4776-85ee-a16f4d28177b -function lessthan5(num) - return num < 5 -end - -# ╔═╡ de67bd68-2a68-4739-b815-c203079f3826 -count(lessthan5, [1, 5, 6, -3, 3]) - -# ╔═╡ 538f6a28-90c0-4f3e-ae94-9ef143cb54d0 -md""" -Often, we don't want to have to define a simple function like `lessthan5()` -for every predicate we want to test, especially if they will only be used once. -Instead, we can use an "anonymous" function (sometimes called a "lambda") -as the first argument. - -In julia, anonymous functions have the syntax `arg -> body`. -In other words, the same expression above could be written as: -""" - -# ╔═╡ 1ac75946-02b9-43e5-a878-b8c961ed08a4 -count(num -> num < 5, [1, 5, 6, -3, 3]) - -# ╔═╡ 7a21f39d-add6-4f8e-9407-ca8baa7b08a5 -md""" -Here, `num -> num < 5` is equivalent to the definition for `lessthan5(num)`.
- -So, now we can write a different formulation of `countbases()` using `count()`: -""" - -# ╔═╡ 7824b8ef-f5f5-49fa-9751-f1cf762283fe -function countbases2(seq) - a = count(base-> base == 'A', seq) - c = count(base-> base == 'C', seq) - g = count(base-> base == 'G', seq) - t = count(base-> base == 'T', seq) - return (a,c,g,t) -end - - -# ╔═╡ ae34b79a-f04f-4a29-839a-06c96972f8ec -countbases2(input_dna) == countbases(input_dna) - -# ╔═╡ b2d6550e-33f5-4031-b018-f5618bc9b763 -md""" -!!! tip - Even though this approach is quite a bit more succinct, - it might end up being a bit slower than `countbases`, since - it has to loop over the sequence 4 times instead of just once. - - Sometimes, you need to make trade-offs between clarity and efficiency. - One of the great things about `julia` is that a lot of ways of approaching - the same problem are often possible, and often fast (or they can be made fast). -""" - -# ╔═╡ 45b6ad47-0618-4433-9b9f-7809de3cbe32 -md""" -### Approach 3: using BioSequences.jl - -The `BioSequences.jl` package is designed to efficiently work -with biological sequences like DNA sequences. -`BioSequences.jl` efficiently encodes biological sequences using -special types that are not `Char` or `String`s. -""" - -# ╔═╡ c30d7552-6a79-4f94-97c8-c890c780ce3d -seq = LongDNA{2}(input_dna) - -# ╔═╡ dbaf67d6-ba52-4e06-92f0-ea96fff08658 -sizeof(input_dna) - -# ╔═╡ 02ebb987-bb0b-4070-89ff-b39bc74338a1 -sizeof(seq) - -# ╔═╡ a8385030-4de5-4fd8-89ba-737f5d720a43 -md""" -Counting individual nucleotides isn't the most common operation, -but `BioSequences.jl` has some [advanced searching](https://biojulia.github.io/BioSequences.jl/stable/sequence_search/) functionality -built-in.
It's a bit overkill for this task, but for completeness: -""" - -# ╔═╡ 6892dd16-194f-4832-bcd8-2fbb26b081c2 -function countbases3(seq) - a = count(==(DNA_A), seq) - c = count(==(DNA_C), seq) - g = count(==(DNA_G), seq) - t = count(==(DNA_T), seq) - return (a,c,g,t) -end - -# ╔═╡ dff4bf6c-137e-44da-b201-fff7fb0b777d -countbases3(seq) == countbases2(input_dna) - -# ╔═╡ 32798077-3ad2-4ca8-85b5-8e5ab9c12bf9 -md""" -### Benchmarking - -Julia programmers like speed, -so let's benchmark our approaches! - -""" - -# ╔═╡ 95373362-00eb-461c-b043-e4f61eacdf84 -testseq = randdnaseq(100_000) - -# ╔═╡ 0dd66cd4-2ef9-478a-86de-61f0ad5a6c84 -testseq_str = string(testseq) - -# ╔═╡ d187e8fa-2fa4-4b44-9e4f-ff12e6f97ccd -@benchmark countbases($testseq_str) - -# ╔═╡ 57b42a4c-5ae6-413d-892f-ffd32651d7ca -@benchmark(countbases2($testseq_str)) - -# ╔═╡ e6a5e5cd-124b-4fb1-b1b0-544c27201f78 -@benchmark countbases3($testseq) - -# ╔═╡ 85ba5566-8292-433d-bea8-4994dafd223e -md""" -Interestingly, on my system, `countbases2()` is actually faster than `countbases()`, -at least for this longer sequence. This may be because [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) lets the calls to `count()` work in parallel. - -But, as you can see, `countbases3()` is even faster. Let me make one more function that mimics the behavior of the original `countbases()` but uses `BioSequences.jl` instead.
-""" - -# ╔═╡ b32129ac-07c4-40bb-82e0-8773af956bd5 -function countbases4(seq) - a = 0 - c = 0 - g = 0 - t = 0 - for base in seq - if base == DNA_A - a += 1 # this is equivalent to `a = a + 1` - elseif base == DNA_C - c += 1 - elseif base == DNA_G - g += 1 - elseif base == DNA_T - t += 1 - else - # it is often a good idea to try to handle possible mistakes explicitly - error("Base $base is not supported") - end - end - return (a, c, g, t) -end - -# ╔═╡ 6ce1a73f-0b2b-45bc-a19a-67f40b4d27ac -@benchmark countbases4($testseq) - -# ╔═╡ 281f2869-21f3-4a8c-9744-91b5d644c5c4 -md""" -## ✍️ Problem 2: Transcription - -🤔 [Problem link](https://rosalind.info/problems/rna/) - -Enter your DNA to transcribe here: - -$(@bind input_rna TextField((50,3); default = "GATGGAACTTGACTACGTAAATT")) -""" - -# ╔═╡ e9456da2-2a65-40c8-a8c8-35f655b79635 -md""" -### Approach 1 - string `replace()` - -This one is pretty straightforward, as described. -All we need to do is replace any `'T'`s with `'U'`s. -Happily, julia has a handy `replace()` function -that takes a string, and a `Pair` that is `pattern => replacement`. -In principle, the pattern can be a literal `String`, -or even a regular expression. But here, we can just use a `Char`. - -I'll also write the function using julia's one-line function definition syntax: -""" - -# ╔═╡ 4760025e-b111-460b-8cf6-4d168c5dd4c6 -simple_transcribe(seq) = replace(seq, 'T'=> 'U') - -# ╔═╡ 62435b78-20bb-41e3-b9d0-8f8c2e376664 -md""" -As always, there are lots of ways you *could* do this. -This function won't handle poorly formatted sequences, -for example. Or rather, it will handle them, even though it shouldn't: -""" - -# ╔═╡ 49c46fd4-bda8-4a45-9feb-40f9973acd48 -md""" -### Approach 2 - BioSequences `convert()` - -As you might expect, `BioSequences.jl` has a way to do this as well. -`BioSequences.jl` doesn't just use a `String` to represent sequences; -there are special types that can efficiently encode nucleic acid -or amino acid sequences.
-In some cases (e.g. DNA or RNA with no ambiguous bases), they use as few as 2 bits -per base. -""" - -# ╔═╡ f8ea3c52-2a10-41a7-b721-042e28686a33 -dna_seq = LongDNA{2}(input_rna) - -# ╔═╡ e322c9eb-b337-4984-a2d0-bd778d4ec728 -simple_transcribe(seq::LongDNA{N}) where N = convert(LongRNA{N}, seq) - -# ╔═╡ 8f33d625-2857-421c-ba27-ccc5eb6a3837 -md""" -✅ The answer is: $(simple_transcribe(input_rna)) -""" - -# ╔═╡ a00c305d-1bfd-4421-88a1-43b4d20ff993 -simple_transcribe("This Is QUITE silly") - -# ╔═╡ 2bdc52a3-a42f-41ba-a6ee-09b6b5323a20 -md""" -A couple of things to note here. First, -I'm taking advantage of julia's multiple dispatch system. -Instead of writing a separate function name for dealing with -a `LongDNA` from `BioSequences.jl`, I wrote a new *method* -for the same function by adding `::LongDNA{N}` to the argument. - -This tells julia to call this version of `simple_transcribe()` -whenever the argument is a `LongDNA`. Otherwise, it will fall back to the original -(julia always uses the method that is most specific for its arguments). - -The last thing to note is the `{N} ... where N`. This is just a way -that we can use any DNA alphabet (2 bit or 4 bit), and get similar behavior. -""" - -# ╔═╡ 192d1957-86a9-4188-b544-96b085d03f2b -simple_transcribe(dna_seq) - -# ╔═╡ c3fc2e23-75e0-4add-bdfb-19b366dd60aa -md""" -### Benchmarks -""" - -# ╔═╡ f0e4083c-940a-4237-9930-1e2131d1f7d7 -@benchmark simple_transcribe($testseq) - -# ╔═╡ 29ff77ca-d7c8-4838-a7ef-77a69943484f -@benchmark simple_transcribe(x) setup=(x=LongDNA{2}(testseq)) - -# ╔═╡ ff5198bf-ad93-4c1f-b127-15e07dce13a9 -@benchmark simple_transcribe(x) setup=(x=LongDNA{4}(testseq)) - -# ╔═╡ 1956d6f3-de4a-48c3-8856-8a4b529aaeca -md""" -### Conclusions - -I'm actually a little surprised that the `replace()` method does so well, -but there you have it. The BioJulia method is about 2x faster on a 2-bit sequence -(that is, if there's no ambiguity), but about the same speed on 4-bit sequences.
-""" - -# ╔═╡ 01290bd7-1ce9-4223-8cb1-fb5e122831af -md""" -## 😉 Problem 3 - Getting the complement - -I know, I know, [not the *compliment*](https://www.grammarly.com/blog/complement-compliment/), but if you have a better emoji idea, let me know. - -Enter your puzzle input here: - -$(@bind input_revc TextField((50,3); default = "AAAACCCGGT")) -""" - -# ╔═╡ 067cf6b0-b4ab-4e5f-a02d-c69379a5d71b -md""" -This one is a bit tougher - we need to change each base coming in, -and then reverse the result. Actually, that second part is easy, -because julia has a built-in `reverse()` function that works for `String`s. -""" - -# ╔═╡ f8a760b7-5adf-49d8-844d-c84063cb5472 -reverse("complement") - -# ╔═╡ fd9aa35c-bcaf-40aa-a20c-ce3dfbb18085 -md""" -### Approach 1: using a `Dict`ionary - -In my opinion, the easiest thing to do is to use a `Dict()`, -a data structure that maps arbitrary keys to arbitrary values. - -For example: -""" - -# ╔═╡ 3a56cd1a-2f24-45be-bd49-f73211e16f67 -my_dictionary = Dict("thing1"=> "hello", "thing2" => "world!") - -# ╔═╡ 76982256-6875-4454-a5ab-de8262cfa208 -my_dictionary["thing1"] - -# ╔═╡ 0a3e0260-7dcd-45d8-831b-c132ab5eb045 -my_dictionary["thing2"] - -# ╔═╡ 47180953-e165-49d8-b81e-b3efa9fdb6e4 -md""" -So, we just make a dictionary with 4 entries, one for each base. -Then, to apply this to every base in the sequence, we have a couple of options.
-One is to use the `String()` constructor and a "comprehension" - -basically a `for` loop in a single phrase: -""" - -# ╔═╡ 3c7705d9-0f5d-41ee-87dd-29204929a8fa -function revc(seq) - comp_dict = Dict( - 'A'=>'T', - 'C'=>'G', - 'G'=>'C', - 'T'=>'A' - ) - comp = String([comp_dict[base] for base in seq]) - return reverse(comp) -end - -# ╔═╡ 646833e1-2bc3-4cb1-bdbb-d8cfda92b4f0 -md""" -✅ The answer is: $(revc(input_revc)) -""" - -# ╔═╡ c84b26fb-ede4-4ff4-af4b-80175fedf558 -revc("AATTGGC") - -# ╔═╡ 7b161955-e13f-4054-b0d9-c337ab6bf664 -md""" -Here, the comprehension `[comp_dict[base] for base in seq]` is equivalent to something like - -```julia -comp = Char[] -for base in seq - push!(comp, comp_dict[base]) -end -``` - -""" - -# ╔═╡ 2de2619b-a1c3-4ee1-8a0d-959b11fc7d76 -md""" -### Approach 2: using `replace()` again - -It turns out that the `replace()` function we used for the transcription problem -can be passed multiple `Pair`s of patterns to replace! - -So we can just pass the pairs directly: -""" - -# ╔═╡ 72fb06f1-6a5d-46ff-ba39-d73d7bb5c0bf -function revc2(seq) - comp = replace(seq, - 'A'=>'T', - 'C'=>'G', - 'G'=>'C', - 'T'=>'A' - ) - return reverse(comp) -end - -# ╔═╡ d8ba106a-f859-4e60-a6b0-cfee3ac23164 -revc(input_revc) == revc2(input_revc) - -# ╔═╡ afc7c059-e0c9-4125-b891-5733882c3a02 -md""" -### Approach 3: `BioSequences.jl` - -This is a pretty common need in bioinformatics, so `BioSequences.jl` actually has a `reverse_complement()` function built-in.
-""" - -# ╔═╡ 26a60458-5ad4-432d-86bb-06bf839d552e -reverse_complement(LongDNA{2}(input_revc)) - -# ╔═╡ 20802225-0c76-404a-821f-7057bf24c103 -md""" -### Once more, benchmarks -""" - -# ╔═╡ fd52b9d0-8fcb-4913-8e02-d94f7a290a25 -@benchmark revc($testseq_str) - -# ╔═╡ c6b28492-3780-45d3-8783-9abc1744daa2 -@benchmark revc2($testseq_str) - -# ╔═╡ 79b66b01-5c1a-4dd9-b042-13af046ab7ae -@benchmark reverse_complement($testseq) - -# ╔═╡ 7f00da1d-8e3e-48d3-976c-1df011cf52f1 -@benchmark reverse_complement(testseq_4bit) setup=(testseq_4bit = convert(LongDNA{4}, testseq)) - -# ╔═╡ 80bf8696-ef49-49b4-b826-618a3e806e61 -md""" -### Conclusions - -This one is a no-brainer! The `reverse_complement()` function is about 200x faster than the dictionary method, and about 1000x faster than `replace()` for both 2 bit and 4 bit DNA sequences. -""" - -# ╔═╡ 9a466bd5-0cf1-4324-8132-f6f27fc2650b -md""" -## ⌛ Overall Conclusions - -A lot of bioinformatics is essentially string manipulation. -Julia has a lot of useful functionality to work with `String`s -directly, but those methods often leave a lot of performance on the table. - -`BioSequences.jl` provides some nice sequence types and incredibly efficient -data structures. We'll be seeing more of them in coming tutorials. 
- -""" - -# ╔═╡ 00000000-0000-0000-0000-000000000001 -PLUTO_PROJECT_TOML_CONTENTS = """ -[deps] -BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf" -BioSequences = "7e6ae17a-c86d-528c-b3b9-7f778a29fe59" -PlutoUI = "7f904dfe-b85e-4ff6-b463-dae2292396a8" - -[compat] -BenchmarkTools = "~1.3.2" -BioSequences = "~3.1.3" -PlutoUI = "~0.7.50" -""" - -# ╔═╡ 00000000-0000-0000-0000-000000000002 -PLUTO_MANIFEST_TOML_CONTENTS = """ -# This file is machine-generated - editing it directly is not advised - -julia_version = "1.8.5" -manifest_format = "2.0" -project_hash = "fcff54ffafa620feff21401e4e674b157e87fb4a" - -[[deps.AbstractPlutoDingetjes]] -deps = ["Pkg"] -git-tree-sha1 = "8eaf9f1b4921132a4cff3f36a1d9ba923b14a481" -uuid = "6e696c72-6542-2067-7265-42206c756150" -version = "1.1.4" - -[[deps.ArgTools]] -uuid = "0dad84c5-d112-42e6-8d28-ef12dabb789f" -version = "1.1.1" - -[[deps.Artifacts]] -uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33" - -[[deps.Base64]] -uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f" - -[[deps.BenchmarkTools]] -deps = ["JSON", "Logging", "Printf", "Profile", "Statistics", "UUIDs"] -git-tree-sha1 = "d9a9701b899b30332bbcb3e1679c41cce81fb0e8" -uuid = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf" -version = "1.3.2" - -[[deps.BioSequences]] -deps = ["BioSymbols", "Random", "SnoopPrecompile", "Twiddle"] -git-tree-sha1 = "c96ede1c34ac948b108f11e4d9ae66df13d57454" -uuid = "7e6ae17a-c86d-528c-b3b9-7f778a29fe59" -version = "3.1.3" - -[[deps.BioSymbols]] -deps = ["SnoopPrecompile"] -git-tree-sha1 = "2052c3ec7c41b69efa0e9ff7e2734aa6658d4c40" -uuid = "3c28c6f8-a34d-59c4-9654-267d177fcfa9" -version = "5.1.2" - -[[deps.ColorTypes]] -deps = ["FixedPointNumbers", "Random"] -git-tree-sha1 = "eb7f0f8307f71fac7c606984ea5fb2817275d6e4" -uuid = "3da002f7-5984-5a60-b8a6-cbb66c0b333f" -version = "0.11.4" - -[[deps.CompilerSupportLibraries_jll]] -deps = ["Artifacts", "Libdl"] -uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae" -version = "1.0.1+0" - -[[deps.Dates]] -deps = ["Printf"] -uuid 
= "ade2ca70-3891-5945-98fb-dc099432e06a" - -[[deps.Downloads]] -deps = ["ArgTools", "FileWatching", "LibCURL", "NetworkOptions"] -uuid = "f43a241f-c20a-4ad4-852c-f6b1247861c6" -version = "1.6.0" - -[[deps.FileWatching]] -uuid = "7b1f6079-737a-58dc-b8bc-7a2ca5c1b5ee" - -[[deps.FixedPointNumbers]] -deps = ["Statistics"] -git-tree-sha1 = "335bfdceacc84c5cdf16aadc768aa5ddfc5383cc" -uuid = "53c48c17-4a7d-5ca2-90c5-79b7896eea93" -version = "0.8.4" - -[[deps.Hyperscript]] -deps = ["Test"] -git-tree-sha1 = "8d511d5b81240fc8e6802386302675bdf47737b9" -uuid = "47d2ed2b-36de-50cf-bf87-49c2cf4b8b91" -version = "0.0.4" - -[[deps.HypertextLiteral]] -deps = ["Tricks"] -git-tree-sha1 = "c47c5fa4c5308f27ccaac35504858d8914e102f9" -uuid = "ac1192a8-f4b3-4bfe-ba22-af5b92cd3ab2" -version = "0.9.4" - -[[deps.IOCapture]] -deps = ["Logging", "Random"] -git-tree-sha1 = "f7be53659ab06ddc986428d3a9dcc95f6fa6705a" -uuid = "b5f81e59-6552-4d32-b1f0-c071b021bf89" -version = "0.2.2" - -[[deps.InteractiveUtils]] -deps = ["Markdown"] -uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240" - -[[deps.JSON]] -deps = ["Dates", "Mmap", "Parsers", "Unicode"] -git-tree-sha1 = "3c837543ddb02250ef42f4738347454f95079d4e" -uuid = "682c06a0-de6a-54ab-a142-c8b1cf79cde6" -version = "0.21.3" - -[[deps.LibCURL]] -deps = ["LibCURL_jll", "MozillaCACerts_jll"] -uuid = "b27032c2-a3e7-50c8-80cd-2d36dbcbfd21" -version = "0.6.3" - -[[deps.LibCURL_jll]] -deps = ["Artifacts", "LibSSH2_jll", "Libdl", "MbedTLS_jll", "Zlib_jll", "nghttp2_jll"] -uuid = "deac9b47-8bc7-5906-a0fe-35ac56dc84c0" -version = "7.84.0+0" - -[[deps.LibGit2]] -deps = ["Base64", "NetworkOptions", "Printf", "SHA"] -uuid = "76f85450-5226-5b5a-8eaa-529ad045b433" - -[[deps.LibSSH2_jll]] -deps = ["Artifacts", "Libdl", "MbedTLS_jll"] -uuid = "29816b5a-b9ab-546f-933c-edad1886dfa8" -version = "1.10.2+0" - -[[deps.Libdl]] -uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb" - -[[deps.LinearAlgebra]] -deps = ["Libdl", "libblastrampoline_jll"] -uuid = 
"37e2e46d-f89d-539d-b4ee-838fcccc9c8e" - -[[deps.Logging]] -uuid = "56ddb016-857b-54e1-b83d-db4d58db5568" - -[[deps.MIMEs]] -git-tree-sha1 = "65f28ad4b594aebe22157d6fac869786a255b7eb" -uuid = "6c6e2e6c-3030-632d-7369-2d6c69616d65" -version = "0.1.4" - -[[deps.Markdown]] -deps = ["Base64"] -uuid = "d6f4376e-aef5-505a-96c1-9c027394607a" - -[[deps.MbedTLS_jll]] -deps = ["Artifacts", "Libdl"] -uuid = "c8ffd9c3-330d-5841-b78e-0817d7145fa1" -version = "2.28.0+0" - -[[deps.Mmap]] -uuid = "a63ad114-7e13-5084-954f-fe012c677804" - -[[deps.MozillaCACerts_jll]] -uuid = "14a3606d-f60d-562e-9121-12d972cd8159" -version = "2022.2.1" - -[[deps.NetworkOptions]] -uuid = "ca575930-c2e3-43a9-ace4-1e988b2c1908" -version = "1.2.0" - -[[deps.OpenBLAS_jll]] -deps = ["Artifacts", "CompilerSupportLibraries_jll", "Libdl"] -uuid = "4536629a-c528-5b80-bd46-f80d51c5b363" -version = "0.3.20+0" - -[[deps.Parsers]] -deps = ["Dates", "SnoopPrecompile"] -git-tree-sha1 = "478ac6c952fddd4399e71d4779797c538d0ff2bf" -uuid = "69de0a69-1ddd-5017-9359-2bf0b02dc9f0" -version = "2.5.8" - -[[deps.Pkg]] -deps = ["Artifacts", "Dates", "Downloads", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "Serialization", "TOML", "Tar", "UUIDs", "p7zip_jll"] -uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f" -version = "1.8.0" - -[[deps.PlutoUI]] -deps = ["AbstractPlutoDingetjes", "Base64", "ColorTypes", "Dates", "FixedPointNumbers", "Hyperscript", "HypertextLiteral", "IOCapture", "InteractiveUtils", "JSON", "Logging", "MIMEs", "Markdown", "Random", "Reexport", "URIs", "UUIDs"] -git-tree-sha1 = "5bb5129fdd62a2bbbe17c2756932259acf467386" -uuid = "7f904dfe-b85e-4ff6-b463-dae2292396a8" -version = "0.7.50" - -[[deps.Preferences]] -deps = ["TOML"] -git-tree-sha1 = "47e5f437cc0e7ef2ce8406ce1e7e24d44915f88d" -uuid = "21216c6a-2e73-6563-6e65-726566657250" -version = "1.3.0" - -[[deps.Printf]] -deps = ["Unicode"] -uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7" - -[[deps.Profile]] -deps = ["Printf"] -uuid = 
"9abbd945-dff8-562f-b5e8-e1ebf5ef1b79" - -[[deps.REPL]] -deps = ["InteractiveUtils", "Markdown", "Sockets", "Unicode"] -uuid = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb" - -[[deps.Random]] -deps = ["SHA", "Serialization"] -uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c" - -[[deps.Reexport]] -git-tree-sha1 = "45e428421666073eab6f2da5c9d310d99bb12f9b" -uuid = "189a3867-3050-52da-a836-e630ba90ab69" -version = "1.2.2" - -[[deps.SHA]] -uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce" -version = "0.7.0" - -[[deps.Serialization]] -uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b" - -[[deps.SnoopPrecompile]] -deps = ["Preferences"] -git-tree-sha1 = "e760a70afdcd461cf01a575947738d359234665c" -uuid = "66db9d55-30c0-4569-8b51-7e840670fc0c" -version = "1.0.3" - -[[deps.Sockets]] -uuid = "6462fe0b-24de-5631-8697-dd941f90decc" - -[[deps.SparseArrays]] -deps = ["LinearAlgebra", "Random"] -uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf" - -[[deps.Statistics]] -deps = ["LinearAlgebra", "SparseArrays"] -uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2" - -[[deps.TOML]] -deps = ["Dates"] -uuid = "fa267f1f-6049-4f14-aa54-33bafae1ed76" -version = "1.0.0" - -[[deps.Tar]] -deps = ["ArgTools", "SHA"] -uuid = "a4e569a6-e804-4fa4-b0f3-eef7a1d5b13e" -version = "1.10.1" - -[[deps.Test]] -deps = ["InteractiveUtils", "Logging", "Random", "Serialization"] -uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40" - -[[deps.Tricks]] -git-tree-sha1 = "aadb748be58b492045b4f56166b5188aa63ce549" -uuid = "410a4b4d-49e4-4fbc-ab6d-cb71b17b3775" -version = "0.1.7" - -[[deps.Twiddle]] -git-tree-sha1 = "29509c4862bfb5da9e76eb6937125ab93986270a" -uuid = "7200193e-83a8-5a55-b20d-5d36d44a0795" -version = "1.1.2" - -[[deps.URIs]] -git-tree-sha1 = "074f993b0ca030848b897beff716d93aca60f06a" -uuid = "5c2747f8-b7ea-4ff2-ba2e-563bfd36b1d4" -version = "1.4.2" - -[[deps.UUIDs]] -deps = ["Random", "SHA"] -uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4" - -[[deps.Unicode]] -uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5" - -[[deps.Zlib_jll]] -deps = 
["Libdl"] -uuid = "83775a58-1f1d-513f-b197-d71354ab007a" -version = "1.2.12+3" - -[[deps.libblastrampoline_jll]] -deps = ["Artifacts", "Libdl", "OpenBLAS_jll"] -uuid = "8e850b90-86db-534c-a0d3-1478176c7d93" -version = "5.1.1+0" - -[[deps.nghttp2_jll]] -deps = ["Artifacts", "Libdl"] -uuid = "8e850ede-7688-5339-a07c-302acd2aaf8d" -version = "1.48.0+0" - -[[deps.p7zip_jll]] -deps = ["Artifacts", "Libdl"] -uuid = "3f19e933-33d8-53b3-aaab-bd5110c3b7a0" -version = "17.4.0+0" -""" - -# ╔═╡ Cell order: -# ╟─76df740a-a130-41f3-8c3e-1f24729cc41b -# ╟─82d06ce9-e588-4312-87c1-d1c97263958a -# ╟─072db8c0-d3c1-11ed-18fa-bff69835f8cd -# ╟─943cd6d5-6339-4851-b663-c9c0ef77e7eb -# ╟─fb80d512-6cbe-4b02-919e-6c557729bb42 -# ╟─f0a203b4-7262-4fba-92b1-8bc535f1222a -# ╟─f810b26d-a7cb-44d4-af35-c82fc3791ff6 -# ╠═fe8daef1-b3e9-4cca-94bb-8a1dec71f0f7 -# ╠═05b5fb9f-d2d3-4917-8ac4-1b14d746fe9c -# ╠═4173f591-4232-4918-ae09-a3de449d42fe -# ╠═024cd6e2-23e4-4490-b482-641f490db2cc -# ╠═75f92ac2-a747-470e-aec1-8f0c906248a7 -# ╟─62004c70-e76a-4da2-92a1-d7c4b8b22aa7 -# ╠═7ae2154e-c161-4d45-a476-d3fa79757f2d -# ╟─755014bc-470f-469f-a508-df3f372f228b -# ╠═c39e25c7-0071-4be0-8fe1-38eb5e2e4263 -# ╠═8ab5eb1c-a991-4c1e-a3b9-f412ec2579a4 -# ╠═6db22a7e-3f50-4ad0-adc0-ee3a4f53dc91 -# ╟─2368c64a-3ce7-4194-8182-56b5fdefcc96 -# ╠═8146ba26-b7a2-4776-85ee-a16f4d28177b -# ╠═de67bd68-2a68-4739-b815-c203079f3826 -# ╟─538f6a28-90c0-4f3e-ae94-9ef143cb54d0 -# ╠═1ac75946-02b9-43e5-a878-b8c961ed08a4 -# ╟─7a21f39d-add6-4f8e-9407-ca8baa7b08a5 -# ╠═7824b8ef-f5f5-49fa-9751-f1cf762283fe -# ╠═ae34b79a-f04f-4a29-839a-06c96972f8ec -# ╟─b2d6550e-33f5-4031-b018-f5618bc9b763 -# ╟─45b6ad47-0618-4433-9b9f-7809de3cbe32 -# ╠═4f2e0acf-5ac8-400a-82da-c66c7b07c467 -# ╠═c30d7552-6a79-4f94-97c8-c890c780ce3d -# ╠═dbaf67d6-ba52-4e06-92f0-ea96fff08658 -# ╠═02ebb987-bb0b-4070-89ff-b39bc74338a1 -# ╟─a8385030-4de5-4fd8-89ba-737f5d720a43 -# ╠═6892dd16-194f-4832-bcd8-2fbb26b081c2 -# ╠═dff4bf6c-137e-44da-b201-fff7fb0b777d -# 
╟─32798077-3ad2-4ca8-85b5-8e5ab9c12bf9 -# ╠═41defb0d-7c92-46af-a4e2-35edac47a690 -# ╠═95373362-00eb-461c-b043-e4f61eacdf84 -# ╠═0dd66cd4-2ef9-478a-86de-61f0ad5a6c84 -# ╠═d187e8fa-2fa4-4b44-9e4f-ff12e6f97ccd -# ╠═57b42a4c-5ae6-413d-892f-ffd32651d7ca -# ╠═e6a5e5cd-124b-4fb1-b1b0-544c27201f78 -# ╟─85ba5566-8292-433d-bea8-4994dafd223e -# ╠═b32129ac-07c4-40bb-82e0-8773af956bd5 -# ╠═6ce1a73f-0b2b-45bc-a19a-67f40b4d27ac -# ╟─281f2869-21f3-4a8c-9744-91b5d644c5c4 -# ╟─8f33d625-2857-421c-ba27-ccc5eb6a3837 -# ╟─e9456da2-2a65-40c8-a8c8-35f655b79635 -# ╠═4760025e-b111-460b-8cf6-4d168c5dd4c6 -# ╟─62435b78-20bb-41e3-b9d0-8f8c2e376664 -# ╠═a00c305d-1bfd-4421-88a1-43b4d20ff993 -# ╟─49c46fd4-bda8-4a45-9feb-40f9973acd48 -# ╠═f8ea3c52-2a10-41a7-b721-042e28686a33 -# ╠═e322c9eb-b337-4984-a2d0-bd778d4ec728 -# ╟─2bdc52a3-a42f-41ba-a6ee-09b6b5323a20 -# ╠═192d1957-86a9-4188-b544-96b085d03f2b -# ╟─c3fc2e23-75e0-4add-bdfb-19b366dd60aa -# ╠═f0e4083c-940a-4237-9930-1e2131d1f7d7 -# ╠═29ff77ca-d7c8-4838-a7ef-77a69943484f -# ╠═ff5198bf-ad93-4c1f-b127-15e07dce13a9 -# ╟─1956d6f3-de4a-48c3-8856-8a4b529aaeca -# ╟─01290bd7-1ce9-4223-8cb1-fb5e122831af -# ╠═646833e1-2bc3-4cb1-bdbb-d8cfda92b4f0 -# ╟─067cf6b0-b4ab-4e5f-a02d-c69379a5d71b -# ╠═f8a760b7-5adf-49d8-844d-c84063cb5472 -# ╠═fd9aa35c-bcaf-40aa-a20c-ce3dfbb18085 -# ╠═3a56cd1a-2f24-45be-bd49-f73211e16f67 -# ╠═76982256-6875-4454-a5ab-de8262cfa208 -# ╠═0a3e0260-7dcd-45d8-831b-c132ab5eb045 -# ╟─47180953-e165-49d8-b81e-b3efa9fdb6e4 -# ╠═3c7705d9-0f5d-41ee-87dd-29204929a8fa -# ╠═c84b26fb-ede4-4ff4-af4b-80175fedf558 -# ╟─7b161955-e13f-4054-b0d9-c337ab6bf664 -# ╟─2de2619b-a1c3-4ee1-8a0d-959b11fc7d76 -# ╠═72fb06f1-6a5d-46ff-ba39-d73d7bb5c0bf -# ╠═d8ba106a-f859-4e60-a6b0-cfee3ac23164 -# ╟─afc7c059-e0c9-4125-b891-5733882c3a02 -# ╠═26a60458-5ad4-432d-86bb-06bf839d552e -# ╟─20802225-0c76-404a-821f-7057bf24c103 -# ╠═fd52b9d0-8fcb-4913-8e02-d94f7a290a25 -# ╠═c6b28492-3780-45d3-8783-9abc1744daa2 -# ╠═79b66b01-5c1a-4dd9-b042-13af046ab7ae -# 
╠═7f00da1d-8e3e-48d3-976c-1df011cf52f1 -# ╟─80bf8696-ef49-49b4-b826-618a3e806e61 -# ╠═9a466bd5-0cf1-4324-8132-f6f27fc2650b -# ╟─00000000-0000-0000-0000-000000000001 -# ╟─00000000-0000-0000-0000-000000000002 diff --git a/Rosalind.info/Project.toml b/Rosalind.info/Project.toml deleted file mode 100644 index c39603e..0000000 --- a/Rosalind.info/Project.toml +++ /dev/null @@ -1,2 +0,0 @@ -[deps] -Pluto = "c3e4b0f8-55cb-11ea-2926-15256bba5781" diff --git a/docs/.gitignore b/docs/.gitignore new file mode 100644 index 0000000..f4ef123 --- /dev/null +++ b/docs/.gitignore @@ -0,0 +1,5 @@ +build/ +site/ +Manifest.toml +node_modules/ +package-lock.json diff --git a/docs/Project.toml b/docs/Project.toml new file mode 100644 index 0000000..459adf0 --- /dev/null +++ b/docs/Project.toml @@ -0,0 +1,12 @@ +[deps] +BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf" +BioSequences = "7e6ae17a-c86d-528c-b3b9-7f778a29fe59" +BioSymbols = "3c28c6f8-a34d-59c4-9654-267d177fcfa9" +Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4" +DocumenterTools = "35a29f4d-8980-5a13-9543-d66fff28ecb8" +DocumenterVitepress = "4710194d-e776-4893-9690-8d956a29c365" +FASTX = "c2308a5c-f048-11e8-3e8a-31650f418d12" +FormatSpecimens = "3372ea36-2a1a-11e9-3eb7-996970b6ffbd" +JuliaFormatter = "98e50ef6-434e-11e9-1051-2b60c6c9e899" +LiveServer = "16fef848-5104-11e9-1b77-fb7a48bbb589" +XAM = "d759349c-bcba-11e9-07c2-5b90f8f05f7c" diff --git a/docs/make.jl b/docs/make.jl new file mode 100644 index 0000000..f37b1f6 --- /dev/null +++ b/docs/make.jl @@ -0,0 +1,34 @@ +using Documenter +using DocumenterVitepress + +ROSALIND_PAGES = [ + "rosalind/index.md", + "rosalind/01-dna.md", + "rosalind/02-rna.md", + "rosalind/03-revc.md", + ] + +makedocs( + sitename = "BioTutorials", + authors = "Kevin Bonham and Contributors", + modules = Module[], + clean = false, + doctest = true, + draft = false, + format = DocumenterVitepress.MarkdownVitepress( + repo = "https://github.com/BioJulia/BioTutorials", + # md_output_path = ".", 
# remove when pushing + build_vitepress = haskey(ENV, "CI") + ), + pages = [ + "Rosalind.info" => ROSALIND_PAGES + ] +) + +deploydocs( + repo = "https://github.com/BioJulia/BioJuliaDocs.git", + target = "build", + devbranch = "main", + branch = "gh-pages", + push_preview = true, +) diff --git a/docs/package.json b/docs/package.json new file mode 100644 index 0000000..d45cf53 --- /dev/null +++ b/docs/package.json @@ -0,0 +1,16 @@ +{ + "scripts": { + "docs:dev": "vitepress dev build/.documenter", + "docs:build": "vitepress build build/.documenter", + "docs:preview": "vitepress preview build/.documenter" + }, + "dependencies": { + "@nolebase/vitepress-plugin-enhanced-readabilities": "^2.12.1", + "@shikijs/transformers": "^2.0.3", + "markdown-it": "^14.1.0", + "markdown-it-footnote": "^4.0.0", + "markdown-it-mathjax3": "^4.3.2", + "vitepress": "^1.6.3", + "vitepress-plugin-tabs": "^0.5.0" + } +} diff --git a/docs/src/index.md b/docs/src/index.md new file mode 100644 index 0000000..c9e1bbb --- /dev/null +++ b/docs/src/index.md @@ -0,0 +1,39 @@ +```@raw html +--- +# https://vitepress.dev/reference/default-theme-home-page +# Cribbed from DimensionalData.jl +layout: home + +hero: + name: "BioJulia" + text: "Tutorials" + tagline: "Doing biology with Julia" + actions: + - theme: brand + text: Basic Tutorials + link: /rosalind/index + - theme: alt + text: BioJulia documentation + link: https://github.com/BioJulia/BioJuliaDocs + - theme: alt + text: Source code on GitHub + link: https://github.com/BioJulia/BioTutorials +features: + - title: Rosalind.info Problems + details: These are still a work in progress + link: https://biojulia.dev/BioTutorials + - title: BioSequences.jl + details: Optimized types for working with biological sequences (eg DNA, RNA, proteins) + link: https://biojulia.dev/BioSequences.jl + - title: Automa.jl + details: Efficient state-machine generation to quickly and correctly parse bespoke file formats + link: https://biojulia.dev/Automa.jl + - title:
BioMakie.jl + details: Visualize sequences and 3D proteins with ease + link: https://biojulia.dev/BioMakie.jl + icon: + - title: SingleCellProjections.jl + details: More cells? No Problem! Get UMAPs and other projections of your single cell data using the power of Sparse Matrices + link: https://biojulia.dev/SingleCellProjections.jl +--- +``` diff --git a/docs/src/rosalind/01-dna.md b/docs/src/rosalind/01-dna.md new file mode 100644 index 0000000..712bf9e --- /dev/null +++ b/docs/src/rosalind/01-dna.md @@ -0,0 +1,292 @@ + +## 🧬 Problem 1: Counting DNA nucleotides + +🤔 [Problem link](https://rosalind.info/problems/dna/) + +!!! note + For each of these problems, you are strongly encouraged + to read through the problem descriptions, + especially if you're somewhat new to molecular biology. + We will mostly not repeat the background concepts in these notebooks, + except where they are relevant to the solutions. + +!!! warning "The Problem" + + A string is simply an ordered collection of symbols selected from some + alphabet and formed into a word; the length of a string is the number of symbols that it contains. + + An example of a length 21 DNA string (whose alphabet contains the symbols 'A', 'C', 'G', and 'T') is "ATGCTTCAGAAAGGTCTTACG." + + **Given**: A DNA string `s` + of length at most 1000 nt. + + **Return**: Four integers (separated by spaces) counting the respective number + of times that the symbols 'A', 'C', 'G', and 'T' occur in `s` + . + + **Sample Dataset** + + ``` + AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC + ``` + + **Sample Output** + + ``` + 20 12 17 21 + ``` + +Let's see how it's done! + +### DNA sequences are `String`s of `Char`s + +In julia, single characters and strings, +which are made up of multiple characters, have different types, +`Char` and `String` respectively. + +They are also written differently - single quotes for a `Char`, +and double quotes for a `String`.
+ + +```@example dna +chr = 'a' +str = "A" +typeof(chr) +``` + +```@example dna +typeof(str) +``` + +In many ways, a `String` can be thought of as a vector of `Char`s, +and many julia functions that operate on collections like `Vector`s +will work on `String`s. +We can also loop over the contents of a string, +which will treat each `Char` separately. + + +```@example dna +for c in "banana" + @info c +end +``` + + + +### Approach 1: counting in loops + +One relatively straightforward way to approach this problem +is to set a variable to `0` for each base, +then loop through the sequence, adding `1` to the appropriate +variable at each character. + +I'll also stick this into a function, +so we can easily reuse the code. + +```@example dna +function countbases(seq) # here `seq` is an "argument" for the function + a = 0 + c = 0 + g = 0 + t = 0 + for base in seq + if base == 'A' + a += 1 # this is equivalent to `a = a + 1` + elseif base == 'C' + c += 1 + elseif base == 'G' + g += 1 + elseif base == 'T' + t += 1 + else + # it is often a good idea to try to handle possible mistakes explicitly + error("Base $base is not supported") + end + end + return (a, c, g, t) +end + +countbases("AAA") +``` + +Now let's see if it works on the example dataset. +Remember, we should be getting the answer `20 12 17 21` + +```@example dna +answer = "20 12 17 21" +input_dna = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC" +countbases(input_dna) +``` + +Well, the formatting is just a bit different - +The julia type is a `Tuple`, which is surrounded by parentheses. +To fix this, we can use the `join` function. + +```@example dna +@assert join(countbases(input_dna), " ") == answer +``` + + +### Approach 2: using `count()` + +Another approach is to use the built-in `count()` function, +which takes a "predicate" function as the first argument +and an iterable collection as the second argument. 
+The predicate function must take each element of the collection, +and return either `true` or `false`. +The `count()` function then returns the number of elements +for which the predicate returned `true`. + +For example, if I define the `lessthan5()` function +to return `true` if a value is less than 5, +I can then use it as a predicate to count the number of values +in a `Vector` of numbers that are less than 5. + + +```@example dna +function lessthan5(num) + return num < 5 +end + +count(lessthan5, [1, 5, 6, -3, 3]) +``` + +Often, we don't want to have to define a simple function like `lessthan5()` +for every predicate we want to test, especially if they will only be used once. +Instead, we can use an "anonymous" function (also sometimes called a "lambda") +as the first argument. + +In julia, anonymous functions have the syntax `arg -> func. body`. +In other words, the same expression above could be written as: + + +```@example dna +count(num -> num < 5, [1, 5, 6, -3, 3]) +``` + +Here, `num -> num < 5` is identical to the definition for `lessthan5(num)`. + +So, now we can write a different formulation of `countbases()` using `count()`: + +```@example dna +function countbases2(seq) + a = count(base-> base == 'A', seq) + c = count(base-> base == 'C', seq) + g = count(base-> base == 'G', seq) + t = count(base-> base == 'T', seq) + return (a,c,g,t) +end +``` + +```@example dna +@assert countbases2(input_dna) == countbases(input_dna) +``` + + +!!! tip + Even though this approach is quite a bit more succinct, + it might end up being a bit slower than `countbases`, since + it has to loop over the sequence 4 times instead of just once. + + Sometimes, you need to make trade-offs between clarity and efficiency. + One of the great things about `julia` is that a lot of ways of approaching + the same problem are often possible, and often fast (or they can be made fast).
+ + +### Approach 3: using BioSequences.jl + +The `BioSequences.jl` package is designed to efficiently work +with biological sequences like DNA sequences. +`BioSequences.jl` efficiently encodes biological sequences using +special types that are not `Char` or `String`s. + +```@example dna +using BioSequences + +seq = LongDNA{2}(input_dna) + +sizeof(input_dna) +``` + +```@example dna +sizeof(seq) +``` + +Counting individual nucleotides isn't the most common operation, +but `BioSequences.jl` has some [advanced searching](https://biojulia.github.io/BioSequences.jl/stable/sequence_search/) functionality +built-in. It's a bit overkill for this task, but for completeness: + + +```@example dna +function countbases3(seq) + a = count(==(DNA_A), seq) + c = count(==(DNA_C), seq) + g = count(==(DNA_G), seq) + t = count(==(DNA_T), seq) + return (a,c,g,t) +end + +@assert countbases3(seq) == countbases2(input_dna) +``` + + + +### Benchmarking + +Julia programmers like speed, +so let's benchmark our approaches! + + + +```@example dna +using BenchmarkTools + +testseq = randdnaseq(100_000) #this is defined in BioSequences +testseq_str = string(testseq) + +@benchmark countbases($testseq_str) +``` + + +```@example dna +@benchmark(countbases2($testseq_str)) +``` + +```@example dna +@benchmark countbases3($testseq) +``` + + + +Interestingly, on my system, `countbases2()` is actually faster than `countbases()`, +at least for this longer sequence. This may be because [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) lets the calls to `count()` work in parallel. + +But, as you can see, `countbases3()` is even faster. Let me make one more function that mimics the behavior of the original `countbases()` but uses `BioSequences.jl` instead. 
+ +```@example dna +function countbases4(seq) + a = 0 + c = 0 + g = 0 + t = 0 + for base in seq + if base == DNA_A + a += 1 # this is equivalent to `a = a + 1` + elseif base == DNA_C + c += 1 + elseif base == DNA_G + g += 1 + elseif base == DNA_T + t += 1 + else + # it is often a good idea to try to handle possible mistakes explicitly + error("Base $base is not supported") + end + end + return (a, c, g, t) +end + +@benchmark countbases4($testseq) +``` + + diff --git a/docs/src/rosalind/02-rna.md b/docs/src/rosalind/02-rna.md new file mode 100644 index 0000000..274d6a7 --- /dev/null +++ b/docs/src/rosalind/02-rna.md @@ -0,0 +1,131 @@ + +## ✍️ Problem 2: Transcription + +🤔 [Problem link](https://rosalind.info/problems/rna/) + +!!! warning "The Problem" + An RNA string is a string formed from the alphabet containing 'A', 'C', 'G', and 'U'. + + Given a DNA string $t$ corresponding to a coding strand, + its transcribed RNA string $u$ is formed + by replacing all occurrences of 'T' in $t$ with 'U' in $u$. + + _Given_: A DNA string $t$ having length at most 1000 nt. + + _Return_: The transcribed RNA string of $t$. + + **Sample Dataset** + + ```txt + GATGGAACTTGACTACGTAAATT + ``` + + **Sample Output** + + ```txt + GAUGGAACUUGACUACGUAAAUU + ``` + +### Approach 1 - string `replace()` + +```@example rna; output=false +input_dna = "GATGGAACTTGACTACGTAAATT" +answer = "GAUGGAACUUGACUACGUAAAUU" +``` + +This one is pretty straightforward, as described. +All we need to do is replace any `'T'`s with `'U'`s. +Happily, julia has a handy `replace()` function +that takes a string, and a `Pair` that is `pattern => replacement`. +In principle, the pattern can be a literal `String`, +or even a regular expression. But here, we can just use a `Char`. 
+ +I'll also write the function using julia's one-line function definition syntax: + + +```@example rna +input_dna == "GATGGAACTTGACTACGTAAATT" + +simple_transcribe(seq) = replace(seq, 'T'=> 'U') + +@assert simple_transcribe(input_dna) == answer +``` + +As always, there are lots of ways you *could* do this. +This function won't handle poorly formatted sequences, +for example. Or rather, it will handle them, even though it shouldn't. + +### Approach 2 - BioSequences `convert()` + +As you might expect, `BioSequences.jl` has a way to do this as well. +`BioSequences.jl` doesn't just use a `String` to represent sequences; +it has special types that can efficiently encode nucleic acid +or amino acid sequences. +In some cases, e.g. DNA or RNA with no ambiguous bases, these use as few as 2 bits +per base. + +```@example rna +using BioSequences + +dna_seq = LongDNA{2}(input_dna) + + +simple_transcribe(seq::LongDNA{N}) where N = convert(LongRNA{N}, seq) + +rna_seq = simple_transcribe(dna_seq) +``` + +```@example rna +@assert String(rna_seq) == answer +``` + +```@example rna +simple_transcribe("This Is QUITE silly") +``` + + + +A couple of things to note here. First, +I'm taking advantage of julia's multiple dispatch system. +Instead of writing a separate function name for dealing with +a `LongDNA` from `BioSequences.jl`, I wrote a new *method* +for the same function by adding `::LongDNA{N}` to the argument. + +This tells julia to call this version of `simple_transcribe()` +whenever the argument is a `LongDNA`. Otherwise, it will fall back to the original +(julia always uses the method that is most specific for its arguments). + +The last thing to note is the `{N} ... where N`. This is just a way +that we can use any DNA alphabet (2 bit or 4 bit), and get similar behavior.
+ + +### Benchmarks + + +```@example rna +using BenchmarkTools + +testseq = randdnaseq(100_000) #this is defined in BioSequences +testseq_str = string(testseq) + + +@benchmark simple_transcribe($testseq_str) +``` + + +```@example rna +@benchmark simple_transcribe(x) setup=(x=LongDNA{2}(testseq)) +``` + +```@example rna +@benchmark simple_transcribe(x) setup=(x=LongDNA{4}(testseq)) +``` + + + +### Conclusions + +I'm actually a little surprised that the `replace()` method does so well, +but there you have it. The BioJulia method is about 2x faster on a 2-bit sequence +(that is, if there's no ambiguity), but about the same speed on 4-bit sequences. + diff --git a/docs/src/rosalind/03-revc.md b/docs/src/rosalind/03-revc.md new file mode 100644 index 0000000..f2d6b14 --- /dev/null +++ b/docs/src/rosalind/03-revc.md @@ -0,0 +1,177 @@ + +## 😉 Problem 3 - Getting the complement + +I know, I know, [not the *compliment*](https://www.grammarly.com/blog/complement-compliment/), but if you have a better emoji idea, let me know. + +!!! warning "The Problem" + In DNA strings, symbols 'A' and 'T' are complements of each other, as are 'C' and 'G'. + + The reverse complement of a DNA string $s$ is the string $s^c$ formed by reversing the symbols of $s$, + then taking the complement of each symbol (e.g., the reverse complement of "GTCA" is "TGAC"). + + _Given_: A DNA string $s$ of length at most 1000 bp. + + _Return_: The reverse complement $s^c$ of $s$. + + **Sample Dataset** + + ```txt + AAAACCCGGT + ``` + **Sample Output** + + ```txt + ACCGGGTTTT + ``` + + +This one is a bit tougher - we need to change each base coming in, +and then reverse the result. Actually, that second part is easy, +because julia has a built-in `reverse()` function that works for `String`s.
+ + +```@example revc +reverse("complement") +``` + + + +### Approach 1: using a `Dict`ionary + +In my opinion, the easiest thing to do is to use a `Dict()`, +a data structure that maps arbitrary keys to arbitrary values. + +For example: + + +```@example revc +my_dictionary = Dict("thing1"=> "hello", "thing2" => "world!") + + +my_dictionary["thing1"] +``` + +```@example revc +my_dictionary["thing2"] +``` + +So, we just make a dictionary with 4 entries, one for each base. +Then, to apply this to every base in the sequence, we have a couple of options. +One is to use the `String()` constructor and a "comprehension" - +basically a `for` loop in a single phrase: + + +```@example revc +function revc(seq) + comp_dict = Dict( + 'A'=>'T', + 'C'=>'G', + 'G'=>'C', + 'T'=>'A' + ) + comp = String([comp_dict[base] for base in seq]) + return reverse(comp) +end +``` + +Here, the "comprehension" `[comp_dict[base] for base in seq]` is equivalent to something like + +```julia +comp = Char[] +for base in seq + push!(comp, comp_dict[base]) +end +``` + +So let's see if it works! + +```@example revc +input_dna = "AAAACCCGGT" +answer = "ACCGGGTTTT" + +@assert revc(input_dna) == answer +``` + + +### Approach 2: using `replace()` again + +It turns out, the `replace()` function we used for the transcription problem +can be passed multiple `Pair`s of patterns to replace! + +So we can just pass the pairs directly: + + +```@example revc +function revc2(seq) + comp = replace(seq, + 'A'=>'T', + 'C'=>'G', + 'G'=>'C', + 'T'=>'A' + ) + return reverse(comp) +end + + +@assert revc(input_dna) == revc2(input_dna) +``` + + +### Approach 3: `BioSequences.jl` + +This is a pretty common need in bioinformatics, so `BioSequences.jl` actually has a `reverse_complement()` function built-in.
+ + +```@example revc +using BioSequences + +reverse_complement(LongDNA{2}(input_dna)) +``` + + + +### Once more, benchmarks + + +```@example revc +using BenchmarkTools + + +testseq = randdnaseq(100_000) #this is defined in BioSequences +testseq_str = string(testseq) + + +@benchmark revc($testseq_str) +``` + +```@example revc +@benchmark revc2($testseq_str) +``` + + +```@example revc +@benchmark reverse_complement($testseq) +``` + + +```@example revc +@benchmark reverse_complement(testseq_4bit) setup=(testseq_4bit = convert(LongDNA{4}, testseq)) +``` + +### Conclusions + +This one is a no-brainer! The `reverse_complement()` function is about 200x faster than the dictionary method, and about 1000x faster than `replace()` for both 2 bit and 4 bit DNA sequences. + + + + +## ⌛ Overall Conclusions + +A lot of bioinformatics is essentially string manipulation. +Julia has a lot of useful functionality to work with `String`s +directly, but those methods often leave a lot of performance on the table. + +`BioSequences.jl` provides some nice sequence types and incredibly efficient +data structures. We'll be seeing more of them in coming tutorials. + + diff --git a/docs/src/rosalind/index.md b/docs/src/rosalind/index.md new file mode 100644 index 0000000..31d0ace --- /dev/null +++ b/docs/src/rosalind/index.md @@ -0,0 +1,19 @@ +# 🧑‍🔬 Getting Started with Rosalind.info Problems + +If you're just learning bioinformatics, +or diving into a new programming language with an interest in biology, +[Rosalind.info](https://rosalind.info/) is a fantastic resource! +It has a series of problems that get progressively harder, +and introduce different concepts. + +These tutorial notebooks will take you through how to solve many of the problems, +both using functionality from the Base julia language, +as well as using functionality from the BioJulia family of packages. + +Once you've signed up for an account at rosalind.info, come on back here, +and we'll get started! 
+ + +```@contents +Pages = Main.ROSALIND_PAGES +```