Merge pull request #85 from Goosang-Yu/dev_ntfargo
Enhancements to Genetic Analysis Functions
Goosang-Yu authored Apr 22, 2024
2 parents 9f1d8d3 + c1ce9f1 commit 20aa9cc
Showing 4 changed files with 35 additions and 23 deletions.
4 changes: 2 additions & 2 deletions docs/en/1_Predict/1_howworks.md
@@ -1,6 +1,6 @@
## Is gRNA design important?
Why is gRNA design important?

In CRISPR systems, genome editing efficiency is determined by the gRNA and its corresponding target sequence. Specific sequence motifs or GC content can affect it.
The efficiency of CRISPR systems for genome editing is determined by the gRNA and its corresponding target sequence information. Specific motifs or GC contents of the sequence can have an impact.

## High-throughput screening
![CRISPR High-throughput screening](../assets/contents/en_1_1_1_High-throughput_screening.svg)
2 changes: 1 addition & 1 deletion docs/en/3_Database/2_Genome_resource_background.md
@@ -1,6 +1,6 @@
# Backgrounds

There is some basic knowledge you need before working with genome resources. This section introduces the minimum required to use `genet.database`.
To utilize genome resources effectively, there are foundational concepts one must understand. Here, we introduce the minimum knowledge required to utilize the `genet.database`

### 1. Databases
There are several databases that provide genome resources. A few representative ones are introduced here.
2 changes: 1 addition & 1 deletion genet/analysis/__init__.py
@@ -1,8 +1,8 @@
from genet.analysis.functional import(
loadseq,
SortByBarcodes,

)

from genet.analysis.SGE_analysis import *
from genet.analysis.UMItools import *
from genet.analysis._dev_UMI import *
50 changes: 31 additions & 19 deletions genet/analysis/functional.py
@@ -189,12 +189,9 @@ def sort_barcode(list_sParameters):
SeqIO.write(seq_rec, '%s/%s.%s' % (temp_dir, barcode, output_format), output_format)

# def END: sort_barcode



def combine_files(list_combine_param):
"""Combine files by name
"""
"""Combine files by name"""

# parameters
splits_dir = list_combine_param[0]
@@ -278,17 +275,32 @@ def sort_barcode_and_combine(list_sParameters):
if silence == False: print('Make temp sorted %s file: %s' % (output_format, subsplit_name))

for barcode, seq_rec in dict_barcode.items():
SeqIO.write(seq_rec, '%s/%s.%s' % (temp_dir, barcode, output_format), output_format)

# def END: sort_barcode


def loadseq():
'''
Code written for testing
'''

print('For testing')


SeqIO.write(seq_rec, '%s/%s.%s' % (temp_dir, barcode, output_format), output_format)

""" Codon usage analysis "temporary" """

def calculate_codon_composition(seq):
    """Calculates the frequency of each codon in a DNA sequence."""
    codon_counts = {}
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i+3]
        codon_counts[codon] = codon_counts.get(codon, 0) + 1

    total_count = sum(codon_counts.values())
    for codon, count in codon_counts.items():
        codon_counts[codon] = count / total_count

    return codon_counts
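
Review note: to make the behavior of `calculate_codon_composition` concrete, here is a minimal sketch that restates the same counting logic with `collections.Counter`. The name `codon_frequencies` is hypothetical and is not part of the commit. As in the committed function, it reads codons from position 0 and ignores any trailing bases that do not fill a complete codon.

```python
from collections import Counter


def codon_frequencies(seq):
    """Relative frequency of each complete codon, reading from position 0.

    Hypothetical restatement of the committed logic, for illustration only.
    """
    usable = len(seq) - len(seq) % 3  # drop trailing bases that do not form a codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {codon: n / total for codon, n in counts.items()}


freqs = codon_frequencies("ATGGCCATGTAAG")
print(freqs)  # {'ATG': 0.5, 'GCC': 0.25, 'TAA': 0.25}
```

The final `G` in the example is dropped because it does not complete a codon. The function does not normalize case or reject non-ACGT characters, so lowercase or ambiguous input would be counted as distinct codons.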

def find_orfs(seq):
    """Identifies potential open reading frames (ORFs) in a DNA sequence."""
    orfs = []
    for frame in range(3):
        for start in range(frame, len(seq), 3):
            codon = seq[start:start + 3]
            if codon == 'ATG':
                end = start + 3
                while end < len(seq) and seq[end:end + 3] not in ['TAA', 'TAG', 'TGA']:
                    end += 3
                orfs.append((start, end, '+'))
    return orfs
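
Review note: as committed, `find_orfs` records an entry for every `ATG`, even when no in-frame stop codon follows, and the reported `end` points at the first base of the stop codon rather than past it. Whether that is intended is not stated in the commit. If only complete ORFs are wanted, a stricter variant might look like the sketch below. The name `find_complete_orfs` and the choice to make `end` exclusive of nothing but inclusive of the stop codon are assumptions for illustration, not the author's API.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_complete_orfs(seq):
    """Return (start, end, '+') for forward-strand ORFs closed by an in-frame stop.

    Hypothetical variant: `end` is the index just past the stop codon, so
    seq[start:end] runs from ATG through the stop codon inclusive.
    """
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3, "+"))
                break  # first in-frame stop closes the ORF
    return orfs


print(find_complete_orfs("CCATGAAATAGGG"))  # [(2, 11, '+')]
print(find_complete_orfs("ATGAAACCC"))      # [] because no stop codon follows
```

Like the committed function, this sketch scans only the forward strand and reports nested start codons as separate ORFs. For the second input, the committed `find_orfs` would instead return `(0, 9, '+')`.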
