Merge pull request #85 from Goosang-Yu/dev_ntfargo

Enhancements to Genetic Analysis Functions
Goosang-Yu · Apr 22, 2024 · 20aa9cc
2 parents 9f1d8d3 + c1ce9f1
commit 20aa9cc

Showing 4 changed files with 35 additions and 23 deletions.
diff --git a/docs/en/1_Predict/1_howworks.md b/docs/en/1_Predict/1_howworks.md
@@ -1,6 +1,6 @@
-## 왜 gRNA 디자인이 중요할까?
+## Why is gRNA design important?
 
-CRISPR system들은 gRNA와 그에 대응하는 target 서열 정보에 따라 genome editing 효율이 결정된다. 서열의 특정 motif 또는 GC contents 등이 영향을 미칠 수 있다. 
+The genome editing efficiency of CRISPR systems is determined by the gRNA and its corresponding target sequence. Specific sequence motifs and GC content can affect it.
 
 ## High-throughput screening
 ![CRISPR High-throughput screening](../assets/contents/en_1_1_1_High-throughput_screening.svg)
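
The documentation hunk above names GC content as one sequence property that can affect editing efficiency. As a minimal sketch of what that quantity is, the hypothetical helper below (`gc_content` is not part of `genet`; the name and the example sequence are assumptions for illustration) computes the fraction of G and C bases in a guide sequence:

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G/C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Illustrative 20-nt guide sequence: 7 G + 3 C out of 20 bases.
example_guide = "GAGTCCGAGCAGAAGAAGAA"
print(gc_content(example_guide))  # 0.5
```

This only illustrates the feature itself; it does not claim to reproduce how `genet` models or weights GC content.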

diff --git a/docs/en/3_Database/2_Genome_resource_background.md b/docs/en/3_Database/2_Genome_resource_background.md
@@ -1,6 +1,6 @@
 # Backgrounds
 
-Genome resources를 활용하기 위해서는 먼저 알아둬야 할 기초 지식들이 있다. 이 중 `genet.database`를 활용하기 위해 알아야 할 최소한의 내용을 소개한다. 
+To use genome resources effectively, a few foundational concepts are needed first. This section introduces the minimum knowledge required to use `genet.database`.
 
 ### 1. Databases
 Several databases provide genome resources. A few representative ones are introduced here.

diff --git a/genet/analysis/__init__.py b/genet/analysis/__init__.py
@@ -1,8 +1,8 @@
 from genet.analysis.functional import(
     loadseq,
     SortByBarcodes,
-
 )
+
 from genet.analysis.SGE_analysis import *
 from genet.analysis.UMItools import *
 from genet.analysis._dev_UMI import *
diff --git a/genet/analysis/functional.py b/genet/analysis/functional.py
@@ -189,12 +189,9 @@ def sort_barcode(list_sParameters):
         SeqIO.write(seq_rec, '%s/%s.%s' % (temp_dir, barcode, output_format), output_format)
 
 # def END: sort_barcode
-
-
+
 def combine_files(list_combine_param):
-    """Combine files by name
-
-    """
+    """Combine files by name"""
 
     # parameters
     splits_dir    = list_combine_param[0]
@@ -278,17 +275,32 @@ def sort_barcode_and_combine(list_sParameters):
     if silence == False: print('Make temp sorted %s file: %s' % (output_format, subsplit_name))
 
     for barcode, seq_rec in dict_barcode.items():
-        SeqIO.write(seq_rec, '%s/%s.%s' % (temp_dir, barcode, output_format), output_format)
-
-# def END: sort_barcode
-
-
-def loadseq():
-    '''
-    테스트용으로 만든 코드
-    
-    '''
-
-    print('For testing')
-
-
+        SeqIO.write(seq_rec, '%s/%s.%s' % (temp_dir, barcode, output_format), output_format)
+
+# Codon usage analysis (temporary)
+
+def calculate_codon_composition(seq):
+    """Calculates the frequency of each codon in a DNA sequence."""
+    codon_counts = {}
+    for i in range(0, len(seq) - 2, 3):
+        codon = seq[i:i+3]
+        codon_counts[codon] = codon_counts.get(codon, 0) + 1
+
+    total_count = sum(codon_counts.values())
+    for codon, count in codon_counts.items():
+        codon_counts[codon] = count / total_count
+
+    return codon_counts
+
+def find_orfs(seq):
+    """Identifies potential open reading frames (ORFs) on the forward strand of a DNA sequence."""
+    orfs = []
+    for frame in range(3):
+        for start in range(frame, len(seq), 3):
+            codon = seq[start:start + 3]
+            if codon == 'ATG':
+                end = start + 3
+                while end + 3 <= len(seq) and seq[end:end + 3] not in ('TAA', 'TAG', 'TGA'):
+                    end += 3
+                orfs.append((start, end, '+'))
+    return orfs
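
To make the behavior of the two new helpers concrete, here is a small standalone sketch. The function bodies are lightly adapted copies of the code added in `genet/analysis/functional.py` so that the example runs without installing `genet`; the input sequences are made up for illustration. Whether these helpers are exported from `genet.analysis` is not shown in this diff, so no import path is assumed.

```python
def calculate_codon_composition(seq):
    """Return the relative frequency of each complete codon in seq (frame 0)."""
    codon_counts = {}
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        codon_counts[codon] = codon_counts.get(codon, 0) + 1
    total_count = sum(codon_counts.values())
    return {codon: count / total_count for codon, count in codon_counts.items()}


def find_orfs(seq):
    """Return (start, end, '+') for every ATG found in the three forward frames.

    end is the index where the first in-frame stop codon begins, or the point
    where the scan ran out of sequence if no stop codon was found.
    """
    orfs = []
    for frame in range(3):
        for start in range(frame, len(seq), 3):
            if seq[start:start + 3] == 'ATG':
                end = start + 3
                while end < len(seq) and seq[end:end + 3] not in ('TAA', 'TAG', 'TGA'):
                    end += 3
                orfs.append((start, end, '+'))
    return orfs


# Made-up 12-nt sequence: codons ATG, GCC, ATG, TAA in frame 0.
seq = "ATGGCCATGTAA"
print(calculate_codon_composition(seq))  # {'ATG': 0.5, 'GCC': 0.25, 'TAA': 0.25}
print(find_orfs(seq))                    # [(0, 9, '+'), (6, 9, '+')]

# No stop codon: the ATG is still reported, with end at the end of the scan.
print(find_orfs("ATGAAA"))               # [(0, 6, '+')]
```

Three properties are visible here: nested start codons each produce their own entry, the reported `end` excludes the stop codon, and an ATG with no downstream stop codon is still returned. Only the forward strand is scanned, so callers that need the reverse strand or only terminated ORFs would have to add that themselves.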