Sourcery refactored master branch #1

sourcery-ai · 2023-12-04T23:01:55Z

Branch master refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.

Review changes via command line

To manually merge these changes, make sure you're on the master branch, then run:

git fetch origin sourcery/master
git merge --ff-only FETCH_HEAD
git reset HEAD^

Due to GitHub API limits, only the first 60 comments can be shown.

sourcery-ai · 2023-12-04T23:01:57Z

code/python/driver/analyze_gcode_parameters.py


-    fig = plt.figure()
-    ax = fig.gca(projection='3d')
-    ax.plot_trisurf(df['Chunk Size'], df['p4,p11'], df['Match Rate'], linewidth=0.2)
-    ax.set_xlabel("Chunk Size")
-    ax.set_ylabel("Parameters")
-    ax.set_zlabel("Match Rate")
-    plt.show()
-
-    df2 = df[df["Tool"] == "mgm2"].groupby(["p4", "p11"], as_index=False).mean()
-
-    idx = df2["Match Rate"].argmax()
-    p4 = df2.at[idx, "p4"]
-    p11 = df2.at[idx, "p11"]
-    df_best = df[(df["p4"] == p4) & (df["p11"] == p11)]
-    df_alex = df[(df["p4"] == 10) & (df["p11"] == 20)]
-    fig, ax = plt.subplots()
-    sns.lineplot("Chunk Size", "Match Rate", data=df_best, label="Optimized")
-    sns.lineplot("Chunk Size", "Match Rate", data=df[df["Tool"] == "mprodigal"], label="MProdigal")
-    sns.lineplot("Chunk Size", "Match Rate", data=df_alex, label="Original")
-    ax.set_ylim(0, 1)
-    plt.show()


Function main refactored with the following changes:

Remove unreachable code (remove-unreachable-code)
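
The diff above does not show why the removed block was unreachable. As a hypothetical illustration (the function names are invented here and are not taken from analyze_gcode_parameters.py), statements that follow an unconditional return can be deleted without changing behaviour:

```python
def main_before():
    rate = 0.5
    return rate
    rate = 1.0  # unreachable: the function has already returned
    return rate


def main_after():
    # Same observable behaviour with the dead statements removed.
    rate = 0.5
    return rate
```

The deleted plotting code remains available in the git history if it is needed again.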

sourcery-ai · 2023-12-04T23:01:57Z

code/python/driver/build_gcode_features.py

-            labels_per_seqname[lab.seqname()] = list()
+            labels_per_seqname[lab.seqname()] = []

        labels_per_seqname[lab.seqname()].append(lab)

-    counter = 0
-    for seqname in labels_per_seqname:
+    for counter, (seqname, value) in enumerate(labels_per_seqname.items()):


Function get_features_from_prediction refactored with the following changes:

Replace list() with [] (list-literal)

Use items() to directly unpack dictionary values (use-dict-items)

Replace manual loop counter with call to enumerate (convert-to-enumerate)

Replace comparison with min/max call [×2] (min-max-identity)
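
A minimal sketch of these rewrites, using invented data and helper names rather than code from build_gcode_features.py. The min/max change is not visible in the diff above, so the clamp helpers below are only an assumption about the kind of pattern it targets:

```python
labels_per_seqname = {"seq1": ["a", "b"], "seq2": ["c"]}  # hypothetical data

# Before: manual counter plus a key lookup inside the loop.
before = []
counter = 0
for seqname in labels_per_seqname:
    before.append((counter, seqname, len(labels_per_seqname[seqname])))
    counter += 1

# After: enumerate() supplies the counter, items() supplies the value.
after = []
for counter, (seqname, value) in enumerate(labels_per_seqname.items()):
    after.append((counter, seqname, len(value)))


# min-max-identity: a compare-and-assign collapses into a max() call.
def clamp_low_before(x, floor):
    if x < floor:
        x = floor
    return x


def clamp_low_after(x, floor):
    return max(x, floor)
```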

sourcery-ai · 2023-12-04T23:01:57Z

code/python/driver/build_gcode_features.py

-    list_entries = list()
-
-


Function build_gcode_features_for_gi_for_chunk refactored with the following changes:

Move assignment closer to its usage within a block (move-assign-in-block)

Replace list() with [] (list-literal)

Merge append into list declaration (merge-list-append)
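
The diff above shows only the removed declaration, so the following is a sketch under assumptions (invented function names and values) of how the three changes usually combine: the list is created where it is first used, as a literal that already contains the first element.

```python
def build_entries_before(first):
    list_entries = list()  # declared early, far from its first use
    scaled = first * 2
    list_entries.append(scaled)
    return list_entries


def build_entries_after(first):
    scaled = first * 2
    # Literal syntax, declared at first use, with the append merged in.
    list_entries = [scaled]
    return list_entries
```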

sourcery-ai · 2023-12-04T23:01:58Z

code/python/driver/build_gcode_features.py

-    list_df = list()
+    list_df = []


Function build_gcode_features_for_gi refactored with the following changes:

Replace list() with [] [×2] (list-literal)

sourcery-ai · 2023-12-04T23:01:58Z

code/python/driver/build_gcode_features.py

-    # type: (Environment, GenomeInfoList, str, List[int], Dict[str, Any]) -> pd.DataFrame
-    list_df = list()
-
-    for gi in gil:
-        list_df.append(
-            build_gcode_features_for_gi(env, gi, tool, chunks, **kwargs)
-        )
-
+    list_df = [
+        build_gcode_features_for_gi(env, gi, tool, chunks, **kwargs)
+        for gi in gil
+    ]


Function build_gcode_features refactored with the following changes:

Convert for loop into list comprehension (list-comprehension)

Replace list() with [] (list-literal)

This removes the following comments:

# type: (Environment, GenomeInfoList, str, List[int], Dict[str, Any]) -> pd.DataFrame
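
One way to keep the type information that the deleted comment carried is to move it into signature annotations, which a rewrite of the function body cannot drop. The sketch below uses hypothetical stand-ins (build_one, build_all, plain str items) instead of the repository's Environment and GenomeInfoList types:

```python
from typing import Any, Dict, List


def build_one(item: str, chunks: List[int], **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for build_gcode_features_for_gi."""
    return {"item": item, "chunks": len(chunks), **kwargs}


def build_all(items: List[str], chunks: List[int], **kwargs: Any) -> List[Dict[str, Any]]:
    # The annotations above replace the old "# type:" comment; the body is
    # the list comprehension form produced by the refactoring.
    return [build_one(item, chunks, **kwargs) for item in items]
```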

sourcery-ai · 2023-12-04T23:01:59Z

code/python/driver/build_mgm_models_from_gms2_models_curr_best.py

-    list_entries = list()
+    list_entries = []


Function add_codon_probabilities refactored with the following changes:

Replace list() with [] (list-literal)

sourcery-ai · 2023-12-04T23:01:59Z

code/python/driver/build_mgm_models_from_gms2_models_curr_best.py

-    x_out = list()
-    y_out = list()
+    x_out = []
+    y_out = []


Function compute_bin_averages refactored with the following changes:

Replace list() with [] [×2] (list-literal)

Simplify sequence length comparison (simplify-len-comparison)
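
The length comparison is not shown in the diff above. As an assumed example of the pattern, an explicit `len(...) == 0` test on a built-in sequence is equivalent to a truthiness check:

```python
def is_empty_before(values):
    return len(values) == 0


def is_empty_after(values):
    # Empty built-in sequences (list, tuple, str) are falsy.
    return not values
```

This equivalence relies on built-in sequence truthiness. A NumPy array with more than one element raises ValueError in a boolean context, so the rewrite is worth checking wherever the compared object might be an array rather than a list.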

sourcery-ai · 2023-12-04T23:01:59Z

code/python/driver/build_mgm_models_from_gms2_models_curr_best.py

-    for gc_tag in sc_gc.keys():
+    for gc_tag in sc_gc:


Function add_start_context_probabilities refactored with the following changes:

Remove unnecessary call to keys() (remove-dict-keys)
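
For a plain dict, iterating the mapping directly yields the same keys in the same order as iterating `.keys()`. A small check with invented contents:

```python
sc_gc = {"GC_30": 0.1, "GC_50": 0.2}  # hypothetical contents

tags_before = [gc_tag for gc_tag in sc_gc.keys()]
tags_after = [gc_tag for gc_tag in sc_gc]
```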

sourcery-ai · 2023-12-04T23:01:59Z

code/python/driver/build_mgm_models_from_gms2_models_curr_best.py

-    list_mgm_models = list()  # type: List[List[float, float, MGMMotifModelV2]]
+    list_mgm_models = []


Function build_mgm_motif_models_for_all_gc refactored with the following changes:

Replace list() with [] (list-literal)

Simplify sequence length comparison (simplify-len-comparison)

This removes the following comments:

# type: List[List[float, float, MGMMotifModelV2]]

sourcery-ai · 2023-12-04T23:01:59Z

code/python/driver/build_mgm_models_from_gms2_models_curr_best.py

-        if True or "RBS" in output_tag:
-            # create a label for each shift
-            for shift, prob in motif._shift_prior.items():
-                prob /= 100.0
-                output_tag_ws = f"{output_tag}_{int(shift)}"
-                try:
-                    mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_MAT"] = motif._motif[shift]
-                    mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_POS_DISTR"] = motif._spacer[
-                    shift]
-                except KeyError:
-                    pass
-
-                mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}"] = 1
-                mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_ORDER"] = 0
-                mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_WIDTH"] = width
-                mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_MARGIN"] = 0
-                mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_MAX_DUR"] = dur
-                mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_SHIFT"] = prob
-        else:
-            # promoter aren't shifted (for now)
-            best_shift = max(motif._shift_prior.items(), key=operator.itemgetter(1))[0]
-            mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag}_MAT"] = motif._motif[best_shift]
-            mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag}_POS_DISTR"] = motif._spacer[best_shift]
-
-            mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag}"] = 1
-            mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag}_ORDER"] = 0
-            mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag}_WIDTH"] = width
-            mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag}_MARGIN"] = 0
-            mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag}_MAX_DUR"] = dur
+        # create a label for each shift
+        for shift, prob in motif._shift_prior.items():
+            prob /= 100.0
+            output_tag_ws = f"{output_tag}_{int(shift)}"
+            try:
+                mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_MAT"] = motif._motif[shift]
+                mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_POS_DISTR"] = motif._spacer[
+                shift]
+            except KeyError:
+                pass
+
+            mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}"] = 1
+            mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_ORDER"] = 0
+            mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_WIDTH"] = width
+            mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_MARGIN"] = 0
+            mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_MAX_DUR"] = dur
+            mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_SHIFT"] = prob


Function add_motif_probabilities refactored with the following changes:

Remove redundant conditional (remove-redundant-if)

This removes the following comments:

# promoter aren't shifted (for now)
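
The condition `True or "RBS" in output_tag` short-circuits to True, so the else branch could never run. A reduced sketch with invented return values in place of the real assignments:

```python
def branch_before(output_tag):
    if True or "RBS" in output_tag:
        return "per-shift"
    else:
        return "best-shift"  # never reached: the condition is always True


def branch_after(output_tag):
    # Only the always-taken branch remains.
    return "per-shift"
```

Runtime behaviour is unchanged, but the promoter-specific branch and its comment are no longer in the file; they remain recoverable from the git history if that path is meant to be re-enabled.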

sourcery-ai · 2023-12-04T23:02:01Z

code/python/driver/build_mgm_models_from_gms2_models_curr_best.py

-                    mgm,
-                    "RBS", f"RBS_{o}", genome_type, plot=plot
-                )
-
+        for o, l in zip(output_group, learn_from):
+            add_motif_probabilities(
+                env,
+                df_type[(df_type["GENOME_TYPE"].isin(l))],
+                mgm,
+                "RBS", f"RBS_{o}", genome_type, plot=plot
+            )
    if "Promoter" in components:
+        df_type = df[df["Type"] == genome_type]
        if genome_type == "Archaea":
            output_group = ["D"]
            learn_from = [{"D"}]  # always learn Promoter form group D

-            df_type = df[df["Type"] == genome_type]
-            for o, l in zip(output_group, learn_from):
-                add_motif_probabilities(
-                    env,
-                    df_type[(df_type["GENOME_TYPE"].isin(l))],
-                    mgm,
-                    "PROMOTER", f"PROMOTER_{o}", genome_type, plot=plot
-                )
        else:
            output_group = ["C"]
            learn_from = [{"C"}]  # always learn Promoter form group C

-            df_type = df[df["Type"] == genome_type]
-            for o, l in zip(output_group, learn_from):
-                add_motif_probabilities(
-                    env,
-                    df_type[(df_type["GENOME_TYPE"].isin(l))],
-                    mgm,
-                    "PROMOTER", f"PROMOTER_{o}", genome_type, plot=plot
-                )
-
+        for o, l in zip(output_group, learn_from):
+            add_motif_probabilities(
+                env,
+                df_type[(df_type["GENOME_TYPE"].isin(l))],
+                mgm,
+                "PROMOTER", f"PROMOTER_{o}", genome_type, plot=plot
+            )
    # Start Context
    if "Start Context" in components:
        if genome_type == "Archaea":
            output_group = ["A", "D"]
            learn_from = learn_from_arc

-            for o, l in zip(output_group, learn_from):
-                df_curr = df[(df["Type"] == genome_type) & (df["GENOME_TYPE"].isin(l))]
-                add_start_context_probabilities(df_curr, mgm, "SC_RBS", f"SC_RBS_{o}", genome_type=genome_type,
-                                                plot=plot)
        else:
            output_group = ["A", "B", "C", "X"]
            learn_from = [{"A"}, {"B"}, {"C"}, {"A"}]

-            for o, l in zip(output_group, learn_from):
-                df_curr = df[(df["Type"] == genome_type) & (df["GENOME_TYPE"].isin(l))]
-                add_start_context_probabilities(df_curr, mgm, "SC_RBS", f"SC_RBS_{o}", genome_type=genome_type,
-                                                plot=plot)
-
+        for o, l in zip(output_group, learn_from):
+            df_curr = df[(df["Type"] == genome_type) & (df["GENOME_TYPE"].isin(l))]
+            add_start_context_probabilities(df_curr, mgm, "SC_RBS", f"SC_RBS_{o}", genome_type=genome_type,
+                                            plot=plot)
        # promoter
        if genome_type == "Archaea":
            output_group = ["D"]
            learn_from = [{"A", "D"}]  # always learn RBS form group A

-            for o, l in zip(output_group, learn_from):
-                df_curr = df[(df["Type"] == genome_type) & (df["GENOME_TYPE"].isin(l))]
-
-                # NOTE: SC_PROMOTER is intentionally learned from SC_RBS. This is not a bug
-                # GMS2 has equal values for SC_RBS and SC_PROMOTER. Training from SC_RBS therefore allows us
-                # to learn from group A genomes as well.
-                add_start_context_probabilities(df_curr, mgm, "SC_RBS", f"SC_PROMOTER_{o}", genome_type=genome_type,
-                                                plot=plot)
        else:
            output_group = ["C"]
            learn_from = [{"C"}]

-            for o, l in zip(output_group, learn_from):
-                df_curr = df[(df["Type"] == genome_type) & (df["GENOME_TYPE"].isin(l))]
-                # NOTE: SC_PROMOTER is intentionally learned from SC_RBS. This is not a bug
-                # GMS2 has equal values for SC_RBS and SC_PROMOTER. Training from SC_RBS therefore allows us
-                # to learn from group A genomes as well.
-                add_start_context_probabilities(df_curr, mgm, "SC_RBS", f"SC_PROMOTER_{o}", genome_type=genome_type,
-                                                plot=plot)
+        for o, l in zip(output_group, learn_from):
+            df_curr = df[(df["Type"] == genome_type) & (df["GENOME_TYPE"].isin(l))]
+
+            # NOTE: SC_PROMOTER is intentionally learned from SC_RBS. This is not a bug
+            # GMS2 has equal values for SC_RBS and SC_PROMOTER. Training from SC_RBS therefore allows us
+            # to learn from group A genomes as well.
+            add_start_context_probabilities(df_curr, mgm, "SC_RBS", f"SC_PROMOTER_{o}", genome_type=genome_type,
+                                            plot=plot)


Function build_mgm_models_from_gms2_models refactored with the following changes:

Remove redundant conditional (remove-redundant-if)

Hoist repeated code outside conditional statement [×6] (hoist-statement-from-if)

This removes the following comments:

# add_stop_codon_probabilities(df, mgm, genome_type=genome_type, plot=plot)
# NOTE: SC_PROMOTER is intentionally learned from SC_RBS. This is not a bug
# to learn from group A genomes as well.
# add_stop_codon_probabilities(df, mgm, genome_type="Archaea", plot=plot)
# if "Stop Codons" in components:
# GMS2 has equal values for SC_RBS and SC_PROMOTER. Training from SC_RBS therefore allows us
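
A reduced sketch of the hoisting pattern, with invented function names and string results standing in for the add_motif_probabilities calls: when both branches end with the same loop, the branches only need to choose the inputs and the loop can run once afterwards.

```python
def groups_before(genome_type):
    calls = []
    if genome_type == "Archaea":
        output_group = ["D"]
        for o in output_group:
            calls.append(f"PROMOTER_{o}")
    else:
        output_group = ["C"]
        for o in output_group:
            calls.append(f"PROMOTER_{o}")
    return calls


def groups_after(genome_type):
    calls = []
    if genome_type == "Archaea":
        output_group = ["D"]
    else:
        output_group = ["C"]
    # The loop was identical in both branches, so it runs once after the if.
    for o in output_group:
        calls.append(f"PROMOTER_{o}")
    return calls
```

The hoist is safe only when every branch assigns all the names the shared loop reads, which is the case for output_group here.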

sourcery-ai · 2023-12-04T23:02:01Z

code/python/driver/collect_gms2_models.py

-    list_entries = list()
+    list_entries = []


Function collect_start_info_from_gil refactored with the following changes:

Replace list() with [] (list-literal)

sourcery-ai · 2023-12-04T23:02:01Z

code/python/driver/collect_gms2_models.py

-    list_entries = list()
+    list_entries = []


Function collect_start_info_from_gil_and_print_to_file refactored with the following changes:

Replace list() with [] (list-literal)

sourcery-ai · 2023-12-04T23:02:01Z

code/python/driver/compare_clustering_algorithms.py

-    df[f"CONSENSUS_RBS_MAT"] = df.apply(lambda r: get_consensus_sequence(r["Mod"].items["RBS_MAT"]), axis=1)
+    df["CONSENSUS_RBS_MAT"] = df.apply(
+        lambda r: get_consensus_sequence(r["Mod"].items["RBS_MAT"]), axis=1
+    )


Function load_gms2_models_from_pickle refactored with the following changes:

Replace f-string with no interpolated values with string (remove-redundant-fstring)
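
An f-string without placeholders evaluates to the same value as the plain literal, so the prefix can be dropped. The prefix is still needed when a placeholder is present, as in the `f"RBS_{o}"` labels used elsewhere in this pull request:

```python
column_before = f"CONSENSUS_RBS_MAT"  # no placeholders: the f prefix is redundant
column_after = "CONSENSUS_RBS_MAT"

o = "A"  # hypothetical group label
label = f"RBS_{o}"  # placeholder present: the f prefix is required
```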

sourcery-ai · 2023-12-04T23:02:02Z

code/python/driver/compare_clustering_algorithms.py

-                peak_to_list_pos_dist[peak] = list()
+                peak_to_list_pos_dist[peak] = []
            peak_to_list_pos_dist[peak].append(l)

        # average positions (per peak)
        values = dict()
        peak_counter = 0
-        for peak in peak_to_list_pos_dist.keys():
+        for peak in peak_to_list_pos_dist:


Function merge_spacers_by_peak refactored with the following changes:

Replace list() with [] [×3] (list-literal)

Remove unnecessary call to keys() [×3] (remove-dict-keys)

Replace identity comprehension with call to collection constructor (identity-comprehension)
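
The identity comprehension is not visible in the diff above. As an assumed example with invented data, a comprehension that copies every element unchanged can be replaced by the matching constructor:

```python
peak_to_positions = {0: [1, 2], 5: [3]}  # hypothetical data

# Before: comprehensions that copy each element as-is.
copy_before = {k: v for k, v in peak_to_positions.items()}
peaks_before = [p for p in peak_to_positions]

# After: the constructors build the same shallow copies.
copy_after = dict(peak_to_positions)
peaks_after = list(peak_to_positions)
```

Both forms produce a new container whose values are shared with the original, not deep copies.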

sourcery-ai · 2023-12-04T23:02:03Z