Sourcery refactored master branch #1
base: master
Conversation
Due to GitHub API limits, only the first 60 comments can be shown.
```python
fig = plt.figure()
ax = fig.gca(projection='3d')
ax.plot_trisurf(df['Chunk Size'], df['p4,p11'], df['Match Rate'], linewidth=0.2)
ax.set_xlabel("Chunk Size")
ax.set_ylabel("Parameters")
ax.set_zlabel("Match Rate")
plt.show()

df2 = df[df["Tool"] == "mgm2"].groupby(["p4", "p11"], as_index=False).mean()

idx = df2["Match Rate"].argmax()
p4 = df2.at[idx, "p4"]
p11 = df2.at[idx, "p11"]
df_best = df[(df["p4"] == p4) & (df["p11"] == p11)]
df_alex = df[(df["p4"] == 10) & (df["p11"] == 20)]
fig, ax = plt.subplots()
sns.lineplot("Chunk Size", "Match Rate", data=df_best, label="Optimized")
sns.lineplot("Chunk Size", "Match Rate", data=df[df["Tool"] == "mprodigal"], label="MProdigal")
sns.lineplot("Chunk Size", "Match Rate", data=df_alex, label="Original")
ax.set_ylim(0, 1)
plt.show()
```
Function `main` refactored with the following changes:
- Remove unreachable code (`remove-unreachable-code`)
```diff
-labels_per_seqname[lab.seqname()] = list()
+labels_per_seqname[lab.seqname()] = []

 labels_per_seqname[lab.seqname()].append(lab)

-counter = 0
-for seqname in labels_per_seqname:
+for counter, (seqname, value) in enumerate(labels_per_seqname.items()):
```
Function `get_features_from_prediction` refactored with the following changes:
- Replace `list()` with `[]` (`list-literal`)
- Use `items()` to directly unpack dictionary values (`use-dict-items`)
- Replace manual loop counter with call to `enumerate` (`convert-to-enumerate`)
- Replace comparison with `min`/`max` call [×2] (`min-max-identity`)
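The `enumerate`/`items()` rewrite above can be sketched in isolation. This is a minimal illustration, not the project's code; the dictionary below is a hypothetical stand-in for `labels_per_seqname`:

```python
# Hypothetical data standing in for labels_per_seqname from the diff above.
labels_per_seqname = {"seq1": ["a", "b"], "seq2": ["c"]}

# Before: manual counter and key-only iteration.
counter = 0
before = []
for seqname in labels_per_seqname:
    before.append((counter, seqname, labels_per_seqname[seqname]))
    counter += 1

# After: enumerate() supplies the counter and items() unpacks key/value pairs.
after = [
    (counter, seqname, value)
    for counter, (seqname, value) in enumerate(labels_per_seqname.items())
]

assert before == after
```

Both forms produce identical results; the refactored version simply removes the hand-maintained counter and the second dictionary lookup.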
```diff
-list_entries = list()
+list_entries = []
```
Function `build_gcode_features_for_gi_for_chunk` refactored with the following changes:
- Move assignment closer to its usage within a block (`move-assign-in-block`)
- Replace `list()` with `[]` (`list-literal`)
- Merge append into list declaration (`merge-list-append`)
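The `merge-list-append` change can be sketched as follows, using hypothetical values rather than the project's actual entries:

```python
# Sketch of merge-list-append, using hypothetical values.
# Before: declare an empty list, then immediately append to it.
list_entries = []
list_entries.append({"chunk": 5000, "gcode": 11})

# After: fold the first append into the declaration itself.
merged = [{"chunk": 5000, "gcode": 11}]

assert list_entries == merged
```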
```diff
-list_df = list()
+list_df = []
```
Function `build_gcode_features_for_gi` refactored with the following changes:
- Replace `list()` with `[]` [×2] (`list-literal`)
```diff
-# type: (Environment, GenomeInfoList, str, List[int], Dict[str, Any]) -> pd.DataFrame
-list_df = list()
-
-for gi in gil:
-    list_df.append(
-        build_gcode_features_for_gi(env, gi, tool, chunks, **kwargs)
-    )
+list_df = [
+    build_gcode_features_for_gi(env, gi, tool, chunks, **kwargs)
+    for gi in gil
+]
```
Function `build_gcode_features` refactored with the following changes:
- Convert for loop into list comprehension (`list-comprehension`)
- Replace `list()` with `[]` (`list-literal`)

This removes the following comments:
- `# type: (Environment, GenomeInfoList, str, List[int], Dict[str, Any]) -> pd.DataFrame`
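The append-loop to list-comprehension rewrite is the most common transformation in this PR. A minimal sketch, with `build_feature` as a hypothetical stand-in for `build_gcode_features_for_gi`:

```python
# Hypothetical stand-in for build_gcode_features_for_gi.
def build_feature(gi, chunk):
    return {"gi": gi, "chunk": chunk}

gil = ["genomeA", "genomeB"]

# Before: build the list with explicit appends.
list_df = []
for gi in gil:
    list_df.append(build_feature(gi, 5000))

# After: the comprehension expresses the same mapping in one statement.
comprehended = [build_feature(gi, 5000) for gi in gil]

assert list_df == comprehended
```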
```diff
-list_entries = list()
+list_entries = []
```
Function `add_codon_probabilities` refactored with the following changes:
- Replace `list()` with `[]` (`list-literal`)
```diff
-x_out = list()
-y_out = list()
+x_out = []
+y_out = []
```
Function `compute_bin_averages` refactored with the following changes:
- Replace `list()` with `[]` [×2] (`list-literal`)
- Simplify sequence length comparison (`simplify-len-comparison`)
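The `simplify-len-comparison` rule relies on the truthiness of Python sequences. A minimal sketch with a hypothetical list:

```python
# Sketch of simplify-len-comparison on a hypothetical list.
values = []

# Before: explicit length comparison.
before = len(values) == 0

# After: an empty sequence is falsy, so the len() call is unnecessary.
after = not values

assert before == after
```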
```diff
-for gc_tag in sc_gc.keys():
+for gc_tag in sc_gc:
```
Function `add_start_context_probabilities` refactored with the following changes:
- Remove unnecessary call to `keys()` (`remove-dict-keys`)
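Iterating a dict already yields its keys, so the `.keys()` call is redundant. A sketch with `sc_gc` as a hypothetical stand-in for the dict in the diff above:

```python
# sc_gc here is a hypothetical stand-in for the dict in the diff above.
sc_gc = {"gc30": 0.1, "gc50": 0.7}

# Before and after: both iterate the same keys in the same order.
with_keys = [tag for tag in sc_gc.keys()]
without_keys = [tag for tag in sc_gc]

assert with_keys == without_keys
```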
```diff
-list_mgm_models = list()  # type: List[List[float, float, MGMMotifModelV2]]
+list_mgm_models = []
```
Function `build_mgm_motif_models_for_all_gc` refactored with the following changes:
- Replace `list()` with `[]` (`list-literal`)
- Simplify sequence length comparison (`simplify-len-comparison`)

This removes the following comments:
- `# type: List[List[float, float, MGMMotifModelV2]]`
```diff
-if True or "RBS" in output_tag:
-    # create a label for each shift
-    for shift, prob in motif._shift_prior.items():
-        prob /= 100.0
-        output_tag_ws = f"{output_tag}_{int(shift)}"
-        try:
-            mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_MAT"] = motif._motif[shift]
-            mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_POS_DISTR"] = motif._spacer[shift]
-        except KeyError:
-            pass
-
-        mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}"] = 1
-        mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_ORDER"] = 0
-        mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_WIDTH"] = width
-        mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_MARGIN"] = 0
-        mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_MAX_DUR"] = dur
-        mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_SHIFT"] = prob
-else:
-    # promoter aren't shifted (for now)
-    best_shift = max(motif._shift_prior.items(), key=operator.itemgetter(1))[0]
-    mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag}_MAT"] = motif._motif[best_shift]
-    mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag}_POS_DISTR"] = motif._spacer[best_shift]
-
-    mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag}"] = 1
-    mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag}_ORDER"] = 0
-    mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag}_WIDTH"] = width
-    mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag}_MARGIN"] = 0
-    mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag}_MAX_DUR"] = dur
+# create a label for each shift
+for shift, prob in motif._shift_prior.items():
+    prob /= 100.0
+    output_tag_ws = f"{output_tag}_{int(shift)}"
+    try:
+        mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_MAT"] = motif._motif[shift]
+        mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_POS_DISTR"] = motif._spacer[shift]
+    except KeyError:
+        pass
+
+    mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}"] = 1
+    mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_ORDER"] = 0
+    mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_WIDTH"] = width
+    mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_MARGIN"] = 0
+    mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_MAX_DUR"] = dur
+    mgm.items_by_species_and_gc[genome_tag][str(gc)].items[f"{output_tag_ws}_SHIFT"] = prob
```
Function `add_motif_probabilities` refactored with the following changes:
- Remove redundant conditional (`remove-redundant-if`)

This removes the following comments:
- `# promoter aren't shifted (for now)`
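The redundancy here is the `if True or …` guard: since `True or x` is always true, the first branch always runs and the `else` branch is dead. A minimal sketch with hypothetical values:

```python
# Sketch: `if True or cond:` always takes the first branch, so the
# conditional and its else branch can be removed. Hypothetical values.
def before(output_tag):
    if True or "RBS" in output_tag:
        return "shifted"
    else:
        return "promoter"  # dead code: never reached

def after(output_tag):
    return "shifted"

assert before("PROMOTER_C") == after("PROMOTER_C")
assert before("RBS_A") == after("RBS_A")
```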
```diff
         mgm,
         "RBS", f"RBS_{o}", genome_type, plot=plot
     )

+for o, l in zip(output_group, learn_from):
+    add_motif_probabilities(
+        env,
+        df_type[(df_type["GENOME_TYPE"].isin(l))],
+        mgm,
+        "RBS", f"RBS_{o}", genome_type, plot=plot
+    )
 if "Promoter" in components:
+    df_type = df[df["Type"] == genome_type]
     if genome_type == "Archaea":
         output_group = ["D"]
         learn_from = [{"D"}]  # always learn Promoter form group D
-
-        df_type = df[df["Type"] == genome_type]
-        for o, l in zip(output_group, learn_from):
-            add_motif_probabilities(
-                env,
-                df_type[(df_type["GENOME_TYPE"].isin(l))],
-                mgm,
-                "PROMOTER", f"PROMOTER_{o}", genome_type, plot=plot
-            )
     else:
         output_group = ["C"]
         learn_from = [{"C"}]  # always learn Promoter form group C
-
-        df_type = df[df["Type"] == genome_type]
-        for o, l in zip(output_group, learn_from):
-            add_motif_probabilities(
-                env,
-                df_type[(df_type["GENOME_TYPE"].isin(l))],
-                mgm,
-                "PROMOTER", f"PROMOTER_{o}", genome_type, plot=plot
-            )
+
+    for o, l in zip(output_group, learn_from):
+        add_motif_probabilities(
+            env,
+            df_type[(df_type["GENOME_TYPE"].isin(l))],
+            mgm,
+            "PROMOTER", f"PROMOTER_{o}", genome_type, plot=plot
+        )
 # Start Context
 if "Start Context" in components:
     if genome_type == "Archaea":
         output_group = ["A", "D"]
         learn_from = learn_from_arc
-
-        for o, l in zip(output_group, learn_from):
-            df_curr = df[(df["Type"] == genome_type) & (df["GENOME_TYPE"].isin(l))]
-            add_start_context_probabilities(df_curr, mgm, "SC_RBS", f"SC_RBS_{o}", genome_type=genome_type,
-                                            plot=plot)
     else:
         output_group = ["A", "B", "C", "X"]
         learn_from = [{"A"}, {"B"}, {"C"}, {"A"}]
-
-        for o, l in zip(output_group, learn_from):
-            df_curr = df[(df["Type"] == genome_type) & (df["GENOME_TYPE"].isin(l))]
-            add_start_context_probabilities(df_curr, mgm, "SC_RBS", f"SC_RBS_{o}", genome_type=genome_type,
-                                            plot=plot)
+
+    for o, l in zip(output_group, learn_from):
+        df_curr = df[(df["Type"] == genome_type) & (df["GENOME_TYPE"].isin(l))]
+        add_start_context_probabilities(df_curr, mgm, "SC_RBS", f"SC_RBS_{o}", genome_type=genome_type,
+                                        plot=plot)
     # promoter
     if genome_type == "Archaea":
         output_group = ["D"]
         learn_from = [{"A", "D"}]  # always learn RBS form group A
-
-        for o, l in zip(output_group, learn_from):
-            df_curr = df[(df["Type"] == genome_type) & (df["GENOME_TYPE"].isin(l))]
-
-            # NOTE: SC_PROMOTER is intentionally learned from SC_RBS. This is not a bug
-            # GMS2 has equal values for SC_RBS and SC_PROMOTER. Training from SC_RBS therefore allows us
-            # to learn from group A genomes as well.
-            add_start_context_probabilities(df_curr, mgm, "SC_RBS", f"SC_PROMOTER_{o}", genome_type=genome_type,
-                                            plot=plot)
     else:
         output_group = ["C"]
         learn_from = [{"C"}]
-
-        for o, l in zip(output_group, learn_from):
-            df_curr = df[(df["Type"] == genome_type) & (df["GENOME_TYPE"].isin(l))]
-            # NOTE: SC_PROMOTER is intentionally learned from SC_RBS. This is not a bug
-            # GMS2 has equal values for SC_RBS and SC_PROMOTER. Training from SC_RBS therefore allows us
-            # to learn from group A genomes as well.
-            add_start_context_probabilities(df_curr, mgm, "SC_RBS", f"SC_PROMOTER_{o}", genome_type=genome_type,
-                                            plot=plot)
+
+    for o, l in zip(output_group, learn_from):
+        df_curr = df[(df["Type"] == genome_type) & (df["GENOME_TYPE"].isin(l))]
+
+        # NOTE: SC_PROMOTER is intentionally learned from SC_RBS. This is not a bug
+        # GMS2 has equal values for SC_RBS and SC_PROMOTER. Training from SC_RBS therefore allows us
+        # to learn from group A genomes as well.
+        add_start_context_probabilities(df_curr, mgm, "SC_RBS", f"SC_PROMOTER_{o}", genome_type=genome_type,
+                                        plot=plot)
```
Function `build_mgm_models_from_gms2_models` refactored with the following changes:
- Remove redundant conditional (`remove-redundant-if`)
- Hoist repeated code outside conditional statement [×6] (`hoist-statement-from-if`)

This removes the following comments:
- `# add_stop_codon_probabilities(df, mgm, genome_type=genome_type, plot=plot)`
- `# add_stop_codon_probabilities(df, mgm, genome_type="Archaea", plot=plot)`
- `# if "Stop Codons" in components:`
- `# NOTE: SC_PROMOTER is intentionally learned from SC_RBS. This is not a bug`
- `# GMS2 has equal values for SC_RBS and SC_PROMOTER. Training from SC_RBS therefore allows us`
- `# to learn from group A genomes as well.`
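The `hoist-statement-from-if` pattern used six times above moves a statement that is identical in both branches below the conditional. A minimal sketch with hypothetical values:

```python
# Sketch of hoist-statement-from-if: an identical trailing statement in
# both branches moves below the conditional. Hypothetical values throughout.
def before(genome_type):
    if genome_type == "Archaea":
        output_group = ["D"]
        result = [f"PROMOTER_{o}" for o in output_group]
    else:
        output_group = ["C"]
        result = [f"PROMOTER_{o}" for o in output_group]
    return result

def after(genome_type):
    if genome_type == "Archaea":
        output_group = ["D"]
    else:
        output_group = ["C"]
    # hoisted: runs once regardless of which branch was taken
    return [f"PROMOTER_{o}" for o in output_group]

assert before("Archaea") == after("Archaea")
assert before("Bacteria") == after("Bacteria")
```

The hoisted form is shorter and keeps the branch bodies focused on what actually differs between them.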
```diff
-list_entries = list()
+list_entries = []
```
Function `collect_start_info_from_gil` refactored with the following changes:
- Replace `list()` with `[]` (`list-literal`)
```diff
-list_entries = list()
+list_entries = []
```
Function `collect_start_info_from_gil_and_print_to_file` refactored with the following changes:
- Replace `list()` with `[]` (`list-literal`)
```diff
-df[f"CONSENSUS_RBS_MAT"] = df.apply(lambda r: get_consensus_sequence(r["Mod"].items["RBS_MAT"]), axis=1)
+df["CONSENSUS_RBS_MAT"] = df.apply(
+    lambda r: get_consensus_sequence(r["Mod"].items["RBS_MAT"]), axis=1
+)
```
Function `load_gms2_models_from_pickle` refactored with the following changes:
- Replace f-string with no interpolated values with string (`remove-redundant-fstring`)
```diff
-peak_to_list_pos_dist[peak] = list()
+peak_to_list_pos_dist[peak] = []
 peak_to_list_pos_dist[peak].append(l)

 # average positions (per peak)
 values = dict()
 peak_counter = 0
-for peak in peak_to_list_pos_dist.keys():
+for peak in peak_to_list_pos_dist:
```
Function `merge_spacers_by_peak` refactored with the following changes:
- Replace `list()` with `[]` [×3] (`list-literal`)
- Remove unnecessary call to `keys()` [×3] (`remove-dict-keys`)
- Replace identity comprehension with call to collection constructor (`identity-comprehension`)
```diff
-elif tool == "mprodigal" or tool == "prodigal":
+elif tool in ["mprodigal", "prodigal"]:
     gcode_per_contig = get_gcode_per_contig_for_mprodigal(pf_prediction)
 else:
     raise ValueError("Unknown tool")

-num_matches = sum([1 for v in gcode_per_contig.values() if str(v) == gcode_true])
-num_mismatches = sum([1 for v in gcode_per_contig.values() if str(v) != gcode_true])
+num_matches = sum(1 for v in gcode_per_contig.values() if str(v) == gcode_true)
+num_mismatches = sum(
+    1 for v in gcode_per_contig.values() if str(v) != gcode_true
+)
```
Function `get_accuracy_gcode_predicted` refactored with the following changes:
- Replace multiple comparisons of same variable with `in` operator (`merge-comparisons`)
- Replace unneeded comprehension with generator [×2] (`comprehension-to-generator`)
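The `comprehension-to-generator` change avoids building a throwaway list before summing. A sketch with hypothetical contig data standing in for the variables in the diff above:

```python
# Hypothetical data standing in for gcode_per_contig / gcode_true.
gcode_per_contig = {"contig1": 11, "contig2": 4, "contig3": 11}
gcode_true = "11"

# Before: builds an intermediate list just to sum it.
num_matches_list = sum([1 for v in gcode_per_contig.values() if str(v) == gcode_true])

# After: sum() consumes the generator lazily, saving one allocation.
num_matches_gen = sum(1 for v in gcode_per_contig.values() if str(v) == gcode_true)

assert num_matches_list == num_matches_gen == 2
```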
```diff
-list_entries = list()
+list_entries = []
```
Function `compute_gcode_accuracy_for_tool_on_sequence` refactored with the following changes:
- Replace `list()` with `[]` (`list-literal`)
```diff
-list_entries = list()
+list_entries = []
```
Function `compute_gcode_accuracy_for_tools_on_chunk_deprecated` refactored with the following changes:
- Replace `list()` with `[]` (`list-literal`)
```diff
-list_df = list()
+list_df = []
```
Function `compute_gcode_accuracy_for_tools_on_chunk` refactored with the following changes:
- Replace `list()` with `[]` (`list-literal`)
```diff
-list_df = list()
+list_df = []
```
Function `compute_gcode_accuracy_for_gi` refactored with the following changes:
- Replace `list()` with `[]` [×2] (`list-literal`)
```diff
-# type: (Environment, GenomeInfoList, List[str], List[int], Dict[str, Any]) -> pd.DataFrame
-list_df = list()
-
-for gi in gil:
-    list_df.append(
-        compute_gcode_accuracy_for_gi(env, gi, tools, chunks, **kwargs)
-    )
+list_df = [
+    compute_gcode_accuracy_for_gi(env, gi, tools, chunks, **kwargs)
+    for gi in gil
+]
```
Function `compute_gcode_accuracy` refactored with the following changes:
- Convert for loop into list comprehension (`list-comprehension`)
- Replace `list()` with `[]` (`list-literal`)

This removes the following comments:
- `# type: (Environment, GenomeInfoList, List[str], List[int], Dict[str, Any]) -> pd.DataFrame`
```diff
-if p_true == 0:
-    return float('inf')
-return 1.0 / p_true - 1
+return float('inf') if p_true == 0 else 1.0 / p_true - 1
```
Function `ratio_false_true` refactored with the following changes:
- Lift code into else after jump in control flow (`reintroduce-else`)
- Replace if statement with if expression (`assign-if-exp`)
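The `assign-if-exp` rewrite on `ratio_false_true` can be checked directly; both forms below are behavior-equivalent sketches of the diff above:

```python
# Before: early return followed by the general case.
def ratio_before(p_true):
    if p_true == 0:
        return float('inf')
    return 1.0 / p_true - 1

# After: a single conditional expression.
def ratio_after(p_true):
    return float('inf') if p_true == 0 else 1.0 / p_true - 1

assert ratio_before(0) == ratio_after(0) == float('inf')
assert ratio_before(0.5) == ratio_after(0.5) == 1.0
```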
```diff
-logger.critical("Random-seed: {}".format(rs))
+logger.critical(f"Random-seed: {rs}")
```
Function `main` refactored with the following changes:
- Replace call to `format` with f-string [×3] (`use-fstring-for-formatting`)
- Swap positions of nested conditionals [×6] (`swap-nested-ifs`)
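The f-string conversion is equivalence-preserving for simple interpolation. A sketch with a hypothetical seed value:

```python
# Sketch of use-fstring-for-formatting with a hypothetical seed value.
rs = 42
before = "Random-seed: {}".format(rs)
after = f"Random-seed: {rs}"

assert before == after == "Random-seed: 42"
```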
```diff
-return list()
+return []

 if len(list_tag_value_pairs) % 2 != 0:
     raise ValueError("Tag/value pairs list must have a length multiple of 2.")

-list_parsed = list()
-for i in range(0, len(list_tag_value_pairs), 2):
-    list_parsed.append((list_tag_value_pairs[i], list_tag_value_pairs[i + 1]))
-return list_parsed
+return [
+    (list_tag_value_pairs[i], list_tag_value_pairs[i + 1])
+    for i in range(0, len(list_tag_value_pairs), 2)
+]
```
Function `parse_tags_from_list` refactored with the following changes:
- Replace `list()` with `[]` [×2] (`list-literal`)
- Convert for loop into list comprehension (`list-comprehension`)
- Inline variable that is immediately returned (`inline-immediately-returned-variable`)
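The refactored pairing logic can be sketched end to end. This is a self-contained approximation of the function in the diff above, exercised with a hypothetical tag/value list:

```python
# Self-contained sketch of the refactored parse_tags_from_list.
def parse_tags_from_list(list_tag_value_pairs):
    if len(list_tag_value_pairs) % 2 != 0:
        raise ValueError("Tag/value pairs list must have a length multiple of 2.")
    # comprehension replaces the append loop; result is returned directly
    return [
        (list_tag_value_pairs[i], list_tag_value_pairs[i + 1])
        for i in range(0, len(list_tag_value_pairs), 2)
    ]

assert parse_tags_from_list(["p4", "10", "p11", "20"]) == [("p4", "10"), ("p11", "20")]
assert parse_tags_from_list([]) == []
```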
```diff
-else:
-    # PBS Parallelization
-    if prl_options["use-pbs"]:
-        # setup PBS jobs
-        pbs = PBS(env, prl_options, splitter=split_gil, merger=merge_identity)
-        pbs.run(
-            data={"gil": gil},
-            func=helper_run_mgm_on_genome_list,
-            func_kwargs={"env": env, "pf_mgm_mod": pf_mgm_mod, **kwargs}
-        )
-    # Multithreading parallelization
-    else:
-        # parallel using threads
-        run_n_per_thread(
-            list(gil), run_mgm_on_gi, "gi",
-            {"env": env, "pf_mgm_mod": pf_mgm_mod, **kwargs},
-            simultaneous_runs=prl_options.safe_get("num-processors")
-        )
+elif prl_options["use-pbs"]:
+    # setup PBS jobs
+    pbs = PBS(env, prl_options, splitter=split_gil, merger=merge_identity)
+    pbs.run(
+        data={"gil": gil},
+        func=helper_run_mgm_on_genome_list,
+        func_kwargs={"env": env, "pf_mgm_mod": pf_mgm_mod, **kwargs}
+    )
+else:
+    # parallel using threads
+    run_n_per_thread(
+        list(gil), run_mgm_on_gi, "gi",
+        {"env": env, "pf_mgm_mod": pf_mgm_mod, **kwargs},
+        simultaneous_runs=prl_options.safe_get("num-processors")
+    )
```
Function `run_mgm_on_genome_list` refactored with the following changes:
- Merge else clause's nested if statement into elif (`merge-else-if-into-elif`)

This removes the following comments:
- `# PBS Parallelization`
- `# Multithreading parallelization`
```diff
-# type: (Environment, GenomeInfoList, List[str], List[int], Dict[str, Any]) -> None
-list_df = list()
-for gi in gil:
-    list_df.append(run_tools_on_gi(env, gi, tools, chunks, **kwargs))
+list_df = [run_tools_on_gi(env, gi, tools, chunks, **kwargs) for gi in gil]
```
Function `run_tools_on_gil` refactored with the following changes:
- Convert for loop into list comprehension (`list-comprehension`)
- Replace `list()` with `[]` (`list-literal`)

This removes the following comments:
- `# type: (Environment, GenomeInfoList, List[str], List[int], Dict[str, Any]) -> None`
```diff
-if len(pf_predictions) == 0:
+if not pf_predictions:
     return pd.DataFrame()

 name_to_labels = {
-    t: read_labels_from_file(pf_predictions[t], shift=-1, name=t) for t in pf_predictions.keys()
-}  # type: Dict[str, Labels]
+    t: read_labels_from_file(pf_predictions[t], shift=-1, name=t)
+    for t in pf_predictions
+}
```
Function `stats_per_gene_for_gi` refactored with the following changes:
- Simplify sequence length comparison (`simplify-len-comparison`)
- Remove unnecessary call to `keys()` [×2] (`remove-dict-keys`)
- Replace `list()` with `[]` (`list-literal`)
- Inline variable that is immediately returned (`inline-immediately-returned-variable`)

This removes the following comments:
- `# df[f"5p-{t}"] = df[f"5p-{t}"].astype(int)`
- `# for t in tools.keys():`
- `# type: Dict[str, Labels]`
```diff
-list_df = list()
-for gi in gil:
-    list_df.append(stats_per_gene_for_gi(env, gi, tools, **kwargs))
-
-if len(list_df) == 0:
+if list_df := [
+    stats_per_gene_for_gi(env, gi, tools, **kwargs) for gi in gil
+]:
+    return pd.concat(list_df, ignore_index=True, sort=False)
+else:
     return pd.DataFrame()
-
-return pd.concat(list_df, ignore_index=True, sort=False)
```
Function `helper_stats_per_gene` refactored with the following changes:
- Convert for loop into list comprehension (`list-comprehension`)
- Replace `list()` with `[]` (`list-literal`)
- Simplify sequence length comparison (`simplify-len-comparison`)
- Lift code into else after jump in control flow (`reintroduce-else`)
- Swap if/else branches (`swap-if-else-branches`)
- Use named expression to simplify assignment and conditional (`use-named-expression`)
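The `use-named-expression` change relies on the walrus operator (Python 3.8+): the list is built and tested for emptiness in one expression. A sketch with `process` as a hypothetical stand-in for `stats_per_gene_for_gi`:

```python
# process() is a hypothetical stand-in for stats_per_gene_for_gi.
def process(gi):
    return {"gi": gi}

def helper(gil):
    # walrus operator binds list_df and tests its truthiness at once
    if list_df := [process(gi) for gi in gil]:
        return list_df
    else:
        return "empty"

assert helper(["g1"]) == [{"gi": "g1"}]
assert helper([]) == "empty"
```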
```diff
 df = pd.concat(list_df, ignore_index=True, sort=False)

-# threading
```
Function `stats_per_gene` refactored with the following changes:
- Hoist repeated code outside conditional statement (`hoist-statement-from-if`)

This removes the following comments:
- `# threading`
```diff
-tool_to_dir = {a: b for a, b in zip(tools, dn_tools)}
+tool_to_dir = dict(zip(tools, dn_tools))
```
Function `main` refactored with the following changes:
- Replace identity comprehension with call to collection constructor (`identity-comprehension`)
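The `identity-comprehension` rule applies because the dict comprehension only repackages the zipped pairs unchanged, which `dict()` does directly. A sketch with hypothetical tool names:

```python
# Hypothetical names standing in for tools / dn_tools from the diff above.
tools = ["mgm2", "mprodigal"]
dn_tools = ["dir_mgm2", "dir_mprodigal"]

# Before: a comprehension that merely repackages each zipped pair.
before = {a: b for a, b in zip(tools, dn_tools)}

# After: dict() consumes the pairs directly.
after = dict(zip(tools, dn_tools))

assert before == after == {"mgm2": "dir_mgm2", "mprodigal": "dir_mprodigal"}
```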
Branch `master` refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy. See our documentation here.

Run Sourcery locally

Reduce the feedback loop during development by using the Sourcery editor plugin.

Review changes via command line

To manually merge these changes, make sure you're on the `master` branch, then run: