Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sourcery refactored develop branch #1

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

sourcery-ai[bot]
Copy link

@sourcery-ai sourcery-ai bot commented Dec 4, 2023

Branch develop refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.

See our documentation here.

Run Sourcery locally

Reduce the feedback loop during development by using the Sourcery editor plugin:

Review changes via command line

To manually merge these changes, make sure you're on the develop branch, then run:

git fetch origin sourcery/develop
git merge --ff-only FETCH_HEAD
git reset HEAD^

Help us improve this pull request!

@sourcery-ai sourcery-ai bot requested a review from ben-silke December 4, 2023 23:04
Copy link
Author

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sourcery timed out performing refactorings.

Due to GitHub API limits, only the first 60 comments can be shown.

@@ -2,6 +2,7 @@
"""
This takes doctest files and turns them into standalone scripts.
"""

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 25-33 refactored with the following changes:

Comment on lines -114 to +126
for sub_word in just:
if sub_word in fn:
new.append(fn)
new.extend(fn for sub_word in just if sub_word in fn)
file_paths = new
elif exclude:
exclude = exclude.split(",")
new = []
for fn in file_paths:
keep = True
for sub_word in exclude:
if sub_word in fn:
keep = False
break
keep = all(sub_word not in fn for sub_word in exclude)
if keep:
new.append(fn)
file_paths = new

if verbose:
print("File paths, after filtering: %s" % str(file_paths))
print(f"File paths, after filtering: {str(file_paths)}")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function main refactored with the following changes:

for i in range(3, sqrt_n + 1, 2):
if n % i == 0:
return False

return True
return all(n % i != 0 for i in range(3, sqrt_n + 1, 2))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function is_prime refactored with the following changes:

for i in range(3, sqrt_n + 1, 2):
if n % i == 0:
return False

return True
return all(n % i != 0 for i in range(3, sqrt_n + 1, 2))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function is_prime refactored with the following changes:

@@ -2,6 +2,7 @@
analysis experience within Jupyter notebooks plus supporting parallel
execution on compute systems with 1000s of CPUs."""

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 94-94 refactored with the following changes:

Comment on lines -381 to +374
mantissa = 0.0
if kw["use_scaling"]:
mantissa = numpy.log(0.0)

mantissa = numpy.log(0.0) if kw["use_scaling"] else 0.0
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function Pair.calc_rows refactored with the following changes:

Comment on lines -507 to +501
# XXX the int case should be a different method?
if isinstance(index, int):
return self.pred[index]
else:
pog = self.pog[index]
leaf = self.leaf[index]
return AlignablePOG(leaf, pog)
pog = self.pog[index]
leaf = self.leaf[index]
return AlignablePOG(leaf, pog)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function AlignablePOG.__getitem__ refactored with the following changes:

This removes the following comments ( why? ):

# XXX the int case should be a different method?

Comment on lines -565 to +560
# XXX the int case should be a different method?
if isinstance(index, int):
if index == 0:
return []
elif 0 < index < len(self.index):
return [index - 1]
else:
raise IndexError(index)
# elif index == slice(None, None, None):
# return self
else:
if not isinstance(index, int):
return AlignableSeq(self.leaf[index])
if index == 0:
return []
elif 0 < index < len(self.index):
return [index - 1]
else:
raise IndexError(index)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function AlignableSeq.__getitem__ refactored with the following changes:

This removes the following comments ( why? ):

#    return self
# XXX the int case should be a different method?
# elif index == slice(None, None, None):

Comment on lines -676 to +660
if kw["use_logs"]:
(impossible, inevitable) = (-numpy.inf, 0.0)
else:
(impossible, inevitable) = (0.0, 1.0)
(impossible, inevitable) = (-numpy.inf, 0.0) if kw["use_logs"] else (0.0, 1.0)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function PairEmissionProbs._calc_global_probs refactored with the following changes:

Comment on lines -800 to +781
result = numpy.array([probs[p_rows.index(i)] for i in last_row])
return result
return numpy.array([probs[p_rows.index(i)] for i in last_row])
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function PairEmissionProbs.scores_at_rows refactored with the following changes:

Comment on lines -820 to +800
self.pair.size[0] - 2 >= 3
self.pair.size[0] >= 5
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function PairEmissionProbs.dp refactored with the following changes:

Comment on lines -1098 to -1106
# Used during progressive sequence alignment.
# Because the alignment depends on the total length (so long as the
# model is reversable!) the same cached viterbi result can be re-used
# to calculate the partial likelihoods even if the root of the 2-seq
# tree is moved around.
alignable = self.pair_hmm.emission_probs.get_alignable(
return self.pair_hmm.emission_probs.get_alignable(
self.aligned_positions, ratio=ratio
)
return alignable
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function GlobalViterbiPath.get_alignable refactored with the following changes:

This removes the following comments ( why? ):

# Because the alignment depends on the total length (so long as the
# model is reversable!) the same cached viterbi result can be re-used
# tree is moved around.
# Used during progressive sequence alignment.
# to calculate the partial likelihoods even if the root of the 2-seq

), "names don't match between seqs and tree: tree=%s; seqs=%s" % (
tip_names,
seq_names,
)
), f"names don't match between seqs and tree: tree={tip_names}; seqs={seq_names}"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function TreeAlign refactored with the following changes:

if pos is not None:
c = seqs[dimension][pos]
else:
c = gap_value
c = seqs[dimension][pos] if pos is not None else gap_value
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function seq_traceback refactored with the following changes:

Comment on lines -22 to +27
for tys in _types.keys():
for tys in _types:
types = getattr(obj, tys, None) or []
types = [types] if type(types) == str else types
_types[tys] = [{None: ""}.get(e, e) for e in types]

row = [
return [
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function _get_app_attr refactored with the following changes:

Comment on lines -299 to +295
if pattern:
result = [m for m in self if fnmatch(m, pattern)]
else:
result = [m for m in self if callback(m)]
return result
return (
[m for m in self if fnmatch(m, pattern)]
if pattern
else [m for m in self if callback(m)]
)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function ReadOnlyDataStoreBase.filtered refactored with the following changes:

Comment on lines -491 to +483
if not str(relativeid) in self:
if str(relativeid) not in self:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function WritableDataStoreBase.add_file refactored with the following changes:

  • Simplify logical expression using De Morgan identities (de-morgan)

Comment on lines -587 to +579
for f in p.iterdir():
if get_format_suffixes(str(f))[0] not in allowed:
return True
return False
return any(get_format_suffixes(str(f))[0] not in allowed for f in p.iterdir())
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function WritableDirectoryDataStore._has_other_suffixes refactored with the following changes:

  • Use any() instead of for loop (use-any)

Comment on lines -1054 to +1043
relativeid = str(relativeid).replace(suffixes, f"-{num}{suffixes}")
relativeid = relativeid.replace(suffixes, f"-{num}{suffixes}")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function WritableTinyDbDataStore.add_file refactored with the following changes:

data = dict(record for record in self._parser(data))
data = dict(iter(self._parser(data)))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function _seq_loader.load refactored with the following changes:

if self.as_type == "pssm":
return make_pssm_from_tabular(data)

return None
return make_pssm_from_tabular(data) if self.as_type == "pssm" else None
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function load_tabular.load refactored with the following changes:

Comment on lines -444 to +441
stored = self.data_store.write(identifier, output)
return stored
return self.data_store.write(identifier, output)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function write_tabular.write refactored with the following changes:

aln = self.lf.simulate_alignment()
return aln
return self.lf.simulate_alignment()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function model_result.simulate_alignment refactored with the following changes:

Comment on lines -366 to +370
if isinstance(model, DiscreteSubstitutionModel):
length_as = "paralinear"
else:
length_as = "ENS"

if not hasattr(self, "_tree"):
length_as = (
"paralinear"
if isinstance(model, DiscreteSubstitutionModel)
else "ENS"
)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function model_result.tree refactored with the following changes:

Comment on lines -616 to +619
pvalue = 1
return 1
elif self.LR > 0:
pvalue = chisqprob(self.LR, self.df)
return chisqprob(self.LR, self.df)
else:
pvalue = None
return pvalue
return None
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function hypothesis_result.pvalue refactored with the following changes:

Comment on lines -117 to +127
chars = set([chars])
chars = {chars}

def just_chars(data):
if is_array:
data = set(data.flatten())
else:
data = set("".join(data))
data = set(data.flatten()) if is_array else set("".join(data))
return data <= chars

def not_chars(data):
if is_array:
data = set(data.flatten())
else:
data = set("".join(data))
data = set(data.flatten()) if is_array else set("".join(data))
return not data & chars

if negate:
return not_chars

return just_chars
return not_chars if negate else just_chars
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function AllowedCharacters refactored with the following changes:

Comment on lines -186 to +177
self.gap_chars = set([gap_chars])
self.gap_chars = {gap_chars}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function GapsOk.__init__ refactored with the following changes:

Comment on lines -191 to +184
if self.is_array:
data = Counter(data.flatten())
else:
data = Counter("".join(data))

data = Counter(data.flatten()) if self.is_array else Counter("".join(data))
num_gap = sum(data[g] for g in self.gap_chars)
gap_frac = num_gap / length
return gap_frac
return num_gap / length
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function GapsOk._get_gap_frac refactored with the following changes:

Comment on lines -463 to +454
if info:
info = InfoClass(info)
else:
info = InfoClass()
info = InfoClass(info) if info else InfoClass()
self.info = info
# if we're forcing the same data, skip the validation
if force_same_data:
self._force_same_data(data, names)
if isinstance(data, dict):
curr_seqs = list(data.values())
else:
curr_seqs = data
# otherwise, figure out what we got and coerce it into the right type
curr_seqs = list(data.values()) if isinstance(data, dict) else data
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function _SequenceCollectionBase.__init__ refactored with the following changes:

This removes the following comments ( why? ):

# otherwise, figure out what we got and coerce it into the right type

Comment on lines -658 to +653
if (names is None) or (None in names):
per_seq_names = name_order
else: # got names from seqs, so assume name_order is in Names
per_seq_names = names

per_seq_names = name_order if (names is None) or (None in names) else names
# check for duplicate names
duplicates, fixed_names, fixed_seqs = self._strip_duplicates(
per_seq_names, curr_seqs
)
if duplicates:
if remove_duplicate_names:
per_seq_names, curr_seqs = fixed_names, fixed_seqs
# if name_order doesn't have the same names as per_seq_names,
# replace it with per_seq_names
if (set(name_order) != set(per_seq_names)) or (
len(name_order) != len(per_seq_names)
):
name_order = per_seq_names
else:
if not remove_duplicate_names:
raise ValueError(
"Some names were not unique. Duplicates are:\n"
+ str(sorted(duplicates.keys()))
)

per_seq_names, curr_seqs = fixed_names, fixed_seqs
# if name_order doesn't have the same names as per_seq_names,
# replace it with per_seq_names
if (set(name_order) != set(per_seq_names)) or (
len(name_order) != len(per_seq_names)
):
name_order = per_seq_names
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function _SequenceCollectionBase._names_seqs_order refactored with the following changes:

This removes the following comments ( why? ):

# got names from seqs, so assume name_order is in Names

Copy link
Author

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Type: Refactoring

Summary of PR: This PR includes a series of refactoring changes aimed at improving the readability and efficiency of the codebase. It utilizes modern Python features such as list comprehensions, ternary operators, and f-strings to simplify the code. The changes also include logical simplifications, such as removing unnecessary conditionals and using set comprehensions.

General PR suggestions

  • Review the changes to ensure that the refactoring does not introduce any logical errors, especially where conditions have been simplified or ternary operators have been introduced.
  • Verify that the use of modern Python features such as f-strings and list comprehensions maintains the readability and does not compromise the maintainability of the code.
  • Consider encapsulating logic within object methods where appropriate, such as using an 'add' method for adding children to a tree node, to keep the code modular and maintainable.
  • Ensure that any changes to loop conditions or removal of checks do not affect the intended functionality, particularly in cases where input validation is crucial.
  • While the stylistic changes like reordering comparisons are generally positive, make sure they are consistent throughout the codebase to maintain a uniform coding style.

Your trial expires on December 18, 2023. Please email [email protected] to continue using Sourcery ✨

@@ -107,7 +107,7 @@ def __init__(
if children is not None:
self.extend(children)
self._parent = parent
if (parent is not None) and not (self in parent.children):
if parent is not None and self not in parent.children:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (llm): Consider using the 'add' method instead of 'append' for adding an element to the parent's children list. This would encapsulate the logic of adding a child within the parent object, which might include additional checks or operations.

Suggested change
if parent is not None and self not in parent.children:
parent.add(self)

return True

return self.name == other.name
return True if self is other else self.name == other.name
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick (llm): The ternary operation is unnecessary here. You can directly return the result of the logical OR operation.

Suggested change
return True if self is other else self.name == other.name
return self is other or self.name == other.name

return self.pre_and_postorder(include_self=include_self)
else:
return self.preorder(include_self=include_self)
return (
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought (llm): The refactoring to use a ternary operation for the return statement is a nice touch for readability. However, ensure that the readability is not compromised when the ternary operation becomes too complex or nested.

@@ -676,7 +670,7 @@ def separation(self, other):
if id(other) in my_ancestors:
# need to figure out how many steps there were back from self
curr = self
while not (curr is None or curr is other):
while curr is not None and curr is not other:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (llm): The condition 'curr is not other' will never be False if 'curr is not None' is True. This seems to be a logical error, as the intent is likely to check if 'curr' is 'other' or not.

Suggested change
while curr is not None and curr is not other:
while curr is not None and curr != other:

@@ -623,7 +613,7 @@ def __len__(self):
return self.length

def __repr__(self):
return repr(self.spans) + f"/{self.parent_length}"
return f"{repr(self.spans)}/{self.parent_length}"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (llm): Using f-string for the representation is more concise and readable. Good use of modern Python features.

else:
pos2_dg = get_dg(pos2_probs, aln_probs)

pos2_dg = dgs[pos2] if dgs else get_dg(pos2_probs, aln_probs)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought (llm): Using the inline if-else statement for conditional assignment is a neat way to reduce the number of lines and improve readability. However, ensure that the logic is clear and maintainable, especially for more complex conditions.

@@ -1038,7 +997,7 @@ def to_rich_dict(self):
data = {}
moltype = self.moltype.label
info = {} if self.info is None else self.info
if not info.get("Refs", None) is None and "Refs" in info:
if info.get("Refs", None) is not None and "Refs" in info:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick (llm): The check for 'Refs' in info seems redundant since 'get' already ensures it exists. You could simplify the condition.

Suggested change
if info.get("Refs", None) is not None and "Refs" in info:
if info.get("Refs") is not None:

return AlignablePOG(leaf, pog)
pog = self.pog[index]
leaf = self.leaf[index]
return AlignablePOG(leaf, pog)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (llm): The removal of the check for 'index' being an integer might lead to unexpected behavior when 'index' is not an integer. It's important to ensure that the input is validated or handled appropriately.

Suggested change
return AlignablePOG(leaf, pog)
if isinstance(index, int): return self.pred[index] else: return AlignablePOG(self.leaf[index], self.pog[index])

@@ -973,7 +959,7 @@ def formatted_array(
base_format = "d"
elif "float" in type_name:
base_format = f".{precision}f"
elif "bool" == type_name:
elif type_name == "bool":
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick (llm): The change from 'elif "bool" == type_name:' to 'elif type_name == "bool":' is purely stylistic and does not affect functionality, but it does align with the more common practice of placing the variable on the left side of the comparison.

if count >= abs_cutoff:
result.append(char)
return result
return [char for char, count in zip(alphabet, counts) if count >= abs_cutoff]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise (llm): Refactoring to use list comprehension for generating the result list is a good improvement for readability and conciseness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

0 participants