Sourcery Starbot ⭐ refactored guyrosin/temporal_attention #2

Open · wants to merge 1 commit into base: main

Conversation

SourceryAI

Thanks for starring sourcery-ai/sourcery ✨ 🌟 ✨

Here's your pull request refactoring your most popular Python repo.

If you want Sourcery to refactor all your Python repos and incoming pull requests, install our bot.

Review changes via command line

To manually merge these changes, make sure you're on the main branch, then run:

git fetch https://github.com/sourcery-ai-bot/temporal_attention main
git merge --ff-only FETCH_HEAD
git reset HEAD^

-    dataset = DatasetDict({"train": train_dataset, "validation": test_dataset})
-    return dataset
+    return DatasetDict({"train": train_dataset, "validation": test_dataset})

Function load_train_test_datasets refactored with the following changes:
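The change folds a temporary that is only ever returned into the return statement itself. A minimal standalone sketch of the same pattern (function and variable names are hypothetical):

    # Before: a temporary variable exists only to be returned.
    def split_before(train_ds, test_ds):
        dataset = {"train": train_ds, "validation": test_ds}
        return dataset

    # After: return the expression directly.
    def split_after(train_ds, test_ds):
        return {"train": train_ds, "validation": test_ds}

    assert split_before([1], [2]) == split_after([1], [2])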

Comment on lines -123 to +122
-    exclude_similar_sentences = True if corpus_name.startswith("liverpool") else False
+    exclude_similar_sentences = bool(corpus_name.startswith("liverpool"))

Function split_temporal_dataset_files refactored with the following changes:
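The conditional expression only re-creates a boolean that str.startswith() already returns, so wrapping the call in bool() (or dropping the wrapper entirely) is equivalent. A small sketch with a hypothetical value:

    corpus_name = "liverpool_2013"  # hypothetical value

    old_style = True if corpus_name.startswith("liverpool") else False
    new_style = bool(corpus_name.startswith("liverpool"))
    simplest = corpus_name.startswith("liverpool")  # startswith() already returns a bool
    assert old_style == new_style == simplest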

Comment on lines -206 to +205
-    logger.info(f"Finding relevant sentences in the corpus...")
+    logger.info("Finding relevant sentences in the corpus...")

Function find_sentences_of_words refactored with the following changes:

Comment on lines -72 to -76
-    kwargs.update(additional_kwargs)
-    config = AutoConfig.from_pretrained(
+    kwargs |= additional_kwargs
+    return AutoConfig.from_pretrained(
         model_args.model_name_or_path, cache_dir=model_args.cache_dir, **kwargs
     )
-    return config

Function _load_auto_config refactored with the following changes:
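`kwargs |= additional_kwargs` uses the in-place dict union operator from PEP 584, which behaves like `dict.update()` but requires Python 3.9 or newer; the other change returns the `AutoConfig` call directly instead of binding it to a name first. A minimal sketch with hypothetical values:

    kwargs = {"cache_dir": "/tmp/hf-cache"}   # hypothetical values
    additional_kwargs = {"revision": "main"}

    kwargs |= additional_kwargs                # equivalent to kwargs.update(additional_kwargs)
    assert kwargs == {"cache_dir": "/tmp/hf-cache", "revision": "main"}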

Comment on lines -302 to +304
f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}, "
+ f"distributed training: {bool(training_args.local_rank != -1)}"
(
f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}, "
+ f"distributed training: {training_args.local_rank != -1}"
)

Function init_run refactored with the following changes:

Comment on lines -236 to +249
-        # For backward compatibility, allow to try to setup 'max_len_sentences_pair'.
-        if (
-            value == self.model_max_length - self.num_special_tokens_to_add(pair=True)
-            and self.verbose
-        ):
-            if not self.deprecation_warnings.get("max_len_sentences_pair", False):
-                logger.warning(
-                    "Setting 'max_len_sentences_pair' is now deprecated. "
-                    "This value is automatically set up."
-                )
-            self.deprecation_warnings["max_len_sentences_pair"] = True
-        else:
-            raise ValueError(
-                "Setting 'max_len_sentences_pair' is now deprecated. "
-                "This value is automatically set up."
-            )
+        if (
+            value
+            != self.model_max_length - self.num_special_tokens_to_add(pair=True)
+            or not self.verbose
+        ):
+            raise ValueError(
+                "Setting 'max_len_sentences_pair' is now deprecated. "
+                "This value is automatically set up."
+            )
+        if not self.deprecation_warnings.get("max_len_sentences_pair", False):
+            logger.warning(
+                "Setting 'max_len_sentences_pair' is now deprecated. "
+                "This value is automatically set up."
+            )
+        self.deprecation_warnings["max_len_sentences_pair"] = True

Function TempoPreTrainedTokenizerBase.max_len_sentences_pair refactored with the following changes:

This removes the following comments (why?):

# For backward compatibility, allow to try to setup 'max_len_sentences_pair'.
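The setter is rewritten so the error case is handled first and the happy path is no longer nested under an else. A minimal sketch of that guard-clause shape, using a hypothetical function:

    def set_limit(value, expected, verbose=True):
        # Guard clause: reject the bad case up front instead of using if/else.
        if value != expected or not verbose:
            raise ValueError("Setting this value directly is deprecated.")
        print("value accepted")

    set_limit(10, 10)  # prints "value accepted"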

Comment on lines -372 to +370
"Model name '{}' not found in model shortcut name list ({}). "
"Assuming '{}' is a path, a model identifier, or url to a directory containing tokenizer files.".format(
pretrained_model_name_or_path,
", ".join(s3_models),
pretrained_model_name_or_path,
)
f"""Model name '{pretrained_model_name_or_path}' not found in model shortcut name list ({", ".join(s3_models)}). Assuming '{pretrained_model_name_or_path}' is a path, a model identifier, or url to a directory containing tokenizer files."""

Function TempoPreTrainedTokenizerBase.from_pretrained refactored with the following changes:
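The multi-line .format() call is collapsed into a single f-string. A small sketch of the equivalence, with hypothetical values:

    name = "bert-base-uncased"                      # hypothetical values
    shortcuts = ["bert-base-uncased", "bert-large-uncased"]

    old_msg = "Model name '{}' not found in list ({}).".format(name, ", ".join(shortcuts))
    new_msg = f"Model name '{name}' not found in list ({', '.join(shortcuts)})."
    assert old_msg == new_msg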

-        if config_tokenizer_class is not None:
-            if cls.__name__.replace("Fast", "") != config_tokenizer_class.replace(
-                "Fast", ""
-            ):
-                logger.warning(
-                    "The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. "
-                    "It may result in unexpected tokenization. \n"
-                    f"The tokenizer class you load from this checkpoint is '{config_tokenizer_class}'. \n"
-                    f"The class this function is called from is '{cls.__name__}'."
-                )
+        if config_tokenizer_class is not None and cls.__name__.replace(
+            "Fast", ""
+        ) != config_tokenizer_class.replace("Fast", ""):
+            logger.warning(
+                "The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. "
+                "It may result in unexpected tokenization. \n"
+                f"The tokenizer class you load from this checkpoint is '{config_tokenizer_class}'. \n"
+                f"The class this function is called from is '{cls.__name__}'."
+            )

Function TempoPreTrainedTokenizerBase._from_pretrained refactored with the following changes:
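Two nested if statements are collapsed into one test joined with `and`, which short-circuits exactly like the nested form. A standalone sketch with hypothetical class names:

    cls_name = "TempoBertTokenizerFast"       # hypothetical values
    config_tokenizer_class = "BertTokenizer"

    # Before: nested conditions.
    if config_tokenizer_class is not None:
        if cls_name.replace("Fast", "") != config_tokenizer_class.replace("Fast", ""):
            print("warning: tokenizer class mismatch")

    # After: a single condition; `and` short-circuits, so the behaviour is unchanged.
    if config_tokenizer_class is not None and cls_name.replace(
        "Fast", ""
    ) != config_tokenizer_class.replace("Fast", ""):
        print("warning: tokenizer class mismatch")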

Comment on lines -769 to +766
(filename_prefix + "-" if filename_prefix else "")
+ SPECIAL_TOKENS_MAP_FILE,
(
(f"{filename_prefix}-" if filename_prefix else "")
+ SPECIAL_TOKENS_MAP_FILE
),
)
tokenizer_config_file = os.path.join(
save_directory,
(filename_prefix + "-" if filename_prefix else "") + TOKENIZER_CONFIG_FILE,
(f"{filename_prefix}-" if filename_prefix else "")
+ TOKENIZER_CONFIG_FILE,

Function TempoPreTrainedTokenizerBase.save_pretrained refactored with the following changes:

Comment on lines -863 to +855
(filename_prefix + "-" if filename_prefix else "") + ADDED_TOKENS_FILE,
(f"{filename_prefix}-" if filename_prefix else "") + ADDED_TOKENS_FILE,
)
added_vocab = self.get_added_vocab()
if added_vocab:
if added_vocab := self.get_added_vocab():

Function TempoPreTrainedTokenizerBase._save_pretrained refactored with the following changes:
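The separate assignment and truthiness test are combined with an assignment expression (the "walrus" operator), available since Python 3.8. A minimal sketch with a hypothetical stand-in for get_added_vocab():

    def get_added_vocab():
        return {"[TIME_2020]": 30522}   # hypothetical added token

    # Before: assign, then test.
    added_vocab = get_added_vocab()
    if added_vocab:
        print(f"saving {len(added_vocab)} added tokens")

    # After: assign and test in one step (Python 3.8+).
    if added_vocab := get_added_vocab():
        print(f"saving {len(added_vocab)} added tokens")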

Comment on lines -1018 to +1013
-                if max_length is not None:
+                if max_length is not None and (
+                    truncation is False or truncation == "do_not_truncate"
+                ):
                     warnings.warn(
                         "`max_length` is ignored when `padding`=`True` and there is no truncation strategy. "
                         "To pad to max length, use `padding='max_length'`."
                     )

Function TempoPreTrainedTokenizerBase._get_padding_truncation_strategies refactored with the following changes:

Comment on lines -1642 to +1630
-        inputs = dict((k, v[i]) for k, v in encoded_inputs.items())
+        inputs = {k: v[i] for k, v in encoded_inputs.items()}

Function TempoPreTrainedTokenizerBase.pad refactored with the following changes:
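The dict() call over a generator of pairs becomes a dict comprehension, which builds the same mapping. A small self-contained check with hypothetical batch data:

    encoded_inputs = {                      # hypothetical batch
        "input_ids": [[101, 2023], [101, 2045]],
        "attention_mask": [[1, 1], [1, 0]],
    }
    i = 0

    old_style = dict((k, v[i]) for k, v in encoded_inputs.items())
    new_style = {k: v[i] for k, v in encoded_inputs.items()}
    assert old_style == new_style == {"input_ids": [101, 2023], "attention_mask": [1, 1]}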

Comment on lines -1694 to +1682
-        if token_ids_1 is None:
-            return token_ids_0
-        return token_ids_0 + token_ids_1
+        return token_ids_0 if token_ids_1 is None else token_ids_0 + token_ids_1

Function TempoPreTrainedTokenizerBase.build_inputs_with_special_tokens refactored with the following changes:
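The early return plus fall-through return are folded into a single conditional expression. A minimal sketch, assuming token ids are plain lists:

    def build_inputs(token_ids_0, token_ids_1=None):
        return token_ids_0 if token_ids_1 is None else token_ids_0 + token_ids_1

    assert build_inputs([101, 7592]) == [101, 7592]
    assert build_inputs([101, 7592], [102]) == [101, 7592, 102]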

Comment on lines -1750 to +1736
-        pair = bool(pair_ids is not None)
+        pair = pair_ids is not None

Function TempoPreTrainedTokenizerBase.prepare_for_model refactored with the following changes:

Comment on lines -1916 to +1900
-                error_msg = (
-                    error_msg + "Please select another truncation strategy than "
-                    f"{truncation_strategy}, for instance 'longest_first' or 'only_second'."
-                )
+                error_msg = f"{error_msg}Please select another truncation strategy than {truncation_strategy}, for instance 'longest_first' or 'only_second'."

Function TempoPreTrainedTokenizerBase.truncate_sequences refactored with the following changes:

Comment on lines -627 to +622
-            clean_text = self.clean_up_tokenization(text)
-            return clean_text
+            return self.clean_up_tokenization(text)

Function TempoPreTrainedTokenizerFast._decode refactored with the following changes:

Comment on lines -661 to +658
(filename_prefix + "-" if filename_prefix else "") + ADDED_TOKENS_FILE,
(f"{filename_prefix}-" if filename_prefix else "")
+ ADDED_TOKENS_FILE,
)
added_vocab = self.get_added_vocab()
if added_vocab:
if added_vocab := self.get_added_vocab():

Function TempoPreTrainedTokenizerFast._save_pretrained refactored with the following changes:

Comment on lines -356 to +357
-    if data_args.line_by_line:
-        tokenized_dataset = tokenize_dataset_line_by_line(
+    return (
+        tokenize_dataset_line_by_line(

Function load_data refactored with the following changes:

Comment on lines -6 to -19
"configuration_tempobert": [
"TempoBertConfig",
"configuration_tempobert": ["TempoBertConfig"],
"tokenization_tempobert_fast": ["TempoBertTokenizerFast"],
"modeling_tempobert": [
"TempoBertForMaskedLM",
"TempoBertModel",
"TempoBertForPreTraining",
"TempoBertForSequenceClassification",
"TempoBertForTokenClassification",
],
}

_import_structure["tokenization_tempobert_fast"] = ["TempoBertTokenizerFast"]

_import_structure["modeling_tempobert"] = [
"TempoBertForMaskedLM",
"TempoBertModel",
"TempoBertForPreTraining",
"TempoBertForSequenceClassification",
"TempoBertForTokenClassification",
]

Lines 6-19 refactored with the following changes:

Comment on lines -80 to +81
-        SPECIAL_TIMES_COUNT = 2  # NOTE: hardcoded (see TempoSpecialTokensMixin)
         if "attention" in self.time_embedding_type:
+            SPECIAL_TIMES_COUNT = 2  # NOTE: hardcoded (see TempoSpecialTokensMixin)

Function TempoBertEmbeddings.init_time_embeddings refactored with the following changes:
