Skip to content
This repository was archived by the owner on Jun 17, 2024. It is now read-only.

fix: Perform the last action with the column name modified. #9

Open
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

tashiro-akira
Copy link
Contributor

@AkiraUra @kimusaku
Fix target column names to revert after removing special characters.
Please check.

@tashiro-akira tashiro-akira requested a review from a team as a code owner December 12, 2023 06:42
@tashiro-akira tashiro-akira requested review from kimusaku and AkiraUra and removed request for a team December 12, 2023 06:42
Copy link
Contributor

@AkiraUra AkiraUra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the following process is easy to implement. What do you think?

rename_symbol_cols = {col: inhibited_symbol_pattern.sub("", col) in cols_has_symbols }
train_dataset = train_dataset.rename(columns=rename_symbol_cols)
...
rename_symbol_cols = {v: k for k, v in rename_symbol_cols.items()}
... # restore column names by ".rename(columns=rename_symbol_cols)"

An issue would be that renamed names are the same as the original ones or renamed names of two columns are the same.

@tashiro-akira
Copy link
Contributor Author

An error occurred during the process of outputting a csv when the review results were applied.
csv file to return the name of the column in the data frame from which it was output.

Copy link
Contributor

@AkiraUra AkiraUra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you consider my comment?

Comment on lines 5 to 10
{% if training %}
rename_symbol_cols = {col: inhibited_symbol_pattern.sub("", col) if col in cols_has_symbols else col in cols_has_symbols for col in cols_has_symbols }
rename_symbol_cols = {v: k for k, v in rename_symbol_cols.items()}
train_dataset = train_dataset.rename(columns=lambda col: inhibited_symbol_pattern.sub("", col) if col in cols_has_symbols else col)
{% endif %}
{% if test %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the following code?

rename_symbol_cols = {col: inhibited_symbol_pattern.sub("", col) if col in cols_has_symbols else col in cols_has_symbols for col in cols_has_symbols }
{% if training %}
train_dataset = train_dataset.rename(columns=rename_symbol_cols)
{% endif %}
{% if test %}
test_dataset = test_dataset.rename(columns=rename_symbol_cols)
{% endif %}
rename_symbol_cols = {v: k for k, v in rename_symbol_cols.items()}

@tashiro-akira
Copy link
Contributor Author

Reflected the content of the review.
Undid unnecessary modifications.

@tashiro-akira tashiro-akira requested a review from AkiraUra April 4, 2024 02:59
@@ -31,7 +32,7 @@

logger = setup_logger()

INHIBITED_SYMBOL_PATTERN = re.compile(r"[\{\}\[\]\",:<'\\]+")
INHIBITED_SYMBOL_PATTERN = re.compile(r"[\{\}\[\]\",:<'\\\+]+")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you add \+?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was checking if "+" is treated as a symbol in the test string.
I put it back.

Comment on lines 233 to 259
org_df_column = df.columns.values
org_target_column = task.target_columns
df = df.rename(columns=lambda col: remove_symbols(col) if col in cols_has_symbols else col)
task.target_columns = [
remove_symbols(col) if col in cols_has_symbols else col for col in task.target_columns
]
same_column = {k: v for k, v in collections.Counter(list(df.columns.values)).items() if v > 1}
rename_dict = {}
if len(same_column) != 0:
for target in same_column.keys():
rename_dict = {}
rename_target_col = []
df_cols = list(df.columns.values)
i = 1
for col in df_cols:
if target in col:
rename_dict[org_df_column[len(rename_dict)]] = str(col + str(i))
i = i + 1
else:
rename_dict[org_df_column[len(rename_dict)]] = col
df = df.set_axis(list(rename_dict.values()), axis=1)
i = 1
for col in org_target_column:
rename_target_col.append(rename_dict[col])

task.target_columns = rename_target_col

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think the code may rename column names which originally don't contain symbols, right? If so, could you keep the names when the column names contain symbols.
  • This code is hard to read and can have unexpected issues. Could you rewrite it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The review has been applied.

Comment on lines 6 to 9
if len(rename_dict) == 0 :
rename_symbol_cols = {col: inhibited_symbol_pattern.sub("", col) if col in cols_has_symbols else col in cols_has_symbols for col in cols_has_symbols }
else:
rename_symbol_cols = rename_dict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need show this conditional branch to users. Could you move it to "template's if statement"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed conditional branching in template if statement.

Signed-off-by: tashiro-akira <[email protected]>
Signed-off-by: tashiro-akira <[email protected]>
@tashiro-akira tashiro-akira requested a review from AkiraUra April 26, 2024 07:26
Copy link
Contributor

@AkiraUra AkiraUra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you consider my comment?

same_column[target] = same_column[target] - 1
else:
rename_dict[org_column] = target

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current method fails when the renamed names are the same as original names.
For example, there are original columns Age , Age{} and Age1.
In the case, Age -> Age1, Age{} -> Age0, so there are two Age1 columns.
Could you consider the case?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants