Improve tests for classes related to typing of tabular data #875

Closed
lars-reimann opened this issue Jun 28, 2024 · 0 comments · Fixed by #979
Labels
testing 🧪 Additional automated tests


lars-reimann commented Jun 28, 2024

Tests for methods of

  • DataType (interface) and _PolarsDataType (implementation),
  • Schema (interface) and _PolarsSchema (implementation)

are either missing entirely or do not cover some edge cases.

We should

  • remove the outdated test_data_type.py and test_schema.py
  • create folders _data_type and _schema inside tests/safeds/data/tabular/typing
  • create individual files inside these folders for the different methods of DataType and Schema (the same structure as for the containers of tabular data).

Use parametrized tests to cover edge cases.
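
As a rough sketch of what one such per-method test file could look like, the example below uses `pytest.mark.parametrize` to run a single test function against several cases, including an edge case. `FakeDataType`, its `is_numeric` property, and the test name are hypothetical stand-ins chosen for illustration; they are not the library's actual classes or tests.

```python
import pytest


class FakeDataType:
    """Hypothetical stand-in for a data type; not the library's real class."""

    _NUMERIC_NAMES = frozenset({"int8", "int64", "float32", "float64"})

    def __init__(self, name: str) -> None:
        self._name = name

    @property
    def is_numeric(self) -> bool:
        # A type counts as numeric if its name is in the known set.
        return self._name in self._NUMERIC_NAMES


@pytest.mark.parametrize(
    ("name", "expected"),
    [
        ("int64", True),
        ("float32", True),
        ("string", False),
        ("", False),  # edge case: empty type name
    ],
    ids=["int", "float", "string", "empty name"],
)
def test_should_report_whether_type_is_numeric(name: str, expected: bool) -> None:
    assert FakeDataType(name).is_numeric == expected
```

Each tuple in the list becomes its own test case with a readable ID, so a failing edge case is reported individually rather than hiding behind the first failure.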

@lars-reimann lars-reimann added the testing 🧪 (Additional automated tests) and lab labels Jun 28, 2024
@github-project-automation github-project-automation bot moved this to Backlog in Library Jun 28, 2024
@lars-reimann lars-reimann removed the lab label Jul 12, 2024
@lars-reimann lars-reimann self-assigned this Jan 12, 2025
@lars-reimann lars-reimann moved this from Backlog to In Progress in Library Jan 12, 2025
lars-reimann added a commit that referenced this issue Jan 12, 2025
Closes #875
Closes #877
Closes partially #977

### Summary of Changes

Stabilize the API of the `Table` class. This PR introduces several
breaking changes to this class:

- All optional parameters are now keyword-only, so we can reposition
them later.
- The `data` parameter of `__init__` is now required.
- Rename `remove_columns_except` to `select_columns`
  - The new method can also be called with a callback that determines
    which columns to select.
- Rename `add_table_as_columns` to `add_tables_as_columns`
  - Multiple tables can now be passed at once.
- Rename `add_table_as_rows` to `add_tables_as_rows`
  - Multiple tables can now be passed at once.

It also adds new functionality throughout the library:

- New method `Table.add_index_column` to add a new column with
auto-incrementing integer values to a table.
- New method `Table.filter_rows` to keep only the rows matched by some
predicate.
- New method `Table.filter_rows_by_column` to keep only the rows that
have a value in a specific column that matches some predicate.
- New parameter `random_seed` for `Table.shuffle_rows` and
`Table.split_rows` to control the pseudorandom number generator.
Previously, the methods were deterministic, but the seed was hidden.
- New parameter `missing_value_ratio_threshold` of
`Table.remove_columns_with_missing_values` to be able to keep columns
with only a few missing values.
- Various static factory methods under `ColumnType` to instantiate
column types. This prepares for #754.
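
To illustrate the keyword-only convention from the breaking changes together with a threshold parameter like the one listed above, here is a minimal sketch in plain Python. The function below works on a plain dictionary of lists and is a hypothetical stand-in, not the `Table` method itself; in particular, treating the threshold as an inclusive upper bound on the ratio of missing values is an assumption.

```python
def remove_columns_with_missing_values(
    columns: dict[str, list[object]],
    *,  # everything after this marker must be passed by keyword
    missing_value_ratio_threshold: float = 0.0,
) -> dict[str, list[object]]:
    """Hypothetical stand-in: keep columns whose ratio of None values is at most the threshold."""
    kept: dict[str, list[object]] = {}
    for name, values in columns.items():
        # Empty columns have no missing values by definition here (assumption).
        ratio = sum(v is None for v in values) / len(values) if values else 0.0
        if ratio <= missing_value_ratio_threshold:
            kept[name] = values
    return kept


data = {"a": [1, None, 3, 4], "b": [1, 2, 3, 4]}

# Default threshold of 0.0 drops every column that has any missing value.
strict = remove_columns_with_missing_values(data)

# A threshold of 0.25 tolerates one missing value out of four.
lenient = remove_columns_with_missing_values(data, missing_value_ratio_threshold=0.25)
```

Because the optional parameter sits after the bare `*`, a positional call such as `remove_columns_with_missing_values(data, 0.25)` raises a `TypeError`, which is what allows optional parameters to be reordered later without breaking callers.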

Finally, the methods `Table.summarize_statistics` and
`Column.summarize_statistics` are now considerably faster.

---------

Co-authored-by: megalinter-bot <[email protected]>
@github-project-automation github-project-automation bot moved this from In Progress to ✔️ Done in Library Jan 12, 2025