VersionTracker:steps_df returns a column Index, most columns are of type object #3852

Open
lucasrodes opened this issue Jan 16, 2025 · 1 comment

Comments

@lucasrodes
Member

lucasrodes commented Jan 16, 2025

Problems

  1. VersionTracker.steps_df returns a table whose first column is index, which doesn't seem to be needed:
from etl.version_tracker import VersionTracker
steps_df = VersionTracker(exclude_steps=None).steps_df
  2. Most columns of steps_df have dtype object. Text columns could use the string dtype instead.
index                                   int64
step                                   object
direct_dependencies                    object
direct_usages                          object
all_active_dependencies                object
all_dependencies                       object
all_active_usages                      object
all_usages                             object
state                                  object
role                                   object
dag_file_name                          object
path_to_script                         object
n_all_dependencies                      int64
n_all_usages                            int64
step_type                              object
kind                                   object
channel                                object
namespace                              object
version                                object
name                                   object
identifier                             object
versions                               object
n_versions                              int64
latest_version                         object
n_newer_versions                        int64
chart_ids                              object
chart_slugs                            object
db_dataset_id                         float64
db_dataset_name                        object
db_private                             object
db_archived                            object
update_period_days                    float64
chart_views_365d                       object
all_chart_ids                          object
all_chart_slugs                        object
n_charts                                int64
all_chart_views_365d                   object
n_chart_views_365d                      int64
date_of_next_update                    object
days_to_update                         object
same_steps_forward                     object
same_steps_backward                    object
same_steps_all                         object
same_steps_latest                      object
dag_file_path                          object
full_path_to_script                    object
is_latest                                bool
updateable_dependencies                object
n_updateable_dependencies             float64
n_updateable_snapshot_dependencies    float64
external_usages                        object
n_external_usages                     float64
update_state                           object
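
Both problems can be illustrated on a toy table. The sketch below is only an assumption about what a fix might look like: `tidy_steps_df` is a hypothetical helper, not part of `etl.version_tracker`, and the toy data is made up. It drops a leftover `index` column and converts to the pandas `string` dtype only those object columns whose non-null values are all `str`, leaving columns that hold lists or mixed values alone. The actual fix would presumably live inside `VersionTracker`, where the extra column is created.

```python
import pandas as pd


def tidy_steps_df(steps_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop a leftover `index` column and use the
    string dtype for object columns that contain only text."""
    # errors="ignore" makes this a no-op if the column is already gone.
    tidy = steps_df.drop(columns=["index"], errors="ignore")
    for column in tidy.columns:
        if not pd.api.types.is_object_dtype(tidy[column]):
            continue
        non_null = tidy[column].dropna()
        # Convert only when every non-null value is a str; columns holding
        # lists, dates or mixed values keep their object dtype.
        if len(non_null) > 0 and non_null.map(lambda v: isinstance(v, str)).all():
            tidy[column] = tidy[column].astype("string")
    return tidy


# Made-up stand-in for steps_df, with an extra `index` column.
toy = pd.DataFrame(
    {
        "index": [0, 1],
        "step": ["step_a", "step_b"],
        "all_usages": [["step_b"], []],
        "n_versions": [1, 2],
    }
)
tidy = tidy_steps_df(toy)
print(tidy.dtypes)
```

The helper returns a new DataFrame and leaves its input unchanged, so it can be tried on `steps_df` without side effects.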

Impact

The impact of the extra index column and the lack of string dtypes is minimal; neither affects functionality.

@larsyencken
Collaborator

@lucasrodes This looks like a very small fix, want to have a try? 👼
