More investigation on epi_df duplicate detection #598

Open
@brookslogan

Description

This comment identifies some alternative approaches to duplicate detection: converting to data.table and pairing that with other methods, or using vctrs::vec_duplicate_any. It also includes some memory benchmarking details. If as_epi_df() is still consuming a lot of time in some operations (I still need to package up the archive -> archive slide mentioned in the issue), then we may want to look at these some more. The memory aspect probably only matters for epi_archive duplicate-key detection, not epi_df duplicate-key detection.
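
For reference, here is a minimal sketch of the kinds of checks being compared. The toy data and the key columns (geo_value, time_value) are assumptions for illustration only; this is not the code as_epi_df() currently uses, and the linked comment may combine data.table with other approaches than the one shown.

```r
# Hypothetical toy keys; rows 1 and 4 share the same (geo_value, time_value).
keys <- data.frame(
  geo_value = c("ca", "ca", "ny", "ca"),
  time_value = as.Date("2024-01-01") + c(0, 1, 0, 0)
)

# Base R: anyDuplicated() returns the index of the first duplicated row, or 0.
has_dup_base <- anyDuplicated(keys) != 0L

# vctrs: treats the data frame as a vector of rows and returns TRUE/FALSE.
has_dup_vctrs <- vctrs::vec_duplicate_any(keys)

# data.table: convert first, then use its anyDuplicated() method,
# restricted to the (assumed) key columns via `by`.
keys_dt <- data.table::as.data.table(keys)
has_dup_dt <- anyDuplicated(keys_dt, by = c("geo_value", "time_value")) != 0L
```

All three flag the duplicated key here; the open question is how their time and memory costs compare at realistic sizes.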

The first step is probably to benchmark some code to see whether further optimization is worth the time. (profvis may not attribute time properly if native code is involved; be sure to check and instrument accordingly.)
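
A rough starting point for that benchmark, assuming the bench package is acceptable here. The data is synthetic and duplicate-free (so no method can exit early), and the size is arbitrary. Note that bench's mem_alloc column only tracks allocations made through R's allocator, so memory that native code allocates directly would not show up.

```r
# Synthetic, duplicate-free keys: every geo_value paired with every time_value.
geos <- letters
times <- as.Date("2020-01-01") + 0:999
keys <- data.frame(
  geo_value = rep(geos, each = length(times)),
  time_value = rep(times, times = length(geos))
)

# Compare the candidate checks; check = TRUE asserts they all return the same value.
results <- bench::mark(
  base = anyDuplicated(keys) != 0L,
  vctrs = vctrs::vec_duplicate_any(keys),
  # Conversion cost is deliberately included in the timed expression.
  data.table = anyDuplicated(data.table::as.data.table(keys)) != 0L,
  check = TRUE,
  min_iterations = 5
)

results[c("expression", "median", "mem_alloc")]
```

If the differences are small relative to the rest of as_epi_df(), that would suggest the time is going elsewhere.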
