Description
This comment identifies some alternative approaches (conversion to data.table paired with other approaches, as well as vctrs::vec_duplicate_any) and includes some details on memory benchmarking. If as_epi_df() is still consuming a lot of time in some operations (I still need to package up the archive -> archive slide mentioned in the issue), then we may want to look at these some more. (The memory aspect probably only matters for epi_archive duplicate-key detection, not epi_df duplicate-key detection.)
The first part of this is probably benchmarking some code to see whether further optimization is worth the time. (profvis may not show results properly if native code is involved; be sure to check / instrument properly.)
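A rough starting point for that benchmarking might look like the sketch below. It is only an illustration under stated assumptions: the synthetic key table, its column names (geo_value, time_value), the row count, and the candidates list are all made up here and are not taken from the package's actual duplicate-key code. It uses wall-clock timing via system.time(), which does not depend on profiler sampling, and it skips the vctrs and data.table candidates if those packages are not installed.

```r
# Hypothetical benchmark sketch: data, sizes, and names are illustrative only.
set.seed(1)
n <- 5e4
keys <- data.frame(
  geo_value = sample(sprintf("g%02d", 1:50), n, replace = TRUE),
  time_value = as.Date("2020-01-01") + sample.int(200, n, replace = TRUE),
  stringsAsFactors = FALSE
)
# Only 50 * 200 = 10,000 distinct keys are possible, so 50,000 rows must
# contain duplicates and every candidate should return TRUE.

# Candidate duplicate-key checks, each returning a single TRUE/FALSE.
candidates <- list(
  base = function(df) anyDuplicated(df) != 0L
)
if (requireNamespace("vctrs", quietly = TRUE)) {
  candidates$vctrs <- function(df) vctrs::vec_duplicate_any(df)
}
if (requireNamespace("data.table", quietly = TRUE)) {
  candidates$data.table <- function(df) {
    # The conversion cost is deliberately included in the timing.
    anyDuplicated(data.table::as.data.table(df)) != 0L
  }
}

# Check that the candidates agree before comparing their speed.
results <- vapply(candidates, function(f) f(keys), logical(1))
stopifnot(all(results))

# Elapsed seconds for 5 repetitions of each candidate.
timings <- vapply(candidates, function(f) {
  system.time(for (i in seq_len(5)) f(keys))[["elapsed"]]
}, numeric(1))
print(timings)
```

This measures time only; the memory comparison mentioned above would need separate instrumentation.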