Expand file extension dictionary #31

Merged
merged 1 commit into from
Mar 21, 2022


nanxstats
Collaborator

This PR extends the file extension dictionary to cover more commonly used file types, such as .stan files, and thus fixes #20.

Impact

With this change, the regular collate(..., file_auto("inst/")) and collate(..., file_root_core()) calls should pick up the .stan files and the configuration files defined by use_rstan(), and tag them as text files.
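A minimal sketch of such a call is shown here. The package path is a hypothetical placeholder, and combining both file specifications in a single collate() call is one possible arrangement rather than the only one.

```r
library(pkglite)

# Hypothetical path to a source package that was set up with use_rstan()
pkg <- "path/to/pkg"

# Collate the core root-level files plus everything under inst/
fc <- collate(pkg, file_root_core(), file_auto("inst/"))

# Print the file collection to check that the .stan files are listed
fc
```

Printing the file collection is a quick way to confirm which files were captured before packing them.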

Metrics

I calculated the coverage as the percentage of files, across all source packages on CRAN, whose extension is in the dictionary (data):

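# Read the tab-separated file extensions and normalize them to lower case
x <- readLines("exts.txt")
x <- tolower(unlist(strsplit(x, split = "\t")))

# Count files per extension, most frequent first
y <- sort(table(x), decreasing = TRUE)
eoi <- y

# One row per extension: guessed MIME type and number of files
df <- data.frame(
  "ext" = names(eoi),
  "mime" = mime::guess_type(paste0(".", names(eoi))),
  "count" = as.vector(eoi)
)

# Extensions in the pkglite dictionary (text and binary) that also occur on CRAN
ext_pkglite <- unique(tolower(c(pkglite::ext_text(flat = TRUE), pkglite::ext_binary(flat = TRUE))))
ext_pkglite <- ext_pkglite[!is.na(match(ext_pkglite, df$ext))]

# Share of all files whose extension is covered by the dictionary
sum(df[match(ext_pkglite, df$ext), "count"]) / sum(df$count)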

Before patch: 88.85%. After patch: 96.65%.

Next step

A more fundamental fix for this class of issues is to separate the file capture rules from the file type tagging rules. The capture rules would no longer be based on file extensions and would become much more generic (by updating the current file spec definitions), while the tagging rules would become universal (by using the dictionary and tagging all unknown extensions as binary). This will be done in issue #18.
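To illustrate the tagging half of this idea, here is a rough base R sketch of a dictionary lookup with a binary fallback. The function name tag_file_type() and the short text_exts vector are hypothetical and only for illustration; they are not pkglite's actual dictionary or API.

```r
# Hypothetical dictionary of text extensions (illustrative, not pkglite's list)
text_exts <- c("r", "rmd", "md", "txt", "stan", "yml")

# Tag each path as "text" if its extension is in the dictionary;
# any unknown extension falls back to "binary"
tag_file_type <- function(paths, text_exts) {
  exts <- tolower(tools::file_ext(paths))
  ifelse(exts %in% text_exts, "text", "binary")
}

paths <- c("R/fit.R", "inst/stan/model.stan", "data/cache.xyz")
data.frame(path = paths, type = tag_file_type(paths, text_exts))
```

With a rule like this, every captured file gets a type regardless of its extension, so the capture rules no longer need to know about extensions at all.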

@elong0527 elong0527 merged commit 631d06f into master Mar 21, 2022
@nanxstats nanxstats deleted the issue-20 branch March 21, 2022 23:00
Development

Successfully merging this pull request may close these issues.

add .stan in the pkglite::file_auto