Releases: paithiov909/gibasa
v1.1.2
This is a patch release. There are no user-visible changes.
Full Changelog: v1.1.1...v1.1.2
v1.1.1
What's Changed
`tokenize` now warns rather than throwing an error when an invalid input is given during partial parsing. With this change, `tokenize` is no longer aborted entirely when an invalid string is given. Parsing of those strings is simply skipped.
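Because invalid inputs now surface as warnings, a caller may want to record them while keeping the rest of the result. The following is a minimal sketch in base R, not part of gibasa: `collect_warnings` is a hypothetical helper name, and the `tokenize` call assumes that gibasa and a MeCab dictionary are available on the machine.

```r
# Hypothetical helper (not part of gibasa): evaluate `expr`, keep its value,
# and collect warning messages instead of printing them.
collect_warnings <- function(expr) {
  msgs <- character()
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = msgs)
}

# Assumption: gibasa is installed and a MeCab dictionary can be found.
# If tokenization itself fails (e.g. no dictionary), `res` is NULL.
if (requireNamespace("gibasa", quietly = TRUE)) {
  texts <- c(doc1 = "sample text one", doc2 = "sample text two")
  res <- tryCatch(
    collect_warnings(gibasa::tokenize(texts, partial = TRUE)),
    error = function(e) NULL
  )
  if (!is.null(res)) {
    print(res$warnings) # messages for inputs that were skipped, if any
  }
}
```

The helper muffles each warning after recording it, so the tokens for valid inputs are still returned in `res$value`.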
Full Changelog: v1.1.0...v1.1.1
v1.1.0
What's Changed
- chore(deps): update actions/setup-python action to v5 by @renovate in #32
- Fix global_idf3 by @paithiov909 in #35
- Refactor bind_tf_idf2 by @paithiov909 in #36
Full Changelog: v1.0.1...v1.1.0
v1.0.1
New Feature: dictionary compiler is integrated 🚀
This release adds wrappers around MeCab's 'dictionary compiler'.
With source dictionaries and CSV files, you can build MeCab system/user dictionaries without leaving your R console.
Even in environments where MeCab is not installed, such as Posit Cloud, you can try this snippet right away!!
require(gibasa)

if (requireNamespace("withr")) {
  # create a sample dictionary in temporary directory
  build_sys_dic(
    dic_dir = system.file("latin", package = "gibasa"),
    out_dir = tempdir(),
    encoding = "utf8"
  )
  # copy the 'dicrc' file
  file.copy(
    system.file("latin/dicrc", package = "gibasa"),
    tempdir()
  )
  # write a csv file and compile it into a user dictionary
  csv_file <- tempfile(fileext = ".csv")
  writeLines(
    c(
      "qa, 0, 0, 5, \u304f\u3041",
      "qi, 0, 0, 5, \u304f\u3043",
      "qu, 0, 0, 5, \u304f",
      "qe, 0, 0, 5, \u304f\u3047",
      "qo, 0, 0, 5, \u304f\u3049"
    ),
    csv_file
  )
  build_user_dic(
    dic_dir = tempdir(),
    file = (user_dic <- tempfile(fileext = ".dic")),
    csv_file = csv_file,
    encoding = "utf8"
  )
  # mocking a 'mecabrc' file to temporarily use the dictionary
  withr::with_envvar(
    c(
      "MECABRC" = if (.Platform$OS.type == "windows") {
        "nul"
      } else {
        "/dev/null"
      },
      "RCPP_PARALLEL_BACKEND" = "tinythread"
    ),
    {
      tokenize("quensan", sys_dic = tempdir(), user_dic = user_dic)
    }
  )
}
Full Changelog: v0.9.5...v1.0.1
v0.9.5
Full Changelog: v0.9.4...v0.9.5
v0.9.4
Updated Makevars for Unix-alikes. Users can now use a file specified by the `MECABRC` environment variable or `~/.mecabrc` to set up dictionaries.
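As a rough illustration of the `MECABRC` route, the sketch below writes a minimal mecabrc file that only sets `dicdir` and points `MECABRC` at it for a single call. The dictionary path is a hypothetical placeholder, and the sketch assumes a Unix-alike system with gibasa and withr installed; it does nothing beyond writing the file when any of these is missing.

```r
# Hypothetical dictionary directory; replace it with one that exists locally.
dic_dir <- "/usr/local/lib/mecab/dic/ipadic"

# Write a minimal mecabrc that only sets `dicdir`.
rc_file <- tempfile(fileext = ".mecabrc")
writeLines(paste("dicdir =", dic_dir), rc_file)

# Run only when gibasa, withr, and the dictionary are all available.
if (requireNamespace("gibasa", quietly = TRUE) &&
  requireNamespace("withr", quietly = TRUE) &&
  dir.exists(dic_dir)) {
  withr::with_envvar(
    c("MECABRC" = rc_file),
    # "\u3053\u3093\u306b\u3061\u306f" is a short Japanese greeting
    print(gibasa::tokenize("\u3053\u3093\u306b\u3061\u306f"))
  )
}
```

Setting the variable through `withr::with_envvar` keeps the change local to one call; `Sys.setenv(MECABRC = rc_file)` would apply it for the rest of the session instead.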
Full Changelog: v0.9.3...v0.9.4
v0.9.3
This is a patch release. For CRAN's checks, removed unnecessary C++ files.
v0.9.2
Initial CRAN release 🚀😎✨
I'm excited to announce {gibasa} is now on CRAN!!
Now you can more easily install {gibasa} from CRAN as well as from r-universe.
Full Changelog: v0.8.1...v0.9.2
v0.8.1
Full Changelog: v0.8.0...v0.8.1
v0.8.0
What's changed
- [Breaking Change] Changed numbering style of 'sentence_id' when `split` is `FALSE`.
- Added `grain_size` argument to `tokenize`.
- Added new `bind_lr` function.
- Use `RcppParallel::parallelFor` instead of `tbb::parallel_for`.
Full Changelog: v0.7.1...v0.8.0