Releases: paithiov909/gibasa

v1.1.2

16 Feb 08:57

This is a patch release. There are no user-visible changes.

Full Changelog: v1.1.1...v1.1.2

v1.1.1

06 Jul 05:49

What's Changed

  • tokenize now warns rather than throws an error when an invalid input is given during partial parsing. With this change, tokenize is no longer entirely aborted even if an invalid string is given. Parsing of those strings is simply skipped.
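
The warn-and-skip behaviour described above can be sketched in base R, without gibasa or MeCab. Note that `parse_one` and `parse_all` are hypothetical stand-ins, not gibasa functions, and the validity rule used here (a non-empty, non-missing string) is an assumption made only for illustration. The sketch shows how a caller might collect warnings while the remaining inputs are still processed.

```r
# Hypothetical stand-in for a parser: warns on invalid input instead of stopping.
# The validity rule (non-empty, non-NA string) is an assumption for illustration.
parse_one <- function(x) {
  if (is.na(x) || !nzchar(x)) {
    warning("invalid input skipped", call. = FALSE)
    return(NULL)
  }
  strsplit(x, " ", fixed = TRUE)[[1]]
}

# Parse every input, recording warnings rather than aborting on the first one.
parse_all <- function(inputs) {
  warnings_seen <- character()
  results <- withCallingHandlers(
    lapply(inputs, parse_one),
    warning = function(w) {
      # Keep the message, then resume so the remaining inputs are still parsed.
      warnings_seen <<- c(warnings_seen, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(results = results, warnings = warnings_seen)
}

out <- parse_all(c("a b", "", "c d e"))
```

With real gibasa calls, the same `withCallingHandlers()` wrapper could presumably be placed around `tokenize()` to log which inputs were skipped.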

Full Changelog: v1.1.0...v1.1.1

v1.1.0

17 Feb 04:01

What's Changed

Full Changelog: v1.0.1...v1.1.0

v1.0.1

03 Dec 01:29

New Feature: dictionary compiler is integrated 🚀

This release adds wrappers around MeCab's 'dictionary compiler'.
With source dictionaries and CSV files, you can build MeCab system/user dictionaries without leaving your R console.

Even in environments where MeCab is not installed, such as Posit Cloud, you can try this snippet right away!

require(gibasa)

if (requireNamespace("withr")) {
    # create a sample dictionary in temporary directory
    build_sys_dic(
        dic_dir = system.file("latin", package = "gibasa"),
        out_dir = tempdir(),
        encoding = "utf8"
    )
    # copy the 'dicrc' file
    file.copy(
        system.file("latin/dicrc", package = "gibasa"),
        tempdir()
    )
    # write a csv file and compile it into a user dictionary
    csv_file <- tempfile(fileext = ".csv")
    writeLines(
        c(
            "qa, 0, 0, 5, \u304f\u3041",
            "qi, 0, 0, 5, \u304f\u3043",
            "qu, 0, 0, 5, \u304f",
            "qe, 0, 0, 5, \u304f\u3047",
            "qo, 0, 0, 5, \u304f\u3049"
        ),
        csv_file
    )
    build_user_dic(
        dic_dir = tempdir(),
        file = (user_dic <- tempfile(fileext = ".dic")),
        csv_file = csv_file,
        encoding = "utf8"
    )
    # mocking a 'mecabrc' file to temporarily use the dictionary
    withr::with_envvar(
        c(
            "MECABRC" = if (.Platform$OS.type == "windows") {
                "nul"
            } else {
                "/dev/null"
            },
            "RCPP_PARALLEL_BACKEND" = "tinythread"
        ),
        {
            tokenize("quensan", sys_dic = tempdir(), user_dic = user_dic)
        }
    )
}

Full Changelog: v0.9.5...v1.0.1

v0.9.5

09 Jul 13:02

Full Changelog: v0.9.4...v0.9.5

v0.9.4

03 Jun 13:11

Updated Makevars for Unix-alikes. Users can now use a file specified by the MECABRC environment variable, or ~/.mecabrc, to set up dictionaries.
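
As a minimal sketch of the MECABRC route, the following base R code writes a small configuration file and points the environment variable at it. The dictionary path is a hypothetical placeholder, and the single `dicdir` entry is only the most basic setting such a file can carry. Whether the variable needs to be set before the package is loaded is not stated here, so setting it at the start of a session is the cautious choice.

```r
# Hypothetical dictionary location; replace with the path on your system.
dic_dir <- "/usr/local/lib/mecab/dic/ipadic"

# Write a minimal configuration file containing only the dictionary directory.
rc_file <- file.path(tempdir(), "mecabrc")
writeLines(paste("dicdir =", dic_dir), rc_file)

# Point MECABRC at the file for the current R session.
Sys.setenv(MECABRC = rc_file)
```

To make the setting persistent, the same line could instead go into ~/.mecabrc, which the note above names as the alternative location.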

Full Changelog: v0.9.3...v0.9.4

v0.9.3

20 Apr 23:28

This is a patch release. Unnecessary C++ files were removed for CRAN's checks.

v0.9.2

12 Apr 12:39

Initial CRAN release 🚀😎✨

I'm excited to announce {gibasa} is now on CRAN!!
Now you can more easily install {gibasa} from CRAN as well as from r-universe.

Full Changelog: v0.8.1...v0.9.2

v0.8.1

14 Mar 08:13

Full Changelog: v0.8.0...v0.8.1

v0.8.0

04 Mar 11:43

What's Changed

  • [Breaking Change] Changed numbering style of 'sentence_id' when split is FALSE.
  • Added grain_size argument to tokenize.
  • Added new bind_lr function.
  • Now uses RcppParallel::parallelFor instead of tbb::parallel_for.

Full Changelog: v0.7.1...v0.8.0