Implement additional LD measures #27
I built a simple function for computing the D of vocd_d in this commit. Some issues I encountered:
Also see McKee, G., Malvern, D., & Richards, B. (2000). Measuring vocabulary diversity using dedicated software. Literary and Linguistic Computing, 15(3), 323-338. I think it's the original paper for vocd-D.
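For reference, here is a minimal base-R sketch of the curve-fitting idea behind vocd-D as I understand it from McKee, Malvern & Richards (2000): average the TTR over random subsamples at several sample sizes, then find the D that best fits TTR = (D/N) * (sqrt(1 + 2N/D) - 1). The function names `ttr_model` and `estimate_vocd_d`, the defaults (sizes 35 to 50, 100 samples each), and the search interval are assumptions for illustration, not the code from the commit.

```r
# Theoretical TTR curve used by vocd-D:
# TTR(N) = (D / N) * (sqrt(1 + 2 * N / D) - 1)
ttr_model <- function(n, d) (d / n) * (sqrt(1 + 2 * n / d) - 1)

# Hypothetical helper: estimate D for one character vector of tokens.
estimate_vocd_d <- function(tokens, sizes = 35:50, n_samples = 100) {
  stopifnot(length(tokens) >= max(sizes))
  # Mean TTR over random subsamples (without replacement) at each size
  mean_ttr <- vapply(sizes, function(n) {
    mean(replicate(n_samples, length(unique(sample(tokens, n))) / n))
  }, numeric(1))
  # Least-squares fit of the single parameter D
  # (the search interval is an assumption)
  sse <- function(d) sum((mean_ttr - ttr_model(sizes, d))^2)
  optimize(sse, interval = c(0.01, 1000))$minimum
}

set.seed(100)
toks <- sample(letters, 200, replace = TRUE)
estimate_vocd_d(toks)
```

Because the subsamples are random, the estimate changes from run to run unless the seed is fixed.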
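On the first, you can add this function:
library("quanteda")
# Sample `size` tokens from each document of a tokens object
tokens_samplefrom <- function(x, size, replace = FALSE) {
    attrs <- attributes(x)  # keep the tokens attributes (class, types, names)
    result <- lapply(unclass(x), sample, size = size, replace = replace)
    attributes(result) <- attrs
    # internal (unexported) quanteda function; may change between versions
    quanteda:::tokens_recompile(result)
}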
toks <- tokens(c("a b c d e f", "q r s t u v w x"))
set.seed(100)
tokens_samplefrom(toks, size = 3)
## tokens from 2 documents.
## text1 :
## [1] "b" "f" "c"
##
## text2 :
## [1] "q" "t" "s"
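The seeded call above can also be turned into a reproducibility check. Here is a small base-R sketch of the same per-document sampling on a plain list of character vectors, so it runs without quanteda internals. The name `sample_each` is hypothetical, and the sampled values are not claimed to match the output shown above.

```r
# Base-R analogue of per-document sampling:
# each document is a plain character vector.
sample_each <- function(docs, size, replace = FALSE) {
  lapply(docs, sample, size = size, replace = replace)
}

docs <- list(
  text1 = c("a", "b", "c", "d", "e", "f"),
  text2 = c("q", "r", "s", "t", "u", "v", "w", "x")
)

# Resetting the seed before each call makes the draws repeatable
set.seed(100)
first <- sample_each(docs, size = 3)
set.seed(100)
second <- sample_each(docs, size = 3)

identical(first, second)
```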
Prof. @kbenoit, see commit e0f90d0 for my outline code for vocd-D after incorporating the function above. It would be great if you could:
For tests, or examples with anything stochastic, set a fixed random seed.
On the HD-D code: I will return to the LD stuff, but if @koheiw and I can agree on the structure of a new function (see quanteda/quanteda#1520 (comment)), that will make writing those functions different (and easier). Let's wait on that issue before I return to this code. In the meantime, I will try to look at McCarthy & Jarvis (2007) to understand HD-D. I think there is code for this somewhere on the Internet, perhaps in the vocd software?
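As a starting point for HD-D, here is a rough base-R sketch of the hypergeometric idea as I understand it: for each type, compute the probability that at least one of its tokens appears in a random sample of 42 tokens, then add those probabilities up. The function name `hdd_sketch` is hypothetical, and whether to report the raw sum or the sum scaled by the sample size (as done here) is an assumption to check against the paper.

```r
# Hypothetical helper: HD-D style index for one character vector of tokens.
hdd_sketch <- function(tokens, sample_size = 42) {
  stopifnot(length(tokens) >= sample_size)
  freqs <- as.vector(table(tokens))  # frequency of each type
  n_tokens <- length(tokens)
  # P(type appears at least once in a sample drawn without replacement)
  p_present <- 1 - dhyper(0, m = freqs, n = n_tokens - freqs, k = sample_size)
  # Scaling by the sample size is an assumption; it equals the expected TTR
  # of a random sample of `sample_size` tokens.
  sum(p_present) / sample_size
}

hdd_sketch(c(rep("the", 30), rep("cat", 15), as.character(1:20)))
```

Unlike the vocd-D approach, nothing here is sampled, so no seed is needed for tests.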
Working branch for this is dev-MTLD.
@kbenoit Acknowledged!
These would include:
See McCarthy, Philip M, and Scott Jarvis. 2010. “MTLD, Vocd-D, and HD-D: a Validation Study of Sophisticated Approaches to Lexical Diversity Assessment.” Behavior Research Methods 42(2): 381–92.
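To make the MTLD part concrete, here is a minimal base-R sketch of the factor-counting idea as I understand it from McCarthy & Jarvis (2010): walk through the tokens, count a "factor" each time the running TTR drops below 0.72, add a partial factor for the remainder, and average the forward and backward passes. The function names are hypothetical, and the strict `<` comparison at the threshold is an assumption (implementations may differ).

```r
# Hypothetical helper: one directional pass of MTLD over a character vector.
mtld_one_pass <- function(tokens, threshold = 0.72) {
  factors <- 0
  seen <- character(0)  # types seen in the current segment
  n_seg <- 0            # tokens in the current segment
  for (tok in tokens) {
    n_seg <- n_seg + 1
    if (!(tok %in% seen)) seen <- c(seen, tok)
    if (length(seen) / n_seg < threshold) {  # strict "<" is an assumption
      factors <- factors + 1
      seen <- character(0)
      n_seg <- 0
    }
  }
  # Partial factor for the leftover segment
  if (n_seg > 0) {
    factors <- factors + (1 - length(seen) / n_seg) / (1 - threshold)
  }
  if (factors == 0) return(NA_real_)  # TTR never moved below 1
  length(tokens) / factors
}

# Average of the forward and backward passes
mtld_sketch <- function(tokens, threshold = 0.72) {
  mean(c(mtld_one_pass(tokens, threshold),
         mtld_one_pass(rev(tokens), threshold)))
}

mtld_sketch(rep(c("a", "b", "a", "c", "a", "d"), 10))
```

Like HD-D, this is deterministic, which should make it easier to test than vocd-D.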
Also for testing the implementations
Related to quanteda/quanteda#1508