offer randomized contrast #3

Open
christofs opened this issue Mar 5, 2017 · 1 comment

Comments

@christofs
Contributor

Offer the option to run a comparison not on a meaningful partitioning of the data, but on a random one, for a better understanding of the level of differences to be expected if the partitioning is not meaningful.
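A minimal sketch of what such a random partitioning could look like. The function name `random_partition`, the even split into two halves, and the toy text identifiers are all assumptions made for illustration; none of them are taken from the project's code.

```python
import random

def random_partition(items, seed=None):
    """Shuffle the items and split them into two groups of near-equal size.

    Hypothetical helper: the even split is an assumption, a real
    implementation might instead mirror the sizes of the meaningful groups.
    """
    rng = random.Random(seed)   # local generator, so results are reproducible
    shuffled = list(items)      # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Toy identifiers standing in for the texts of a collection.
texts = [f"text_{i:02d}" for i in range(10)]
group_a, group_b = random_partition(texts, seed=42)
print(group_a)
print(group_b)
```

Running the comparison on `group_a` versus `group_b` instead of the meaningful groups would then show how large the differences get by chance alone.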

@christofs
Contributor Author

Partly implemented.

Building on the current implementation, it would be very interesting to do this multiple times internally and calculate a list of zeta score distributions, ranked by mean or median, from such repeated random partitionings of the data. These distributions could then be used for significance tests on the zeta scores obtained with meaningfully partitioned data, to estimate which zeta scores can actually be considered statistically significant for a given text collection and partitioning.
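The idea can be sketched as a permutation test. This is only an illustration under stated assumptions: zeta is taken here to be the share of segments in one group that contain a word minus the share in the other group, segments are modelled as plain sets of words, and the names `zeta`, `null_zetas`, and `empirical_p` as well as the toy data are hypothetical and not part of the existing implementation.

```python
import random

def zeta(word, group_a, group_b):
    """Assumed zeta: share of segments containing `word` in A minus that in B."""
    share_a = sum(word in seg for seg in group_a) / len(group_a)
    share_b = sum(word in seg for seg in group_b) / len(group_b)
    return share_a - share_b

def null_zetas(word, segments, size_a, runs=1000, seed=0):
    """Zeta scores for `word` under repeated random partitionings.

    Each run reshuffles all segments and keeps the original group sizes.
    """
    rng = random.Random(seed)
    pool = list(segments)
    scores = []
    for _ in range(runs):
        rng.shuffle(pool)
        scores.append(zeta(word, pool[:size_a], pool[size_a:]))
    return scores

def empirical_p(observed, null_scores):
    """Two-sided empirical p-value with the usual +1 correction."""
    extreme = sum(abs(s) >= abs(observed) for s in null_scores)
    return (extreme + 1) / (len(null_scores) + 1)

# Toy data: each segment is reduced to the set of words it contains.
group_a = [{"sea", "ship"}, {"sea", "storm"}, {"sea", "sail"}, {"sea", "wind"}]
group_b = [{"town", "street"}, {"town", "house"}, {"street", "shop"}, {"town", "sea"}]

observed = zeta("sea", group_a, group_b)
null = null_zetas("sea", group_a + group_b, size_a=len(group_a))
p = empirical_p(observed, null)
print(f"observed zeta = {observed:.2f}, empirical p = {p:.3f}")
```

The same `null` list could be summarised by its mean or median per word to produce the ranked distributions mentioned above. With many words tested at once, some correction for multiple comparisons would probably be needed as well.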
