Analyse Function #545
Add the "Analyse" function, including LDA topic analysis (topic page), collocation extraction (collocation page), co-occurrence word extraction (cooccur page), a high-frequency wordlist (wordlist page), keyword extraction (keyword), and a co-occurrence semantic network (network page). The analyse function gets its results from blacklab-server and analyses them in "src\main\java\nl\inl\corpuswebsite\utils\analyseUtils".
Wow, this is an impressive amount of work! First off, I appreciate the effort and time you've put into this. Given the size and scope of the changes, we'll need some time to properly evaluate everything to ensure it aligns with our own plans for the future. A few initial thoughts:
There’s a lot of interesting work here; we'll take a deeper look and follow up!
First, thank you for sharing your work with us, we really appreciate the effort and dedication that went into this. There is cool stuff in there - we do see value! But it's mainly down to technical concerns. That said, we'd be happy to provide some guidance or discuss potential next steps in more detail if you're interested. You can reach out via email (see the README), or open issues here on GitHub if you have specific questions.
Why We Can't Merge This
Architectural Fit
We feel that the Corpus-Frontend isn't the best place for this feature. A better approach might be integrating the analysis implementations as a BlackLab plugin or as a separate tool. Philosophically, the Corpus-Frontend is mostly a fancy querybuilder and result viewer - a publishing tool - it doesn't really concern itself with data processing.
Usability & Performance Concerns
Apart from the above, there are some major usability and performance issues that would need to be addressed:
Technical Considerations
Beyond these, we also found some more specific issues:
Moving Forward
One of the biggest takeaways here is the importance of discussion with maintainers before and during development. Having said all that, we have discussed building some of these things in the past (but more in a "perhaps one day" fashion), and this is a good concrete example of what that could look like! We will certainly look at this for inspiration for some time to come.
Best,
Impressive work, but I have to agree with @KCMertens; this approach unfortunately won't work for many users. If you have the time and are interested in seeing how some of the processing could be done either in a separate data enrichment step before indexing, or perhaps during the BlackLab indexing or search process, we'd be happy to discuss that.