Write a paragraph on problems faced so far #137

Open
Daniel-Mietchen opened this issue Oct 16, 2016 · 5 comments

Comments

@Daniel-Mietchen
Member

I'll need one paragraph from each of you, so that some version of this info can be included in the final report. I will collate the paragraphs and ping you if I need more details.

@difranco
Collaborator

difranco commented Oct 30, 2016

Here's my info:

I'm now working on a major update to recitation-bot. Previously I advised Max Klein on the first version and worked with him on the cocytus project, which had a similar architecture and requirements. In cocytus, we used a work queue implemented with the RQ Python library and backed by Wikimedia Labs' single-machine Redis server, which did not scale to meet our workload. To avoid similar problems with recitation-bot, Max tried to maintain the work queue in the filesystem, but concurrency issues with that ad hoc approach prevented it from working reliably. In my update, I am implementing a slightly more sophisticated ad hoc solution that uses an on-disk key-value database with locking to avoid the concurrency problems of the previous version. This is still not ideal, and it has led to a fair amount of troubleshooting in what is essentially a reinvention of commodity functionality. It would be best for Wikimedia Labs' infrastructure to support work queues and messaging in a scalable way, for example with ZeroMQ configured to scale out, so that modern infrastructure for backend data processing tasks like these is ready at hand.
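
For readers unfamiliar with the pattern described above, here is a minimal sketch of an on-disk work queue that relies on database locking so that two workers never pick up the same job. It is not the recitation-bot code: the comment does not name the key-value database used, so SQLite from the Python standard library stands in for it here, and the `DiskQueue` class, its method names, and the `jobs` table are all hypothetical.

```python
import sqlite3
import time


class DiskQueue:
    """Hypothetical on-disk work queue; SQLite stands in for the key-value store."""

    def __init__(self, path):
        # isolation_level=None gives autocommit mode, so transactions are
        # controlled explicitly with BEGIN IMMEDIATE / COMMIT below.
        self.conn = sqlite3.connect(path, timeout=30, isolation_level=None)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " payload TEXT NOT NULL,"
            " status TEXT NOT NULL DEFAULT 'queued',"
            " claimed_at REAL)"
        )

    def put(self, payload):
        """Append a job and return its id."""
        cur = self.conn.execute(
            "INSERT INTO jobs (payload) VALUES (?)", (payload,)
        )
        return cur.lastrowid

    def claim(self):
        """Atomically mark the oldest queued job as running and return it.

        Returns an (id, payload) tuple, or None if nothing is queued.
        """
        # BEGIN IMMEDIATE takes the database write lock up front, so a second
        # worker waits here instead of selecting the same queued row.
        self.conn.execute("BEGIN IMMEDIATE")
        try:
            row = self.conn.execute(
                "SELECT id, payload FROM jobs"
                " WHERE status = 'queued' ORDER BY id LIMIT 1"
            ).fetchone()
            if row is not None:
                self.conn.execute(
                    "UPDATE jobs SET status = 'running', claimed_at = ?"
                    " WHERE id = ?",
                    (time.time(), row[0]),
                )
            self.conn.execute("COMMIT")
        except Exception:
            self.conn.execute("ROLLBACK")
            raise
        return row

    def finish(self, job_id):
        """Mark a claimed job as done."""
        self.conn.execute(
            "UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,)
        )
```

Each worker process would open its own `DiskQueue` on the same file and loop over `claim()`, calling `finish()` when a job completes. The `claimed_at` timestamp is there so that a supervisor could requeue jobs whose worker died, which this sketch does not implement.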

@arlitt
Collaborator

arlitt commented Nov 6, 2016

There have been some glitches in getting everything set up and running, most of them covered in @difranco's paragraph. I also ran into some problems when first getting set up on wmflabs: some documentation was missing, and other peculiarities of this project were not covered in the documentation that did exist. I have since learned more about when to take things on myself and when to ask for help or clarification from those with more experience and knowledge of a system.

@Daniel-Mietchen
Member Author

@difranco can you update this with information about the problems you had on wmflabs and why the switch from Pywikibot to mwclient was necessary?

@tcatapano reminder to post your paragraph here.

@difranco
Collaborator

difranco commented Mar 5, 2017

Adding to the previous paragraph:
After a fair bit more troubleshooting, it turned out that Pywikibot has its own multithreading code, part of its ability to run in stand-alone mode. This conflicted with the system I wrote to handle job processing in the background and caused the system to hang when Pywikibot tried to connect to upload edits to wikis. That made it necessary to find another way to use the MediaWiki API. After consulting with @notconfusing, I chose mwclient because it is mature, actively developed, thoroughly documented, and designed to be used only as a library. After I changed the code to use mwclient, differences in how Pywikibot and mwclient handle the creation of file pages introduced a bug that split the image pages into two incomplete ones, which resulted in our being banned. I have fixed the bug, but we remain banned.

@difranco
Collaborator

I have prepared a summary document and am working on it here:
https://docs.google.com/document/d/1JV0ezviSOQOSOJwsT_gCtssmaPCrIDwJ1dR6juhXsPM/edit?usp=sharing
