add live texts parsing to the CRON task #86

Closed
wants to merge 9 commits into from


mdamien commented Jul 31, 2018

Stats

  • 4/4 texts passing for senapy-cli doslegs_urls --in-discussion
  • 2/4 texts passing for anpy-cli doslegs_urls --in-discussion
    • For example, avenir professionnel fails because an intermediary text (161) is genuinely missing
  • 265/298 texts passing for senapy-cli doslegs_urls --min-year=2018 (89%)
Error stats (WIP)
tail -n 1 data/logs-encours/* | grep -v '^==>' | grep . | sort | uniq --count | sort -rn
     16     data = step.get('texte.json')
      3     if step['date'] > date:
      2     raise Exception('[complete_articles] Fatal error')
      1     raise Exception('[parse_texts] Invalid response %s' % url)
      1     if dic.get('titre') and dic.get('titre') == article.get('titre') and 'source_text' not in article:
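
The shell pipeline above tallies the final line of every log file, which is usually the line of the traceback that failed. A rough Python equivalent is sketched below; the function names are hypothetical and not part of the repository. Unlike `tail -n 1`, it looks for the last non-blank line, so a trailing empty line does not hide the error.

```python
from collections import Counter
from pathlib import Path


def last_nonempty_line(path):
    """Return the last non-blank line of a log file, stripped, or None."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    for line in reversed(text.splitlines()):
        if line.strip():
            return line.strip()
    return None


def tally_last_lines(log_dir):
    """Count how often each final log line occurs across the files in log_dir.

    Returns (line, count) pairs, most frequent first.
    """
    counts = Counter()
    for path in sorted(Path(log_dir).iterdir()):
        if path.is_file():
            line = last_nonempty_line(path)
            if line is not None:
                counts[line] += 1
    return counts.most_common()
```

Calling `tally_last_lines("data/logs-encours")` would give the same kind of ranking as the command, without depending on `uniq --count`.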

Blocking problems

  • In generate_dossiers_csv.py, the maximum number of promulgated texts we can support is derived by directly counting the failing cases in logs/, which assumes that no live texts are failing

    • To keep the count correct, this needs to be improved (done in this PR)
  • Make it clear in the UI that the text is in progress
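
One way to keep the promulgated count independent of live failures is to count the two kinds of failure separately. The sketch below is only an illustration, not the change made in this PR: it assumes one log file per failing text, that promulgated failures live in `logs/` and live-text failures in `logs-encours/` under a common data directory, and the function names are hypothetical.

```python
from pathlib import Path


def count_failures(log_dir):
    """Count failing cases, assuming one log file per failing text."""
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return 0
    return sum(1 for p in log_dir.iterdir() if p.is_file())


def failure_summary(data_dir):
    """Report promulgated and live failures separately.

    The directory names are assumptions based on the paths mentioned above.
    """
    data_dir = Path(data_dir)
    return {
        "promulgated": count_failures(data_dir / "logs"),
        "live": count_failures(data_dir / "logs-encours"),
    }
```

With the counts split this way, a failing live text no longer lowers the number of promulgated texts reported as supported.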

Possible improvements

  • Deduplication of the URLs, which I consider an optimization for later (issue 72)
  • Improved formatting of the CRON output to distinguish the current texts from the promulgated ones (done in this PR)
  • --in-discussion should also include texts discussed in the past (work to be done on anpy/senapy)
    • Possible solution: take all the texts discussed in the last month, plus the ones scheduled for future discussion, found by parsing the agenda
  • Add to reparse_all.sh (done by "Handle non promulgated texts already parsed in data")
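
The URL deduplication and the "last month plus future" selection from the list above could look roughly like the sketch below. It is not anpy or senapy code: the `(url, discussion_date)` input shape, the function names, and the 30-day reading of "last month" are all assumptions made for illustration.

```python
from datetime import date, timedelta


def dedupe_urls(urls):
    """Drop duplicate URLs while keeping the first-seen order."""
    return list(dict.fromkeys(urls))


def in_discussion(texts, today, lookback_days=30):
    """Select texts discussed recently or scheduled for the future.

    texts: iterable of (url, discussion_date) pairs, e.g. parsed from an agenda.
    Keeps every text whose date is no older than lookback_days before today,
    which also covers all future dates.
    """
    earliest = today - timedelta(days=lookback_days)
    return dedupe_urls(url for url, when in texts if when >= earliest)
```

A text that appears several times in the agenda is returned once, so the CRON task would parse each dossier a single time.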


See also: Milestone on the frontend

mdamien added this to the Live milestone Jul 31, 2018

mdamien commented Sep 11, 2018

This PR is starting to get old. I made a mergeable version to gather the last comments, and after that I will merge it as a single commit, ideally on Monday:

…merge conflicts"

This reverts commit 59b4675.

(pushed by accident)
boogheta commented

It looks like it is still breaking, but I guess that is because of anpy's changes, which require merging tlfp's latest changes from master

boogheta commented

In any case, when merging, the leftover items in the to-do list above need to be carried over into one or more other issues

mdamien added a commit that referenced this pull request Sep 17, 2018

mdamien commented Sep 18, 2018

Everything important is now either resolved or tracked in a new issue, so let's close this!

mdamien closed this Sep 18, 2018
mdamien deleted the add-live-texts branch October 30, 2018 10:08
2 participants