Skip to content
This repository has been archived by the owner on May 31, 2020. It is now read-only.

Add Python module. Issue #47 #49

Merged
merged 5 commits into from
Apr 11, 2019
Merged

Add Python module. Issue #47 #49

merged 5 commits into from
Apr 11, 2019

Conversation

greebie
Copy link
Collaborator

@greebie greebie commented Apr 4, 2019

This PR replaces the function cell in the notebook with a python module and revises the function to work with the Notebook class.

A few notes:

  • If you make changes to the class, the notebook will not recognize the changes until the notebook is restarted. (Ask me about the fun of debugging in this environment.) I assume this will not be a problem once the module is finalized.
  • The network graphs currently use very little of the new module (only the gephi filename). I would like some advice on whether we should resolve that here or in a different PR (e.g. when we split the notebooks).
  • The module contains a class called Notebook that takes a collection ID, a folder name and **kwargs that sets the various variables. One potential issue is if a config has a var that's not in the class. Could use some advice about whether an error is fine or if we should handle things more gracefully.
  • For the most part, this just means that where before we said get_text(), instead we now use nb.get_text().
  • The previous set-up used default parameters i.e. fun(x, min_length="2") - now it just uses the MINIMUM_WORD_LENGTH variable, so
nb.MINIMUM_WORD_LENGTH = 2
nb.fun(x)
  • There is some ambiguity re: style as we have ALL_CAPS in the notebook to handle madlibs, but inside a class this is poor Python style for non-constants. An easy solution is just to use lower() on instantiation for the class, but whatever other advice anyone has is welcome.

Fix notebook to call from Python module.

Remove notebook functions.
@greebie
Copy link
Collaborator Author

greebie commented Apr 4, 2019

Marked this draft for general comments first. Once the approach is okay, I'll mark it ready for review and we can work through some of the details.

Copy link
Member

@ruebot ruebot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple style changes here to to start. You'll also want to run it through something that'll validate PEP-8. http://pep8online.com/checkresult

You'll also need to update the Dockerfile to include the library. Otherwise, it's missing from the build and won't run. See here: https://mybinder.org/v2/gh/archivesunleashed/auk-notebooks/issue-47?filepath=auk-notebook.ipynb

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-8eb457572b52> in <module>
----> 1 from auknb import Notebook

ModuleNotFoundError: No module named 'auknb'

You can put a COPY auknb.py $HOME on line 18 https://github.com/archivesunleashed/auk-notebooks/blob/master/Dockerfile#L18

Notebook itself:

  • Probably want to change "# Change to alter path." to something a bit more descriptive. Maybe "Change value to full path to your data, including trailing slash."
  • # Setup Archives Unleashed Cloud data. you can get rid of that comment in the required packages section.
  • Put the library import in with all the others.
  • Probably need to change the intro language and section header for "User Configuration"

auknb.py Outdated
return(result)

if __name__ == "__main__":
print ("This module is intended for use with Archives Unleashed Cloud")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can put all this in a header comment.

auknb.py Outdated
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.corpus import stopwords

class Notebook():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change the class name to AuNotebook.

greebie added 2 commits April 5, 2019 14:42
Remove print statements in favor of docstrings.
- class variables are lower case as per PEP8
- add auknb.py to Dockerfile
- fix explanations for function calls etc.
- add instructions to view functions and class documentation.
x
@greebie
Copy link
Collaborator Author

greebie commented Apr 5, 2019

I think the latest commit fixes the above review fixes. I also included mention that help(nb.function_name()) will provide instructions. Once I get word from @ianmilligan1 that the approach is okay, I will take this out of draft and work on fine tuning.

Copy link
Member

@ianmilligan1 ianmilligan1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works on my end as a user, but I'd want to make sure @ruebot green-lights this as he knows exponentially more than I do.

@greebie
Copy link
Collaborator Author

greebie commented Apr 7, 2019

Thanks @ianmilligan1 @ruebot what are your thoughts on unit tests here? I can build them, but am not sure if they belong in the auk-notebooks folder.

@ruebot
Copy link
Member

ruebot commented Apr 8, 2019

Out of scope for now. We'll cross that bridge when we publish it in it's own repo.

Copy link
Member

@ruebot ruebot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notebook review:

  • AUK_PATH comment has two full-stops.
  • What's the purpose of displaying: help(AuNotebook.clean_domain)?
  • Change: The following cell sets up AuNotebook to The following cell sets up the notebook
  • Cell 13 has an output error # Get a list of the top words in the collection (regardless of year). Did you clear the notebook and re-run it?
  • You should update the library filename to reflect the class name change. Make sure you update the Dockerfile as well.

greebie added 2 commits April 8, 2019 10:22
- remove help cell (was only for testing).
- change module name to aunb.
- fix output errors (i.e. be more patient when waiting for notebook to run.)
@greebie
Copy link
Collaborator Author

greebie commented Apr 8, 2019

Latest commit should be ready to go. Moving this out of draft.

@greebie greebie marked this pull request as ready for review April 8, 2019 14:24
@ruebot ruebot merged commit 4b8938b into master Apr 11, 2019
@ruebot ruebot deleted the issue-47 branch April 11, 2019 13:40
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants