A utility for determining the most used words in Python source code
- Free software: MIT
- Requirements
- Python>=3.5.3
- nltk==3.2.4
- github3.py==0.9.6
Finds the most used verbs or nouns in a local or GitHub Python project.
As a command-line utility
$ most_common_words -c minimum_count -s verbs local -p path/to/your/project
or
$ most_common_words -c minimum_count -s verbs github flask -u pallets
As a library
import most_common_words as mcw
config = {'path': '/your/path/to/project', 'count': any_count, 'speech_part': 'verbs or nouns'}
processor = mcw.MostCommonWords(config)
mcw.check_nltk_data_installation(yes=True) # use this if you don't know whether the nltk data is installed; if it is missing, it will be installed automatically
words = processor.get_words()
or with a GitHub project
from pathlib import Path
import most_common_words as mcw
config = {'project-name': 'project name', 'count': any_count, 'speech_part': 'verbs or nouns'}
gh = mcw.Client(config)
gh.find_project()
gh.download_project(archive_file_dest_fd)
gh.unzip_project(archive_file_dest_fd, project_folder_dest)
config['path'] = Path(project_folder_dest)
processor = mcw.MostCommonWords(config)
words = processor.get_words()
For archive_file_dest_fd and project_folder_dest you can use tempfile.NamedTemporaryFile (or any other file descriptor) and tempfile.TemporaryDirectory respectively (in the latter case, pass the name property of the object).
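A minimal sketch of preparing both destinations with the standard library. The mcw.Client calls are replaced here by plain zipfile stand-ins, an assumption made so that the snippet runs without network access; unzip_to and demo are hypothetical names and not part of most_common_words.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_to(archive_fd, project_folder):
    """Hypothetical stand-in for an unzip step working on a file descriptor."""
    archive_fd.seek(0)  # rewind: the archive was just written to this descriptor
    with zipfile.ZipFile(archive_fd) as archive:
        archive.extractall(project_folder)


def demo():
    tmp_dir = tempfile.TemporaryDirectory()
    project_folder_dest = tmp_dir.name  # the object itself is not a path, its name is
    try:
        with tempfile.NamedTemporaryFile(suffix='.zip') as archive_file_dest_fd:
            # Stand-in for a download step: write a tiny archive to the descriptor.
            with zipfile.ZipFile(archive_file_dest_fd, 'w') as archive:
                archive.writestr('project/module.py', 'def get_value():\n    pass\n')
            archive_file_dest_fd.flush()
            unzip_to(archive_file_dest_fd, project_folder_dest)
        # Collect the extracted Python files before the folder is removed.
        return sorted(path.name for path in Path(project_folder_dest).rglob('*.py'))
    finally:
        tmp_dir.cleanup()
```

Both temporary objects delete their contents when closed or cleaned up, so the analysis has to run while they are still alive.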
You can print the returned words with the built-in print. Example:
for word, times in words:
    print(word, times)
Or through a Printer instance:
printer = mcw.Printer(config)
printer.print(words)
Note: the printer config requires the format and pretty keys. format is one of the 3 supported output formats (csv, json, humanable); pretty tells whether the output must be prettified, where that is possible.
The utility needs the nltk data to be downloaded, so if it is not installed, the script will ask you to download it (this may take some time).
You can call the check_installation() method of an NLTKDownloader instance to check for and download the nltk data. It takes 2 optional boolean arguments: yes and force_download.
Or you can use raw nltk methods:
from nltk.downloader import Downloader
downloader = Downloader()
if not downloader.is_installed('all'):
    downloader.download('all')
Run most_common_words --help for a full list of options and their effects.
$ most_common_words --help
usage: most_common_words [-h] [-p PATH] [-c COUNT] [-s {verbs,nouns}]
[-f {json,csv,humanable}] [--pretty]
[--skip-data-check]
[--console {stdout,stderr} | -o OUTPUT]
[--functions | --variables]
{local,github} ...
positional arguments:
{local,github}
optional arguments:
-h, --help show this help message and exit
-p PATH, --path PATH Path to project. Default current folder.
-c COUNT, --count COUNT
Determines minimum number of occurrences words.
Default 2.
-s {verbs,nouns}, --speech-part {verbs,nouns}
Choose what part of speech to search. Default verbs.
-f {json,csv,humanable}, --format {json,csv,humanable}
Chose output format. Default humanable.
--pretty Prettify output
--skip-data-check Skips nltk data installation
--console {stdout,stderr}
Prints returned data to stdout or stderr
-o OUTPUT, --output OUTPUT
Prints returned data to file. (Overrides existing
file!)
--functions Goes through function names
--variables Goes through variable names
$ most_common_words local -h
usage: most_common_words local [-h] [-p PATH]
optional arguments:
-h, --help show this help message and exit
-p PATH, --path PATH Path to project. Default current folder.
$ most_common_words github -h
usage: most_common_words github [-h] [-u USER] [-l LOGIN] [-s SECRET]
[-t TOKEN]
project-name
positional arguments:
project-name
optional arguments:
-h, --help show this help message and exit
-u USER, --user USER Github project owner.
-l LOGIN, --login LOGIN
Your Github login.
-s SECRET, --secret SECRET
Your Github password.
-t TOKEN, --token TOKEN
Your Github OAuth token.
NOTE!
Any common arguments must come BEFORE the github or local subcommand!
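The ordering rule follows from how argparse handles subcommands: everything after the subcommand name is parsed by that subcommand's own parser. The sketch below is a reduced imitation built from the help output above, not the tool's actual parser; the prog name and the internal dest names are assumptions.

```python
import argparse


def build_parser():
    """Reduced imitation of the CLI layout shown in the help output."""
    parser = argparse.ArgumentParser(prog='most_common_words_sketch')
    parser.add_argument('-c', '--count', type=int, default=2)
    parser.add_argument('-s', '--speech-part', choices=['verbs', 'nouns'],
                        default='verbs')
    subparsers = parser.add_subparsers(dest='command')

    local = subparsers.add_parser('local')
    local.add_argument('-p', '--path', default='.')

    github = subparsers.add_parser('github')
    github.add_argument('project_name', metavar='project-name')
    github.add_argument('-u', '--user')
    # Inside the github subcommand, -s means the secret, not the speech part.
    github.add_argument('-s', '--secret')
    return parser


parser = build_parser()

# Common option placed before the subcommand: read as the speech part.
before = parser.parse_args(['-s', 'nouns', 'github', 'flask', '-u', 'pallets'])

# Same flag placed after the subcommand: read as the GitHub secret instead.
after = parser.parse_args(['github', 'flask', '-s', 'nouns'])
```

In the second call the speech part silently keeps its default, which is why the common arguments have to be given first.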
Main class
attr: |
Holds base configuration. Must have:
|
---|---|
method: |
Main function (aka entry point). Returns a list of tuples where the first element is the word and the second is its count. |
Contains some helper functions
function: |
Generator; yields the item's contents if it is iterable (list, tuple, generator), otherwise yields the item itself. Non-recursive. |
---|---|
function: |
Generator; walks through folders recursively and yields all files whose extension matches extension, each wrapped in pathlib.Path. |
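The two helpers described above can be sketched with the standard library alone. The names flatten_once and find_files are hypothetical (the original function names are not shown here), and so is the default extension.

```python
import types
from pathlib import Path


def flatten_once(items):
    """Yield the contents of list/tuple/generator items, other items as they are."""
    for item in items:
        if isinstance(item, (list, tuple, types.GeneratorType)):
            yield from item  # one level only, nested containers stay intact
        else:
            yield item


def find_files(root, extension='.py'):
    """Walk root recursively and yield every file ending with extension as a Path."""
    for path in Path(root).rglob('*' + extension):
        if path.is_file():
            yield path
```

Strings are deliberately not treated as iterables in this sketch, so a word is never split into characters.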
Contains functions to parse Python source code.
function: |
Checks whether name is magic (starts and ends with a double underscore). |
---|---|
function: |
Checks whether the given ast node is a function. |
function: |
Checks whether the given ast node is an assignment. |
function: |
Takes a name, tokenizes it and returns a list of words, each with its nltk part-of-speech tag. |
function: |
Generator, yields ast from each file in path arg (calls |
function: |
Generator, yields function nodes from all ast (calls |
function: |
Generator; yields the targets of assign nodes from all ast (calls
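A rough sketch of the parsing steps listed above, using only the standard ast module. All function names are hypothetical, the helpers take a source string rather than a path, and the nltk tagging step is replaced by a plain split on underscores, so this only illustrates the traversal, not the part-of-speech detection.

```python
import ast


def is_magic_name(name):
    """True for names such as __init__ (double underscore on both ends)."""
    return len(name) > 4 and name.startswith('__') and name.endswith('__')


def is_function(node):
    return isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))


def is_assign(node):
    return isinstance(node, ast.Assign)


def split_name(name):
    """Split a snake_case name into lowercase words (no tagging here)."""
    return [word for word in name.lower().split('_') if word]


def function_names(source):
    """Yield names of all non-magic functions found in the source."""
    for node in ast.walk(ast.parse(source)):
        if is_function(node) and not is_magic_name(node.name):
            yield node.name


def assigned_names(source):
    """Yield plain variable names that are assignment targets."""
    for node in ast.walk(ast.parse(source)):
        if is_assign(node):
            for target in node.targets:
                if isinstance(target, ast.Name):  # skips attributes, tuples, etc.
                    yield target.id


SOURCE = '''
class Box:
    def __init__(self):
        self.items = []

    def get_item(self):
        first_item = self.items[0]
        return first_item
'''
```

With this input, function_names(SOURCE) yields only get_item and assigned_names(SOURCE) yields only first_item, because self.items is an attribute target rather than a plain name.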
Contains a class that encapsulates the nltk data download logic, and the related exceptions
function: |
Checks whether the nltk data is installed. If it is not, asks for permission to install it in interactive mode, then tries to download and install it if permitted.
If argument |
---|
Encapsulates download logic.
attr: |
Nltk data id. By default |
---|---|
method: |
Checks whether the nltk data is installed (by the id from data_id). If it is not, asks for permission to install it in interactive mode, then tries to download and install it if permitted.
If argument |
method: |
If argument |
Base downloader exception.
Error class, raised if the data is not installed and the user declined the installation. Inherits from most_common_words.nltk_downloader.NLTKDownloaderError
Error class, raised if something is wrong with the Internet connection. Even the installation check needs Internet access. Inherits from most_common_words.nltk_downloader.NLTKDownloaderError
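The check-then-ask flow and the exception hierarchy can be sketched as follows. Only NLTKDownloaderError is a name taken from this document; the two subclass names, the injected callables, the returned boolean and the exact meaning given to yes and force_download are assumptions made for illustration.

```python
class NLTKDownloaderError(Exception):
    """Base downloader exception."""


class InstallationRejectedError(NLTKDownloaderError):
    """Hypothetical name: data is missing and the user declined to install it."""


class ConnectionFailedError(NLTKDownloaderError):
    """Hypothetical name: the check or the download could not reach the network."""


def check_installation(is_installed, download, ask, yes=False, force_download=False):
    """Return True if a download was performed, False if data was already there.

    is_installed, download and ask are plain callables so that the flow can be
    exercised without nltk or a network connection.
    """
    try:
        if force_download:  # assumed meaning: download even if already installed
            download()
            return True
        if is_installed():
            return False
        if not (yes or ask()):  # assumed meaning of yes: skip the question
            raise InstallationRejectedError('nltk data is required')
        download()
        return True
    except OSError as error:  # network problems surface as OSError subclasses
        raise ConnectionFailedError(str(error)) from error
```

Catching NLTKDownloaderError at the call site then covers both failure modes with one except clause.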
Contains output logic
Encapsulates printer logic.
attr: |
Holds base configuration. Must have:
|
---|---|
property: |
Returns the formatter class according to config |
property: |
Returns a configured Writer instance for the current printer. It searches config for the key writer and returns it if present. Otherwise it looks for the output key; if it is not
method: |
Formats message from data and prints it. |
Package contains different formatter implementations
Abstract base class for any new formatter.
attr: |
Holds base configuration. Must have:
|
---|---|
property: |
Returns pretty key from config. |
property: |
Returns speech_part key from config. |
abstractmethod: |
Main abstract method. Any implementation must receive data and return a string. |
Implements abc most_common_words.formatter.base.Formatter. Output is CSV.
Implements abc most_common_words.formatter.base.Formatter. Output is JSON.
Implements abc most_common_words.formatter.base.Formatter. Used as default, for humans.
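A possible shape for such a formatter hierarchy, written against the standard library. This is a sketch and not the package's code: the method name format, the class names CSVFormatter and JSONFormatter, and the way pretty is mapped to JSON indentation are all assumptions.

```python
import csv
import io
import json
from abc import ABC, abstractmethod


class Formatter(ABC):
    """Base class: holds the config and defines the single abstract method."""

    def __init__(self, config):
        self.config = config

    @property
    def pretty(self):
        return self.config.get('pretty', False)

    @abstractmethod
    def format(self, data):
        """Receive (word, count) pairs and return a string."""


class CSVFormatter(Formatter):
    def format(self, data):
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator='\n').writerows(data)
        return buffer.getvalue()


class JSONFormatter(Formatter):
    def format(self, data):
        indent = 2 if self.pretty else None  # assumed meaning of "pretty"
        return json.dumps(dict(data), indent=indent)


words = [('get', 3), ('make', 2)]
csv_text = CSVFormatter({}).format(words)
json_text = JSONFormatter({'pretty': False}).format(words)
```

A new output format then only needs a subclass that implements the one abstract method.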
property: |
Returns path key from config. |
---|
Contains classes responsible for writing data to different places. All classes have only one method, write(data: str), which writes data.
Writes data to a file. The constructor accepts the file as a pathlib.Path instance. Overwrites an existing file!
Writes data to stdout.
Writes data to stderr.
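The three writers can be pictured roughly like this. The class names are hypothetical, since only the write(data: str) method and the pathlib.Path constructor argument are stated above.

```python
import sys
from pathlib import Path


class FileWriter:
    """Writes data to a file, replacing whatever the file contained before."""

    def __init__(self, file: Path):
        self.file = file

    def write(self, data: str):
        self.file.write_text(data)


class StdoutWriter:
    def write(self, data: str):
        sys.stdout.write(data)  # looked up at call time, so redirection works


class StderrWriter:
    def write(self, data: str):
        sys.stderr.write(data)
```

Because all three expose the same single method, a printer can hold any of them without knowing where the output goes.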
Contains functionality for interaction with the GitHub API
Class for interaction with the GitHub API
attr: |
Holds base configuration. Must have:
|
---|---|
attr: |
Holds reference to found github project. |
attr: |
Separate printer for client, to interact with user. |
property: |
Returns project-name key from config. |
property: |
Returns user key from config. |
property: |
Returns login key from config. |
property: |
Returns secret key from config. |
property: |
Returns token key from config. |
method: |
Finds the GitHub project according to
method: |
Downloads the project zip archive and writes it to the file descriptor archive_fd. |
method: |
Unpacks the project archive written to the file descriptor archive_fd into project_folder