This library contains various utilities for parsing GitHub repositories into pairs of function definitions and docstrings. It is built on tree-sitter, which parses code into ASTs, and applies heuristics to extract metadata in more detail. It currently supports six languages: Python, Java, Go, PHP, Ruby, and JavaScript. For Python, it also parses function calls and links them to their definitions.
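To make "function definition and docstring pair" concrete, the sketch below extracts such pairs from Python source with the standard-library `ast` module. This is a swapped-in technique for illustration only. It is not tree-sitter, it handles only Python, and it is not this library's API. `function_docstring_pairs` is a hypothetical helper.

```python
import ast
from typing import List, Tuple


def function_docstring_pairs(source: str) -> List[Tuple[str, str]]:
    """Return (function name, docstring) pairs for every documented function in `source`."""
    pairs = []
    # ast.walk also visits methods and nested functions, not just top-level defs.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc is not None:  # skip functions without a docstring
                pairs.append((node.name, doc))
    return pairs


example = '''
def reset_uids():
    """Resets graph identifiers."""
    pass

def helper(x):
    return x
'''

print(function_docstring_pairs(example))
# [('reset_uids', 'Resets graph identifiers.')]
```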
Install the package with pip:

```
pip install function-parser
```
In order to use the library, you must download and build the tree-sitter language grammars it uses to parse source code. The library includes a handy CLI tool for setting this up. To download and build the grammars, run:

```
build_grammars
```

After installing with pip, this command downloads and builds the grammars into the directory where this Python library was installed.
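To confirm that the build step worked, a minimal sketch like the following locates the installed package directory with `importlib.util.find_spec` and checks for the compiled grammar file. It assumes the file is named `tree-sitter-languages.so`, the name the usage example loads from `function_parser.__path__[0]`. `grammar_library_path` is a hypothetical helper, not part of the library.

```python
import importlib.util
import os
from typing import Optional


def grammar_library_path(package: str = "function_parser",
                         filename: str = "tree-sitter-languages.so") -> Optional[str]:
    """Return where `filename` should live inside the installed `package`.

    Returns None if `package` cannot be found or is a plain module rather than a package.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = list(spec.submodule_search_locations)[0]
    return os.path.join(package_dir, filename)


path = grammar_library_path()
if path is None:
    print("function_parser is not installed")
elif os.path.exists(path):
    print("grammars found at", path)
else:
    print("grammar library missing; run build_grammars")
```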
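For example, to extract the Python function definitions from the keras-team/keras repository:

```python
import os

import function_parser
import pandas as pd
from function_parser.language_data import LANGUAGE_METADATA
from function_parser.process import DataProcessor
from tree_sitter import Language

language = "python"

# Load the grammar library built by `build_grammars` from the package directory.
# Note: Language(path, name) requires an older py-tree-sitter release
# (the path-based constructor was removed in py-tree-sitter 0.22).
DataProcessor.PARSER.set_language(
    Language(os.path.join(function_parser.__path__[0], "tree-sitter-languages.so"), language)
)

# Create a processor with the language-specific parser registered for Python.
processor = DataProcessor(
    language=language, language_parser=LANGUAGE_METADATA[language]["language_parser"]
)

# Extract function definitions from the repository, passing the file
# extension(s) registered for the language.
dependee = "keras-team/keras"
definitions = processor.process_dee(dependee, ext=LANGUAGE_METADATA[language]["ext"])

# Show the first five definitions (displayed as a table in a notebook).
pd.DataFrame(definitions).head()
```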
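`definitions` is passed straight to `pd.DataFrame`, so a natural assumption is that it is a list of dicts keyed by the column names shown in the table below. Under that assumption (not confirmed here), a sketch for pulling out documented functions might look like this. `documented` and the sample records are hypothetical. The first two records copy values from rows 3 and 4 of the table, and the third is made up.

```python
from typing import Dict, Iterable, List, Tuple


def documented(definitions: Iterable[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (identifier, docstring_summary) for definitions that have a summary."""
    pairs = []
    for record in definitions:
        summary = (record.get("docstring_summary") or "").strip()
        if summary:  # drop undocumented functions
            pairs.append((record["identifier"], summary))
    return pairs


# Hypothetical records: the first two copy values from rows 3 and 4 of the
# table; the third is made up to show an undocumented function being skipped.
sample = [
    {"identifier": "reset_uids", "docstring_summary": "Resets graph identifiers."},
    {"identifier": "clear_session", "docstring_summary": "Resets all state generated by Keras."},
    {"identifier": "hypothetical_helper", "docstring_summary": ""},
]

print(documented(sample))
# [('reset_uids', 'Resets graph identifiers.'), ('clear_session', 'Resets all state generated by Keras.')]
```

On a real run you would pass the output of `process_dee` instead of `sample`, provided its records really have this shape.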
| | nwo | sha | path | language | identifier | parameters | argument_list | return_statement | docstring | docstring_summary | docstring_tokens | function | function_tokens | url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | keras-team/keras | e43af6c89cd6c4adecc21ad5fc05b21e7fa9477b | keras/backend.py | python | backend | () | | return 'tensorflow' | Publicly accessible method for determining the... | Publicly accessible method for determining the... | [Publicly, accessible, method, for, determinin... | def backend():\n """Publicly accessible metho... | [def, backend, (, ), :, return, 'tensorflow'] | https://github.com/keras-team/keras/blob/e43af... |
| 1 | keras-team/keras | e43af6c89cd6c4adecc21ad5fc05b21e7fa9477b | keras/backend.py | python | cast_to_floatx | (x) | | return np.asarray(x, dtype=floatx()) | Cast a Numpy array to the default Keras float ... | Cast a Numpy array to the default Keras float ... | [Cast, a, Numpy, array, to, the, default, Kera... | def cast_to_floatx(x):\n """Cast a Numpy arra... | [def, cast_to_floatx, (, x, ), :, if, isinstan... | https://github.com/keras-team/keras/blob/e43af... |
| 2 | keras-team/keras | e43af6c89cd6c4adecc21ad5fc05b21e7fa9477b | keras/backend.py | python | get_uid | (prefix='') | | return layer_name_uids[prefix] | Associates a string prefix with an integer cou... | Associates a string prefix with an integer cou... | [Associates, a, string, prefix, with, an, inte... | def get_uid(prefix=''):\n """Associates a str... | [def, get_uid, (, prefix, =, '', ), :, graph, ... | https://github.com/keras-team/keras/blob/e43af... |
| 3 | keras-team/keras | e43af6c89cd6c4adecc21ad5fc05b21e7fa9477b | keras/backend.py | python | reset_uids | () | | | Resets graph identifiers. | Resets graph identifiers. | [Resets, graph, identifiers, .] | def reset_uids():\n """Resets graph identifie... | [def, reset_uids, (, ), :, PER_GRAPH_OBJECT_NA... | https://github.com/keras-team/keras/blob/e43af... |
| 4 | keras-team/keras | e43af6c89cd6c4adecc21ad5fc05b21e7fa9477b | keras/backend.py | python | clear_session | () | | | Resets all state generated by Keras.\n\n Kera... | Resets all state generated by Keras. | [Resets, all, state, generated, by, Keras, .] | def clear_session():\n """Resets all state ge... | [def, clear_session, (, ), :, global, _SESSION... | https://github.com/keras-team/keras/blob/e43af... |
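Judging by rows 3 and 4, `docstring_summary` appears to be the first paragraph of `docstring`, and `docstring_tokens` appear to split that summary into words and punctuation. The sketch below reproduces those two rows under that assumption. It is a guess at the behavior, not the library's actual implementation. `summarize` and `tokenize` are hypothetical names.

```python
import re
from typing import List

# Runs of word characters, or any single non-space, non-word character.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def summarize(docstring: str) -> str:
    """Return the first paragraph of a docstring, collapsed onto one line."""
    first_paragraph = re.split(r"\n\s*\n", docstring.strip(), maxsplit=1)[0]
    return " ".join(first_paragraph.split())


def tokenize(text: str) -> List[str]:
    """Split text into word tokens and single punctuation tokens."""
    return _TOKEN_RE.findall(text)


# Placeholder second paragraph; the real docstring is truncated in the table.
doc = "Resets all state generated by Keras.\n\n    Placeholder second paragraph."
summary = summarize(doc)
print(summary)            # Resets all state generated by Keras.
print(tokenize(summary))  # ['Resets', 'all', 'state', 'generated', 'by', 'Keras', '.']
```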