# DACKAR

Digital Analytics, Causal Knowledge Acquisition and Reasoning for Technical Language Processing

Configuration for the Sphinx `docs/conf.py`:

```python
extensions = [
    'sphinx.ext.intersphinx',
    'sphinx.ext.autodoc',
    'sphinx.ext.doctest',
    'sphinx.ext.todo',
    'sphinx.ext.autodoc.typehints',
    'sphinx.ext.mathjax',
    'sphinx.ext.autosummary',
    'nbsphinx',             # Jupyter Notebook support
    'sphinx.ext.napoleon',  # Google-style docstrings
    'sphinx.ext.imgmath',
    'sphinx.ext.viewcode',
    'autoapi.extension',
    'sphinx_copybutton',
]

templates_path = ['_templates']
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
source_suffix = ['.rst', '.md']
autoapi_dirs = ['../src']

import sphinx_rtd_theme

html_theme = 'sphinx_rtd_theme'
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]

# -- nbsphinx options
# Do not execute the notebooks when building the docs
nbsphinx_execute = 'never'

autodoc_inherit_docstrings = False
```

## How to build the HTML documentation

```shell
pip install sphinx sphinx_rtd_theme nbsphinx sphinx-copybutton sphinx-autoapi
conda install pandoc
cd docs
make html
cd _build/html
python3 -m http.server
```

Open your browser at http://localhost:8000.
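Serving the built pages can also be scripted; a minimal sketch using only the Python standard library (the `serve` helper and its defaults are illustrative, not part of DACKAR):

```python
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

def serve(directory="_build/html", port=8000):
    """Serve the built HTML docs, equivalent to running
    `python3 -m http.server` from inside `directory`."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return HTTPServer(("localhost", port), handler)

# Usage (blocks until interrupted):
#   serve().serve_forever()
```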

## Installation

How to install the DACKAR libraries with spaCy 3.5:

- Install the dependency libraries:

  ```shell
  conda create -n dackar_libs python=3.11
  conda activate dackar_libs

  pip install spacy==3.5 textacy matplotlib nltk coreferee beautifulsoup4 networkx pysbd tomli numerizer autocorrect pywsd openpyxl quantulum3[classifier] numpy==1.26 scikit-learn pyspellchecker contextualSpellCheck pandas
  ```

- Download the language model from spaCy (cannot be done on the INL network):

  ```shell
  python -m spacy download en_core_web_lg
  python -m coreferee install en
  ```

- Install the required nltk data for similarity analysis:

  ```shell
  python -m nltk.downloader all
  ```
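`python -m nltk.downloader all` fetches the entire NLTK data collection. Since the manual fallback later in this section only stages `wordnet`, `averaged_perceptron_tagger`, and `punkt`, a targeted download may be enough; a hedged sketch (the helper name and package list are assumptions based on those three packages):

```python
# Sketch: fetch only the NLTK data used for the similarity analysis,
# instead of downloading everything with `python -m nltk.downloader all`.
# Any additional corpora DACKAR needs would have to be added to this list.
REQUIRED_NLTK_PACKAGES = ["wordnet", "averaged_perceptron_tagger", "punkt"]

def download_required(packages=tuple(REQUIRED_NLTK_PACKAGES)):
    import nltk  # imported lazily, so the list is usable without nltk installed
    for pkg in packages:
        nltk.download(pkg)
```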

### Different approach when there is an issue with SSLError

- Download the language model from spaCy: fetch `en_core_web_lg-3.5.0-py3-none-any.whl` from https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.5.0/en_core_web_lg-3.5.0-py3-none-any.whl, then install it:

  ```shell
  python -m pip install ./en_core_web_lg-3.5.0-py3-none-any.whl
  ```

- Download the coreferee model from https://github.com/richardpaulhudson/coreferee/tree/master/models/coreferee_model_en.zip, then install it:

  ```shell
  python -m pip install ./coreferee_model_en.zip
  ```

- Run the script `DACKAR/nltkDownloader.py` to download the nltk data:

  ```shell
  python nltkDownloader.py
  ```

Or check https://www.nltk.org/data.html on how to manually install the nltk data. For this project, users can also try the following steps:

```shell
cd ~
mkdir nltk_data
cd nltk_data
mkdir corpora
mkdir taggers
mkdir tokenizers
# Download wordnet, averaged_perceptron_tagger, and punkt, then copy them:
cp -r wordnet ~/nltk_data/corpora/
cp -r averaged_perceptron_tagger ~/nltk_data/taggers/
cp -r punkt ~/nltk_data/tokenizers/
```
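The directory layout above can also be created from Python; a minimal stdlib sketch (the `make_nltk_data_dirs` helper is illustrative, not part of DACKAR):

```python
from pathlib import Path

def make_nltk_data_dirs(home):
    """Create the ~/nltk_data/{corpora,taggers,tokenizers} layout that
    NLTK searches by default, and return the created directories."""
    base = Path(home) / "nltk_data"
    subdirs = [base / "corpora", base / "taggers", base / "tokenizers"]
    for d in subdirs:
        d.mkdir(parents=True, exist_ok=True)
    return subdirs
```

The downloaded `wordnet`, `averaged_perceptron_tagger`, and `punkt` folders then go into `corpora`, `taggers`, and `tokenizers` respectively, as in the `cp` commands above.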

## Old Installation Process

How to install the DACKAR libraries with spaCy 3.1:

- Install the dependency libraries:

  ```shell
  conda create -n nlp_libs python=3.9
  conda activate nlp_libs
  pip install spacy==3.1 textacy matplotlib nltk coreferee beautifulsoup4 networkx pysbd tomli numerizer autocorrect pywsd openpyxl quantulum3[classifier] numpy==1.26 scikit-learn==1.2.2 pyspellchecker
  ```

scikit-learn 1.2.2 is required for quantulum3

- Download the language model from spaCy (cannot be done on the INL network):

  ```shell
  python -m spacy download en_core_web_lg
  python -m coreferee install en
  ```

- Different approach when there is an issue with SSLError: download `en_core_web_lg-3.1.0.tar.gz` from https://github.com/explosion/spacy-models/releases/tag/en_core_web_lg-3.1.0, then install it:

  ```shell
  python -m pip install ./en_core_web_lg-3.1.0.tar.gz
  ```

- Download the coreferee model from https://github.com/richardpaulhudson/coreferee/tree/master/models/coreferee_model_en.zip, then install it:

  ```shell
  python -m pip install ./coreferee_model_en.zip
  ```

- You may need to install stemming for some of the unit parsing:

  ```shell
  pip install stemming
  ```

- Install typing_extensions < 4.6:

  ```shell
  pip install typing_extensions==4.5.*
  ```

- Install the required libraries and nltk data for similarity analysis:

  ```shell
  conda install -c conda-forge pandas
  python -m nltk.downloader all
  ```

- Different approach when there is an issue with SSLError:

As a first alternative, the following command can be used:

```shell
python nltkDownloader.py
```

If not successful, please check https://www.nltk.org/data.html on how to manually install the nltk data. For this project, users can try the following steps:

```shell
cd ~
mkdir nltk_data
cd nltk_data
mkdir corpora
mkdir taggers
mkdir tokenizers
# Download wordnet, averaged_perceptron_tagger, and punkt, then copy them:
cp -r wordnet ~/nltk_data/corpora/
cp -r averaged_perceptron_tagger ~/nltk_data/taggers/
cp -r punkt ~/nltk_data/tokenizers/
```
- Install the required library for preprocessing:

  ```shell
  pip install contextualSpellCheck
  ```