All notable changes to PySS3 will be documented here.
- Quick fix of default compatibility with foreign languages (#15).
- Patches issue #11.
- `Dataset.load_from_files_multilabel()` can load documents with no labels as well (31251f8).
- A `set_testset_from_files_multilabel()` function was added to the `Live_Test` class. This function allows loading multi-label datasets from disk into the Live Test server (0ddbd6a).
- Fixed a bug in SS3 hyperparameter initialization (e2e72f9).
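The label-less documents mentioned above for `Dataset.load_from_files_multilabel()` can be pictured with a small standalone sketch. It is not PySS3's implementation: the helper `parse_label_lines`, the one-line-per-document layout, and the `;` separator are all assumptions made only for illustration.

```python
def parse_label_lines(lines, sep=";"):
    """Turn one line of labels per document into a list of label lists.

    Hypothetical helper (not part of PySS3). An empty or whitespace-only
    line yields an empty list, i.e. a document with no labels.
    """
    parsed = []
    for line in lines:
        labels = [label.strip() for label in line.split(sep)]
        # Drop the empty strings produced by blank lines or trailing separators.
        parsed.append([label for label in labels if label])
    return parsed


# Assumed layout: one document per line, labels separated by ";".
label_lines = ["sports;politics", "", "science"]
y = parse_label_lines(label_lines)
print(y)  # [['sports', 'politics'], [], ['science']]
```

The second document ends up with an empty label list rather than being skipped, which keeps the labels aligned one-to-one with the documents.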
PySS3 now fully supports multi-label classification! :)
- The `load_from_files_multilabel()` function was added to the `Dataset` class (7ece7ce, resolved #6)
- The `Evaluation` class now supports multi-label classification (resolved #5)
  - Add multi-label support to `train()`/`fit()` (4d00476)
  - Add multi-label support to `Evaluation.test()` (0a897dd)
  - Add multi-label support to `show_best()` and `get_best()` (ef2419b)
  - Add multi-label support to `kfold_cross_validation()` (aacd3a0)
  - Add multi-label support to `grid_search()` (925156d, 79f1e9d)
  - Add multi-label support to the 3D Evaluation Plot (42bbc65)
- The Live Test tool now supports multi-label classification as well (15657ee, b617bb7, resolved #9)
- Category names are no longer case-insensitive (4ec009a, resolved #8)
- The Live Test Tool now supports custom (user-defined) preprocessing methods (b50cfaf, 7c6b0c6, resolved #3).
- The process for recognizing word n-grams during classification was improved (2ceb148).
- The `predict` method was optimized. Now it is 10x to 200x faster! This improvement also has a positive impact on other methods that use `predict`, such as `grid_search` (37202d8).
- A new `get_ngrams_length` method was added to the `SS3` class. It can be used to get the length of the longest learned n-gram (b4f8827).
- The Evaluation 3D Plot's GUI was improved (1bb1e5a).
- A new `Evaluation` class was added to `pyss3.util` (8feeef5).
  - Now the user can import the `Evaluation` class to perform model evaluation and hyperparameter optimization. This class not only provides methods to evaluate models but also keeps all the advantages previously provided only through the Command Line tool, such as an evaluation cache that automatically keeps track of the evaluation history and the generation of the interactive 3D evaluation plot.
- `set_name()` was added to `SS3` (5b1c355).
- `train()` was added to `SS3` as a user-friendly alias of `fit()` (74cb540).
- Print now supports nested verbosity regions (78176ab).
- Compatibility of progress bars with Jupyter Notebooks (7848b3e, 8d163d9, 2029c37, 2a700d5).
- Bug in SS3.fit when given an empty document (31eccbc).
- Non-string category labels support (5b1c355).
- Issue with verbosity level consistency (b38d8b0).
- IndexError in classify_(multi)label (fa91952).
- Python 2 UnicodeEncodeError issue (867026e).
- Public methods for the SS3's `cv`, `gv`, `lv`, `sg` and `sn` functions have been added to the `SS3` class (ef35b25). These functions were originally defined in Section 3.2.2 of the original paper.
- Slightly improved training time (due to previously disabled 'by-default' cache of "local value" function).
- A bug in the HTTP Live Test Server (d106d68)
- Some bugs in the Command-Line tool (cd42b61, 8745603, dfe8b95)
Among other minor improvements and changes, the most important ones that were added are:
- `SS3` class:
  - The classifier now explicitly supports multi-label classification:
    - Created the following two methods in the `SS3` class: `classify_multilabel` and `classify_label` (0759bca).
    - A `multilabel` argument was added to the `predict` method (c5ac946).
  - A new `extract_insight()` method was added to the `SS3` class. This method, given a document, returns the pieces of text that were involved in the classification decision (eee1e29).
  - Created four new methods to allow the user to set the delimiters (b632fe0)
- Live Test tool:
- Improved how PySS3 handles verbosity levels (see 216be41 for more info): created the `set_verbosity()` function.
- Live Test: layout updated.
- PySS3 Command Line: `frange` function added as an alias of `r` for the `grid_search` command.
- PySS3 Command Line: live_test always launched the server with no documents (even when invoked as "live_test a/path")
- Live Test: sentences starting with "unknown" token were not included in the "Advanced" interactive chart
- Server: fixed bug that stopped the server when receiving arbitrary bytes (not utf-8 strings)
- PySS3 Command Line: fixed bug when loading live_test with a non-existing path
- Live Test: now the user can select single-letter words (which are also included in the "advanced" live chart)
- Summary operators are no longer static.
- `Server.set_testset_from_files` lazy load.
- Evaluation plot: confusion matrices size when working with k-folds
- `Dataset` class added to `pyss3.util` as an interface to help the user to load/read datasets. Method `Dataset.load_from_files` added
- Documentation updated
- PySS3 Command Line Python 2 full compatibility support
- Matplotlib set_yaxis bug fixed
- Dependencies and compatibility with Python 2 improved
- Setup and tests fixed
- Summary operators: now it is possible to use user-defined summary operators. The following static methods were added to the `SS3` class: `summary_op_ngrams`, `summary_op_sentences`, and `summary_op_paragraphs`.
- update: some docstrings were improved
- update: the README.md / Pypi Description file.
- Python 2 and 3 compatibility problem with scikit-learn (using version 0.20.1 from now on)
- PyPi: setup.py: `long_description_content_type` set to `'text/markdown'`
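
For readers unfamiliar with the last entry, the following is a minimal sketch of how a `setup.py` might declare a Markdown long description. It is not PySS3's actual `setup.py`: the helper `build_setup_kwargs`, the package name, and the version are hypothetical placeholders. Only `long_description_content_type='text/markdown'` comes from the entry above.

```python
from pathlib import Path


def build_setup_kwargs(readme_path="README.md"):
    """Collect keyword arguments that could be passed to setuptools.setup().

    Hypothetical helper for illustration only. Falls back to an empty
    description when the README file is missing.
    """
    readme = Path(readme_path)
    long_description = readme.read_text(encoding="utf-8") if readme.is_file() else ""
    return {
        "name": "example-package",  # hypothetical placeholder
        "version": "0.0.1",  # hypothetical placeholder
        "long_description": long_description,
        # Declares that the long description is written in Markdown.
        "long_description_content_type": "text/markdown",
    }


kwargs = build_setup_kwargs()
# In a real setup.py one would then call:
#     from setuptools import setup
#     setup(**kwargs)
```

Declaring the content type lets the package index render the README as Markdown instead of treating it as plain text or reStructuredText.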