NMTFcoclust decomposes a data matrix 𝐗 (for example, document-word counts, movie-viewer ratings, or product-customer purchases) into three matrices:
- 𝐅 (membership of rows in the row clusters)
- 𝐆 (membership of columns in the column clusters)
- 𝐒 (summary matrix relating row clusters to column clusters)
The low-rank approximation of 𝐗 is given by

$$\mathbf{X} \approx \mathbf{F}\mathbf{S}\mathbf{G}^{\top}$$

NMTFcoclust implements the proposed algorithm (OPNMTF) and several other NMTF algorithms, which differ in their objective functions.
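To make the roles of 𝐅, 𝐒, and 𝐆 concrete, here is a minimal sketch of a plain NMTF with a squared Frobenius loss and standard multiplicative updates. It is not OPNMTF: it has no 𝛼-divergence and no orthogonality penalties, and the function name `nmtf_frobenius`, its parameters, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def nmtf_frobenius(X, n_row_clusters, n_col_clusters, n_iter=200, eps=1e-10, seed=0):
    """Plain NMTF sketch: X ~ F @ S @ G.T under a squared Frobenius loss."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Random positive initialization keeps every factor non-negative.
    F = rng.random((n, n_row_clusters)) + eps
    S = rng.random((n_row_clusters, n_col_clusters)) + eps
    G = rng.random((m, n_col_clusters)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates: each factor is rescaled element-wise,
        # so non-negativity is preserved without any projection step.
        F *= (X @ G @ S.T) / (F @ (S @ (G.T @ G) @ S.T) + eps)
        G *= (X.T @ F @ S) / (G @ (S.T @ (F.T @ F) @ S) + eps)
        S *= (F.T @ X @ G) / ((F.T @ F) @ S @ (G.T @ G) + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T) ** 2)
    return F, S, G, errors

# Toy block-structured count matrix (hypothetical data, 60 rows x 30 columns).
rng = np.random.default_rng(1)
row_blocks = np.repeat([0, 1, 2], 20)
col_blocks = np.repeat([0, 1], 15)
means = np.array([[5.0, 0.5], [0.5, 5.0], [3.0, 3.0]])
X = rng.poisson(means[row_blocks][:, col_blocks]).astype(float)

F, S, G, errors = nmtf_frobenius(X, n_row_clusters=3, n_col_clusters=2)
row_labels = F.argmax(axis=1)  # row cluster = column of F with the largest weight
col_labels = G.argmax(axis=1)  # column cluster = column of G with the largest weight
print(F.shape, S.shape, G.shape)
```

Reading cluster labels off the largest entry in each row of 𝐅 and 𝐆 is one common convention; the package may assign labels differently.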
numpy==1.18.3
pandas==1.0.3
scipy==1.4.1
matplotlib==3.0.3
scikit-learn==0.22.2.post1
coclust==0.2.1
Datasets | #Documents | #Words | Sparsity (%) | Number of clusters |
---|---|---|---|---|
CSTR | 475 | 1000 | 96% | 4 |
WebACE | 2340 | 1000 | 91.83% | 20 |
Classic3 | 3891 | 4303 | 98% | 3 |
Sports | 8580 | 14870 | 99.99% | 7 |
Reviews | 4069 | 18483 | 99.99% | 5 |
RCV1_4Class | 9625 | 29992 | 99.75% | 4 |
NG20 | 19949 | 43586 | 99.99% | 20 |
20Newsgroups | 18846 | 26214 | 96.96% | 20 |
TDT2 | 9394 | 36771 | 99.64% | 30 |
RCV1_ori | 9625 | 29992 | 96.62% | 4 |
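The sparsity column can be checked for any of these matrices with a few lines of NumPy. The sketch below assumes that sparsity means the percentage of zero entries; the README does not state how the reported values were computed, and the helper name `sparsity_percent` is hypothetical.

```python
import numpy as np
from scipy import sparse

def sparsity_percent(X):
    """Percentage of zero entries in a dense array or a SciPy sparse matrix."""
    n_entries = X.shape[0] * X.shape[1]
    if sparse.issparse(X):
        nonzero = X.count_nonzero()
    else:
        nonzero = np.count_nonzero(X)
    return 100.0 * (1.0 - nonzero / n_entries)

# Toy matrix with 3 non-zero entries out of 12.
X_toy = np.array([[0, 2, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 3]])
print(sparsity_percent(X_toy))                     # dense input
print(sparsity_percent(sparse.csr_matrix(X_toy)))  # sparse input
```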
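import pandas as pd
import numpy as np
from scipy.io import loadmat
from sklearn.metrics import confusion_matrix

# Load the Classic3 document-word matrix (stored as a sparse matrix under key 'A')
file_name = r"NMTFcoclust\Dataset\Classic3\classic3.mat"
mydata = loadmat(file_name)
X_Classic3 = mydata['A'].toarray()
# Normalize the matrix so that all entries sum to 1
X_Classic3_sum_1 = X_Classic3 / X_Classic3.sum()
# Read the true document labels and shift them by one
true_labels = mydata['labels'].flatten().tolist()
true_labels = [x + 1 for x in true_labels]
# Comparing the labels with themselves puts the class sizes on the diagonal
print(confusion_matrix(true_labels, true_labels))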
Medical: [[1033 0 0]
Information Retrieval: [ 0 1460 0]
Aeronautical Systems: [ 0 0 1398]]
from NMTFcoclust.Models.NMTFcoclust_OPNMTF_alpha_2 import OPNMTF
from NMTFcoclust.Evaluation.EV import Process_EV

# Fit OPNMTF with 3 row clusters and 3 column clusters
OPNMTF_alpha = OPNMTF(n_row_clusters=3, n_col_clusters=3, landa=0.3, mu=0.3, alpha=0.4)
OPNMTF_alpha.fit(X_Classic3_sum_1)
# Evaluate the fitted model against the true labels
Process_Ev = Process_EV(true_labels, X_Classic3_sum_1, OPNMTF_alpha)
Accuracy (Acc):0.9100488306347982
Normalized Mutual Info (NMI):0.7703948803438703
Adjusted Rand Index (ARI):0.7641161476685447
Confusion Matrix (CM):
[[1033 0 0]
[ 276 1184 0]
[ 0 74 1324]]
Total Time: 26.558243700000276
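The reported accuracy can be cross-checked from the confusion matrix alone. The sketch below assumes that accuracy is the fraction of documents on the best one-to-one matching between true classes and predicted clusters, found with the Hungarian algorithm; how `Process_EV` computes it internally is not shown here, and the helper name `clustering_accuracy` is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(cm):
    """Accuracy under the best one-to-one matching of classes to clusters."""
    cm = np.asarray(cm)
    # Hungarian algorithm: pick one cell per row and column with maximal total count.
    rows, cols = linear_sum_assignment(cm, maximize=True)
    return cm[rows, cols].sum() / cm.sum()

# Confusion matrix printed above for OPNMTF on Classic3.
cm = np.array([[1033,    0,    0],
               [ 276, 1184,    0],
               [   0,   74, 1324]])
acc = clustering_accuracy(cm)
print(acc)  # 3541 correctly matched documents out of 3891
```

Under this assumption the result agrees with the Accuracy (Acc) value reported above. NMI and ARI need the label vectors rather than the matrix; scikit-learn provides `normalized_mutual_info_score` and `adjusted_rand_score` for that purpose.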
OPNMTF has also been applied to synthetic datasets generated from Bernoulli, Poisson, and truncated Gaussian distributions.
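For readers who want to experiment, the following sketch generates block-structured synthetic matrices from the three distribution families. The cluster sizes, block parameters, and helper names are assumptions chosen for illustration; they are not the settings used in the paper.

```python
import numpy as np

def block_parameters(row_labels, col_labels, block_values):
    """Expand a (row clusters x column clusters) table to one value per matrix cell."""
    return block_values[row_labels][:, col_labels]

def sample_truncated_gaussian(rng, mean, sd):
    """Gaussian truncated to [0, inf), sampled by redrawing negative values."""
    x = rng.normal(mean, sd)
    negative = x < 0
    while negative.any():
        x[negative] = rng.normal(mean[negative], sd)
        negative = x < 0
    return x

rng = np.random.default_rng(0)
# Hypothetical co-cluster structure: 3 row clusters and 3 column clusters.
row_labels = np.repeat([0, 1, 2], [30, 40, 30])
col_labels = np.repeat([0, 1, 2], [20, 20, 20])
rates = np.array([[6.0, 1.0, 1.0],
                  [1.0, 6.0, 1.0],
                  [1.0, 1.0, 6.0]])

M = block_parameters(row_labels, col_labels, rates)         # per-cell means
P = block_parameters(row_labels, col_labels, rates / 10.0)  # per-cell probabilities

X_bernoulli = rng.binomial(1, P)
X_poisson = rng.poisson(M)
X_trunc_gauss = sample_truncated_gaussian(rng, M, sd=1.0)
print(X_bernoulli.shape, X_poisson.shape, X_trunc_gauss.shape)
```

All three matrices are non-negative, so each can be passed to a non-negative factorization; `row_labels` and `col_labels` serve as the ground truth for evaluation.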
- Available from GitHub
- Available from ESWA
- Pre-review version
- Personalized URL providing 50 days' free access to the original article
- Industry Relations and Applications
- We propose a co-clustering algorithm, Orthogonal Parametric Non-negative Matrix Tri-Factorization (OPNMTF), which adds two penalty terms for controlling the orthogonality of row and column clusters based on 𝛼-divergence.
- We use the 𝛼-divergence as a measure of divergence between the observation matrix and the approximation matrix. This unification permits more flexibility in determining divergence measures by changing the value of 𝛼.
- Experiments on six real text datasets demonstrate the effectiveness of the proposed model compared to the state-of-the-art co-clustering methods.
- Our algorithm uses multiplicative update rules, and it converges.
- Adding two penalties for controlling the orthogonality of row and column clusters.
- Unifying a class of algorithms for co-clustering based on $\alpha$-divergence.
- All datasets and algorithm codes are available on GitHub in the NMTFcoclust repository.
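As an illustration of the 𝛼-divergence mentioned in the highlights, the sketch below implements one common parameterization (Amari's 𝛼-divergence for non-negative matrices). This form is an assumption; the exact definition and the penalty terms used by OPNMTF should be taken from the paper. The `orthogonality_gap` helper is only a simple Frobenius-norm diagnostic, not the paper's penalty, and both function names are hypothetical.

```python
import numpy as np

def alpha_divergence(A, B, alpha, eps=1e-12):
    """Amari alpha-divergence D_alpha(A || B) for non-negative arrays, alpha not in {0, 1}.

    Assumes alpha > 0 whenever A contains zeros (otherwise A ** alpha is infinite).
    """
    if np.isclose(alpha, 0.0) or np.isclose(alpha, 1.0):
        raise ValueError("alpha = 0 and alpha = 1 are limiting cases (KL divergences)")
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float) + eps  # guard against division by zero
    terms = alpha * A + (1.0 - alpha) * B - A ** alpha * B ** (1.0 - alpha)
    return terms.sum() / (alpha * (1.0 - alpha))

def orthogonality_gap(F):
    """Frobenius distance between F.T @ F and the identity (0 for orthonormal columns)."""
    return np.linalg.norm(F.T @ F - np.eye(F.shape[1]))

rng = np.random.default_rng(0)
A = rng.random((4, 5)) + 0.1
B = rng.random((4, 5)) + 0.1

d_same = alpha_divergence(A, A, alpha=0.4)  # close to zero for identical inputs
d_diff = alpha_divergence(A, B, alpha=0.4)  # positive when the inputs differ
d_pearson = alpha_divergence(A, B, alpha=2.0)

# Hard cluster memberships with unit-norm columns give orthonormal columns.
labels = np.array([0, 0, 1, 2, 2, 2])
F = np.eye(3)[labels]
F = F / np.linalg.norm(F, axis=0)
print(d_same, d_diff, orthogonality_gap(F))
```

With this parameterization, 𝛼 = 2 reduces to half the Pearson chi-square distance, and the limits 𝛼 → 1 and 𝛼 → 0 give the two generalized Kullback-Leibler divergences, which is the sense in which a single parameter covers a family of divergence measures.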
Please cite the following paper in your publication if you are using NMTFcoclust in your research:
@article{Saeid_OPNMTF_2023,
title= {Orthogonal Parametric Non-negative Matrix Tri-Factorization with 𝛼-Divergence for Co-clustering},
DOI= {10.1016/j.eswa.2023.120680},
volume= {231},
number= {120680},
journal= {Expert Systems with Applications},
author= {Saeid Hoseinipour and Mina Aminghafari and Adel Mohammadpour},
year= {2023}
}