GitHub - Saeidhoseinipour/NMTFcoclust: Co-clustering algorithms can seek homogeneous sub-matrices into a dyadic data matrix, such as a document-word matrix.

`NMTFcoclust` (Non-negative Matrix Tri-Factorization for Co-clustering)

NMTFcoclust decomposes a data matrix 𝐗 (for example, a document-word count matrix, a movie-viewer rating matrix, or a product-customer purchase matrix) into three matrices:

𝐅 (membership of rows in row clusters)
𝐆 (membership of columns in column clusters)
𝐒 (summary matrix linking row clusters to column clusters)

The low-rank approximation of 𝐗 is given by

$$\mathbf{X} \approx \mathbf{FSG}^{\top}$$
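The shapes involved in this factorization can be sketched with NumPy as follows. This is only an illustration on random toy data: the sizes `n_rows`, `n_cols`, `g`, and `s` are hypothetical, and reading hard cluster labels off with `argmax` is a common convention assumed here, not necessarily what the repository does.

```python
import numpy as np

rng = np.random.default_rng(0)

n_rows, n_cols = 6, 5          # toy data matrix size (hypothetical)
g, s = 3, 2                    # number of row clusters and column clusters

F = rng.random((n_rows, g))    # row memberships, shape (n_rows, g)
S = rng.random((g, s))         # summary matrix, shape (g, s)
G = rng.random((n_cols, s))    # column memberships, shape (n_cols, s)

# Low-rank approximation F S G^T has the same shape as the data matrix.
X_hat = F @ S @ G.T

# Assumed convention: each row/column goes to its largest membership.
row_labels = F.argmax(axis=1)
col_labels = G.argmax(axis=1)
```

The rank of `X_hat` is at most `min(g, s)`, which is what makes the approximation low-rank.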

Brief description of models

NMTFcoclust implements the proposed algorithm (OPNMTF) and several other NMTF algorithms, with the objective functions listed below:

OPNMTF

$$D_{\alpha}(\mathbf{X}||\mathbf{FSG}^{\top})+ \lambda \; D_{\alpha}(\mathbf{I}_{g}||\mathbf{F}^{\top}\mathbf{F})+ \mu \; D_{\alpha}(\mathbf{I}_{s}||\mathbf{G}^{\top}\mathbf{G})$$
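A minimal sketch of how this objective could be evaluated, assuming the α-divergence takes the standard form from Cichocki et al. (sum over entries of a^α b^(1-α) - αa + (α-1)b, divided by α(α-1)). The function names `alpha_divergence` and `opnmtf_objective` and the `eps` clipping are illustrative choices, not the repository's API.

```python
import numpy as np

def alpha_divergence(A, B, alpha, eps=1e-12):
    """Alpha-divergence D_alpha(A || B), summed over all entries.

    Assumes the Cichocki et al. form; valid for alpha not in {0, 1}.
    """
    A = np.asarray(A, dtype=float)
    # Clip B away from zero so that B ** (1 - alpha) stays finite.
    B = np.maximum(np.asarray(B, dtype=float), eps)
    terms = A**alpha * B**(1.0 - alpha) - alpha * A + (alpha - 1.0) * B
    return terms.sum() / (alpha * (alpha - 1.0))

def opnmtf_objective(X, F, S, G, lam, mu, alpha):
    """Data-fit term plus the two orthogonality penalties."""
    g, s = F.shape[1], G.shape[1]
    fit = alpha_divergence(X, F @ S @ G.T, alpha)
    row_penalty = alpha_divergence(np.eye(g), F.T @ F, alpha)
    col_penalty = alpha_divergence(np.eye(s), G.T @ G, alpha)
    return fit + lam * row_penalty + mu * col_penalty
```

Both penalties vanish when the columns of 𝐅 and 𝐆 are orthonormal, so larger values of λ and μ push the factors toward non-overlapping clusters.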

PNMTF

$$0.5||\mathbf{X}-\mathbf{F}\mathbf{S}\mathbf{G}^{\top}||^{2}+0.5 \tau \; Tr(\mathbf{F} \Psi_{g}\mathbf{F}^{\top})+0.5 \eta \; Tr(\mathbf{G} \Psi_{s}\mathbf{G}^{\top})+ 0.5 \gamma \; Tr(\mathbf{S}^{\top}\mathbf{S})$$

ONMTF

$$0.5 ||\mathbf{X}-\mathbf{F}\mathbf{S}\mathbf{G}^{\top}||^{2}$$

NBVD

$$||\mathbf{X}-\mathbf{FSG}^{\top}||^{2}$$

ONM3T

$$||\mathbf{X}-\mathbf{F}\mathbf{S}\mathbf{G}^{\top}||^{2}+ Tr(\Lambda (\mathbf{F}^{\top}\mathbf{F}-\mathbf{I}_{g}))+ Tr(\Gamma (\mathbf{G}^{\top}\mathbf{G}-\mathbf{I}_{s}))$$

ODNMTF

$$||\mathbf{X}-\mathbf{FF^{\top}XGG}^{\top}||^{2}+ Tr(\Lambda \mathbf{F}^{\top})+ Tr( \Gamma \mathbf{G}^{\top})$$

DNMTF

$$||\mathbf{X}-\mathbf{FF^{\top}XGG}^{\top}||^{2}$$
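The Frobenius-norm objectives above can be compared in a few lines. The sketch below (function names are hypothetical) shows that the DNMTF loss is the plain tri-factorization loss with the summary matrix fixed to 𝐒 = 𝐅ᵀ𝐗𝐆, and checks it on a toy block-constant matrix with normalized cluster indicators.

```python
import numpy as np

def frobenius_nmtf_loss(X, F, S, G):
    """Squared Frobenius error ||X - F S G^T||^2 (NBVD-style objective)."""
    return np.linalg.norm(X - F @ S @ G.T, "fro") ** 2

def dnmtf_loss(X, F, G):
    """||X - F F^T X G G^T||^2, i.e. the loss above with S = F^T X G."""
    S = F.T @ X @ G
    return frobenius_nmtf_loss(X, F, S, G)

# Toy 4 x 4 matrix made of four constant 2 x 2 blocks.
X = np.kron(np.array([[1.0, 2.0], [3.0, 4.0]]), np.ones((2, 2)))

# Normalized cluster indicators: two rows (columns) per cluster.
F = np.kron(np.eye(2), np.ones((2, 1))) / np.sqrt(2.0)
G = F.copy()

loss = dnmtf_loss(X, F, G)  # a perfect block structure is reproduced exactly
```

With these indicators, 𝐅𝐅ᵀ averages the rows within each cluster, so a block-constant matrix is left unchanged and the loss is zero up to rounding.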

Requirements

numpy==1.18.3
pandas==1.0.3
scipy==1.4.1
matplotlib==3.0.3
scikit-learn==0.22.2.post1
coclust==0.2.1

Datasets

Dataset	#Documents	#Words	Sparsity (%)	Number of clusters
CSTR	475	1000	96%	4
WebACE	2340	1000	91.83%	20
Classic3	3891	4303	98%	3
Sports	8580	14870	99.99%	7
Reviews	4069	18483	99.99%	5
RCV1_4Class	9625	29992	99.75%	4
NG20	19949	43586	99.99%	20
20Newsgroups	18846	26214	96.96%	20
TDT2	9394	36771	99.64%	30
RCV1_ori	9625	29992	96.62%	4
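The sparsity column can be reproduced from a loaded matrix. The sketch below assumes that "sparsity" means the percentage of zero entries, which the table does not state explicitly; `sparsity_percent` is a hypothetical helper name.

```python
import numpy as np
from scipy import sparse

def sparsity_percent(X):
    """Percentage of entries of X that are exactly zero."""
    n_rows, n_cols = X.shape
    if sparse.issparse(X):
        nonzero = X.count_nonzero()      # SciPy sparse matrix
    else:
        nonzero = np.count_nonzero(X)    # dense NumPy array
    return 100.0 * (1.0 - nonzero / (n_rows * n_cols))

# Toy document-word matrix: 2 documents, 4 words, 2 nonzero counts.
toy = sparse.csr_matrix(np.array([[0, 1, 0, 0],
                                  [0, 0, 0, 2]]))
toy_sparsity = sparsity_percent(toy)
```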

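import pandas as pd
import numpy as np
from scipy.io import loadmat
from sklearn.metrics import confusion_matrix

# Load the Classic3 data; the document-word matrix is stored sparse under key 'A'
file_name = r"NMTFcoclust\Dataset\Classic3\classic3.mat"
mydata = loadmat(file_name)

# Convert to a dense array and normalize so that all entries sum to 1
X_Classic3 = mydata['A'].toarray()
X_Classic3_sum_1 = X_Classic3 / X_Classic3.sum()

# Add 1 to every true label, then print the class sizes as a diagonal confusion matrix
true_labels = mydata['labels'].flatten().tolist()
true_labels = [x + 1 for x in true_labels]
print(confusion_matrix(true_labels, true_labels))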



 Medical:               [[1033    0     0]
 Information Retrieval: [   0  1460     0]
 Aeronautical Systems:  [   0    0   1398]]

Model

from NMTFcoclust.Models.NMTFcoclust_OPNMTF_alpha_2 import OPNMTF
from NMTFcoclust.Evaluation.EV import Process_EV

# Fit OPNMTF with 3 row clusters and 3 column clusters
OPNMTF_alpha = OPNMTF(n_row_clusters=3, n_col_clusters=3, landa=0.3, mu=0.3, alpha=0.4)
OPNMTF_alpha.fit(X_Classic3_sum_1)
# Evaluate the fitted model against the true labels
Process_Ev = Process_EV(true_labels, X_Classic3_sum_1, OPNMTF_alpha)



Accuracy (Acc):0.9100488306347982
Normalized Mutual Info (NMI):0.7703948803438703
Adjusted Rand Index (ARI):0.7641161476685447

Confusion Matrix (CM):
				[[1033    0    0]
				 [ 276 1184    0]
				 [   0   74 1324]]
Total Time:  26.558243700000276
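The reported accuracy can be recovered from the confusion matrix alone. The sketch below assumes that clustering accuracy is computed after optimally matching cluster labels to classes (the usual Hungarian-matching convention); `accuracy_from_confusion` is a hypothetical helper, not the repository's `Process_EV`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Confusion matrix copied from the output above.
cm = np.array([[1033,    0,    0],
               [ 276, 1184,    0],
               [   0,   74, 1324]])

def accuracy_from_confusion(cm):
    """Clustering accuracy under the best one-to-one label matching."""
    # Negate the counts because linear_sum_assignment minimizes total cost.
    rows, cols = linear_sum_assignment(-cm)
    return cm[rows, cols].sum() / cm.sum()

acc = accuracy_from_confusion(cm)  # 3541 / 3891, about 0.9100
```

The matched diagonal sums to 1033 + 1184 + 1324 = 3541 out of 3891 documents, which agrees with the accuracy printed above. NMI and ARI can be computed from label vectors with `sklearn.metrics.normalized_mutual_info_score` and `sklearn.metrics.adjusted_rand_score`.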

non-negative matrix tri-factorization,OPNMTF, Orthogonal Parametric, Text mining, Matrix factorization, Co-clustering, Saeid Hoseinipour, divergence, wordcloud

Download full-size image available in ESWA

Supplementary material

OPNMTF is also applied to synthetic datasets generated from Bernoulli, Poisson, and truncated Gaussian distributions.

Contributions

We propose a co-clustering algorithm, Orthogonal Parametric Non-negative Matrix Tri-Factorization (OPNMTF), which adds two penalty terms based on the 𝛼-divergence to control the orthogonality of row and column clusters.
We use the 𝛼-divergence as a measure of divergence between the observation matrix and the approximation matrix. This unification permits more flexibility in determining divergence measures by changing the value of 𝛼.
Experiments on six real text datasets demonstrate the effectiveness of the proposed model compared to the state-of-the-art co-clustering methods.
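To make the flexibility of 𝛼 concrete, the standard form of the 𝛼-divergence (as in Cichocki et al.; assumed here to be the one used by OPNMTF) and three well-known special cases are, for non-negative matrices with entries $a_{ij}$ and $b_{ij}$:

$$D_{\alpha}(\mathbf{A}||\mathbf{B}) = \frac{1}{\alpha(\alpha-1)} \sum_{i,j} \left( a_{ij}^{\alpha} b_{ij}^{1-\alpha} - \alpha a_{ij} + (\alpha-1) b_{ij} \right), \quad \alpha \notin \{0, 1\}$$

$$\alpha \to 1: \quad \sum_{i,j} \left( a_{ij} \log \frac{a_{ij}}{b_{ij}} - a_{ij} + b_{ij} \right) \quad \text{(Kullback-Leibler)}$$

$$\alpha = 2: \quad \frac{1}{2} \sum_{i,j} \frac{(a_{ij}-b_{ij})^{2}}{b_{ij}} \quad \text{(Pearson } \chi^{2}\text{)}$$

$$\alpha = 0.5: \quad 2 \sum_{i,j} \left( \sqrt{a_{ij}} - \sqrt{b_{ij}} \right)^{2} \quad \text{(Hellinger)}$$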

Highlights

Our algorithm uses multiplicative update rules and is convergent.
It adds two penalty terms to control the orthogonality of row and column clusters.
It unifies a class of co-clustering algorithms based on the $\alpha$-divergence.
All datasets and algorithm code are available in the NMTFcoclust repository on GitHub.
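As an illustration of what multiplicative update rules look like, the sketch below implements the classical updates for the plain squared Frobenius objective (the NBVD-style loss). It is not the OPNMTF update for the 𝛼-divergence with orthogonality penalties; the function name, defaults, and `eps` safeguard are assumptions made for this example.

```python
import numpy as np

def nmtf_multiplicative(X, g, s, n_iter=200, eps=1e-10, seed=0):
    """Multiplicative updates minimizing ||X - F S G^T||^2 with F, S, G >= 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, g)) + eps
    S = rng.random((g, s)) + eps
    G = rng.random((m, s)) + eps
    errors = []
    for _ in range(n_iter):
        # Each factor is scaled by (negative gradient part) / (positive part),
        # which keeps all entries non-negative without a step size.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T, "fro") ** 2)
    return F, S, G, errors

# Usage on a small random non-negative matrix.
X_toy = np.random.default_rng(1).random((20, 15))
F, S, G, errors = nmtf_multiplicative(X_toy, g=3, s=2)
```

Tracking `errors` across iterations is a simple way to check empirically that the reconstruction error decreases.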

Cite

Please cite the following paper in your publication if you are using NMTFcoclust in your research:

@article{Saeid_OPNMTF_2023,
    title=            {Orthogonal Parametric Non-negative Matrix Tri-Factorization with 𝛼-Divergence for Co-clustering},
    DOI=              {10.1016/j.eswa.2023.120680},
    volume=           {231},
    number=           {120680},
    journal=          {Expert Systems with Applications},
    author=           {Saeid Hoseinipour and Mina Aminghafari and Adel Mohammadpour},
    year=             {2023}
}

References

[1] Wang et al, Penalized nonnegative matrix tri-factorization for co-clustering (2017), Expert Systems with Applications.

[2] Yoo et al, Orthogonal nonnegative matrix tri-factorization for co-clustering: Multiplicative updates on Stiefel manifolds (2010), Information Processing and Management.

[3] Ding et al, Orthogonal nonnegative matrix tri-factorizations for clustering (2006), Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

[4] Long et al, Co-clustering by block value decomposition (2005), Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining.

[5] Labiod et al, Co-clustering under nonnegative matrix tri-factorization (2011), International Conference on Neural Information Processing.

[6] Li et al, Nonnegative Matrix Factorization on Orthogonal Subspace (2010), Pattern Recognition Letters.

[7] Li et al, Nonnegative Matrix Factorizations for Clustering: A Survey (2019), Data Clustering.

[8] Cichocki et al, Non-negative matrix factorization with $\alpha$-divergence (2008), Pattern Recognition Letters.

[9] Hoseinipour et al, Orthogonal parametric non-negative matrix tri-factorization with 𝛼-divergence for co-clustering (2023), Expert Systems with Applications.
