Module nmtf.modules.nmtf
Classes providing access to non-negative matrix and tensor factorization functions
Classes
class NMF (n_components=None, beta_loss='frobenius', use_hals=False, tol=1e-06, max_iter=150, max_iter_mult=20, leverage='standard', convex=None, kernel='linear', random_state=None, verbose=0)-
Initialize NMF model
Parameters
n_components:integer- Number of components, if n_components is not set : n_components = min(n_samples, n_features)
n_update_W:integer- Estimate last n_update_W components from initial guesses. If n_update_W is not set : n_update_W = n_components.
n_update_H:integer- Estimate last n_update_H components from initial guesses. If n_update_H is not set : n_update_H = n_components.
beta_loss:string, default 'frobenius'- String must be in {'frobenius', 'kullback-leibler'}. Beta divergence to be minimized, measuring the distance between X and the dot product WH. Note that values different from 'frobenius' (or 2) and 'kullback-leibler' (or 1) lead to significantly slower fits. Note that for beta_loss == 'kullback-leibler', the input matrix X cannot contain zeros. (A sketch of both divergences follows this parameter list.)
use_hals:boolean- True -> HALS algorithm (note that the convex and kullback-leibler loss options are not supported); False -> projected gradient
tol:float, default: 1e-6- Tolerance of the stopping condition.
max_iter:integer, default: 150- Maximum number of iterations.
max_iter_mult:integer, default: 20- Maximum number of iterations in multiplicative warm-up to projected gradient (beta_loss = 'frobenius' only).
leverage:None | 'standard' | 'robust', default 'standard'- Calculate leverage of W and H rows on each component.
convex:None | 'components' | 'transformation', default None- Apply convex constraint on W or H.
kernel:'linear', 'quadratic', 'radial', default 'linear'- Can be set if convex = 'transformation'.
random_state:int, RandomState instance or None, optional, default: None- If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
verbose:integer, default: 0- The verbosity level (0/1).
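For reference, here is a minimal numpy sketch of the two divergences named under beta_loss above, measured between X and the product WH; it is illustrative only and not the library's internal implementation:
>>> import numpy as np
>>> def frobenius_loss(X, WH):
...     # squared Frobenius norm of the residual (beta = 2)
...     return 0.5 * np.sum((X - WH) ** 2)
>>> def kullback_leibler_loss(X, WH):
...     # generalized Kullback-Leibler divergence (beta = 1); requires X without zeros, as noted above
...     return np.sum(X * np.log(X / WH) - X + WH)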
Returns
NMF model
Example
>>> from nmtf import *
>>> myNMFmodel = NMF(n_components=4)
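The constructor options documented above can be combined; the following sketch is purely illustrative (the parameter values are examples, not recommendations):
>>> from nmtf import *
>>> # Kullback-Leibler loss with robust leverage; use_hals is left at its default (False)
>>> # because the HALS algorithm does not support the 'kullback-leibler' loss
>>> myNMFmodel = NMF(n_components=4, beta_loss='kullback-leibler',
...                  leverage='robust', random_state=42, verbose=1)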
References
P. Fogel, D.M. Hawkins, C. Beecher, G. Luta, S. S. Young (2013). A Tale of Two Matrix Factorizations. The American Statistician, Vol. 67, Issue 4.
C. H.Q. Ding et al. (2010). Convex and Semi-Nonnegative Matrix Factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, Issue 1.
Methods
def fit_transform(self, X, W=None, H=None, update_W=True, update_H=True, n_bootstrap=None, regularization=None, sparsity=0, skewness=False, null_priors=False)-
Compute Non-negative Matrix Factorization (NMF)
Find two non-negative matrices (W, H) such that X = W @ H.T + Error. This factorization can be used, for example, for dimensionality reduction, source separation or topic extraction.
The objective function is minimized with an alternating minimization of W and H.
Parameters
X:array-like, shape (n_samples, n_features)- Constant matrix.
W:array-like, shape (n_samples, n_components)- Prior W. If n_update_W == 0, it is used as a constant, to solve for H only.
H:array-like, shape (n_features, n_components)- Prior H. If n_update_H == 0, it is used as a constant, to solve for W only.
update_W:boolean, default: True- Update or keep W fixed
update_H:boolean, default: True- Update or keep H fixed
n_bootstrap:integer, default: 0- Number of bootstrap runs.
regularization:None | 'components' | 'transformation'- Select whether the regularization affects the components (H), the transformation (W) or none of them.
sparsity:float, default: 0- Sparsity target with 0 <= sparsity < 1 representing either:
- the % of rows in W or H set to 0 (when use_hals = False)
- the mean % of rows per column in W or H set to 0 (when use_hals = True)
sparsity == 1: adaptive sparsity through hard thresholding and hhi
skewness:boolean, default False- When solving mixture problems, columns of X at the extremities of the convex hull will be given the largest weights. The column weight is a function of the skewness and its sign. The expected sign of the skewness is based on the skewness of W components, as returned by the first pass of a two-step convex NMF. Thus, during the first pass, skewness must be set to False. Can be set only if convex = 'transformation' and prior W and H have been defined.
null_priors:boolean, default False- Cells of H with prior cells = 0 will not be updated. Can be set only if prior H has been defined.
Returns
Estimator (dictionary) with the following entries:
W:array-like, shape (n_samples, n_components)- Solution to the non-negative least squares problem.
H:array-like, shape (n_components, n_features)- Solution to the non-negative least squares problem.
volume:scalar- Volume occupied by W and H.
WB:array-like, shape (n_samples, n_components)- A sample is clustered in cluster k if its leverage on component k is higher than on any other component. During each run of the bootstrap, samples are re-clustered. Each row of WB contains the frequencies of the n_components clusters following the bootstrap. Only if n_bootstrap > 0.
HB:array-like, shape (n_components, n_features)- A feature is clustered in cluster k if its leverage on component k is higher than on any other component. During each run of the bootstrap, features are re-clustered. Each row of HB contains the frequencies of the n_components clusters following the bootstrap. Only if n_bootstrap > 0.
B:array-like, shape (n_observations, n_components) or (n_features, n_components)- Only if the convex variant is active; H = B.T @ X or W = X @ B.
diff:scalar- Objective minimum achieved.
Example
>>> from nmtf import *
>>> myNMFmodel = NMF(n_components=4)
>>> # M: matrix to be factorized
>>> estimator = myNMFmodel.fit_transform(M)
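The parameters above can be combined in a single call; a hedged sketch with bootstrap, regularization and a sparsity target (the values are illustrative, not recommendations):
>>> estimator = myNMFmodel.fit_transform(M, n_bootstrap=10,
...                                      regularization='components', sparsity=0.5)
>>> # With n_bootstrap > 0, the bootstrap cluster frequencies are also returned
>>> # (the WB and HB entries described under Returns above).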
References
P. Fogel, D.M. Hawkins, C. Beecher, G. Luta, S. S. Young (2013). A Tale of Two Matrix Factorizations. The American Statistician, Vol. 67, Issue 4.
C. H.Q. Ding et al. (2010). Convex and Semi-Nonnegative Matrix Factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, Issue 1.
def permutation_test_score(self, estimator, y, n_permutations=100)-
Derives from the factorization result the significance of the association between clusters and a known group variable, using a permutation test
Parameters
estimator:dictionary as returned by fit_transform
y:array-like- Group to be predicted
n_permutations:integer, default: 100
Returns
Completed estimator with the following entries:
score:float- The true score without permuting targets.
pvalue:float- The p-value, which approximates the probability that the score would be obtained by chance.
CS:array-like, shape (n_components)- The size of each cluster
CP:array-like, shape (n_components)- The p-value of the most significant group within each cluster
CG:array-like, shape (n_components)- The index of the most significant group within each cluster
CN:array-like, shape (n_components, n_groups)- The size of each group within each cluster
Example
>>> from nmtf import *
>>> myNMFmodel = NMF(n_components=4)
>>> # M: matrix to be factorized
>>> estimator = myNMFmodel.fit_transform(M)
>>> # sampleGroup: the group each sample is associated with
>>> estimator = myNMFmodel.permutation_test_score(estimator, sampleGroup, n_permutations=100)
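Assuming the estimator dictionary keys match the entry names listed under Returns above, the significance results could then be inspected as in this sketch:
>>> estimator['pvalue']   # probability the score would be obtained by chance
>>> estimator['CP']       # p-value of the most significant group within each cluster
>>> estimator['CG']       # index of that group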
def predict(self, estimator, blocks=None, cluster_by_stability=False, custom_order=False)-
Derives ordered sample and feature indexes from the factorization result, for future use in ordered heatmaps
Parameters
estimator:dictionary as returned by fit_transform
blocks:array-like, shape (n_blocks), default None- Size of each block (if any) in the ordered heatmap.
cluster_by_stability:boolean, default False- Use stability instead of leverage to assign samples/features to clusters
custom_order:boolean, default False- If False, samples/features with the highest leverage or stability appear on top of each cluster. If True, the within-cluster ordering is modified to suggest a continuum between adjacent clusters.
Returns
Completed estimator with the following entries:
WL:array-like, shape (n_samples, n_components)- Sample leverage on each component
HL:array-like, shape (n_features, n_components)- Feature leverage on each component
QL:array-like, shape (n_blocks, n_components)- Block leverage on each component (NTF only)
WR:vector-like, shape (n_samples)- Ranked sample indexes (by cluster and leverage or stability). Used to produce ordered heatmaps.
HR:vector-like, shape (n_features)- Ranked feature indexes (by cluster and leverage or stability). Used to produce ordered heatmaps.
WN:vector-like, shape (n_components)- Sample cluster bounds in ordered heatmap
HN:vector-like, shape (n_components)- Feature cluster bounds in ordered heatmap
WC:vector-like, shape (n_samples)- Sample assigned cluster
HC:vector-like, shape (n_features)- Feature assigned cluster
QC:vector-like, shape (size(blocks))- Block assigned cluster (NTF only)
Example
>>> from nmtf import *
>>> myNMFmodel = NMF(n_components=4)
>>> # M: matrix to be factorized
>>> estimator = myNMFmodel.fit_transform(M)
>>> estimator = myNMFmodel.predict(estimator)
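A sketch of how the ranked indexes returned by predict might be used to reorder M for an ordered heatmap, assuming M is a numpy array and that the WR and HR entries can be used as integer index vectors:
>>> import numpy as np
>>> row_order = np.asarray(estimator['WR'], dtype=int)
>>> col_order = np.asarray(estimator['HR'], dtype=int)
>>> M_ordered = M[row_order, :][:, col_order]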
class NTF (n_components=None, fast_hals=False, n_iter_hals=2, n_shift=0, unimodal=False, smooth=False, apply_left=False, apply_right=False, apply_block=False, tol=1e-06, max_iter=150, leverage='standard', random_state=None, init_type=1, verbose=0)-
Initialize NTF model
Parameters
n_components:integer- Number of components, if n_components is not set : n_components = min(n_samples, n_features)
fast_hals:boolean, default: False- Use fast implementation of HALS
n_iter_hals:integer, default: 2- Number of HALS iterations prior to fast HALS
n_shift:integer, default: 0- max shifting in convolutional NTF
unimodal:Boolean, default: False
smooth:Boolean, default: False
apply_left:Boolean, default: False
apply_right:Boolean, default: False
apply_block:Boolean, default: False
tol:float, default: 1e-6- Tolerance of the stopping condition.
max_iter:integer, default: 150- Maximum number of iterations.
leverage:None | 'standard' | 'robust', default'standard'- Calculate leverage of W and H rows on each component.
random_state:int, RandomState instance or None, optional, default: None- If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
init_type:integer, default 1- init_type = 1: NMF initialization applied on the reshaped matrix [vectorized (1st & 2nd dim) x 3rd dim]; init_type = 2: NMF initialization applied on the reshaped matrix [1st dim x vectorized (2nd & 3rd dim)]
verbose:integer, default: 0- The verbosity level (0/1).
Returns
NTF model
Example
>>> from nmtf import *
>>> myNTFmodel = NTF(n_components=4)
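As with NMF, the constructor options documented above can be combined; the following sketch is illustrative only (parameter values are examples, not recommendations):
>>> from nmtf import *
>>> # fast HALS after 2 plain HALS iterations, with a unimodality constraint
>>> myNTFmodel = NTF(n_components=4, fast_hals=True, n_iter_hals=2,
...                  unimodal=True, random_state=0)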
Reference
A. Cichocki, P.H.A.N. Anh-Huym, Fast local algorithms for large scale nonnegative matrix and tensor factorizations, IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 92 (3) (2009) 708–721.
Methods
def fit_transform(self, X, n_blocks, n_bootstrap=None, regularization=None, sparsity=0, W=None, H=None, Q=None, update_W=True, update_H=True, update_Q=True)-
Compute Non-negative Tensor Factorization (NTF)
Find three non-negative matrices (W, H, Q) such that X = W @@ H @@ Q + Error (@@ = tensor product). This factorization can be used, for example, for dimensionality reduction, source separation or topic extraction.
The objective function is minimized with an alternating minimization of W and H.
Parameters
X:array-like, shape (n_samples, n_features x n_blocks)- Constant matrix. X is a tensor with shape (n_samples, n_features, n_blocks), however unfolded along 2nd and 3rd dimensions.
n_blocks:integer- Number of blocks defining the 3rd dimension of the tensor.
n_bootstrap:integer- Number of bootstrap runs.
regularization:None | 'components' | 'transformation'- Select whether the regularization affects the components (H), the transformation (W) or none of them.
sparsity:float, default: 0- Sparsity target with 0 <= sparsity < 1 representing either:
- the % of rows in W or H set to 0 (when use_hals = False)
- the mean % of rows per column in W or H set to 0 (when use_hals = True)
sparsity == 1: adaptive sparsity through hard thresholding and hhi
W:array-like, shape (n_samples, n_components)- Prior W.
H:array-like, shape (n_features, n_components)- Prior H.
Q:array-like, shape (n_blocks, n_components)- Prior Q.
update_W:boolean, default: True- Update or keep W fixed
update_H:boolean, default: True- Update or keep H fixed
update_Q:boolean, default: True- Update or keep Q fixed
Returns
Estimator (dictionary) with the following entries:
W:array-like, shape (n_samples, n_components)- Solution to the non-negative least squares problem.
H:array-like, shape (n_features, n_components)- Solution to the non-negative least squares problem.
Q:array-like, shape (n_blocks, n_components)- Solution to the non-negative least squares problem.
volume:scalar- Volume occupied by W and H.
WB:array-like, shape (n_samples, n_components)- Percent of consistently clustered rows for each component. Only if n_bootstrap > 0.
HB:array-like, shape (n_features, n_components)- Percent of consistently clustered columns for each component. Only if n_bootstrap > 0.
diff:scalar- Objective minimum achieved.
Example
>>> from nmtf import *
>>> myNTFmodel = NTF(n_components=4)
>>> # M: tensor with 5 blocks to be factorized
>>> estimator = myNTFmodel.fit_transform(M, 5)
Reference
A. Cichocki, P.H.A.N. Anh-Huym, Fast local algorithms for large scale nonnegative matrix and tensor factorizations, IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 92 (3) (2009) 708–721.
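Building on the example above, a hedged sketch of a fuller call; the keyword usage and the dictionary keys are assumptions based on the signature and the Returns entries listed above:
>>> estimator = myNTFmodel.fit_transform(M, n_blocks=5, n_bootstrap=10, sparsity=0.5)
>>> W, H, Q = estimator['W'], estimator['H'], estimator['Q']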
def permutation_test_score(self, estimator, y, n_permutations=100)-
See function description in class NMF
def predict(self, estimator, blocks=None, cluster_by_stability=False, custom_order=False)-
See function description in class NMF