NonNegativeMatrixFactorization

Decompose a matrix into two non-negative matrix factors.

ResourceFunction["NonNegativeMatrixFactorization"][mat,k]

finds non-negative matrix factors for the matrix mat using k dimensions.

Details and Options

Non-negative Matrix Factorization (NNMF) is a matrix factorization method that reduces the dimensionality of a given matrix.
NNMF produces two factors: (1) the left factor is the reduced-dimensionality representation matrix; (2) the right factor is the corresponding new-basis matrix.
The argument matrix can be square or rectangular.
NNMF allows easier interpretation of extracted topics in the framework of Latent Semantic Analysis (LSA).
When using NNMF over collections of images or documents, often more than one execution is required in order to build confidence in the interpretation of the topics.
In comparison with the thin Singular Value Decomposition (SVD), NNMF provides non-orthogonal basis vectors that have non-negative coordinates.
Here are the options taken by ResourceFunction["NonNegativeMatrixFactorization"]:
"Epsilon"10^-9denominator (regularization) offset
MaxSteps200maximum iteration steps
“NonNegative”Trueshould the non-negativity be enforced at each step
"Normalization"Leftwhich normalization is applied to the factors
PrecisionGoalAutomaticprecision goal
“ProfilingPrints”Falseshould profiling times be printed out during execution
“RegularizationParameter”0.01regularization (multiplier) parameter
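
As a quick illustration (a minimal sketch; the concrete factor values depend on the random initialization), for an m×n matrix and k dimensions the left factor is m×k, the right factor is k×n, and their product approximates the original matrix:

SeedRandom[3];
a = RandomReal[{0, 10}, {5, 4}];
{w, h} = ResourceFunction["NonNegativeMatrixFactorization"][a, 2, MaxSteps -> 50];
{Dimensions[w], Dimensions[h]}  (* {5, 2} and {2, 4} *)
Norm[a - w.h]/Norm[a]           (* relative approximation error *)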

Examples

Basic Examples

Here is a random integer matrix:

In[1]:=
SeedRandom[7]
mat = RandomInteger[10, {4, 3}];
MatrixForm[mat]
Out[3]=

Here are the NNMF factors:

In[4]:=
{W, H} = ResourceFunction["NonNegativeMatrixFactorization"][mat, 2];
Row[{MatrixForm[W], MatrixForm[H]}]
Out[5]=

Here is the matrix product of the obtained factors:

In[6]:=
MatrixForm[W.H]
Out[6]=

Note that the element-wise relative errors between the original and the reconstructed matrix are small:

In[7]:=
MatrixForm[Round[(W.H - mat)/mat, 0.001]]
Out[7]=

Scope

Here is a random matrix with its first two columns having much larger magnitudes than the rest:

In[8]:=
SeedRandom[278];
mat = RandomReal[{20, 100}, {10, 2}];
mat2 = RandomReal[{10, 30}, {10, 7}];
mat = ArrayPad[mat, {{0, 0}, {0, 5}}] + mat2;
ListLinePlot[Transpose[mat], PlotRange -> All, ImageSize -> Medium, PlotTheme -> "Detailed"]
Out[9]=

Here we compute NNMF factors:

In[10]:=
SeedRandom[32];
{W, H} = ResourceFunction["NonNegativeMatrixFactorization"][mat, 3, MaxSteps -> 100, PrecisionGoal -> 8];

Here is the relative error of the approximation by the obtained matrix factorization:

In[11]:=
Norm[mat - W.H]/Norm[mat]
Out[11]=

Here is the relative error for the first three columns:

In[12]:=
Norm[mat[[All, 1 ;; 3]] - (W.H)[[All, 1 ;; 3]]]/
 Norm[mat[[All, 1 ;; 3]]]
Out[12]=

Here are comparison plots:

In[13]:=
Block[{opts = {PlotRange -> All, ImageSize -> 250, PlotTheme -> "Detailed"}}, Row[{ListLinePlot[Transpose[mat], opts, PlotLabel -> "Original"], ListLinePlot[Transpose[W.H], opts, PlotLabel -> "Reconstructed"]}]]
Out[13]=

Options

"Normalization"

NNMF factors can be normalized in two ways: (i) the Euclidean norms of the columns of the left factor are all equal to 1, and (ii) the Euclidean norms of the rows of the right factor are all equal to 1.
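
The two normalizations amount to moving a diagonal rescaling between the factors, which leaves the product unchanged. Here is a minimal sketch of the left normalization (assuming {W, H} are factors computed as in the examples above):

d = Norm /@ Transpose[W];     (* Euclidean norms of the columns of the left factor *)
Wn = W.DiagonalMatrix[1/d];   (* columns of Wn now have unit Euclidean norm *)
Hn = DiagonalMatrix[d].H;     (* compensating rescaling of the right factor *)
Chop[Norm[Wn.Hn - W.H]]       (* the product W.H is unchanged *)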

Here is a table that shows NNMF factors for different normalization specifications:

In[14]:=
Block[{mat},
 SeedRandom[7];
 mat = RandomInteger[10, {4, 3}];
 Multicolumn[
  Table[
   SeedRandom[121];
   {W, H} = ResourceFunction["NonNegativeMatrixFactorization"][mat, 2, "Normalization" -> nspec];
   Grid[{
     {Style[nspec, Bold, Blue], SpanFromLeft},
     {Labeled[MatrixForm[W], "W", Top], Labeled[MatrixForm[H], "H", Top]},
     {"Norm/@Transpose[W]:", Norm /@ Transpose[W]},
     {"Norm/@H:", Norm /@ H}},
    Dividers -> All, FrameStyle -> GrayLevel[0.8], Alignment -> Left
    ], {nspec, {Automatic, Left, Right, True, False, None}}],
  2, Dividers -> All]
 ]
Out[14]=

"RegularizationParameter"

The implemented NNMF algorithm is based on Gradient Descent; the regularization parameter controls the "learning rate" and can have a dramatic influence on the number of iteration steps and on the approximation precision.
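
The exact update used by the resource function is not reproduced here; the following is only a generic sketch of a projected gradient-descent iteration for mat ≈ W.H, with a plain step-size multiplier eta standing in for the "learning rate" mentioned above:

(* one projected gradient-descent step (a generic sketch, not the exact implemented update) *)
nnmfGDStep[v_, {w_, h_}, eta_] :=
  Module[{r = w.h - v},
   {Clip[w - eta*r.Transpose[h], {0, Infinity}],
    Clip[h - eta*Transpose[w].r, {0, Infinity}]}];

SeedRandom[7];
v = RandomReal[1, {4, 3}];
{w0, h0} = {RandomReal[1, {4, 2}], RandomReal[1, {2, 3}]};
{w1, h1} = Nest[nnmfGDStep[v, #, 0.01] &, {w0, h0}, 500];
Norm[v - w1.h1]/Norm[v]  (* relative approximation error after 500 steps *)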

Here we compute NNMF with different regularization multiplier parameters:

In[15]:=
res = Association[# -> BlockRandom[SeedRandom[22];
       ResourceFunction["NonNegativeMatrixFactorization"][mat, 3, "RegularizationParameter" -> #, MaxSteps -> 10, PrecisionGoal -> 3]] & /@ {0.01, 2}];

Plot the results:

In[16]:=
Grid[KeyValueMap[{#1, ListLinePlot[Transpose[#2[[1]]], PlotRange -> All, ImageSize -> 250, PlotTheme -> "Detailed"]} &, res], Alignment -> {Center, Left}]
Out[16]=

Here are the corresponding norms:

In[17]:=
Map[Norm[# - mat]/Norm[mat] &, Dot @@@ res]
Out[17]=

Applications

Topics extraction

One of the main motivations for developing NNMF algorithms is the easier interpretation of extracted topics in the framework of Latent Semantic Analysis. The following code illustrates the extraction of topics over a dataset of movie reviews.

Take a movie reviews dataset:

In[18]:=
movieReviews = ExampleData[{"MachineLearning", "MovieReview"}, "Data"];
Dimensions[movieReviews]
Out[16]=

Change the labels “positive” and “negative” to have the prefix “tag”:

In[19]:=
movieReviews[[All, 2]] = "tag:" <> # & /@ movieReviews[[All, 2]];

Concatenate the corresponding label to each review and make an association:

In[20]:=
aMovieReviews = AssociationThread[
   Range[Length[movieReviews]] -> Map[StringRiffle[List @@ #, " "] &, movieReviews]];
RandomSample[aMovieReviews, 2]
Out[21]=

Select movie reviews that are non-trivial:

In[22]:=
aMovieReviews = Select[aMovieReviews, StringQ[#] && StringLength[#] > 10 &];
RandomSample[aMovieReviews, 2]
Out[7]=

Split each review into words and delete stop words:

In[23]:=
aMovieReviews2 = DeleteStopwords[Select[StringSplit[#], StringLength[#] > 0 &]] & /@ aMovieReviews;

Convert the movie reviews association into a list of id-word pairs (“long form”):

In[24]:=
lsLongForm = Join @@ MapThread[Thread[{##}] &, Transpose[List @@@ Normal[aMovieReviews2]]];
RandomSample[lsLongForm, 4]
Out[13]=

Replace each word with its Porter stem:

In[25]:=
AbsoluteTiming[
 aStemRules = Dispatch[Thread[Rule[#, WordData[#, "PorterStem"] & /@ #]] &@
    Union[lsLongForm[[All, 2]]]];
 lsLongForm[[All, 2]] = lsLongForm[[All, 2]] /. aStemRules;
 ]
Out[25]=

Find the frequency of appearance of each unique word stem and pick the stems that appear more than 20 times:

In[26]:=
aTallies = Association[Rule @@@ Tally[lsLongForm[[All, 2]]]];
aTallies = Select[aTallies, # > 20 &];
Length[aTallies]
Out[28]=

Filter the id-word pairs list to contain only words that are "popular enough":

In[29]:=
lsLongForm = Select[lsLongForm, KeyExistsQ[aTallies, #[[2]]] && StringLength[#[[2]]] > 2 &];

Compute an id-word contingency matrix:

In[30]:=
ctObj = ResourceFunction["CrossTabulate"][lsLongForm, "Sparse" -> True];

Here is a sample of the contingency matrix as a dataset:

In[31]:=
SeedRandom[9938];
ResourceFunction["CrossTabulate"][RandomSample[lsLongForm, 12]]
Out[32]=

Visualize the contingency matrix:

In[33]:=
CTMatrixPlot[x_Association /; KeyExistsQ[x, "SparseMatrix"], opts___] := MatrixPlot[x["SparseMatrix"], Append[{opts}, FrameLabel -> {{Keys[x][[2]], None}, {Keys[x][[3]], None}}]];
CTMatrixPlot[ctObj]
Out[34]=

Take the contingency sparse matrix:

In[35]:=
matCT = N[ctObj["SparseMatrix"]];

Here is a plot that shows the Pareto Principle adherence of the selected popular words:

In[36]:=
ResourceFunction["ParetoPrinciplePlot"][Total[matCT, {1}]]
Out[36]=

Judging from the Pareto plot above, we should apply the Inverse Document Frequency (IDF) formula:

In[37]:=
matCT = matCT.SparseArray[
    DiagonalMatrix[Log[Dimensions[matCT][[1]]/Total[matCT, {1}]]]];

Normalize each row with Euclidean norm:

In[38]:=
matCT = matCT/Sqrt[Total[matCT*matCT, {2}]];

In order to make the computations faster, we take a sample sub-matrix:

In[39]:=
SeedRandom[8966]
matCT2 = matCT[[RandomSample[Range[Dimensions[matCT][[1]]], 4000], All]]
Out[40]=

Apply NNMF in order to extract 24 topics, using at most 12 iteration steps and normalizing the right factor:

In[41]:=
SeedRandom[23];
AbsoluteTiming[
 {W, H} = ResourceFunction["NonNegativeMatrixFactorization"][matCT2, 24, MaxSteps -> 12, "Normalization" -> Right];
 ]
Out[42]=

Show the extracted topics using the right factor (H):

In[43]:=
Multicolumn[
 Table[
  Column[{Style[ind, Blue, Bold], ColumnForm[
     Keys[TakeLargest[
       AssociationThread[
        ctObj["ColumnNames"] -> Normal[H[[ind, All]]]], 10]]]}],
  {ind, Dimensions[H][[1]]}
  ], 8, Dividers -> All]
Out[43]=

Statistical thesaurus

Here we show statistical thesaurus entries for random words, selected words, and the labels ("tag:positive", "tag:negative"):

In[44]:=
SeedRandom[898];
rinds = Flatten[
   Position[ctObj["ColumnNames"], #] & /@ Map[WordData[#, "PorterStem"] &, {"tag:positive", "tag:negative", "book", "amusing", "actor", "plot", "culture", "comedy", "director", "thoughtful", "epic", "film", "bad", "good"}]];
rinds = Sort@
   Join[rinds, RandomSample[Range[Dimensions[H][[2]]], 16 - Length[rinds]]];
Multicolumn[
 Table[
  Column[{Style[ctObj["ColumnNames"][[ind]], Blue, Bold], ColumnForm[
     ctObj["ColumnNames"][[
      Flatten@Nearest[Normal[Transpose[H]] -> "Index", H[[All, ind]], 12]]]]}],
  {ind, rinds}
  ], 8, Dividers -> All]
Out[20]=

Properties and Relations

NNMF should be compared with Singular Value Decomposition (SVD) and Independent Component Analysis (ICA).

Generate 3D points:

In[45]:=
SeedRandom[236];
n = 200;
c = 15;
points = Transpose[
   RandomVariate[NormalDistribution[0, #], n] & /@ {2, 4, 0.1}];
points = points.RotationMatrix[{{0, 0, 1}, {-1, 0, 1}}];
points = Map[{4, 8, 2} + # &, points];
points = Clip[points, {0, Max[points]}];
opts = {BoxRatios -> {1, 1, 1}, PlotRange -> Table[{c, -c}, 3], FaceGrids -> {{0, 0, -1}, {0, 1, 0}, {1, 0, 0}}, ViewPoint -> {-1.1, -2.43, 2.09}};
gr0 = ListPointPlot3D[points, opts];

Compute matrix factorizations using NNMF, SVD, and ICA and make a grid of comparison plots (rotate the graphics boxes for better perception):

In[46]:=
SeedRandom[236];
{W, H} = ResourceFunction["NonNegativeMatrixFactorization"][points, 2, "Normalization" -> Right];
grNNMF = Show[{ListPointPlot3D[points], Graphics3D[{Red, Arrow[{{0, 0, 0}, #}] & /@ (c*H/Norm[H])}]}, opts, PlotLabel -> "NNMF"];
{U, S, V} = SingularValueDecomposition[points, 2];
grSVD = Show[{ListPointPlot3D[points], Graphics3D[{Red, Arrow[{{0, 0, 0}, #}] & /@ (c*Transpose[V])}]}, opts, PlotLabel -> "SVD"];
{A, S} = ResourceFunction["IndependentComponentAnalysis"][points, 2];
grICA = Show[{ListPointPlot3D[points], Graphics3D[{Red, Arrow[{{0, 0, 0}, #}] & /@ (c*S/Norm[S])}]}, opts, PlotLabel -> "ICA"];
Grid[{{gr0, grSVD}, {grICA, grNNMF}}]
Out[31]=

Possible Issues

Slow computations

Because the Gradient Descent implementation invokes LinearSolve at every step, for larger data the computations are much slower than, say, the corresponding thin SingularValueDecomposition computations.

In most practical cases, though, NNMF does not have to be run for more than, say, 20 steps.
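
As a rough, machine-dependent illustration (the matrix size, number of dimensions, and step count below are arbitrary choices), one can compare the timing of NNMF with that of the thin SVD:

SeedRandom[3];
a = RandomReal[1, {300, 60}];
First@AbsoluteTiming[ResourceFunction["NonNegativeMatrixFactorization"][a, 10, MaxSteps -> 20];]
First@AbsoluteTiming[SingularValueDecomposition[a, 10];]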

Local minima issues

NNMF is prone to “get stuck” in local minima. Hence, it is a good idea to run it several times in order to verify the interpretation of the results. Different runs might use different values of the regularization parameter.
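
One simple check (a minimal sketch, assuming mat is the matrix from the Scope section and the seeds are arbitrary) is to run the factorization with several random seeds and compare the relative approximation errors:

Table[
 BlockRandom[
  SeedRandom[s];
  {w, h} = ResourceFunction["NonNegativeMatrixFactorization"][mat, 3, MaxSteps -> 20];
  Norm[mat - w.h]/Norm[mat]],
 {s, {11, 22, 33}}]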

Random initialization

NNMF uses random initialization of the left factor. Because of that, when multiple NNMF executions are done over the same data, in general the extracted bases will be different.

If an adequate number of topics and suitable values for the other execution parameters are used, the differences between the bases should be small; the basis vectors, though, are expected to be shuffled (permuted).
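
For example (a minimal sketch, assuming mat is the matrix from the Scope section and the seeds are arbitrary), two runs with different seeds typically give right factors whose rows match up to a permutation; with "Normalization" -> Right the rows of H have unit norm, so their pairwise dot products are cosine similarities:

{w1, h1} = BlockRandom[SeedRandom[11];
   ResourceFunction["NonNegativeMatrixFactorization"][mat, 3, "Normalization" -> Right]];
{w2, h2} = BlockRandom[SeedRandom[99];
   ResourceFunction["NonNegativeMatrixFactorization"][mat, 3, "Normalization" -> Right]];
MatrixForm[Outer[Dot, h1, h2, 1]]  (* a near-permutation pattern indicates matching but shuffled basis vectors *)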
