
In[]:=
Table[Labeled[RulePlot[CellularAutomaton[{n,2,1/2}],Appearance->"Bricks"],n],{n,0,15}]
Out[]=
[rule plots (Appearance -> "Bricks") for the sixteen k = 2, r = 1/2 rules, labeled 0 through 15]
In[]:=
Table[Labeled[RulePlot[CellularAutomaton[{n,2,1/2}],Appearance->"Bricks"],n],{n,0,15,2}]
Out[]=
[rule plots for the even-numbered rules 0, 2, 4, ..., 14]
{0,6,8,14}
In[]:=
(* minimal MNIST classifier: a single 10-unit linear layer plus softmax, reading the images at 5×5 resolution *)
net = NetChain[{10, SoftmaxLayer[]},
   "Input" -> NetEncoder[{"Image", {5, 5}, ColorSpace -> "Grayscale"}],
   "Output" -> NetDecoder[{"Class", Range[0, 9]}]];
trained = NetTrain[net, "MNIST"];
ClassifierMeasurements[trained, ResourceData["MNIST", "TestData"], "Accuracy"]
Out[]=
0.5053
In[]:=
trained
Out[]=
[NetChain summary: input port "image", output port "class"]

The Minimal MNIST

In[]:=
(* the same minimal 5×5 net, this time trained on the GPU *)
net = NetChain[{10, SoftmaxLayer[]},
   "Input" -> NetEncoder[{"Image", {5, 5}, ColorSpace -> "Grayscale"}],
   "Output" -> NetDecoder[{"Class", Range[0, 9]}]];
trained = NetTrain[net, "MNIST", TargetDevice -> "GPU"];
ClassifierMeasurements[trained, ResourceData["MNIST", "TestData"], "Accuracy"]
Out[]=
0.809
In[]:=
trained
Out[]=
[NetChain summary: input port "image", output port "class"]
In[]:=
MatrixPlot[Normal[NetExtract[trained,{1,"Weights"}]]]
Out[]=
In[]:=
MatrixPlot[Partition[#,5],FrameTicks->None,ImageSize->Tiny]&/@Normal[NetExtract[trained,{1,"Weights"}]]
Out[]=
[ten 5×5 weight templates shown as small matrix plots, one per digit class]
In[]:=
RandomSample[ResourceData["MNIST","TestData"],5]
Out[]=
[five random test images labeled 4, 1, 3, 9, 4]
In[]:=
First /@ %8  (* the random sample above; pull out just the images *)
Out[]=
[the five sampled digit images, without their labels]
In[]:=
trained/@%
Out[]=
{4,1,1,9,4}
In[]:=
(* the same architecture, but now reading the images at full 28×28 resolution *)
net = NetChain[{10, SoftmaxLayer[]},
   "Input" -> NetEncoder[{"Image", {28, 28}, ColorSpace -> "Grayscale"}],
   "Output" -> NetDecoder[{"Class", Range[0, 9]}]];
trained28 = NetTrain[net, "MNIST", TargetDevice -> "GPU"];
ClassifierMeasurements[trained28, ResourceData["MNIST", "TestData"], "Accuracy"]
Out[]=
0.9195
In[]:=
MatrixPlot[Partition[#,28],FrameTicks->None,ImageSize->Tiny]&/@Normal[NetExtract[trained28,{1,"Weights"}]]
Out[]=
[ten 28×28 weight templates shown as small matrix plots, one per digit class]
In[]:=
ListPlot3D[Partition[#,28],ColorFunction->"Rainbow",ImageSize->Small,Ticks->None,PlotRange->All]&/@Normal[NetExtract[trained28,{1,"Weights"}]]
Out[]=
[the same ten 28×28 weight templates rendered as 3D surface plots]
Purely multiplying by the template (elementwise), then finding the total, then softmax’ing (see the sketch below).
[This can be done purely optically....]
[ is this like “path kernels”? ] [ or “kernel method” ? ]
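A minimal sketch of that claim, assuming trained28 from above; the variable names (pixels, w, b, scores, probs) are mine, and the manual result should agree with the net itself up to details of the image encoder:
In[]:=
(* hand-rolled forward pass: elementwise multiply by each learned template,
   total, add the bias, then softmax *)
img = RandomSample[ResourceData["MNIST", "TestData"], 1][[1, 1]];
pixels = Flatten[ImageData[img]];
w = Normal[NetExtract[trained28, {1, "Weights"}]];   (* 10 templates of 784 values *)
b = Normal[NetExtract[trained28, {1, "Biases"}]];
scores = (Total[# pixels] & /@ w) + b;               (* template "match" scores *)
probs = Exp[scores]/Total[Exp[scores]];              (* softmax *)
{Range[0, 9][[First[Ordering[probs, -1]]]], trained28[img]}   (* manual vs. net prediction *)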

2D ICA

How much does it matter that the original image is gray-level rather than purely black and white?
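One way to probe this, as a sketch: binarize the images and retrain the same net. This assumes the MNIST resource also exposes a "TrainingData" element alongside "TestData"; binarizeAll is a helper name introduced here.
In[]:=
(* retrain the same minimal net on binarized images to see how much gray levels matter *)
binarizeAll[data_] := (Binarize[#[[1]]] -> #[[2]]) & /@ data;
binTrain = binarizeAll[ResourceData["MNIST", "TrainingData"]];
binTest = binarizeAll[ResourceData["MNIST", "TestData"]];
trainedBin = NetTrain[net, binTrain];
ClassifierMeasurements[trainedBin, binTest, "Accuracy"]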

Variation of Learned Weights

Look for changes in the weights across different training runs
19D parameter space [ see what the weight vectors look like in a dimension-reduced space ]
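A sketch of one way to look at this: train the same net several times, flatten each run’s weight matrix into a vector, and plot the runs in a dimension-reduced space. The number of runs (5) and the 2D target are arbitrary choices.
In[]:=
(* compare the learned weights from several independent trainings *)
runs = Table[NetTrain[net, "MNIST"], {5}];
weightVecs = Flatten[Normal[NetExtract[#, {1, "Weights"}]]] & /@ runs;
ListPlot[DimensionReduce[weightVecs, 2], PlotStyle -> PointSize[Large]]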

ICA for Lifetime

LCA for Lifetime

[ “recurrent in space” ]

Columnwise ICA

[ “recurrent in time” ]

CA Rule for Lifetime

[ RNN ]
[ “recurrent in space and time” ]

RNN + Feedforward ?

LCA where the layers repeat until they hit a stopping condition (e.g. a fixed point), or where one of the bits serves as a “terminated” flag
[ Could be individual cells in an ICA that run until they hit a certain value, then lock there until all cells have caught up ]
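A sketch of the “repeat until a fixed point” idea, with a plain CA rule standing in for the repeated layer; rule 254, width 40 and the 200-step cap are arbitrary illustrative choices.
In[]:=
(* iterate a single "layer" (here just a CA step) until the state stops changing *)
step[state_] := CellularAutomaton[254, state];
evolution = FixedPointList[step, RandomInteger[1, 40], 200];
ArrayPlot[evolution]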

CAs that Generate Output at Each Step

Or where the readout happens when a particular bit is set...
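A sketch of the “read out when a particular bit is set” idea; rule 30, the sizes, and using cell 1 as the flag are arbitrary choices.
In[]:=
(* evolve a CA and read out the whole state the first time the designated
   "flag" cell (here cell 1) becomes 1 *)
evolution = CellularAutomaton[30, RandomInteger[1, 40], 100];
flagStep = FirstPosition[evolution[[All, 1]], 1, Missing["NeverSet"]];
If[! MissingQ[flagStep], evolution[[First[flagStep]]], flagStep]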

Things To Learn

Lifetime
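It isn’t pinned down here what “lifetime” means; as a working assumption, take it to be the number of steps before a finite (cyclic) CA evolution stops changing. A sketch under that assumption, with rule 128 and the sizes as arbitrary choices:
In[]:=
(* ASSUMPTION: "lifetime" = number of CA steps before the state stops changing (capped at max) *)
lifetime[rule_, init_, max_] :=
  Length[FixedPointList[CellularAutomaton[rule, #] &, init, max]] - 2;
examples = Table[With[{init = RandomInteger[1, 20]}, init -> lifetime[128, init, 100]], {5}]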

Consensus

Fixed collection of patterns

Have it learn a fixed collection of patterns, then feed it
(a) patterns close to those
(b) random noise
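A sketch of generating the probe data; the pattern count, length, and the single-cell perturbation are arbitrary choices, and perturb is a helper name introduced here.
In[]:=
(* a fixed collection of binary patterns, nearby one-cell perturbations of them, and pure random noise *)
patterns = RandomInteger[1, {5, 20}];
perturb[p_] := MapAt[1 - # &, p, RandomInteger[{1, Length[p]}]];   (* flip one cell *)
nearby = perturb /@ RandomChoice[patterns, 10];
noise = RandomInteger[1, {10, 20}];
ArrayPlot /@ {patterns, nearby, noise}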

Doubling

Reversal

Autoencoder

E.g. encoding “blocks of size n” or “every nth bit” or “blocks of size n repeated m times”

Sequence-to-sequence learning

Translation from spatial initial condition to temporal output

[ “autoregressive system” ]
E.g. invert the cells
Sample the temporal output only every few steps ... then what can be done?
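A sketch of generating sequence-to-sequence training pairs of this kind: a random spatial initial condition mapped to the temporal output of one cell, sampled only every few steps. Rule 30, all the sizes, and the helper name makePair are arbitrary choices.
In[]:=
(* training pairs: spatial initial condition -> temporal output of one cell, sampled every few steps *)
makePair[rule_, width_, steps_, cell_, every_] :=
  Module[{init = RandomInteger[1, width]},
   init -> CellularAutomaton[rule, init, steps][[1 ;; -1 ;; every, cell]]];
pairs = Table[makePair[30, 20, 40, 10, 3], {100}];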

What Was Its Internal Representation of What It Learned?

It’s complicated, and not “explicable”
Multiway system: there are certain branches of how you could learn it

Objectives

What Happens Inside a Machine Learning System?

“It just happens to work” (you just run the whole thing and it works; there’s no computationally reducible way to describe it)
The thing it learned is not reducible. There might still be a reducible “engineered” solution
[ We should find some engineered solutions ]

In a Classifier What Are the Basins of Attraction?

How are they affected by the different particular “weights” that were learned?
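A hedged sketch of probing this empirically for the minimal MNIST net: add increasing amounts of noise to one test image and watch where the predicted class changes. The noise model ("GaussianNoise") and the levels are arbitrary choices; noisy examples near the class boundary hint at the shape of the basin.
In[]:=
(* probe the "basin" around one test image by adding increasing pixel noise *)
img = RandomSample[ResourceData["MNIST", "TestData"], 1][[1, 1]];
Table[{s, trained28[ImageEffect[img, {"GaussianNoise", s}]]}, {s, 0, 1, 0.1}]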

What Kinds of Functions Can Be Learned?

Additional Objectives

Can a purely discrete system do serious ML?

Is gradient descent relevant?

How do we combine recurrent computation with feedforward computation?

With our simpleminded training it’s perfectly conceivable to train something recurrent

Name for Systems

Array Learning
Cellular Learning Systems