Lab 6: Machine Learning
NetID: <Please fill in>
Note: Please do not rush through the notebook by evaluating every cell with Shift+Enter. If you do not understand the logical progression of how we are solving the task, chances are you will run into more frustrating errors.
Part 1: Recommending Music
Remember we discussed in class that any entity can be represented in n-dimensional featurespace.
In this part, we will be working with songs or musical tracks from Spotify.
We will use Spotify’s “Audio Features” data to map different songs to points on a 2D feature space.
This data has been taken from three Spotify playlists from the genres classical, electronic and rap.
Setting up the data
Import the data:
In[]:=
audioFeaturesRawData=Import["https://www.wolframcloud.com/obj/abritac/Published/ECE101/spotify_data.csv"];
Let’s look at the size of the data:
In[]:=
Dimensions[audioFeaturesRawData]
This means the dataset has 201 rows and 23 columns.
Each row corresponds to a track from Spotify.
Each column represents a specific feature of the track, e.g. name of the artist, name of song, name of the album and some other information.
Here are the different features of the first track in the dataset:
Out[]=
danceability | 0.721 |
energy | 0.738 |
key | 7 |
loudness | -4.77 |
mode | 1 |
speechiness | 0.0403 |
acousticness | 0.00226 |
instrumentalness | 4.41×10^-6 |
liveness | 0.118 |
valence | 0.637 |
tempo | 119.976 |
type | audio_features |
id | 4qu63nuBpdn0qHUHuObEj1 |
uri | spotify:track:4qu63nuBpdn0qHUHuObEj1 |
track_href | https://api.spotify.com/v1/tracks/4qu63nuBpdn0qHUHuObEj1 |
analysis_url | https://api.spotify.com/v1/audio-analysis/4qu63nuBpdn0qHUHuObEj1 |
duration_ms | 154983 |
time_signature | 4 |
song | Leave Before You Love Me (with Jonas Brothers) |
artist | Marshmello |
album | Leave Before You Love Me |
name | Leave Before You Love Me (with Jonas Brothers), by Marshmello |
The very first row in the dataset contains the headers for each column in the dataset (this tells you the names of the different features being used to represent the track):
In[]:=
featureNames=audioFeaturesRawData[[1]]
We will use only some of the features in this dataset. The following features seem useful based on the descriptions provided at https://developer.spotify.com/documentation/web-api/reference/get-audio-features
◼
(2) danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
◼
(3) energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
◼
(5) loudness: The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks.
◼
(7) speechiness: Speechiness detects the presence of spoken words in a track.
◼
(8) acousticness: A measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
◼
(9) instrumentalness: Whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context.
◼
(10) liveness: Detects the presence of an audience in the recording.
◼
(11) valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track.
◼
(12) tempo: The pace of the music, measured in beats per minute (BPM).
◼
(18) duration_ms: The duration of the track in milliseconds.
◼
(19) time_signature: An estimated overall time signature of a track. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure).
Let’s pull out a few of the features that might be able to characterize the track well (just the names of the features, not the actual sample values):
In[]:=
usefulFeatures=featureNames[[{2,3,5,7,8,9,10,11,12,18,19}]]
Now let’s get the actual feature values for each track in the dataset--values corresponding to the features we selected above.
In[]:=
dataFeatures=audioFeaturesRawData[[2;;,{2,3,5,7,8,9,10,11,12,18,19}]]
You can see how each song or track is now represented by a list of numbers.
There is some other information related to each track that may become useful in labeling the tracks for better visualization:
(20) song: name of the song/track
(21) artist: name of the artist
We can use the name of the track for labeling each sample:
In[]:=
trackNames=audioFeaturesRawData[[2;;,20]]
Let’s look at this numeric representation of the first track in the dataset and its name:
In[]:=
trackNames[[1]]->dataFeatures[[1]]
Let’s look at this numeric representation of the fifth track in the dataset and its name:
In[]:=
trackNames[[5]]->dataFeatures[[5]]
Let’s look at this numeric representation of the 200th track in the dataset and its name:
In[]:=
trackNames[[200]]->dataFeatures[[200]]
Exploring the tracks in feature space
The following are the feature columns in our dataset now (because we selected a few of the features that we thought would be useful and dropped the rest):
Out[]=
1 | danceability |
2 | energy |
3 | loudness |
4 | speechiness |
5 | acousticness |
6 | instrumentalness |
7 | liveness |
8 | valence |
9 | tempo |
10 | duration_ms |
11 | time_signature |
“Danceability” is the first feature, “energy” is the second feature, “loudness” is the third feature, and so on.
Let’s pull out the data corresponding to “danceability” (feature 1) and “loudness” (feature 3) for each track, and visualize the samples in only these two dimensions.
In[]:=
(* pull out columns 1 and 3 from the dataset *)
tracksDanceabilityLoudness = dataFeatures[[All, {1, 3}]];
(* label the data for ease of viewing *)
labeledTracksDanceabilityLoudness = MapThread[Tooltip[#1, #2] &, {tracksDanceabilityLoudness, trackNames}];
(* visualize in 2 dimensions *)
ListPlot[labeledTracksDanceabilityLoudness, Frame -> True, FrameLabel -> usefulFeatures[[{1, 3}]]]
Let’s recreate the visualization for “danceability” (feature 1) and “speechiness” (feature 4).
In[]:=
(* pull out columns 1 and 4 from the dataset *)
tracksDanceabilitySpeechiness = dataFeatures[[All, {1, 4}]];
(* label the data for ease of viewing *)
labeledTracksDanceabilitySpeechiness = MapThread[Tooltip[#1, #2] &, {tracksDanceabilitySpeechiness, trackNames}];
(* visualize in 2 dimensions *)
ListPlot[labeledTracksDanceabilitySpeechiness, Frame -> True, FrameLabel -> usefulFeatures[[{1, 4}]]]
Problem 1
Visualize the tracks in two dimensions for any two other features (“energy”,“acousticness”,“instrumentalness”,“liveness”,“valence”,“tempo”,“duration_ms”,“time_signature”)
(* Enter your code below *)
Problem 2
Here is a more interactive visualization of the audio features:
Change the x- and y-axes so that it becomes easy to differentiate classical music from the rest. Which two features did you pick to achieve that?
Answer
Problem 3
A Spotify user often listens to music with 0.5 energy and 0.2 danceability. Name a track that should be recommended to this user.
Answer
Problem 4
Here is a two-dimensional visualization of all the songs in our dataset. It was created by taking all 11 of our selected features and reducing them to just 2 numbers with the help of a feature extraction algorithm.
(Hover over the data point to see the name of the song.)
What do the two clusters represent?
If someone recently listened to Beethoven’s Sonata No. 14 “Moonlight” in C-Sharp Minor, which of the above clusters would be a better pool for recommending new songs to them?
Answer
Problem 5
This is the vector space representation of the “Flight of the Bumblebee” track:
Here are some other tracks from our dataset:
Wolfram Language has a FeatureNearest function that can find the member of a list that is nearest to a given sample. For example:
Find the number nearest to 20:
Find the track from the list “tracks” that is nearest to “bumblebee”:
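The In[] cells for these two steps did not survive the export, so here is a sketch of what they might look like. The names tracks and bumblebee come from the text above, and the list of numbers is just an illustration:

In[]:=
Nearest[{11, 25, 37, 18, 42}, 20]

This returns {18}, the member of the list closest to 20. For samples represented by feature vectors, FeatureNearest can be used in a similar way (the operator form shown here is one common pattern):

In[]:=
(* build a nearest-neighbor function over the tracks, then query it *)
nearestTrack = FeatureNearest[tracks];
nearestTrack[bumblebee]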
Answer
Part 2: Identifying Images
Machine Learning
Machine learning is a way of teaching the computer to do something by showing it lots of examples.
Say we want to teach the computer to look at a photo and determine if it is showing a “day” scene or a “night” scene.
Instead of providing lots and lots of detailed instructions about what to look for in an image in order to label it as day or night, we provide examples of both “day” and “night” images and ask the computer to build a model based on the data it has seen.
This model is called a “classifier”.
When we provide a new image, that the computer has not seen before, it uses the “classifier” model to analyze the image and make a decision.
Day-Night Classifier
Here is a list of images, each labeled according to the scene in the image:
Wolfram Language has a function named Classify that can be used to build the classifier model:
Use the classifier on new images to see if it can predict correctly whether it’s a “day” or “night” image:
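The classifier cells are not shown in this export, so here is a hedged sketch of how Classify is typically used for this kind of task. The names dayImages, nightImages, and testImage are placeholders, not from the original notebook:

In[]:=
(* label each example image, then train the classifier on the labeled examples *)
trainingExamples = Join[Thread[dayImages -> "day"], Thread[nightImages -> "night"]];
dayNightClassifier = Classify[trainingExamples];

In[]:=
(* ask the trained classifier about an image it has not seen before *)
dayNightClassifier[testImage]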
Sneaker or Boot Classifier
In this exercise you will use the “FashionMNIST” dataset from the Wolfram Data Repository to train a classifier to distinguish a sneaker from a boot.
Fashion-MNIST is a dataset of Zalando’s article images--consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.
Here are the labels associated with the images in the dataset (a unique number is used as a label for each image -- 7 for sneakers and 9 for boots):
First, download the training dataset
(It may take a while.)
Check the number of samples in the dataset:
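The download cell is missing from this export. One plausible version, assuming the dataset is fetched from the Wolfram Data Repository via ResourceData (the element names are an assumption), is:

In[]:=
(* download the FashionMNIST training examples from the Wolfram Data Repository *)
trainingData = ResourceData["FashionMNIST", "TrainingData"];

In[]:=
Length[trainingData]

If the full training set downloaded correctly, the count should be 60000.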
The following piece of code selects training and test examples labeled as
7 "Sneaker" or 9 "Ankle boot"
Look at a few random samples from the selected training set:
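The selection cell did not survive the export. Assuming each example is stored as a rule image -> label (as in the MNIST-style repository datasets), a sketch of the selection and preview steps is:

In[]:=
(* keep only the examples labeled 7 ("Sneaker") or 9 ("Ankle boot") *)
selectedTrainingData = Select[trainingData, MemberQ[{7, 9}, Last[#]] &];

In[]:=
(* preview a few random samples from the selected set *)
RandomSample[selectedTrainingData, 5]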
Problem 6:
Build a classifier using the images in selectedTrainingData:
Problem 7:
Here are ten random images of boots and sneakers from a test data set of similar images:
Use your classifier on the images to see if they are identified as “sneaker” or “boot”:
Problem 8:
You can evaluate the performance of your classifier against the entire test data set (edit the following piece of code):
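The evaluation cell is not shown here; a common pattern uses ClassifierMeasurements. The names classifier and selectedTestData are assumptions standing in for your trained model and the filtered test set:

In[]:=
(* measure the trained classifier against held-out test examples *)
cm = ClassifierMeasurements[classifier, selectedTestData];
cm["Accuracy"]

In[]:=
cm["ConfusionMatrixPlot"]

The accuracy and the confusion matrix are good starting points for judging how well the classifier separates sneakers from boots.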
Comment on the performance of your classifier.
◼
Is it doing well or is it pretty bad at classifying sneakers and boots? What information are you using from the above report to come to this conclusion?
◼
What “Classifier Method” did the model use for this problem?
Answer
Part 3: Writing like Shakespeare
In this part we will attempt to teach the computer to write like Shakespeare. This has applications similar to ChatGPT, but is much less powerful and uses a very different algorithm called “Markov model”. (ChatGPT and similar chatbots use a powerful neural network model called “Generative Pre-trained Transformer”)
In the following example, we will show the computer a number of examples of sonnets written by Shakespeare.
The “Markov model” implemented within the “SequencePredict” function in Wolfram Language computes the probability of a certain letter following another letter in the text that is provided.
E.g. probability of “h” following “t” might be .9 while that of “x” following “t” might be .001.
We can tweak the function a little to compute words following each other instead of single letters following each other.
E.g. the probability of “must” following “You” may be 0.6, while the probability of “are” following “you” is 0.8.
We can then use these probabilities to generate new text following some starting seed text we provide.
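As a sketch of the word-level tweak described above: split each sonnet into a list of words before training, so the model's “states” are words rather than characters. The name sonnets is assumed to be a list of sonnet strings:

In[]:=
(* split each sonnet into words, then fit a Markov sequence model over word sequences *)
wordPredictor = SequencePredict[TextWords /@ sonnets, Method -> "Markov"]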
Import Shakespeare’s sonnets:
How many sonnets are there?
Look at one of the sonnets:
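The import cells are not included in this export. One hedged sketch, using the copy of the sonnets that ships with ExampleData (the lab's actual source may differ, as may the exact split into individual sonnets), is:

In[]:=
(* import the full text and split it into individual sonnets on blank lines *)
sonnetsText = ExampleData[{"Text", "ShakespearesSonnets"}];
sonnets = StringSplit[sonnetsText, "\n\n"];

In[]:=
Length[sonnets]

In[]:=
sonnets[[18]]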
Problem 9
Train the SequencePredict function to build a Markov Model using the words from the sonnets:
The order of a Markov model refers to how many previous “states” (words in our example) determine the next “state” (word) in a sequence:
- First-order Markov model: The next word depends on the immediately preceding word.
- Second-order Markov model: The next word depends on the two preceding words.
What is the order of the Markov model you created above?
Answer
Problem 10
The following code will generate the next 10 most likely words (blank space is considered a word in this case):
The following code will generate 10 random words based on the probability distribution computed (not necessarily the most likely):
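The generation cells are missing from this export. Assuming the trained predictor from Problem 9 is named wordPredictor (an assumption), the two calls might look like:

In[]:=
(* the 10 most likely next words after the seed sequence *)
wordPredictor[TextWords["Shall I compare thee"], "NextElement" -> 10]

In[]:=
(* 10 words sampled at random from the computed probability distribution *)
wordPredictor[TextWords["Shall I compare thee"], "RandomNextElement" -> 10]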
Try both “NextElement” and “RandomNextElement” to generate the next 150 words. Which one provides a more interesting output?
Answer
Submitting your work
1. Ensure you have filled in your NetID at the top of the notebook.
2. Save the notebook as a PDF file (alternately, "Print to PDF", but please ensure the PDF looks OK and is not garbled).
3. Upload to Gradescope.
4. To be sure that your submission was received, consider emailing your TA (sattwik2@illinois.edu) that you have submitted.