Social Media Analytics


Introduction

Examining the information available through social media can tell you a lot about your online presence. This is a quick exploration of some social media analytics, with examples using a personal Twitter account.

Getting Data

To do any meaningful analysis, we’ll need to pull data from a few different sources. The first step is to connect to a social network, in this case Twitter. We’ll also pull in geographic data from the Wolfram Knowledgebase and data collected through an email survey.

Import a follower network from Twitter:

In[1]:=

network=ServiceExecute["Twitter","FollowerNetwork"]

Out[1]=

Get entities for the cities in Santa Clara County:

In[2]:=

cities=GeoEntities[Entity["AdministrativeDivision",{"SantaClaraCounty","California","UnitedStates"}],"City"]

Out[2]=

{Cambrian Park, Fruitdale, Burbank, San Martin, Los Altos Hills, Sunol-Midtown, San Jose, Morgan Hill, Milpitas, Lexington Hills, Sunnyvale, Gilroy, East Foothills, Loyola, Santa Clara, Los Altos, Alum Rock, Mountain View, Cupertino, Seven Trees, Palo Alto, Saratoga, Monte Sereno, Buena Vista, Campbell, Los Gatos, Stanford}

Import local data from the survey:

In[3]:=

data=SemanticImport["surveydata.csv"]

Out[3]=

FollowerID	Location	Age	Transportation	Education Level	Industry
137017760	Mountain View	32.6763	Walk	Two-Year Degree	Technology
128882819	Campbell	33.4117	Bicycle	Two-Year Degree	Technology
32093372	Sunnyvale	43.2417	Bicycle	Bachelor's	Manufacturing
14132025	Morgan Hill	38.6501	Other	Two-Year Degree	Sales
77828589	Monte Sereno	33.1648	Bicycle	Bachelor's	Entertainment
showing 1–5 of 800

SemanticImport interprets each column’s data type during import, so the Location values in this dataset become the same city entities that GeoEntities returned. That makes it easy to cross-reference the survey with the knowledgebase data for quick analysis.

A map showing how many followers live in nearby cities:

In[4]:=

GeoBubbleChartdata@Select[MemberQ[cities,#Location&]//CountsBy[#Location&]],

Options



Out[4]=
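The filter-then-count step behind this map can be sketched in plain Python for readers outside the Wolfram Language. This is an analogue, not the notebook's code. The rows are hypothetical stand-ins for the survey records, and `nearby_cities` is a hypothetical stand-in for the `cities` list.

```python
from collections import Counter

# Hypothetical survey rows; in the notebook these come from SemanticImport.
rows = [
    {"FollowerID": 1, "Location": "Mountain View", "Age": 32.7},
    {"FollowerID": 2, "Location": "Campbell", "Age": 33.4},
    {"FollowerID": 3, "Location": "Mountain View", "Age": 41.0},
    {"FollowerID": 4, "Location": "Oakland", "Age": 29.5},
]

# Hypothetical stand-in for the GeoEntities result.
nearby_cities = {"Mountain View", "Campbell", "Sunnyvale"}


def followers_per_city(rows, cities):
    """Keep rows whose Location is in `cities`, then count rows per Location.

    Analogous to Select[MemberQ[cities, #Location] &] followed by
    CountsBy[#Location &].
    """
    return Counter(r["Location"] for r in rows if r["Location"] in cities)


counts = followers_per_city(rows, nearby_cities)
print(dict(counts))  # {'Mountain View': 2, 'Campbell': 1}
```

As with CountsBy, cities with no matching followers (Sunnyvale here) do not appear in the result at all, so they get no bubble on the map.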

Age: How Old Are Followers in the Network?

Looking at the age spread of a network is a good starting point for understanding its makeup. The goal here is to fit an approximate probability distribution to the followers’ ages. Since the normal distribution has convenient analytic properties, we’ll try that first.

Find the closest normal distribution fit:

In[5]:=

FindDistribution[data[All,"Age"],TargetFunctions->{NormalDistribution},MaxItems->1]

Out[5]=

NormalDistribution[38.9873,5.49519]
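A normal fit by maximum likelihood has a closed form. μ is the sample mean, and σ is the root-mean-square deviation from that mean, dividing by n rather than n − 1. The plain-Python sketch below computes both. Whether FindDistribution uses exactly this estimator internally is an assumption, and the ages are hypothetical, not the survey data.

```python
import math


def fit_normal_mle(xs):
    """Return (mu, sigma), the maximum-likelihood normal fit to xs."""
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    mu = sum(xs) / n
    # The MLE for sigma divides by n (not n - 1), so it is slightly biased low.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return mu, sigma


ages = [30.0, 34.0, 38.0, 42.0, 46.0]  # hypothetical ages
mu, sigma = fit_normal_mle(ages)
print(mu, sigma)
```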

Plot the actual distribution (red) compared with the estimate (yellow):

In[6]:=

Histogram{ages,%},Length[ages],

Options



Out[6]=
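One way to make this visual comparison is to draw as many values from the fitted normal as there are observed ages, then bin both sets on the same grid. The plain-Python sketch below follows that idea. The bin grid, the fixed seed, the use of `random.gauss`, and the ages and parameters are all hypothetical choices, not details taken from the notebook.

```python
import random


def bin_counts(xs, lo, width, nbins):
    """Count values of xs in nbins equal-width bins starting at lo.

    Values outside [lo, lo + nbins * width) are ignored.
    """
    counts = [0] * nbins
    for x in xs:
        k = int((x - lo) // width)
        if 0 <= k < nbins:
            counts[k] += 1
    return counts


def compare_with_normal(ages, mu, sigma, lo=20.0, width=5.0, nbins=8, seed=0):
    """Bin the observed ages and an equally sized sample from Normal(mu, sigma)."""
    rng = random.Random(seed)  # fixed seed so the comparison is reproducible
    simulated = [rng.gauss(mu, sigma) for _ in range(len(ages))]
    return bin_counts(ages, lo, width, nbins), bin_counts(simulated, lo, width, nbins)


ages = [28.1, 31.5, 33.2, 34.0, 35.7, 36.4, 37.9, 39.3, 41.8, 47.5]  # hypothetical
observed, simulated = compare_with_normal(ages, mu=36.5, sigma=5.5)
for i, (o, s) in enumerate(zip(observed, simulated)):
    left = 20.0 + 5.0 * i
    print(f"[{left:4.0f}, {left + 5:4.0f})  data {'#' * o:<10} fit {'*' * s}")
```

Because the fitted sample is random, its bars change from seed to seed. Only a consistent pattern of differences across bins points to a genuinely poor fit.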

The normal distribution doesn’t quite match: the observed ages look heavier on the left side than on the right. For an asymmetric shape like this, the skew normal distribution may be a better approximation. It adds a shape parameter α that controls the skewness (slant) of the distribution.
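The asymmetry can be given a number before fitting anything new. The sample skewness g1 is the third central moment divided by the second central moment raised to the power 3/2. A negative value indicates a longer left tail, and a positive value a longer right tail. This plain-Python sketch uses hypothetical ages. The source does not show the notebook measuring skewness this way.

```python
def sample_skewness(xs):
    """Return g1 = m3 / m2**1.5, the (biased) moment estimator of skewness."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


symmetric = [30.0, 35.0, 40.0, 45.0, 50.0]  # hypothetical
left_tailed = [20.0, 38.0, 40.0, 41.0, 42.0]  # hypothetical: one low outlier
print(sample_skewness(symmetric), sample_skewness(left_tailed))
```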

Compare the formulas for the two distributions:

Out[7]=
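In standard notation, with φ and Φ the standard normal density and cumulative distribution function, the two densities are given below. The skew normal form uses the common Azzalini parameterization, which we assume matches the Wolfram Language’s SkewNormalDistribution[μ, σ, α]. Setting α = 0 recovers the normal density because Φ(0) = 1/2. The skewness has the same sign as α, so a negative α gives a longer left tail.

```latex
f_{\mathcal{N}}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

f_{\mathrm{SN}}(x) = \frac{2}{\sigma}\,\varphi\!\left(\frac{x-\mu}{\sigma}\right)\Phi\!\left(\alpha\,\frac{x-\mu}{\sigma}\right),
\qquad \varphi(z) = \frac{e^{-z^2/2}}{\sqrt{2\pi}}, \quad \Phi(z) = \int_{-\infty}^{z} \varphi(t)\,dt
```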