Social Media Analytics
Introduction
Examining the information available through social media can tell you a lot about your online presence. This is a quick exploration of some social media analytics, with examples using a personal Twitter account.
Getting Data
To do any meaningful analysis, we’ll need to pull data from a few different sources. The first step is to connect to a social network, in this case Twitter. We’ll also pull in some geographic data from the Wolfram Knowledgebase, as well as some data collected through an email survey.
Import a follower network from Twitter:
In[1]:=
network=ServiceExecute["Twitter","FollowerNetwork"]
Out[1]=
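Assuming the follower network comes back as a Graph (as the request name suggests), the standard graph functions apply to it directly. The sketch below is a hypothetical illustration, not live Twitter data. It uses a small hand-built graph named toy and assumes an edge a -> b means "a follows b". Check that convention against what ServiceExecute actually returns.

```mathematica
(* Hypothetical stand-in for the follower network; an edge a -> b is
   ASSUMED to mean "a follows b" *)
toy = Graph[{"you" -> "ann", "ann" -> "you", "bob" -> "you",
    "cat" -> "you", "cat" -> "ann"}];

(* Size of the network: number of accounts and number of follow links *)
{VertexCount[toy], EdgeCount[toy]}

(* The two accounts with the most incoming edges, i.e. the most followed *)
TakeLargestBy[VertexList[toy], VertexInDegree[toy, #] &, 2]
```

On the real data, replace toy with network.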
Grab entities for the 30 nearest cities:
In[2]:=
cities=GeoEntities[…,"City"]
Out[2]=
Import local data from the survey:
In[3]:=
data=SemanticImport["surveydata.csv"]
Out[3]=
SemanticImport interprets data types automatically during import, so the locations in this dataset come in as the same knowledgebase entities as the cities retrieved earlier. That makes it easy to cross-reference the two sources for quick analysis.
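The cross-referencing step comes down to two operations: filter the survey rows whose Location is in the list of nearby places, then tally rows per location. The sketch below shows that pattern on a small hypothetical dataset. The names survey and nearby and the city strings are illustrative only, and plain strings stand in for real City entities.

```mathematica
(* Hypothetical toy rows standing in for the survey data *)
survey = Dataset[{
    <|"Location" -> "Boston", "Age" -> 34|>,
    <|"Location" -> "Cambridge", "Age" -> 41|>,
    <|"Location" -> "Boston", "Age" -> 29|>,
    <|"Location" -> "Denver", "Age" -> 52|>}];
nearby = {"Boston", "Cambridge"};

(* Keep rows whose Location is a nearby place, then count rows per Location *)
counts = CountsBy[Normal[survey[Select[MemberQ[nearby, #Location] &]]], #Location &]
```

The result is an association from location to count, which is the shape GeoBubbleChart accepts for its bubbles.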
A map showing how many followers live in nearby cities:
In[4]:=
GeoBubbleChart[CountsBy[Normal[data[Select[MemberQ[cities,#Location]&]]],#Location&]]
Out[4]=
Age: How Old Are Followers in the Network?
Looking at the age spread of a network can be a good starting point for understanding its makeup. The goal here is to find a standard probability distribution that approximates the ages. Since the normal distribution has convenient analytic properties, we’ll try that first.
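One of those convenient properties is that the maximum-likelihood fit of a normal distribution has a closed form. It is the sample mean together with the standard deviation computed with a 1/n divisor (rather than 1/(n−1)). These are the standard estimators, stated for reference. Whether FindDistribution computes its fit in exactly this way is not stated here.

```latex
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{\mu}\right)^2}
```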
Find the closest normal distribution fit:
In[5]:=
ages=Normal[data[All,"Age"]];
FindDistribution[ages,TargetFunctions->{NormalDistribution},MaxItems->1]
Out[5]=
NormalDistribution[38.9873,5.49519]
Plot the actual ages (red) against a same-size random sample drawn from the estimate (yellow):
In[6]:=
Histogram[{ages,RandomVariate[%,Length[ages]]},ChartStyle->{Red,Yellow}]
Out[6]=
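The histogram comparison is visual. A numerical complement is to check the sample skewness and run a goodness-of-fit test against the best normal fit. The survey ages themselves are not reproduced here, so this sketch runs on simulated data named sample. It is drawn from a skew normal distribution with arbitrarily chosen parameters (35, 10, 3); substitute the real ages list to apply it.

```mathematica
(* Simulated stand-in for the survey ages (hypothetical data) *)
SeedRandom[1];
sample = RandomVariate[SkewNormalDistribution[35, 10, 3], 500];

(* Sample skewness: a clearly positive value indicates a longer right tail *)
Skewness[sample]

(* Fit the best normal distribution, then get a goodness-of-fit p-value;
   a small p-value is evidence against the normal model *)
normalFit = EstimatedDistribution[sample, NormalDistribution[m, s]];
DistributionFitTest[sample, normalFit]
```

The test parameters are estimated from the same data, so the p-value is best read as a rough indication rather than an exact significance level.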
The normal distribution doesn’t quite match: the original data is heavier on the left side than on the right. For this asymmetric shape, the skew normal distribution might be a better approximation. It has an additional shape parameter α that controls the skewness (slant) of the distribution.
Compare the formulas for the two distributions:
Out[7]=
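Normal: $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Skew Normal: $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, \operatorname{erfc}\!\left(-\frac{\alpha(x-\mu)}{\sqrt{2}\,\sigma}\right)$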
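The skew normal density is the normal density multiplied by the factor erfc(−α(x−μ)/(√2 σ)). That factor equals 1 when α = 0, so the normal distribution is the special case α = 0. When α ≠ 0, μ and σ act as location and scale parameters rather than as the mean and standard deviation. The standard moment formulas are:

```latex
\delta = \frac{\alpha}{\sqrt{1+\alpha^2}},
\qquad
\mathbb{E}[X] = \mu + \sigma\,\delta\sqrt{\frac{2}{\pi}},
\qquad
\operatorname{Var}[X] = \sigma^2\left(1 - \frac{2\delta^2}{\pi}\right)
```

So the fitted skew normal μ and σ should not be compared directly with the mean and standard deviation of the normal fit.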
Find starting parameters for the skew normal distribution:
In[8]:=
FindDistributionParameters[ages,SkewNormalDistribution[μ,σ,α]]
Out[8]=
{μ->35.1378,σ->9.66348,α->2.72919}
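FindDistributionParameters uses maximum likelihood unless another ParameterEstimator is specified. One way to judge whether the extra parameter α earns its place is to compare log-likelihoods between the two fitted families. Another is the Akaike information criterion, AIC = 2k − 2 log L, where k is the number of parameters. The sketch below again uses simulated stand-in data, and the parameter symbols m, s, a are arbitrary names.

```mathematica
(* Simulated stand-in for the ages (hypothetical data) *)
SeedRandom[2];
sample = RandomVariate[SkewNormalDistribution[35, 10, 3], 500];

(* Maximum-likelihood fits for both candidate families *)
normalParams = FindDistributionParameters[sample, NormalDistribution[m, s]];
skewParams = FindDistributionParameters[sample, SkewNormalDistribution[m, s, a]];

(* Log-likelihood of the data under each fitted distribution *)
llNormal = LogLikelihood[NormalDistribution[m, s] /. normalParams, sample];
llSkew = LogLikelihood[SkewNormalDistribution[m, s, a] /. skewParams, sample];

(* AIC for each model (k = 2 and k = 3); the lower value is preferred *)
{2*2 - 2 llNormal, 2*3 - 2 llSkew}
```

The normal family is the α = 0 special case. If the optimizer reaches the maximum, the skew normal log-likelihood is therefore at least as high, and the AIC term 2k is what penalizes the extra parameter.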
Adjust the sliders to verify the fit visually:
Out[9]=
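One way to build a slider comparison like this is with Manipulate. It overlays the skew normal density on a density-normalized histogram, so the two curves share a vertical scale. This is a sketch, not the original input. It uses simulated stand-in data, starts the sliders at the fitted values from Out[8], and the slider ranges are arbitrary choices.

```mathematica
(* Simulated stand-in for the ages; replace with the real list *)
SeedRandom[3];
sample = RandomVariate[SkewNormalDistribution[35, 10, 3], 500];

(* Histogram normalized as a PDF, overlaid with the skew normal density;
   sliders start at the fitted parameters, ranges are arbitrary *)
Manipulate[
 Show[
  Histogram[sample, Automatic, "PDF"],
  Plot[PDF[SkewNormalDistribution[m, s, a], x], {x, Min[sample], Max[sample]},
   PlotStyle -> Red]],
 {{m, 35.1378}, 20, 50}, {{s, 9.66348}, 1, 20}, {{a, 2.72919}, -5, 10}]
```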