Set up


Language usage

Gather all submission metadata

We’re only interested in answers: they tend to follow a standard format, whereas questions are just the prompts to which users add submissions.
In[]:=
answerToMetadata=EntityValue[
  EntityClass["StackExchange.Codegolf:Post","PostType"->«answer (STACKEXCHANGE:POSTTYPE)»],
  "CodeGolfMetadata",
  "EntityAssociation"
];
answerToMetadata//Length
Out[]=
142574

Count usage per language

In reality, there are not over 8600 programming languages in use on the site. My processing of the posts isn’t perfect, since users tend to stray from the common format that is easy to parse.
In[]:=
languageToCount=ReverseSort@Counts[DeleteMissing@Values[answerToMetadata[[All,"Language"]]]];​​languageToCount//Length
Out[]=
8623
In fact, the top 14 languages account for just over half of the entire site’s submissions, so in practice, there are vastly fewer real languages:
In[]:=
100*N[Total[languageToCount[[;;14]]]/Total[languageToCount]]

Out[]=
50.3277
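The counting and top-k share steps above can be sketched outside the notebook as well. The following is a minimal Python illustration, not the notebook's code: the `languages` list is hypothetical toy data standing in for the parsed "Language" field, and `top_share` is a helper name introduced only for this example.

```python
from collections import Counter

# Hypothetical stand-in for the "Language" field of each answer's metadata;
# None marks a post whose language could not be parsed.
languages = ["Python", "Jelly", "Python", None, "C", "Python", "Jelly", "Perl"]

# Drop missing values, then count and sort by descending frequency.
language_to_count = Counter(lang for lang in languages if lang is not None)
ranked = language_to_count.most_common()


def top_share(ranked_counts, k):
    """Percentage of all counted submissions covered by the k most common labels."""
    total = sum(count for _, count in ranked_counts)
    return 100.0 * sum(count for _, count in ranked_counts[:k]) / total


print(len(ranked))           # number of distinct language labels
print(top_share(ranked, 2))  # share covered by the two most common labels
```

With real data, a long tail of rarely used labels (many of them parsing noise) would show up as a large `len(ranked)` alongside a high share for a small `k`.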
In[]:=
languageToCount[[;;14]]
Out[]=

<|Python -> 13953, JavaScript -> 9360, Perl -> 5570, C -> 4453,
  Ruby -> 4286, Haskell -> 4186, Jelly -> 4151, Wolfram Language -> 4019,
  Java -> 3910, Pyth -> 3599, PHP -> 3555, 05AB1E -> 2954,
  C# -> 2778, R -> 2716|>
Perhaps unsurprisingly, Python and JavaScript are the most commonly submitted languages.
However, several golf-specific languages (Jelly, Pyth, and 05AB1E) also appear in the top 14.
In[]:=
BarChart[100.*N[Reverse[languageToCount[[;;14]]]/Total[languageToCount]],
  ScalingFunctions->"Log",
  BarOrigin->Left,
  ChartLabels->Automatic,
  ImageSize->Large,
  PlotTheme->"Detailed",
  LabelStyle->18,
  PlotLabel->"Language use on codegolf.stackexchange.com",
  FrameLabel->{None,"Fraction of submissions (%)"}
]
Out[]=

Languages with bad words


Sort Thread Submissions by Size

Gather data

Start with a “parent” of a specific thread:
In[]:=
parent=«Q: [Calvin's Hobbies] Do you want to code a snowman? (CODEGOLF:POST)»;
Gather all of the “child” submissions:
In[]:=
submissions=EntityList@EntityClass["StackExchange.Codegolf:Post",«parent post»->parent];
submissionToMetadata=EntityValue[submissions,"CodeGolfMetadata","EntityAssociation"];
submissionToMetadata//Length
Out[]=
26

Show

Show the submissions sorted by their reported size (a feature not available on the website without leaderboard code, which sometimes breaks):
In[]:=
NiceGrid[
  {#["ReportedSize"],#["Language"],First[#["CodeSnippets"],Missing["NotFound"]]}&/@SortBy[submissionToMetadata,#ReportedSize&],
  Alignment->Left
]
Out[]=
A: [Dennis] CJam, 135 134 132 130 126 125 bytes0000000:…
125 B | CJam
0000000: 4e22285b200a5c225f2a295c2d2e2f6f2c3e4f3a3c3d5d225f N"([ .\"_*)\-./o,>O:<=]"_
0000019: 2422dd7382d6bfab28707190992f240c362ee510262bd07a77 $".s....(pq../$.6...&+.zw
0000032: 08556de9dcdb566c676817c2b87f5ecb8bab145dc2f2f76e07 .Um...Vlgh....^....]...n.
000004b: 22323536624b623224663d4e2f7b5f2c342f2f7d25723a7e2e "256bKb2$f=N/{_,4//}%r:~.
0000064: 3d2828342423346222205f0a20222e2a6f6f736572372f4e2a =((4$#4b" _. ".*ooser7/N*
A: [Kevin Cruijssen] 05AB1E, 137 135 128 bytes…)(ð•ž&Ž•J•£ÊÒu[7tˆ…
128 B | 05AB1E
…)(ð•ž&Ž•J•£ÊÒu[7tˆ†ŠHλRΩ.P•12вèJsvN“_ .\/=":><Oo-[],)*(“•æ‰ΔΣ₁çδ₂¯r3₁’8iÈÉÞ2;lλÒžfúāÿ©-Ñm¦Ñ`^«#„]*≠½Ü4~āÐm=¾ç•20вè¶¡Nè4äyè.;
A: [Sp3000] CJam, 150 145 bytesBase convert all the things!"…
145 B | CJam
"b8li'U9gN;|"125:Kb8bl:~f="r pL|P3{cR`@L1iT"Kb21b"G.HMtNY7VM=BM@$^$dX8a665V"KbFb"=_./ <[(*-oO,\":"f=_"/<[(""\>])"er+4/f=.=7/N*
A: [Runer112] CJam, 164 bytesGenerates the snowman left-to-right…
164 B | CJam
q:Q;SS" _===_,___ ....., _ /_\,___ (_*_)"',/0{Q=~(=}:G~N" \ "4G'(".oO-"_2G",._ "1G@3G')" / "5GN"< / "4G'(" : ] [> < "3/6G')"> \ "5GNS'(" : \" \"___ "3/7G')
A: [Optimizer] CJam, 200 191 bytesThis can surely be golfed a…
191 B | CJam
7S*"_===_ ___ ..... _ /_\ ___ (_*_)"+6/2/Nf*",._ "1/".oO-"1/_" <\ / >/ \ "2/4/~" : ] [> < : \" \"___ "3/4/~]l~Ab:(]z::=:L0=N4{L=}:K~0='(2K1K3K')5K0=N4K1='(6K')5K1=NS'(7K')
A: [mazzy] PowerShell, 199 bytesInspired by Reto Koradi and…
199 B | PowerShell for Windows
for($t=' 0 _ _0 ___0 _ _ 0_. (0=./_0=._*0=.\_0_. )4 \ (2.oO-1,._ 3.oO-)5 / 4< / (6 ]> 6: 6 [< )5> \ (7 "_ 7: _ 7 "_ )';$d=$t[$i++];$r+="$d"){if($d-ge48){$d=$t[$i+"$args"["$d"]-49]$i+=4}}$r
A: [NinjaBearMonkey] JavaScript ES6, 210 208 202 bytess=>` 08(213)9…
202 B | JavaScript
s=>` 08(213)94(6)5 (7)`.replace(/\d/g,p=>`_===_1 ___ .....1 _ /_\\1 ___ (_*_)1,1.1_11.1o101-1.1o101-1<11/11>11\\11 : 1] [1> <1 1 : 1" "1___1 11\\11 11/11 `.split(1)[s[p>7?p-4:p]-1+p*4]||' ')
A: [Jakube] Pyth, 203 bytesM@GCHgc" ␣␣␣ ␣␣␣ ␣"bhzgc…
203 B | Pyth
M@GCHgc" ___​ ___ _"bhzgc" (_*_) _===_ ..... /_\\"bhzs[g" \ "@z4\(g"-.oO"@z2g" ,._"@z1g"-.oO"@z3\)g" / "@z5)s[g" < /"@z4\(gc" : ] [> <"b@z6\)g" > \\"@z5)++" ("gc" : \" \"___"bez\)
A: [anatolyg] C, 212 bytesd;main(){char*t="##3#b#b3#bbb3#b#b##\…
212 B | C
d;main(){char*t="##3#b#b3#bbb3#b#b##\r#3b1#+3@12b3@1b-3@1_b3b1#,#\r7#_##+51rR04/1b#61rR0,8#2##\r7?#2#+9#`A#9=###9#^?#,8A#_#\r#+:#%b#:=#b#:#%b#,#",p[9];for(gets(p);d=*t++;putchar(d-3))d=d<51?d:(p[d-51]-53)[t+=4];}
A: [Reto Koradi] C, 233 230 byteschar*t=" 0 ␣ ␣0 ␣␣␣0 ␣ ␣ 0␣. (…
230 B | C
char*t=" 0 _ _0 ___0 _ _ 0_. (0=./_0=._*0=.\\_0_. ) 4 \\ (2.oO-1,._ 3.oO-)5 / 4< / (6 ]> 6: 6 [< )5> \\ (7 \"_ 7: _ 7 \"_ ) ";i,r,d;f(char*p){while(r++<35){d=t[i]-48;putchar(t[d<0?i:i+p[d]-48]);i+=d<0?1:5;r%7?0:puts("");}}
A: [edc65] JavaScript (ES6), 247Not as good ad @NinjaBearMonkey…
247 B | JavaScript
A: [Matty] Python 3, 349 336 254 251 bytesSo much for doing…
251 B | Python
l='_===_| ___\n .....| _\n /_\| ___\n (_*_)| : |] [|> <| |>| |\| | : |" "|___| '.split('|')l[4:4]=' \ .oO-,._ .oO- / < / 'def s(a):print(' {}\n{}({}{}{}){}\n{}({}){}\n ({})'.format(*[l[4*m+int(a[int('0421354657'[m])])-1]for m in range(10)]))
A: [Khalil] Python 2.7, 257 bytes (i think)H,N,L,R,X,Y,T,B=…
257 B | Python
H,N,L,R,X,Y,T,B=map(int,i)l='\n's=' 'e=' .o0-'F=' \ / 'S=' < / \ >'o,c='()'print s+' _ _ ___ _ _\n\n\n\n _. (=./_=._*=.\__. )'[H::4]+l+F[X]+o+e[L]+' ,._ '[N]+e[R]+c+F[-Y]+l+S[X]+o+' ]> : [< '[T::4]+c+S[-Y]+l+s+o+' "_ : _ "_ '[B::4]+c
A: [Cees Timmerman] Python 2, 354 280 241 261 bytesdef s(g):H,N,L,R,…
261 B | Python
def s(g):H,N,L,R,X,Y,T,B=[int(c)-1for c in g];e='.oO-';print(' '*9+'_ _ ___ _ _\n\n\n\n _. (=./_=._*=.\\__. )')[H::4]+'\n'+' \\ '[X]+'('+e[L]+',._ '[N]+e[R]+')'+' / '[Y]+'\n'+'< / '[X]+"("+' ]> : [< '[T::4]+')'+'> \\ '[Y]+'\n ('+' "_ : _ "_ '[B::4]+")"
A: [CL-] C, 280 272 264 bytesOnly partially golfed at this…
264 B | C
#define P(n)[s[n]&3],f(char*s){printf(" %.3s\n %.5s\n%c(%c%c%c)%c\n%c(%.3s)%c\n (%.3s)","___ ___ _"+*s%4*3,"(_*_)_===_..... /_\\"+*s%4*5," \\ "P(4)"-.o0"P(2) " ,._"P(1)"-.o0"P(3)" /"P(5)" < /"P(4)" : ] [> <"+s[6]%4*3," > \\"P(5)" : \" \"___"+s[7]%4*3);}
A: [Claudiu] Python, 276 289 bytesV='.oO-'def F(d): D=lambda…
289 B | Python
V='.oO-'def F(d): D=lambda i:int(d[i])-1 print" "+("","___"," _ ","___")[D(0)]+"\n "+\"_. (=./_=._*=.\\__. )"[D(0)::4]+"\n"+\" \\ "[D(4)]+"("+V[D(2)]+',._ '[D(1)]+V[D(3)]+")"+" / "[D(5)]+'\n'+\"< / "[D(4)]+"("+" ]> : [< "[D(6)::4]+")"+"> \\ "[D(5)]+"\n ("+\' "_ : _ "_ '[D(7)::4]+")"
A: [nimi] Haskell, 361 306 289 byteso l a b=take a$drop((…
289 B | Haskell
o l a b=take a$drop((b-1)*a)ln="\n"p i=id=<<[" ",o" \n _===____ \n ..... _ \n /_\\ ___ \n (_*_)"11a,n,o" \\ "1e,o"(.(o(O(-"2c,o",._ "1 b,o".)o)O)-)"2d,o" / "1f,n,o"< / "1e,o"( : )(] [)(> <)( )"5g,o"> \\ "1f,n," (",o" : )\" \")___) )"4h]where[a,b,c,d,e,f,g,h]=map(read.(:[]))i
A: [Elcan] Dart, 307 bytesf(i,{r='.o0-',s=' : '}){i=i.split…
307 B | Dart
f(i,{r='.o0-',s=' : '}){i=i.split('').map((j)=>int.parse(j)-1).toList();return' ${['_===_',' ___ \n.....',' /_\\ ',' ___ \n (_*_)'][i[0]]}\n${' \\ '[i[4]]}(${r[i[2]]+',._ '[i[1]]+r[i[3]]})${' / '[i[5]]}\n${'< / '[i[4]]}(${[s,'] [','> <',' '][i[6]]})${'> \\ '[i[5]]}\n (${[s,'" "','___',' '][i[7]]})';}
A: [Craig Roy] Haskell, 333 bytesMy first submission! Builds the…
333 B | Haskell
a=y["\n _===_\n"," ___ \n .....\n"," _ \n /_\\ \n"," ___ \n (_*_)\n"]d=y",._ "c=y".oO-"e=y"< / "j=y" \\ "f=y"> \\ "k=y" / "y w n=w!!(n-1)h=y[" : ","] [","> <"," "]b=y[" ( : ) \n"," (\" \") \n"," (___) \n"," ( ) \n"]s(m:x:o:p:n:q:t:l:_)=putStr$a m++j x:'(':c o:d n:c p:')':k q:'\n':e x:'(':h t++')':f q:'\n':b l
A: [Ciaran_McCarthy] F#, 369 byteslet f(g:string)= let b=" " let p…
369 B | F#
let f(g:string)= let b=" " let p=printfn let i x=int(g.[x])-49 p" %s "["";"___";" _ ";"___"].[i 0] p" %s "["_===_";".....";" /_\ ";"(_*_)"].[i 0] p"%s(%c%c%c)%s"[b;"\\";b;b].[i 4]".oO-".[i 2]",._ ".[i 1]".oO-".[i 3][b;"/";b;b;b].[i 5] p"%s(%s)%s"["<";b;"/";b].[i 4][" : ";"] [";"> <";" "].[i 6][">";b;"\\";b].[i 5] p" (%s) "[" : ";"\" \"";"___";" "].[i 7]
A: [Jo.] PHP, 378 bytes' ␣===␣␣␣␣..... ␣ /␣\ ␣␣␣(␣*␣)',…
378 B | PHP
<?$f=str_split;$r=$f($argv[1]);$p=[H=>' _===____..... _ /_\ ___(_*_)',N=>',._ ',L=>'.oO-',R=>'.oO-',X=>' <\ / ',Y=>' >/ \ ',T=>' : ] [> < ',B=>' : " "___ '];echo preg_replace_callback("/[A-Z]/",function($m){global$A,$p,$r,$f;$g=$m[0];return$f($f($p[$g],strlen($p[$g])/4)[$r[array_search($g,array_keys($p))]-1])[(int)$A[$g]++];},' HHH HHHHHX(LNR)YX(TTT)Y (BBB)');
A: [M. I. Wright] TI-BASIC, 397 bytesImportant: If you want to test…
397 B | TI-Basic
Input Str9seq(inString("1234",sub(Str9,I,1)),I,1,length(Ans→L1" ___ _ ___ →Str1"_===_..... /_\ (_*_)→Str2",._ →Str3"•oO-→Str4"<\/ →Str5">/\ →Str6" : ] [> < →Str7" : ¨ ¨___ →Str8"Str1Str2Str3Str4Str5Str6Str7Str8→Str0For(X,3,5Output(X,2,"( )EndL1Output(3,3,sub(Str4,Ans(3),1)+sub(Str3,Ans(2),1)+sub(Str4,Ans(4),1Ans(5Output(4-(Ans=2),1,sub(Str5,Ans,1L1(6Output(4-(Ans=2),7,sub(Str6,Ans,1L1-1For(X,1,2Output(X+3,3,sub(expr(sub(Str0,X+6,1)),1+3Ans(X+6),3Output(X,2,sub(expr(sub(Str0,X,1)),1+5Ans(1),5End
A: [Kevin Cruijssen] Java 8, 548 545 432 401 399 bytesa->{int q=50,H…
399 B | Java
a->{int q=50,H=a[0]-49,N=a[1],L=a[2],R=a[3],X=a[4],Y=a[5];return"".format(" %s%n %s%n%c(%c%c%c)%c%n%c(%s)%c%n (%s)",H<1?"":H%2<1?" ___":" _","_===_s.....s /_\\s(_*_)".split("s")[H],X==q?92:32,L<q?46:L<51?111:L<52?79:45,N<q?44:N<51?46:N<52?95:32,R<q?46:R<51?111:R<52?79:45,Y==q?47:32,X<q?60:X%2<1?32:47," s : s] [s> <".split("s")[a[6]%4],92-(Y%3+Y%6/4)*30," s : s\" \"s___".split("s")[a[7]%4]);}
A: [OganM] R 414 BytesSlightly modified version of Molx's…
414 B | R
W =c("_===_"," ___\n ....."," _\n /_\\"," ___\n (_*_)",",",".","_"," ",".","o","O","-"," ","\\"," "," ","<"," ","/"," "," ","/"," ","",">"," ","\\",""," : ","] [","> <"," "," : ","\" \"","___"," ")f=function(x){i=as.integer(strsplit(x,"")[[1]]);cat(" ",W[i[1]],"\n",W[i[5]+12],"(",W[i[3]+8],W[i[2]+4],W[i[4]+8],")",W[i[6]+20],"\n",W[i[5]+16],"(",W[i[7]+28],")",W[i[6]+24],"\n"," (",W[i[8]+32], ")",sep="")}
A: [Molx] R, 436 437 bytesHere's my first try on code-golf…
437 B | R
H=c("_===_"," ___\n ....."," _\n /_\\"," ___\n (_*_)")N=c(",",".","_"," ")L=c(".","o","O","-")X=c(" ","\\"," "," ")S=c("<"," ","/"," ")Y=c(" ","/"," ","")U=c(">"," ","\\","")T=c(" : ","] [","> <"," ")B=c(" : ","\" \"","___"," ")f=function(x){i=as.integer(strsplit(x,"")[[1]]);cat(" ",H[i[1]],"\n",X[i[5]],"(",L[i[3]],N[i[2]],L[i[4]],")",Y[i[6]],"\n",S[i[5]],"(",T[i[7]],")",U[i[6]],"\n"," (",B[i[8]], ")",sep="")}
A: [Christopher Reid] JavaScript, 489 (without newlines and tabs)x=' ';…
489 B | JavaScript
x=' ';d=" ";h=['\n_===_',' ___ \n.....',' _ \n /_\\ ',' ___ \n(_*-)'];n=[',','.','_',x];e=['.','o','O','-'];y=['>',,'\\',x];u=['<',,'/',x];t=[' : ','[ ]','> <',d;b=[' : ','" "',"___",d];​j=process.argv[2].split('').map(function(k){return parseInt(k)-1});q=j[4]==1;w=j[5]==1;​console.log([ h[j[0]].replace(/(.*)\n(.*)/g, " $1\n $2"), (q?'\\':x)+'('+e[j[2]]+n[j[1]]+e[j[3]]+')'+(w?'/':x), (!q?u[j[4]]:x)+'('+t[j[6]]+')'+(!w?y[j[5]]:x), x+'('+b[j[7]]+')'].join('\n'));
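The sort-and-display step used for this grid can be sketched in plain Python. This is only an illustration: the `submission_to_metadata` dictionary below holds hypothetical toy entries (the field names mirror those used in the notebook), and `sorted_rows` is a helper introduced for the example. A post with no code snippets gets `None`, playing the role of the missing-value default.

```python
# Hypothetical metadata keyed by post; field names mirror those used above.
submission_to_metadata = {
    "post-1": {"ReportedSize": 212, "Language": "C", "CodeSnippets": ["main(){}"]},
    "post-2": {"ReportedSize": 125, "Language": "CJam", "CodeSnippets": ["q~"]},
    "post-3": {"ReportedSize": 247, "Language": "JavaScript", "CodeSnippets": []},
}


def sorted_rows(metadata):
    """Return (size, language, first snippet or None) rows, smallest size first."""
    ordered = sorted(metadata.values(), key=lambda m: m["ReportedSize"])
    return [
        (m["ReportedSize"], m["Language"], next(iter(m["CodeSnippets"]), None))
        for m in ordered
    ]


for row in sorted_rows(submission_to_metadata):
    print(row)
```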

Evaluate submissions in Notebooks

Python example

Look at a specific Python submission:
Evaluating it directly in a notebook just requires some easy setup:
〉
In[]:=
l='_===_| ___\n .....| _\n /_\| ___\n (_*_)| : |] [|> <| |>| |\| | : |" "|___| '.split('|')
l[4:4]=' \ .oO-,._ .oO- / < / '
def s(a):print(' {}\n{}({}{}{}){}\n{}({}){}\n ({})'.format(*[l[4*m+int(a[int('0421354657'[m])])-1]for m in range(10)]))
〉
In[]:=
s('11112311')

Node.js example

Look at a specific Node.js submission:
Evaluating this in a notebook also requires some easy setup.
This particular submission also needs changes to its argument handling, since it reads its input from process.argv:

Gather Top Languages by post tags

Definition

For a given tag, gather the metadata for all submissions (answers) with that tag.
Then, find the ten most commonly submitted languages.
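As a rough sketch of this definition (in Python rather than the notebook's Wolfram Language), the filtering and counting could look like the following. The `answers` records, their `tags` and `language` keys, and the `top_languages` helper are all hypothetical names chosen for illustration.

```python
from collections import Counter

# Hypothetical answers: each carries its parent question's tags and a parsed language.
answers = [
    {"tags": {"math", "number"}, "language": "Wolfram Language"},
    {"tags": {"math"}, "language": "Jelly"},
    {"tags": {"string"}, "language": "Python"},
    {"tags": {"math", "string"}, "language": "Wolfram Language"},
    {"tags": {"math"}, "language": None},  # language could not be parsed
]


def top_languages(answers, tag, n=10):
    """Return the n most common languages among answers whose post has the tag."""
    counts = Counter(
        a["language"]
        for a in answers
        if tag in a["tags"] and a["language"] is not None
    )
    return counts.most_common(n)


print(top_languages(answers, "math"))
```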

Examples

Unsurprisingly, submissions for posts with math-related tags use Wolfram Language quite a bit:
Other submission categories don’t use Wolfram Language as much:

Language Ranks per thread with symbolic SPARQL queries

Gather thread language data

Write a symbolic SPARQL query to extract languages used and their reported submission sizes for a given thread (i.e. “parent post”).
Group the results by thread and language, sorting by reported size.
Note that all units are treated the same here, since it is difficult to compare different units (e.g. bytes vs. keystrokes vs. characters in different encodings).
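The grouping step can be illustrated independently of SPARQL. The Python sketch below works on hypothetical `(thread, language, size)` rows; keeping only the smallest size per language within a thread is an assumption made for this example, not something the notebook states.

```python
from collections import defaultdict

# Hypothetical query results: (thread, language, reported size).
rows = [
    ("snowman", "CJam", 164),
    ("snowman", "CJam", 125),
    ("snowman", "C", 212),
    ("fizzbuzz", "Python", 59),
    ("fizzbuzz", "C", 73),
]


def best_sizes_by_thread(rows):
    """Map thread -> list of (language, smallest reported size), sorted by size."""
    best = defaultdict(dict)
    for thread, language, size in rows:
        previous = best[thread].get(language)
        # Assumption: a language's best (smallest) submission represents it.
        if previous is None or size < previous:
            best[thread][language] = size
    return {
        thread: sorted(sizes.items(), key=lambda item: item[1])
        for thread, sizes in best.items()
    }


print(best_sizes_by_thread(rows))
```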
Here are some helpful definitions for getting position ranks with ties:
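One common convention for ranks with ties is standard competition ranking (1, 2, 2, 4), where tied entries share the best position. Whether the notebook uses this convention or another is not stated, so the Python sketch below is only an assumed illustration; `competition_ranks` is a name introduced here.

```python
def competition_ranks(sizes):
    """Standard competition ranking (1, 2, 2, 4): equal sizes share the best rank."""
    first_position = {}
    for position, size in enumerate(sorted(sizes), start=1):
        # Record only the first (best) position at which each size appears.
        first_position.setdefault(size, position)
    return [first_position[size] for size in sizes]


print(competition_ranks([130, 125, 130, 200]))  # [2, 1, 2, 4]
```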
Look at the rank stats across all threads for some common languages:
On average, Wolfram Language is not all that different in ranking compared to the most popular languages.
Lower-level and compiled languages tend to do worse in code golf, likely because they require more boilerplate code (e.g. declarations under strong/static typing).
Find the most common languages in first place (not considering ties for simplicity):
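A minimal Python sketch of counting first-place finishes follows. It assumes that "not considering ties" means threads with a tie for the smallest size are skipped; the `thread_results` data and the `unique_winners` helper are hypothetical.

```python
from collections import Counter

# Hypothetical per-thread results: (language, smallest reported size).
thread_results = {
    "thread-1": [("Jelly", 10), ("Python", 40)],
    "thread-2": [("Jelly", 12), ("05AB1E", 12), ("C", 80)],  # tie for first: skipped
    "thread-3": [("Wolfram Language", 8), ("Jelly", 9)],
    "thread-4": [("Jelly", 5)],
}


def unique_winners(thread_results):
    """Count how often each language is the sole smallest submission in a thread."""
    wins = Counter()
    for results in thread_results.values():
        best = min(size for _, size in results)
        leaders = [language for language, size in results if size == best]
        if len(leaders) == 1:  # assumption: tied threads do not count for anyone
            wins[leaders[0]] += 1
    return wins


print(unique_winners(thread_results).most_common())
```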
Wolfram Language is the 9th most frequent first-place language (without ties) across all threads:
Find the parent posts of the winning Wolfram Language submissions:
Write another symbolic SPARQL query to extract the actual Wolfram Language submission posts:
Look at the smallest such winners. Note the heavy use of built-ins and infix forms.
Perhaps unsurprisingly, the most common symbolic (non-alphanumeric) characters are @ (for common infix notations), square brackets, and pure function notation (all staples of Wolfram Language one-liner submissions).
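Counting symbolic characters amounts to a character-frequency tally restricted to non-alphanumeric, non-whitespace characters. The Python sketch below shows the idea on three hypothetical Wolfram Language one-liners; `symbol_counts` is a helper name introduced for the example.

```python
from collections import Counter

# Hypothetical Wolfram Language one-liners standing in for winning submissions.
snippets = ["Total@#&", "Sort[#]&", "Length@*Union"]


def symbol_counts(snippets):
    """Tally characters that are neither alphanumeric nor whitespace."""
    return Counter(
        ch
        for code in snippets
        for ch in code
        if not ch.isalnum() and not ch.isspace()
    )


print(symbol_counts(snippets).most_common())
```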
Extracting the actual expressions shows which symbols are the most common:
The most common letter is e, which is also the most common letter in typical English.
This is likely because most built-in symbols have English-like names.

User language use time series

Gather Data

Use a symbolic SPARQL query to extract when users use which languages for submissions:

Submission Dates/times

It seems that most users on codegolf.stackexchange.com may be students, as they submit more often in the summer months and during the week. Noon appears to be the most common submission time, closely followed by 5pm-6pm.
Pair the day of week with the submission hour:
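Pairing day of week with hour is a two-key tally over submission timestamps. The Python sketch below uses hypothetical ISO-format timestamps and ignores time zones, which is a simplification made for this example.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical submission timestamps (time zone handling is omitted).
timestamps = ["2019-07-01T12:05:00", "2019-07-01T12:40:00", "2019-07-06T17:15:00"]


def weekday_hour_counts(timestamps):
    """Count submissions per (day of week, hour of day) pair."""
    counts = Counter()
    for stamp in timestamps:
        moment = datetime.fromisoformat(stamp)
        counts[(DAYS[moment.weekday()], moment.hour)] += 1
    return counts


print(weekday_hour_counts(timestamps))
```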

Languages per user

Most users only ever make submissions in one language, though several users use many dozens of different languages:
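The per-user tally can be sketched as follows in Python: collect the set of distinct languages for each user, then build a histogram of the set sizes. The `submissions` pairs and user names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (user, language) pairs, one per submission.
submissions = [
    ("alice", "Python"), ("alice", "Python"),
    ("bob", "Jelly"), ("bob", "Pyth"), ("bob", "CJam"),
    ("carol", "C"),
]


def languages_per_user(submissions):
    """Map each user to the number of distinct languages they submitted in."""
    seen = defaultdict(set)
    for user, language in submissions:
        seen[user].add(language)
    return {user: len(languages) for user, languages in seen.items()}


per_user = languages_per_user(submissions)
histogram = Counter(per_user.values())  # distinct-language count -> number of users
print(per_user)
print(histogram)
```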

Top User language use

These are the users with the most unique language submissions:
The user with the most languages seems to have taken a hiatus around the start of 2019, but focused on many golf-specific languages:
The user with the second most unique languages appears to be quite fluent in Wolfram Language, and also somewhat recently took a hiatus.

Machine Learning Code Golf Programming Language Classifier

Gather First snippets

First snippets are almost always the submission code.
Trim down to a smaller set of languages with enough training data:
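Trimming to well-represented languages is a frequency filter on the labels. In the Python sketch below, the `examples` pairs are hypothetical and the `min_count` threshold is an arbitrary illustrative value; the notebook does not state which cutoff it uses.

```python
from collections import Counter

# Hypothetical (first snippet, language) training pairs.
examples = [
    ("print(1)", "Python"), ("print(2)", "Python"), ("print(3)", "Python"),
    ("q~", "CJam"), ("l~", "CJam"),
    ("main(){}", "C"),
]


def keep_frequent(examples, min_count):
    """Keep only examples whose language has at least min_count examples."""
    counts = Counter(language for _, language in examples)
    return [(code, language) for code, language in examples if counts[language] >= min_count]


trimmed = keep_frequent(examples, min_count=2)
print(sorted({language for _, language in trimmed}))
```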

Separate into training and testing sets
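A simple way to separate the data is a seeded shuffle followed by a cut. The Python sketch below assumes a 20% test fraction and a fixed seed purely for illustration; the notebook's actual proportions are not stated, and `train_test_split` here is a hand-written helper, not a library call.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples reproducibly and split off a test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(10))
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, so accuracy figures can be compared between runs.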

Train Classifier

Test and measure accuracy

About 87% accuracy is quite good:
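For reference, accuracy here means the fraction of test snippets whose predicted language matches the recorded one. A minimal Python sketch with hypothetical predictions (the `accuracy` helper is written for this example):

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical classifier output versus true labels.
print(accuracy(["Python", "C", "Jelly", "C"], ["Python", "C", "Jelly", "R"]))  # 0.75
```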

Attempt to classify unclassified submissions

There aren’t too many submissions without languages, but there are some:
Use the newly-trained classifier to try to fill in the gaps:
The results are decent, but need some manual cleaning up to be more accurate:
At this stage, I could update the metadata and re-export the EntityStores, but for now, this is a good stopping point.