Set up
Language usage
Gather all submission metadata
We’re only interested in answers: they tend to follow a standard format, while questions are the prompts to which users add submissions.
In[]:=
answerToMetadata=EntityValue[EntityClass["StackExchange.Codegolf:Post","PostType"->(* Answer post-type entity *)],"CodeGolfMetadata","EntityAssociation"];
answerToMetadata//Length
Out[]=
142574
Count usage per language
In reality, there are not over 8600 programming languages used on the site: my processing of the posts isn’t perfect, since users often stray from the common, easily parsed format.
In[]:=
languageToCount=ReverseSort@Counts[DeleteMissing@Values[answerToMetadata[[All,"Language"]]]];languageToCount//Length
Out[]=
8623
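Parsing is fragile because answers only loosely follow a shared format. The sketch below is a rough Python illustration, not the notebook's actual parser. It assumes the common convention of a Markdown header such as `# Python 3, 42 bytes`, and both the regular expression and the name `parse_header` are hypothetical. Any header that deviates from the pattern returns `None`, which is the kind of case that can lose a language or invent a spurious one.

```python
import re

# Hypothetical pattern for headers shaped like "# <language>, <size> <unit>".
# Real answers vary far more than this (strikethrough sizes, links, bold text, ...).
HEADER_RE = re.compile(
    r"^#+\s*(?P<lang>[^,\n]+?)\s*,\s*(?P<size>\d+)\s*(?P<unit>\w+)", re.MULTILINE
)

def parse_header(body):
    """Return (language, size, unit) from the first matching header line, or None."""
    m = HEADER_RE.search(body)
    if m is None:
        return None
    return m.group("lang"), int(m.group("size")), m.group("unit")

print(parse_header("# Python 3, 42 bytes\n\n    print(1)"))  # ('Python 3', 42, 'bytes')
print(parse_header("**Jelly** 5 bytes"))                     # None: non-standard header
```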
In fact, the top 14 languages account for just over half of all submissions on the site, so the number of languages in real use is far smaller:
In[]:=
100*N[Total[languageToCount[[;;14]]]/Total[languageToCount]]
Out[]=
50.3277
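The percentage above is the sum of the top-k counts divided by the grand total. Below is a minimal Python equivalent using `collections.Counter`, run on made-up counts rather than the site's data; the function name `top_k_share` is hypothetical.

```python
from collections import Counter

def top_k_share(counts, k):
    """Percentage of all submissions that fall in the k most common languages."""
    total = sum(counts.values())
    top = sum(n for _, n in counts.most_common(k))
    return 100.0 * top / total

# Toy counts for illustration only.
counts = Counter({"A": 50, "B": 30, "C": 15, "D": 5})
print(top_k_share(counts, 2))  # 80.0
```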
In[]:=
languageToCount[[;;14]]
Out[]=
13953,9360,5570,4453,4286,4186,4151,4019,3910,3599,3555,2954,2778,2716
Perhaps unsurprisingly, Python and JavaScript are the most commonly submitted languages.
However, many golf-specific languages also appear in the top dozen or so.
In[]:=
BarChart[100.*N[Reverse[languageToCount[[;;14]]]/Total[languageToCount]],ScalingFunctions->"Log",BarOrigin->Left,ChartLabels->Automatic,ImageSize->Large,PlotTheme->"Detailed",LabelStyle->18,PlotLabel->"Language use on codegolf.stackexchange.com",FrameLabel->{None,"Fraction of submissions (%)"}]
Out[]=
Languages with bad words
Sort Thread Submissions by Size
Gather data
Start with the “parent” post of a specific thread:
In[]:=
parent=;
Gather all of the “child” submissions:
In[]:=
submissions=EntityList@EntityClass["StackExchange.Codegolf:Post",(* parent-post property *)->parent];
submissionToMetadata=EntityValue[submissions,"CodeGolfMetadata","EntityAssociation"];
submissionToMetadata//Length
Out[]=
26
Show
Show the submissions sorted by their reported size (a feature not available on the website without leaderboard code, which sometimes breaks):
In[]:=
NiceGrid[{#["ReportedSize"],#["Language"],First[#["CodeSnippets"],Missing["NotFound"]]}&/@SortBy[submissionToMetadata,#ReportedSize&],Alignment->Left]
Out[]=
0000000: 4e22285b200a5c225f2a295c2d2e2f6f2c3e4f3a3c3d5d225f N"([ .\"_*)\-./o,>O:<=]"_
0000019: 2422dd7382d6bfab28707190992f240c362ee510262bd07a77 $".s....(pq../$.6...&+.zw
0000032: 08556de9dcdb566c676817c2b87f5ecb8bab145dc2f2f76e07 .Um...Vlgh....^....]...n.
000004b: 22323536624b623224663d4e2f7b5f2c342f2f7d25723a7e2e "256bKb2$f=N/{_,4//}%r:~.
0000064: 3d2828342423346222205f0a20222e2a6f6f736572372f4e2a =((4$#4b" _. ".*ooser7/N*
…)(ð•ž&Ž•J•£ÊÒu[7tˆ†ŠHλRΩ.P•12вèJsvN“_ .\/=":><Oo-[],)*(“•æ‰ΔΣ₁çδ₂¯r3₁’8iÈÉÞ2;lλÒžfúāÿ©-Ñm¦Ñ`^«#„]*≠½Ü4~āÐm=¾ç•20вè¶¡Nè4äyè.;
"b8li'U9gN;|"125:Kb8bl:~f="r pL|P3{cR`@L1iT"Kb21b"G.HMtNY7VM=BM@$^$dX8a665V"KbFb"=_./ <[(*-oO,\":"f=_"/<[(""\>])"er+4/f=.=7/N*
q:Q;SS" _===_,___ ....., _ /_\,___ (_*_)"',/0{Q=~(=}:G~N" \ "4G'(".oO-"_2G",._ "1G@3G')" / "5GN"< / "4G'(" : ] [> < "3/6G')"> \ "5GNS'(" : \" \"___ "3/7G')
7S*"_===_ ___ ..... _ /_\ ___ (_*_)"+6/2/Nf*",._ "1/".oO-"1/_" <\ / >/ \ "2/4/~" : ] [> < : \" \"___ "3/4/~]l~Ab:(]z::=:L0=N4{L=}:K~0='(2K1K3K')5K0=N4K1='(6K')5K1=NS'(7K')
for($t=' 0 _ _0 ___0 _ _ 0_. (0=./_0=._*0=.\_0_. )4 \ (2.oO-1,._ 3.oO-)5 / 4< / (6 ]> 6: 6 [< )5> \ (7 "_ 7: _ 7 "_ )';$d=$t[$i++];$r+="$d"){if($d-ge48){$d=$t[$i+"$args"["$d"]-49]$i+=4}}$r
s=>` 08(213)94(6)5 (7)`.replace(/\d/g,p=>`_===_1 ___ .....1 _ /_\\1 ___ (_*_)1,1.1_11.1o101-1.1o101-1<11/11>11\\11 : 1] [1> <1 1 : 1" "1___1 11\\11 11/11 `.split(1)[s[p>7?p-4:p]-1+p*4]||' ')
M@GCHgc" ___ ___ _"bhzgc" (_*_) _===_ ..... /_\\"bhzs[g" \ "@z4\(g"-.oO"@z2g" ,._"@z1g"-.oO"@z3\)g" / "@z5)s[g" < /"@z4\(gc" : ] [> <"b@z6\)g" > \\"@z5)++" ("gc" : \" \"___"bez\)
d;main(){char*t="##3#b#b3#bbb3#b#b##\r#3b1#+3@12b3@1b-3@1_b3b1#,#\r7#_##+51rR04/1b#61rR0,8#2##\r7?#2#+9#`A#9=###9#^?#,8A#_#\r#+:#%b#:=#b#:#%b#,#",p[9];for(gets(p);d=*t++;putchar(d-3))d=d<51?d:(p[d-51]-53)[t+=4];}
char*t=" 0 _ _0 ___0 _ _ 0_. (0=./_0=._*0=.\\_0_. ) 4 \\ (2.oO-1,._ 3.oO-)5 / 4< / (6 ]> 6: 6 [< )5> \\ (7 \"_ 7: _ 7 \"_ ) ";i,r,d;f(char*p){while(r++<35){d=t[i]-48;putchar(t[d<0?i:i+p[d]-48]);i+=d<0?1:5;r%7?0:puts("");}}
l='_===_| ___\n .....| _\n /_\| ___\n (_*_)| : |] [|> <| |>| |\| | : |" "|___| '.split('|')l[4:4]=' \ .oO-,._ .oO- / < / 'def s(a):print(' {}\n{}({}{}{}){}\n{}({}){}\n ({})'.format(*[l[4*m+int(a[int('0421354657'[m])])-1]for m in range(10)]))
H,N,L,R,X,Y,T,B=map(int,i)l='\n's=' 'e=' .o0-'F=' \ / 'S=' < / \ >'o,c='()'print s+' _ _ ___ _ _\n\n\n\n _. (=./_=._*=.\__. )'[H::4]+l+F[X]+o+e[L]+' ,._ '[N]+e[R]+c+F[-Y]+l+S[X]+o+' ]> : [< '[T::4]+c+S[-Y]+l+s+o+' "_ : _ "_ '[B::4]+c
def s(g):H,N,L,R,X,Y,T,B=[int(c)-1for c in g];e='.oO-';print(' '*9+'_ _ ___ _ _\n\n\n\n _. (=./_=._*=.\\__. )')[H::4]+'\n'+' \\ '[X]+'('+e[L]+',._ '[N]+e[R]+')'+' / '[Y]+'\n'+'< / '[X]+"("+' ]> : [< '[T::4]+')'+'> \\ '[Y]+'\n ('+' "_ : _ "_ '[B::4]+")"
#define P(n)[s[n]&3],f(char*s){printf(" %.3s\n %.5s\n%c(%c%c%c)%c\n%c(%.3s)%c\n (%.3s)","___ ___ _"+*s%4*3,"(_*_)_===_..... /_\\"+*s%4*5," \\ "P(4)"-.o0"P(2) " ,._"P(1)"-.o0"P(3)" /"P(5)" < /"P(4)" : ] [> <"+s[6]%4*3," > \\"P(5)" : \" \"___"+s[7]%4*3);}
V='.oO-'def F(d): D=lambda i:int(d[i])-1 print" "+("","___"," _ ","___")[D(0)]+"\n "+\"_. (=./_=._*=.\\__. )"[D(0)::4]+"\n"+\" \\ "[D(4)]+"("+V[D(2)]+',._ '[D(1)]+V[D(3)]+")"+" / "[D(5)]+'\n'+\"< / "[D(4)]+"("+" ]> : [< "[D(6)::4]+")"+"> \\ "[D(5)]+"\n ("+\' "_ : _ "_ '[D(7)::4]+")"
o l a b=take a$drop((b-1)*a)ln="\n"p i=id=<<[" ",o" \n _===____ \n ..... _ \n /_\\ ___ \n (_*_)"11a,n,o" \\ "1e,o"(.(o(O(-"2c,o",._ "1 b,o".)o)O)-)"2d,o" / "1f,n,o"< / "1e,o"( : )(] [)(> <)( )"5g,o"> \\ "1f,n," (",o" : )\" \")___) )"4h]where[a,b,c,d,e,f,g,h]=map(read.(:[]))i
f(i,{r='.o0-',s=' : '}){i=i.split('').map((j)=>int.parse(j)-1).toList();return' ${['_===_',' ___ \n.....',' /_\\ ',' ___ \n (_*_)'][i[0]]}\n${' \\ '[i[4]]}(${r[i[2]]+',._ '[i[1]]+r[i[3]]})${' / '[i[5]]}\n${'< / '[i[4]]}(${[s,'] [','> <',' '][i[6]]})${'> \\ '[i[5]]}\n (${[s,'" "','___',' '][i[7]]})';}
a=y["\n _===_\n"," ___ \n .....\n"," _ \n /_\\ \n"," ___ \n (_*_)\n"]d=y",._ "c=y".oO-"e=y"< / "j=y" \\ "f=y"> \\ "k=y" / "y w n=w!!(n-1)h=y[" : ","] [","> <"," "]b=y[" ( : ) \n"," (\" \") \n"," (___) \n"," ( ) \n"]s(m:x:o:p:n:q:t:l:_)=putStr$a m++j x:'(':c o:d n:c p:')':k q:'\n':e x:'(':h t++')':f q:'\n':b l
let f(g:string)= let b=" " let p=printfn let i x=int(g.[x])-49 p" %s "["";"___";" _ ";"___"].[i 0] p" %s "["_===_";".....";" /_\ ";"(_*_)"].[i 0] p"%s(%c%c%c)%s"[b;"\\";b;b].[i 4]".oO-".[i 2]",._ ".[i 1]".oO-".[i 3][b;"/";b;b;b].[i 5] p"%s(%s)%s"["<";b;"/";b].[i 4][" : ";"] [";"> <";" "].[i 6][">";b;"\\";b].[i 5] p" (%s) "[" : ";"\" \"";"___";" "].[i 7]
<?$f=str_split;$r=$f($argv[1]);$p=[H=>' _===____..... _ /_\ ___(_*_)',N=>',._ ',L=>'.oO-',R=>'.oO-',X=>' <\ / ',Y=>' >/ \ ',T=>' : ] [> < ',B=>' : " "___ '];echo preg_replace_callback("/[A-Z]/",function($m){global$A,$p,$r,$f;$g=$m[0];return$f($f($p[$g],strlen($p[$g])/4)[$r[array_search($g,array_keys($p))]-1])[(int)$A[$g]++];},' HHH HHHHHX(LNR)YX(TTT)Y (BBB)');
Input Str9seq(inString("1234",sub(Str9,I,1)),I,1,length(Ans→L1" ___ _ ___ →Str1"_===_..... /_\ (_*_)→Str2",._ →Str3"•oO-→Str4"<\/ →Str5">/\ →Str6" : ] [> < →Str7" : ¨ ¨___ →Str8"Str1Str2Str3Str4Str5Str6Str7Str8→Str0For(X,3,5Output(X,2,"( )EndL1Output(3,3,sub(Str4,Ans(3),1)+sub(Str3,Ans(2),1)+sub(Str4,Ans(4),1Ans(5Output(4-(Ans=2),1,sub(Str5,Ans,1L1(6Output(4-(Ans=2),7,sub(Str6,Ans,1L1-1For(X,1,2Output(X+3,3,sub(expr(sub(Str0,X+6,1)),1+3Ans(X+6),3Output(X,2,sub(expr(sub(Str0,X,1)),1+5Ans(1),5End
a->{int q=50,H=a[0]-49,N=a[1],L=a[2],R=a[3],X=a[4],Y=a[5];return"".format(" %s%n %s%n%c(%c%c%c)%c%n%c(%s)%c%n (%s)",H<1?"":H%2<1?" ___":" _","_===_s.....s /_\\s(_*_)".split("s")[H],X==q?92:32,L<q?46:L<51?111:L<52?79:45,N<q?44:N<51?46:N<52?95:32,R<q?46:R<51?111:R<52?79:45,Y==q?47:32,X<q?60:X%2<1?32:47," s : s] [s> <".split("s")[a[6]%4],92-(Y%3+Y%6/4)*30," s : s\" \"s___".split("s")[a[7]%4]);}
W =c("_===_"," ___\n ....."," _\n /_\\"," ___\n (_*_)",",",".","_"," ",".","o","O","-"," ","\\"," "," ","<"," ","/"," "," ","/"," ","",">"," ","\\",""," : ","] [","> <"," "," : ","\" \"","___"," ")f=function(x){i=as.integer(strsplit(x,"")[[1]]);cat(" ",W[i[1]],"\n",W[i[5]+12],"(",W[i[3]+8],W[i[2]+4],W[i[4]+8],")",W[i[6]+20],"\n",W[i[5]+16],"(",W[i[7]+28],")",W[i[6]+24],"\n"," (",W[i[8]+32], ")",sep="")}
H=c("_===_"," ___\n ....."," _\n /_\\"," ___\n (_*_)")N=c(",",".","_"," ")L=c(".","o","O","-")X=c(" ","\\"," "," ")S=c("<"," ","/"," ")Y=c(" ","/"," ","")U=c(">"," ","\\","")T=c(" : ","] [","> <"," ")B=c(" : ","\" \"","___"," ")f=function(x){i=as.integer(strsplit(x,"")[[1]]);cat(" ",H[i[1]],"\n",X[i[5]],"(",L[i[3]],N[i[2]],L[i[4]],")",Y[i[6]],"\n",S[i[5]],"(",T[i[7]],")",U[i[6]],"\n"," (",B[i[8]], ")",sep="")}
x=' ';d=" ";h=['\n_===_',' ___ \n.....',' _ \n /_\\ ',' ___ \n(_*-)'];n=[',','.','_',x];e=['.','o','O','-'];y=['>',,'\\',x];u=['<',,'/',x];t=[' : ','[ ]','> <',d;b=[' : ','" "',"___",d];j=process.argv[2].split('').map(function(k){return parseInt(k)-1});q=j[4]==1;w=j[5]==1;console.log([ h[j[0]].replace(/(.*)\n(.*)/g, " $1\n $2"), (q?'\\':x)+'('+e[j[2]]+n[j[1]]+e[j[3]]+')'+(w?'/':x), (!q?u[j[4]]:x)+'('+t[j[6]]+')'+(!w?y[j[5]]:x), x+'('+b[j[7]]+')'].join('\n'));
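Outside a notebook, the same ordering is a single sort. The only subtlety is handling answers whose size could not be parsed. The Python sketch below works on plain dictionaries shaped loosely like the metadata associations above and places records without a size last. The record layout, labels, and function name are illustrative assumptions.

```python
def sort_by_reported_size(submissions):
    """Sort submission records by 'ReportedSize', putting records without a size last."""
    return sorted(
        submissions,
        key=lambda s: (s.get("ReportedSize") is None, s.get("ReportedSize") or 0),
    )

subs = [
    {"Language": "lang-a", "ReportedSize": 200},
    {"Language": "lang-b", "ReportedSize": 135},
    {"Language": "lang-c"},  # size could not be parsed
    {"Language": "lang-d", "ReportedSize": 97},
]
print([s["Language"] for s in sort_by_reported_size(subs)])
# ['lang-d', 'lang-b', 'lang-a', 'lang-c']
```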
Evaluate submissions in Notebooks
Python example
Look at a specific Python submission:
Evaluating it directly in a notebook requires only a little setup:
In[]:=
l='_===_| ___\n .....| _\n /_\| ___\n (_*_)| : |] [|> <| |>| |\| | : |" "|___| '.split('|')
l[4:4]=' \ .oO-,._ .oO- / < / '
def s(a):print(' {}\n{}({}{}{}){}\n{}({}){}\n ({})'.format(*[l[4*m+int(a[int('0421354657'[m])])-1]for m in range(10)]))
In[]:=
s('11112311')
Node.js example
Look at a specific Node.js submission:
Evaluating this one in a notebook also requires a little setup.
This particular submission also needs a few changes to its argument handling before it works:
Gather Top Languages by post tags
Definition
For a given tag, gather the metadata for all submissions (answers) to questions with that tag.
Then, find the ten most commonly submitted languages.
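The definition above amounts to filtering by tag and taking the n largest counts. Below is a minimal Python sketch of that logic. It assumes each answer record carries its parent question's tags under `"Tags"` and its parsed language under `"Language"`; these field names and `top_languages_for_tag` are hypothetical.

```python
from collections import Counter

def top_languages_for_tag(answers, tag, n=10):
    """Return the n most common (language, count) pairs among answers whose question has `tag`."""
    counts = Counter(
        a["Language"] for a in answers
        if tag in a["Tags"] and a.get("Language") is not None
    )
    return counts.most_common(n)

answers = [
    {"Language": "x", "Tags": {"math"}},
    {"Language": "x", "Tags": {"math", "string"}},
    {"Language": "y", "Tags": {"math"}},
    {"Language": "z", "Tags": {"string"}},
    {"Language": None, "Tags": {"math"}},  # unparsed language is skipped
]
print(top_languages_for_tag(answers, "math"))  # [('x', 2), ('y', 1)]
```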
Examples
Unsurprisingly, submissions for posts with math-related tags use Wolfram Language quite a bit:
Other submission categories don’t use Wolfram Language as much:
Language Ranks per thread with symbolic SPARQL queries
Gather thread language data
Write a symbolic SPARQL query to extract the languages used and their reported submission sizes for a given thread (i.e., its “parent post”).
Group the results by thread and language, sorting by reported size.
Note that all size units are treated as equivalent here, since sizes in different units are hard to compare (e.g., bytes vs. keystrokes vs. characters in different encodings).
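The grouping step can be pictured without SPARQL as a dictionary of dictionaries. The Python sketch below is an assumption-laden analogue. It takes flat `(thread, language, size)` rows, keeps each language's smallest size per thread (an assumption, since the query's exact aggregation isn't shown), and sorts by size while ignoring units as described above. The function name is hypothetical.

```python
from collections import defaultdict

def best_sizes_by_thread(rows):
    """rows: (thread_id, language, size) tuples.
    Returns {thread: [(language, best_size), ...]} sorted by size, keeping each
    language's smallest reported size. Units are deliberately ignored."""
    best = defaultdict(dict)
    for thread, lang, size in rows:
        current = best[thread].get(lang)
        if current is None or size < current:
            best[thread][lang] = size
    return {t: sorted(langs.items(), key=lambda kv: kv[1]) for t, langs in best.items()}

rows = [("t1", "x", 10), ("t1", "y", 5), ("t1", "x", 7), ("t2", "x", 3)]
print(best_sizes_by_thread(rows))  # {'t1': [('y', 5), ('x', 7)], 't2': [('x', 3)]}
```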
Here are some helpful definitions for getting position ranks with ties:
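One common convention for ranks with ties is standard competition ranking ("1224"): equal sizes share a rank, and the next distinct size skips ahead by the number of tied entries. The notebook does not state which convention it uses, so the Python sketch below simply assumes competition ranking. The function name is hypothetical.

```python
def competition_ranks(sizes):
    """Standard competition ranking: each size gets the 1-based position of its
    first occurrence in sorted order, so ties share a rank and later ranks skip."""
    ordered = sorted(sizes)
    first_position = {}
    for position, size in enumerate(ordered, start=1):
        first_position.setdefault(size, position)
    return [first_position[s] for s in sizes]

print(competition_ranks([12, 7, 12, 30, 7]))  # [3, 1, 3, 5, 1]
```

Dense ranking ("1223") would instead number only the distinct sizes. It gives noticeably better ranks to languages that often land just behind a crowded tie.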
Look at the rank stats across all threads for some common languages:
On average, Wolfram Language’s ranking is not much different from those of the most popular languages.
Lower-level and compiled languages tend to do worse in code golf, likely because they require boilerplate code (e.g., for static type declarations).
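Per-language rank statistics come from pooling every rank a language earned across threads and then summarizing the pool. The Python sketch below uses the mean and median as example summaries; the notebook's exact statistics and this function name are assumptions.

```python
from collections import defaultdict
from statistics import mean, median

def rank_stats(ranked_rows):
    """ranked_rows: (language, rank) pairs pooled across all threads.
    Returns {language: (number of ranked entries, mean rank, median rank)}."""
    by_lang = defaultdict(list)
    for lang, rank in ranked_rows:
        by_lang[lang].append(rank)
    return {lang: (len(r), mean(r), median(r)) for lang, r in by_lang.items()}

print(rank_stats([("x", 1), ("x", 3), ("y", 2), ("y", 2), ("y", 5)]))
```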
Find the most common languages in first place (ignoring ties for simplicity):
Wolfram Language ranks 9th among all languages by number of outright (untied) first-place finishes across all threads:
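Counting outright wins requires one decision about ties. The Python sketch below assumes that "ignoring ties" means skipping any thread whose smallest size is shared by more than one entry. That reading, the input layout, and the function name are all assumptions.

```python
from collections import Counter

def outright_winners(threads):
    """threads: {thread_id: [(language, size), ...]}.
    Counts, per language, the threads it wins outright; tied first places are skipped."""
    wins = Counter()
    for entries in threads.values():
        if not entries:
            continue
        best = min(size for _, size in entries)
        leaders = [lang for lang, size in entries if size == best]
        if len(leaders) == 1:
            wins[leaders[0]] += 1
    return wins

threads = {
    "t1": [("x", 5), ("y", 7)],
    "t2": [("x", 3), ("y", 3)],  # tie for first: not counted
    "t3": [("y", 1), ("z", 4)],
    "t4": [],
}
print(outright_winners(threads).most_common())
```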
Find the parent posts of the winning Wolfram Language submissions:
Write another symbolic SPARQL query to extract the actual Wolfram Language submission posts:
Look at the smallest such winners. Note the heavy use of built-ins and infix forms.
Perhaps unsurprisingly, the most common symbolic (non-alphanumeric) characters are @ (for prefix function application), square brackets, and pure function notation (# and &), all staples of Wolfram Language one-liner submissions.
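A character tally like this just counts characters that are neither letters, digits, nor whitespace. Below is a Python sketch; the three snippets are toy Wolfram Language one-liners, not real winning submissions.

```python
from collections import Counter

def symbol_char_counts(snippets):
    """Count non-alphanumeric, non-whitespace characters across code snippets."""
    return Counter(
        ch for code in snippets for ch in code
        if not ch.isalnum() and not ch.isspace()
    )

wl = ["Sort@#&", "f[x_]:=x^2", "Tr@Range@#&"]
print(symbol_char_counts(wl).most_common(2))  # [('@', 3), ('#', 2)]
```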
Extracting the expressions themselves shows which symbols are the most common:
The most common letter is e, which is also the most common letter in typical English text.
This is likely because most built-in symbols have English-like names.
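Without a real parser, a crude approximation is to treat capitalized identifiers as symbol names, since built-in Wolfram Language symbols such as `Range` and `Select` start with an uppercase letter. The Python sketch below then weights letter counts by how often each name occurs. The regex ignores strings, comments, contexts, and `$`-prefixed symbols, so treat it as a heuristic and not a tokenizer. The snippets are toy examples.

```python
import re
from collections import Counter

# Heuristic: capitalized identifiers approximate Wolfram Language symbol names.
SYMBOL_RE = re.compile(r"\b[A-Z][A-Za-z0-9]*")

def symbol_and_letter_counts(snippets):
    """Return (symbol-name counts, letter counts weighted by symbol occurrences)."""
    symbols = Counter(m for code in snippets for m in SYMBOL_RE.findall(code))
    letters = Counter()
    for name, n in symbols.items():
        for ch in name.lower():
            if ch.isalpha():
                letters[ch] += n
    return symbols, letters

symbols, letters = symbol_and_letter_counts(["Tr@Range@#&", "Select[#,PrimeQ]&", "Range[5]"])
print(symbols.most_common(1), letters.most_common(1))  # [('Range', 2)] [('e', 5)]
```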
User language use time series
Gather Data
Use a symbolic SPARQL query to extract which languages each user submits in, and when:
Submission Dates/times
Many users on codegolf.stackexchange.com may be students, since submissions are more frequent in the summer months and during the week. Noon appears to be the most common submission time, closely followed by 5pm-6pm.
Pair the day of week with the submission hour:
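The month, weekday, hour, and (weekday, hour) breakdowns above are simple tallies over timestamps. Below is a Python sketch using `datetime` and `Counter` on three made-up timestamps. Hour-of-day conclusions depend on the time zone the stored timestamps use; this sketch takes the datetimes as given.

```python
from collections import Counter
from datetime import datetime

def time_histograms(timestamps):
    """Tally submission datetimes by month, weekday (0 = Monday), hour, and (weekday, hour)."""
    by_month = Counter(t.month for t in timestamps)
    by_weekday = Counter(t.weekday() for t in timestamps)
    by_hour = Counter(t.hour for t in timestamps)
    by_day_hour = Counter((t.weekday(), t.hour) for t in timestamps)
    return by_month, by_weekday, by_hour, by_day_hour

stamps = [datetime(2019, 7, 1, 12, 5), datetime(2019, 7, 2, 12, 30), datetime(2019, 12, 6, 17, 0)]
months, weekdays, hours, day_hours = time_histograms(stamps)
print(hours.most_common(1))  # [(12, 2)]
```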
Languages per user
Most users only ever submit in one language, though several have used many dozens of different languages:
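This distribution counts distinct languages per user and then counts how many users share each total. Below is a Python sketch over hypothetical `(user, language)` pairs.

```python
from collections import Counter, defaultdict

def languages_per_user(rows):
    """rows: (user, language) pairs.
    Returns ({user: number of distinct languages}, {number of languages: number of users})."""
    langs = defaultdict(set)
    for user, lang in rows:
        langs[user].add(lang)
    per_user = {u: len(s) for u, s in langs.items()}
    return per_user, Counter(per_user.values())

rows = [("u1", "x"), ("u1", "x"), ("u2", "x"), ("u2", "y"), ("u3", "z")]
per_user, histogram = languages_per_user(rows)
print(dict(histogram))  # {1: 2, 2: 1}
```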
Top User language use
These are the users with the most unique language submissions:
The user with the most languages seems to have taken a hiatus around the start of 2019, and to have focused on many golf-specific languages:
The user with the second-most unique languages appears to be quite fluent in Wolfram Language, and also took a hiatus somewhat recently.
Machine Learning Code Golf Programming Language Classifier
Gather First snippets
The first code snippet in an answer is almost always the submission code itself.
Trim down to a smaller set of languages with enough training data:
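Trimming to well-represented languages is a frequency filter. The Python sketch below keeps only `(code, language)` examples whose language has at least `min_count` examples. The threshold the notebook used isn't stated, so `min_count` here is a free parameter.

```python
from collections import Counter

def keep_common_languages(examples, min_count):
    """examples: (code, language) pairs. Keep only languages with at least
    min_count examples, so every remaining class has some training data."""
    counts = Counter(lang for _, lang in examples)
    return [(code, lang) for code, lang in examples if counts[lang] >= min_count]

examples = [("a", "x"), ("b", "x"), ("c", "y")]
print(keep_common_languages(examples, 2))  # [('a', 'x'), ('b', 'x')]
```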
Separate into training and testing sets
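A reproducible split can be made by shuffling a copy with a seeded random generator and slicing off a test fraction. The Python sketch below takes that approach. The 20% default and the seed are arbitrary assumptions, and a stratified split per language would be a reasonable refinement for rare classes.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and return (train, test), with roughly
    test_fraction of the examples in the test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(list(range(10)), test_fraction=0.3)
print(len(train), len(test))  # 7 3
```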
Train Classifier
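In Wolfram Language, a text classifier can be trained with a single `Classify` call. To show concretely what such a model learns from code snippets, here is a from-scratch Python sketch of a multinomial naive Bayes over character trigrams, a simple baseline for programming-language identification. It is an illustrative swap-in, not the model behind the accuracy figure that follows, and the class name and toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def ngrams(text, n=3):
    """Overlapping character n-grams, padded with start/end markers."""
    padded = f"\x02{text}\x03"
    return [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]

class NaiveBayesLanguageClassifier:
    """Multinomial naive Bayes over character trigrams with add-one smoothing."""

    def fit(self, examples):
        self.class_counts = Counter()
        self.gram_counts = defaultdict(Counter)
        vocab = set()
        for code, lang in examples:
            grams = ngrams(code)
            self.class_counts[lang] += 1
            self.gram_counts[lang].update(grams)
            vocab.update(grams)
        self.vocab_size = len(vocab)
        self.totals = {lang: sum(c.values()) for lang, c in self.gram_counts.items()}
        self.n_examples = sum(self.class_counts.values())
        return self

    def predict(self, code):
        grams = ngrams(code)
        best_lang, best_score = None, -math.inf
        for lang, n_docs in self.class_counts.items():
            score = math.log(n_docs / self.n_examples)           # class prior
            denom = self.totals[lang] + self.vocab_size + 1      # +1 slot for unseen grams
            counts = self.gram_counts[lang]
            for g in grams:
                score += math.log((counts[g] + 1) / denom)       # smoothed likelihood
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang

train = [
    ("print(x)", "py"), ("def f(x): return x", "py"), ("print(len(x))", "py"),
    ("Print[x]", "wl"), ("f[x_]:=x^2", "wl"), ("Length[x]//Print", "wl"),
]
clf = NaiveBayesLanguageClassifier().fit(train)
print(clf.predict("print(f(x))"))       # py
print(clf.predict("Print[Length[x]]"))  # wl
```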
Test and measure accuracy
About 87% accuracy is quite good:
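Accuracy is the fraction of held-out examples whose predicted language matches the label. It is most meaningful next to a majority-class baseline, the score of always guessing the most common test language. Below is a Python sketch of both, where `predict` is any function from code to a language label and the toy predictor is hypothetical.

```python
from collections import Counter

def accuracy(predict, test_examples):
    """Fraction of (code, language) test pairs whose predicted language matches the label."""
    if not test_examples:
        raise ValueError("no test examples")
    correct = sum(1 for code, lang in test_examples if predict(code) == lang)
    return correct / len(test_examples)

def majority_baseline(test_examples):
    """Accuracy of always guessing the most common test label."""
    counts = Counter(lang for _, lang in test_examples)
    return counts.most_common(1)[0][1] / len(test_examples)

tests = [("f[x]", "wl"), ("print(x)", "py"), ("g[y]", "wl"), ("Print[x]", "py")]
toy_predict = lambda code: "wl" if "[" in code else "py"
print(accuracy(toy_predict, tests), majority_baseline(tests))  # 0.75 0.5
```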
Attempt to classify unclassified submissions
Not many submissions lack a language, but there are some:
Use the newly-trained classifier to try to fill in the gaps:
The results are decent, but need some manual cleanup to be more accurate:
At this stage, I could update the metadata and re-export the EntityStores, but for now, this is a good stopping point.