Acceleration by Stepsize Hedging
Acceleration by Stepsize Hedging
This notebook verifies the algebraic identities in Section 5 of the paper
“Acceleration by Stepsize Hedging I: Multi-Step Descent and the Silver Stepsize Schedule
by Jason M. Altschuler and Pablo A. Parrilo [arXiv:2309.07879]
“Acceleration by Stepsize Hedging I: Multi-Step Descent and the Silver Stepsize Schedule
by Jason M. Altschuler and Pablo A. Parrilo [arXiv:2309.07879]
Definitions and relations
Definitions and relations
Normalizing transform for the Silver Stepsize and its inverse (equation 3.5)
In[]:=
ztob=z(1+kz)/(1+z);btoz=b(b-1)/(k-b);
Defining equations for splitting the Normalized Silver Stepsizes (equation 3.1)
Defining equations for splitting the Normalized Silver Stepsizes (equation 3.1)
In[]:=
rec={y2nz2n==zn^2,z2n-y2n==2(zn-zn^2)};
Silver Convergence Rate (equation 3.9)
Silver Convergence Rate (equation 3.9)
In[]:=
tn=((1-zn)/(1+zn))^2;t2n=((1-z2n)/(1+z2n))^2;
Silver Stepsizes (equation 3.4)
Silver Stepsizes (equation 3.4)
In[]:=
an=ztob[yn];bn=ztob[zn];a2n=ztob[y2n];b2n=ztob[z2n];
Quantities from Definition B.2
Quantities from Definition B.2
In[]:=
r=(1-zn)/(1-z2n);c=t2n/tn(r+(1+r)(z2n+zn)/(z2n-zn));phi=t2nk/(k-2)(z2n+zn)/(z2n-zn);
Quantities from Lemma B.3
Quantities from Lemma B.3
In[]:=
d12=t2n/(1-zn)(z2n+zn)/(z2n-zn);d21=ky2nt2n/(1-zn)(z2n+zn)/(z2n-zn);d31=t2n/(1-zn)(1+ky2n)-t2n(1+(k-1)zn+kzn^2)/(1-zn)^2;d32=t2n(1+(k-1)z2n+kz2n^2)/(1-z2n)^2-ctn(1+(k-1)zn+kzn^2)/(1-zn)^2;d13=-2kznt2n/(1-zn)^2;d23=2kz2nt2n/(1-z2n)^2-c2kzntn/(1-zn)^2;
Main Matrices
The four 4x4 matrices in Lemma 5.3 (definitions in Appendix B.4)
In[]:=
EE={{t2n/tn-ctn,ctna2n-t2nbn/tn,0,0},{0,t2nbn^2/tn-ctna2n^2,0,0},{0,0,c-1,b2n-cbn},{0,0,0,cbn^2-b2n^2}};S={{-1/k(d12+d13+d21+d31),d12/k+d21+d13/k+d31,(d12+d21)/k,-d12-d21/k},{0,-(d12+d13+d21+d31),-d21-d12/k,d12+d21},{0,0,-1/k(d21+d12+d23+d32),d21/k+d12+d23/k+d32},{0,0,0,-(d21+d12+d23+d32)}};L1=(k-2)/k/(1-zn){{zn-2+1/k,a2n(1-zn)+(k-1)/k,1-1/k,0},{0,2a2n(zn-1)-(kzn-1),-1+1/k,0},{0,0,0,0},{0,0,0,0}};L2=(k-2)/k/(1-zn){{0,0,-1+zn,1-zn},{0,0,a2n(1-zn),-a2n(1-zn)},{0,0,2-zn-1/k,-1+zn},{0,0,0,1-kzn}};L=phi(L1+rL2);
VerifyproofofTheorem5
Check the quadratic form in the iterates and gradients (equation 5.6)
In[]:=
J=S+L-EE;Simplify[J,rec]
Out[]=
{{0,0,0,0},{0,0,0,0},{0,0,0,0},{0,0,0,0}}
Check the linear form in function values (equation 5.9)
Check the linear form in function values (equation 5.9)
In[]:=
W=t2n(kzn-1)(z2n+zn)/(1-zn)/(z2n-zn);f1=d12+d13-d21-d31+W;f2=d21+d23-d12-d32+rW;f3=d31+d32-d13-d23-W(1+r);
In[]:=
FullSimplify[{f1,f2,f3},rec]
Out[]=
{0,0,0}