The following three formulas are well known; they can be found in many books on linear regression, and they are not difficult to derive.
$$\beta_1 = \frac{r_{YX_1} - r_{YX_2}\,r_{X_1X_2}}{1 - r_{X_1X_2}^2}$$

$$\beta_2 = \frac{r_{YX_2} - r_{YX_1}\,r_{X_1X_2}}{1 - r_{X_1X_2}^2}$$

$$R^2 = \frac{r_{YX_1}^2 + r_{YX_2}^2 - 2\,r_{YX_1}\,r_{YX_2}\,r_{X_1X_2}}{1 - r_{X_1X_2}^2}$$
If you substitute the two betas into your equation $R^2 = r_{YX_1}\beta_1 + r_{YX_2}\beta_2$, you will get the above formula for $R^2$.
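If you don't want to push the algebra by hand, here is a quick symbolic check of that substitution (a sketch in sympy; the shorthand names `r1`, `r2`, `r12` for $r_{YX_1}$, $r_{YX_2}$, $r_{X_1X_2}$ are mine):

```python
# Symbolic check: substituting the two betas into R^2 = r1*beta1 + r2*beta2
# reproduces the classical R^2 formula above.
import sympy as sp

r1, r2, r12 = sp.symbols('r1 r2 r12')

beta1 = (r1 - r2*r12) / (1 - r12**2)
beta2 = (r2 - r1*r12) / (1 - r12**2)

R2_from_betas = r1*beta1 + r2*beta2
R2_classical  = (r1**2 + r2**2 - 2*r1*r2*r12) / (1 - r12**2)

print(sp.simplify(R2_from_betas - R2_classical))  # prints 0
```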
Here is a geometric "insight". Below are two pictures showing the regression of $Y$ on $X_1$ and $X_2$. This kind of representation is known as variables-as-vectors in subject space (please read up on what it is about). The pictures are drawn after all three variables were centered, so that (1) every vector's length equals the standard deviation of the respective variable, and (2) the angle (or rather its cosine) between every two vectors equals the correlation between the respective variables.
$\hat{Y}$ is the regression prediction (the orthogonal projection of $Y$ onto "plane X"); $e$ is the error term; $\cos\angle{Y\hat{Y}} = |\hat{Y}|/|Y|$ is the multiple correlation coefficient.
The left picture depicts the skew coordinates of $\hat{Y}$ on the variables $X_1$ and $X_2$. We know that such coordinates relate to the regression coefficients. Namely, the coordinates are $b_1|X_1| = b_1\sigma_{X_1}$ and $b_2|X_2| = b_2\sigma_{X_2}$.
And the right picture shows the corresponding perpendicular coordinates. We know that such coordinates relate to the zero-order correlation coefficients (these are the cosines of orthogonal projections). If $r_1$ is the correlation between $Y$ and $X_1$, and $r_1^*$ is the correlation between $\hat{Y}$ and $X_1$, then the coordinate is $r_1|Y| = r_1\sigma_Y = r_1^*|\hat{Y}| = r_1^*\sigma_{\hat{Y}}$. Likewise for the other coordinate, $r_2|Y| = r_2\sigma_Y = r_2^*|\hat{Y}| = r_2^*\sigma_{\hat{Y}}$.
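As a numerical illustration of that last identity, here is a small sketch (numpy; the data and all variable names are mine, made up for the example). Vector norms stand in for standard deviations, since the two differ only by a common factor of $\sqrt{n}$:

```python
# Numerical check that the perpendicular coordinate of Y on the X1 axis
# equals that of Yhat: r1*|Y| = r1*_star*|Yhat|.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))                      # two made-up predictors
y = 1.5*X[:, 0] - 0.7*X[:, 1] + rng.normal(size=n)

# Center all three variables, as in the pictures.
X = X - X.mean(axis=0)
y = y - y.mean()

b, *_ = np.linalg.lstsq(X, y, rcond=None)        # regression coefficients
yhat = X @ b                                     # projection of y onto "plane X"

cos = lambda u, v: (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
r1      = cos(y,    X[:, 0])                     # corr(Y, X1): a cosine, since centered
r1_star = cos(yhat, X[:, 0])                     # corr(Yhat, X1)

print(r1 * np.linalg.norm(y))                    # r1 |Y|
print(r1_star * np.linalg.norm(yhat))            # r1* |Yhat|  -- the same number
```

The two printed values agree because the residual $Y - \hat{Y}$ is orthogonal to $X_1$, so $Y$ and $\hat{Y}$ project onto the $X_1$ axis identically.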
So far these were general explanations of the vector representation of linear regression. Now we turn to the task of showing how it leads to $R^2 = r_1\beta_1 + r_2\beta_2$.
First of all, recall that in their question @Corone put forward the condition that the expression holds when all three variables are standardized, that is, not just centered but also scaled to variance 1. Then (i.e. implying $|X_1| = |X_2| = |Y| = 1$ to be the "working parts" of the vectors) we have coordinates equal to: $b_1|X_1| = \beta_1$; $b_2|X_2| = \beta_2$; $r_1|Y| = r_1$; $r_2|Y| = r_2$; as well as $R = |\hat{Y}|/|Y| = |\hat{Y}|$. Redraw, under these conditions, just the "plane X" part of the pictures above:
On the picture, we have a pair of perpendicular coordinates and a pair of skew coordinates of the same vector $\hat{Y}$ of length $R$. There exists a general rule for obtaining perpendicular coordinates from skew ones (or back): $P = SC$, where $P$ is a points $\times$ axes matrix of perpendicular coordinates, $S$ is the same-sized matrix of skew ones, and $C$ is the axes $\times$ axes symmetric matrix of angles (cosines) between the nonorthogonal axes.
$X_1$ and $X_2$ are the axes in our case, with $r_{12}$ being the cosine between them. So $r_1 = \beta_1 + \beta_2 r_{12}$ and $r_2 = \beta_1 r_{12} + \beta_2$.
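(These two equations are just the normal equations in disguise.) Here is a numerical sketch of the $P = SC$ conversion on standardized data (numpy again; data and names are mine):

```python
# Check that the skew coordinates (beta1, beta2) times the cosine matrix C
# reproduce the perpendicular coordinates (r1, r2).
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6*x1 + 0.8*rng.normal(size=n)             # correlated with x1
y  = 0.8*x1 + 0.4*x2 + rng.normal(size=n)

# Standardize: center, then scale to unit vector length, so that
# |X1| = |X2| = |Y| = 1 and dot products are correlations.
std = lambda v: (v - v.mean()) / np.linalg.norm(v - v.mean())
x1, x2, y = std(x1), std(x2), std(y)

r1, r2, r12 = y @ x1, y @ x2, x1 @ x2
beta, *_ = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)

S = beta                                         # skew coordinates (beta1, beta2)
C = np.array([[1, r12],
              [r12, 1]])                         # cosines between the axes
print(S @ C)                                     # perpendicular coordinates...
print(np.array([r1, r2]))                        # ...equal (r1, r2)
```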
Substitute these $r$s expressed via the $\beta$s into @Corone's statement $R^2 = r_1\beta_1 + r_2\beta_2$, and you'll get that $R^2 = \beta_1^2 + \beta_2^2 + 2\beta_1\beta_2 r_{12}$, which is true, because that is exactly how the diagonal of a parallelogram (tinted on the picture) is expressed via its adjacent sides (the quantity $\beta_1\beta_2 r_{12}$ being the scalar product).
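The parallelogram identity itself can be checked numerically, too (a sketch with made-up data and names):

```python
# Check that |Yhat|^2 = beta1^2 + beta2^2 + 2*beta1*beta2*r12,
# i.e. the law of cosines for the diagonal of the parallelogram.
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5*x1 + rng.normal(size=n)
y  = x1 + 0.5*x2 + rng.normal(size=n)

std = lambda v: (v - v.mean()) / np.linalg.norm(v - v.mean())
x1, x2, y = std(x1), std(x2), std(y)

r12 = x1 @ x2
(b1, b2), *_ = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)
yhat = b1*x1 + b2*x2                             # the diagonal of the parallelogram

print(yhat @ yhat)                               # |Yhat|^2 = R^2, since |Y| = 1
print(b1**2 + b2**2 + 2*b1*b2*r12)               # law-of-cosines expression: the same
```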
The same thing is true for any number of predictors $X$. Unfortunately, it is impossible to draw such pictures with many predictors.
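The algebra survives, though: for any number of standardized predictors, $R^2 = \sum_i r_i\beta_i$, since $\mathbf{r}^\top\boldsymbol{\beta} = Y^\top X\boldsymbol{\beta} = Y^\top\hat{Y} = |\hat{Y}|^2$ (the last step holds because $Y - \hat{Y}$ is orthogonal to $\hat{Y}$). A numerical sketch with several predictors (made-up data; names are mine):

```python
# Check that sum_i r_i * beta_i = |Yhat|^2 = R^2 for p standardized predictors.
import numpy as np

rng = np.random.default_rng(3)
n, p = 1000, 6
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # correlated predictors
y = X @ rng.normal(size=p) + rng.normal(size=n)

std = lambda v: (v - v.mean()) / np.linalg.norm(v - v.mean())
X = np.column_stack([std(X[:, j]) for j in range(p)])
y = std(y)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r = X.T @ y                                      # the p zero-order correlations

yhat = X @ beta
print(r @ beta)                                  # sum_i r_i beta_i
print(yhat @ yhat)                               # |Yhat|^2 = R^2 -- the same
```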