When, if ever, is loop unrolling still useful?
I've been trying to optimize some extremely performance-critical code (a quicksort algorithm that's being called millions and millions of times inside a Monte Carlo simulation) by loop unrolling. Here's the inner loop I'm trying to speed up:
    // Search for elements to swap.
    while(myarray[++index1] < pivot) {}
    while(pivot < myarray[--index2]) {}
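For context, scan loops of this shape are the core of a Hoare-style partition. The following is a minimal sketch, assuming an int array and the middle element as pivot; the function name quicksort and the pivot choice are illustrative assumptions, not the asker's actual code.

```c
/* Hypothetical quicksort built around the two scan loops from the question.
   Sorts a[lo..hi] (inclusive bounds) in ascending order. */
void quicksort(int *a, int lo, int hi)
{
    if (lo >= hi)
        return;

    int pivot = a[lo + (hi - lo) / 2];  /* middle element as pivot (assumption) */
    int index1 = lo - 1;
    int index2 = hi + 1;

    for (;;) {
        /* Search for elements to swap. Each scan stops at an element that is
           >= pivot (first loop) or <= pivot (second loop); the pivot and the
           previously swapped elements guarantee one exists, so no bounds
           checks are needed. */
        while (a[++index1] < pivot) {}
        while (pivot < a[--index2]) {}

        if (index1 >= index2)
            break;

        int tmp = a[index1];
        a[index1] = a[index2];
        a[index2] = tmp;
    }

    /* Hoare partition result: a[lo..index2] <= pivot <= a[index2+1..hi] */
    quicksort(a, lo, index2);
    quicksort(a, index2 + 1, hi);
}
```

Because the scans rely on these sentinel elements, they contain no bounds checks, which is also why they can be unrolled without adding any.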
I tried unrolling it like this:
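    while(true) {
        // stop at the first element that is not < pivot
        if(!(myarray[++index1] < pivot)) break;
        if(!(myarray[++index1] < pivot)) break;
        // more unrolling
    }
    while(true) {
        // stop at the first element that pivot is not < than
        if(!(pivot < myarray[--index2])) break;
        if(!(pivot < myarray[--index2])) break;
        // more unrolling
    }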
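A semantics-preserving unroll has to negate each test, because the original loops keep going while the comparison holds. Here is a minimal sketch of both scans unrolled four times, assuming int elements and the same sentinel guarantee as the plain loops (an element that stops the scan is always present); the helper names scan_up and scan_down are hypothetical.

```c
/* Hypothetical helpers: unrolled-by-4 versions of the two scan loops.
   scan_up behaves like   while (a[++i] < pivot) {}   and returns i;
   scan_down behaves like while (pivot < a[--j]) {}   and returns j.
   No bounds checks: like the originals, they rely on a stopping element
   (e.g. the pivot itself) being present in range. */
int scan_up(const int *a, int i, int pivot)
{
    for (;;) {
        if (!(a[++i] < pivot)) break;  /* first element >= pivot */
        if (!(a[++i] < pivot)) break;
        if (!(a[++i] < pivot)) break;
        if (!(a[++i] < pivot)) break;
    }
    return i;
}

int scan_down(const int *a, int j, int pivot)
{
    for (;;) {
        if (!(pivot < a[--j])) break;  /* first element <= pivot */
        if (!(pivot < a[--j])) break;
        if (!(pivot < a[--j])) break;
        if (!(pivot < a[--j])) break;
    }
    return j;
}
```

Each element still needs its own data-dependent compare-and-branch, and each comparison has to wait for its load, so this kind of unrolling removes little or no work here. That is likely why it made no measurable difference.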
This made absolutely no difference, so I changed it back to the more readable form. I've had similar experiences other times I've tried loop unrolling. Given the quality of branch predictors on modern hardware, when, if ever, is loop unrolling still a useful optimization?
Loop unrolling makes sense if you can break dependency chains. This gives an out-of-order or superscalar CPU the possibility to schedule things better and thus run faster.
A simple example:
    for (int i=0; i<n; i++) {
        sum += data[i];
    }
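Here all the additions form a single dependency chain through sum: each one needs the result of the previous one. If you get a stall because of a cache miss on the data array, the CPU cannot do anything but wait.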
On the other hand, this code:
    int i;
    for (i=0; i+3<n; i+=4) {
        sum1 += data[i+0];
        sum2 += data[i+1];
        sum3 += data[i+2];
        sum4 += data[i+3];
    }
    for (; i<n; i++)     // leftover elements when n is not a multiple of 4
        sum1 += data[i];
    sum = sum1 + sum2 + sum3 + sum4;
This could run faster. If there is a cache miss or another stall in one calculation, there are still three other dependency chains that don't depend on the stall, and an out-of-order CPU can execute them in the meantime.
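A self-contained sketch of both versions follows, written for double data. For floating-point sums the compiler is not allowed to reassociate the additions on its own (unless permitted by flags such as GCC's -ffast-math), so splitting the accumulators by hand is where this technique usually matters. The function names sum_single and sum_split4 are illustrative; note that the split version may round slightly differently, because it adds the values in a different order.

```c
#include <stddef.h>

/* One accumulator: every addition waits for the previous one. */
double sum_single(const double *data, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Four independent accumulators: four dependency chains that an
   out-of-order CPU can overlap. A scalar tail loop handles the
   leftover elements when n is not a multiple of 4. */
double sum_split4(const double *data, size_t n)
{
    double sum1 = 0.0, sum2 = 0.0, sum3 = 0.0, sum4 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum1 += data[i + 0];
        sum2 += data[i + 1];
        sum3 += data[i + 2];
        sum4 += data[i + 3];
    }
    for (; i < n; i++)  /* 0 to 3 leftover elements */
        sum1 += data[i];
    return (sum1 + sum2) + (sum3 + sum4);
}
```

Whether the split version is actually faster depends on the compiler, its flags and the hardware, so timing both on the target machine is the reliable check.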