Why doesn't GCC optimize a * a * a * a * a * a to (a * a * a) * (a * a * a)? gcc assembly floating-point compiler-optimization fast-math