Efficiency in low-level code is critical for embedded systems with limited resources. Loops are among the most common structures in such code, so optimizing them can make a real difference in power consumption and response time.
Compiler Analysis and Directives
Using the correct directives allows the compiler (such as GCC for ARM) to apply aggressive optimizations. For example, __attribute__((optimize("O3"))) raises the optimization level for a specific function, independently of the level used for the rest of the file.
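As a minimal sketch of the per-function attribute, the block below wraps it in a macro so the code still builds on compilers that do not support it. The names HOT_O3 and checksum_block are illustrative assumptions, not part of any library. Note that the GCC manual describes the optimize attribute as intended for debugging rather than production use, so this is best treated as a way to experiment with a single hot function.

```c
#include <stddef.h>
#include <stdint.h>

/* HOT_O3 is a hypothetical helper macro: it expands to the GCC attribute
 * on GCC and to nothing elsewhere (Clang does not implement "optimize"). */
#if defined(__GNUC__) && !defined(__clang__)
#define HOT_O3 __attribute__((optimize("O3")))
#else
#define HOT_O3
#endif

/* Illustrative hot function: sums the bytes of a block.
 * On GCC it is compiled at -O3 even if the file is built with -Os. */
HOT_O3 uint32_t checksum_block(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}
```

Keeping the attribute behind a macro makes it easy to switch off for a whole build and compare code size and timing with and without it.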
// Non-optimized loop
for (uint32_t i = 0; i < buffer_size; i++) {
    data_buffer[i] = sensor_read();
}
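// Loop manually unrolled by 4, with a prefetch hint.
// With GCC, "#pragma GCC unroll 4" on the plain loop is the alternative;
// combining a pragma with manual unrolling would unroll twice.
uint32_t i = 0;
for (; i + 4 <= buffer_size; i += 4) {
    prefetch_data(&data_buffer[i + 8]);
    data_buffer[i]     = sensor_read();
    data_buffer[i + 1] = sensor_read();
    data_buffer[i + 2] = sensor_read();
    data_buffer[i + 3] = sensor_read();
}
// Handle the 0 to 3 leftover elements when buffer_size is not a multiple of 4
for (; i < buffer_size; i++) {
    data_buffer[i] = sensor_read();
}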
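For comparison, unrolling can also be left to the compiler. The sketch below assumes GCC 8 or later, which accepts "#pragma GCC unroll N" immediately before a loop; other compilers may ignore the pragma with a warning. The sensor_read stub, its uint16_t return type, and the fill_buffer name are assumptions made so that the example is self-contained.

```c
#include <stdint.h>

/* Hypothetical stand-in for the sensor driver: returns a running counter. */
static uint16_t fake_sample = 0;
static uint16_t sensor_read(void)
{
    return fake_sample++;
}

/* Fills the buffer one element per iteration in the source; the pragma asks
 * GCC to unroll the loop 4 times and to generate the remainder handling. */
void fill_buffer(uint16_t *data_buffer, uint32_t buffer_size)
{
#pragma GCC unroll 4
    for (uint32_t i = 0; i < buffer_size; i++) {
        data_buffer[i] = sensor_read();
    }
}
```

Because the source loop still advances one element at a time, it stays correct for any buffer_size, including sizes that are not a multiple of 4.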
Results on Cortex-M4
Tests on an STM32F4 showed a 40% reduction in clock cycles for DMA-assisted operations, freeing up the CPU for higher-priority tasks.
The choice between while and for can also affect the generated assembly code, especially when optimizing for size (-Os).
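One way to check this on a given toolchain is to write the same computation both ways and compare the assembly output. The two functions below are an illustrative pair (sum_for and sum_while are made-up names): an indexed for loop and a count-down while loop. Compiling them with something like arm-none-eabi-gcc -Os -mcpu=cortex-m4 -mthumb -S shows whether the compiler emits different instruction sequences; depending on the compiler version, the output may well be identical.

```c
#include <stdint.h>

/* Indexed for loop: counts up and compares against len on each iteration. */
uint32_t sum_for(const uint8_t *data, uint32_t len)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}

/* Count-down while loop: decrements len and advances the pointer. */
uint32_t sum_while(const uint8_t *data, uint32_t len)
{
    uint32_t sum = 0;
    while (len--) {
        sum += *data++;
    }
    return sum;
}
```

Both functions return the same value for the same input, so any difference in the listing comes from the loop form alone.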