He had written a custom Monte Carlo particle filter, loosely coupled through Intel MPI. Each particle was a "what-if" scenario. 10,000 particles. 64 cores. 512-bit vectors. The system reached 98% of theoretical peak FLOPS.
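To make the "what-if" idea concrete, here is a minimal serial sketch of one cycle of a generic bootstrap particle filter. It is not the filter described above: the one-dimensional random-walk state, the Gaussian noise model, and the names `particle_filter_step`, `estimate`, `process_std`, and `obs_std` are all assumptions chosen for illustration, and the MPI coupling and vectorization are left out entirely.

```python
import math
import random


def particle_filter_step(particles, observation, rng,
                         process_std=1.0, obs_std=1.0):
    """One predict / weight / resample cycle (hypothetical 1-D model)."""
    # Predict: push every "what-if" scenario forward with random process noise.
    predicted = [p + rng.gauss(0.0, process_std) for p in particles]

    # Weight: score each scenario by how well it explains the observation,
    # using an unnormalized Gaussian likelihood.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
               for p in predicted]
    if sum(weights) == 0.0:
        # Every likelihood underflowed to zero; fall back to uniform weights.
        weights = [1.0] * len(predicted)

    # Resample: draw a new population in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def estimate(particles):
    """Point estimate: the mean of the particle population."""
    return sum(particles) / len(particles)


if __name__ == "__main__":
    rng = random.Random(0)
    # 10,000 particles, matching the count in the text; the prior is assumed.
    particles = [rng.uniform(-10.0, 10.0) for _ in range(10_000)]
    for z in [2.0, 2.1, 1.9, 2.0]:  # made-up observations
        particles = particle_filter_step(particles, z, rng)
    print(estimate(particles))
```

The predict and weight steps touch each particle independently, which is presumably why a workload like this spreads well across cores and vector lanes. Only the resampling step needs a global view of the weights.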
This feature provided a visual map for identifying which loops were most worth optimizing, given the hardware's limits.
Disk I/O Analysis: Intel® VTune™ Amplifier (Intel Parallel Studio XE 2017)