Speed Gains with OMP

The Intel® runtime library binds OpenMP (OMP) threads to physical processing units. Depending on the machine topology, the application, and the operating system, thread affinity can have a substantial impact on application speed. Users typically see run times up to 4 times faster on a six-core processor than with the conventional single-threaded EFDC model. The figure below shows the speed gains from one processor (OMP1) to six processors (OMP6) for various computational groups in an EFDC model setup for the Sacramento-San Joaquin Delta. As the first column, "Elapsed Total", shows, overall run times with six threads are up to 4 times faster than with a single thread.

Figure 1.  EFDC_DSI_OMP Model Run Times Relative to a Single Processor Run.
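
To put the reported gain in context, the short Python sketch below converts the "up to 4 times faster on six cores" figure into a parallel efficiency and, under Amdahl's law, an implied parallel fraction. Amdahl's law is an idealized model brought in here for illustration only; it is not part of EFDC, and the function names are hypothetical.

```python
def efficiency(speedup, threads):
    """Parallel efficiency: the fraction of ideal linear scaling achieved."""
    return speedup / threads


def amdahl_parallel_fraction(speedup, threads):
    """Parallel fraction p implied by Amdahl's law (an assumed, idealized model).

    Amdahl's law:  S = 1 / ((1 - p) + p / n)
    Solved for p:  p = (1 - 1/S) / (1 - 1/n)
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / threads)


# Values quoted in the text above: up to 4x faster with 6 threads.
reported_speedup = 4.0
threads = 6

print(f"efficiency        = {efficiency(reported_speedup, threads):.2f}")                # 0.67
print(f"parallel fraction = {amdahl_parallel_fraction(reported_speedup, threads):.2f}")  # 0.90
```

Read this way, a 4x gain on six threads corresponds to roughly two thirds of ideal scaling, which is consistent with about 90% of the single-thread run time being parallelized if Amdahl's assumptions hold.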


Note that the computational groups, or sub-models, within EFDC have been individually timed in order to identify the internal processes that account for most of the run time. The sub-models shown here are:

  • Elapsed Total = total elapsed time;
  • CPU Total = total CPU time;
  • Transport = advective transport calculations;
  • Vert Diff = vertical diffusivity calculations;
  • PUV = pressure fields for 2D solution of water surface elevation;
  • QQ = turbulence intensity;
  • UVW = internal mode momentum calculations;
  • EXP = explicit terms in momentum calculations;
  • T/B SH = top and bottom shear calculations;
  • SSED = sediment transport calculations;
  • V&D = horizontal and vertical diffusivity calculations;
  • Heat = heat transfer calculations.
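
The per-group timing described above can be sketched as follows. This is a minimal Python illustration of the general technique (accumulating wall-clock time per named group), not the EFDC implementation; the `GroupTimer` class and the stand-in workloads are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class GroupTimer:
    """Accumulate wall-clock time per named computational group (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def track(self, group):
        """Time the enclosed block and add the elapsed seconds to the group's total."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[group] += time.perf_counter() - start

    def shares(self):
        """Return each group's fraction of the total tracked time."""
        total = sum(self.totals.values())
        if total == 0.0:
            return {group: 0.0 for group in self.totals}
        return {group: t / total for group, t in self.totals.items()}

    def dominant(self):
        """Return the group that has used the most time so far."""
        return max(self.totals, key=self.totals.get)


timer = GroupTimer()
for _ in range(3):  # three stand-in time steps
    with timer.track("Transport"):
        sum(i * i for i in range(20000))  # placeholder workload, not real transport code
    with timer.track("Heat"):
        sum(range(100))  # placeholder workload, not real heat code

for group, share in timer.shares().items():
    print(f"{group}: {share:.1%} of tracked time")
```

Comparing these per-group totals across thread counts gives the kind of relative run times plotted in Figure 1.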

Results Independent of Number of Threads

OMP in EFDC+ has been implemented so that models produce exactly the same results regardless of how many threads are used.  To check this, a model can be run with different sub-models activated, using a range of thread counts from 1 (referred to as "OMP1") to 16 (OMP16).  The Model Comparison feature can then subtract the 2D results of one run from those of a base run for the same time snapshot, producing a 2D view of the differences between the two.  For the OMP comparisons, the differences should all be equal to 0.0.
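
The subtraction performed by the comparison can be illustrated with a minimal Python sketch. It assumes the 2D results for one time snapshot have already been loaded as nested lists of floats; the function names and the small grids are placeholders, not EFDC+ output or part of the Model Comparison feature.

```python
def difference_grid(base, other):
    """Return base minus other, cell by cell, for two 2D grids of equal shape."""
    if len(base) != len(other) or any(len(a) != len(b) for a, b in zip(base, other)):
        raise ValueError("grids must have the same shape")
    return [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(base, other)]


def max_abs_difference(base, other):
    """Largest absolute cell difference; 0.0 means the grids are identical."""
    diff = difference_grid(base, other)
    return max((abs(v) for row in diff for v in row), default=0.0)


# Placeholder grids standing in for one snapshot from an OMP1 and an OMP2 run.
omp1 = [[31.5, 30.2], [12.0, 0.4]]
omp2 = [[31.5, 30.2], [12.0, 0.4]]

print(difference_grid(omp1, omp2))     # [[0.0, 0.0], [0.0, 0.0]]
print(max_abs_difference(omp1, omp2))  # 0.0
```

The check uses exact equality rather than a tolerance because the stated expectation is that results match exactly for any number of threads.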


Plots from an example with full 3D hydrodynamics, sediments and water quality activated are shown in Figure 2.  This model was run for five days with OMP1 and with OMP2.  The two runs were then compared and the plots below generated.  The model results are shown on the left and the salinity comparison on the right.  As can be seen, the differences are equal to zero, showing that the number of threads does not affect model results.

Figure 2.  Salinity results (left) and comparison of OMP1 minus OMP2 (right).