The basic requirements and restrictions for running any model with MPI were discussed under MPI on a Single Machine. This page focuses only on setting up the Windows cluster option, which allows a model to be run across multiple computers simultaneously. This option requires a main node, where the model is set to run, and any number of worker nodes. The configuration of this kind of system is relatively complicated and will likely require the support of an IT administrator. The setup requirements are described below. Guiding principles for domain decomposition when running MPI are also provided here.

1. Requirements for All Computers on a Cluster (Main & Worker Nodes)

  • Connect all computers, each running a Windows OS (ideally the same operating system, e.g. Windows 10 build 1809), over a Local Area Network (LAN).

  • It is recommended that all the computers in the cluster be logged in with the same Administrator account and password. Alternatively, the user can create a new user with the same username on all PCs.

  • Install the Intel MPI Library for Windows, which is available from this link. The same version of the library must be installed on all machines.

  • Open the Command Prompt with administrator permissions (Run as Administrator) and use the following commands to determine the usernames and domain names of the computers (a minimal sketch is provided after this step).

...
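As a point of reference for the account setup and the username/domain checks above, the following Command Prompt lines are a minimal sketch; the account name "mpiuser" and password "MyPassword" are placeholders, and the exact commands given in the guide may differ.

    :: Show the domain\username of the account currently logged in
    whoami

    :: Show this computer's name as seen on the network
    hostname

    :: Optionally create a matching local Administrator account on this PC
    :: ("mpiuser" and "MyPassword" are placeholder values)
    net user mpiuser MyPassword /add
    net localgroup Administrators mpiuser /add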

  • Use the same drive letter on the main and worker nodes (PCs). The drive holding the model must have the same letter on the main node and on all worker nodes (e.g. C, D, or E). For example, if the MPI model is located on the E drive of the main node, then all worker nodes must also use the E drive.

  • Firewall requirements: it is recommended to disable the firewall for guest/public networks (an example command is sketched after this step).

...
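For the firewall recommendation above, a possible approach using the built-in netsh tool is sketched below; whether to disable only the public profile, or additional profiles, should be confirmed with the IT administrator.

    :: Disable Windows Firewall for the guest/public network profile only
    netsh advfirewall set publicprofile state off

    :: Re-enable it when the cluster is no longer needed
    netsh advfirewall set publicprofile state on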

  • Share the working directory. For example, if "MPI_Model" is the folder containing the MPI model to be run, the working directory on the main node will be "E:\EFDC_Explorer Modeling System\Testing\EFDC\MPI_Model".

  • Share the EEMS installation folder (which contains the EFDC+ MPI executable). For example, copy the EFDC+ MPI executable folder into the shared folder containing the model. Figure 1 shows the steps to share folders; a command-line alternative is sketched after this step.

...
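As a command-line alternative to the Explorer-based sharing steps in Figure 1, the shares can also be created from an elevated Command Prompt. This is only a sketch: the share name, the account granted access ("mpiuser"), and the EEMS installation path are assumptions that should be adapted to the actual setup.

    :: Share the working directory that contains the MPI model
    net share MPI_Model="E:\EFDC_Explorer Modeling System\Testing\EFDC\MPI_Model" /GRANT:mpiuser,FULL

    :: Copy the EFDC+ MPI executable folder into the shared model folder
    :: ("C:\EEMS\EFDCPlus_MPI" is a placeholder for the actual installation path)
    xcopy "C:\EEMS\EFDCPlus_MPI" "E:\EFDC_Explorer Modeling System\Testing\EFDC\MPI_Model\EFDCPlus_MPI" /E /I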

  • Create the same directory structure on the worker computers.

In this example, the model folder on the main node is "E:\EFDC_Explorer Modeling System\Testing\EFDC\MPI_Model", so the user should create the folder path "E:\EFDC_Explorer Modeling System\Testing\EFDC\" on the worker computers. Note that the "MPI_Model" folder itself should not be created; otherwise the next steps will produce an error. A command-line example of this step is sketched below.

...
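A minimal sketch of this step on each worker computer, using the example path above:

    :: Create the parent folder structure on the worker computer
    :: (do NOT create the "MPI_Model" folder itself)
    mkdir "E:\EFDC_Explorer Modeling System\Testing\EFDC"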

  • In the List of Computer Nodes table, type the computer names (including the main and worker nodes) in the Computer column. It is recommended to put the name of the main node in the first row and the worker nodes in the following rows (see Figure 7). Another option is to enter the IP addresses of the PCs. Figure 8 shows how to get the IP address of the current PC.

  • In the Slots column, the user can set the number of subdomains assigned to each computer. For example, this model has 8 MPI subdomains, of which two are assigned to run on the main node and six on the worker computer (see Figure 7).

  • Click the Ping button in the Check column to verify that a connection is established between the computers on the local network. Ping measures the reaction time of the connection, i.e. how quickly a response is received after a request is sent; a fast ping means a more responsive connection. The check will show “Failed” if the computer cannot be reached.

  • Click the Save button to save the Host file (hostfile.txt). The Host file contains the configuration information, including the list of computer nodes (see Figure 9). The next time the user wants to rerun the model, the Host file can be reused by clicking the Load button.

  • Finally, click the Run EFDC+ button to run the model. Figure 10 shows the EFDC+ MPI window while running, including the number of cores in use on the main and worker computers. A hedged sketch of the host file contents and an equivalent command line is provided at the end of this page.

...
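For reference, the saved hostfile.txt lists the computer nodes and their slot counts; the exact contents are shown in Figure 9, and the sketch below is only an assumption based on the standard Intel MPI machine-file convention and the 2 + 6 slot example above (the computer names are placeholders).

    MAIN-PC:2
    WORKER-PC:6

An equivalent Intel MPI command line is sketched below, although EE normally issues the run command automatically when Run EFDC+ is clicked; the executable name "EFDCPlus_MPI.exe" is a placeholder.

    :: Launch 8 MPI processes distributed according to hostfile.txt
    mpiexec -n 8 -machinefile hostfile.txt EFDCPlus_MPI.exe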