Multicore CPU support

SIMION 8.1.0 is designed to use multiple CPU cores to improve performance on systems that have them.

Scope: SIMION 8.1.0 can use multiple CPU cores to accelerate various aspects of SIMION’s operation, particularly PA refines, GEM file processing, the 3D Modify view rendering (polygonization), and some other Modify/View screen 3D view renderings. Other areas may be accelerated in the future.

WARNING: Fly’ms are currently not accelerated, but this is planned as a high-priority feature for 8.2beta. Some cases are easier than others and may be implemented sooner (e.g. Grouped flying disabled), and some changes to user programs will probably be necessary. One way to simulate parallelization of particle trajectory calculations (even in older 7.0/8.0 versions of SIMION) is to start multiple instances of the SIMION program, each flying a different set of particles. The disadvantages are that this takes up more PA memory and that the Data Recording output from the separate instances must later be merged. Another approach, demonstrated in SIMION 8.1’s SIMION Example: multiprocess, could also be used to parallelize Fly’ms across multiple computers by having SIMION instances communicate via sockets, but a pre-built example of that has not been provided yet.
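Merging the output of separate instances can be as simple as concatenating files. The following is a minimal sketch in plain Lua, assuming each instance wrote its Data Recording output to its own text file and that the records are line-oriented, so that simple concatenation is meaningful. The helper name merge_files and the file names are hypothetical.

```lua
-- Concatenate per-instance output files into one merged file.
-- Assumption: each input is a plain-text file of line-oriented records.
local function merge_files(inputs, output)
  local out = assert(io.open(output, 'w'))
  for _, name in ipairs(inputs) do
    local f = assert(io.open(name, 'r'))
    local text = f:read('*a')
    f:close()
    out:write(text)
    -- Keep records from different files on separate lines.
    if #text > 0 and text:sub(-1) ~= '\n' then out:write('\n') end
  end
  out:close()
end

-- Hypothetical file names, one per SIMION instance:
-- merge_files({'run1.txt', 'run2.txt', 'run3.txt'}, 'merged.txt')
```

If the output files carry a header line, it would need to be skipped for all but the first file.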

System Requirements: The multicore support can utilize systems with multiple CPU cores. This can be a single CPU with multiple cores or a multiprocessor system, i.e. a system with multiple CPUs, each of which may have multiple cores. For example, a dual CPU system with four cores per CPU has 2 x 4 = 8 cores. Hyperthreading (which provides something like pseudo-cores) was not observed to give much, if any, improvement. The multicore support also works inside Wine/CrossOver (under Linux or Mac OS X) and virtual machines (provided multiple physical cores have been allocated to the virtual machine).

Speed improvements: Speed improvements vary, but on one eight-core (dual CPU, quad-core Xeon) workstation, an approximately 4x improvement in Refines was observed. Whether the array fits in your CPU cache may affect whether this number is closer to 3x or 8x. RAM speed and CPU cache sizes, as well as your PA size, may affect speed. Disk I/O speed when loading/saving arrays can be a bottleneck and is not accelerated by multicore support. Feedback on speed improvements on other systems is welcome (anyone with 16, 32 or more cores?).
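One rough way to reason about such numbers is Amdahl's law. This is a simplifying model brought in here for illustration, not something SIMION reports: if a fraction p of the work parallelizes perfectly over n cores, the speedup is 1 / ((1 - p) + p / n). The plain Lua sketch below inverts that formula to estimate p from an observed speedup. It ignores cache and memory bandwidth effects, so treat the result as an illustration only.

```lua
-- Amdahl's law: speedup on n cores when a fraction p of the work is parallel.
local function amdahl_speedup(p, n)
  return 1 / ((1 - p) + p / n)
end

-- Inverse: parallel fraction implied by an observed speedup s on n cores.
local function parallel_fraction(s, n)
  return (1 - 1 / s) / (1 - 1 / n)
end

-- A 4x improvement on eight cores corresponds to p of about 0.857.
local p = parallel_fraction(4, 8)
print(string.format('p = %.3f', p))
-- Under the same model, 16 cores would give roughly 5.1x, not 8x.
print(string.format('16 cores: %.1fx', amdahl_speedup(p, 16)))
```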

Usage: By default, SIMION sets the number of threads equal to the number of CPU cores you have, so all available cores are used automatically and you don’t need to do anything special to take advantage of these speed improvements. This default is usually ideal, and there’s rarely any benefit in going above it. However, if you wish, you can control the number of threads with the --num-threads command line parameter (e.g. simion.exe --num-threads=2) or the OMP_NUM_THREADS environment variable. You may want to set the number of threads to less than the number of cores if you don’t want a SIMION process to hog all the cores. Note that adding more cores has a diminishing rate of return on calculation speed, and beyond a certain point speed could even decrease (e.g. if you have a machine with dozens of cores).
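When SIMION is started from a script, the thread count can be passed on the command line. The following is a minimal sketch in plain Lua that builds such a command string. The helper name simion_command is hypothetical and not part of the SIMION API, the executable path is only an example install location, and the only option used is --num-threads as described above.

```lua
-- Build a command line that starts SIMION with a fixed number of threads.
-- simion_command is a hypothetical helper, not part of the SIMION API.
local function simion_command(exe, num_threads)
  assert(num_threads >= 1 and num_threads == math.floor(num_threads),
         'num_threads must be a positive integer')
  return string.format('"%s" --num-threads=%d', exe, num_threads)
end

-- Example install path; adjust to your system.
print(simion_command('c:\\Program Files\\SIMION-8.1\\simion.exe', 2))
```

The resulting string could be passed to os.execute or io.popen.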

Note

This page is abridged from the full SIMION "Supplemental Documentation" (Help file). The following additional sections can be found in the full version of this page accessible via the "Help > Supplemental Documentation" menu in SIMION 8.1.1 or above:
  • Understanding your system and CPU information
  • Slow speed issue if other programs are using cores (WARNING!), Process Affinity, and Hyperthreading
  • Some more benchmarks

Fly’m Parallelization

As mentioned above, Fly’ms are not currently multithreaded, although making them so is a priority. One workaround is to launch multiple independent SIMION processes, each flying a different set of particles. It is even possible to launch the SIMION processes from a SIMION user program and collect their results in SIMION:

-- Get SIMION executable native path.
--local SIMION = 'c:\\Program Files\\SIMION-8.1\\simion.exe'
local SIMION = simion._internal.simion_exe
local slash = package.config:sub(1,1)  -- directory separator: '\' on Windows, '/' on Linux
SIMION = SIMION:gsub('/', slash)       -- convert the path to native separators
print(SIMION)

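-- Launch SIMION processes asynchronously; io.popen returns without
-- waiting for the process to finish.
-- Note: as written, every job flies the same quad.iob.
local jobs = {}
for i=1,8 do
  local f = assert(io.popen('"' .. SIMION .. '" --nogui fly quad.iob'))
  table.insert(jobs, f)
end
print 'la la la...'  -- the parent is free to do other work here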

-- Join: read the standard output of each process (blocks until that process has finished writing).
for i=1,8 do
  local res = jobs[i]:read'*a'
  print(res)
  jobs[i]:close()
end

print 'DONE'
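For the processes to do useful work in parallel, each one needs its own subset of particles. The following is a minimal sketch in plain Lua that divides n particles into contiguous index ranges, one per job. The helper name split_particles is hypothetical. How each SIMION process then selects its range (for example, through separate particle definition files) is left open and is an assumption, not something the example above does.

```lua
-- Split particle indices 1..n into `jobs` contiguous ranges of near-equal size.
-- Returns a list of {first, last} pairs; empty ranges are omitted when jobs > n.
local function split_particles(n, jobs)
  local ranges = {}
  local base, extra = math.floor(n / jobs), n % jobs
  local first = 1
  for j = 1, jobs do
    -- The first `extra` jobs take one additional particle each.
    local count = base + (j <= extra and 1 or 0)
    if count > 0 then
      ranges[#ranges + 1] = {first, first + count - 1}
      first = first + count
    end
  end
  return ranges
end

for j, r in ipairs(split_particles(1000, 8)) do
  print(string.format('job %d: particles %d-%d', j, r[1], r[2]))
end
```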

Splitting Fast Adjust Refines Across Multiple Computers

For fast adjust arrays (.PA#), the individual solution arrays (e.g. .PA1, .PA2, etc.) can be solved independently on different computers (or different sets of cores on the same computer) using SIMION 8.1.

-- Script 1 (e.g. on the first computer): generate and refine .PA0 (quick) and .PA1 arrays.
local pa = simion.pas:open('drag.pa#')
pa:refine{solutions={0,1}, convergence=1e-7}

-- Script 2 (e.g. on the second computer): generate and refine .PA2 array.
local pa = simion.pas:open('drag.pa#')
pa:refine{solutions={2}, convergence=1e-7}

-- Script 3 (e.g. on the third computer): generate and refine .PA3 array.
local pa = simion.pas:open('drag.pa#')
pa:refine{solutions={3}, convergence=1e-7}

See simion.pas pa:refine().
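The assignment of solution arrays to machines can be planned with a few lines of plain Lua. The following sketch assumes a simple round-robin assignment and a hypothetical helper name plan_solutions. It only builds the solutions lists that would be passed to pa:refine on each machine and does not call SIMION itself.

```lua
-- Assign solution array numbers 0..max_solution to `workers` machines round-robin.
-- Returns a table where plan[w] is the list of solution numbers for worker w.
local function plan_solutions(max_solution, workers)
  local plan = {}
  for w = 1, workers do plan[w] = {} end
  for s = 0, max_solution do
    local w = s % workers + 1
    table.insert(plan[w], s)
  end
  return plan
end

-- Example: solutions .PA0 to .PA3 spread over three machines.
for w, list in ipairs(plan_solutions(3, 3)) do
  print(string.format('worker %d: pa:refine{solutions={%s}, convergence=1e-7}',
                      w, table.concat(list, ',')))
end
```

Because the .PA0 refine is quick, which worker receives it matters little for load balance.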

Intel Xeon Phi

There has been preliminary work on supporting the Intel Xeon “Phi” coprocessor. See Intel Xeon Phi.

ZeroMQ

ZeroMQ allows multiple SIMION processes to intercommunicate (on the same computer or different computers).

[Figure: _images/zmq_parallel.jpg]

See Calling External Programs for details. This is demonstrated in SIMION Example: multiprocess_zmq.

Changes

  • 2014-07-11: Refine: Mitigate major Refine performance loss when other programs are using CPU cores that Refine is using. See notes on the OMP_WAIT_POLICY=ACTIVE Windows environment variable, which is now the default setting.
  • 2014-06: Xeon Phi preliminary support.