Graph 500 Benchmark
Graph 500 Implementation with SWARM
Graph 500 is a new supercomputing benchmark that evaluates system capacity for data-intensive applications with the aim of modeling more realistic application workloads than the traditional Top500 LINPACK benchmark.
ETI has ported the Graph 500 reference MPI implementation to SWARM (SWift Adaptive Runtime Machine) and produced results on four different supercomputers. Comparisons against the MPI reference show a consistent speed-up ranging from 2-fold to 11-fold.
- SWARM running on the Texas Advanced Computing Center’s Lonestar: 9.22 GE/s at 1024 nodes (Scale 35)
- SWARM running on Oak Ridge National Lab’s JaguarPF: 6.12 GE/s at 512 nodes (Scale 33)
- SWARM running on the Intel Endeavor: 9.03 GE/s at 256 nodes (Scale 34)
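For readers unfamiliar with the "Scale" figures quoted above: in the Graph 500 specification a problem of scale S has 2^S vertices, and the default edge factor of 16 gives 16 x 2^S edges. Rates are reported in traversed edges per second, so 1 GE/s is 10^9 edges per second. The sketch below works through that arithmetic for the three results. The function names are illustrative only and come from neither the Graph 500 code nor SWARM, and the per-search time is an idealization that assumes a search traverses every edge exactly once.

```python
EDGE_FACTOR = 16  # default edge factor in the Graph 500 specification


def problem_size(scale, edge_factor=EDGE_FACTOR):
    """Return (vertices, edges) for a Graph 500 problem of the given scale."""
    vertices = 2 ** scale
    edges = edge_factor * vertices
    return vertices, edges


def seconds_per_search(scale, ge_per_s):
    """Idealized time for one search, assuming every edge is traversed once."""
    _, edges = problem_size(scale)
    return edges / (ge_per_s * 1e9)


# (system, scale, GE/s) as quoted in the results above
results = [("Lonestar", 35, 9.22), ("JaguarPF", 33, 6.12), ("Endeavor", 34, 9.03)]

for name, scale, rate in results:
    vertices, edges = problem_size(scale)
    print(f"{name}: scale {scale} -> {vertices:,} vertices, {edges:,} edges, "
          f"~{seconds_per_search(scale, rate):.1f} s per search")
```

Each step up in scale doubles the graph, which is why results at different scales are not directly comparable on rate alone.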
SWARM on the Graph 500 June 2011 List
Four supercomputers using SWARM also secured spots on the most recent Graph 500 list, announced in June 2011, including three in the top 10 and one recognized for the second-best performance per core of all machines on the list.
#6 - TACC's Lonestar running SWARM 4 delivered 8.1 GE/s at 512 nodes (Scale 34)
#8 - Sandia National Lab's Red Sky running SWARM delivered 9.5 GE/s at 512 nodes (Scale 33)
#9 - Intel's Endeavor running SWARM delivered 6.9 GE/s at 256 nodes (Scale 33)
Also running SWARM, Intel's Discovery (Westmere E7-4870) is at #26 on the list and achieved the second-highest performance per core, at 17.625 million traversed edges per second per core on a single node (Scale 27).
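Because the listed systems use different node counts, it can help to normalize the aggregate rates. The sketch below converts the three top-10 entries to millions of traversed edges per second (MTEPS) per node, using only the figures quoted above. A per-core comparison like the Discovery figure would also need each system's core count per node, which is not given here, so the sketch does not attempt it. The helper name is hypothetical.

```python
# (system, aggregate GE/s, nodes) as quoted in the June 2011 list entries above
entries = [
    ("Lonestar", 8.1, 512),
    ("Red Sky", 9.5, 512),
    ("Endeavor", 6.9, 256),
]


def mteps_per_node(ge_per_s, nodes):
    """Convert an aggregate rate in GE/s to MTEPS per node (1 GE/s = 1000 MTEPS)."""
    return ge_per_s * 1000.0 / nodes


per_node = {name: mteps_per_node(rate, nodes) for name, rate, nodes in entries}

for name, value in per_node.items():
    print(f"{name}: {value:.2f} MTEPS per node")
```

These per-node values are still measured at different scales, so they are a rough guide rather than a like-for-like ranking.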
Results using SWARM technology
- SWARM has lower type overhead than MPI.
- SWARM uses active messages while MPI uses message passing (resulting in potentially fewer copies and round trips).
- SWARM threads on the same node share address space, whereas MPI processes must copy data between processes.
- SWARM monitors and allocates cache utilization as a resource shared between all threads on a node. MPI cannot.
- SWARM separates ‘cache friendly’ and ‘cache unfriendly’ code onto different cores. MPI processes run all code on the same core and thrash the L1/L2 caches.
- SWARM’s idle threads on the same machine can steal work from other threads when applicable. MPI processes cannot easily steal work because of separate address spaces.
- SWARM idles unused threads. MPI spins waiting for messages.
- SWARM is an effective substitute for MPI + OpenMP + Active Messages, all in one package with lower overhead.
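To make the work-stealing point above concrete, here is a minimal single-threaded simulation. It is not SWARM code and uses no SWARM API. It assumes a simplified model in which every worker completes one task per round, and an idle worker may take one task from the far end of the longest queue. The function name and the task layout are hypothetical.

```python
from collections import deque


def rounds_to_finish(queues, steal):
    """Simulate workers draining per-worker task queues in lock-step rounds.

    queues: one list of task ids per worker.
    steal:  if True, a worker with an empty queue takes one task from the
            opposite end of the longest queue and runs it in the same round.
    Returns the number of rounds until every task has been executed.
    """
    work = [deque(q) for q in queues]
    rounds = 0
    while any(work):  # an empty deque is falsy
        rounds += 1
        for own in work:
            if own:
                own.pop()  # owner takes from its own end
            elif steal:
                victim = max(work, key=len)
                if victim:
                    victim.popleft()  # thief takes from the opposite end
    return rounds


# Eight tasks all start on worker 0; workers 1 to 3 start idle.
unbalanced = [list(range(8)), [], [], []]
print("without stealing:", rounds_to_finish(unbalanced, steal=False), "rounds")
print("with stealing:   ", rounds_to_finish(unbalanced, steal=True), "rounds")
```

In this model a steal is just a pop on another worker's queue, which is cheap when workers share an address space. Across separate address spaces the task and its data would have to be shipped in a message, which is the difficulty the list attributes to MPI processes.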
To learn more about how SWARM can improve the performance of your machine, or how ETI can provide benchmarking services or deliver on your customized many-core programming needs, contact ETI or check out our materials.