ARCHER Benchmarks

The ARCHER CSE service at EPCC worked with the ARCHER user community to define a set of benchmarks to help understand the performance characteristics of the ARCHER system.

The work to define these benchmarks has been published as an ARCHER white paper at:

This page contains the most up-to-date results from running these benchmarks.

ARCHER Application Benchmarks

CASTEP

http://www.castep.org

The benchmark can be found at http://www.castep.org/CASTEP/DNA and was run using CASTEP 16.1.2, compiled according to the instructions at:

https://github.com/ARCHER-CSE/build-instructions/blob/master/CASTEP/build_castep_16.1.2_intel16_ivybrg.md

The input parameter file, 'polyA20-no-wat.param', had the following additional keywords appended to the end:

%block devel_code
bandpar=8
%endblock devel_code

to enable 8-way band parallelism.
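
The ARCHER runs below use a hybrid MPI/OpenMP layout of 4 processes per node and 6 threads per process (filling the 24 cores of each node). Purely as an illustration of how such a layout maps onto a launch line on ARCHER, the following Python sketch builds an aprun command for the first row of the table below; the executable name 'castep.mpi' and the seed name are assumptions, not part of the benchmark definition.

# Illustrative sketch only: derive an aprun launch line for a hybrid
# MPI/OpenMP CASTEP run on ARCHER (Cray XC30, 24 cores per node).
# The executable name "castep.mpi" and the seed name are assumptions.
def aprun_line(nodes, procs_per_node, threads_per_proc,
               exe="castep.mpi", seed="polyA20-no-wat"):
    ranks = nodes * procs_per_node               # total MPI processes
    cores = ranks * threads_per_proc             # cores used in total
    assert procs_per_node * threads_per_proc <= 24, "node over-subscribed"
    return cores, (f"OMP_NUM_THREADS={threads_per_proc} "
                   f"aprun -n {ranks} -N {procs_per_node} "
                   f"-d {threads_per_proc} {exe} {seed}")

cores, cmd = aprun_line(250, 4, 6)   # first row of the ARCHER table below
print(cores)   # 6000
print(cmd)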

ARCHER Results

Nodes | Processes per Node | Threads per Process | Cores | Runtime (s)
250   | 4                  | 6                   | 6000  | 16078
500   | 4                  | 6                   | 12000 | 8061
1000  | 4                  | 6                   | 24000 | 3753
2000  | 4                  | 6                   | 48000 | 2340

Cirrus Results

Cirrus is one of the EPSRC Tier-2 HPC facilities and is provided by EPCC. It is an SGI ICE XA system with 280 compute nodes (10,080 cores in total). Each compute node has two 18-core Intel Xeon (Broadwell) processors and 256 GiB of RAM. Nodes are connected by FDR InfiniBand in a hypercube topology.

Nodes | Processes per Node | Threads per Process | Cores | Runtime (s)
135   | 6                  | 6                   | 4860  | 39216
270   | 6                  | 6                   | 9720  | 27144

Thanks to UKCP and Phil Hasnip at the University of York for providing the benchmark.

CP2K

http://www.cp2k.org

Full instructions on compiling and running the benchmark can be found in:

ARCHER Results

Nodes | Processes per Node | Threads per Process | Cores | Runtime (s)
64    | 24                 | 1                   | 1536  | 636.9
128   | 24                 | 1                   | 3072  | 332.8
256   | 4                  | 6                   | 6144  | 194.4
512   | 4                  | 6                   | 12288 | 120.7
1024  | 4                  | 6                   | 24576 | 87.7
2048  | 4                  | 6                   | 49152 | 67.0
4096  | 2                  | 12                  | 98304 | 65.8

Cirrus Results

Details of the Cirrus system are given in the CASTEP section above.

Nodes | Processes per Node | Threads per Process | Cores | Runtime (s)
14    | 12                 | 3                   | 504   | 3812
28    | 18                 | 2                   | 1008  | 940
56    | 6                  | 6                   | 2016  | 484
112   | 6                  | 6                   | 4032  | 304
224   | 6                  | 6                   | 8064  | 207

Thanks to CP2K-UK and Iain Bethune for providing the benchmark.

GROMACS

http://www.gromacs.org

The GROMACS benchmark used is not currently publicly available, but we hope to release it later in 2017.

ARCHER Results

Nodes | Processes per Node | Threads per Process | Cores | Performance (ns/day)
2     | 24                 | 1                   | 48    | 0.049
4     | 24                 | 1                   | 96    | 0.083
8     | 24                 | 1                   | 192   | 0.146
16    | 24                 | 1                   | 384   | 0.254
32    | 24                 | 1                   | 768   | 0.461
64    | 24                 | 1                   | 1536  | 0.622
128   | 24                 | 1                   | 3072  | 0.761
256   | 24                 | 1                   | 6144  | 0.457
512   | 24                 | 1                   | 12288 | 0.179
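
Because GROMACS performance is reported in ns/day rather than as a runtime, speedup and parallel efficiency follow directly from ratios of the performance figures. The following minimal sketch, illustrative only, uses a few rows from the ARCHER table above with the 2-node (48-core) run as the baseline:

# Illustrative only: speedup and parallel efficiency from the ns/day figures
# in the ARCHER table above, relative to the 2-node (48-core) run.
archer = [(48, 0.049), (1536, 0.622), (3072, 0.761), (12288, 0.179)]
base_cores, base_perf = archer[0]
for cores, perf in archer[1:]:
    speedup = perf / base_perf
    efficiency = speedup / (cores / base_cores)
    print(f"{cores:6d} cores: speedup {speedup:5.1f}, efficiency {efficiency:.2f}")
# Performance peaks at 128 nodes (3072 cores, 0.761 ns/day) and falls off
# beyond that, so larger runs give no benefit for this benchmark.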

Cirrus Results

Details of the Cirrus system are given in the CASTEP section above.

Nodes | Processes per Node | Threads per Process | Cores | Performance (ns/day)
4     | 36                 | 1                   | 144   | 0.118
8     | 36                 | 1                   | 288   | 0.210
16    | 36                 | 1                   | 576   | 0.349
35    | 36                 | 1                   | 1260  | 0.532
64    | 36                 | 1                   | 2304  | 0.647
70    | 36                 | 1                   | 2520  | 0.684
135   | 36                 | 1                   | 4860  | 0.679
180   | 36                 | 1                   | 6480  | 0.515

Thanks to HEC BioSim and Richard Sessions at the University of Bristol for providing the benchmark.

OpenSBLI

https://opensbli.readthedocs.io

The benchmark is a simulation of the Taylor-Green vortex on a 1024x1024x1024 grid. Source code and documentation for the benchmark can be found at:

The following input file was used for the ARCHER benchmark runs:

ss 1024 10 0
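
To give a sense of the problem size, a 1024 x 1024 x 1024 grid contains roughly 1.07 billion points. The sketch below, illustrative only, estimates how many grid points each MPI rank handles for some of the core counts used in the ARCHER runs, assuming one rank per core and an even domain decomposition:

# Illustrative only: grid points per MPI rank for the 1024^3 Taylor-Green
# vortex benchmark, assuming one rank per core and an even decomposition.
grid_points = 1024 ** 3                   # ~1.07e9 points in total
for cores in (120, 960, 6144, 21504):     # core counts from the table below
    print(f"{cores:6d} cores -> ~{grid_points // cores:,} points per rank")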

ARCHER Results

Nodes | Processes per Node | Threads per Process | Cores | Runtime (s) | Parallel Efficiency
5     | 24                 | 1                   | 120   | 183.6       | 1.00
40    | 24                 | 1                   | 960   | 98.7        | 0.93
256   | 24                 | 1                   | 6144  | 49.9        | 0.99
384   | 24                 | 1                   | 9216  | 25.4        | 0.98
512   | 24                 | 1                   | 12288 | 13.2        | 0.96
640   | 24                 | 1                   | 15360 | 8.8         | 0.75
768   | 24                 | 1                   | 18432 | 4.7         | 0.94
896   | 24                 | 1                   | 21504 | 2.6         | 0.92

Thanks to UKTC and Satya Jammy at the University of Southampton for providing the benchmark.

OASIS3-MCT (Met Office UM/NEMO)

This benchmark case is still being prepared by NCAS.

Synthetic Benchmarks

Synthetic benchmark data will be added soon.