Coprocessor architectures

Fundamental capabilities

from io import StringIO
import pandas
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')

data = StringIO("""
package,cores,lanes/core,clock (MHz),peak (GF),bandwidth (GB/s),TDP (W),MSRP
Xeon 8280,28,8,2700,2400,141,205,10000
NVIDIA V100,80,64,1455,7800,900,300,10664
AMD MI60,64,64,1800,7362,1024,300,
AMD Rome,64,4,2000,2048,205,200,6450
""")

df = pandas.read_csv(data, index_col='package')
df
             cores  lanes/core  clock (MHz)  peak (GF)  bandwidth (GB/s)  TDP (W)     MSRP
package
Xeon 8280       28           8         2700       2400               141      205  10000.0
NVIDIA V100     80          64         1455       7800               900      300  10664.0
AMD MI60        64          64         1800       7362              1024      300      NaN
AMD Rome        64           4         2000       2048               205      200   6450.0
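
One quantity the table implies but does not list is the machine balance: how many floating-point operations each package can perform per byte of memory traffic. The following is a minimal sketch in plain Python that re-enters the peak and bandwidth columns from the table above; the names `specs` and `balance` are introduced here for illustration only.

```python
# Peak (GF) and memory bandwidth (GB/s) copied from the table above.
specs = {
    "Xeon 8280": (2400, 141),
    "NVIDIA V100": (7800, 900),
    "AMD MI60": (7362, 1024),
    "AMD Rome": (2048, 205),
}

# Machine balance: flops available per byte moved from memory.
balance = {name: peak / bw for name, (peak, bw) in specs.items()}

for name, value in balance.items():
    print(f"{name:12s} {value:6.2f} flop/byte")
```

A kernel needs at least this arithmetic intensity to be limited by peak flops rather than by memory bandwidth.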

Amdahl's Law for energy efficiency

df['efficiency (GF/W)'] = df['peak (GF)'] / df['TDP (W)']
df['efficiency (GF/W)']
package
Xeon 8280      11.707317
NVIDIA V100    26.000000
AMD MI60       24.540000
AMD Rome       10.240000
Name: efficiency (GF/W), dtype: float64
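ngpu = np.arange(0, 9)
overhead = 100 # Power supply, DRAM, disk, etc. (W)
# CPU flops are counted only in the CPU-only case; once GPUs are attached, only GPU flops count
peak = (ngpu == 0)*df.loc['Xeon 8280']['peak (GF)'] + ngpu*df.loc['NVIDIA V100']['peak (GF)']
# The CPU and the fixed overhead draw power regardless of the number of GPUs
tdp = overhead + df.loc['Xeon 8280']['TDP (W)'] + ngpu*df.loc['NVIDIA V100']['TDP (W)']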
plt.plot(ngpu, peak / tdp)
plt.xlabel('number of GPUs per CPU')
plt.title('DP Peak efficiency (GF/W)');

[Figure: DP peak efficiency (GF/W) versus number of GPUs per CPU]

Compare to Green 500 list

  • #1 system is 15.1 GF/W (2x Xeon E5-2698v4, 8x V100)
  • #2 system (Summit) is 14.7 GF/W (2x Power9, 6x V100)
  • #27 system is 5.8 GF/W on Xeon 6248 (no GPU)
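
To put the Green 500 numbers next to the model, one can evaluate the peak-efficiency formula at the GPU-per-CPU ratios of the two GPU systems listed above (4 and 3). This is only a rough sketch: it assumes the Xeon 8280 TDP stands in for the host CPU actually used in each system, it reuses the 100 W overhead from the plotting cell, and it compares a theoretical peak against a Green 500 figure that is measured on a benchmark run. The function name `model_efficiency` is hypothetical.

```python
# Figures copied from the table above; overhead matches the plotting cell.
CPU_TDP = 205      # Xeon 8280, W (stand-in for the actual host CPUs)
GPU_PEAK = 7800    # NVIDIA V100, GF
GPU_TDP = 300      # NVIDIA V100, W
OVERHEAD = 100     # power supply, DRAM, disk, etc., W


def model_efficiency(gpus_per_cpu):
    """Peak GF/W for one CPU plus gpus_per_cpu GPUs (valid for >= 1 GPU)."""
    peak = gpus_per_cpu * GPU_PEAK
    power = OVERHEAD + CPU_TDP + gpus_per_cpu * GPU_TDP
    return peak / power


# (GPUs per CPU, Green 500 GF/W) for the two GPU systems listed above.
systems = {
    "#1 (2x Xeon, 8x V100)": (4, 15.1),
    "#2 Summit (2x Power9, 6x V100)": (3, 14.7),
}

for name, (gpus_per_cpu, measured) in systems.items():
    modeled = model_efficiency(gpus_per_cpu)
    print(f"{name}: model {modeled:.1f} GF/W, measured {measured} GF/W, "
          f"ratio {measured / modeled:.2f}")
```

Under these assumptions the listed systems land at roughly three quarters of the modeled peak efficiency.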

Amdahl's Law for cost efficiency

df['cost (GF/$)'] = df['peak (GF)'] / df['MSRP']
df['cost (GF/$)']
package
Xeon 8280      0.240000
NVIDIA V100    0.731433
AMD MI60            NaN
AMD Rome       0.317519
Name: cost (GF/$), dtype: float64
overhead = 3000 + 2000*ngpu # power supply, memory, cooling, maintenance
cost = overhead + df.loc['Xeon 8280']['MSRP'] + ngpu*df.loc['NVIDIA V100']['MSRP']
plt.plot(ngpu, peak / cost)
plt.xlabel('number of GPUs per CPU')
plt.title('DP cost efficiency (GF/$)');

[Figure: DP cost efficiency (GF/$) versus number of GPUs per CPU]
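
The Amdahl-style limit of the cost curve can be worked out directly: as GPUs are added, the fixed costs (CPU and per-node overhead) are amortized away and only the per-GPU costs remain. The sketch below uses the MSRP figures from the table and the overhead model from the cell above; the names `cost_efficiency` and `limit` are introduced here for illustration.

```python
CPU_MSRP = 10000         # Xeon 8280, $
GPU_MSRP = 10664         # NVIDIA V100, $
GPU_PEAK = 7800          # NVIDIA V100, GF
FIXED_OVERHEAD = 3000    # per node, $
PER_GPU_OVERHEAD = 2000  # per GPU, $


def cost_efficiency(ngpu):
    """Peak GF/$ for one CPU plus ngpu GPUs (valid for ngpu >= 1)."""
    cost = FIXED_OVERHEAD + PER_GPU_OVERHEAD * ngpu + CPU_MSRP + ngpu * GPU_MSRP
    return ngpu * GPU_PEAK / cost


# Limit as ngpu grows: only the per-GPU costs remain in the denominator.
limit = GPU_PEAK / (GPU_MSRP + PER_GPU_OVERHEAD)

for ngpu in (1, 2, 4, 8):
    eff = cost_efficiency(ngpu)
    print(f"{ngpu} GPUs: {eff:.3f} GF/$ ({eff / limit:.0%} of the limit {limit:.3f})")
```

Because the overhead grows with the number of GPUs, the limit sits below the 0.73 GF/$ of a bare V100.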

What fraction of datacenter cost goes to the power bill?

  • OLCF Summit is reportedly a $200M machine.
  • What if we just buy the GPUs at retail?
    • 256 racks
    • 18 nodes per rack
    • 6 GPUs per node
    • V100 MSRP of about $10k
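256 * 18 * 6 * 10e3 / 1e6 # retail cost of the GPUs alone, in millions of dollars
276.48
.0638 * 24 * 365 # cost of running 1 kW for a year, reading .0638 as a price in $/kWh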
558.8879999999999
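
The two numbers above can be combined into a rough answer to the question in the heading. The sketch below is deliberately crude and rests on several assumptions that are not in the source: it counts only the GPUs, running continuously at their TDP, it ignores CPUs, cooling, and everything else, it reads 0.0638 as an electricity price in $/kWh, and it uses a hypothetical 5-year machine lifetime.

```python
# From the bullet list above.
racks, nodes_per_rack, gpus_per_node = 256, 18, 6
machine_cost = 200e6                      # reported Summit cost, $

# From the table and the cell above.
gpu_tdp_kw = 0.300                        # V100 TDP, kW
dollars_per_kw_year = 0.0638 * 24 * 365   # assumes 0.0638 is $/kWh

# Hypothetical input, not from the source.
lifetime_years = 5

ngpus = racks * nodes_per_rack * gpus_per_node
gpu_power_kw = ngpus * gpu_tdp_kw
annual_bill = gpu_power_kw * dollars_per_kw_year
lifetime_bill = annual_bill * lifetime_years
fraction = lifetime_bill / (machine_cost + lifetime_bill)

print(f"{ngpus} GPUs drawing {gpu_power_kw / 1e3:.1f} MW")
print(f"annual GPU power bill: ${annual_bill / 1e6:.1f}M")
print(f"power share of total cost over {lifetime_years} years: {fraction:.0%}")
```

Changing `lifetime_years` or adding the non-GPU power draw moves the fraction, so the result is best read as an order-of-magnitude estimate.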

Programming models

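#pragma acc data copy(A) create(Anew)
while ( error > tol  &&  iter  <  iter_max )  {
  error = 0.0;
#pragma acc kernels
  {
    // Jacobi sweep: average the four neighbors and track the largest change
#pragma acc loop independent collapse(2) reduction(max:error)
    for (  int  j = 1; j < n-1;  j++ )  {
      for (  int  i = 1; i < m-1; i++ )  {
        Anew [j] [i] = 0.25 * ( A [j] [i+1] + A [j] [i-1] +
                                A [j-1] [i] + A [j+1] [i]);
        error = fmax ( error, fabs (Anew [j] [i] - A [j] [i]));
      }
    }
    // Copy the update back so the next sweep starts from the new values
#pragma acc loop independent collapse(2)
    for (  int  j = 1; j < n-1;  j++ )  {
      for (  int  i = 1; i < m-1; i++ )  {
        A [j] [i] = Anew [j] [i];
      }
    }
  }
  iter++;
}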

Comparison slides: Is OpenMP 4.5 Target Off-load Ready for Real Life? A Case Study of Three Benchmark Kernels (2018)
