docker run -it --gpus=all --rm nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -benchmark
Unable to find image 'nvcr.io/nvidia/k8s/cuda-sample:nbody' locally
nbody: Pulling from nvidia/k8s/cuda-sample
22c5ef60a68e: Pull complete
1f37f461c076: Pull complete
f65423f1b49b: Pull complete
548afb82c856: Pull complete
e9bff09d04df: Pull complete
1939e4248814: Pull complete
207b64ab7ce6: Pull complete
edc14edf1b04: Pull complete
a424d45fd86f: Pull complete
9026fb14bf88: Pull complete
2b60900a3ea5: Pull complete
Digest: sha256:59261e419d6d48a772aad5bb213f9f1588fcdb042b115ceb7166c89a51f03363
Status: Downloaded newer image for nvcr.io/nvidia/k8s/cuda-sample:nbody
Run "nbody -benchmark [-numbodies=]" to measure performance.
    -fullscreen (run n-body simulation in fullscreen mode)
    -fp64 (use double precision floating point values for simulation)
    -hostmem (stores simulation data in host memory)
    -benchmark (run benchmark to measure performance)
    -numbodies= (number of bodies (>= 1) to run in simulation)
    -device= (where d=0,1,2.... for the CUDA device to use)
    -numdevices= (where i=(number of CUDA devices > 0) to use for simulation)
    -compare (compares simulation results running once on the default GPU and once on the CPU)
    -cpu (run n-body simulation on the CPU)
    -tipsy= (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Pascal" with compute capability 6.1

> Compute 6.1 CUDA device: [NVIDIA GeForce GTX 1080]
20480 bodies, total time for 10 iterations: 15.479 ms
= 270.971 billion interactions per second
= 5419.423 single-precision GFLOP/s at 20 flops per interaction
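
The two throughput figures at the end of the output can be cross-checked from the body count, iteration count, and elapsed time printed just before them. The sketch below is not part of the sample; `nbody_throughput` is a hypothetical helper, and it assumes an all-pairs simulation in which each of the n bodies interacts with all n bodies on every iteration, which is the reading that matches the reported numbers.

```python
# Recompute the derived benchmark figures from the values printed in the log.
# Assumption: all-pairs simulation, i.e. n * n interactions per iteration.

def nbody_throughput(num_bodies, iterations, total_ms, flops_per_interaction=20):
    """Return (billions of interactions per second, GFLOP/s)."""
    interactions = num_bodies * num_bodies * iterations
    seconds = total_ms / 1000.0
    interactions_per_second = interactions / seconds
    gflops = interactions_per_second * flops_per_interaction / 1e9
    return interactions_per_second / 1e9, gflops


# Values taken from the log: 20480 bodies, 10 iterations, 15.479 ms.
billions, gflops = nbody_throughput(20480, 10, 15.479)
print(f"{billions:.3f} billion interactions per second")
print(f"{gflops:.3f} single-precision GFLOP/s")
```

The results should land within a small fraction of a percent of the logged 270.971 billion interactions per second and 5419.423 GFLOP/s. An exact match is not expected, because the log prints the elapsed time rounded to three decimal places.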