The price of the HPC- and AI-focused Tesla P100 is certainly prohibitive, and not everyone wants to upgrade their entire infrastructure just to accommodate the new Tesla. At the International Supercomputing Conference in Frankfurt, Germany, this week, NVIDIA announced a PCIe add-in card based on the top-end Pascal P100 it unveiled just recently at GTC.
Pascal and the P100 are now even more accessible
The new cards will come in two flavors, with either 16GB or 12GB of HBM2. Though they use a more traditional form factor, they have all the benefits of the 15.3-billion-transistor Pascal chip that debuted at GTC. Underneath is a full-featured chip with 3584 CUDA cores and dedicated half-precision hardware to help accelerate AI and DNN-type workflows. We actually run workloads like that in our own testing, and they would likely make good use of this new card.
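To give a sense of what that half-precision hardware looks like from the programmer's side, here is a minimal sketch, our own illustration rather than NVIDIA sample code, of a packed-FP16 kernel. It assumes CUDA 8.0 or later compiled for sm_60, and the kernel and variable names are made up for the example.

```cuda
// halfdemo.cu -- illustrative sketch of GP100's packed FP16 math path.
// Build (assumed): nvcc -arch=sm_60 halfdemo.cu -o halfdemo
#include <cuda_fp16.h>
#include <cstdio>

__global__ void init(int n, __half2 *x, __half2 *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] = __floats2half2_rn(1.0f, 1.0f);   // pack two FP16 values per element
        y[i] = __floats2half2_rn(2.0f, 2.0f);
    }
}

__global__ void axpy_half2(int n, float alpha_f, const __half2 *x, __half2 *y, float *check)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        __half2 alpha = __float2half2_rn(alpha_f);   // broadcast alpha into both FP16 lanes
        // __hfma2 is a packed FP16x2 fused multiply-add: two half-precision ops
        // per instruction, which is where the 2x FP16 throughput ratio comes from.
        y[i] = __hfma2(alpha, x[i], y[i]);
        if (i == 0) *check = __low2float(y[0]);      // expect 0.5*1 + 2 = 2.5
    }
}

int main()
{
    const int n = 1 << 20;
    __half2 *x, *y;
    float *check;
    cudaMalloc(&x, n * sizeof(__half2));
    cudaMalloc(&y, n * sizeof(__half2));
    cudaMallocManaged(&check, sizeof(float));

    init<<<(n + 255) / 256, 256>>>(n, x, y);
    axpy_half2<<<(n + 255) / 256, 256>>>(n, 0.5f, x, y, check);
    cudaDeviceSynchronize();

    printf("y[0] (low half) = %f\n", *check);
    cudaFree(x); cudaFree(y); cudaFree(check);
    return 0;
}
```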
There’s some bad news, however. Anyone looking to spend copious amounts of money to play games on this card will be sorely disappointed. This is purely a compute card, with no display outputs to actually put an image on a screen. Tough luck there, though those who need it will likely appreciate having a normal PCIe-type card to deploy. This opens up quite a few new use cases and lowers the cost of entry, though we’re not entirely sure of the pricing for either of the new cards.
One of the large selling points NVIDIA has for the Tesla P100 inside its DGX-1 server platform is NVLink. These add-in cards don’t make use of it, since they come in the traditional PCIe form factor. That isn’t to say that one or two of them won’t be more efficient than last generation’s hardware, just that you won’t see the inter-card communication speeds you would with a full-fledged DGX-1 platform.
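For the curious, here is a small sketch, again our own and assuming only the standard CUDA runtime API, that asks whether each pair of GPUs in a box can access each other's memory directly. On a DGX-1 that peer path rides NVLink; with these PCIe cards the same peer access runs over the PCIe bus instead.

```cuda
// p2pcheck.cu -- illustrative peer-access query across all GPU pairs.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("Found %d CUDA device(s)\n", count);

    for (int a = 0; a < count; ++a) {
        for (int b = 0; b < count; ++b) {
            if (a == b) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, a, b);  // can GPU a touch GPU b's memory directly?
            printf("GPU %d -> GPU %d : peer access %s\n", a, b, canAccess ? "yes" : "no");
            if (canAccess) {
                cudaSetDevice(a);
                cudaDeviceEnablePeerAccess(b, 0);       // enables direct loads/stores and cudaMemcpyPeer
            }
        }
    }
    return 0;
}
```

Running nvidia-smi topo -m alongside this will show whether the link behind each pair is NVLink or a flavor of PCIe.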
NVIDIA Tesla P100 Family

|   | Tesla P100 (DGX-1) | Tesla P100 (16GB) | Tesla P100 (12GB) |
|---|---|---|---|
| CUDA Cores | 3584 | 3584 | 3584 |
| Core Clock | 1328MHz | ??? | ??? |
| Boost Clock | 1480MHz | 1300MHz | 1300MHz |
| Memory Bus Width | 4096-bit | 4096-bit | 3072-bit |
| Memory Clock | 1.4Gbps HBM2 | 1.4Gbps HBM2 | 1.4Gbps HBM2 |
| Memory Bandwidth | 720GB/s | 720GB/s | 540GB/s |
| VRAM | 16GB | 16GB | 12GB |
| Half Precision | 21.2 TFLOPS | 18.7 TFLOPS | 18.7 TFLOPS |
| Single Precision | 10.6 TFLOPS | 9.3 TFLOPS | 9.3 TFLOPS |
| Double Precision | 5.3 TFLOPS | 4.7 TFLOPS | 4.7 TFLOPS |
| TDP | 300W | 250W | 250W |
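Those peak numbers fall straight out of the specs: at 2 ops per CUDA core per clock, 2 × 3584 × ~1.3GHz works out to roughly 9.3 TFLOPS single precision for the PCIe cards (and 2 × 3584 × 1.48GHz ≈ 10.6 TFLOPS at the DGX-1 version's higher boost clock), with GP100 running half precision at twice and double precision at half that rate. Memory bandwidth works the same way: a 4096-bit bus at 1.4Gbps per pin gives 4096 × 1.4 ÷ 8 ≈ 717GB/s, and the 12GB card's 3072-bit bus lands at roughly 540GB/s.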
NVIDIA has also updated quite a bit of its software. DIGITS is up to version 4, GIE (the GPU Inference Engine) is being upgraded, and cuDNN is getting a minor performance-focused update. The end result is better performance, better stability, and much easier integration.