Trillium to replace Niagara and Mist in 2025

Niagara went live for users in early 2018, and Mist in late 2019. They’ve served us well, running more than 13 million jobs for over four thousand users. But the time has come to replace them.

Thanks to a $52M investment by the Digital Research Alliance of Canada and Ontario’s Ministry of Colleges and Universities, we’re pleased to announce that Niagara and Mist will be replaced early next year by a new cluster called Trillium. This cluster, built by Lenovo, will consist of:

  • 1,224 CPU compute nodes, each with two 96-core AMD EPYC “Zen5” CPUs.
  • 60 GPU compute nodes, each with four Nvidia H100 SXM 80GB GPUs and one 96-core AMD EPYC “Zen4” CPU.
  • All nodes will have 768 GiB of memory.
  • Nvidia “NDR” InfiniBand network, with 400 Gbps of bandwidth per CPU compute node and 800 Gbps per GPU compute node.
  • 29 petabytes of all-flash storage from Vast Data.

Trillium’s CPU compute nodes will have 235,008 cores in total, almost triple Niagara’s core count. The network will be fully non-blocking, meaning every node can talk to every other node at full bandwidth simultaneously. Each H100 GPU has roughly four times the FP64 performance of one of Mist’s V100s, and the x86_64 CPU architecture of the GPU nodes, unlike Mist’s POWER9, will simplify software installation.
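
For readers who want to check the totals, the arithmetic behind the hardware list can be sketched in a few lines of Python. This is only an illustrative calculation: the constant and function names are ours, not part of any Trillium tooling, and the numbers are copied from the list above. Note that the 235,008 figure corresponds to the CPU compute nodes; the CPUs in the GPU nodes are tallied separately here.

```python
# Back-of-the-envelope totals from the announced hardware list.
# All names are illustrative; the figures come from the announcement.

CPU_NODES = 1224          # CPU compute nodes
CPUS_PER_CPU_NODE = 2     # two EPYC CPUs per CPU node
GPU_NODES = 60            # GPU compute nodes
CPUS_PER_GPU_NODE = 1     # one EPYC CPU per GPU node
GPUS_PER_GPU_NODE = 4     # four H100 GPUs per GPU node
CORES_PER_CPU = 96        # both CPU models are 96-core parts
MEM_PER_NODE_GIB = 768    # every node has 768 GiB of memory


def cpu_node_cores() -> int:
    """Cores provided by the CPU compute nodes."""
    return CPU_NODES * CPUS_PER_CPU_NODE * CORES_PER_CPU


def gpu_node_cores() -> int:
    """Host CPU cores in the GPU compute nodes."""
    return GPU_NODES * CPUS_PER_GPU_NODE * CORES_PER_CPU


def total_gpus() -> int:
    """Number of H100 GPUs across the cluster."""
    return GPU_NODES * GPUS_PER_GPU_NODE


def total_memory_tib() -> float:
    """Aggregate node memory in TiB (1 TiB = 1024 GiB)."""
    return (CPU_NODES + GPU_NODES) * MEM_PER_NODE_GIB / 1024


if __name__ == "__main__":
    print("CPU-node cores:", cpu_node_cores())
    print("GPU-node host cores:", gpu_node_cores())
    print("GPUs:", total_gpus())
    print("Aggregate memory (TiB):", total_memory_tib())
```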

Hardware delivery starts later this year, and the new cluster will be available for users in the spring of 2025. To make room, half of Niagara will be decommissioned starting in December 2024 or January 2025. We’ll update you when we have a better idea of Trillium’s installation schedule.