The cluster is specialised for Machine Learning/Artificial Intelligence and High Throughput Computing (HTC) jobs.
The Lyra HPC cluster is designed to run Machine Learning/Artificial Intelligence and High Throughput Computing workloads. It was funded by the Walloon Region (convention n° 1910167) with a budget of 700 k€. The cluster is composed of 23 compute nodes deployed as a hyperconverged infrastructure. Each node features the following specifications:
In total, researchers will benefit from XXX CPU cores, xxx CUDA cores, xxx TB RAM, xxx TB scratch and xxx TB home storage space. All the nodes are interconnected via 2x 50 GbE redundant links.
Each physical server runs two virtual machines, each featuring xxx cores, xxx GB RAM and one GPU. Each virtual machine (VM) corresponds to one virtual compute node. The VMs are interconnected by a virtual VXLAN-EVPN network. Storage is managed by Ceph, which provides the storage spaces mounted in the VMs. With some parameter tuning, the virtual HPC cluster runs close to the theoretical peak performance at the CPU, GPU and memory levels in synthetic benchmarks.
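The mapping from physical servers to virtual compute nodes can be made concrete with a small calculation. The sketch below is only illustrative: it assumes that every one of the 23 physical nodes hosts exactly two VMs with one GPU each, as described above, and the per-VM core and RAM values passed in the example are hypothetical placeholders, not the actual Lyra figures. The names `VmSpec` and `cluster_totals` are invented for this example.

```python
from dataclasses import dataclass

# Figures taken from the description above.
PHYSICAL_NODES = 23
VMS_PER_NODE = 2
GPUS_PER_VM = 1


@dataclass(frozen=True)
class VmSpec:
    """Per-VM resources. The real values are not stated in the text."""
    cores: int
    ram_gb: int


def cluster_totals(vm: VmSpec) -> dict:
    """Aggregate per-VM resources over all virtual compute nodes."""
    virtual_nodes = PHYSICAL_NODES * VMS_PER_NODE
    return {
        "virtual_nodes": virtual_nodes,
        "gpus": virtual_nodes * GPUS_PER_VM,
        "cores": virtual_nodes * vm.cores,
        "ram_gb": virtual_nodes * vm.ram_gb,
    }


# Hypothetical per-VM values, chosen only to show the arithmetic.
example = cluster_totals(VmSpec(cores=16, ram_gb=128))
print(example)
```

Under these assumptions the cluster exposes 46 virtual compute nodes and 46 GPUs to the scheduler; the core and RAM totals scale linearly with whatever per-VM values actually apply.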
In addition, the virtual cluster is managed entirely through an Infrastructure-as-Code approach. It can therefore be redeployed from scratch very quickly and adapted with little effort, for example to support multiple VM types that accommodate more heterogeneous workloads, or to offer temporary specialised working environments.
The Lyra cluster is hosted in the new ULB A6K datacentre. The racks are fed by two distinct power sources, each protected by a UPS against short power cuts and by a generator against long ones, which ensures a continuous power supply. The highly efficient cooling system maintains a constant temperature regardless of the workload on the cluster. Regarding security, protections have been deployed at both the physical and logical levels.