(interim-service)=
# Interim National HPC Service
This page describes the nature of the service being put in place to enable researchers to maintain access to HPC resources following the closure of the Kay supercomputer. It also highlights the differences between the two systems and services, as well as the high-level plan in operation for migrating researchers to the new system.
## Background
ICHEC first presented its plan for the provision of compute resources from foreign sites (to be procured on a commercial basis) at its Board meeting of September 2022. These arrangements were deemed essential to ensure continuity of service to the research community.
ICHEC submitted a funding proposal to DFHERIS in May 2023 to purchase HPC compute services, and a tender was published in July seeking compute resources with an environment and user interface similar to those provided by Kay. The contract was subsequently awarded in November to LuxProvide and their Meluxina HPC platform.
All National HPC Service projects will eventually be hosted on this platform, with a gradual migration of existing projects from Kay taking place through the first half of 2024. The migration of Class A projects was initiated in mid-December, with Class B following in January 2024 and Class C beginning in February 2024. Kay will continue to run on an at-risk basis until June 2024 with a reduced number of compute nodes available. Here, at-risk means that, because the system is no longer covered by any warranty or technical support from the hardware and software vendors, certain hardware or software failures could result in a sudden and permanent loss of service (as well as, potentially, the loss of all data stored on the system).
## Quickstart
This section provides some basic guidelines on getting started with the interim service on Meluxina; you should read this documentation first.
### Setting up an Account and Access
To become a user on Meluxina, you must:

1. Register an ICHEC user account here if you don't already have one.
2. Then do one of the following:
   1. Apply for your own National Service project (paused until April 2024), or
   2. Join an existing project by asking the PI of that project to contact the ICHEC helpdesk.
3. ICHEC support staff then submit a request to LuxProvide to create a user account on Meluxina for you and to add you to the relevant project.
4. You will receive an email from LuxProvide with your Meluxina username (which will be different to your ICHEC one) and instructions on completing the user account setup.
5. You will need to register an SSH key via the LuxProvide helpdesk in order to be able to log in to Meluxina.
## Kay vs Meluxina
Meluxina is a Tier-0 machine and as such has larger and faster nodes. It gives users of our National Service the opportunity to access a Tier-0 system within the same project framework as their existing Kay projects.

Here we provide some quick references for users transitioning to Meluxina; please refer to the Meluxina Docs for more details.
**Quick Summary**

- Similar to Kay, Meluxina has a login node, `login.lxp.lu`, to which you `ssh`. One noteworthy difference is that Meluxina uses port 8822 for SSH instead of the default port 22, so your SSH command would be `ssh -p 8822 username@login.lxp.lu`. If your SSH session to Meluxina does not connect or hangs, it is likely that you are not specifying the correct port (8822) or that your local firewall is preventing outbound connections to this port.
- The Meluxina login nodes do not have access to the centrally installed applications or modules, so to see what modules are available, or to load modules and build your software, you will need to launch an interactive Slurm job.
- You are assigned a (different) username when your account is created on Meluxina, so it is a good idea to use the `$USER` variable in your scripts rather than hard-coding your username.
- Resources are allocated in node hours, and each node has considerably more CPU cores and/or GPUs than on Kay. As a result, each node hour used is considerably more expensive (and more powerful) than on Kay, so it is important that you fully use all cores or GPUs on the node; a sketch of what this means for a job script follows this list. You may need to rethink domain decomposition, task farming or other hard-coded compute or memory sizing when transferring scripts from Kay. Please contact us for help with performance optimisation.
- Extensive documentation is available on the Meluxina Documentation Site.
- After initial onboarding, all technical support on Meluxina should go via the ICHEC Helpdesk.
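As a loose illustration of what fully using a node means in practice, the Slurm directives below request every physical core on one Meluxina CPU node for an MPI program. This is a sketch based on the node sizes in the table that follows; the executable name is a placeholder.

```bash
#SBATCH --nodes=1              # whole nodes are the unit of allocation on Meluxina
#SBATCH --ntasks-per-node=128  # one MPI rank per physical core (128 cores per CPU node)
#SBATCH --hint nomultithread   # schedule on physical cores only, not hyperthreads

mpirun ./my_mpi_program        # placeholder executable
```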
Below is a comparison table between Kay and Meluxina, highlighting similarities and differences:
| Specs/Features | Kay | Meluxina |
|---|---|---|
| Standard CPU nodes | Intel CPUs, 40 cores, 192 GB RAM, 400 GB SSD per node | AMD CPUs, 128 cores, 512 GB RAM, no local disk on node |
| Standard GPU nodes | 2 x NVIDIA V100 (16 GB) per node | 4 x NVIDIA A100 (40 GB) per node |
| Large memory nodes | Intel CPUs, 40 cores, 1.5 TiB RAM | AMD CPUs, 128 cores, 4 TiB RAM |
| Hyperthreading | No | On by default, changeable per job |
| Login access | `ssh` (default port 22) | `ssh -p 8822` to `login.lxp.lu` |
| Queue manager | Slurm (see link) | Slurm (see link) |
| Resource consumption unit | CPU core hour | CPU/GPU node hour |
| Slurm partitions | DevQ, ProdQ, LongQ, GpuQ, … | cpu, gpu, fpga, largemem |
| Internet connectivity | No internet connection from compute nodes | Internet accessible from compute nodes |
| Resource usage/allocation command | `mybalance` / `quota` | `myquota` |
## Connecting to Meluxina
### SSH
**Note**

All users must initially configure an SSH key to connect to Meluxina. Check out our tutorial - Setting up SSH Keys - for more details. As a new user, you will need to send your public key via the LuxProvide helpdesk, as described in the onboarding email, before you can connect.
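If you do not yet have an SSH key pair, a minimal sketch for generating one with OpenSSH is shown below. The `ed25519` key type and the comment are illustrative choices, not stated LuxProvide requirements; follow the tutorial above for the exact procedure.

```bash
# generate a key pair, accepting the default file location
ssh-keygen -t ed25519 -C "your_email@example.com"

# print the public key, which is what you send to the helpdesk
cat ~/.ssh/id_ed25519.pub
```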
**Warning**

Meluxina runs SSH on a non-standard port, number 8822, for security reasons. Your SSH client will need to explicitly specify port 8822 instead of relying on the default port 22. Many university firewalls restrict outgoing connections to all but a few standard port numbers, so you may need to request outbound access to login.lxp.lu on port 8822 in order to be able to connect.
**Note**

You will need to log in using your Meluxina account username, not your ICHEC username.
### Logging in
You can connect to Meluxina using an SSH client. On Unix-like systems, such as Linux and macOS, an SSH client is usually pre-installed; you can access it through the Terminal (or any other command-line interface):

```bash
ssh -p 8822 $USER@login.lxp.lu
```

where `$USER` should be replaced by your username on Meluxina. For example, if your username is u12345, then replace `$USER` in the code examples with `u12345`.
**Note**

On Windows, you need to install an SSH client. We recommend MobaXterm.
**SSH tip**

It may feel a bit tedious to provide the username and port information every time. You can ease this by defining the configuration for SSH to the Meluxina login node in the `~/.ssh/config` file in your home directory on the computer you are going to log in from.

If the file `~/.ssh/config` does not exist, create it using emacs/vi or any editor of your choice. Fill in the following settings, replacing `$USER` with your username:

```
Host lxp
    User $USER
    HostName login.lxp.lu
    Port 8822
```

Now save the file `~/.ssh/config`. When you ssh using `ssh lxp`, the SSH client will automatically read the username and the port from the config file, so you don't need to provide them on the command line.
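A convenient side effect, which is standard OpenSSH behaviour rather than anything Meluxina-specific, is that tools such as scp and sftp also read this config file, so the host alias works for file transfers too. The file name below is a placeholder:

```bash
# copy a local file to your Meluxina home directory using the alias
scp myfile.txt lxp:
```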
## Finding your Project Name and Resource Allocations
The notion of a project is very similar to that on Kay. Use the `myquota` command to list the node hour allocations and usage, as well as the disk storage quotas and usage, for each of the projects you are a member of. Note that node hour allocations are made on a monthly basis. Project names are of the form p200XXX.
## File Storage
Two locations are available for storing files on Meluxina:

- Home: `/home/users/$USER`
- Work: `/project/home/$ProjectName`

where `$USER` is your username and `$ProjectName` is the name of the project assigned to you. The purpose of the Home and Work storage areas is explained in the table below.
| | Home | Work |
|---|---|---|
| Purpose | Store personal files or source code | Store simulation data |
| Quota | Smaller limit: 100 GB | Larger limit (see Class Projects Limits) |
| Access | Only you | All users in the project |
| Backup | No | No |
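For example, to work from your project's shared Work area, substituting your own project name for the placeholder:

```bash
cd /project/home/p200XXX
```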
## Data Transfer
On Unix-like systems, you can use command-line applications such as scp or sftp.

Using scp:

```bash
scp -P 8822 localFilePath $USER@login.lxp.lu:remoteFilePath
```

Using sftp:

```bash
sftp -P 8822 $USER@login.lxp.lu
put localFilePath
```

**Tip**

Note that when using scp or sftp, the port option is given by a capital `-P`, not a lowercase `-p`.
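To copy a file in the other direction, from Meluxina back to your local machine, reverse the two scp arguments (both paths are placeholders):

```bash
scp -P 8822 $USER@login.lxp.lu:remoteFilePath localFilePath
```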
On Windows, you can use graphical applications such as WinSCP.
## Software Packages
Environment modules (commonly known simply as modules) are used to organise all the software packages we provide on Meluxina, including scientific applications, compilers and development libraries. To start using a package of interest, you should load the corresponding module.
**Important**

Note that modules and the associated application software are only accessible on the compute nodes, not on the login nodes.
You can list all the available modules on the system by launching an interactive compute job and running:

```bash
module avail
```

Note that this can take up to 30 seconds the first time it is run on a compute node.
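One minimal way to launch such an interactive job is with Slurm's `salloc`. This is a sketch: the time limit and QOS here are illustrative assumptions, and the options are explained in the Job Submission section below.

```bash
# request one CPU node interactively for 30 minutes;
# replace p200XXX with your own project name
salloc -A p200XXX -p cpu -q dev -N 1 -t 00:30:00
```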
To load a module:

```bash
module load GCC/12.3.0
```

To check the loaded modules in your environment:

```bash
module list
```
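If you need to reset your environment between builds, the standard module command for clearing everything you have loaded is:

```bash
module purge
```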
## Job Submission
As is standard on most HPC systems, a batch system (Slurm) is used to allocate compute nodes to users. Please refer to the Slurm Workload Manager and Slurm Commands pages for more detailed documentation.
To submit a job, create a submission script in a file named, say, `mybatchjob.sh`, using emacs, nano or vim, with the following text as an example (for an MPI application):

```bash
#!/bin/sh
#SBATCH --time=00:20:00
#SBATCH --nodes=1
#SBATCH -A {{myProjectID}}
#SBATCH --partition cpu
#SBATCH --qos default
#SBATCH --hint nomultithread

# load relevant modules. You can load multiple modules depending on
# the dependencies of your actual code
module load some_module_name

# run the actual command/program/executable
mpirun /path/to/mpi_executable
```
In the above script, the first line defines the interpreter the script uses, which in this case is the `/bin/sh` shell. The lines beginning with `#SBATCH` are read by the Slurm scheduler as job submission parameters. They can alternatively be passed as command-line arguments, such as `sbatch --time=00:20:00 ...`, but writing them within the script is easier for record keeping and makes job submission quicker.
Submit the job using:

```bash
sbatch mybatchjob.sh
```

You can check the progress of any jobs you have submitted by using:

```bash
squeue
```
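On a busy system the full queue listing can be long; squeue accepts a standard filter to show only your own jobs:

```bash
squeue -u $USER
```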
**Important**

Note the following two arguments, which you did not use on Kay:

- `--qos`, which defines the quality of service and enables various modes of usage. Allowed values are one of `dev`, `test`, `short`, `short-preempt`, `default`, `long`, `large`, `urgent`.
- `--hint`, which accepts one of two arguments, `multithread` or `nomultithread`, and defines whether the allocated node will have multithreading on or off. With multithreading on, you will see \(128 \times 2 = 256\) logical cores; with it off, you will see 128 logical cores. By default, it is on.
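Command-line options to sbatch take precedence over the corresponding `#SBATCH` directives in the script (standard Slurm behaviour), so you can, for example, rerun the example script under a different QOS without editing it:

```bash
sbatch --qos short mybatchjob.sh
```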
For more details on these two options, and on Slurm usage on Meluxina, see our Slurm page.
If you get stuck, you can check our:

- FAQs
- Tutorials
- Software Guides

Or alternatively, contact our Helpdesk.