In this tutorial we will work through setting up Slurm Accounting. This is a prerequisite for many features within Slurm, including job resource tracking and Slurm Federation. Starting in 3.3.0, Slurm accounting is set up directly in the ParallelCluster UI. This tutorial assumes you’re creating a cluster with version 3.3.0 or later.
The first requirement is to set up an external database that Slurm can use to store the accounting data. Use the following CloudFormation Quick-Create link to create the database in your AWS account:
Deploy Accounting Database
Change the region in the upper right to create the database in a region other than us-east-2.
Change the following stack parameters:
You can leave the rest at their defaults. Click Create and wait about 30 minutes for the stack to finish creating.
Once the stack creation has completed, go to the Outputs tab of the stack and make note of the output values, as they will be used when you create your cluster.
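If you would rather read the outputs programmatically than copy them from the console, the following is a minimal Python sketch. It assumes boto3 is installed and AWS credentials are configured; the stack name `slurm-accounting` in the usage comment is a hypothetical placeholder for whatever name you gave the stack.

```python
from typing import Dict, List


def flatten_stack_outputs(outputs: List[dict]) -> Dict[str, str]:
    """Turn CloudFormation's list of {OutputKey, OutputValue} items into a plain dict."""
    return {item["OutputKey"]: item["OutputValue"] for item in outputs}


def fetch_stack_outputs(stack_name: str, region: str) -> Dict[str, str]:
    """Read the outputs of a deployed stack (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so flatten_stack_outputs works without boto3

    cfn = boto3.client("cloudformation", region_name=region)
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    return flatten_stack_outputs(stack.get("Outputs", []))


# Example usage (hypothetical stack name and region; substitute your own):
#   for key, value in fetch_stack_outputs("slurm-accounting", "us-east-2").items():
#       print(f"{key}: {value}")
```

The flattening step is kept separate from the API call so it can be exercised without touching AWS.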
Next, go to ParallelCluster UI and choose the Create option to create a new cluster.
Under the HeadNode section, you’ll find a section called Slurm Properties. Enter the Database, Username, and Password values from the CloudFormation stack outputs. You can use the following mapping to go from CloudFormation to ParallelCluster UI:
Parameter | CloudFormation Stack Output |
---|---|
Database | DatabaseHost:DatabasePort |
Username | DatabaseAdminUser |
Password | DatabaseSecretArn |
Additional Security Groups | DatabaseClientSecurityGroup |
Also on the HeadNode tab, under AdditionalSecurityGroups, select the DatabaseClientSecurityGroup security group output from CloudFormation.
After you’ve configured the HeadNode, Filesystem and Queues, you’ll be asked to review the config. Here’s an example of what the SlurmSettings section should look like:
SlurmSettings:
  Database:
    Uri: slurm-accounting-cluster.cluster-hash.us-east-2.rds.amazonaws.com:3306
    UserName: clusteradmin
    PasswordSecretArn: arn:aws:secretsmanager:us-east-2:123456789:secret:AccountingClusterAdminSecre-hash2
You’ll see a warning like the following, which can be safely ignored:
Cannot validate secret arn:aws:secretsmanager:us-east-2:1234567890:secret:AccountingClusterAdminSecret due to lack of permissions. Please refer to ParallelCluster official documentation for more information.
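As a cross-check on the mapping above, here is a minimal Python sketch that builds the Database settings from a dictionary of stack outputs. The function name `slurm_database_settings` is hypothetical; the output keys and setting names are the ones shown in this tutorial, and the sample values are placeholders.

```python
from typing import Dict


def slurm_database_settings(outputs: Dict[str, str]) -> dict:
    """Map the accounting stack's outputs onto the SlurmSettings section."""
    return {
        "SlurmSettings": {
            "Database": {
                # Database = DatabaseHost:DatabasePort
                "Uri": f"{outputs['DatabaseHost']}:{outputs['DatabasePort']}",
                # Username = DatabaseAdminUser
                "UserName": outputs["DatabaseAdminUser"],
                # Password = DatabaseSecretArn (an ARN, not the password itself)
                "PasswordSecretArn": outputs["DatabaseSecretArn"],
            }
        }
    }


# Placeholder values for illustration only.
example_outputs = {
    "DatabaseHost": "slurm-accounting-cluster.cluster-hash.us-east-2.rds.amazonaws.com",
    "DatabasePort": "3306",
    "DatabaseAdminUser": "clusteradmin",
    "DatabaseSecretArn": "arn:aws:secretsmanager:us-east-2:123456789:secret:example",
}
settings = slurm_database_settings(example_outputs)
print(settings["SlurmSettings"]["Database"]["Uri"])
```

The DatabaseClientSecurityGroup output is not part of this section; it is attached to the head node separately, as described earlier.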
Once the cluster has been successfully created, we can submit a job to see that accounting is working properly.
sbatch --wrap 'sleep 30'
Once you’ve submitted a job, you can see the job information under the Accounting tab.
You can use any of the filters at the top to narrow the view down to specific jobs.
If you choose the Job ID in the left column, you can see further details about the job.
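The same records can also be checked from the head node's command line with Slurm's `sacct` command. The following is a rough Python sketch that wraps `sacct` and parses its pipe-delimited output; the helper names and the choice of fields are assumptions made for illustration, and `job_records` only works where `sacct` is on the PATH.

```python
import subprocess
from typing import Dict, List

# Fields to request from sacct; adjust to taste.
FIELDS = ["JobID", "JobName", "State", "Elapsed"]


def parse_sacct(text: str, fields: List[str] = FIELDS) -> List[Dict[str, str]]:
    """Parse pipe-delimited `sacct --parsable2 --noheader` output into dicts."""
    rows = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            rows.append(dict(zip(fields, line.split("|"))))
    return rows


def job_records(job_id: str) -> List[Dict[str, str]]:
    """Ask sacct for one job's accounting records (run on the head node)."""
    cmd = [
        "sacct",
        f"--jobs={job_id}",
        "--parsable2",
        "--noheader",
        f"--format={','.join(FIELDS)}",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_sacct(result.stdout)
```

If `job_records` returns rows for the job you submitted, the cluster is writing to the accounting database; an error from `sacct` usually points at the database connection settings.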