Cost tracking matters for HPC workloads: you need to track costs per user, per project, and per Slurm partition, and see how those costs change over time.
There are two ways to set up cost tracking, each with its own tradeoffs:
We’re going to set up the scripts needed to tag the instances, so that users can submit jobs like:
sbatch --comment ProjectA
And administrators to generate reports like:
In order to allow the EC2 instances to modify tags, we need to create an IAM policy that allows tagging.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DeleteTags",
                "ec2:DescribeTags",
                "ec2:CreateTags"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "budgets:ViewBudget"
            ],
            "Resource": "arn:aws:budgets::*:budget/*"
        }
    ]
}
Name the policy pclusterTagsAndBudget.
In ParallelCluster UI, when you create the cluster, expand “Advanced Options” in the HeadNode section and attach the pclusterTagsAndBudget policy you created above.
In the ComputeFleet section, expand “Advanced Options” and attach the pclusterTagsAndBudget policy you created above.
Next, review your config. It should look similar to the following:
Image:
  Os: alinux2
HeadNode:
  InstanceType: c5.2xlarge
  Networking:
    SubnetId: subnet-1234567
  Ssh:
    KeyName: keypair
  CustomActions:
    OnNodeConfigured:
      Script: >-
        https://raw.githubusercontent.com/sean-smith/pcluster-manager/cost-explorer/resources/scripts/cost-tags.sh
  Iam:
    AdditionalIamPolicies:
      - Policy: arn:aws:iam::1234567890:policy/pclusterTagsAndBudget
      - Policy: arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess
      - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  Dcv:
    Enabled: true
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: cpu
      Networking:
        SubnetIds:
          - subnet-1234567
        PlacementGroup:
          Enabled: true
      ComputeResources:
        - Name: cpu-hpc6a48xlarge
          Instances:
            - InstanceType: hpc6a.48xlarge
          MinCount: 0
          MaxCount: 100
          Efa:
            Enabled: true
      CustomActions:
        OnNodeConfigured:
          Script: >-
            https://raw.githubusercontent.com/sean-smith/pcluster-manager/cost-explorer/resources/scripts/cost-tags.sh
      Iam:
        AdditionalIamPolicies:
          - Policy: arn:aws:iam::1234567890:policy/pclusterTagsAndBudget
Tags:
  - Key: aws-parallelcluster-username
    Value: NA
  - Key: aws-parallelcluster-jobid
    Value: NA
  - Key: aws-parallelcluster-project
    Value: NA
Region: us-east-2
To create a project, edit the file /opt/slurm/etc/projects_list.conf:
ec2-user=ProjectA, ProjectB
userA=ProjectA, ProjectC
This file lists each user and the projects associated with them. When you submit a job, you must specify one of your projects from that file:
$ sbatch submit.sh
You need to specify a project. "--comment ProjectName"
$ sbatch --comment ProjectB submit.sh
Submitted batch job 5017
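The per-user project check can be sketched in a few lines of shell. This is a minimal illustration, not the actual sbatch wrapper used on the cluster: the function name `project_allowed` is hypothetical, and the sketch assumes that project names contain no spaces or commas.

```bash
#!/usr/bin/env bash
# Sketch only: decide whether a user may charge a job to a project.
# Assumes each line of the projects file looks like: user=ProjectA, ProjectB

# project_allowed FILE USER PROJECT
# Returns 0 if PROJECT is listed for USER in FILE, 1 otherwise.
project_allowed() {
  local file="$1" user="$2" project="$3"
  local line_user projects normalized
  # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
  while IFS='=' read -r line_user projects || [ -n "$line_user" ]; do
    # Skip lines that belong to other users.
    [ "$line_user" = "$user" ] || continue
    # Turn "A, B" into ",A,B," so only whole project names can match.
    normalized=",$(printf '%s' "$projects" | tr -d ' \t'),"
    case $normalized in
      *",$project,"*) return 0 ;;
    esac
  done < "$file"
  return 1
}
```

With the sample file above, `project_allowed /opt/slurm/etc/projects_list.conf userA ProjectC` succeeds, while the same call with `ProjectB` fails because userA is not assigned to that project.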
The following tags are added to the instances that run the job:
Tag | Description |
---|---|
aws-parallelcluster-username | user who submitted the job |
aws-parallelcluster-project | project name specified in --comment <project-name> |
aws-parallelcluster-jobid | the id of the submitted job |
parallelcluster:queue-name | the Slurm partition the job was submitted to |
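As a rough illustration of how the job-specific tags map onto an instance, the sketch below builds the `Key=...,Value=...` pairs in the shorthand form that `aws ec2 create-tags` accepts. The helper name `build_tags` and the sample values are hypothetical, and the real cost-tags.sh script may work differently.

```bash
#!/usr/bin/env bash
# Sketch only: format the per-job tags for the EC2 tagging API.

# build_tags USER PROJECT JOBID
# Prints one Key=...,Value=... pair per line.
build_tags() {
  local user="$1" project="$2" jobid="$3"
  # printf repeats its format string for each key/value pair of arguments.
  printf 'Key=%s,Value=%s\n' \
    aws-parallelcluster-username "$user" \
    aws-parallelcluster-project "$project" \
    aws-parallelcluster-jobid "$jobid"
}

# Possible use on a compute node (instance_id is a placeholder, and the
# values must not contain spaces for the unquoted expansion to be safe):
#   aws ec2 create-tags --resources "$instance_id" \
#     --tags $(build_tags "$USER" ProjectB 5017)
```

The `parallelcluster:queue-name` tag is left out of the sketch, since only the username, project, and job ID tags are declared with placeholder values in the cluster config.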
Budgets allow us to track specific project cost over time and get alerted if we’re about to hit a cap.
Budget Item | Description |
---|---|
Dimension | Tag |
Tag | aws-parallelcluster-project |
Project Name | Name of the project, corresponding to /opt/slurm/etc/projects_list.conf |
Edit /opt/slurm/bin/sbatch on the cluster and set budget="yes":
#enable or disable the budget checks for the projects
budget="yes"
With this setting, each job submission first checks the budget that has the same name as the project, and the job is rejected if that budget has been exceeded. For example, if ProjectA’s budget has been exceeded, you’ll see:
$ sbatch -N 100 --comment ProjectA submit.sh
The Project ProjectA does not have more budget allocated for this month.
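The budget gate can be pictured as a comparison between the project's spend and its budget limit. The sketch below is an assumption about the logic, not the wrapper's actual code: `budget_exceeded` and `check_project_budget` are hypothetical names, and treating spend equal to the limit as exhausted is a choice made here for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: reject a submission when the project's budget is used up.

# budget_exceeded SPEND LIMIT
# Returns 0 if SPEND has reached or passed LIMIT, 1 otherwise.
# awk does the comparison because shell arithmetic cannot handle decimals.
budget_exceeded() {
  local spend="$1" limit="$2"
  awk -v s="$spend" -v l="$limit" 'BEGIN { exit !((s + 0) >= (l + 0)) }'
}

# check_project_budget PROJECT SPEND LIMIT
# Prints the rejection message and returns 1 when the budget is exhausted.
check_project_budget() {
  local project="$1" spend="$2" limit="$3"
  if budget_exceeded "$spend" "$limit"; then
    echo "The Project $project does not have more budget allocated for this month."
    return 1
  fi
  return 0
}
```

In practice the two amounts could be read with `aws budgets describe-budget --account-id <account-id> --budget-name ProjectA`, which reports the budget limit and the calculated actual spend.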