Slurm node unexpectedly rebooted


Posted by JXLIU on January 6, 2023

After a Slurm compute node is restarted manually, the management node sets the node's state to DOWN, typically with the reason "Node unexpectedly rebooted".

To return the node to service, run the following command on the Slurm management node, replacing nodename with the name of the affected node:

scontrol update NodeName=nodename State=RESUME
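
Before resuming a node it can help to confirm that Slurm really reports it as DOWN. The following is a minimal sketch under a few assumptions: the function names `run` and `resume_if_down` and the `DRY_RUN` variable are hypothetical helpers invented for this example, not part of Slurm. Only `sinfo` and `scontrol` are real Slurm commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Slurm).
# Set DRY_RUN=1 to print the scontrol command instead of executing it.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

resume_if_down() {
    local node="$1" state
    # -h drops the header, -n selects the node, %T prints the full state name.
    # A node in several partitions is listed once per partition, so keep one line.
    state="$(sinfo -h -n "$node" -o '%T' | head -n 1)"
    case "$state" in
        down*)
            # Matches "down" as well as flagged variants such as "down*".
            run scontrol update "NodeName=$node" State=RESUME
            ;;
        *)
            echo "$node is in state '${state:-unknown}', nothing to do"
            ;;
    esac
}
```

After sourcing the file on the management node, `DRY_RUN=1 resume_if_down node006` prints the command that would be run, and `resume_if_down node006` executes it.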

Reference:

Slurm Troubleshooting: Nodes stuck in CG status

scontrol update nodename=node006 state=down reason=cg
scontrol update nodename=node006 state=resume
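
The two commands above can be wrapped in a small script. This is only a sketch: `run`, `list_cg_nodes`, `reset_cg_node`, and `DRY_RUN` are hypothetical names chosen for illustration. Note that a node is briefly in the completing state after every job, so the list is meant for manual review rather than automatic resetting.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Slurm).
# Set DRY_RUN=1 to print commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Print each node currently in the COMPLETING (CG) state, one name per line.
list_cg_nodes() {
    sinfo -h -N -t completing -o '%N' | sort -u
}

# Apply the down-then-resume sequence to a single node.
reset_cg_node() {
    local node="$1"
    run scontrol update "nodename=$node" state=down reason=cg &&
    run scontrol update "nodename=$node" state=resume
}
```

A possible workflow is to run `list_cg_nodes`, check that the listed nodes have been stuck for a while, and then call `reset_cg_node node006` for each node that needs it.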

Reference: