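Updating a Cluster

What Can Be Updated

Some parts of a cluster can be updated in place and others cannot. For example, settings tied to the head node generally cannot be updated, so changing them means deleting the cluster and creating it again.
Changing compute node settings or adding a shared file system, on the other hand, can be done as an update. What can and cannot be updated is summarized here.
Because a cluster is easy to delete, modify, and create, you can use HPC resources elastically.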

How to Update a Cluster

The example below adds a shared file system (EFS) to an existing cluster.

A shared file system such as EFS lets the head node and the compute nodes share files.

First, stop the compute fleet as follows.

pcluster update-compute-fleet --cluster-name hpc-cluster --status STOP_REQUESTED

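You can check the compute fleet status with the cluster status command. The computeFleetStatus field in the output should read STOPPED before you continue.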

pcluster describe-cluster --cluster-name hpc-cluster
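When this step is scripted, it can help to wait until the fleet has actually stopped before moving on. The following is a minimal sketch and not part of the original walkthrough: the function name wait_for_fleet_status, the polling interval, and the retry limit are illustrative assumptions. It uses pcluster describe-compute-fleet with the --query option to extract the status field, and the quote stripping assumes the CLI prints that value as a JSON string.

```shell
# Poll the compute fleet until it reaches the wanted status (e.g. STOPPED).
# wait_for_fleet_status is a hypothetical helper, not a pcluster command.
wait_for_fleet_status() {
  local cluster="$1" wanted="$2"
  local interval="${3:-10}" max_tries="${4:-60}"
  local status i=0
  while [ "$i" -lt "$max_tries" ]; do
    i=$((i + 1))
    # --query status keeps only the "status" field of the JSON response;
    # tr strips the surrounding quotes and whitespace (assumed output format).
    status=$(pcluster describe-compute-fleet --cluster-name "$cluster" \
               --query status | tr -d '" \n')
    echo "compute fleet status: $status"
    if [ "$status" = "$wanted" ]; then
      return 0
    fi
    sleep "$interval"
  done
  echo "timed out waiting for $wanted" >&2
  return 1
}
```

For the cluster in this guide, the call would be: wait_for_fleet_status hpc-cluster STOPPED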

Edit the config file as needed. Here, an EFS file system is added under SharedStorage, and the result is saved as hpc-cluster-v2.yaml.

Region: ap-northeast-2
Image:
  Os: alinux2
HeadNode:
  InstanceType: t3.large
  Networking:
    SubnetId: subnet-0aecd06bb2da5c5e7
  Ssh:
    KeyName: pcluster-head-key-pair
SharedStorage:
  - MountDir: /shared
    Name: efs-filesystem
    StorageType: Efs
Scheduling:
  Scheduler: slurm
  SlurmQueues:
  - Name: process-queue
    ComputeResources:
    - Name: t3xlarge
      Instances:
      - InstanceType: t3.xlarge
      MinCount: 0
      MaxCount: 5
    Networking:
      SubnetIds:
      - subnet-0aecd06bb2da5c5e7

You can check whether the update is allowed by using the dryrun option.

pcluster update-cluster --cluster-name hpc-cluster --cluster-configuration hpc-cluster-v2.yaml --dryrun true
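The dry run and the real update can also be chained so that the update only runs when the dry run looks clean. This is a rough sketch under stated assumptions: safe_update_cluster is a hypothetical wrapper, and the check assumes that a passing dry run reports a message containing "would have succeeded". Verify the exact message printed by your pcluster version before relying on it.

```shell
# Run the dry run first and apply the real update only if it looks clean.
# safe_update_cluster is a hypothetical wrapper, not a pcluster command.
safe_update_cluster() {
  local cluster="$1" config="$2"
  local result
  # The exit code is ignored on purpose; the check below relies on the
  # message text instead.
  result=$(pcluster update-cluster --cluster-name "$cluster" \
             --cluster-configuration "$config" --dryrun true 2>&1) || true
  echo "$result"
  # Assumption: a passing dry run prints a message containing
  # "would have succeeded"; anything else is treated as a failure.
  case "$result" in
    *"would have succeeded"*)
      pcluster update-cluster --cluster-name "$cluster" \
        --cluster-configuration "$config"
      ;;
    *)
      echo "dry run did not pass; update skipped" >&2
      return 1
      ;;
  esac
}
```

With the file names used in this guide, the call would be: safe_update_cluster hpc-cluster hpc-cluster-v2.yaml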

If the dry run reports no problems, run the actual update. How long it takes depends on the configuration, but it typically finishes in about 5 minutes.

pcluster update-cluster --cluster-name hpc-cluster --cluster-configuration hpc-cluster-v2.yaml

Run describe-cluster again to check whether the update has finished. The clusterStatus field changes to UPDATE_COMPLETE once it is done.

pcluster describe-cluster --cluster-name hpc-cluster
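Instead of rerunning the command by hand, the status can be polled until the update finishes. The sketch below is illustrative only: wait_for_update and its default interval are assumptions, and it treats any status other than UPDATE_IN_PROGRESS or UPDATE_COMPLETE (for example UPDATE_FAILED) as an error.

```shell
# Poll the cluster status until the update finishes or fails.
# wait_for_update is a hypothetical helper, not a pcluster command.
wait_for_update() {
  local cluster="$1" interval="${2:-30}"
  local status
  while true; do
    # --query clusterStatus keeps only that field of the JSON response;
    # tr strips the surrounding quotes and whitespace (assumed output format).
    status=$(pcluster describe-cluster --cluster-name "$cluster" \
               --query clusterStatus | tr -d '" \n')
    echo "cluster status: $status"
    case "$status" in
      UPDATE_COMPLETE) return 0 ;;
      UPDATE_IN_PROGRESS) sleep "$interval" ;;
      *) echo "unexpected status: $status" >&2; return 1 ;;
    esac
  done
}
```

For the cluster in this guide, the call would be: wait_for_update hpc-cluster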

Once the update has finished, start the compute fleet again.

pcluster update-compute-fleet --cluster-name hpc-cluster --status START_REQUESTED

Check the cluster status once more. The computeFleetStatus field should now read RUNNING.

pcluster describe-cluster --cluster-name hpc-cluster

Verifying EFS

Connect to the head node.

pcluster ssh --cluster-name hpc-cluster -i pcluster-node-key.pem

Create a file in the EFS directory, then check that a compute node can read it. The file and its contents, followed by the job script, are shown below.

vi /shared/efs-file.txt

Hello Efs!

vi efsjob.sh

#!/bin/bash
sleep 5
echo "EFS message : $(cat /shared/efs-file.txt)"
echo "Hello World from $(hostname) with EFS"

Save the script above as efsjob.sh, then submit it.

sbatch efsjob.sh

If the job's output file shows the contents of the file stored on EFS, the setup works.
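The submit-and-check step can be sketched as a small helper. This is an assumption-laden example rather than the original author's procedure: run_and_show is a hypothetical function, and it relies on the Slurm default of writing job output to slurm-<jobid>.out in the directory the job was submitted from. With MinCount set to 0, the job may sit in the queue for a few minutes while a compute node boots.

```shell
# Submit a batch script, wait for it to leave the queue, then print its output.
# run_and_show is a hypothetical helper built on standard Slurm commands.
run_and_show() {
  local script="$1" interval="${2:-5}"
  local job_id
  # --parsable makes sbatch print just the job ID (optionally "id;cluster").
  job_id=$(sbatch --parsable "$script") || return 1
  job_id="${job_id%%;*}"
  echo "submitted job $job_id"
  # squeue -h prints one line while the job is pending or running,
  # and nothing once it has finished.
  while [ -n "$(squeue -h -j "$job_id" 2>/dev/null)" ]; do
    sleep "$interval"
  done
  # By default Slurm writes stdout to slurm-<jobid>.out in the submit directory.
  cat "slurm-${job_id}.out"
}
```

On the head node, the call would be: run_and_show efsjob.sh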

