Cluster status Fri, 10 Jan 2025 07:42:01 +0000 report from beholder02
Resource usage (overall)
Resource usage (by namespace)
Ceph file system status
Etcd cluster status
Detailed network health report
NVIDIA driver and GPU reports
Imp
Dretch
Belial
Fierna
Tiamat
Vecna
Asmodeus
Zariel
Demogorgon
Resource Requested Limit Allocatable Free
cpu (14%) 223.8 (22%) 363.5 1.6k 1.3k
├─ asmodeus (0%) 350.0m (0%) 100.0m 256.0 255.7
│ ├─ dnsutils-asmodeus 100.0m 100.0m
│ └─ kube-router-jzz5s 250.0m 0.0
├─ beholder01 (4%) 900.0m (0%) 100.0m 24.0 23.1
│ ├─ dnsutils-beholder01 100.0m 100.0m
│ ├─ kube-apiserver-beholder01 250.0m 0.0
│ ├─ kube-controller-manager-beholder01 200.0m 0.0
│ ├─ kube-router-9skrf 250.0m 0.0
│ └─ kube-scheduler-beholder01 100.0m 0.0
├─ beholder02 (8%) 1.9 (9%) 2.1 24.0 21.9
│ ├─ dnsutils-beholder02 100.0m 100.0m
│ ├─ kube-apiserver-beholder02 250.0m 0.0
│ ├─ kube-controller-manager-beholder02 200.0m 0.0
│ ├─ kube-router-bqwwt 250.0m 0.0
│ ├─ kube-scheduler-beholder02 100.0m 0.0
│ ├─ wordpress-8c8944c8d-cxbrq 500.0m 1.0
│ └─ wordpress-mariadb-565c547dc8-59fhr 500.0m 1.0
├─ beholder03 (4%) 900.0m (0%) 100.0m 24.0 23.1
│ ├─ dnsutils-beholder03 100.0m 100.0m
│ ├─ kube-apiserver-beholder03 250.0m 0.0
│ ├─ kube-controller-manager-beholder03 200.0m 0.0
│ ├─ kube-router-vv4gb 250.0m 0.0
│ └─ kube-scheduler-beholder03 100.0m 0.0
├─ belial (0%) 350.0m (0%) 100.0m 80.0 79.7
│ ├─ dnsutils-belial 100.0m 100.0m
│ └─ kube-router-2whgq 250.0m 0.0
├─ demogorgon (0%) 350.0m (0%) 100.0m 96.0 95.7
│ ├─ dnsutils-demogorgon 100.0m 100.0m
│ └─ kube-router-69fp9 250.0m 0.0
├─ fierna (0%) 350.0m (0%) 100.0m 80.0 79.7
│ ├─ dnsutils-fierna 100.0m 100.0m
│ └─ kube-router-9mwhj 250.0m 0.0
├─ kiaransalee (47%) 90.3 (73%) 140.1 192.0 51.9
│ ├─ bash-pod 8.0 8.0
│ ├─ dnsutils-kiaransalee 100.0m 100.0m
│ ├─ kube-router-jcv4r 250.0m 0.0
│ ├─ tgi70b-pod-565b6cc77c-d4v64 32.0 32.0
│ └─ ubuntu-gpu1 50.0 100.0
├─ lolth (1%) 450.0m (3%) 1.1 40.0 38.9
│ ├─ dnsutils-lolth 100.0m 100.0m
│ ├─ gatekeeper-controller-manager-66f474f785-czcr2 100.0m 1.0
│ └─ kube-router-l46ld 250.0m 0.0
├─ mindflayer01 (2%) 1.4 (2%) 1.1 64.0 62.6
│ ├─ dex-65cb7757f7-hwlbt 250.0m 0.0
│ ├─ dex-loginapp-fd64f94fc-v4sh9 100.0m 0.0
│ ├─ dnsutils-mindflayer01 100.0m 100.0m
│ ├─ kube-router-nlrkl 250.0m 0.0
│ ├─ mariadb-849498cb44-dldtm 500.0m 1.0
│ ├─ system-registry-8877cd57-gt8rg 100.0m 0.0
│ └─ user-registry-64bb8ff7cf-rvzbz 100.0m 0.0
├─ mindflayer02 (19%) 11.9 (28%) 18.1 64.0 45.9
│ ├─ cool-pod 10.0 16.0
│ ├─ dex-mysql-66b5885f7b-5h8hw 100.0m 0.0
│ ├─ dnsutils-mindflayer02 100.0m 100.0m
│ ├─ kube-router-g446k 250.0m 0.0
│ ├─ mediawiki-mariadb-5cc4866855-xffs4 250.0m 0.0
│ ├─ phpfpm-nginx-5db96d6895-97f2g 1.0 2.0
│ ├─ registry-65974f8f46-lbkhc 100.0m 0.0
│ └─ registry-auth-5c65dbf9dd-82qq7 100.0m 0.0
├─ mindflayer03 (1%) 800.0m (2%) 1.1 64.0 62.9
│ ├─ coredns-55cb58b774-4kpwk 100.0m 0.0
│ ├─ dnsutils-mindflayer03 100.0m 100.0m
│ ├─ gatekeeper-controller-manager-66f474f785-5gv4n 100.0m 1.0
│ ├─ kube-router-fjphj 250.0m 0.0
│ └─ ldap-6987986dbc-2zr2c 250.0m 0.0
├─ tiamat (0%) 1.1 (1%) 3.1 256.0 252.9
│ ├─ coredns-55cb58b774-tt74d 100.0m 0.0
│ ├─ dnsutils-tiamat 100.0m 100.0m
│ ├─ gatekeeper-audit-59d4b6fd4c-4qvjn 100.0m 1.0
│ ├─ gatekeeper-controller-manager-66f474f785-ndtdl 100.0m 1.0
│ ├─ kube-router-bxj9w 250.0m 0.0
│ └─ prometheus-deployment-b65d5d898-nfrw9 500.0m 1.0
├─ vecna (17%) 16.4 (104%) 100.1 96.0 0.0
│ ├─ dnsutils-vecna 100.0m 100.0m
│ ├─ felix-petersen-job-20 16.0 100.0
│ └─ kube-router-nzkdb 250.0m 0.0
└─ zariel (38%) 96.3 (38%) 96.1 256.0 159.7
  ├─ dnsutils-zariel 100.0m 100.0m
  ├─ julian-welzel-job-1 96.0 96.0
  └─ kube-router-2crh9 250.0m 0.0
ephemeral-storage (0%) 0.0 (0%) 0.0 11.9T 11.9T
├─ asmodeus (0%) 0.0 (0%) 0.0 94.6G 94.6G
├─ beholder01 (0%) 0.0 (0%) 0.0 1.7T 1.7T
├─ beholder02 (0%) 0.0 (0%) 0.0 1.7T 1.7T
├─ beholder03 (0%) 0.0 (0%) 0.0 1.7T 1.7T
├─ belial (0%) 0.0 (0%) 0.0 189.2G 189.2G
├─ demogorgon (0%) 0.0 (0%) 0.0 706.7G 706.7G
├─ fierna (0%) 0.0 (0%) 0.0 189.2G 189.2G
├─ kiaransalee (0%) 0.0 (0%) 0.0 1.7T 1.7T
├─ lolth (0%) 0.0 (0%) 0.0 530.5G 530.5G
├─ mindflayer01 (0%) 0.0 (0%) 0.0 211.5G 211.5G
├─ mindflayer02 (0%) 0.0 (0%) 0.0 211.5G 211.5G
├─ mindflayer03 (0%) 0.0 (0%) 0.0 211.5G 211.5G
├─ tiamat (0%) 0.0 (0%) 0.0 164.4G 164.4G
├─ vecna (0%) 0.0 (0%) 0.0 849.0G 849.0G
└─ zariel (0%) 0.0 (0%) 0.0 1.7T 1.7T
memory (6%) 836.4G (21%) 2.7Ti 12.9Ti 10.2Ti
├─ asmodeus (0%) 350.0Mi (0%) 100.0Mi 2.0Ti 2.0Ti
│ ├─ dnsutils-asmodeus 100.0Mi 100.0Mi
│ └─ kube-router-jzz5s 250.0Mi 0.0
├─ beholder01 (0%) 350.0Mi (0%) 100.0Mi 92.9Gi 92.5Gi
│ ├─ dnsutils-beholder01 100.0Mi 100.0Mi
│ └─ kube-router-9skrf 250.0Mi 0.0
├─ beholder02 (1%) 862.0Mi (1%) 1.1Gi 92.9Gi 91.8Gi
│ ├─ dnsutils-beholder02 100.0Mi 100.0Mi
│ ├─ kube-router-bqwwt 250.0Mi 0.0
│ ├─ wordpress-8c8944c8d-cxbrq 256.0Mi 512.0Mi
│ └─ wordpress-mariadb-565c547dc8-59fhr 256.0Mi 512.0Mi
├─ beholder03 (0%) 350.0Mi (0%) 100.0Mi 92.9Gi 92.5Gi
│ ├─ dnsutils-beholder03 100.0Mi 100.0Mi
│ └─ kube-router-vv4gb 250.0Mi 0.0
├─ belial (0%) 350.0Mi (0%) 100.0Mi 754.4Gi 754.1Gi
│ ├─ dnsutils-belial 100.0Mi 100.0Mi
│ └─ kube-router-2whgq 250.0Mi 0.0
├─ demogorgon (0%) 350.0Mi (0%) 100.0Mi 2.0Ti 2.0Ti
│ ├─ dnsutils-demogorgon 100.0Mi 100.0Mi
│ └─ kube-router-69fp9 250.0Mi 0.0
├─ fierna (0%) 350.0Mi (0%) 100.0Mi 754.4Gi 754.1Gi
│ ├─ dnsutils-fierna 100.0Mi 100.0Mi
│ └─ kube-router-9mwhj 250.0Mi 0.0
├─ kiaransalee (34%) 544.8G (43%) 656.1Gi 1.5Ti 855.3Gi
│ ├─ bash-pod 16.0Gi 16.0Gi
│ ├─ dnsutils-kiaransalee 100.0Mi 100.0Mi
│ ├─ jupyter-elenasolar 1.1G 0.0
│ ├─ kube-router-jcv4r 250.0Mi 0.0
│ ├─ tgi70b-pod-565b6cc77c-d4v64 240.0Gi 240.0Gi
│ └─ ubuntu-gpu1 250.0Gi 400.0Gi
├─ lolth (0%) 606.0Mi (0%) 612.0Mi 251.4Gi 250.8Gi
│ ├─ dnsutils-lolth 100.0Mi 100.0Mi
│ ├─ gatekeeper-controller-manager-66f474f785-czcr2 256.0Mi 512.0Mi
│ └─ kube-router-l46ld 250.0Mi 0.0
├─ mindflayer01 (0%) 606.0Mi (0%) 612.0Mi 376.5Gi 375.9Gi
│ ├─ dnsutils-mindflayer01 100.0Mi 100.0Mi
│ ├─ kube-router-nlrkl 250.0Mi 0.0
│ └─ mariadb-849498cb44-dldtm 256.0Mi 512.0Mi
├─ mindflayer02 (17%) 64.8Gi (17%) 65.1Gi 376.5Gi 311.4Gi
│ ├─ cool-pod 64.0Gi 64.0Gi
│ ├─ dnsutils-mindflayer02 100.0Mi 100.0Mi
│ ├─ kube-router-g446k 250.0Mi 0.0
│ └─ phpfpm-nginx-5db96d6895-97f2g 512.0Mi 1.0Gi
├─ mindflayer03 (0%) 676.0Mi (0%) 782.0Mi 376.5Gi 375.7Gi
│ ├─ coredns-55cb58b774-4kpwk 70.0Mi 170.0Mi
│ ├─ dnsutils-mindflayer03 100.0Mi 100.0Mi
│ ├─ gatekeeper-controller-manager-66f474f785-5gv4n 256.0Mi 512.0Mi
│ └─ kube-router-fjphj 250.0Mi 0.0
├─ tiamat (0%) 1.5G (0%) 2.3Gi 1007.6Gi 1005.3Gi
│ ├─ coredns-55cb58b774-tt74d 70.0Mi 170.0Mi
│ ├─ dnsutils-tiamat 100.0Mi 100.0Mi
│ ├─ gatekeeper-audit-59d4b6fd4c-4qvjn 256.0Mi 512.0Mi
│ ├─ gatekeeper-controller-manager-66f474f785-ndtdl 256.0Mi 512.0Mi
│ ├─ kube-router-bxj9w 250.0Mi 0.0
│ └─ prometheus-deployment-b65d5d898-nfrw9 500.0M 1.0Gi
├─ vecna (7%) 100.3Gi (106%) 1.6Ti 1.5Ti 0.0
│ ├─ dnsutils-vecna 100.0Mi 100.0Mi
│ ├─ felix-petersen-job-20 100.0Gi 1.6Ti
│ └─ kube-router-nzkdb 250.0Mi 0.0
└─ zariel (5%) 100.3Gi (20%) 400.1Gi 2.0Ti 1.6Ti
  ├─ dnsutils-zariel 100.0Mi 100.0Mi
  ├─ julian-welzel-job-1 100.0Gi 400.0Gi
  └─ kube-router-2crh9 250.0Mi 0.0
nvidia.com/gpu (41%) 26.0 (41%) 26.0 63.0 37.0
├─ asmodeus (0%) 0.0 (0%) 0.0 4.0 4.0
├─ belial (0%) 0.0 (0%) 0.0 8.0 8.0
├─ demogorgon (0%) 0.0 (0%) 0.0 8.0 8.0
├─ fierna (0%) 0.0 (0%) 0.0 8.0 8.0
├─ kiaransalee (86%) 6.0 (86%) 6.0 7.0 1.0
│ ├─ jupyter-elenasolar 1.0 1.0
│ ├─ tgi70b-pod-565b6cc77c-d4v64 4.0 4.0
│ └─ ubuntu-gpu1 1.0 1.0
├─ tiamat (0%) 0.0 (0%) 0.0 4.0 4.0
├─ vecna (100%) 16.0 (100%) 16.0 16.0 0.0
│ └─ felix-petersen-job-20 16.0 16.0
└─ zariel (50%) 4.0 (50%) 4.0 8.0 4.0
  └─ julian-welzel-job-1 4.0 4.0
nvidia.com/mig-1g.10gb (0%) 0.0 (0%) 0.0 7.0 7.0
└─ kiaransalee (0%) 0.0 (0%) 0.0 7.0 7.0
pods (7%) 117.0 (7%) 117.0 1.6k 1.5k
├─ asmodeus (4%) 4.0 (4%) 4.0 110.0 106.0
│ ├─ dnsutils-asmodeus 1.0 1.0
│ ├─ kube-proxy-rbzdb 1.0 1.0
│ ├─ kube-router-jzz5s 1.0 1.0
│ └─ nvidia-device-plugin-daemonset-qlq9x 1.0 1.0
├─ beholder01 (5%) 6.0 (5%) 6.0 110.0 104.0
│ ├─ dnsutils-beholder01 1.0 1.0
│ ├─ kube-apiserver-beholder01 1.0 1.0
│ ├─ kube-controller-manager-beholder01 1.0 1.0
│ ├─ kube-proxy-7f5v4 1.0 1.0
│ ├─ kube-router-9skrf 1.0 1.0
│ └─ kube-scheduler-beholder01 1.0 1.0
├─ beholder02 (7%) 8.0 (7%) 8.0 110.0 102.0
│ ├─ dnsutils-beholder02 1.0 1.0
│ ├─ kube-apiserver-beholder02 1.0 1.0
│ ├─ kube-controller-manager-beholder02 1.0 1.0
│ ├─ kube-proxy-dlc5n 1.0 1.0
│ ├─ kube-router-bqwwt 1.0 1.0
│ ├─ kube-scheduler-beholder02 1.0 1.0
│ ├─ wordpress-8c8944c8d-cxbrq 1.0 1.0
│ └─ wordpress-mariadb-565c547dc8-59fhr 1.0 1.0
├─ beholder03 (5%) 6.0 (5%) 6.0 110.0 104.0
│ ├─ dnsutils-beholder03 1.0 1.0
│ ├─ kube-apiserver-beholder03 1.0 1.0
│ ├─ kube-controller-manager-beholder03 1.0 1.0
│ ├─ kube-proxy-jqcj7 1.0 1.0
│ ├─ kube-router-vv4gb 1.0 1.0
│ └─ kube-scheduler-beholder03 1.0 1.0
├─ belial (5%) 5.0 (5%) 5.0 110.0 105.0
│ ├─ dnsutils-belial 1.0 1.0
│ ├─ gpu-feature-discovery-5ft45 1.0 1.0
│ ├─ kube-proxy-nsltd 1.0 1.0
│ ├─ kube-router-2whgq 1.0 1.0
│ └─ nvidia-device-plugin-daemonset-sc992 1.0 1.0
├─ demogorgon (4%) 4.0 (4%) 4.0 110.0 106.0
│ ├─ dnsutils-demogorgon 1.0 1.0
│ ├─ kube-proxy-gknqq 1.0 1.0
│ ├─ kube-router-69fp9 1.0 1.0
│ └─ nvidia-device-plugin-daemonset-7w6d2 1.0 1.0
├─ fierna (5%) 5.0 (5%) 5.0 110.0 105.0
│ ├─ dnsutils-fierna 1.0 1.0
│ ├─ gpu-feature-discovery-dkzfn 1.0 1.0
│ ├─ kube-proxy-9tg4f 1.0 1.0
│ ├─ kube-router-9mwhj 1.0 1.0
│ └─ nvidia-device-plugin-daemonset-tvr8l 1.0 1.0
├─ kiaransalee (9%) 10.0 (9%) 10.0 110.0 100.0
│ ├─ bash-pod 1.0 1.0
│ ├─ continuous-image-puller-sb47g 1.0 1.0
│ ├─ dnsutils-kiaransalee 1.0 1.0
│ ├─ gpu-feature-discovery-2dqmt 1.0 1.0
│ ├─ jupyter-elenasolar 1.0 1.0
│ ├─ kube-proxy-65675 1.0 1.0
│ ├─ kube-router-jcv4r 1.0 1.0
│ ├─ nvidia-device-plugin-daemonset-tmgl5 1.0 1.0
│ ├─ tgi70b-pod-565b6cc77c-d4v64 1.0 1.0
│ └─ ubuntu-gpu1 1.0 1.0
├─ lolth (8%) 9.0 (8%) 9.0 110.0 101.0
│ ├─ dnsutils-lolth 1.0 1.0
│ ├─ echo1-77fbfb54d-hdspr 1.0 1.0
│ ├─ gatekeeper-controller-manager-66f474f785-czcr2 1.0 1.0
│ ├─ hub-57d6577bf9-5z5md 1.0 1.0
│ ├─ kube-proxy-cl9gc 1.0 1.0
│ ├─ kube-router-l46ld 1.0 1.0
│ ├─ local-path-provisioner-759479454f-8kh6v 1.0 1.0
│ ├─ nvidia-device-plugin-daemonset-xqrc5 1.0 1.0
│ └─ user-scheduler-c7db6c584-kcb95 1.0 1.0
├─ mindflayer01 (15%) 17.0 (15%) 17.0 110.0 93.0
│ ├─ dex-65cb7757f7-hwlbt 1.0 1.0
│ ├─ dex-loginapp-fd64f94fc-v4sh9 1.0 1.0
│ ├─ dnsutils-mindflayer01 1.0 1.0
│ ├─ kube-proxy-d2xl4 1.0 1.0
│ ├─ kube-router-nlrkl 1.0 1.0
│ ├─ mariadb-849498cb44-dldtm 1.0 1.0
│ ├─ memcached-578474d6f9-8dgb8 1.0 1.0
│ ├─ nginx-frontend-bc5b65957-s2gc2 1.0 1.0
│ ├─ nginx-ip-2023-78b9c84dbf-znhrx 1.0 1.0
│ ├─ nginx-k8s-7c8d949b5f-grwnx 1.0 1.0
│ ├─ nginx-rec-2023-79df8c77cd-gg87f 1.0 1.0
│ ├─ nginx-rsn-2024-7dbc6b668b-jbjk7 1.0 1.0
│ ├─ nginx-self-service-password-65b44f7547-dhvwp 1.0 1.0
│ ├─ nvidia-device-plugin-daemonset-c5mbb 1.0 1.0
│ ├─ pdf-b647544f-snvmm 1.0 1.0
│ ├─ system-registry-8877cd57-gt8rg 1.0 1.0
│ └─ user-registry-64bb8ff7cf-rvzbz 1.0 1.0
├─ mindflayer02 (9%) 10.0 (9%) 10.0 110.0 100.0
│ ├─ cool-pod 1.0 1.0
│ ├─ dex-mysql-66b5885f7b-5h8hw 1.0 1.0
│ ├─ dnsutils-mindflayer02 1.0 1.0
│ ├─ kube-proxy-zbwk4 1.0 1.0
│ ├─ kube-router-g446k 1.0 1.0
│ ├─ mediawiki-mariadb-5cc4866855-xffs4 1.0 1.0
│ ├─ nvidia-device-plugin-daemonset-l6hx2 1.0 1.0
│ ├─ phpfpm-nginx-5db96d6895-97f2g 1.0 1.0
│ ├─ registry-65974f8f46-lbkhc 1.0 1.0
│ └─ registry-auth-5c65dbf9dd-82qq7 1.0 1.0
├─ mindflayer03 (7%) 8.0 (7%) 8.0 110.0 102.0
│ ├─ coredns-55cb58b774-4kpwk 1.0 1.0
│ ├─ dnsutils-mindflayer03 1.0 1.0
│ ├─ echo1-77fbfb54d-rcwzd 1.0 1.0
│ ├─ gatekeeper-controller-manager-66f474f785-5gv4n 1.0 1.0
│ ├─ kube-proxy-b68xq 1.0 1.0
│ ├─ kube-router-fjphj 1.0 1.0
│ ├─ ldap-6987986dbc-2zr2c 1.0 1.0
│ └─ nvidia-device-plugin-daemonset-fnkln 1.0 1.0
├─ tiamat (14%) 15.0 (14%) 15.0 110.0 95.0
│ ├─ coredns-55cb58b774-tt74d 1.0 1.0
│ ├─ dnsutils-tiamat 1.0 1.0
│ ├─ echo1-77fbfb54d-67t9q 1.0 1.0
│ ├─ echo1-77fbfb54d-t79hv 1.0 1.0
│ ├─ echo2-5d58759df-2zc5c 1.0 1.0
│ ├─ gatekeeper-audit-59d4b6fd4c-4qvjn 1.0 1.0
│ ├─ gatekeeper-controller-manager-66f474f785-ndtdl 1.0 1.0
│ ├─ gpu-feature-discovery-kbcdl 1.0 1.0
│ ├─ kube-proxy-nbjvl 1.0 1.0
│ ├─ kube-router-bxj9w 1.0 1.0
│ ├─ kube-state-metrics-8945855d-hn5v2 1.0 1.0
│ ├─ nvidia-device-plugin-daemonset-8r88p 1.0 1.0
│ ├─ prometheus-deployment-b65d5d898-nfrw9 1.0 1.0
│ ├─ proxy-5bc89cc587-jvkxw 1.0 1.0
│ └─ user-scheduler-c7db6c584-k2cgj 1.0 1.0
├─ vecna (5%) 5.0 (5%) 5.0 110.0 105.0
│ ├─ dnsutils-vecna 1.0 1.0
│ ├─ felix-petersen-job-20 1.0 1.0
│ ├─ kube-proxy-djtx4 1.0 1.0
│ ├─ kube-router-nzkdb 1.0 1.0
│ └─ nvidia-device-plugin-daemonset-8zv6j 1.0 1.0
└─ zariel (5%) 5.0 (5%) 5.0 110.0 105.0
  ├─ dnsutils-zariel 1.0 1.0
  ├─ julian-welzel-job-1 1.0 1.0
  ├─ kube-proxy-c8frf 1.0 1.0
  ├─ kube-router-2crh9 1.0 1.0
  └─ nvidia-device-plugin-daemonset-sz94v 1.0 1.0
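
The Free column in the table above appears to follow Allocatable minus the larger of Requested and Limit, floored at zero. For example, vecna's CPU limit of 100.1 exceeds its 96.0 allocatable cores, so Free shows 0.0. A minimal sketch of that arithmetic, assuming this rule holds for every row; the function name `free_capacity` is hypothetical and not part of the reporting tool:

```python
def free_capacity(requested: float, limit: float, allocatable: float) -> float:
    """Assumed rule: allocatable minus max(requested, limit), never below zero."""
    return max(0.0, allocatable - max(requested, limit))


# CPU rows copied from the report: (requested, limit, allocatable)
cpu_rows = {
    "beholder02": (1.9, 2.1, 24.0),
    "kiaransalee": (90.3, 140.1, 192.0),
    "vecna": (16.4, 100.1, 96.0),
    "zariel": (96.3, 96.1, 256.0),
}

for node, (req, lim, alloc) in cpu_rows.items():
    # One decimal place, matching the precision used in the report
    print(f"{node}: free={free_capacity(req, lim, alloc):.1f}")
```

With these inputs the sketch reproduces the Free values reported for the four nodes (21.9, 51.9, 0.0 and 159.7).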
Resource Requested Limit Allocatable Free
auth
├─ mindflayer01
│ ├─ cpu 350.0m 0.0
│ └─ pods 2.0 2.0
├─ mindflayer02
│ ├─ cpu 300.0m 0.0
│ └─ pods 3.0 3.0
└─ mindflayer03
  ├─ cpu 250.0m 0.0
  └─ pods 1.0 1.0
frontend 4.0 4.0
├─ lolth 1.0 1.0
│ └─ pods 1.0 1.0
├─ mindflayer01 2.0 2.0
│ └─ pods 2.0 2.0
└─ tiamat 1.0 1.0
  └─ pods 1.0 1.0
gatekeeper-system
├─ lolth
│ ├─ cpu 100.0m 1.0
│ ├─ memory 256.0Mi 512.0Mi
│ └─ pods 1.0 1.0
├─ mindflayer03
│ ├─ cpu 100.0m 1.0
│ ├─ memory 256.0Mi 512.0Mi
│ └─ pods 1.0 1.0
└─ tiamat
  ├─ cpu 200.0m 2.0
  ├─ memory 512.0Mi 1.0Gi
  └─ pods 2.0 2.0
gcpr-vmv-2022
└─ beholder02
  ├─ cpu 1.0 2.0
  ├─ memory 512.0Mi 1.0Gi
  └─ pods 2.0 2.0
jupyterhub
├─ kiaransalee
│ ├─ memory 1.1G 0.0
│ ├─ nvidia.com/gpu 1.0 1.0
│ └─ pods 2.0 2.0
├─ lolth 2.0 2.0
│ └─ pods 2.0 2.0
└─ tiamat 2.0 2.0
  └─ pods 2.0 2.0
kube-system
├─ asmodeus
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 4.0 4.0
├─ beholder01
│ ├─ cpu 900.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 6.0 6.0
├─ beholder02
│ ├─ cpu 900.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 6.0 6.0
├─ beholder03
│ ├─ cpu 900.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 6.0 6.0
├─ belial
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 5.0 5.0
├─ demogorgon
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 4.0 4.0
├─ fierna
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 5.0 5.0
├─ kiaransalee
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 5.0 5.0
├─ lolth
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 4.0 4.0
├─ mindflayer01
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 4.0 4.0
├─ mindflayer02
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 4.0 4.0
├─ mindflayer03
│ ├─ cpu 450.0m 100.0m
│ ├─ memory 420.0Mi 270.0Mi
│ └─ pods 5.0 5.0
├─ tiamat
│ ├─ cpu 450.0m 100.0m
│ ├─ memory 420.0Mi 270.0Mi
│ └─ pods 6.0 6.0
├─ vecna
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 4.0 4.0
└─ zariel
  ├─ cpu 350.0m 100.0m
  ├─ memory 350.0Mi 100.0Mi
  └─ pods 4.0 4.0
local-path-storage 1.0 1.0
└─ lolth 1.0 1.0
  └─ pods 1.0 1.0
monitoring
└─ tiamat
  ├─ cpu 500.0m 1.0
  ├─ memory 500.0M 1.0Gi
  └─ pods 2.0 2.0
registry
└─ mindflayer01
  ├─ cpu 200.0m 0.0
  └─ pods 2.0 2.0
testing 3.0 3.0
├─ mindflayer03 1.0 1.0
│ └─ pods 1.0 1.0
└─ tiamat 2.0 2.0
  └─ pods 2.0 2.0
user-alex-chan
└─ mindflayer02
  ├─ cpu 10.0 16.0
  ├─ memory 64.0Gi 64.0Gi
  └─ pods 1.0 1.0
user-andri-rutschmann
└─ kiaransalee
  ├─ cpu 40.0 40.0
  ├─ memory 256.0Gi 256.0Gi
  ├─ nvidia.com/gpu 4.0 4.0
  └─ pods 2.0 2.0
user-felix-petersen
└─ vecna
  ├─ cpu 16.0 100.0
  ├─ memory 100.0Gi 1.6Ti
  ├─ nvidia.com/gpu 16.0 16.0
  └─ pods 1.0 1.0
user-julian-welzel
└─ zariel
  ├─ cpu 96.0 96.0
  ├─ memory 100.0Gi 400.0Gi
  ├─ nvidia.com/gpu 4.0 4.0
  └─ pods 1.0 1.0
user-segun-aroyehun
└─ kiaransalee
  ├─ cpu 50.0 100.0
  ├─ memory 250.0Gi 400.0Gi
  ├─ nvidia.com/gpu 1.0 1.0
  └─ pods 1.0 1.0
user-v-time
├─ mindflayer01
│ ├─ cpu 500.0m 1.0
│ ├─ memory 256.0Mi 512.0Mi
│ └─ pods 1.0 1.0
└─ mindflayer02
  ├─ cpu 1.0 2.0
  ├─ memory 512.0Mi 1.0Gi
  └─ pods 1.0 1.0
web
├─ mindflayer01 6.0 6.0
│ └─ pods 6.0 6.0
└─ mindflayer02
  ├─ cpu 250.0m 0.0
  └─ pods 1.0 1.0
cluster:
  id: 3fee6f38-ba9f-11ec-9328-e188936dcafd
  health: HEALTH_WARN
    1 daemons have recently crashed
services:
  mon: 5 daemons, quorum beholder03,beholder01,beholder02,mindflayer02,mindflayer03 (age 5d)
  mgr: beholder01.verxwn(active, since 5d), standbys: beholder02.akktmp, beholder03.nprqzk, mindflayer03.rzdvrr, mindflayer01.mkuopd, mindflayer02.ympgrs
  mds: 4/4 daemons up, 2 standby
  osd: 24 osds: 24 up (since 5d), 24 in (since 6M)
data:
  volumes: 1/1 healthy
  pools: 3 pools, 545 pgs
  objects: 64.06M objects, 101 TiB
  usage: 203 TiB used, 79 TiB / 282 TiB avail
  pgs: 544 active+clean
    1 active+clean+scrubbing+deep
io:
  client: 33 KiB/s wr, 0 op/s rd, 3 op/s wr
HEALTH_WARN 1 daemons have recently crashed
[WRN] RECENT_CRASH: 1 daemons have recently crashed
  osd.18 crashed on host mindflayer01 at 2025-01-04T10:25:35.528683Z
+------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 192.168.1.1:4252 | 7126e7a3a9cc42ca | 3.4.30 | 156 MB | true | false | 302051 | 532171904 | 532171904 | |
| 192.168.1.2:4252 | 39d72894bf6c7600 | 3.4.30 | 156 MB | false | false | 302051 | 532171904 | 532171904 | |
| 192.168.1.3:4252 | bbf4a2b99c3fd692 | 3.4.30 | 156 MB | false | false | 302051 | 532171904 | 532171904 | |
| 192.168.2.1:4252 | 5cb9997dd1c2246b | 3.3.25 | 156 MB | false | false | 302051 | 532171904 | 0 | |
| 192.168.2.3:4252 | cbc1cf89959ea4e | 3.3.25 | 156 MB | false | false | 302051 | 532171904 | 0 | |
+------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
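
The endpoint table above can be checked mechanically for members that run a different version from the majority or whose applied index trails the Raft index. A minimal sketch, assuming the table is available as text in the pipe-delimited layout shown; `parse_table` and `flag_members` are hypothetical helper names, and the sample holds three rows copied from the report. A zero applied index may simply mean that an older member does not report that field, so a flag is a prompt to look, not a diagnosis.

```python
from collections import Counter

SAMPLE = """
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
| 192.168.1.1:4252 | 7126e7a3a9cc42ca | 3.4.30 | 156 MB | true | false | 302051 | 532171904 | 532171904 | |
| 192.168.1.2:4252 | 39d72894bf6c7600 | 3.4.30 | 156 MB | false | false | 302051 | 532171904 | 532171904 | |
| 192.168.2.1:4252 | 5cb9997dd1c2246b | 3.3.25 | 156 MB | false | false | 302051 | 532171904 | 0 | |
"""


def parse_table(text: str) -> list[dict[str, str]]:
    """Turn a pipe-delimited ASCII table into one dict per data row."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip blank lines and +----+ border lines
        # Drop the outer pipes, then split the remaining text into cells
        rows.append([cell.strip() for cell in line[1:-1].split("|")])
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def flag_members(members: list[dict[str, str]]) -> dict[str, list[str]]:
    """Return endpoint -> reasons for members that differ from the majority."""
    majority = Counter(m["VERSION"] for m in members).most_common(1)[0][0]
    flagged = {}
    for m in members:
        reasons = []
        if m["VERSION"] != majority:
            reasons.append(f"version {m['VERSION']} differs from {majority}")
        if int(m["RAFT APPLIED INDEX"]) < int(m["RAFT INDEX"]):
            reasons.append("applied index behind raft index")
        if reasons:
            flagged[m["ENDPOINT"]] = reasons
    return flagged


flagged = flag_members(parse_table(SAMPLE))
for endpoint, reasons in flagged.items():
    print(endpoint, "->", "; ".join(reasons))
```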
+-------------------------+---------------------+---------------------+---------------------+----------------+
| CHECK                   | beholder01          | beholder02          | beholder03          | lolth          |
+-------------------------+---------------------+---------------------+---------------------+----------------+
| SSH port open           | yes                 | yes                 | yes                 | yes            |
| Report available        | yes                 | yes                 | yes                 | yes            |
| External interface up   | ok                  | ok                  | ok                  | ok             |
| Infiniband interface up | ok                  | ok                  | ok                  | ok             |
| API servers reachable   | 1 2 3 4             | 1 2 3 4             | 1 2 3 4             | 1 2 3 4        |
| Ceph monitors reachable | 1 2 3               | 1 2 3               | 1 2 3               | 1 2 3          |
| cephfs mounted          | /cephfs             | /cephfs             | /cephfs             | /cephfs        |
| local raid mounted      | /raid               | /raid               | /raid               | /raid          |
| Test pod responding     | dnsutils-beholder01 | dnsutils-beholder02 | dnsutils-beholder03 | dnsutils-lolth |
| Can reach kube-dns      | 10.96.0.10          | 10.96.0.10          | 10.96.0.10          | 10.96.0.10     |
| Pod can reach kube-dns  | yes                 | yes                 | yes                 | yes            |
| Pod can reach internet  | yes                 | yes                 | yes                 | yes            |
+-------------------------+---------------------+---------------------+---------------------+----------------+
Fri Jan 10 07:42:47 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01 Driver Version: 565.57.01 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Quadro RTX 6000 Off | 00000000:1B:00.0 Off | Off |
| 33% 32C P8 4W / 260W | 2MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Quadro RTX 6000 Off | 00000000:1C:00.0 Off | Off |
| 33% 31C P8 11W / 260W | 2MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Quadro RTX 6000 Off | 00000000:1D:00.0 Off | Off |
| 33% 31C P8 4W / 260W | 2MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Quadro RTX 6000 Off | 00000000:1E:00.0 Off | Off |
| 33% 32C P8 4W / 260W | 2MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 Quadro RTX 6000 Off | 00000000:3D:00.0 Off | Off |
| 33% 33C P8 14W / 260W | 2MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 Quadro RTX 6000 Off | 00000000:3F:00.0 Off | Off |
| 33% 32C P8 6W / 260W | 2MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 Quadro RTX 6000 Off | 00000000:40:00.0 Off | Off |
| 33% 32C P8 4W / 260W | 2MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 Quadro RTX 6000 Off | 00000000:41:00.0 Off | Off |
| 33% 33C P8 4W / 260W | 2MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Fri Jan 10 07:42:49 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01 Driver Version: 565.57.01 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Quadro RTX 6000 Off | 00000000:1B:00.0 Off | Off |
| 33% 31C P8 19W / 260W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Quadro RTX 6000 Off | 00000000:1C:00.0 Off | Off |
| 33% 31C P8 16W / 260W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Quadro RTX 6000 Off | 00000000:1D:00.0 Off | Off |
| 33% 32C P8 4W / 260W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Quadro RTX 6000 Off | 00000000:1E:00.0 Off | Off |
| 33% 31C P8 4W / 260W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 Quadro RTX 6000 Off | 00000000:3D:00.0 Off | Off |
| 33% 32C P8 15W / 260W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 Quadro RTX 6000 Off | 00000000:3F:00.0 Off | Off |
| 33% 32C P8 4W / 260W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 Quadro RTX 6000 Off | 00000000:40:00.0 Off | Off |
| 33% 32C P8 12W / 260W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 Quadro RTX 6000 Off | 00000000:41:00.0 Off | Off |
| 33% 33C P8 16W / 260W | 1MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Fri Jan 10 07:42:51 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01 Driver Version: 565.57.01 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-40GB On | 00000000:01:00.0 Off | 0 |
| N/A 30C P0 50W / 400W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A100-SXM4-40GB On | 00000000:41:00.0 Off | 0 |
| N/A 31C P0 53W / 400W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA A100-SXM4-40GB On | 00000000:81:00.0 Off | 0 |
| N/A 30C P0 50W / 400W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA A100-SXM4-40GB On | 00000000:C1:00.0 Off | 0 |
| N/A 29C P0 51W / 400W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Fri Jan 10 08:42:54 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01 Driver Version: 565.57.01 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla V100-SXM3-32GB On | 00000000:34:00.0 Off | 0 |
| N/A 49C P0 205W / 350W | 23938MiB / 32768MiB | 99% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla V100-SXM3-32GB On | 00000000:36:00.0 Off | 0 |
| N/A 50C P0 207W / 350W | 23962MiB / 32768MiB | 99% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Tesla V100-SXM3-32GB On | 00000000:39:00.0 Off | 0 |
| N/A 59C P0 223W / 350W | 23962MiB / 32768MiB | 99% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Tesla V100-SXM3-32GB On | 00000000:3B:00.0 Off | 0 |
| N/A 62C P0 230W / 350W | 23842MiB / 32768MiB | 99% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 Tesla V100-SXM3-32GB On | 00000000:57:00.0 Off | 0 |
| N/A 52C P0 230W / 350W | 23940MiB / 32768MiB | 99% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 Tesla V100-SXM3-32GB On | 00000000:59:00.0 Off | 0 |
| N/A 67C P0 240W / 350W | 23964MiB / 32768MiB | 99% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 Tesla V100-SXM3-32GB On | 00000000:5C:00.0 Off | 0 |
| N/A 53C P0 220W / 350W | 23964MiB / 32768MiB | 99% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 Tesla V100-SXM3-32GB On | 00000000:5E:00.0 Off | 0 |
| N/A 66C P0 223W / 350W | 23844MiB / 32768MiB | 99% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 8 Tesla V100-SXM3-32GB On | 00000000:B7:00.0 Off | 0 |
| N/A 35C P0 50W / 350W | 1MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 9 Tesla V100-SXM3-32GB On | 00000000:B9:00.0 Off | 0 |
| N/A 35C P0 49W / 350W | 1MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 10 Tesla V100-SXM3-32GB On | 00000000:BC:00.0 Off | 0 |
| N/A 38C P0 52W / 350W | 1MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 11 Tesla V100-SXM3-32GB On | 00000000:BE:00.0 Off | 0 |
| N/A 39C P0 49W / 350W | 1MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 12 Tesla V100-SXM3-32GB On | 00000000:E0:00.0 Off | 0 |
| N/A 54C P0 226W / 350W | 23940MiB / 32768MiB | 97% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 13 Tesla V100-SXM3-32GB On | 00000000:E2:00.0 Off | 0 |
| N/A 53C P0 231W / 350W | 23964MiB / 32768MiB | 97% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 14 Tesla V100-SXM3-32GB On | 00000000:E5:00.0 Off | 0 |
| N/A 70C P0 257W / 350W | 23964MiB / 32768MiB | 97% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 15 Tesla V100-SXM3-32GB On | 00000000:E7:00.0 Off | 0 |
| N/A 67C P0 246W / 350W | 23844MiB / 32768MiB | 97% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 81514 C python 23932MiB |
| 1 N/A N/A 81562 C python 23956MiB |
| 2 N/A N/A 82038 C python 23956MiB |
| 3 N/A N/A 82505 C python 23836MiB |
| 4 N/A N/A 82834 C python 23932MiB |
| 5 N/A N/A 83008 C python 23956MiB |
| 6 N/A N/A 83123 C python 23956MiB |
| 7 N/A N/A 83657 C python 23836MiB |
| 12 N/A N/A 80878 C python 23932MiB |
| 13 N/A N/A 80655 C python 23956MiB |
| 14 N/A N/A 81097 C python 23956MiB |
| 15 N/A N/A 81249 C python 23836MiB |
+-----------------------------------------------------------------------------------------+
Fri Jan 10 07:42:56 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01 Driver Version: 565.57.01 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:01:00.0 Off | 0 |
| N/A 31C P0 62W / 500W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A100-SXM4-80GB On | 00000000:41:00.0 Off | 0 |
| N/A 33C P0 60W / 500W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA A100-SXM4-80GB On | 00000000:81:00.0 Off | 0 |
| N/A 31C P0 60W / 500W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA A100-SXM4-80GB On | 00000000:C1:00.0 Off | 0 |
| N/A 30C P0 61W / 500W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Fri Jan 10 08:42:58 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.03 Driver Version: 535.216.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-40GB On | 00000000:07:00.0 Off | 0 |
| N/A 34C P0 56W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM4-40GB On | 00000000:0F:00.0 Off | 0 |
| N/A 33C P0 53W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM4-40GB On | 00000000:47:00.0 Off | 0 |
| N/A 51C P0 218W / 400W | 19375MiB / 40960MiB | 99% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM4-40GB On | 00000000:4E:00.0 Off | 0 |
| N/A 54C P0 246W / 400W | 16647MiB / 40960MiB | 99% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA A100-SXM4-40GB On | 00000000:87:00.0 Off | 0 |
| N/A 58C P0 247W / 400W | 19375MiB / 40960MiB | 98% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA A100-SXM4-40GB On | 00000000:90:00.0 Off | 0 |
| N/A 57C P0 244W / 400W | 16647MiB / 40960MiB | 98% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA A100-SXM4-40GB On | 00000000:B7:00.0 Off | 0 |
| N/A 43C P0 53W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA A100-SXM4-40GB On | 00000000:BD:00.0 Off | 0 |
| N/A 43C P0 59W / 400W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 2 N/A N/A 1647322 C python 19366MiB |
| 3 N/A N/A 1709434 C python 16638MiB |
| 4 N/A N/A 1329388 C python 19366MiB |
| 5 N/A N/A 1660692 C python 16638MiB |
+---------------------------------------------------------------------------------------+
Fri Jan 10 07:43:02 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01 Driver Version: 565.57.01 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A40 Off | 00000000:01:00.0 Off | 0 |
| 0% 30C P8 15W / 300W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A40 Off | 00000000:25:00.0 Off | 0 |
| 0% 31C P8 24W / 300W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA A40 Off | 00000000:41:00.0 Off | 0 |
| 0% 32C P8 24W / 300W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA A40 Off | 00000000:61:00.0 Off | 0 |
| 0% 30C P8 15W / 300W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA A40 Off | 00000000:81:00.0 Off | 0 |
| 0% 30C P8 15W / 300W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA A40 Off | 00000000:A1:00.0 Off | 0 |
| 0% 30C P8 15W / 300W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA A40 Off | 00000000:C1:00.0 Off | 0 |
| 0% 33C P8 24W / 300W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA A40 Off | 00000000:E1:00.0 Off | 0 |
| 0% 33C P8 24W / 300W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Fri Jan 10 07:43:05 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08 Driver Version: 550.127.08 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 Off | 00000000:26:00.0 Off | 0 |
| N/A 39C P0 145W / 700W | 74602MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 Off | 00000000:2F:00.0 Off | 0 |
| N/A 48C P0 142W / 700W | 74602MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA H100 80GB HBM3 Off | 00000000:46:00.0 Off | 0 |
| N/A 47C P0 143W / 700W | 74602MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA H100 80GB HBM3 Off | 00000000:54:00.0 Off | 0 |
| N/A 41C P0 146W / 700W | 74602MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA H100 80GB HBM3 Off | 00000000:A6:00.0 Off | 0 |
| N/A 33C P0 79W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA H100 80GB HBM3 Off | 00000000:AF:00.0 Off | 0 |
| N/A 39C P0 83W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA H100 80GB HBM3 Off | 00000000:C6:00.0 Off | 0 |
| N/A 43C P0 137W / 700W | 53294MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA H100 80GB HBM3 Off | 00000000:CF:00.0 Off | On |
| N/A 33C P0 76W / 700W | 90MiB / 81559MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 7 7 0 0 | 13MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 7 8 0 1 | 13MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 7 9 0 2 | 13MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 7 11 0 3 | 13MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 7 12 0 4 | 13MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 7 13 0 5 | 13MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 7 14 0 6 | 13MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 3220045 C /opt/conda/bin/python3.11 74592MiB |
| 1 N/A N/A 3220043 C /opt/conda/bin/python3.11 74592MiB |
| 2 N/A N/A 3220044 C /opt/conda/bin/python3.11 74592MiB |
| 3 N/A N/A 3220047 C /opt/conda/bin/python3.11 74592MiB |
| 6 N/A N/A 3420301 C /opt/conda/bin/python 53284MiB |
+-----------------------------------------------------------------------------------------+