Cluster status report from beholder01 (Wed, 04 Feb 2026 02:32:01 +0000)
Resource usage (overall)
Resource usage (by namespace)
Ceph file system status
Etcd cluster status
Detailed network health report
NVIDIA driver and GPU reports: Imp, Dretch, Belial, Fierna, Tiamat, Vecna, Asmodeus, Zariel, Demogorgon
Resource usage (overall)
Resource Requested Limit Allocatable Free
cpu (61%) 963.2 (65%) 1.0k 1.6k 549.3
├─ asmodeus (52%) 132.3 (52%) 132.1 256.0 123.7
│ ├─ asmodeus-single-gpu-julian-schaefer-zimmermann 64.0 64.0
│ ├─ dnsutils-asmodeus 100.0m 100.0m
│ ├─ kube-router-5cjkf 250.0m 0.0
│ ├─ step16-coolchic-gpu-run-asmodeus-a100-hq 20.0 20.0
│ ├─ step18-ftic-gpu-run-asmodeus-a100-hq-40cpu 40.0 40.0
│ └─ ulas-bingoel-model-pod-5 8.0 8.0
├─ beholder01 (7%) 1.7 (0%) 100.0m 24.0 22.3
│ ├─ coredns-66bc5c9577-kjst9 100.0m 0.0
│ ├─ dex-69dddb47b7-pk5ks 250.0m 0.0
│ ├─ dex-loginapp-5f5974b54d-6dskk 100.0m 0.0
│ ├─ dex-mysql-589f4586bc-94f8p 100.0m 0.0
│ ├─ dnsutils-beholder01 100.0m 100.0m
│ ├─ kube-apiserver-beholder01 250.0m 0.0
│ ├─ kube-controller-manager-beholder01 200.0m 0.0
│ ├─ kube-router-4zkq2 250.0m 0.0
│ ├─ kube-scheduler-beholder01 100.0m 0.0
│ └─ mediawiki-mariadb-7ffb6c9b8d-g87lk 250.0m 0.0
├─ beholder02 (5%) 1.1 (5%) 1.1 24.0 22.9
│ ├─ dnsutils-beholder02 100.0m 100.0m
│ ├─ kube-apiserver-beholder02 250.0m 0.0
│ ├─ kube-controller-manager-beholder02 200.0m 0.0
│ ├─ kube-router-6mwfx 250.0m 0.0
│ ├─ kube-scheduler-beholder02 100.0m 0.0
│ └─ nginx-rsn-2024-57d49484d-47rfc 250.0m 1.0
├─ beholder03 (5%) 1.2 (2%) 400.0m 24.0 22.8
│ ├─ dnsutils-beholder03 100.0m 100.0m
│ ├─ kube-apiserver-beholder03 250.0m 0.0
│ ├─ kube-controller-manager-beholder03 200.0m 0.0
│ ├─ kube-router-jnkdl 250.0m 0.0
│ ├─ kube-scheduler-beholder03 100.0m 0.0
│ ├─ ldap-67b47cf9b9-v6vvm 250.0m 0.0
│ └─ nfd-master-6589cf6d4c-9xw6v 100.0m 300.0m
├─ belial (90%) 72.1 (91%) 73.1 80.0 6.9
│ ├─ cool-pod 2.0 2.0
│ ├─ coredns-66bc5c9577-kxlwp 100.0m 0.0
│ ├─ dnsutils-belial 100.0m 100.0m
│ ├─ gatekeeper-controller-manager-66f474f785-bq2vs 100.0m 1.0
│ ├─ kube-router-2hmkx 250.0m 0.0
│ ├─ nfd-gc-7b6c64c4b8-gc2k5 10.0m 20.0m
│ ├─ ollama-pod 1.0 1.0
│ ├─ prometheus-deployment-b65d5d898-q55j8 500.0m 1.0
│ ├─ pytorch-pod 1.0 1.0
│ ├─ till-aust-baseline-arrowhead-job-pv7w7 32.0 32.0
│ ├─ till-aust-baseline-car-job-x7tlg 32.0 32.0
│ ├─ till-aust-ubuntu-entry-pod 1.0 1.0
│ └─ valentin-schmuker-storage 2.0 2.0
├─ demogorgon (84%) 80.3 (116%) 111.1 96.0 0.0
│ ├─ a2v2-single-gpu-jcsz 8.0 8.0
│ ├─ demogorgon-a2v2 15.0 15.0
│ ├─ dnsutils-demogorgon 100.0m 100.0m
│ ├─ gpu-demogorgon 48.0 48.0
│ ├─ kube-router-v72jn 250.0m 0.0
│ ├─ pycharm 1.0 32.0
│ └─ ulas-bingoel-model-pod-3 8.0 8.0
├─ fierna (82%) 65.5 (84%) 67.1 80.0 12.9
│ ├─ dnsutils-fierna 100.0m 100.0m
│ ├─ gatekeeper-controller-manager-66f474f785-dvhjf 100.0m 1.0
│ ├─ kube-router-6p89p 250.0m 0.0
│ ├─ ledavio-text-search-5d8f755795-cndsv 1.0 2.0
│ ├─ till-aust-baseline-adiac-job-66j56 32.0 32.0
│ └─ till-aust-baseline-beef-job-ts4jw 32.0 32.0
├─ kiaransalee (26%) 49.4 (39%) 74.1 192.0 117.9
│ ├─ bash-pod 12.0 12.0
│ ├─ dnsutils-kiaransalee 100.0m 100.0m
│ ├─ kube-router-s5qjv 250.0m 0.0
│ ├─ tgillm-pod-7cd979b759-wgxjg 12.0 12.0
│ └─ ubuntu-gpu1 25.0 50.0
├─ mindflayer01 (3%) 1.9 (6%) 4.1 64.0 59.9
│ ├─ dnsutils-mindflayer01 100.0m 100.0m
│ ├─ gatekeeper-controller-manager-66f474f785-s8b5w 100.0m 1.0
│ ├─ kube-router-w8xhq 250.0m 0.0
│ ├─ mariadb-849498cb44-kjj5q 500.0m 1.0
│ └─ phpfpm-nginx-5db96d6895-vwwf5 1.0 2.0
├─ mindflayer02 (1%) 900.0m (3%) 2.1 64.0 61.9
│ ├─ dnsutils-mindflayer02 100.0m 100.0m
│ ├─ gatekeeper-audit-59d4b6fd4c-gtwjm 100.0m 1.0
│ ├─ kube-router-sv7f6 250.0m 0.0
│ ├─ mediawiki-77f9c84df5-p6k9g 250.0m 0.0
│ ├─ registry-auth-bd4bb7d8b-cdjnd 100.0m 0.0
│ └─ ubuntu-test-pod 100.0m 1.0
├─ mindflayer03 (1%) 650.0m (0%) 100.0m 64.0 63.4
│ ├─ dnsutils-mindflayer03 100.0m 100.0m
│ ├─ kube-router-rzv5h 250.0m 0.0
│ ├─ registry-7495ddbf59-vvf2x 100.0m 0.0
│ └─ registry-browser-7f4cbdf96b-5x2mv 200.0m 0.0
├─ tiamat (100%) 255.3 (100%) 255.1 256.0 650.0m
│ ├─ a2v-pt-ft-hyena-large-julian-zimmermann 255.0 255.0
│ ├─ dnsutils-tiamat 100.0m 100.0m
│ └─ kube-router-g92j5 250.0m 0.0
├─ vecna (52%) 50.4 (58%) 56.1 96.0 39.9
│ ├─ dnsutils-vecna 100.0m 100.0m
│ ├─ kube-router-4gqqj 250.0m 0.0
│ ├─ train-maap-qln5k 10.0 16.0
│ ├─ ulas-bingoel-model-pod-4 8.0 8.0
│ ├─ valentin-schmuker-mvp-seg-full 16.0 16.0
│ └─ valentin-schmuker-mvp-seg-image 16.0 16.0
└─ zariel (98%) 250.3 (98%) 250.1 256.0 5.7
├─ dnsutils-zariel 100.0m 100.0m
├─ kube-router-fmt6c 250.0m 0.0
└─ zariel-a2v2-pt-ft-jcsz 250.0 250.0
ephemeral-storage (1%) 55.0Gi (1%) 95.0Gi 11.3T 11.2T
├─ asmodeus (0%) 0.0 (0%) 0.0 94.6G 94.6G
├─ beholder01 (0%) 0.0 (0%) 0.0 1.7T 1.7T
├─ beholder02 (0%) 0.0 (0%) 0.0 1.7T 1.7T
├─ beholder03 (0%) 0.0 (0%) 0.0 1.7T 1.7T
├─ belial (0%) 0.0 (0%) 0.0 189.2G 189.2G
├─ demogorgon (2%) 15.0Gi (5%) 30.0Gi 706.7G 674.5G
│ └─ pycharm 15.0Gi 30.0Gi
├─ fierna (0%) 0.0 (0%) 0.0 189.2G 189.2G
├─ kiaransalee (1%) 20.0Gi (3%) 40.0Gi 1.7T 1.7T
│ └─ ubuntu-gpu1 20.0Gi 40.0Gi
├─ mindflayer01 (0%) 0.0 (0%) 0.0 211.5G 211.5G
├─ mindflayer02 (0%) 0.0 (0%) 0.0 211.5G 211.5G
├─ mindflayer03 (0%) 0.0 (0%) 0.0 211.5G 211.5G
├─ tiamat (0%) 0.0 (0%) 0.0 164.4G 164.4G
├─ vecna (3%) 20.0Gi (3%) 25.0Gi 849.0G 822.1G
│ └─ train-maap-qln5k 20.0Gi 25.0Gi
└─ zariel (0%) 0.0 (0%) 0.0 1.7T 1.7T
memory (41%) 5.7T (43%) 5.4Ti 12.7Ti 7.3Ti
├─ asmodeus (38%) 752.3Gi (38%) 752.1Gi 2.0Ti 1.2Ti
│ ├─ asmodeus-single-gpu-julian-schaefer-zimmermann 512.0Gi 512.0Gi
│ ├─ dnsutils-asmodeus 100.0Mi 100.0Mi
│ ├─ kube-router-5cjkf 250.0Mi 0.0
│ ├─ step16-coolchic-gpu-run-asmodeus-a100-hq 100.0Gi 100.0Gi
│ ├─ step18-ftic-gpu-run-asmodeus-a100-hq-40cpu 100.0Gi 100.0Gi
│ └─ ulas-bingoel-model-pod-5 40.0Gi 40.0Gi
├─ beholder01 (0%) 420.0Mi (0%) 270.0Mi 92.9Gi 92.5Gi
│ ├─ coredns-66bc5c9577-kjst9 70.0Mi 170.0Mi
│ ├─ dnsutils-beholder01 100.0Mi 100.0Mi
│ └─ kube-router-4zkq2 250.0Mi 0.0
├─ beholder02 (0%) 350.0Mi (0%) 100.0Mi 92.9Gi 92.5Gi
│ ├─ dnsutils-beholder02 100.0Mi 100.0Mi
│ └─ kube-router-6mwfx 250.0Mi 0.0
├─ beholder03 (1%) 478.0Mi (4%) 4.1Gi 92.9Gi 88.8Gi
│ ├─ dnsutils-beholder03 100.0Mi 100.0Mi
│ ├─ kube-router-jnkdl 250.0Mi 0.0
│ └─ nfd-master-6589cf6d4c-9xw6v 128.0Mi 4.0Gi
├─ belial (38%) 310.6G (44%) 330.8Gi 754.4Gi 423.7Gi
│ ├─ cool-pod 32.0Gi 32.0Gi
│ ├─ coredns-66bc5c9577-kxlwp 70.0Mi 170.0Mi
│ ├─ dnsutils-belial 100.0Mi 100.0Mi
│ ├─ gatekeeper-controller-manager-66f474f785-bq2vs 256.0Mi 512.0Mi
│ ├─ kube-router-2hmkx 250.0Mi 0.0
│ ├─ nfd-gc-7b6c64c4b8-gc2k5 128.0Mi 1.0Gi
│ ├─ ollama-pod 32.0Gi 32.0Gi
│ ├─ prometheus-deployment-b65d5d898-q55j8 500.0M 1.0Gi
│ ├─ pytorch-pod 12.0Gi 12.0Gi
│ ├─ till-aust-baseline-arrowhead-job-pv7w7 100.0Gi 120.0Gi
│ ├─ till-aust-baseline-car-job-x7tlg 100.0Gi 120.0Gi
│ ├─ till-aust-ubuntu-entry-pod 10.0Gi 10.0Gi
│ └─ valentin-schmuker-storage 2.0Gi 2.0Gi
├─ demogorgon (38%) 771.3Gi (40%) 793.1Gi 2.0Ti 1.2Ti
│ ├─ a2v2-single-gpu-jcsz 256.0Gi 256.0Gi
│ ├─ demogorgon-a2v2 395.0Gi 395.0Gi
│ ├─ dnsutils-demogorgon 100.0Mi 100.0Mi
│ ├─ gpu-demogorgon 80.0Gi 80.0Gi
│ ├─ kube-router-v72jn 250.0Mi 0.0
│ ├─ pycharm 10.0Gi 32.0Gi
│ └─ ulas-bingoel-model-pod-3 30.0Gi 30.0Gi
├─ fierna (31%) 232.6Gi (36%) 272.6Gi 754.4Gi 481.8Gi
│ ├─ dnsutils-fierna 100.0Mi 100.0Mi
│ ├─ gatekeeper-controller-manager-66f474f785-dvhjf 256.0Mi 512.0Mi
│ ├─ kube-router-6p89p 250.0Mi 0.0
│ ├─ ledavio-text-search-5d8f755795-cndsv 32.0Gi 32.0Gi
│ ├─ till-aust-baseline-adiac-job-66j56 100.0Gi 120.0Gi
│ └─ till-aust-baseline-beef-job-ts4jw 100.0Gi 120.0Gi
├─ kiaransalee (13%) 214.0G (16%) 246.1Gi 1.5Ti 1.2Ti
│ ├─ bash-pod 16.0Gi 16.0Gi
│ ├─ dnsutils-kiaransalee 100.0Mi 100.0Mi
│ ├─ jupyter-leo-2dtiedemann 1.1G 0.0
│ ├─ jupyter-matheus-2dstefanini-2dmariano 1.1G 0.0
│ ├─ jupyter-saroyehun 1.1G 0.0
│ ├─ kube-router-s5qjv 250.0Mi 0.0
│ ├─ tgillm-pod-7cd979b759-wgxjg 80.0Gi 80.0Gi
│ └─ ubuntu-gpu1 100.0Gi 150.0Gi
├─ mindflayer01 (0%) 1.3Gi (1%) 2.1Gi 376.5Gi 374.4Gi
│ ├─ dnsutils-mindflayer01 100.0Mi 100.0Mi
│ ├─ gatekeeper-controller-manager-66f474f785-s8b5w 256.0Mi 512.0Mi
│ ├─ kube-router-w8xhq 250.0Mi 0.0
│ ├─ mariadb-849498cb44-kjj5q 256.0Mi 512.0Mi
│ └─ phpfpm-nginx-5db96d6895-vwwf5 512.0Mi 1.0Gi
├─ mindflayer02 (0%) 706.0Mi (0%) 1.6Gi 376.5Gi 374.9Gi
│ ├─ dnsutils-mindflayer02 100.0Mi 100.0Mi
│ ├─ gatekeeper-audit-59d4b6fd4c-gtwjm 256.0Mi 512.0Mi
│ ├─ kube-router-sv7f6 250.0Mi 0.0
│ └─ ubuntu-test-pod 100.0Mi 1.0Gi
├─ mindflayer03 (0%) 350.0Mi (0%) 100.0Mi 376.5Gi 376.1Gi
│ ├─ dnsutils-mindflayer03 100.0Mi 100.0Mi
│ └─ kube-router-rzv5h 250.0Mi 0.0
├─ tiamat (89%) 896.3Gi (89%) 896.1Gi 1007.6Gi 111.2Gi
│ ├─ a2v-pt-ft-hyena-large-julian-zimmermann 896.0Gi 896.0Gi
│ ├─ dnsutils-tiamat 100.0Mi 100.0Mi
│ └─ kube-router-g92j5 250.0Mi 0.0
├─ vecna (15%) 222.3Gi (21%) 318.1Gi 1.5Ti 1.2Ti
│ ├─ dnsutils-vecna 100.0Mi 100.0Mi
│ ├─ kube-router-4gqqj 250.0Mi 0.0
│ ├─ train-maap-qln5k 64.0Gi 160.0Gi
│ ├─ ulas-bingoel-model-pod-4 30.0Gi 30.0Gi
│ ├─ valentin-schmuker-mvp-seg-full 64.0Gi 64.0Gi
│ └─ valentin-schmuker-mvp-seg-image 64.0Gi 64.0Gi
└─ zariel (95%) 1.9Ti (95%) 1.9Ti 2.0Ti 95.2Gi
├─ dnsutils-zariel 100.0Mi 100.0Mi
├─ kube-router-fmt6c 250.0Mi 0.0
└─ zariel-a2v2-pt-ft-jcsz 1.9Ti 1.9Ti
nvidia.com/gpu (74%) 46.0 (74%) 46.0 62.0 16.0
├─ asmodeus (100%) 4.0 (100%) 4.0 4.0 0.0
│ ├─ asmodeus-single-gpu-julian-schaefer-zimmermann 1.0 1.0
│ ├─ step16-coolchic-gpu-run-asmodeus-a100-hq 1.0 1.0
│ ├─ step18-ftic-gpu-run-asmodeus-a100-hq-40cpu 1.0 1.0
│ └─ ulas-bingoel-model-pod-5 1.0 1.0
├─ belial (62%) 5.0 (62%) 5.0 8.0 3.0
│ ├─ ollama-pod 1.0 1.0
│ ├─ pytorch-pod 2.0 2.0
│ ├─ till-aust-baseline-arrowhead-job-pv7w7 1.0 1.0
│ └─ till-aust-baseline-car-job-x7tlg 1.0 1.0
├─ demogorgon (100%) 8.0 (100%) 8.0 8.0 0.0
│ ├─ a2v2-single-gpu-jcsz 1.0 1.0
│ ├─ demogorgon-a2v2 4.0 4.0
│ ├─ gpu-demogorgon 1.0 1.0
│ ├─ pycharm 1.0 1.0
│ └─ ulas-bingoel-model-pod-3 1.0 1.0
├─ fierna (38%) 3.0 (38%) 3.0 8.0 5.0
│ ├─ ledavio-text-search-5d8f755795-cndsv 1.0 1.0
│ ├─ till-aust-baseline-adiac-job-66j56 1.0 1.0
│ └─ till-aust-baseline-beef-job-ts4jw 1.0 1.0
├─ kiaransalee (67%) 4.0 (67%) 4.0 6.0 2.0
│ ├─ jupyter-leo-2dtiedemann 1.0 1.0
│ ├─ jupyter-saroyehun 1.0 1.0
│ ├─ tgillm-pod-7cd979b759-wgxjg 1.0 1.0
│ └─ ubuntu-gpu1 1.0 1.0
├─ tiamat (100%) 4.0 (100%) 4.0 4.0 0.0
│ └─ a2v-pt-ft-hyena-large-julian-zimmermann 4.0 4.0
├─ vecna (62%) 10.0 (62%) 10.0 16.0 6.0
│ ├─ train-maap-qln5k 1.0 1.0
│ ├─ ulas-bingoel-model-pod-4 1.0 1.0
│ ├─ valentin-schmuker-mvp-seg-full 4.0 4.0
│ └─ valentin-schmuker-mvp-seg-image 4.0 4.0
└─ zariel (100%) 8.0 (100%) 8.0 8.0 0.0
└─ zariel-a2v2-pt-ft-jcsz 8.0 8.0
nvidia.com/mig-1g.10gb (0%) 0.0 (0%) 0.0 7.0 7.0
└─ kiaransalee (0%) 0.0 (0%) 0.0 7.0 7.0
nvidia.com/mig-3g.40gb (0%) 0.0 (0%) 0.0 1.0 1.0
└─ kiaransalee (0%) 0.0 (0%) 0.0 1.0 1.0
nvidia.com/mig-4g.40gb (100%) 1.0 (100%) 1.0 1.0 0.0
└─ kiaransalee (100%) 1.0 (100%) 1.0 1.0 0.0
└─ jupyter-matheus-2dstefanini-2dmariano 1.0 1.0
pods (10%) 149.0 (10%) 149.0 1.5k 1.4k
├─ asmodeus (8%) 9.0 (8%) 9.0 110.0 101.0
│ ├─ asmodeus-single-gpu-julian-schaefer-zimmermann 1.0 1.0
│ ├─ dnsutils-asmodeus 1.0 1.0
│ ├─ gpu-feature-discovery-9wxd8 1.0 1.0
│ ├─ kube-proxy-kxtvf 1.0 1.0
│ ├─ kube-router-5cjkf 1.0 1.0
│ ├─ nvidia-device-plugin-daemonset-pgtkn 1.0 1.0
│ ├─ step16-coolchic-gpu-run-asmodeus-a100-hq 1.0 1.0
│ ├─ step18-ftic-gpu-run-asmodeus-a100-hq-40cpu 1.0 1.0
│ └─ ulas-bingoel-model-pod-5 1.0 1.0
├─ beholder01 (12%) 13.0 (12%) 13.0 110.0 97.0
│ ├─ coredns-66bc5c9577-kjst9 1.0 1.0
│ ├─ dex-69dddb47b7-pk5ks 1.0 1.0
│ ├─ dex-loginapp-5f5974b54d-6dskk 1.0 1.0
│ ├─ dex-mysql-589f4586bc-94f8p 1.0 1.0
│ ├─ dnsutils-beholder01 1.0 1.0
│ ├─ kube-apiserver-beholder01 1.0 1.0
│ ├─ kube-controller-manager-beholder01 1.0 1.0
│ ├─ kube-proxy-6tptz 1.0 1.0
│ ├─ kube-router-4zkq2 1.0 1.0
│ ├─ kube-scheduler-beholder01 1.0 1.0
│ ├─ mediawiki-mariadb-7ffb6c9b8d-g87lk 1.0 1.0
│ ├─ nginx-k8s-5889449f8b-6p8j7 1.0 1.0
│ └─ pdf-55ccd6f459-mnq72 1.0 1.0
├─ beholder02 (8%) 9.0 (8%) 9.0 110.0 101.0
│ ├─ dnsutils-beholder02 1.0 1.0
│ ├─ hub-848f5d5578-47s2j 1.0 1.0
│ ├─ kube-apiserver-beholder02 1.0 1.0
│ ├─ kube-controller-manager-beholder02 1.0 1.0
│ ├─ kube-proxy-cmzf2 1.0 1.0
│ ├─ kube-router-6mwfx 1.0 1.0
│ ├─ kube-scheduler-beholder02 1.0 1.0
│ ├─ nginx-ip-2025-7fd66b99dd-2khh6 1.0 1.0
│ └─ nginx-rsn-2024-57d49484d-47rfc 1.0 1.0
├─ beholder03 (8%) 9.0 (8%) 9.0 110.0 101.0
│ ├─ dnsutils-beholder03 1.0 1.0
│ ├─ kube-apiserver-beholder03 1.0 1.0
│ ├─ kube-controller-manager-beholder03 1.0 1.0
│ ├─ kube-proxy-cftmz 1.0 1.0
│ ├─ kube-router-jnkdl 1.0 1.0
│ ├─ kube-scheduler-beholder03 1.0 1.0
│ ├─ ldap-67b47cf9b9-v6vvm 1.0 1.0
│ ├─ memcached-6b68cdd947-w4k2q 1.0 1.0
│ └─ nfd-master-6589cf6d4c-9xw6v 1.0 1.0
├─ belial (16%) 18.0 (16%) 18.0 110.0 92.0
│ ├─ cool-pod 1.0 1.0
│ ├─ coredns-66bc5c9577-kxlwp 1.0 1.0
│ ├─ dnsutils-belial 1.0 1.0
│ ├─ gatekeeper-controller-manager-66f474f785-bq2vs 1.0 1.0
│ ├─ gpu-feature-discovery-x4hnt 1.0 1.0
│ ├─ hub-78d6dd898d-7f6kx 1.0 1.0
│ ├─ kube-proxy-xxgbc 1.0 1.0
│ ├─ kube-router-2hmkx 1.0 1.0
│ ├─ nfd-gc-7b6c64c4b8-gc2k5 1.0 1.0
│ ├─ nvidia-device-plugin-daemonset-9nzsv 1.0 1.0
│ ├─ ollama-pod 1.0 1.0
│ ├─ prometheus-deployment-b65d5d898-q55j8 1.0 1.0
│ ├─ pytorch-pod 1.0 1.0
│ ├─ till-aust-baseline-arrowhead-job-pv7w7 1.0 1.0
│ ├─ till-aust-baseline-car-job-x7tlg 1.0 1.0
│ ├─ till-aust-ubuntu-entry-pod 1.0 1.0
│ ├─ user-scheduler-5cf5ffbc54-tjld9 1.0 1.0
│ └─ valentin-schmuker-storage 1.0 1.0
├─ demogorgon (9%) 10.0 (9%) 10.0 110.0 100.0
│ ├─ a2v2-single-gpu-jcsz 1.0 1.0
│ ├─ demogorgon-a2v2 1.0 1.0
│ ├─ dnsutils-demogorgon 1.0 1.0
│ ├─ gpu-demogorgon 1.0 1.0
│ ├─ gpu-feature-discovery-2d8cr 1.0 1.0
│ ├─ kube-proxy-xdkh7 1.0 1.0
│ ├─ kube-router-v72jn 1.0 1.0
│ ├─ nvidia-device-plugin-daemonset-2b2z8 1.0 1.0
│ ├─ pycharm 1.0 1.0
│ └─ ulas-bingoel-model-pod-3 1.0 1.0
├─ fierna (12%) 13.0 (12%) 13.0 110.0 97.0
│ ├─ dnsutils-fierna 1.0 1.0
│ ├─ gatekeeper-controller-manager-66f474f785-dvhjf 1.0 1.0
│ ├─ gpu-feature-discovery-k27g5 1.0 1.0
│ ├─ kube-proxy-pgw9t 1.0 1.0
│ ├─ kube-router-6p89p 1.0 1.0
│ ├─ ledavio-text-search-5d8f755795-cndsv 1.0 1.0
│ ├─ nvidia-device-plugin-daemonset-rqv7h 1.0 1.0
│ ├─ proxy-5495d795d5-jjk7j 1.0 1.0
│ ├─ proxy-7f79cc645f-52qjx 1.0 1.0
│ ├─ till-aust-baseline-adiac-job-66j56 1.0 1.0
│ ├─ till-aust-baseline-beef-job-ts4jw 1.0 1.0
│ ├─ user-scheduler-5cf5ffbc54-wrnrk 1.0 1.0
│ └─ user-scheduler-c7db6c584-6vbss 1.0 1.0
├─ kiaransalee (12%) 13.0 (12%) 13.0 110.0 97.0
│ ├─ bash-pod 1.0 1.0
│ ├─ continuous-image-puller-6fs4k 1.0 1.0
│ ├─ continuous-image-puller-6z8bj 1.0 1.0
│ ├─ dnsutils-kiaransalee 1.0 1.0
│ ├─ gpu-feature-discovery-w8j9g 1.0 1.0
│ ├─ jupyter-leo-2dtiedemann 1.0 1.0
│ ├─ jupyter-matheus-2dstefanini-2dmariano 1.0 1.0
│ ├─ jupyter-saroyehun 1.0 1.0
│ ├─ kube-proxy-l8zn4 1.0 1.0
│ ├─ kube-router-s5qjv 1.0 1.0
│ ├─ nvidia-device-plugin-daemonset-8jskd 1.0 1.0
│ ├─ tgillm-pod-7cd979b759-wgxjg 1.0 1.0
│ └─ ubuntu-gpu1 1.0 1.0
├─ mindflayer01 (9%) 10.0 (9%) 10.0 110.0 100.0
│ ├─ dnsutils-mindflayer01 1.0 1.0
│ ├─ gatekeeper-controller-manager-66f474f785-s8b5w 1.0 1.0
│ ├─ kube-proxy-9xsph 1.0 1.0
│ ├─ kube-router-w8xhq 1.0 1.0
│ ├─ mariadb-849498cb44-kjj5q 1.0 1.0
│ ├─ nginx-rec-2023-79df8c77cd-w9xwl 1.0 1.0
│ ├─ nvidia-device-plugin-daemonset-qv82v 1.0 1.0
│ ├─ phpfpm-nginx-5db96d6895-vwwf5 1.0 1.0
│ ├─ proxy-5bc89cc587-vlm9x 1.0 1.0
│ └─ whoami-74dc54d675-swm2n 1.0 1.0
├─ mindflayer02 (14%) 15.0 (14%) 15.0 110.0 95.0
│ ├─ cert-manager-79559475b4-7kv54 1.0 1.0
│ ├─ cert-manager-cainjector-966fc8fbc-zql8j 1.0 1.0
│ ├─ cert-manager-webhook-854cf5f458-wwf4d 1.0 1.0
│ ├─ dnsutils-mindflayer02 1.0 1.0
│ ├─ gatekeeper-audit-59d4b6fd4c-gtwjm 1.0 1.0
│ ├─ kube-proxy-dfgxf 1.0 1.0
│ ├─ kube-router-sv7f6 1.0 1.0
│ ├─ kube-state-metrics-8945855d-dqg79 1.0 1.0
│ ├─ local-path-provisioner-759479454f-7pqw8 1.0 1.0
│ ├─ mediawiki-77f9c84df5-p6k9g 1.0 1.0
│ ├─ nginx-ip-2023-78b9c84dbf-x8qs7 1.0 1.0
│ ├─ nvidia-device-plugin-daemonset-h7f9j 1.0 1.0
│ ├─ registry-auth-bd4bb7d8b-cdjnd 1.0 1.0
│ ├─ ubuntu-test-pod 1.0 1.0
│ └─ user-scheduler-c7db6c584-2pxhd 1.0 1.0
├─ mindflayer03 (7%) 8.0 (7%) 8.0 110.0 102.0
│ ├─ dnsutils-mindflayer03 1.0 1.0
│ ├─ kube-proxy-nph7j 1.0 1.0
│ ├─ kube-router-rzv5h 1.0 1.0
│ ├─ nginx-self-service-password-54767ddc56-wncd2 1.0 1.0
│ ├─ nvidia-device-plugin-daemonset-5h7w5 1.0 1.0
│ ├─ registry-7495ddbf59-vvf2x 1.0 1.0
│ ├─ registry-browser-7f4cbdf96b-5x2mv 1.0 1.0
│ └─ traefik-deployment-d8ccbfdd4-7nxdn 1.0 1.0
├─ tiamat (6%) 7.0 (6%) 7.0 110.0 103.0
│ ├─ a2v-pt-ft-hyena-large-julian-zimmermann 1.0 1.0
│ ├─ continuous-image-puller-wtkhd 1.0 1.0
│ ├─ dnsutils-tiamat 1.0 1.0
│ ├─ gpu-feature-discovery-vc97v 1.0 1.0
│ ├─ kube-proxy-n8m88 1.0 1.0
│ ├─ kube-router-g92j5 1.0 1.0
│ └─ nvidia-device-plugin-daemonset-pqfxg 1.0 1.0
├─ vecna (8%) 9.0 (8%) 9.0 110.0 101.0
│ ├─ dnsutils-vecna 1.0 1.0
│ ├─ gpu-feature-discovery-8mn9m 1.0 1.0
│ ├─ kube-proxy-6r98t 1.0 1.0
│ ├─ kube-router-4gqqj 1.0 1.0
│ ├─ nvidia-device-plugin-daemonset-hv8rs 1.0 1.0
│ ├─ train-maap-qln5k 1.0 1.0
│ ├─ ulas-bingoel-model-pod-4 1.0 1.0
│ ├─ valentin-schmuker-mvp-seg-full 1.0 1.0
│ └─ valentin-schmuker-mvp-seg-image 1.0 1.0
└─ zariel (5%) 6.0 (5%) 6.0 110.0 104.0
├─ dnsutils-zariel 1.0 1.0
├─ gpu-feature-discovery-tpxz6 1.0 1.0
├─ kube-proxy-gsqm7 1.0 1.0
├─ kube-router-fmt6c 1.0 1.0
├─ nvidia-device-plugin-daemonset-4q45h 1.0 1.0
└─ zariel-a2v2-pt-ft-jcsz 1.0 1.0
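The Free column in the table above appears to follow one rule: allocatable minus the larger of Requested and Limit, floored at zero. This is inferred from the numbers, not from the reporting tool's documentation: belial CPU gives 80.0 - 73.1 = 6.9 (limits dominate), asmodeus gives 256.0 - 132.3 = 123.7 (requests dominate), and demogorgon shows 0.0 because its 111.1-core limit exceeds its 96 allocatable cores. The cluster row uses totals summed from the per-node rows (1576 allocatable cores, about 1026.7 cores of limits), because the top `cpu` line rounds them to 1.6k and 1.0k. A minimal sketch of the rule:

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One line of the overall resource table (CPU values in cores)."""
    name: str
    requested: float
    limit: float
    allocatable: float


def free(row: Row) -> float:
    # Inferred rule: subtract the larger of requested and limit, never below zero.
    return max(0.0, row.allocatable - max(row.requested, row.limit))


rows = [
    Row("asmodeus", 132.3, 132.1, 256.0),    # requests dominate
    Row("belial", 72.1, 73.1, 80.0),         # limits dominate
    Row("demogorgon", 80.3, 111.1, 96.0),    # limits exceed capacity, clamped to 0
    Row("cluster", 963.2, 1026.7, 1576.0),   # totals summed from the per-node rows
]

for r in rows:
    print(f"{r.name:<12} free = {free(r):.1f}")
```

Under this rule, overcommitted limits (as on demogorgon) show up as zero free capacity even though requests alone would leave room.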
Resource usage (by namespace)
Resource Requested Limit Allocatable Free
auth
├─ beholder01
│ ├─ cpu 450.0m 0.0
│ └─ pods 3.0 3.0
├─ beholder03
│ ├─ cpu 250.0m 0.0
│ └─ pods 1.0 1.0
└─ mindflayer03 1.0 1.0
└─ pods 1.0 1.0
cert-manager 3.0 3.0
└─ mindflayer02 3.0 3.0
└─ pods 3.0 3.0
gatekeeper-system
├─ belial
│ ├─ cpu 100.0m 1.0
│ ├─ memory 256.0Mi 512.0Mi
│ └─ pods 1.0 1.0
├─ fierna
│ ├─ cpu 100.0m 1.0
│ ├─ memory 256.0Mi 512.0Mi
│ └─ pods 1.0 1.0
├─ mindflayer01
│ ├─ cpu 100.0m 1.0
│ ├─ memory 256.0Mi 512.0Mi
│ └─ pods 1.0 1.0
└─ mindflayer02
├─ cpu 100.0m 1.0
├─ memory 256.0Mi 512.0Mi
└─ pods 1.0 1.0
jupyterhub
├─ beholder02 1.0 1.0
│ └─ pods 1.0 1.0
├─ fierna 1.0 1.0
│ └─ pods 1.0 1.0
├─ kiaransalee
│ ├─ memory 3.2G 0.0
│ ├─ nvidia.com/gpu 2.0 2.0
│ ├─ nvidia.com/mig-4g.40gb 1.0 1.0
│ └─ pods 4.0 4.0
├─ mindflayer01 1.0 1.0
│ └─ pods 1.0 1.0
└─ mindflayer02 1.0 1.0
└─ pods 1.0 1.0
jupyterhub-kuckling 2.0 2.0
├─ fierna 1.0 1.0
│ └─ pods 1.0 1.0
└─ tiamat 1.0 1.0
└─ pods 1.0 1.0
jupyterhub-students 5.0 5.0
├─ belial 2.0 2.0
│ └─ pods 2.0 2.0
├─ fierna 2.0 2.0
│ └─ pods 2.0 2.0
└─ kiaransalee 1.0 1.0
└─ pods 1.0 1.0
kube-system
├─ asmodeus
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 5.0 5.0
├─ beholder01
│ ├─ cpu 1.0 100.0m
│ ├─ memory 420.0Mi 270.0Mi
│ └─ pods 7.0 7.0
├─ beholder02
│ ├─ cpu 900.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 6.0 6.0
├─ beholder03
│ ├─ cpu 900.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 6.0 6.0
├─ belial
│ ├─ cpu 450.0m 100.0m
│ ├─ memory 420.0Mi 270.0Mi
│ └─ pods 6.0 6.0
├─ demogorgon
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 5.0 5.0
├─ fierna
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 5.0 5.0
├─ kiaransalee
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 5.0 5.0
├─ mindflayer01
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 4.0 4.0
├─ mindflayer02
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 4.0 4.0
├─ mindflayer03
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 4.0 4.0
├─ tiamat
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 5.0 5.0
├─ vecna
│ ├─ cpu 350.0m 100.0m
│ ├─ memory 350.0Mi 100.0Mi
│ └─ pods 5.0 5.0
└─ zariel
├─ cpu 350.0m 100.0m
├─ memory 350.0Mi 100.0Mi
└─ pods 5.0 5.0
local-path-storage 1.0 1.0
└─ mindflayer02 1.0 1.0
└─ pods 1.0 1.0
monitoring
├─ belial
│ ├─ cpu 500.0m 1.0
│ ├─ memory 500.0M 1.0Gi
│ └─ pods 1.0 1.0
└─ mindflayer02 1.0 1.0
└─ pods 1.0 1.0
node-feature-discovery
├─ beholder03
│ ├─ cpu 100.0m 300.0m
│ ├─ memory 128.0Mi 4.0Gi
│ └─ pods 1.0 1.0
└─ belial
├─ cpu 10.0m 20.0m
├─ memory 128.0Mi 1.0Gi
└─ pods 1.0 1.0
registry
├─ mindflayer02
│ ├─ cpu 100.0m 0.0
│ └─ pods 1.0 1.0
└─ mindflayer03
├─ cpu 300.0m 0.0
└─ pods 2.0 2.0
traefik 2.0 2.0
├─ mindflayer01 1.0 1.0
│ └─ pods 1.0 1.0
└─ mindflayer03 1.0 1.0
└─ pods 1.0 1.0
user-alex-chan
├─ belial
│ ├─ cpu 2.0 2.0
│ ├─ memory 32.0Gi 32.0Gi
│ └─ pods 1.0 1.0
└─ vecna
├─ cpu 10.0 16.0
├─ ephemeral-storage 20.0Gi 25.0Gi
├─ memory 64.0Gi 160.0Gi
├─ nvidia.com/gpu 1.0 1.0
└─ pods 1.0 1.0
user-andri-rutschmann
└─ kiaransalee
├─ cpu 12.0 12.0
├─ memory 16.0Gi 16.0Gi
└─ pods 1.0 1.0
user-celine-angonin
└─ demogorgon
├─ cpu 15.0 15.0
├─ memory 395.0Gi 395.0Gi
├─ nvidia.com/gpu 4.0 4.0
└─ pods 1.0 1.0
user-christoph-hanselka
└─ belial
├─ cpu 2.0 2.0
├─ memory 44.0Gi 44.0Gi
├─ nvidia.com/gpu 3.0 3.0
└─ pods 2.0 2.0
user-eduard-buss
└─ demogorgon
├─ cpu 1.0 32.0
├─ ephemeral-storage 15.0Gi 30.0Gi
├─ memory 10.0Gi 32.0Gi
├─ nvidia.com/gpu 1.0 1.0
└─ pods 1.0 1.0
user-giovanna-ratini
└─ mindflayer02
├─ cpu 100.0m 1.0
├─ memory 100.0Mi 1.0Gi
└─ pods 1.0 1.0
user-julian-jandeleit
└─ demogorgon
├─ cpu 48.0 48.0
├─ memory 80.0Gi 80.0Gi
├─ nvidia.com/gpu 1.0 1.0
└─ pods 1.0 1.0
user-julian-zimmermann
├─ asmodeus
│ ├─ cpu 64.0 64.0
│ ├─ memory 512.0Gi 512.0Gi
│ ├─ nvidia.com/gpu 1.0 1.0
│ └─ pods 1.0 1.0
├─ demogorgon
│ ├─ cpu 8.0 8.0
│ ├─ memory 256.0Gi 256.0Gi
│ ├─ nvidia.com/gpu 1.0 1.0
│ └─ pods 1.0 1.0
├─ tiamat
│ ├─ cpu 255.0 255.0
│ ├─ memory 896.0Gi 896.0Gi
│ ├─ nvidia.com/gpu 4.0 4.0
│ └─ pods 1.0 1.0
└─ zariel
├─ cpu 250.0 250.0
├─ memory 1.9Ti 1.9Ti
├─ nvidia.com/gpu 8.0 8.0
└─ pods 1.0 1.0
user-mike-battistella
└─ fierna
├─ cpu 1.0 2.0
├─ memory 32.0Gi 32.0Gi
├─ nvidia.com/gpu 1.0 1.0
└─ pods 1.0 1.0
user-mohsen-jenadeleh
└─ asmodeus
├─ cpu 60.0 60.0
├─ memory 200.0Gi 200.0Gi
├─ nvidia.com/gpu 2.0 2.0
└─ pods 2.0 2.0
user-segun-aroyehun
└─ kiaransalee
├─ cpu 25.0 50.0
├─ ephemeral-storage 20.0Gi 40.0Gi
├─ memory 100.0Gi 150.0Gi
├─ nvidia.com/gpu 1.0 1.0
└─ pods 1.0 1.0
user-stephen-tyndel
└─ kiaransalee
├─ cpu 12.0 12.0
├─ memory 80.0Gi 80.0Gi
├─ nvidia.com/gpu 1.0 1.0
└─ pods 1.0 1.0
user-till-aust
├─ belial
│ ├─ cpu 65.0 65.0
│ ├─ memory 210.0Gi 250.0Gi
│ ├─ nvidia.com/gpu 2.0 2.0
│ └─ pods 3.0 3.0
└─ fierna
├─ cpu 64.0 64.0
├─ memory 200.0Gi 240.0Gi
├─ nvidia.com/gpu 2.0 2.0
└─ pods 2.0 2.0
user-ulas-bingoel
├─ asmodeus
│ ├─ cpu 8.0 8.0
│ ├─ memory 40.0Gi 40.0Gi
│ ├─ nvidia.com/gpu 1.0 1.0
│ └─ pods 1.0 1.0
├─ demogorgon
│ ├─ cpu 8.0 8.0
│ ├─ memory 30.0Gi 30.0Gi
│ ├─ nvidia.com/gpu 1.0 1.0
│ └─ pods 1.0 1.0
└─ vecna
├─ cpu 8.0 8.0
├─ memory 30.0Gi 30.0Gi
├─ nvidia.com/gpu 1.0 1.0
└─ pods 1.0 1.0
user-v-time
└─ mindflayer01
├─ cpu 1.5 3.0
├─ memory 768.0Mi 1.5Gi
└─ pods 2.0 2.0
user-valentin-schmuker
├─ belial
│ ├─ cpu 2.0 2.0
│ ├─ memory 2.0Gi 2.0Gi
│ └─ pods 1.0 1.0
└─ vecna
├─ cpu 32.0 32.0
├─ memory 128.0Gi 128.0Gi
├─ nvidia.com/gpu 8.0 8.0
└─ pods 2.0 2.0
web
├─ beholder01
│ ├─ cpu 250.0m 0.0
│ └─ pods 3.0 3.0
├─ beholder02
│ ├─ cpu 250.0m 1.0
│ └─ pods 2.0 2.0
├─ beholder03 1.0 1.0
│ └─ pods 1.0 1.0
├─ mindflayer01 1.0 1.0
│ └─ pods 1.0 1.0
└─ mindflayer02
├─ cpu 250.0m 0.0
└─ pods 2.0 2.0
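The quantities in both tables mix unit systems. CPU uses the milli suffix (`250.0m` is 0.25 cores), memory mostly uses binary suffixes (`Mi`, `Gi`, `Ti`), and a few pods request decimal units (`500.0M` for prometheus, `1.1G` for the jupyter pods). That is apparently why some node totals, such as belial's `310.6G`, are printed in `G` rather than `Gi`. The sketch below converts such strings to base units so that a namespace line can be checked against its pods, here `user-v-time` on mindflayer01 (mariadb plus phpfpm-nginx). `parse_quantity` is a hypothetical helper, not part of any Kubernetes client library, and it only handles the suffixes that occur in this report.

```python
from decimal import Decimal

# Multipliers for the quantity suffixes that appear in this report.
# Binary suffixes are powers of 1024, decimal ones powers of 1000, "m" is milli.
SUFFIXES = {
    "Ki": Decimal(1024),
    "Mi": Decimal(1024) ** 2,
    "Gi": Decimal(1024) ** 3,
    "Ti": Decimal(1024) ** 4,
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
    "m": Decimal("0.001"),
}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity such as '768.0Mi' or '250.0m' to base units (bytes or cores)."""
    # Try two-letter suffixes first so that "Mi" is not mistaken for "M".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * SUFFIXES[suffix]
    return Decimal(text)


# user-v-time on mindflayer01 = mariadb + phpfpm-nginx (values from the trees above).
cpu = parse_quantity("500.0m") + parse_quantity("1.0")
memory = parse_quantity("256.0Mi") + parse_quantity("512.0Mi")
print(float(cpu), "cores")
print(float(memory / parse_quantity("1Mi")), "MiB")

# Decimal vs binary: prometheus requests 500.0M, a little under 477 MiB.
print(round(float(parse_quantity("500.0M") / parse_quantity("1Mi")), 1), "MiB")
```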
Ceph file system status
  cluster:
    id:     3fee6f38-ba9f-11ec-9328-e188936dcafd
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum beholder03,beholder01,beholder02,mindflayer02,mindflayer03 (age 4w) [leader: beholder03]
    mgr: mindflayer01.mkuopd(active, since 6w), standbys: mindflayer02.ympgrs, beholder02.akktmp, beholder01.verxwn, beholder03.nprqzk, mindflayer03.rzdvrr
    mds: 4/4 daemons up, 2 standby
    osd: 24 osds: 24 up (since 4w), 24 in (since 4w)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 545 pgs
    objects: 73.56M objects, 97 TiB
    usage:   195 TiB used, 87 TiB / 282 TiB avail
    pgs:     543 active+clean
             2   active+clean+scrubbing+deep

  io:
    client:   1.4 MiB/s rd, 944 KiB/s wr, 10 op/s rd, 15 op/s wr
HEALTH_OK
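The `ceph status` figures can be cross-checked with simple arithmetic: used plus available should account for the raw capacity (195 + 87 = 282 TiB), the PG states should add up to the pool total (543 + 2 = 545), and raw used divided by stored data gives the storage overhead. A minimal sketch with the numbers copied from the output above; the overhead ratio reflects replication or erasure coding, but the pool layout is not shown in this report, so no particular scheme is assumed. The set of "healthy" PG states is an assumption chosen for this example.

```python
# PG states treated as healthy in this sketch: plain or scrubbing active+clean.
HEALTHY_PG_STATES = {"active+clean", "active+clean+scrubbing", "active+clean+scrubbing+deep"}


def ceph_summary(used_tib: float, avail_tib: float, total_tib: float, stored_tib: float,
                 pg_states: dict[str, int], total_pgs: int) -> dict[str, object]:
    """Cross-check the capacity and PG lines of `ceph status`."""
    return {
        # `ceph status` rounds to whole TiB, so allow 1 TiB of slack.
        "capacity_balanced": abs(used_tib + avail_tib - total_tib) <= 1,
        "raw_used_pct": round(100 * used_tib / total_tib, 1),
        # Raw bytes consumed per stored byte (replication or erasure-coding overhead).
        "raw_per_stored": round(used_tib / stored_tib, 2),
        "pgs_accounted": sum(pg_states.values()) == total_pgs,
        # Any PG state outside the healthy set would deserve a closer look.
        "all_healthy": set(pg_states) <= HEALTHY_PG_STATES,
    }


summary = ceph_summary(
    used_tib=195, avail_tib=87, total_tib=282, stored_tib=97,
    pg_states={"active+clean": 543, "active+clean+scrubbing+deep": 2},
    total_pgs=545,
)
print(summary)
```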
Etcd cluster status
+------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|     ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 192.168.1.1:4252 | 7126e7a3a9cc42ca |  3.4.30 |  156 MB |     false |      false |    302065 |  706720078 |          706720078 |        |
| 192.168.1.2:4252 | 39d72894bf6c7600 |  3.4.30 |  156 MB |     false |      false |    302065 |  706720078 |          706720078 |        |
| 192.168.1.3:4252 | bbf4a2b99c3fd692 |  3.4.30 |  156 MB |     false |      false |    302065 |  706720078 |          706720078 |        |
| 192.168.2.1:4252 | 5cb9997dd1c2246b |  3.4.30 |  156 MB |      true |      false |    302065 |  706720078 |          706720078 |        |
| 192.168.2.3:4252 |  cbc1cf89959ea4e |  3.4.30 |  156 MB |     false |      false |    302065 |  706720078 |          706720078 |        |
+------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
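The etcd table looks healthy when exactly one member reports IS LEADER as true, all members share the same RAFT TERM, and no member's RAFT APPLIED INDEX trails its RAFT INDEX by much. With five members a Raft quorum is three, so the cluster can lose two members and still accept writes. A sketch of those checks over rows transcribed from the table; the `Member` record and the `max_lag` threshold of 1000 entries are hypothetical choices for this example, not etcdctl types or defaults.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One row of the endpoint status table (hypothetical record type)."""
    endpoint: str
    is_leader: bool
    raft_term: int
    raft_index: int
    applied_index: int


def quorum(n_members: int) -> int:
    # Raft needs a strict majority of voting members to commit writes.
    return n_members // 2 + 1


def check_members(members: list[Member], max_lag: int = 1000) -> list[str]:
    """Return human-readable problems; an empty list means the table looks healthy."""
    problems = []
    leaders = [m.endpoint for m in members if m.is_leader]
    if len(leaders) != 1:
        problems.append(f"expected exactly one leader, found {leaders}")
    if len({m.raft_term for m in members}) > 1:
        problems.append("members disagree on the raft term (election in progress?)")
    for m in members:
        # A small gap is normal under write load; a large one points to a slow member.
        lag = m.raft_index - m.applied_index
        if lag > max_lag:
            problems.append(f"{m.endpoint} lags by {lag} entries")
    return problems


# Rows transcribed from the table above; all share term 302065 and index 706720078.
endpoints = ["192.168.1.1:4252", "192.168.1.2:4252", "192.168.1.3:4252",
             "192.168.2.1:4252", "192.168.2.3:4252"]
members = [Member(ep, ep == "192.168.2.1:4252", 302065, 706720078, 706720078)
           for ep in endpoints]

print(check_members(members) or "etcd members consistent")
print("quorum:", quorum(len(members)), "of", len(members))
```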
Detailed network health report
+-------------------------+---------------------+---------------------+---------------------+
| Check                   | beholder01          | beholder02          | beholder03          |
+-------------------------+---------------------+---------------------+---------------------+
| SSH port open           | yes                 | yes                 | yes                 |
| Report available        | yes                 | yes                 | yes                 |
| External interface up   | ok                  | ok                  | ok                  |
| Infiniband interface up | ok                  | ok                  | ok                  |
| API servers reachable   | 1 2 3 4             | 1 2 3 4             | 1 2 3 4             |
| Ceph monitors reachable | 1 2 3               | 1 2 3               | 1 2 3               |
| cephfs mounted          | /cephfs             | /cephfs             | /cephfs             |
| local raid mounted      | /raid               | /raid               | /raid               |
| Test pod responding     | dnsutils-beholder01 | dnsutils-beholder02 | dnsutils-beholder03 |
| Can reach kube-dns      | 10.96.0.10          | 10.96.0.10          | 10.96.0.10          |
| Pod can reach kube-dns  | yes                 | yes                 | yes                 |
| Pod can reach internet  | yes                 | yes                 | yes                 |
+-------------------------+---------------------+---------------------+---------------------+
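A check such as "SSH port open" can be implemented as a plain TCP connect with a timeout. How the report's own probes work is not shown here, so the following is a hypothetical Python sketch, not the report's implementation; the 2-second default timeout is an arbitrary choice.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, unresolvable (gaierror) and timed-out attempts are all OSError.
        return False
```

For example, `port_open("beholder01", 22)` would back the "SSH port open" row; a refused connection, an unresolvable name and a timeout all count as "no".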
NVIDIA driver and GPU reports
Wed Feb 4 02:32:49 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro RTX 6000                Off |   00000000:1B:00.0 Off |                  Off |
| 33%   33C    P8              4W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Quadro RTX 6000                Off |   00000000:1C:00.0 Off |                  Off |
| 33%   32C    P8             12W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Quadro RTX 6000                Off |   00000000:1D:00.0 Off |                  Off |
| 33%   33C    P8              5W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Quadro RTX 6000                Off |   00000000:1E:00.0 Off |                  Off |
| 33%   33C    P8              5W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  Quadro RTX 6000                Off |   00000000:3D:00.0 Off |                  Off |
| 33%   33C    P8             13W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  Quadro RTX 6000                Off |   00000000:3F:00.0 Off |                  Off |
| 33%   32C    P8              6W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  Quadro RTX 6000                Off |   00000000:40:00.0 Off |                  Off |
| 33%   33C    P8              4W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  Quadro RTX 6000                Off |   00000000:41:00.0 Off |                  Off |
| 33%   33C    P8              4W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
Wed Feb 4 02:32:51 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro RTX 6000                 On |   00000000:1B:00.0 Off |                  Off |
| 33%   33C    P8             17W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Quadro RTX 6000                 On |   00000000:1C:00.0 Off |                  Off |
| 33%   32C    P8             17W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Quadro RTX 6000                 On |   00000000:1D:00.0 Off |                  Off |
| 33%   33C    P8              4W /  260W |    4100MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Quadro RTX 6000                 On |   00000000:1E:00.0 Off |                  Off |
| 33%   32C    P8              4W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  Quadro RTX 6000                 On |   00000000:3D:00.0 Off |                  Off |
| 33%   31C    P8             15W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  Quadro RTX 6000                 On |   00000000:3F:00.0 Off |                  Off |
| 33%   32C    P8              4W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  Quadro RTX 6000                 On |   00000000:40:00.0 Off |                  Off |
| 33%   32C    P8             12W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  Quadro RTX 6000                 On |   00000000:41:00.0 Off |                  Off |
| 33%   32C    P8             16W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    2   N/A  N/A         2012068      C   python                                 4096MiB |
+-----------------------------------------------------------------------------------------+
Wed Feb 4 02:32:53 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-40GB           On |   00000000:01:00.0 Off |                    0 |
| N/A   52C    P0            211W /  400W |   39823MiB /  40960MiB |     98%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB           On |   00000000:41:00.0 Off |                    0 |
| N/A   52C    P0            159W /  400W |   39907MiB /  40960MiB |     86%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100-SXM4-40GB           On |   00000000:81:00.0 Off |                    0 |
| N/A   44C    P0             99W /  400W |   40075MiB /  40960MiB |     98%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A100-SXM4-40GB           On |   00000000:C1:00.0 Off |                    0 |
| N/A   46C    P0            323W /  400W |   40015MiB /  40960MiB |     66%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           45258      C   /usr/bin/python3                      39812MiB |
|    1   N/A  N/A           45259      C   /usr/bin/python3                      39896MiB |
|    2   N/A  N/A           45260      C   /usr/bin/python3                      40064MiB |
|    3   N/A  N/A           45261      C   /usr/bin/python3                      40004MiB |
+-----------------------------------------------------------------------------------------+
Wed Feb 4 03:32:56 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla V100-SXM3-32GB Off | 00000000:34:00.0 Off | 0 |
| N/A 34C P0 48W / 350W | 0MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla V100-SXM3-32GB Off | 00000000:36:00.0 Off | 0 |
| N/A 34C P0 48W / 350W | 0MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Tesla V100-SXM3-32GB Off | 00000000:39:00.0 Off | 0 |
| N/A 37C P0 54W / 350W | 0MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Tesla V100-SXM3-32GB Off | 00000000:3B:00.0 Off | 0 |
| N/A 59C P0 304W / 350W | 25740MiB / 32768MiB | 100% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 Tesla V100-SXM3-32GB Off | 00000000:57:00.0 Off | 0 |
| N/A 49C P0 117W / 350W | 25738MiB / 32768MiB | 80% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 Tesla V100-SXM3-32GB Off | 00000000:59:00.0 Off | 0 |
| N/A 65C P0 335W / 350W | 25766MiB / 32768MiB | 54% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 Tesla V100-SXM3-32GB Off | 00000000:5C:00.0 Off | 0 |
| N/A 53C P0 345W / 350W | 25770MiB / 32768MiB | 73% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 Tesla V100-SXM3-32GB Off | 00000000:5E:00.0 Off | 0 |
| N/A 68C P0 350W / 350W | 25750MiB / 32768MiB | 60% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 8 Tesla V100-SXM3-32GB Off | 00000000:B7:00.0 Off | 0 |
| N/A 48C P0 87W / 350W | 25746MiB / 32768MiB | 94% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 9 Tesla V100-SXM3-32GB Off | 00000000:B9:00.0 Off | 0 |
| N/A 50C P0 233W / 350W | 25766MiB / 32768MiB | 36% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 10 Tesla V100-SXM3-32GB Off | 00000000:BC:00.0 Off | 0 |
| N/A 63C P0 350W / 350W | 25762MiB / 32768MiB | 50% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 11 Tesla V100-SXM3-32GB Off | 00000000:BE:00.0 Off | 0 |
| N/A 67C P0 321W / 350W | 25750MiB / 32768MiB | 67% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 12 Tesla V100-SXM3-32GB Off | 00000000:E0:00.0 Off | 0 |
| N/A 37C P0 48W / 350W | 0MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 13 Tesla V100-SXM3-32GB Off | 00000000:E2:00.0 Off | 0 |
| N/A 37C P0 49W / 350W | 0MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 14 Tesla V100-SXM3-32GB Off | 00000000:E5:00.0 Off | 0 |
| N/A 40C P0 51W / 350W | 0MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 15 Tesla V100-SXM3-32GB Off | 00000000:E7:00.0 Off | 0 |
| N/A 39C P0 49W / 350W | 0MiB / 32768MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 3 N/A N/A 3915219 C python 25734MiB |
| 4 N/A N/A 1620655 C /opt/conda/bin/python 25728MiB |
| 5 N/A N/A 1620656 C /opt/conda/bin/python 25756MiB |
| 6 N/A N/A 1620657 C /opt/conda/bin/python 25760MiB |
| 7 N/A N/A 1620658 C /opt/conda/bin/python 25740MiB |
| 8 N/A N/A 2939670 C /opt/conda/bin/python 25736MiB |
| 9 N/A N/A 2939671 C /opt/conda/bin/python 25756MiB |
| 10 N/A N/A 2939672 C /opt/conda/bin/python 25752MiB |
| 11 N/A N/A 2939673 C /opt/conda/bin/python 25740MiB |
+-----------------------------------------------------------------------------------------+
Wed Feb 4 02:32:59 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:01:00.0 Off | 0 |
| N/A 46C P0 244W / 500W | 55084MiB / 81920MiB | 100% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A100-SXM4-80GB On | 00000000:41:00.0 Off | 0 |
| N/A 44C P0 216W / 500W | 7498MiB / 81920MiB | 100% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA A100-SXM4-80GB On | 00000000:81:00.0 Off | 0 |
| N/A 27C P0 59W / 500W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA A100-SXM4-80GB On | 00000000:C1:00.0 Off | 0 |
| N/A 27C P0 60W / 500W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2085878 C python 13020MiB |
| 0 N/A N/A 2103268 C python 15440MiB |
| 0 N/A N/A 2107017 C python 15440MiB |
| 0 N/A N/A 2125087 C python 11110MiB |
| 1 N/A N/A 388245 C python 12MiB |
| 1 N/A N/A 388252 C python 12MiB |
| 1 N/A N/A 2112002 C python 3560MiB |
| 1 N/A N/A 2124517 C python 3760MiB |
+-----------------------------------------------------------------------------------------+
Wed Feb 4 03:33:02 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-40GB On | 00000000:07:00.0 Off | 0 |
| N/A 48C P0 178W / 400W | 34476MiB / 40960MiB | 32% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A100-SXM4-40GB On | 00000000:0F:00.0 Off | 0 |
| N/A 45C P0 192W / 400W | 34002MiB / 40960MiB | 100% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA A100-SXM4-40GB On | 00000000:47:00.0 Off | 0 |
| N/A 43C P0 177W / 400W | 34506MiB / 40960MiB | 77% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA A100-SXM4-40GB On | 00000000:4E:00.0 Off | 0 |
| N/A 44C P0 174W / 400W | 34942MiB / 40960MiB | 96% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA A100-SXM4-40GB On | 00000000:87:00.0 Off | 0 |
| N/A 61C P0 206W / 400W | 34004MiB / 40960MiB | 100% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA A100-SXM4-40GB On | 00000000:90:00.0 Off | 0 |
| N/A 56C P0 150W / 400W | 34878MiB / 40960MiB | 95% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA A100-SXM4-40GB On | 00000000:B7:00.0 Off | 0 |
| N/A 57C P0 198W / 400W | 34170MiB / 40960MiB | 99% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA A100-SXM4-40GB On | 00000000:BD:00.0 Off | 0 |
| N/A 55C P0 176W / 400W | 34472MiB / 40960MiB | 92% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1060845 C /usr/bin/python 34466MiB |
| 1 N/A N/A 1060846 C /usr/bin/python 33992MiB |
| 2 N/A N/A 1060847 C /usr/bin/python 34496MiB |
| 3 N/A N/A 1060848 C /usr/bin/python 34932MiB |
| 4 N/A N/A 1060849 C /usr/bin/python 33994MiB |
| 5 N/A N/A 1060850 C /usr/bin/python 34868MiB |
| 6 N/A N/A 1060851 C /usr/bin/python 34160MiB |
| 7 N/A N/A 1060852 C /usr/bin/python 34462MiB |
+-----------------------------------------------------------------------------------------+
Wed Feb 4 02:33:06 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A40 Off | 00000000:01:00.0 Off | 0 |
| 0% 30C P8 25W / 300W | 0MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A40 Off | 00000000:25:00.0 Off | 0 |
| 0% 29C P8 24W / 300W | 0MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA A40 Off | 00000000:41:00.0 Off | 0 |
| 0% 29C P8 24W / 300W | 0MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA A40 Off | 00000000:61:00.0 Off | 0 |
| 0% 29C P8 24W / 300W | 0MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA A40 Off | 00000000:81:00.0 Off | 0 |
| 0% 29C P8 17W / 300W | 0MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA A40 Off | 00000000:A1:00.0 Off | 0 |
| 0% 42C P0 80W / 300W | 39707MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA A40 Off | 00000000:C1:00.0 Off | 0 |
| 0% 47C P0 83W / 300W | 4093MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA A40 Off | 00000000:E1:00.0 Off | 0 |
| 0% 29C P8 16W / 300W | 0MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 5 N/A N/A 1964129 C /usr/bin/python 39698MiB |
| 6 N/A N/A 649962 C python 4084MiB |
+-----------------------------------------------------------------------------------------+
Wed Feb 4 02:33:08 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08 Driver Version: 550.127.08 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:26:00.0 Off | 0 |
| N/A 28C P0 97W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 On | 00000000:2F:00.0 Off | 0 |
| N/A 68C P0 700W / 700W | 58452MiB / 81559MiB | 100% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA H100 80GB HBM3 On | 00000000:46:00.0 Off | 0 |
| N/A 29C P0 74W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA H100 80GB HBM3 On | 00000000:54:00.0 Off | 0 |
| N/A 33C P0 148W / 700W | 724MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA H100 80GB HBM3 On | 00000000:A6:00.0 Off | 0 |
| N/A 43C P0 82W / 700W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA H100 80GB HBM3 On | 00000000:AF:00.0 Off | 0 |
| N/A 31C P0 127W / 700W | 75430MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA H100 80GB HBM3 On | 00000000:C6:00.0 Off | On |
| N/A 42C P0 344W / 700W | 29600MiB / 81559MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA H100 80GB HBM3 On | 00000000:CF:00.0 Off | On |
| N/A 26C P0 73W / 700W | 90MiB / 81559MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 6 1 0 0 | 29534MiB / 40320MiB | 64 0 | 4 0 4 0 4 |
| | 3MiB / 65535MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 6 2 0 1 | 38MiB / 40320MiB | 60 0 | 3 0 3 0 3 |
| | 0MiB / 65535MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 7 7 0 0 | 13MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 7 8 0 1 | 13MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 7 9 0 2 | 13MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 7 11 0 3 | 13MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 7 12 0 4 | 13MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 7 13 0 5 | 13MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 7 14 0 6 | 13MiB / 9984MiB | 16 0 | 1 0 1 0 1 |
| | 0MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 1 N/A N/A 2316272 C VLLM::EngineCore 58442MiB |
| 3 N/A N/A 2310751 C /home/jovyan/server_env/bin/python 714MiB |
| 5 N/A N/A 2572298 C VLLM::EngineCore 75420MiB |
| 6 1 0 1989885 C python 29502MiB |
+-----------------------------------------------------------------------------------------+