Cluster status report from beholder01, Thu, 18 Sep 2025 14:32:01 +0000

Resource usage (overall)
Resource usage (by namespace)
Ceph file system status
Etcd cluster status
Detailed network health report
NVIDIA driver and GPU reports
  Imp
  Dretch
  Belial
  Fierna
  Tiamat
  Vecna
  Asmodeus
  Zariel
  Demogorgon

Resource usage (overall)

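The table gives each value as an absolute amount and as a percentage of the node's allocatable capacity. The Free column appears to be the allocatable capacity minus the larger of Requested and Limit, clamped at zero. This rule is inferred from the rows below and is not documented by the tool that produced this report. It matches, for example, asmodeus cpu (256.0 - 250.3 = 5.7), belial cpu (limit 117.1 exceeds 80.0 allocatable, so 0.0 free and 146%), and the cluster-wide cpu row (1616 cores summed over the nodes, minus 807.0 limit, gives 809.0). The Python sketch below recomputes a few rows under that assumption. `NodeUsage`, `pct`, and `free` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    """One row of the report (hypothetical helper; values in cores)."""
    name: str
    requested: float
    limit: float
    allocatable: float

    def pct(self, value: float) -> int:
        # Share of allocatable capacity, rounded to a whole percent
        # (assumed to match the report's "(NN%)" prefixes).
        return round(100 * value / self.allocatable) if self.allocatable else 0

    def free(self) -> float:
        # Assumed rule: capacity left after the larger of requests and
        # limits, clamped at zero when limits overcommit the node.
        return max(0.0, self.allocatable - max(self.requested, self.limit))


# cpu rows copied from the table below; "cluster" uses the summed allocatable.
rows = [
    NodeUsage("asmodeus", 250.3, 250.1, 256.0),
    NodeUsage("belial", 59.4, 117.1, 80.0),
    NodeUsage("zariel", 142.3, 184.1, 256.0),
    NodeUsage("cluster", 614.2, 807.0, 1616.0),
]

for r in rows:
    print(f"{r.name:<10} ({r.pct(r.requested)}%) {r.requested:>6.1f}  "
          f"({r.pct(r.limit)}%) {r.limit:>6.1f}  "
          f"{r.allocatable:>7.1f}  {r.free():>6.1f}")
```

Nodes whose limits exceed their allocatable capacity, such as belial and vecna, report 0.0 free even though their requests leave room. Because of this clamping, the cluster-wide Free value is computed from the totals rather than by summing the per-node Free column.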
 Resource                                                   Requested          Limit  Allocatable      Free 
  cpu                                                     (38%) 614.2    (50%) 807.0         1.6k     809.0 
  ├─ asmodeus                                             (98%) 250.3    (98%) 250.1        256.0       5.7 
  │  ├─ asmodeus-all-gpu-julian-schaefer-zimmermann             250.0          250.0                        
  │  ├─ dnsutils-asmodeus                                      100.0m         100.0m                        
  │  └─ kube-router-qv489                                      250.0m            0.0                        
  ├─ beholder01                                           (4%) 900.0m    (0%) 100.0m         24.0      23.1 
  │  ├─ dnsutils-beholder01                                    100.0m         100.0m                        
  │  ├─ kube-apiserver-beholder01                              250.0m            0.0                        
  │  ├─ kube-controller-manager-beholder01                     200.0m            0.0                        
  │  ├─ kube-router-d2rq4                                      250.0m            0.0                        
  │  └─ kube-scheduler-beholder01                              100.0m            0.0                        
  ├─ beholder02                                              (8%) 1.9       (9%) 2.1         24.0      21.9 
  │  ├─ dnsutils-beholder02                                    100.0m         100.0m                        
  │  ├─ kube-apiserver-beholder02                              250.0m            0.0                        
  │  ├─ kube-controller-manager-beholder02                     200.0m            0.0                        
  │  ├─ kube-router-dlcmv                                      250.0m            0.0                        
  │  ├─ kube-scheduler-beholder02                              100.0m            0.0                        
  │  ├─ wordpress-8c8944c8d-cxbrq                              500.0m            1.0                        
  │  └─ wordpress-mariadb-565c547dc8-59fhr                     500.0m            1.0                        
  ├─ beholder03                                           (4%) 900.0m    (0%) 100.0m         24.0      23.1 
  │  ├─ dnsutils-beholder03                                    100.0m         100.0m                        
  │  ├─ kube-apiserver-beholder03                              250.0m            0.0                        
  │  ├─ kube-controller-manager-beholder03                     200.0m            0.0                        
  │  ├─ kube-router-hm6f8                                      250.0m            0.0                        
  │  └─ kube-scheduler-beholder03                              100.0m            0.0                        
  ├─ belial                                                (74%) 59.4   (146%) 117.1         80.0       0.0 
  │  ├─ dnsutils-belial                                        100.0m         100.0m                        
  │  ├─ gpu-pod                                                   1.0            1.0                        
  │  ├─ kube-router-bbv5t                                      250.0m            0.0                        
  │  ├─ mariya-automl-well-being-job-7w56h                        4.0            8.0                        
  │  ├─ mariya-lstm-well-being-job-894nj                          4.0            8.0                        
  │  └─ ubuntu-gpu1                                              50.0          100.0                        
  ├─ demogorgon                                            (92%) 88.3     (92%) 88.1         96.0       7.7 
  │  ├─ a2v2-triple-gpu-jcsz                                     16.0           16.0                        
  │  ├─ dnsutils-demogorgon                                    100.0m         100.0m                        
  │  ├─ gpu-demogorgon                                           48.0           48.0                        
  │  ├─ kube-router-nh4mn                                      250.0m            0.0                        
  │  └─ ulas-bingoel-model-pod-6                                 24.0           24.0                        
  ├─ fierna                                                (33%) 26.4     (33%) 26.1         80.0      53.6 
  │  ├─ beesbook-days-0-qb4cm                                    12.0           12.0                        
  │  ├─ beesbook-days-1-mwm9s                                    12.0           12.0                        
  │  ├─ dnsutils-fierna                                        100.0m         100.0m                        
  │  ├─ gpu-pod                                                   1.0            1.0                        
  │  ├─ kube-router-nndmg                                      250.0m            0.0                        
  │  └─ mi-swarm-pod                                              1.0            1.0                        
  ├─ kiaransalee                                          (0%) 350.0m    (0%) 100.0m        192.0     191.7 
  │  ├─ dnsutils-kiaransalee                                   100.0m         100.0m                        
  │  └─ kube-router-bw8bz                                      250.0m            0.0                        
  ├─ lolth                                                (2%) 750.0m       (8%) 3.1         40.0      36.9 
  │  ├─ coredns-55cb58b774-cxfmw                               100.0m            0.0                        
  │  ├─ dnsutils-lolth                                         100.0m         100.0m                        
  │  ├─ gatekeeper-audit-59d4b6fd4c-q7cj8                      100.0m            1.0                        
  │  ├─ gatekeeper-controller-manager-66f474f785-qfkmj         100.0m            1.0                        
  │  ├─ kube-router-z6szh                                      250.0m            0.0                        
  │  └─ ubuntu-test-pod                                        100.0m            1.0                        
  ├─ mindflayer01                                            (4%) 2.3       (8%) 5.1         64.0      58.9 
  │  ├─ dex-7ddbb96f79-cmc5m                                   250.0m            0.0                        
  │  ├─ dex-loginapp-7d9c6ff56-lhj2r                           100.0m            0.0                        
  │  ├─ dex-mysql-66b5885f7b-2wdw8                             100.0m            0.0                        
  │  ├─ dnsutils-mindflayer01                                  100.0m         100.0m                        
  │  ├─ gatekeeper-controller-manager-66f474f785-lxl5x         100.0m            1.0                        
  │  ├─ kube-router-rzzr2                                      250.0m            0.0                        
  │  ├─ login-pod                                               60.0m            1.0                        
  │  ├─ mariadb-849498cb44-dldtm                               500.0m            1.0                        
  │  ├─ prometheus-deployment-b65d5d898-t2z9l                  500.0m            1.0                        
  │  ├─ registry-7fcf447f4-m5sz9                               100.0m            0.0                        
  │  ├─ registry-auth-7dcc7f6766-m8bsh                         100.0m            0.0                        
  │  └─ temp-pod                                               100.0m            1.0                        
  ├─ mindflayer02                                          (34%) 21.6     (44%) 28.1         64.0      35.9 
  │  ├─ cool-pod                                                 10.0           16.0                        
  │  ├─ dnsutils-mindflayer02                                  100.0m         100.0m                        
  │  ├─ gpu-pod-aalbi                                            10.0           10.0                        
  │  ├─ kube-router-k8t5k                                      250.0m            0.0                        
  │  ├─ mediawiki-mariadb-5cc4866855-xffs4                     250.0m            0.0                        
  │  └─ phpfpm-nginx-5db96d6895-97f2g                             1.0            2.0                        
  ├─ mindflayer03                                            (3%) 2.1       (4%) 2.6         64.0      61.4 
  │  ├─ check-data-pod                                         100.0m         500.0m                        
  │  ├─ coredns-55cb58b774-4kpwk                               100.0m            0.0                        
  │  ├─ dnsutils-mindflayer03                                  100.0m         100.0m                        
  │  ├─ gatekeeper-controller-manager-66f474f785-5gv4n         100.0m            1.0                        
  │  ├─ kube-router-ptw9p                                      250.0m            0.0                        
  │  ├─ ldap-6987986dbc-2zr2c                                  250.0m            0.0                        
  │  ├─ ledavio-text-search                                       1.0            1.0                        
  │  ├─ system-registry-8877cd57-59pzr                         100.0m            0.0                        
  │  └─ user-registry-64bb8ff7cf-5cktq                         100.0m            0.0                        
  ├─ tiamat                                               (0%) 350.0m    (0%) 100.0m        256.0     255.7 
  │  ├─ dnsutils-tiamat                                        100.0m         100.0m                        
  │  └─ kube-router-6zps8                                      250.0m            0.0                        
  ├─ vecna                                                 (17%) 16.4   (104%) 100.1         96.0       0.0 
  │  ├─ dnsutils-vecna                                         100.0m         100.0m                        
  │  ├─ felix-petersen-job-20                                    16.0          100.0                        
  │  └─ kube-router-f67nm                                      250.0m            0.0                        
  └─ zariel                                               (56%) 142.3    (72%) 184.1        256.0      71.9 
     ├─ dnsutils-zariel                                        100.0m         100.0m                        
     ├─ julian-welzel-job-1                                      96.0           96.0                        
     ├─ kube-router-fww75                                      250.0m            0.0                        
     ├─ mariya-cnn-well-being-classifier-job-7kx7t                4.0            8.0                        
     ├─ oliver-wiedemann-pod                                     32.0           64.0                        
     └─ yolo2-kc67n                                              10.0           16.0                        
  ephemeral-storage                                      (1%) 100.0Gi   (1%) 150.0Gi        11.9T     11.7T 
  ├─ asmodeus                                                (0%) 0.0       (0%) 0.0        94.6G     94.6G 
  ├─ beholder01                                              (0%) 0.0       (0%) 0.0         1.7T      1.7T 
  ├─ beholder02                                              (0%) 0.0       (0%) 0.0         1.7T      1.7T 
  ├─ beholder03                                              (0%) 0.0       (0%) 0.0         1.7T      1.7T 
  ├─ belial                                             (57%) 100.0Gi  (85%) 150.0Gi       189.2G     28.2G 
  │  └─ ubuntu-gpu1                                           100.0Gi        150.0Gi                        
  ├─ demogorgon                                              (0%) 0.0       (0%) 0.0       706.7G    706.7G 
  ├─ fierna                                                  (0%) 0.0       (0%) 0.0       189.2G    189.2G 
  ├─ kiaransalee                                             (0%) 0.0       (0%) 0.0         1.7T      1.7T 
  ├─ lolth                                                   (0%) 0.0       (0%) 0.0       530.5G    530.5G 
  ├─ mindflayer01                                            (0%) 0.0       (0%) 0.0       211.5G    211.5G 
  ├─ mindflayer02                                            (0%) 0.0       (0%) 0.0       211.5G    211.5G 
  ├─ mindflayer03                                            (0%) 0.0       (0%) 0.0       211.5G    211.5G 
  ├─ tiamat                                                  (0%) 0.0       (0%) 0.0       164.4G    164.4G 
  ├─ vecna                                                   (0%) 0.0       (0%) 0.0       849.0G    849.0G 
  └─ zariel                                                  (0%) 0.0       (0%) 0.0         1.7T      1.7T 
  memory                                                   (30%) 4.2T    (45%) 5.9Ti       12.9Ti     7.0Ti 
  ├─ asmodeus                                             (77%) 1.5Ti    (77%) 1.5Ti        2.0Ti   467.4Gi 
  │  ├─ asmodeus-all-gpu-julian-schaefer-zimmermann             1.5Ti          1.5Ti                        
  │  ├─ dnsutils-asmodeus                                     100.0Mi        100.0Mi                        
  │  └─ kube-router-qv489                                     250.0Mi            0.0                        
  ├─ beholder01                                          (0%) 350.0Mi   (0%) 100.0Mi       92.9Gi    92.5Gi 
  │  ├─ dnsutils-beholder01                                   100.0Mi        100.0Mi                        
  │  └─ kube-router-d2rq4                                     250.0Mi            0.0                        
  ├─ beholder02                                          (1%) 862.0Mi     (1%) 1.1Gi       92.9Gi    91.8Gi 
  │  ├─ dnsutils-beholder02                                   100.0Mi        100.0Mi                        
  │  ├─ kube-router-dlcmv                                     250.0Mi            0.0                        
  │  ├─ wordpress-8c8944c8d-cxbrq                             256.0Mi        512.0Mi                        
  │  └─ wordpress-mariadb-565c547dc8-59fhr                    256.0Mi        512.0Mi                        
  ├─ beholder03                                          (0%) 350.0Mi   (0%) 100.0Mi       92.9Gi    92.5Gi 
  │  ├─ dnsutils-beholder03                                   100.0Mi        100.0Mi                        
  │  └─ kube-router-hm6f8                                     250.0Mi            0.0                        
  ├─ belial                                             (45%) 340.3Gi  (70%) 530.1Gi      754.4Gi   224.3Gi 
  │  ├─ dnsutils-belial                                       100.0Mi        100.0Mi                        
  │  ├─ gpu-pod                                                10.0Gi         10.0Gi                        
  │  ├─ kube-router-bbv5t                                     250.0Mi            0.0                        
  │  ├─ mariya-automl-well-being-job-7w56h                     40.0Gi         60.0Gi                        
  │  ├─ mariya-lstm-well-being-job-894nj                       40.0Gi         60.0Gi                        
  │  └─ ubuntu-gpu1                                           250.0Gi        400.0Gi                        
  ├─ demogorgon                                           (59%) 1.2Ti    (59%) 1.2Ti        2.0Ti   819.5Gi 
  │  ├─ a2v2-triple-gpu-jcsz                                    1.0Ti          1.0Ti                        
  │  ├─ dnsutils-demogorgon                                   100.0Mi        100.0Mi                        
  │  ├─ gpu-demogorgon                                         80.0Gi         80.0Gi                        
  │  ├─ kube-router-nh4mn                                     250.0Mi            0.0                        
  │  └─ ulas-bingoel-model-pod-6                               80.0Gi         80.0Gi                        
  ├─ fierna                                              (11%) 84.3Gi  (20%) 148.1Gi      754.4Gi   606.3Gi 
  │  ├─ beesbook-days-0-qb4cm                                  32.0Gi         64.0Gi                        
  │  ├─ beesbook-days-1-mwm9s                                  32.0Gi         64.0Gi                        
  │  ├─ dnsutils-fierna                                       100.0Mi        100.0Mi                        
  │  ├─ gpu-pod                                                10.0Gi         10.0Gi                        
  │  ├─ kube-router-nndmg                                     250.0Mi            0.0                        
  │  └─ mi-swarm-pod                                           10.0Gi         10.0Gi                        
  ├─ kiaransalee                                            (1%) 9.0G   (0%) 100.0Mi        1.5Ti      1.6T 
  │  ├─ dnsutils-kiaransalee                                  100.0Mi        100.0Mi                        
  │  ├─ jupyter-aivan-2dau                                       1.1G            0.0                        
  │  ├─ jupyter-alebellina412                                    1.1G            0.0                        
  │  ├─ jupyter-huygenssteiner                                   1.1G            0.0                        
  │  ├─ jupyter-jamannio                                         1.1G            0.0                        
  │  ├─ jupyter-kmayer24                                         1.1G            0.0                        
  │  ├─ jupyter-matheus-2dstefanini-2dmariano                    1.1G            0.0                        
  │  ├─ jupyter-samrauh                                          1.1G            0.0                        
  │  ├─ jupyter-saroyehun                                        1.1G            0.0                        
  │  └─ kube-router-bw8bz                                     250.0Mi            0.0                        
  ├─ lolth                                                 (0%) 1.0Gi     (1%) 2.3Gi      251.4Gi   249.2Gi 
  │  ├─ coredns-55cb58b774-cxfmw                               70.0Mi        170.0Mi                        
  │  ├─ dnsutils-lolth                                        100.0Mi        100.0Mi                        
  │  ├─ gatekeeper-audit-59d4b6fd4c-q7cj8                     256.0Mi        512.0Mi                        
  │  ├─ gatekeeper-controller-manager-66f474f785-qfkmj        256.0Mi        512.0Mi                        
  │  ├─ kube-router-z6szh                                     250.0Mi            0.0                        
  │  └─ ubuntu-test-pod                                       100.0Mi          1.0Gi                        
  ├─ mindflayer01                                           (0%) 1.6G     (1%) 4.1Gi      376.5Gi   372.4Gi 
  │  ├─ dnsutils-mindflayer01                                 100.0Mi        100.0Mi                        
  │  ├─ gatekeeper-controller-manager-66f474f785-lxl5x        256.0Mi        512.0Mi                        
  │  ├─ kube-router-rzzr2                                     250.0Mi            0.0                        
  │  ├─ login-pod                                             100.0Mi          1.0Gi                        
  │  ├─ mariadb-849498cb44-dldtm                              256.0Mi        512.0Mi                        
  │  ├─ prometheus-deployment-b65d5d898-t2z9l                  500.0M          1.0Gi                        
  │  └─ temp-pod                                              100.0Mi          1.0Gi                        
  ├─ mindflayer02                                        (20%) 74.8Gi   (20%) 75.1Gi      376.5Gi   301.4Gi 
  │  ├─ cool-pod                                               64.0Gi         64.0Gi                        
  │  ├─ dnsutils-mindflayer02                                 100.0Mi        100.0Mi                        
  │  ├─ gpu-pod-aalbi                                          10.0Gi         10.0Gi                        
  │  ├─ kube-router-k8t5k                                     250.0Mi            0.0                        
  │  └─ phpfpm-nginx-5db96d6895-97f2g                         512.0Mi          1.0Gi                        
  ├─ mindflayer03                                         (9%) 32.8Gi    (9%) 33.3Gi      376.5Gi   343.2Gi 
  │  ├─ check-data-pod                                        100.0Mi        500.0Mi                        
  │  ├─ coredns-55cb58b774-4kpwk                               70.0Mi        170.0Mi                        
  │  ├─ dnsutils-mindflayer03                                 100.0Mi        100.0Mi                        
  │  ├─ gatekeeper-controller-manager-66f474f785-5gv4n        256.0Mi        512.0Mi                        
  │  ├─ kube-router-ptw9p                                     250.0Mi            0.0                        
  │  └─ ledavio-text-search                                    32.0Gi         32.0Gi                        
  ├─ tiamat                                              (0%) 350.0Mi   (0%) 100.0Mi     1007.6Gi  1007.2Gi 
  │  ├─ dnsutils-tiamat                                       100.0Mi        100.0Mi                        
  │  └─ kube-router-6zps8                                     250.0Mi            0.0                        
  ├─ vecna                                               (7%) 100.3Gi   (106%) 1.6Ti        1.5Ti       0.0 
  │  ├─ dnsutils-vecna                                        100.0Mi        100.0Mi                        
  │  ├─ felix-petersen-job-20                                 100.0Gi          1.6Ti                        
  │  └─ kube-router-f67nm                                     250.0Mi            0.0                        
  └─ zariel                                             (28%) 560.3Gi  (44%) 890.1Gi        2.0Ti     1.1Ti 
     ├─ dnsutils-zariel                                       100.0Mi        100.0Mi                        
     ├─ julian-welzel-job-1                                   100.0Gi        400.0Gi                        
     ├─ kube-router-fww75                                     250.0Mi            0.0                        
     ├─ mariya-cnn-well-being-classifier-job-7kx7t             30.0Gi         60.0Gi                        
     ├─ oliver-wiedemann-pod                                  400.0Gi        400.0Gi                        
     └─ yolo2-kc67n                                            30.0Gi         30.0Gi                        
  nvidia.com/gpu                                           (77%) 48.0     (77%) 48.0         62.0      14.0 
  ├─ asmodeus                                              (100%) 4.0     (100%) 4.0          4.0       0.0 
  │  └─ asmodeus-all-gpu-julian-schaefer-zimmermann               4.0            4.0                        
  ├─ belial                                                 (50%) 4.0      (50%) 4.0          8.0       4.0 
  │  ├─ gpu-pod                                                   1.0            1.0                        
  │  ├─ mariya-automl-well-being-job-7w56h                        1.0            1.0                        
  │  ├─ mariya-lstm-well-being-job-894nj                          1.0            1.0                        
  │  └─ ubuntu-gpu1                                               1.0            1.0                        
  ├─ demogorgon                                             (75%) 6.0      (75%) 6.0          8.0       2.0 
  │  ├─ a2v2-triple-gpu-jcsz                                      3.0            3.0                        
  │  ├─ gpu-demogorgon                                            1.0            1.0                        
  │  └─ ulas-bingoel-model-pod-6                                  2.0            2.0                        
  ├─ fierna                                                 (50%) 4.0      (50%) 4.0          8.0       4.0 
  │  ├─ beesbook-days-0-qb4cm                                     1.0            1.0                        
  │  ├─ beesbook-days-1-mwm9s                                     1.0            1.0                        
  │  ├─ gpu-pod                                                   1.0            1.0                        
  │  └─ mi-swarm-pod                                              1.0            1.0                        
  ├─ kiaransalee                                           (100%) 6.0     (100%) 6.0          6.0       0.0 
  │  ├─ jupyter-aivan-2dau                                        1.0            1.0                        
  │  ├─ jupyter-alebellina412                                     1.0            1.0                        
  │  ├─ jupyter-huygenssteiner                                    1.0            1.0                        
  │  ├─ jupyter-jamannio                                          1.0            1.0                        
  │  ├─ jupyter-matheus-2dstefanini-2dmariano                     1.0            1.0                        
  │  └─ jupyter-saroyehun                                         1.0            1.0                        
  ├─ tiamat                                                  (0%) 0.0       (0%) 0.0          4.0       4.0 
  ├─ vecna                                                (100%) 16.0    (100%) 16.0         16.0       0.0 
  │  └─ felix-petersen-job-20                                    16.0           16.0                        
  └─ zariel                                                (100%) 8.0     (100%) 8.0          8.0       0.0 
     ├─ julian-welzel-job-1                                       4.0            4.0                        
     ├─ mariya-cnn-well-being-classifier-job-7kx7t                1.0            1.0                        
     ├─ oliver-wiedemann-pod                                      2.0            2.0                        
     └─ yolo2-kc67n                                               1.0            1.0                        
  nvidia.com/mig-1g.10gb                                    (29%) 2.0      (29%) 2.0          7.0       5.0 
  └─ kiaransalee                                            (29%) 2.0      (29%) 2.0          7.0       5.0 
     ├─ jupyter-kmayer24                                          1.0            1.0                        
     └─ jupyter-samrauh                                           1.0            1.0                        
  nvidia.com/mig-3g.40gb                                     (0%) 0.0       (0%) 0.0          1.0       1.0 
  └─ kiaransalee                                             (0%) 0.0       (0%) 0.0          1.0       1.0 
  nvidia.com/mig-4g.40gb                                     (0%) 0.0       (0%) 0.0          1.0       1.0 
  └─ kiaransalee                                             (0%) 0.0       (0%) 0.0          1.0       1.0 
  pods                                                     (9%) 151.0     (9%) 151.0         1.6k      1.5k 
  ├─ asmodeus                                                (5%) 5.0       (5%) 5.0        110.0     105.0 
  │  ├─ asmodeus-all-gpu-julian-schaefer-zimmermann               1.0            1.0                        
  │  ├─ dnsutils-asmodeus                                         1.0            1.0                        
  │  ├─ kube-proxy-rbzdb                                          1.0            1.0                        
  │  ├─ kube-router-qv489                                         1.0            1.0                        
  │  └─ nvidia-device-plugin-daemonset-lrkw7                      1.0            1.0                        
  ├─ beholder01                                              (5%) 6.0       (5%) 6.0        110.0     104.0 
  │  ├─ dnsutils-beholder01                                       1.0            1.0                        
  │  ├─ kube-apiserver-beholder01                                 1.0            1.0                        
  │  ├─ kube-controller-manager-beholder01                        1.0            1.0                        
  │  ├─ kube-proxy-7f5v4                                          1.0            1.0                        
  │  ├─ kube-router-d2rq4                                         1.0            1.0                        
  │  └─ kube-scheduler-beholder01                                 1.0            1.0                        
  ├─ beholder02                                              (7%) 8.0       (7%) 8.0        110.0     102.0 
  │  ├─ dnsutils-beholder02                                       1.0            1.0                        
  │  ├─ kube-apiserver-beholder02                                 1.0            1.0                        
  │  ├─ kube-controller-manager-beholder02                        1.0            1.0                        
  │  ├─ kube-proxy-dlc5n                                          1.0            1.0                        
  │  ├─ kube-router-dlcmv                                         1.0            1.0                        
  │  ├─ kube-scheduler-beholder02                                 1.0            1.0                        
  │  ├─ wordpress-8c8944c8d-cxbrq                                 1.0            1.0                        
  │  └─ wordpress-mariadb-565c547dc8-59fhr                        1.0            1.0                        
  ├─ beholder03                                              (5%) 6.0       (5%) 6.0        110.0     104.0 
  │  ├─ dnsutils-beholder03                                       1.0            1.0                        
  │  ├─ kube-apiserver-beholder03                                 1.0            1.0                        
  │  ├─ kube-controller-manager-beholder03                        1.0            1.0                        
  │  ├─ kube-proxy-jqcj7                                          1.0            1.0                        
  │  ├─ kube-router-hm6f8                                         1.0            1.0                        
  │  └─ kube-scheduler-beholder03                                 1.0            1.0                        
  ├─ belial                                                  (8%) 9.0       (8%) 9.0        110.0     101.0 
  │  ├─ dnsutils-belial                                           1.0            1.0                        
  │  ├─ gpu-feature-discovery-4w2g5                               1.0            1.0                        
  │  ├─ gpu-pod                                                   1.0            1.0                        
  │  ├─ kube-proxy-nsltd                                          1.0            1.0                        
  │  ├─ kube-router-bbv5t                                         1.0            1.0                        
  │  ├─ mariya-automl-well-being-job-7w56h                        1.0            1.0                        
  │  ├─ mariya-lstm-well-being-job-894nj                          1.0            1.0                        
  │  ├─ nvidia-device-plugin-daemonset-kn6h4                      1.0            1.0                        
  │  └─ ubuntu-gpu1                                               1.0            1.0                        
  ├─ demogorgon                                              (6%) 7.0       (6%) 7.0        110.0     103.0 
  │  ├─ a2v2-triple-gpu-jcsz                                      1.0            1.0                        
  │  ├─ dnsutils-demogorgon                                       1.0            1.0                        
  │  ├─ gpu-demogorgon                                            1.0            1.0                        
  │  ├─ kube-proxy-gknqq                                          1.0            1.0                        
  │  ├─ kube-router-nh4mn                                         1.0            1.0                        
  │  ├─ nvidia-device-plugin-daemonset-wcsbj                      1.0            1.0                        
  │  └─ ulas-bingoel-model-pod-6                                  1.0            1.0                        
  ├─ fierna                                                  (8%) 9.0       (8%) 9.0        110.0     101.0 
  │  ├─ beesbook-days-0-qb4cm                                     1.0            1.0                        
  │  ├─ beesbook-days-1-mwm9s                                     1.0            1.0                        
  │  ├─ dnsutils-fierna                                           1.0            1.0                        
  │  ├─ gpu-feature-discovery-8tz6l                               1.0            1.0                        
  │  ├─ gpu-pod                                                   1.0            1.0                        
  │  ├─ kube-proxy-9tg4f                                          1.0            1.0                        
  │  ├─ kube-router-nndmg                                         1.0            1.0                        
  │  ├─ mi-swarm-pod                                              1.0            1.0                        
  │  └─ nvidia-device-plugin-daemonset-r5pcj                      1.0            1.0                        
  ├─ kiaransalee                                           (14%) 15.0     (14%) 15.0        110.0      95.0 
  │  ├─ continuous-image-puller-6fs4k                             1.0            1.0                        
  │  ├─ continuous-image-puller-6z8bj                             1.0            1.0                        
  │  ├─ dnsutils-kiaransalee                                      1.0            1.0                        
  │  ├─ gpu-feature-discovery-znz6m                               1.0            1.0                        
  │  ├─ jupyter-aivan-2dau                                        1.0            1.0                        
  │  ├─ jupyter-alebellina412                                     1.0            1.0                        
  │  ├─ jupyter-huygenssteiner                                    1.0            1.0                        
  │  ├─ jupyter-jamannio                                          1.0            1.0                        
  │  ├─ jupyter-kmayer24                                          1.0            1.0                        
  │  ├─ jupyter-matheus-2dstefanini-2dmariano                     1.0            1.0                        
  │  ├─ jupyter-samrauh                                           1.0            1.0                        
  │  ├─ jupyter-saroyehun                                         1.0            1.0                        
  │  ├─ kube-proxy-65675                                          1.0            1.0                        
  │  ├─ kube-router-bw8bz                                         1.0            1.0                        
  │  └─ nvidia-device-plugin-daemonset-wj2wf                      1.0            1.0                        
  ├─ lolth                                                 (16%) 18.0     (16%) 18.0        110.0      92.0 
  │  ├─ coredns-55cb58b774-cxfmw                                  1.0            1.0                        
  │  ├─ dnsutils-lolth                                            1.0            1.0                        
  │  ├─ echo1-77fbfb54d-8t4dq                                     1.0            1.0                        
  │  ├─ echo1-77fbfb54d-hcrph                                     1.0            1.0                        
  │  ├─ echo2-5d58759df-ssc7q                                     1.0            1.0                        
  │  ├─ gatekeeper-audit-59d4b6fd4c-q7cj8                         1.0            1.0                        
  │  ├─ gatekeeper-controller-manager-66f474f785-qfkmj            1.0            1.0                        
  │  ├─ hub-767b56fc4d-5k8hh                                      1.0            1.0                        
  │  ├─ hub-78d6dd898d-89np9                                      1.0            1.0                        
  │  ├─ hub-85db65cb54-8tktv                                      1.0            1.0                        
  │  ├─ kube-proxy-cl9gc                                          1.0            1.0                        
  │  ├─ kube-router-z6szh                                         1.0            1.0                        
  │  ├─ kube-state-metrics-8945855d-9fqtg                         1.0            1.0                        
  │  ├─ nvidia-device-plugin-daemonset-smgtn                      1.0            1.0                        
  │  ├─ proxy-5bc89cc587-z8q9p                                    1.0            1.0                        
  │  ├─ ubuntu-test-pod                                           1.0            1.0                        
  │  ├─ user-scheduler-5cf5ffbc54-htfdj                           1.0            1.0                        
  │  └─ user-scheduler-c7db6c584-cf297                            1.0            1.0                        
  ├─ mindflayer01                                          (21%) 23.0     (21%) 23.0        110.0      87.0 
  │  ├─ dex-7ddbb96f79-cmc5m                                      1.0            1.0                        
  │  ├─ dex-loginapp-7d9c6ff56-lhj2r                              1.0            1.0                        
  │  ├─ dex-mysql-66b5885f7b-2wdw8                                1.0            1.0                        
  │  ├─ dnsutils-mindflayer01                                     1.0            1.0                        
  │  ├─ gatekeeper-controller-manager-66f474f785-lxl5x            1.0            1.0                        
  │  ├─ kube-proxy-d2xl4                                          1.0            1.0                        
  │  ├─ kube-router-rzzr2                                         1.0            1.0                        
  │  ├─ local-path-provisioner-759479454f-jxl54                   1.0            1.0                        
  │  ├─ login-pod                                                 1.0            1.0                        
  │  ├─ mariadb-849498cb44-dldtm                                  1.0            1.0                        
  │  ├─ memcached-578474d6f9-8dgb8                                1.0            1.0                        
  │  ├─ nginx-frontend-5749b578b9-fjshf                           1.0            1.0                        
  │  ├─ nginx-ip-2023-78b9c84dbf-znhrx                            1.0            1.0                        
  │  ├─ nginx-k8s-7c8d949b5f-grwnx                                1.0            1.0                        
  │  ├─ nginx-rec-2023-79df8c77cd-gg87f                           1.0            1.0                        
  │  ├─ nginx-rsn-2024-7dbc6b668b-jbjk7                           1.0            1.0                        
  │  ├─ nginx-self-service-password-65b44f7547-dhvwp              1.0            1.0                        
  │  ├─ nvidia-device-plugin-daemonset-5gmd7                      1.0            1.0                        
  │  ├─ pdf-b647544f-snvmm                                        1.0            1.0                        
  │  ├─ prometheus-deployment-b65d5d898-t2z9l                     1.0            1.0                        
  │  ├─ registry-7fcf447f4-m5sz9                                  1.0            1.0                        
  │  ├─ registry-auth-7dcc7f6766-m8bsh                            1.0            1.0                        
  │  └─ temp-pod                                                  1.0            1.0                        
  ├─ mindflayer02                                            (7%) 8.0       (7%) 8.0        110.0     102.0 
  │  ├─ cool-pod                                                  1.0            1.0                        
  │  ├─ dnsutils-mindflayer02                                     1.0            1.0                        
  │  ├─ gpu-pod-aalbi                                             1.0            1.0                        
  │  ├─ kube-proxy-zbwk4                                          1.0            1.0                        
  │  ├─ kube-router-k8t5k                                         1.0            1.0                        
  │  ├─ mediawiki-mariadb-5cc4866855-xffs4                        1.0            1.0                        
  │  ├─ nvidia-device-plugin-daemonset-7wv2j                      1.0            1.0                        
  │  └─ phpfpm-nginx-5db96d6895-97f2g                             1.0            1.0                        
  ├─ mindflayer03                                          (16%) 18.0     (16%) 18.0        110.0      92.0 
  │  ├─ check-data-pod                                            1.0            1.0                        
  │  ├─ coredns-55cb58b774-4kpwk                                  1.0            1.0                        
  │  ├─ dnsutils-mindflayer03                                     1.0            1.0                        
  │  ├─ echo1-77fbfb54d-8bnp6                                     1.0            1.0                        
  │  ├─ echo1-77fbfb54d-rcwzd                                     1.0            1.0                        
  │  ├─ gatekeeper-controller-manager-66f474f785-5gv4n            1.0            1.0                        
  │  ├─ kube-proxy-b68xq                                          1.0            1.0                        
  │  ├─ kube-router-ptw9p                                         1.0            1.0                        
  │  ├─ ldap-6987986dbc-2zr2c                                     1.0            1.0                        
  │  ├─ ledavio-text-search                                       1.0            1.0                        
  │  ├─ nginx-ip-2025-5888cccd9-fmnh8                             1.0            1.0                        
  │  ├─ nvidia-device-plugin-daemonset-7k6p6                      1.0            1.0                        
  │  ├─ proxy-5495d795d5-vx2ld                                    1.0            1.0                        
  │  ├─ proxy-7f79cc645f-gj2ld                                    1.0            1.0                        
  │  ├─ system-registry-8877cd57-59pzr                            1.0            1.0                        
  │  ├─ user-registry-64bb8ff7cf-5cktq                            1.0            1.0                        
  │  ├─ user-scheduler-5cf5ffbc54-gkwvq                           1.0            1.0                        
  │  └─ user-scheduler-c7db6c584-4xdvn                            1.0            1.0                        
  ├─ tiamat                                                  (5%) 6.0       (5%) 6.0        110.0     104.0 
  │  ├─ continuous-image-puller-wtkhd                             1.0            1.0                        
  │  ├─ dnsutils-tiamat                                           1.0            1.0                        
  │  ├─ gpu-feature-discovery-5qb95                               1.0            1.0                        
  │  ├─ kube-proxy-nbjvl                                          1.0            1.0                        
  │  ├─ kube-router-6zps8                                         1.0            1.0                        
  │  └─ nvidia-device-plugin-daemonset-9b4tb                      1.0            1.0                        
  ├─ vecna                                                   (5%) 5.0       (5%) 5.0        110.0     105.0 
  │  ├─ dnsutils-vecna                                            1.0            1.0                        
  │  ├─ felix-petersen-job-20                                     1.0            1.0                        
  │  ├─ kube-proxy-djtx4                                          1.0            1.0                        
  │  ├─ kube-router-f67nm                                         1.0            1.0                        
  │  └─ nvidia-device-plugin-daemonset-qln9t                      1.0            1.0                        
  └─ zariel                                                  (7%) 8.0       (7%) 8.0        110.0     102.0 
     ├─ dnsutils-zariel                                           1.0            1.0                        
     ├─ julian-welzel-job-1                                       1.0            1.0                        
     ├─ kube-proxy-c8frf                                          1.0            1.0                        
     ├─ kube-router-fww75                                         1.0            1.0                        
     ├─ mariya-cnn-well-being-classifier-job-7kx7t                1.0            1.0                        
     ├─ nvidia-device-plugin-daemonset-djsrl                      1.0            1.0                        
     ├─ oliver-wiedemann-pod                                      1.0            1.0                        
     └─ yolo2-kc67n                                               1.0            1.0                        




Resource usage (by namespace)

 Resource                       Requested    Limit  Allocatable  Free 
  auth                                                                
  ├─ mindflayer01                                                     
  │  ├─ cpu                        650.0m      0.0                    
  │  └─ pods                          5.0      5.0                    
  └─ mindflayer03                                                     
     ├─ cpu                        250.0m      0.0                    
     └─ pods                          1.0      1.0                    
  frontend                            4.0      4.0                    
  ├─ lolth                            1.0      1.0                    
  │  └─ pods                          1.0      1.0                    
  ├─ mindflayer01                     2.0      2.0                    
  │  └─ pods                          2.0      2.0                    
  └─ mindflayer03                     1.0      1.0                    
     └─ pods                          1.0      1.0                    
  gatekeeper-system                                                   
  ├─ lolth                                                            
  │  ├─ cpu                        200.0m      2.0                    
  │  ├─ memory                    512.0Mi    1.0Gi                    
  │  └─ pods                          2.0      2.0                    
  ├─ mindflayer01                                                     
  │  ├─ cpu                        100.0m      1.0                    
  │  ├─ memory                    256.0Mi  512.0Mi                    
  │  └─ pods                          1.0      1.0                    
  └─ mindflayer03                                                     
     ├─ cpu                        100.0m      1.0                    
     ├─ memory                    256.0Mi  512.0Mi                    
     └─ pods                          1.0      1.0                    
  gcpr-vmv-2022                                                       
  └─ beholder02                                                       
     ├─ cpu                           1.0      2.0                    
     ├─ memory                    512.0Mi    1.0Gi                    
     └─ pods                          2.0      2.0                    
  jupyterhub                                                          
  ├─ kiaransalee                                                      
  │  ├─ memory                       6.4G      0.0                    
  │  ├─ nvidia.com/gpu                6.0      6.0                    
  │  └─ pods                          7.0      7.0                    
  ├─ lolth                            3.0      3.0                    
  │  └─ pods                          3.0      3.0                    
  └─ mindflayer03                     1.0      1.0                    
     └─ pods                          1.0      1.0                    
  jupyterhub-kuckling                 3.0      3.0                    
  ├─ lolth                            1.0      1.0                    
  │  └─ pods                          1.0      1.0                    
  ├─ mindflayer03                     1.0      1.0                    
  │  └─ pods                          1.0      1.0                    
  └─ tiamat                           1.0      1.0                    
     └─ pods                          1.0      1.0                    
  jupyterhub-students                                                 
  ├─ kiaransalee                                                      
  │  ├─ memory                       2.1G      0.0                    
  │  ├─ nvidia.com/mig-1g.10gb        2.0      2.0                    
  │  └─ pods                          3.0      3.0                    
  ├─ lolth                            2.0      2.0                    
  │  └─ pods                          2.0      2.0                    
  └─ mindflayer03                     2.0      2.0                    
     └─ pods                          2.0      2.0                    
  kube-system                                                         
  ├─ asmodeus                                                         
  │  ├─ cpu                        350.0m   100.0m                    
  │  ├─ memory                    350.0Mi  100.0Mi                    
  │  └─ pods                          4.0      4.0                    
  ├─ beholder01                                                       
  │  ├─ cpu                        900.0m   100.0m                    
  │  ├─ memory                    350.0Mi  100.0Mi                    
  │  └─ pods                          6.0      6.0                    
  ├─ beholder02                                                       
  │  ├─ cpu                        900.0m   100.0m                    
  │  ├─ memory                    350.0Mi  100.0Mi                    
  │  └─ pods                          6.0      6.0                    
  ├─ beholder03                                                       
  │  ├─ cpu                        900.0m   100.0m                    
  │  ├─ memory                    350.0Mi  100.0Mi                    
  │  └─ pods                          6.0      6.0                    
  ├─ belial                                                           
  │  ├─ cpu                        350.0m   100.0m                    
  │  ├─ memory                    350.0Mi  100.0Mi                    
  │  └─ pods                          5.0      5.0                    
  ├─ demogorgon                                                       
  │  ├─ cpu                        350.0m   100.0m                    
  │  ├─ memory                    350.0Mi  100.0Mi                    
  │  └─ pods                          4.0      4.0                    
  ├─ fierna                                                           
  │  ├─ cpu                        350.0m   100.0m                    
  │  ├─ memory                    350.0Mi  100.0Mi                    
  │  └─ pods                          5.0      5.0                    
  ├─ kiaransalee                                                      
  │  ├─ cpu                        350.0m   100.0m                    
  │  ├─ memory                    350.0Mi  100.0Mi                    
  │  └─ pods                          5.0      5.0                    
  ├─ lolth                                                            
  │  ├─ cpu                        450.0m   100.0m                    
  │  ├─ memory                    420.0Mi  270.0Mi                    
  │  └─ pods                          5.0      5.0                    
  ├─ mindflayer01                                                     
  │  ├─ cpu                        350.0m   100.0m                    
  │  ├─ memory                    350.0Mi  100.0Mi                    
  │  └─ pods                          4.0      4.0                    
  ├─ mindflayer02                                                     
  │  ├─ cpu                        350.0m   100.0m                    
  │  ├─ memory                    350.0Mi  100.0Mi                    
  │  └─ pods                          4.0      4.0                    
  ├─ mindflayer03                                                     
  │  ├─ cpu                        450.0m   100.0m                    
  │  ├─ memory                    420.0Mi  270.0Mi                    
  │  └─ pods                          5.0      5.0                    
  ├─ tiamat                                                           
  │  ├─ cpu                        350.0m   100.0m                    
  │  ├─ memory                    350.0Mi  100.0Mi                    
  │  └─ pods                          5.0      5.0                    
  ├─ vecna                                                            
  │  ├─ cpu                        350.0m   100.0m                    
  │  ├─ memory                    350.0Mi  100.0Mi                    
  │  └─ pods                          4.0      4.0                    
  └─ zariel                                                           
     ├─ cpu                        350.0m   100.0m                    
     ├─ memory                    350.0Mi  100.0Mi                    
     └─ pods                          4.0      4.0                    
  local-path-storage                  1.0      1.0                    
  └─ mindflayer01                     1.0      1.0                    
     └─ pods                          1.0      1.0                    
  monitoring                                                          
  ├─ lolth                            1.0      1.0                    
  │  └─ pods                          1.0      1.0                    
  └─ mindflayer01                                                     
     ├─ cpu                        500.0m      1.0                    
     ├─ memory                     500.0M    1.0Gi                    
     └─ pods                          1.0      1.0                    
  registry                                                            
  └─ mindflayer03                                                     
     ├─ cpu                        200.0m      0.0                    
     └─ pods                          2.0      2.0                    
  testing                             3.0      3.0                    
  ├─ lolth                            2.0      2.0                    
  │  └─ pods                          2.0      2.0                    
  └─ mindflayer03                     1.0      1.0                    
     └─ pods                          1.0      1.0                    
  user-adwait-deshpande                                               
  ├─ mindflayer01                                                     
  │  ├─ cpu                        100.0m      1.0                    
  │  ├─ memory                    100.0Mi    1.0Gi                    
  │  └─ pods                          1.0      1.0                    
  └─ mindflayer03                                                     
     ├─ cpu                        100.0m   500.0m                    
     ├─ memory                    100.0Mi  500.0Mi                    
     └─ pods                          1.0      1.0                    
  user-alex-chan                                                      
  └─ mindflayer02                                                     
     ├─ cpu                          10.0     16.0                    
     ├─ memory                     64.0Gi   64.0Gi                    
     └─ pods                          1.0      1.0                    
  user-angela-albi                                                    
  ├─ mindflayer02                                                     
  │  ├─ cpu                          10.0     10.0                    
  │  ├─ memory                     10.0Gi   10.0Gi                    
  │  └─ pods                          1.0      1.0                    
  └─ zariel                                                           
     ├─ cpu                          10.0     16.0                    
     ├─ memory                     30.0Gi   30.0Gi                    
     ├─ nvidia.com/gpu                1.0      1.0                    
     └─ pods                          1.0      1.0                    
  user-felix-petersen                                                 
  └─ vecna                                                            
     ├─ cpu                          16.0    100.0                    
     ├─ memory                    100.0Gi    1.6Ti                    
     ├─ nvidia.com/gpu               16.0     16.0                    
     └─ pods                          1.0      1.0                    
  user-jacob-davidson                                                 
  ├─ fierna                                                           
  │  ├─ cpu                          24.0     24.0                    
  │  ├─ memory                     64.0Gi  128.0Gi                    
  │  ├─ nvidia.com/gpu                2.0      2.0                    
  │  └─ pods                          2.0      2.0                    
  └─ mindflayer01                                                     
     ├─ cpu                         60.0m      1.0                    
     ├─ memory                    100.0Mi    1.0Gi                    
     └─ pods                          1.0      1.0                    
  user-julian-jandeleit                                               
  ├─ demogorgon                                                       
  │  ├─ cpu                          48.0     48.0                    
  │  ├─ memory                     80.0Gi   80.0Gi                    
  │  ├─ nvidia.com/gpu                1.0      1.0                    
  │  └─ pods                          1.0      1.0                    
  └─ lolth                                                            
     ├─ cpu                        100.0m      1.0                    
     ├─ memory                    100.0Mi    1.0Gi                    
     └─ pods                          1.0      1.0                    
  user-julian-welzel                                                  
  └─ zariel                                                           
     ├─ cpu                          96.0     96.0                    
     ├─ memory                    100.0Gi  400.0Gi                    
     ├─ nvidia.com/gpu                4.0      4.0                    
     └─ pods                          1.0      1.0                    
  user-julian-zimmermann                                              
  ├─ asmodeus                                                         
  │  ├─ cpu                         250.0    250.0                    
  │  ├─ memory                      1.5Ti    1.5Ti                    
  │  ├─ nvidia.com/gpu                4.0      4.0                    
  │  └─ pods                          1.0      1.0                    
  └─ demogorgon                                                       
     ├─ cpu                          16.0     16.0                    
     ├─ memory                      1.0Ti    1.0Ti                    
     ├─ nvidia.com/gpu                3.0      3.0                    
     └─ pods                          1.0      1.0                    
  user-mariya-tykhonchuk                                              
  ├─ belial                                                           
  │  ├─ cpu                           9.0     17.0                    
  │  ├─ memory                     90.0Gi  130.0Gi                    
  │  ├─ nvidia.com/gpu                3.0      3.0                    
  │  └─ pods                          3.0      3.0                    
  └─ zariel                                                           
     ├─ cpu                           4.0      8.0                    
     ├─ memory                     30.0Gi   60.0Gi                    
     ├─ nvidia.com/gpu                1.0      1.0                    
     └─ pods                          1.0      1.0                    
  user-maya-dagher                                                    
  └─ fierna                                                           
     ├─ cpu                           2.0      2.0                    
     ├─ memory                     20.0Gi   20.0Gi                    
     ├─ nvidia.com/gpu                2.0      2.0                    
     └─ pods                          2.0      2.0                    
  user-mike-battistella                                               
  └─ mindflayer03                                                     
     ├─ cpu                           1.0      1.0                    
     ├─ memory                     32.0Gi   32.0Gi                    
     └─ pods                          1.0      1.0                    
  user-oliver-wiedemann                                               
  └─ zariel                                                           
     ├─ cpu                          32.0     64.0                    
     ├─ memory                    400.0Gi  400.0Gi                    
     ├─ nvidia.com/gpu                2.0      2.0                    
     └─ pods                          1.0      1.0                    
  user-segun-aroyehun                                                 
  └─ belial                                                           
     ├─ cpu                          50.0    100.0                    
     ├─ ephemeral-storage         100.0Gi  150.0Gi                    
     ├─ memory                    250.0Gi  400.0Gi                    
     ├─ nvidia.com/gpu                1.0      1.0                    
     └─ pods                          1.0      1.0                    
  user-ulas-bingoel                                                   
  └─ demogorgon                                                       
     ├─ cpu                          24.0     24.0                    
     ├─ memory                     80.0Gi   80.0Gi                    
     ├─ nvidia.com/gpu                2.0      2.0                    
     └─ pods                          1.0      1.0                    
  user-v-time                                                         
  ├─ mindflayer01                                                     
  │  ├─ cpu                        500.0m      1.0                    
  │  ├─ memory                    256.0Mi  512.0Mi                    
  │  └─ pods                          1.0      1.0                    
  └─ mindflayer02                                                     
     ├─ cpu                           1.0      2.0                    
     ├─ memory                    512.0Mi    1.0Gi                    
     └─ pods                          1.0      1.0                    
  web                                                                 
  ├─ mindflayer01                     6.0      6.0                    
  │  └─ pods                          6.0      6.0                    
  ├─ mindflayer02                                                     
  │  ├─ cpu                        250.0m      0.0                    
  │  └─ pods                          1.0      1.0                    
  └─ mindflayer03                     1.0      1.0                    
     └─ pods                          1.0      1.0                    




Ceph file system status

  cluster:
    id:     3fee6f38-ba9f-11ec-9328-e188936dcafd
    health: HEALTH_OK
 
  services:
    mon: 5 daemons, quorum beholder03,beholder01,beholder02,mindflayer02,mindflayer03 (age 4M)
    mgr: beholder01.verxwn(active, since 8M), standbys: beholder03.nprqzk, mindflayer03.rzdvrr, mindflayer01.mkuopd, mindflayer02.ympgrs, beholder02.akktmp
    mds: 4/4 daemons up, 2 standby
    osd: 24 osds: 24 up (since 8M), 24 in (since 15M)
 
  data:
    volumes: 1/1 healthy
    pools:   3 pools, 545 pgs
    objects: 69.20M objects, 97 TiB
    usage:   195 TiB used, 86 TiB / 282 TiB avail
    pgs:     541 active+clean
             3   active+clean+scrubbing+deep
             1   active+clean+scrubbing
 
  io:
    client:   7.4 MiB/s wr, 0 op/s rd, 15 op/s wr
 
HEALTH_OK




Etcd cluster status

+------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|     ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 192.168.1.1:4252 | 7126e7a3a9cc42ca |  3.4.30 |  156 MB |     false |      false |    302053 |  642000778 |          642000778 |        |
| 192.168.1.2:4252 | 39d72894bf6c7600 |  3.4.30 |  156 MB |     false |      false |    302053 |  642000778 |          642000778 |        |
| 192.168.1.3:4252 | bbf4a2b99c3fd692 |  3.4.30 |  156 MB |     false |      false |    302053 |  642000778 |          642000778 |        |
| 192.168.2.1:4252 | 5cb9997dd1c2246b |  3.3.25 |  156 MB |      true |      false |    302053 |  642000778 |                  0 |        |
| 192.168.2.3:4252 |  cbc1cf89959ea4e |  3.3.25 |  156 MB |     false |      false |    302053 |  642000778 |                  0 |        |
+------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+




Detailed network health report

API and web servers

beholder01
  SSH port open            yes
  Report available         yes
  External interface up    ok
  InfiniBand interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  CephFS mounted           /cephfs
  Local RAID mounted       /raid
  Test pod responding      dnsutils-beholder01
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
beholder02
  SSH port open            yes
  Report available         yes
  External interface up    ok
  InfiniBand interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  CephFS mounted           /cephfs
  Local RAID mounted       /raid
  Test pod responding      dnsutils-beholder02
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
beholder03
  SSH port open            yes
  Report available         yes
  External interface up    ok
  InfiniBand interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  CephFS mounted           /cephfs
  Local RAID mounted       /raid
  Test pod responding      dnsutils-beholder03
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
lolth
  SSH port open            yes
  Report available         yes
  External interface up    ok
  InfiniBand interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  CephFS mounted           /cephfs
  Local RAID mounted       /raid
  Test pod responding      dnsutils-lolth
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes




Ceph OSD nodes

mindflayer01
  SSH port open            yes
  Report available         yes
  InfiniBand interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  CephFS mounted           /cephfs
  Local RAID mounted       /raid
  Test pod responding      dnsutils-mindflayer01
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
mindflayer02
  SSH port open            yes
  Report available         yes
  InfiniBand interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  CephFS mounted           /cephfs
  Local RAID mounted       /raid
  Test pod responding      dnsutils-mindflayer02
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
mindflayer03
  SSH port open            yes
  Report available         yes
  InfiniBand interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  CephFS mounted           /cephfs
  Local RAID mounted       /raid
  Test pod responding      dnsutils-mindflayer03
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes




Compute nodes

vecna
  SSH port open            yes
  Report available         yes
  InfiniBand interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  CephFS mounted           /cephfs/abyss
  Local RAID mounted       /raid
  Test pod responding      dnsutils-vecna
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
kiaransalee
  SSH port open            yes
  Report available         yes
  InfiniBand interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  CephFS mounted           /cephfs/abyss
  Local RAID mounted       /raid
  Test pod responding      dnsutils-kiaransalee
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
belial
  SSH port open            yes
  Report available         yes
  InfiniBand interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  CephFS mounted           /cephfs/abyss
  Local RAID mounted       /raid
  Test pod responding      dnsutils-belial
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
fierna
  SSH port open            yes
  Report available         yes
  InfiniBand interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  CephFS mounted           /cephfs/abyss
  Local RAID mounted       /raid
  Test pod responding      dnsutils-fierna
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
demogorgon
  SSH port open            yes
  Report available         yes
  InfiniBand interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  CephFS mounted           /cephfs/abyss
  Local RAID mounted       /raid
  Test pod responding      dnsutils-demogorgon
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
tiamat
  SSH port open            yes
  Report available         yes
  InfiniBand interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  CephFS mounted           /cephfs/abyss
  Local RAID mounted       /raid
  Test pod responding      dnsutils-tiamat
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
asmodeus
  SSH port open            yes
  Report available         yes
  InfiniBand interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  CephFS mounted           /cephfs/abyss
  Local RAID mounted       /raid
  Test pod responding      dnsutils-asmodeus
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
zariel
  SSH port open            yes
  Report available         yes
  InfiniBand interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  CephFS mounted           /cephfs/abyss
  Local RAID mounted       /raid
  Test pod responding      dnsutils-zariel
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes




nVidia Driver and GPU reports

dretch

belial

Thu Sep 18 14:32:51 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro RTX 6000                Off |   00000000:1B:00.0 Off |                  Off |
| 33%   31C    P8              5W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Quadro RTX 6000                Off |   00000000:1C:00.0 Off |                  Off |
| 33%   31C    P8             11W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Quadro RTX 6000                Off |   00000000:1D:00.0 Off |                  Off |
| 33%   33C    P8              7W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Quadro RTX 6000                Off |   00000000:1E:00.0 Off |                  Off |
| 42%   66C    P0            201W /  260W |    7122MiB /  24576MiB |     54%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  Quadro RTX 6000                Off |   00000000:3D:00.0 Off |                  Off |
| 33%   31C    P8             12W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  Quadro RTX 6000                Off |   00000000:3F:00.0 Off |                  Off |
| 33%   31C    P8              6W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  Quadro RTX 6000                Off |   00000000:40:00.0 Off |                  Off |
| 33%   31C    P8              4W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  Quadro RTX 6000                Off |   00000000:41:00.0 Off |                  Off |
| 33%   32C    P8              4W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    3   N/A  N/A   2389981      C   python                                       7118MiB |
+-----------------------------------------------------------------------------------------+

fierna

Thu Sep 18 14:32:53 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro RTX 6000                Off |   00000000:1B:00.0 Off |                  Off |
| 33%   48C    P0            101W /  260W |   13073MiB /  24576MiB |     49%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Quadro RTX 6000                Off |   00000000:1C:00.0 Off |                  Off |
| 33%   43C    P0            125W /  260W |   13069MiB /  24576MiB |     98%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Quadro RTX 6000                Off |   00000000:1D:00.0 Off |                  Off |
| 33%   33C    P8              4W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Quadro RTX 6000                Off |   00000000:1E:00.0 Off |                  Off |
| 33%   31C    P8              5W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  Quadro RTX 6000                Off |   00000000:3D:00.0 Off |                  Off |
| 33%   30C    P8             15W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  Quadro RTX 6000                Off |   00000000:3F:00.0 Off |                  Off |
| 33%   31C    P8              4W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  Quadro RTX 6000                Off |   00000000:40:00.0 Off |                  Off |
| 33%   31C    P8             12W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  Quadro RTX 6000                Off |   00000000:41:00.0 Off |                  Off |
| 33%   31C    P8             16W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A   2862971      C   python                                       4358MiB |
|    0   N/A  N/A   2862972      C   python                                       4356MiB |
|    0   N/A  N/A   2862973      C   python                                       4356MiB |
|    1   N/A  N/A   2862768      C   python                                       4356MiB |
|    1   N/A  N/A   2862769      C   python                                       4356MiB |
|    1   N/A  N/A   2862770      C   python                                       4354MiB |
+-----------------------------------------------------------------------------------------+

tiamat

Thu Sep 18 14:32:55 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-40GB          Off |   00000000:01:00.0 Off |                    0 |
| N/A   27C    P0             50W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB          Off |   00000000:41:00.0 Off |                    0 |
| N/A   27C    P0             52W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100-SXM4-40GB          Off |   00000000:81:00.0 Off |                    0 |
| N/A   26C    P0             49W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A100-SXM4-40GB          Off |   00000000:C1:00.0 Off |                    0 |
| N/A   25C    P0             50W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

vecna

Thu Sep 18 16:32:58 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100-SXM3-32GB           On  |   00000000:34:00.0 Off |                    0 |
| N/A   33C    P0             48W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla V100-SXM3-32GB           On  |   00000000:36:00.0 Off |                    0 |
| N/A   33C    P0             47W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Tesla V100-SXM3-32GB           On  |   00000000:39:00.0 Off |                    0 |
| N/A   35C    P0             53W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Tesla V100-SXM3-32GB           On  |   00000000:3B:00.0 Off |                    0 |
| N/A   36C    P0             51W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  Tesla V100-SXM3-32GB           On  |   00000000:57:00.0 Off |                    0 |
| N/A   33C    P0             48W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  Tesla V100-SXM3-32GB           On  |   00000000:59:00.0 Off |                    0 |
| N/A   36C    P0             53W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  Tesla V100-SXM3-32GB           On  |   00000000:5C:00.0 Off |                    0 |
| N/A   34C    P0             51W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  Tesla V100-SXM3-32GB           On  |   00000000:5E:00.0 Off |                    0 |
| N/A   38C    P0             53W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   8  Tesla V100-SXM3-32GB           On  |   00000000:B7:00.0 Off |                    0 |
| N/A   34C    P0             50W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   9  Tesla V100-SXM3-32GB           On  |   00000000:B9:00.0 Off |                    0 |
| N/A   33C    P0             49W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|  10  Tesla V100-SXM3-32GB           On  |   00000000:BC:00.0 Off |                    0 |
| N/A   36C    P0             51W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|  11  Tesla V100-SXM3-32GB           On  |   00000000:BE:00.0 Off |                    0 |
| N/A   38C    P0             48W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|  12  Tesla V100-SXM3-32GB           On  |   00000000:E0:00.0 Off |                    0 |
| N/A   36C    P0             48W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|  13  Tesla V100-SXM3-32GB           On  |   00000000:E2:00.0 Off |                    0 |
| N/A   36C    P0             49W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|  14  Tesla V100-SXM3-32GB           On  |   00000000:E5:00.0 Off |                    0 |
| N/A   39C    P0             51W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|  15  Tesla V100-SXM3-32GB           On  |   00000000:E7:00.0 Off |                    0 |
| N/A   38C    P0             49W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

asmodeus

Thu Sep 18 14:33:00 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.15              Driver Version: 570.86.15      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  |   00000000:01:00.0 Off |                    0 |
| N/A   27C    P0             62W /  500W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          On  |   00000000:41:00.0 Off |                    0 |
| N/A   29C    P0             58W /  500W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100-SXM4-80GB          On  |   00000000:81:00.0 Off |                    0 |
| N/A   27C    P0             60W /  500W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A100-SXM4-80GB          On  |   00000000:C1:00.0 Off |                    0 |
| N/A   26C    P0             60W /  500W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

zariel

Thu Sep 18 16:33:02 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.03             Driver Version: 535.216.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB          On  | 00000000:07:00.0 Off |                    0 |
| N/A   31C    P0              55W / 400W |      3MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB          On  | 00000000:0F:00.0 Off |                    0 |
| N/A   30C    P0              52W / 400W |      3MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM4-40GB          On  | 00000000:47:00.0 Off |                    0 |
| N/A   28C    P0              51W / 400W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM4-40GB          On  | 00000000:4E:00.0 Off |                    0 |
| N/A   28C    P0              52W / 400W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-SXM4-40GB          On  | 00000000:87:00.0 Off |                    0 |
| N/A   35C    P0              57W / 400W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-SXM4-40GB          On  | 00000000:90:00.0 Off |                    0 |
| N/A   32C    P0              51W / 400W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-SXM4-40GB          On  | 00000000:B7:00.0 Off |                    0 |
| N/A   32C    P0              51W / 400W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-SXM4-40GB          On  | 00000000:BD:00.0 Off |                    0 |
| N/A   48C    P0              86W / 400W |  11745MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    7   N/A  N/A    880308      C   python                                    11736MiB |
+---------------------------------------------------------------------------------------+

demogorgon

Thu Sep 18 14:33:06 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40                     On  |   00000000:01:00.0 Off |                    0 |
|  0%   43C    P0             91W /  300W |    3417MiB /  46068MiB |     19%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A40                     On  |   00000000:25:00.0 Off |                    0 |
|  0%   29C    P8             24W /  300W |       4MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A40                     On  |   00000000:41:00.0 Off |                    0 |
|  0%   30C    P8             31W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A40                     On  |   00000000:61:00.0 Off |                    0 |
|  0%   28C    P8             24W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA A40                     On  |   00000000:81:00.0 Off |                    0 |
|  0%   30C    P8             24W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA A40                     On  |   00000000:A1:00.0 Off |                    0 |
|  0%   29C    P8             24W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA A40                     On  |   00000000:C1:00.0 Off |                    0 |
|  0%   31C    P8             24W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA A40                     On  |   00000000:E1:00.0 Off |                    0 |
|  0%   31C    P8             24W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A   3982942      C   python                                       3408MiB |
+-----------------------------------------------------------------------------------------+

kiaransalee

Thu Sep 18 14:33:09 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08             Driver Version: 550.127.08     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          On  |   00000000:26:00.0 Off |                    0 |
| N/A   60C    P0            635W /  700W |   66514MiB /  81559MiB |     86%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H100 80GB HBM3          On  |   00000000:2F:00.0 Off |                    0 |
| N/A   50C    P0            282W /  700W |   22237MiB /  81559MiB |    100%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H100 80GB HBM3          On  |   00000000:46:00.0 Off |                    0 |
| N/A   56C    P0            401W /  700W |   29810MiB /  81559MiB |     83%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H100 80GB HBM3          On  |   00000000:54:00.0 Off |                    0 |
| N/A   65C    P0            655W /  700W |   65570MiB /  81559MiB |     91%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H100 80GB HBM3          On  |   00000000:A6:00.0 Off |                    0 |
| N/A   68C    P0            351W /  700W |   10504MiB /  81559MiB |     83%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H100 80GB HBM3          On  |   00000000:AF:00.0 Off |                    0 |
| N/A   30C    P0             75W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H100 80GB HBM3          On  |   00000000:C6:00.0 Off |                   On |
| N/A   31C    P0             76W /  700W |      89MiB /  81559MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H100 80GB HBM3          On  |   00000000:CF:00.0 Off |                   On |
| N/A   29C    P0            122W /  700W |    1138MiB /  81559MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices:                                                                            |
+------------------+----------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                     Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                  |        ECC|                       |
|==================+==================================+===========+=======================|
|  6    1   0   0  |              51MiB / 40320MiB    | 64      0 |  4   0    4    0    4 |
|                  |                 0MiB / 65535MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  6    2   0   1  |              38MiB / 40320MiB    | 60      0 |  3   0    3    0    3 |
|                  |                 0MiB / 65535MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  7    7   0   0  |              13MiB /  9984MiB    | 16      0 |  1   0    1    0    1 |
|                  |                 0MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  7    8   0   1  |              13MiB /  9984MiB    | 16      0 |  1   0    1    0    1 |
|                  |                 0MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  7    9   0   2  |              13MiB /  9984MiB    | 16      0 |  1   0    1    0    1 |
|                  |                 0MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  7   11   0   3  |              13MiB /  9984MiB    | 16      0 |  1   0    1    0    1 |
|                  |                 0MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  7   12   0   4  |              13MiB /  9984MiB    | 16      0 |  1   0    1    0    1 |
|                  |                 0MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  7   13   0   5  |              13MiB /  9984MiB    | 16      0 |  1   0    1    0    1 |
|                  |                 0MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  7   14   0   6  |            1061MiB /  9984MiB    | 16      0 |  1   0    1    0    1 |
|                  |                 6MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A    983592      C   VLLM::EngineCore                            66504MiB |
|    1   N/A  N/A   3833152      C   python                                       5846MiB |
|    1   N/A  N/A   3836635      C   python                                      16376MiB |
|    2   N/A  N/A    816691      C   python                                      29800MiB |
|    3   N/A  N/A    913477      C   /opt/conda/bin/python                       65560MiB |
|    4   N/A  N/A    975701      C   ...f-mars-experiment/ollama/bin/ollama      10494MiB |
|    7   14    0     888634      C   ...nvironments/smda-project/bin/python        398MiB |
|    7   14    0     952670      C   ...nvironments/smda-project/bin/python        384MiB |
|    7   14    0     979314      C   ...nvironments/smda-project/bin/python        248MiB |
+-----------------------------------------------------------------------------------------+