Cluster status report from beholder02, Fri, 10 Jan 2025 07:42:01 +0000

Resource usage (overall)
Resource usage (by namespace)
Ceph file system status
Etcd cluster status
Detailed network health report
NVIDIA driver and GPU reports
  Imp
  Dretch
  Belial
  Fierna
  Tiamat
  Vecna
  Asmodeus
  Zariel
  Demogorgon

Resource usage (overall)

 Resource                                                  Requested          Limit  Allocatable      Free 
  cpu                                                    (14%) 223.8    (22%) 363.5         1.6k      1.3k 
  ├─ asmodeus                                            (0%) 350.0m    (0%) 100.0m        256.0     255.7 
  │  ├─ dnsutils-asmodeus                                     100.0m         100.0m                        
  │  └─ kube-router-jzz5s                                     250.0m            0.0                        
  ├─ beholder01                                          (4%) 900.0m    (0%) 100.0m         24.0      23.1 
  │  ├─ dnsutils-beholder01                                   100.0m         100.0m                        
  │  ├─ kube-apiserver-beholder01                             250.0m            0.0                        
  │  ├─ kube-controller-manager-beholder01                    200.0m            0.0                        
  │  ├─ kube-router-9skrf                                     250.0m            0.0                        
  │  └─ kube-scheduler-beholder01                             100.0m            0.0                        
  ├─ beholder02                                             (8%) 1.9       (9%) 2.1         24.0      21.9 
  │  ├─ dnsutils-beholder02                                   100.0m         100.0m                        
  │  ├─ kube-apiserver-beholder02                             250.0m            0.0                        
  │  ├─ kube-controller-manager-beholder02                    200.0m            0.0                        
  │  ├─ kube-router-bqwwt                                     250.0m            0.0                        
  │  ├─ kube-scheduler-beholder02                             100.0m            0.0                        
  │  ├─ wordpress-8c8944c8d-cxbrq                             500.0m            1.0                        
  │  └─ wordpress-mariadb-565c547dc8-59fhr                    500.0m            1.0                        
  ├─ beholder03                                          (4%) 900.0m    (0%) 100.0m         24.0      23.1 
  │  ├─ dnsutils-beholder03                                   100.0m         100.0m                        
  │  ├─ kube-apiserver-beholder03                             250.0m            0.0                        
  │  ├─ kube-controller-manager-beholder03                    200.0m            0.0                        
  │  ├─ kube-router-vv4gb                                     250.0m            0.0                        
  │  └─ kube-scheduler-beholder03                             100.0m            0.0                        
  ├─ belial                                              (0%) 350.0m    (0%) 100.0m         80.0      79.7 
  │  ├─ dnsutils-belial                                       100.0m         100.0m                        
  │  └─ kube-router-2whgq                                     250.0m            0.0                        
  ├─ demogorgon                                          (0%) 350.0m    (0%) 100.0m         96.0      95.7 
  │  ├─ dnsutils-demogorgon                                   100.0m         100.0m                        
  │  └─ kube-router-69fp9                                     250.0m            0.0                        
  ├─ fierna                                              (0%) 350.0m    (0%) 100.0m         80.0      79.7 
  │  ├─ dnsutils-fierna                                       100.0m         100.0m                        
  │  └─ kube-router-9mwhj                                     250.0m            0.0                        
  ├─ kiaransalee                                          (47%) 90.3    (73%) 140.1        192.0      51.9 
  │  ├─ bash-pod                                                 8.0            8.0                        
  │  ├─ dnsutils-kiaransalee                                  100.0m         100.0m                        
  │  ├─ kube-router-jcv4r                                     250.0m            0.0                        
  │  ├─ tgi70b-pod-565b6cc77c-d4v64                             32.0           32.0                        
  │  └─ ubuntu-gpu1                                             50.0          100.0                        
  ├─ lolth                                               (1%) 450.0m       (3%) 1.1         40.0      38.9 
  │  ├─ dnsutils-lolth                                        100.0m         100.0m                        
  │  ├─ gatekeeper-controller-manager-66f474f785-czcr2        100.0m            1.0                        
  │  └─ kube-router-l46ld                                     250.0m            0.0                        
  ├─ mindflayer01                                           (2%) 1.4       (2%) 1.1         64.0      62.6 
  │  ├─ dex-65cb7757f7-hwlbt                                  250.0m            0.0                        
  │  ├─ dex-loginapp-fd64f94fc-v4sh9                          100.0m            0.0                        
  │  ├─ dnsutils-mindflayer01                                 100.0m         100.0m                        
  │  ├─ kube-router-nlrkl                                     250.0m            0.0                        
  │  ├─ mariadb-849498cb44-dldtm                              500.0m            1.0                        
  │  ├─ system-registry-8877cd57-gt8rg                        100.0m            0.0                        
  │  └─ user-registry-64bb8ff7cf-rvzbz                        100.0m            0.0                        
  ├─ mindflayer02                                         (19%) 11.9     (28%) 18.1         64.0      45.9 
  │  ├─ cool-pod                                                10.0           16.0                        
  │  ├─ dex-mysql-66b5885f7b-5h8hw                            100.0m            0.0                        
  │  ├─ dnsutils-mindflayer02                                 100.0m         100.0m                        
  │  ├─ kube-router-g446k                                     250.0m            0.0                        
  │  ├─ mediawiki-mariadb-5cc4866855-xffs4                    250.0m            0.0                        
  │  ├─ phpfpm-nginx-5db96d6895-97f2g                            1.0            2.0                        
  │  ├─ registry-65974f8f46-lbkhc                             100.0m            0.0                        
  │  └─ registry-auth-5c65dbf9dd-82qq7                        100.0m            0.0                        
  ├─ mindflayer03                                        (1%) 800.0m       (2%) 1.1         64.0      62.9 
  │  ├─ coredns-55cb58b774-4kpwk                              100.0m            0.0                        
  │  ├─ dnsutils-mindflayer03                                 100.0m         100.0m                        
  │  ├─ gatekeeper-controller-manager-66f474f785-5gv4n        100.0m            1.0                        
  │  ├─ kube-router-fjphj                                     250.0m            0.0                        
  │  └─ ldap-6987986dbc-2zr2c                                 250.0m            0.0                        
  ├─ tiamat                                                 (0%) 1.1       (1%) 3.1        256.0     252.9 
  │  ├─ coredns-55cb58b774-tt74d                              100.0m            0.0                        
  │  ├─ dnsutils-tiamat                                       100.0m         100.0m                        
  │  ├─ gatekeeper-audit-59d4b6fd4c-4qvjn                     100.0m            1.0                        
  │  ├─ gatekeeper-controller-manager-66f474f785-ndtdl        100.0m            1.0                        
  │  ├─ kube-router-bxj9w                                     250.0m            0.0                        
  │  └─ prometheus-deployment-b65d5d898-nfrw9                 500.0m            1.0                        
  ├─ vecna                                                (17%) 16.4   (104%) 100.1         96.0       0.0 
  │  ├─ dnsutils-vecna                                        100.0m         100.0m                        
  │  ├─ felix-petersen-job-20                                   16.0          100.0                        
  │  └─ kube-router-nzkdb                                     250.0m            0.0                        
  └─ zariel                                               (38%) 96.3     (38%) 96.1        256.0     159.7 
     ├─ dnsutils-zariel                                       100.0m         100.0m                        
     ├─ julian-welzel-job-1                                     96.0           96.0                        
     └─ kube-router-2crh9                                     250.0m            0.0                        
  ephemeral-storage                                         (0%) 0.0       (0%) 0.0        11.9T     11.9T 
  ├─ asmodeus                                               (0%) 0.0       (0%) 0.0        94.6G     94.6G 
  ├─ beholder01                                             (0%) 0.0       (0%) 0.0         1.7T      1.7T 
  ├─ beholder02                                             (0%) 0.0       (0%) 0.0         1.7T      1.7T 
  ├─ beholder03                                             (0%) 0.0       (0%) 0.0         1.7T      1.7T 
  ├─ belial                                                 (0%) 0.0       (0%) 0.0       189.2G    189.2G 
  ├─ demogorgon                                             (0%) 0.0       (0%) 0.0       706.7G    706.7G 
  ├─ fierna                                                 (0%) 0.0       (0%) 0.0       189.2G    189.2G 
  ├─ kiaransalee                                            (0%) 0.0       (0%) 0.0         1.7T      1.7T 
  ├─ lolth                                                  (0%) 0.0       (0%) 0.0       530.5G    530.5G 
  ├─ mindflayer01                                           (0%) 0.0       (0%) 0.0       211.5G    211.5G 
  ├─ mindflayer02                                           (0%) 0.0       (0%) 0.0       211.5G    211.5G 
  ├─ mindflayer03                                           (0%) 0.0       (0%) 0.0       211.5G    211.5G 
  ├─ tiamat                                                 (0%) 0.0       (0%) 0.0       164.4G    164.4G 
  ├─ vecna                                                  (0%) 0.0       (0%) 0.0       849.0G    849.0G 
  └─ zariel                                                 (0%) 0.0       (0%) 0.0         1.7T      1.7T 
  memory                                                 (6%) 836.4G    (21%) 2.7Ti       12.9Ti    10.2Ti 
  ├─ asmodeus                                           (0%) 350.0Mi   (0%) 100.0Mi        2.0Ti     2.0Ti 
  │  ├─ dnsutils-asmodeus                                    100.0Mi        100.0Mi                        
  │  └─ kube-router-jzz5s                                    250.0Mi            0.0                        
  ├─ beholder01                                         (0%) 350.0Mi   (0%) 100.0Mi       92.9Gi    92.5Gi 
  │  ├─ dnsutils-beholder01                                  100.0Mi        100.0Mi                        
  │  └─ kube-router-9skrf                                    250.0Mi            0.0                        
  ├─ beholder02                                         (1%) 862.0Mi     (1%) 1.1Gi       92.9Gi    91.8Gi 
  │  ├─ dnsutils-beholder02                                  100.0Mi        100.0Mi                        
  │  ├─ kube-router-bqwwt                                    250.0Mi            0.0                        
  │  ├─ wordpress-8c8944c8d-cxbrq                            256.0Mi        512.0Mi                        
  │  └─ wordpress-mariadb-565c547dc8-59fhr                   256.0Mi        512.0Mi                        
  ├─ beholder03                                         (0%) 350.0Mi   (0%) 100.0Mi       92.9Gi    92.5Gi 
  │  ├─ dnsutils-beholder03                                  100.0Mi        100.0Mi                        
  │  └─ kube-router-vv4gb                                    250.0Mi            0.0                        
  ├─ belial                                             (0%) 350.0Mi   (0%) 100.0Mi      754.4Gi   754.1Gi 
  │  ├─ dnsutils-belial                                      100.0Mi        100.0Mi                        
  │  └─ kube-router-2whgq                                    250.0Mi            0.0                        
  ├─ demogorgon                                         (0%) 350.0Mi   (0%) 100.0Mi        2.0Ti     2.0Ti 
  │  ├─ dnsutils-demogorgon                                  100.0Mi        100.0Mi                        
  │  └─ kube-router-69fp9                                    250.0Mi            0.0                        
  ├─ fierna                                             (0%) 350.0Mi   (0%) 100.0Mi      754.4Gi   754.1Gi 
  │  ├─ dnsutils-fierna                                      100.0Mi        100.0Mi                        
  │  └─ kube-router-9mwhj                                    250.0Mi            0.0                        
  ├─ kiaransalee                                        (34%) 544.8G  (43%) 656.1Gi        1.5Ti   855.3Gi 
  │  ├─ bash-pod                                              16.0Gi         16.0Gi                        
  │  ├─ dnsutils-kiaransalee                                 100.0Mi        100.0Mi                        
  │  ├─ jupyter-elenasolar                                      1.1G            0.0                        
  │  ├─ kube-router-jcv4r                                    250.0Mi            0.0                        
  │  ├─ tgi70b-pod-565b6cc77c-d4v64                          240.0Gi        240.0Gi                        
  │  └─ ubuntu-gpu1                                          250.0Gi        400.0Gi                        
  ├─ lolth                                              (0%) 606.0Mi   (0%) 612.0Mi      251.4Gi   250.8Gi 
  │  ├─ dnsutils-lolth                                       100.0Mi        100.0Mi                        
  │  ├─ gatekeeper-controller-manager-66f474f785-czcr2       256.0Mi        512.0Mi                        
  │  └─ kube-router-l46ld                                    250.0Mi            0.0                        
  ├─ mindflayer01                                       (0%) 606.0Mi   (0%) 612.0Mi      376.5Gi   375.9Gi 
  │  ├─ dnsutils-mindflayer01                                100.0Mi        100.0Mi                        
  │  ├─ kube-router-nlrkl                                    250.0Mi            0.0                        
  │  └─ mariadb-849498cb44-dldtm                             256.0Mi        512.0Mi                        
  ├─ mindflayer02                                       (17%) 64.8Gi   (17%) 65.1Gi      376.5Gi   311.4Gi 
  │  ├─ cool-pod                                              64.0Gi         64.0Gi                        
  │  ├─ dnsutils-mindflayer02                                100.0Mi        100.0Mi                        
  │  ├─ kube-router-g446k                                    250.0Mi            0.0                        
  │  └─ phpfpm-nginx-5db96d6895-97f2g                        512.0Mi          1.0Gi                        
  ├─ mindflayer03                                       (0%) 676.0Mi   (0%) 782.0Mi      376.5Gi   375.7Gi 
  │  ├─ coredns-55cb58b774-4kpwk                              70.0Mi        170.0Mi                        
  │  ├─ dnsutils-mindflayer03                                100.0Mi        100.0Mi                        
  │  ├─ gatekeeper-controller-manager-66f474f785-5gv4n       256.0Mi        512.0Mi                        
  │  └─ kube-router-fjphj                                    250.0Mi            0.0                        
  ├─ tiamat                                                (0%) 1.5G     (0%) 2.3Gi     1007.6Gi  1005.3Gi 
  │  ├─ coredns-55cb58b774-tt74d                              70.0Mi        170.0Mi                        
  │  ├─ dnsutils-tiamat                                      100.0Mi        100.0Mi                        
  │  ├─ gatekeeper-audit-59d4b6fd4c-4qvjn                    256.0Mi        512.0Mi                        
  │  ├─ gatekeeper-controller-manager-66f474f785-ndtdl       256.0Mi        512.0Mi                        
  │  ├─ kube-router-bxj9w                                    250.0Mi            0.0                        
  │  └─ prometheus-deployment-b65d5d898-nfrw9                 500.0M          1.0Gi                        
  ├─ vecna                                              (7%) 100.3Gi   (106%) 1.6Ti        1.5Ti       0.0 
  │  ├─ dnsutils-vecna                                       100.0Mi        100.0Mi                        
  │  ├─ felix-petersen-job-20                                100.0Gi          1.6Ti                        
  │  └─ kube-router-nzkdb                                    250.0Mi            0.0                        
  └─ zariel                                             (5%) 100.3Gi  (20%) 400.1Gi        2.0Ti     1.6Ti 
     ├─ dnsutils-zariel                                      100.0Mi        100.0Mi                        
     ├─ julian-welzel-job-1                                  100.0Gi        400.0Gi                        
     └─ kube-router-2crh9                                    250.0Mi            0.0                        
  nvidia.com/gpu                                          (41%) 26.0     (41%) 26.0         63.0      37.0 
  ├─ asmodeus                                               (0%) 0.0       (0%) 0.0          4.0       4.0 
  ├─ belial                                                 (0%) 0.0       (0%) 0.0          8.0       8.0 
  ├─ demogorgon                                             (0%) 0.0       (0%) 0.0          8.0       8.0 
  ├─ fierna                                                 (0%) 0.0       (0%) 0.0          8.0       8.0 
  ├─ kiaransalee                                           (86%) 6.0      (86%) 6.0          7.0       1.0 
  │  ├─ jupyter-elenasolar                                       1.0            1.0                        
  │  ├─ tgi70b-pod-565b6cc77c-d4v64                              4.0            4.0                        
  │  └─ ubuntu-gpu1                                              1.0            1.0                        
  ├─ tiamat                                                 (0%) 0.0       (0%) 0.0          4.0       4.0 
  ├─ vecna                                               (100%) 16.0    (100%) 16.0         16.0       0.0 
  │  └─ felix-petersen-job-20                                   16.0           16.0                        
  └─ zariel                                                (50%) 4.0      (50%) 4.0          8.0       4.0 
     └─ julian-welzel-job-1                                      4.0            4.0                        
  nvidia.com/mig-1g.10gb                                    (0%) 0.0       (0%) 0.0          7.0       7.0 
  └─ kiaransalee                                            (0%) 0.0       (0%) 0.0          7.0       7.0 
  pods                                                    (7%) 117.0     (7%) 117.0         1.6k      1.5k 
  ├─ asmodeus                                               (4%) 4.0       (4%) 4.0        110.0     106.0 
  │  ├─ dnsutils-asmodeus                                        1.0            1.0                        
  │  ├─ kube-proxy-rbzdb                                         1.0            1.0                        
  │  ├─ kube-router-jzz5s                                        1.0            1.0                        
  │  └─ nvidia-device-plugin-daemonset-qlq9x                     1.0            1.0                        
  ├─ beholder01                                             (5%) 6.0       (5%) 6.0        110.0     104.0 
  │  ├─ dnsutils-beholder01                                      1.0            1.0                        
  │  ├─ kube-apiserver-beholder01                                1.0            1.0                        
  │  ├─ kube-controller-manager-beholder01                       1.0            1.0                        
  │  ├─ kube-proxy-7f5v4                                         1.0            1.0                        
  │  ├─ kube-router-9skrf                                        1.0            1.0                        
  │  └─ kube-scheduler-beholder01                                1.0            1.0                        
  ├─ beholder02                                             (7%) 8.0       (7%) 8.0        110.0     102.0 
  │  ├─ dnsutils-beholder02                                      1.0            1.0                        
  │  ├─ kube-apiserver-beholder02                                1.0            1.0                        
  │  ├─ kube-controller-manager-beholder02                       1.0            1.0                        
  │  ├─ kube-proxy-dlc5n                                         1.0            1.0                        
  │  ├─ kube-router-bqwwt                                        1.0            1.0                        
  │  ├─ kube-scheduler-beholder02                                1.0            1.0                        
  │  ├─ wordpress-8c8944c8d-cxbrq                                1.0            1.0                        
  │  └─ wordpress-mariadb-565c547dc8-59fhr                       1.0            1.0                        
  ├─ beholder03                                             (5%) 6.0       (5%) 6.0        110.0     104.0 
  │  ├─ dnsutils-beholder03                                      1.0            1.0                        
  │  ├─ kube-apiserver-beholder03                                1.0            1.0                        
  │  ├─ kube-controller-manager-beholder03                       1.0            1.0                        
  │  ├─ kube-proxy-jqcj7                                         1.0            1.0                        
  │  ├─ kube-router-vv4gb                                        1.0            1.0                        
  │  └─ kube-scheduler-beholder03                                1.0            1.0                        
  ├─ belial                                                 (5%) 5.0       (5%) 5.0        110.0     105.0 
  │  ├─ dnsutils-belial                                          1.0            1.0                        
  │  ├─ gpu-feature-discovery-5ft45                              1.0            1.0                        
  │  ├─ kube-proxy-nsltd                                         1.0            1.0                        
  │  ├─ kube-router-2whgq                                        1.0            1.0                        
  │  └─ nvidia-device-plugin-daemonset-sc992                     1.0            1.0                        
  ├─ demogorgon                                             (4%) 4.0       (4%) 4.0        110.0     106.0 
  │  ├─ dnsutils-demogorgon                                      1.0            1.0                        
  │  ├─ kube-proxy-gknqq                                         1.0            1.0                        
  │  ├─ kube-router-69fp9                                        1.0            1.0                        
  │  └─ nvidia-device-plugin-daemonset-7w6d2                     1.0            1.0                        
  ├─ fierna                                                 (5%) 5.0       (5%) 5.0        110.0     105.0 
  │  ├─ dnsutils-fierna                                          1.0            1.0                        
  │  ├─ gpu-feature-discovery-dkzfn                              1.0            1.0                        
  │  ├─ kube-proxy-9tg4f                                         1.0            1.0                        
  │  ├─ kube-router-9mwhj                                        1.0            1.0                        
  │  └─ nvidia-device-plugin-daemonset-tvr8l                     1.0            1.0                        
  ├─ kiaransalee                                           (9%) 10.0      (9%) 10.0        110.0     100.0 
  │  ├─ bash-pod                                                 1.0            1.0                        
  │  ├─ continuous-image-puller-sb47g                            1.0            1.0                        
  │  ├─ dnsutils-kiaransalee                                     1.0            1.0                        
  │  ├─ gpu-feature-discovery-2dqmt                              1.0            1.0                        
  │  ├─ jupyter-elenasolar                                       1.0            1.0                        
  │  ├─ kube-proxy-65675                                         1.0            1.0                        
  │  ├─ kube-router-jcv4r                                        1.0            1.0                        
  │  ├─ nvidia-device-plugin-daemonset-tmgl5                     1.0            1.0                        
  │  ├─ tgi70b-pod-565b6cc77c-d4v64                              1.0            1.0                        
  │  └─ ubuntu-gpu1                                              1.0            1.0                        
  ├─ lolth                                                  (8%) 9.0       (8%) 9.0        110.0     101.0 
  │  ├─ dnsutils-lolth                                           1.0            1.0                        
  │  ├─ echo1-77fbfb54d-hdspr                                    1.0            1.0                        
  │  ├─ gatekeeper-controller-manager-66f474f785-czcr2           1.0            1.0                        
  │  ├─ hub-57d6577bf9-5z5md                                     1.0            1.0                        
  │  ├─ kube-proxy-cl9gc                                         1.0            1.0                        
  │  ├─ kube-router-l46ld                                        1.0            1.0                        
  │  ├─ local-path-provisioner-759479454f-8kh6v                  1.0            1.0                        
  │  ├─ nvidia-device-plugin-daemonset-xqrc5                     1.0            1.0                        
  │  └─ user-scheduler-c7db6c584-kcb95                           1.0            1.0                        
  ├─ mindflayer01                                         (15%) 17.0     (15%) 17.0        110.0      93.0 
  │  ├─ dex-65cb7757f7-hwlbt                                     1.0            1.0                        
  │  ├─ dex-loginapp-fd64f94fc-v4sh9                             1.0            1.0                        
  │  ├─ dnsutils-mindflayer01                                    1.0            1.0                        
  │  ├─ kube-proxy-d2xl4                                         1.0            1.0                        
  │  ├─ kube-router-nlrkl                                        1.0            1.0                        
  │  ├─ mariadb-849498cb44-dldtm                                 1.0            1.0                        
  │  ├─ memcached-578474d6f9-8dgb8                               1.0            1.0                        
  │  ├─ nginx-frontend-bc5b65957-s2gc2                           1.0            1.0                        
  │  ├─ nginx-ip-2023-78b9c84dbf-znhrx                           1.0            1.0                        
  │  ├─ nginx-k8s-7c8d949b5f-grwnx                               1.0            1.0                        
  │  ├─ nginx-rec-2023-79df8c77cd-gg87f                          1.0            1.0                        
  │  ├─ nginx-rsn-2024-7dbc6b668b-jbjk7                          1.0            1.0                        
  │  ├─ nginx-self-service-password-65b44f7547-dhvwp             1.0            1.0                        
  │  ├─ nvidia-device-plugin-daemonset-c5mbb                     1.0            1.0                        
  │  ├─ pdf-b647544f-snvmm                                       1.0            1.0                        
  │  ├─ system-registry-8877cd57-gt8rg                           1.0            1.0                        
  │  └─ user-registry-64bb8ff7cf-rvzbz                           1.0            1.0                        
  ├─ mindflayer02                                          (9%) 10.0      (9%) 10.0        110.0     100.0 
  │  ├─ cool-pod                                                 1.0            1.0                        
  │  ├─ dex-mysql-66b5885f7b-5h8hw                               1.0            1.0                        
  │  ├─ dnsutils-mindflayer02                                    1.0            1.0                        
  │  ├─ kube-proxy-zbwk4                                         1.0            1.0                        
  │  ├─ kube-router-g446k                                        1.0            1.0                        
  │  ├─ mediawiki-mariadb-5cc4866855-xffs4                       1.0            1.0                        
  │  ├─ nvidia-device-plugin-daemonset-l6hx2                     1.0            1.0                        
  │  ├─ phpfpm-nginx-5db96d6895-97f2g                            1.0            1.0                        
  │  ├─ registry-65974f8f46-lbkhc                                1.0            1.0                        
  │  └─ registry-auth-5c65dbf9dd-82qq7                           1.0            1.0                        
  ├─ mindflayer03                                           (7%) 8.0       (7%) 8.0        110.0     102.0 
  │  ├─ coredns-55cb58b774-4kpwk                                 1.0            1.0                        
  │  ├─ dnsutils-mindflayer03                                    1.0            1.0                        
  │  ├─ echo1-77fbfb54d-rcwzd                                    1.0            1.0                        
  │  ├─ gatekeeper-controller-manager-66f474f785-5gv4n           1.0            1.0                        
  │  ├─ kube-proxy-b68xq                                         1.0            1.0                        
  │  ├─ kube-router-fjphj                                        1.0            1.0                        
  │  ├─ ldap-6987986dbc-2zr2c                                    1.0            1.0                        
  │  └─ nvidia-device-plugin-daemonset-fnkln                     1.0            1.0                        
  ├─ tiamat                                               (14%) 15.0     (14%) 15.0        110.0      95.0 
  │  ├─ coredns-55cb58b774-tt74d                                 1.0            1.0                        
  │  ├─ dnsutils-tiamat                                          1.0            1.0                        
  │  ├─ echo1-77fbfb54d-67t9q                                    1.0            1.0                        
  │  ├─ echo1-77fbfb54d-t79hv                                    1.0            1.0                        
  │  ├─ echo2-5d58759df-2zc5c                                    1.0            1.0                        
  │  ├─ gatekeeper-audit-59d4b6fd4c-4qvjn                        1.0            1.0                        
  │  ├─ gatekeeper-controller-manager-66f474f785-ndtdl           1.0            1.0                        
  │  ├─ gpu-feature-discovery-kbcdl                              1.0            1.0                        
  │  ├─ kube-proxy-nbjvl                                         1.0            1.0                        
  │  ├─ kube-router-bxj9w                                        1.0            1.0                        
  │  ├─ kube-state-metrics-8945855d-hn5v2                        1.0            1.0                        
  │  ├─ nvidia-device-plugin-daemonset-8r88p                     1.0            1.0                        
  │  ├─ prometheus-deployment-b65d5d898-nfrw9                    1.0            1.0                        
  │  ├─ proxy-5bc89cc587-jvkxw                                   1.0            1.0                        
  │  └─ user-scheduler-c7db6c584-k2cgj                           1.0            1.0                        
  ├─ vecna                                                  (5%) 5.0       (5%) 5.0        110.0     105.0 
  │  ├─ dnsutils-vecna                                           1.0            1.0                        
  │  ├─ felix-petersen-job-20                                    1.0            1.0                        
  │  ├─ kube-proxy-djtx4                                         1.0            1.0                        
  │  ├─ kube-router-nzkdb                                        1.0            1.0                        
  │  └─ nvidia-device-plugin-daemonset-8zv6j                     1.0            1.0                        
  └─ zariel                                                 (5%) 5.0       (5%) 5.0        110.0     105.0 
     ├─ dnsutils-zariel                                          1.0            1.0                        
     ├─ julian-welzel-job-1                                      1.0            1.0                        
     ├─ kube-proxy-c8frf                                         1.0            1.0                        
     ├─ kube-router-2crh9                                        1.0            1.0                        
     └─ nvidia-device-plugin-daemonset-sz94v                     1.0            1.0                        
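
The percentage and Free columns in the table above appear to follow a simple rule: the percentage is Requested / Allocatable rounded to a whole percent, and Free is Allocatable - Requested. For example, tiamat shows 15.0 of 110.0 pods as (14%) with 95.0 free. The Python sketch below reproduces that rule. It is inferred from the printed numbers, not taken from the report generator, and the function name `summarize` is hypothetical.

```python
def summarize(requested: float, allocatable: float) -> tuple[int, float]:
    """Return (percent of allocatable requested, free) per the inferred rule."""
    percent = round(100 * requested / allocatable)
    free = allocatable - requested
    return percent, free


# Pod requests and allocatable pod slots for three nodes in the table above.
nodes = {"tiamat": (15.0, 110.0), "vecna": (5.0, 110.0), "zariel": (5.0, 110.0)}

for name, (req, alloc) in nodes.items():
    pct, free = summarize(req, alloc)
    print(f"{name:<8} ({pct}%) {req:.1f}  free {free:.1f}")
```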




Resource usage (by namespace)

 Resource                Requested    Limit  Allocatable  Free 
  auth                                                         
  ├─ mindflayer01                                              
  │  ├─ cpu                 350.0m      0.0                    
  │  └─ pods                   2.0      2.0                    
  ├─ mindflayer02                                              
  │  ├─ cpu                 300.0m      0.0                    
  │  └─ pods                   3.0      3.0                    
  └─ mindflayer03                                              
     ├─ cpu                 250.0m      0.0                    
     └─ pods                   1.0      1.0                    
  frontend                     4.0      4.0                    
  ├─ lolth                     1.0      1.0                    
  │  └─ pods                   1.0      1.0                    
  ├─ mindflayer01              2.0      2.0                    
  │  └─ pods                   2.0      2.0                    
  └─ tiamat                    1.0      1.0                    
     └─ pods                   1.0      1.0                    
  gatekeeper-system                                            
  ├─ lolth                                                     
  │  ├─ cpu                 100.0m      1.0                    
  │  ├─ memory             256.0Mi  512.0Mi                    
  │  └─ pods                   1.0      1.0                    
  ├─ mindflayer03                                              
  │  ├─ cpu                 100.0m      1.0                    
  │  ├─ memory             256.0Mi  512.0Mi                    
  │  └─ pods                   1.0      1.0                    
  └─ tiamat                                                    
     ├─ cpu                 200.0m      2.0                    
     ├─ memory             512.0Mi    1.0Gi                    
     └─ pods                   2.0      2.0                    
  gcpr-vmv-2022                                                
  └─ beholder02                                                
     ├─ cpu                    1.0      2.0                    
     ├─ memory             512.0Mi    1.0Gi                    
     └─ pods                   2.0      2.0                    
  jupyterhub                                                   
  ├─ kiaransalee                                               
  │  ├─ memory                1.1G      0.0                    
  │  ├─ nvidia.com/gpu         1.0      1.0                    
  │  └─ pods                   2.0      2.0                    
  ├─ lolth                     2.0      2.0                    
  │  └─ pods                   2.0      2.0                    
  └─ tiamat                    2.0      2.0                    
     └─ pods                   2.0      2.0                    
  kube-system                                                  
  ├─ asmodeus                                                  
  │  ├─ cpu                 350.0m   100.0m                    
  │  ├─ memory             350.0Mi  100.0Mi                    
  │  └─ pods                   4.0      4.0                    
  ├─ beholder01                                                
  │  ├─ cpu                 900.0m   100.0m                    
  │  ├─ memory             350.0Mi  100.0Mi                    
  │  └─ pods                   6.0      6.0                    
  ├─ beholder02                                                
  │  ├─ cpu                 900.0m   100.0m                    
  │  ├─ memory             350.0Mi  100.0Mi                    
  │  └─ pods                   6.0      6.0                    
  ├─ beholder03                                                
  │  ├─ cpu                 900.0m   100.0m                    
  │  ├─ memory             350.0Mi  100.0Mi                    
  │  └─ pods                   6.0      6.0                    
  ├─ belial                                                    
  │  ├─ cpu                 350.0m   100.0m                    
  │  ├─ memory             350.0Mi  100.0Mi                    
  │  └─ pods                   5.0      5.0                    
  ├─ demogorgon                                                
  │  ├─ cpu                 350.0m   100.0m                    
  │  ├─ memory             350.0Mi  100.0Mi                    
  │  └─ pods                   4.0      4.0                    
  ├─ fierna                                                    
  │  ├─ cpu                 350.0m   100.0m                    
  │  ├─ memory             350.0Mi  100.0Mi                    
  │  └─ pods                   5.0      5.0                    
  ├─ kiaransalee                                               
  │  ├─ cpu                 350.0m   100.0m                    
  │  ├─ memory             350.0Mi  100.0Mi                    
  │  └─ pods                   5.0      5.0                    
  ├─ lolth                                                     
  │  ├─ cpu                 350.0m   100.0m                    
  │  ├─ memory             350.0Mi  100.0Mi                    
  │  └─ pods                   4.0      4.0                    
  ├─ mindflayer01                                              
  │  ├─ cpu                 350.0m   100.0m                    
  │  ├─ memory             350.0Mi  100.0Mi                    
  │  └─ pods                   4.0      4.0                    
  ├─ mindflayer02                                              
  │  ├─ cpu                 350.0m   100.0m                    
  │  ├─ memory             350.0Mi  100.0Mi                    
  │  └─ pods                   4.0      4.0                    
  ├─ mindflayer03                                              
  │  ├─ cpu                 450.0m   100.0m                    
  │  ├─ memory             420.0Mi  270.0Mi                    
  │  └─ pods                   5.0      5.0                    
  ├─ tiamat                                                    
  │  ├─ cpu                 450.0m   100.0m                    
  │  ├─ memory             420.0Mi  270.0Mi                    
  │  └─ pods                   6.0      6.0                    
  ├─ vecna                                                     
  │  ├─ cpu                 350.0m   100.0m                    
  │  ├─ memory             350.0Mi  100.0Mi                    
  │  └─ pods                   4.0      4.0                    
  └─ zariel                                                    
     ├─ cpu                 350.0m   100.0m                    
     ├─ memory             350.0Mi  100.0Mi                    
     └─ pods                   4.0      4.0                    
  local-path-storage           1.0      1.0                    
  └─ lolth                     1.0      1.0                    
     └─ pods                   1.0      1.0                    
  monitoring                                                   
  └─ tiamat                                                    
     ├─ cpu                 500.0m      1.0                    
     ├─ memory              500.0M    1.0Gi                    
     └─ pods                   2.0      2.0                    
  registry                                                     
  └─ mindflayer01                                              
     ├─ cpu                 200.0m      0.0                    
     └─ pods                   2.0      2.0                    
  testing                      3.0      3.0                    
  ├─ mindflayer03              1.0      1.0                    
  │  └─ pods                   1.0      1.0                    
  └─ tiamat                    2.0      2.0                    
     └─ pods                   2.0      2.0                    
  user-alex-chan                                               
  └─ mindflayer02                                              
     ├─ cpu                   10.0     16.0                    
     ├─ memory              64.0Gi   64.0Gi                    
     └─ pods                   1.0      1.0                    
  user-andri-rutschmann                                        
  └─ kiaransalee                                               
     ├─ cpu                   40.0     40.0                    
     ├─ memory             256.0Gi  256.0Gi                    
     ├─ nvidia.com/gpu         4.0      4.0                    
     └─ pods                   2.0      2.0                    
  user-felix-petersen                                          
  └─ vecna                                                     
     ├─ cpu                   16.0    100.0                    
     ├─ memory             100.0Gi    1.6Ti                    
     ├─ nvidia.com/gpu        16.0     16.0                    
     └─ pods                   1.0      1.0                    
  user-julian-welzel                                           
  └─ zariel                                                    
     ├─ cpu                   96.0     96.0                    
     ├─ memory             100.0Gi  400.0Gi                    
     ├─ nvidia.com/gpu         4.0      4.0                    
     └─ pods                   1.0      1.0                    
  user-segun-aroyehun                                          
  └─ kiaransalee                                               
     ├─ cpu                   50.0    100.0                    
     ├─ memory             250.0Gi  400.0Gi                    
     ├─ nvidia.com/gpu         1.0      1.0                    
     └─ pods                   1.0      1.0                    
  user-v-time                                                  
  ├─ mindflayer01                                              
  │  ├─ cpu                 500.0m      1.0                    
  │  ├─ memory             256.0Mi  512.0Mi                    
  │  └─ pods                   1.0      1.0                    
  └─ mindflayer02                                              
     ├─ cpu                    1.0      2.0                    
     ├─ memory             512.0Mi    1.0Gi                    
     └─ pods                   1.0      1.0                    
  web                                                          
  ├─ mindflayer01              6.0      6.0                    
  │  └─ pods                   6.0      6.0                    
  └─ mindflayer02                                              
     ├─ cpu                 250.0m      0.0                    
     └─ pods                   1.0      1.0                    
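
The namespace table mixes Kubernetes quantity suffixes. `m` means one thousandth (350.0m CPU is 0.35 cores), `Mi`/`Gi`/`Ti` are powers of 1024, and `M`/`G` are powers of 1000. As a result, the `1.1G` memory request in jupyterhub is about 1.02 GiB, not 1.1 GiB. The sketch below converts such strings to base units so that rows can be summed. `parse_quantity` is a hypothetical helper that handles only a small subset of the suffixes Kubernetes accepts; it is not the official Kubernetes parser.

```python
from decimal import Decimal

# A subset of the Kubernetes quantity suffixes, covering those in this report.
# Binary suffixes are powers of 1024, decimal suffixes powers of 1000,
# and "m" means one thousandth.
_SUFFIXES = {
    "Ki": Decimal(1024),
    "Mi": Decimal(1024) ** 2,
    "Gi": Decimal(1024) ** 3,
    "Ti": Decimal(1024) ** 4,
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
    "m": Decimal("0.001"),
}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity such as '256.0Mi' or '350.0m' to base units."""
    text = text.strip()
    # Check two-letter suffixes first so that "Mi" is not mistaken for "M".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)  # plain number, e.g. '4.0' pods or '10.0' cores


if __name__ == "__main__":
    # gatekeeper-system memory requests on lolth, mindflayer03 and tiamat.
    total = sum(parse_quantity(q) for q in ("256.0Mi", "256.0Mi", "512.0Mi"))
    print(total == parse_quantity("1Gi"))  # True: exactly 1 GiB in total
```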




Ceph file system status

  cluster:
    id:     3fee6f38-ba9f-11ec-9328-e188936dcafd
    health: HEALTH_WARN
            1 daemons have recently crashed
 
  services:
    mon: 5 daemons, quorum beholder03,beholder01,beholder02,mindflayer02,mindflayer03 (age 5d)
    mgr: beholder01.verxwn(active, since 5d), standbys: beholder02.akktmp, beholder03.nprqzk, mindflayer03.rzdvrr, mindflayer01.mkuopd, mindflayer02.ympgrs
    mds: 4/4 daemons up, 2 standby
    osd: 24 osds: 24 up (since 5d), 24 in (since 6M)
 
  data:
    volumes: 1/1 healthy
    pools:   3 pools, 545 pgs
    objects: 64.06M objects, 101 TiB
    usage:   203 TiB used, 79 TiB / 282 TiB avail
    pgs:     544 active+clean
             1   active+clean+scrubbing+deep
 
  io:
    client:   33 KiB/s wr, 0 op/s rd, 3 op/s wr
 
HEALTH_WARN 1 daemons have recently crashed
[WRN] RECENT_CRASH: 1 daemons have recently crashed
    osd.18 crashed on host mindflayer01 at 2025-01-04T10:25:35.528683Z
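
The capacity figures in the `data:` block are easy to sanity-check. 203 TiB used plus 79 TiB available gives the 282 TiB total, so raw usage is about 72%. Using 203 TiB of raw space to hold 101 TiB of objects gives a ratio of roughly 2. That ratio is consistent with two-way replication or an equivalent overhead, but the report does not show the pool settings (`ceph df` gives the per-pool breakdown). Ceph also rounds these values, so treat the ratio as an estimate. The arithmetic, as a small Python sketch:

```python
# Raw-capacity figures (TiB) copied from the "data:" block above; Ceph
# prints them rounded, so the derived ratios are approximate.
stored = 101     # "objects: 64.06M objects, 101 TiB"
raw_used = 203   # "203 TiB used"
raw_avail = 79   # "79 TiB / 282 TiB avail"
raw_total = 282

assert raw_used + raw_avail == raw_total  # the three figures are consistent

used_fraction = raw_used / raw_total  # share of raw space consumed
overhead = raw_used / stored          # raw bytes per stored byte

print(f"raw used: {used_fraction:.1%}")     # 72.0%
print(f"raw/stored ratio: {overhead:.2f}")  # 2.01
```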




Etcd cluster status

+------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|     ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 192.168.1.1:4252 | 7126e7a3a9cc42ca |  3.4.30 |  156 MB |      true |      false |    302051 |  532171904 |          532171904 |        |
| 192.168.1.2:4252 | 39d72894bf6c7600 |  3.4.30 |  156 MB |     false |      false |    302051 |  532171904 |          532171904 |        |
| 192.168.1.3:4252 | bbf4a2b99c3fd692 |  3.4.30 |  156 MB |     false |      false |    302051 |  532171904 |          532171904 |        |
| 192.168.2.1:4252 | 5cb9997dd1c2246b |  3.3.25 |  156 MB |     false |      false |    302051 |  532171904 |                  0 |        |
| 192.168.2.3:4252 |  cbc1cf89959ea4e |  3.3.25 |  156 MB |     false |      false |    302051 |  532171904 |                  0 |        |
+------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
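
Check the etcd table for three things: exactly one leader, agreement on the raft term, and version skew. All five members report term 302051 and raft index 532171904. However, 192.168.2.1 and 192.168.2.3 run 3.3.25 while the others run 3.4.30. Their RAFT APPLIED INDEX of 0 most likely means that etcd 3.3 does not report that field, not that these members are lagging. The Python sketch below parses the table as printed, which resembles `etcdctl endpoint status -w table` output, and reports those problems. The function names are hypothetical.

```python
# Header and data rows copied from the etcd table above (borders omitted).
SAMPLE = """\
|     ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
| 192.168.1.1:4252 | 7126e7a3a9cc42ca |  3.4.30 |  156 MB |      true |      false |    302051 |  532171904 |          532171904 |        |
| 192.168.1.2:4252 | 39d72894bf6c7600 |  3.4.30 |  156 MB |     false |      false |    302051 |  532171904 |          532171904 |        |
| 192.168.1.3:4252 | bbf4a2b99c3fd692 |  3.4.30 |  156 MB |     false |      false |    302051 |  532171904 |          532171904 |        |
| 192.168.2.1:4252 | 5cb9997dd1c2246b |  3.3.25 |  156 MB |     false |      false |    302051 |  532171904 |                  0 |        |
| 192.168.2.3:4252 |  cbc1cf89959ea4e |  3.3.25 |  156 MB |     false |      false |    302051 |  532171904 |                  0 |        |
"""


def parse_etcd_table(text: str) -> list[dict[str, str]]:
    """Turn an ASCII etcd status table into one dict per endpoint."""
    header, rows = None, []
    for line in text.splitlines():
        if not line.startswith("|"):
            continue  # skip "+---+" borders and blank lines
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if header is None:
            header = cells  # first "|" row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def check_etcd(rows: list[dict[str, str]]) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    leaders = [r["ENDPOINT"] for r in rows if r["IS LEADER"] == "true"]
    if len(leaders) != 1:
        problems.append(f"expected exactly one leader, found {len(leaders)}")
    if len({r["RAFT TERM"] for r in rows}) > 1:
        problems.append("members disagree on the raft term")
    versions = sorted({r["VERSION"] for r in rows})
    if len(versions) > 1:
        problems.append("mixed etcd versions: " + ", ".join(versions))
    for r in rows:
        if r["ERRORS"]:
            problems.append(f"{r['ENDPOINT']}: {r['ERRORS']}")
    return problems


print(check_etcd(parse_etcd_table(SAMPLE)))  # ['mixed etcd versions: 3.3.25, 3.4.30']
```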




Detailed network health report

API and web servers

beholder01
  SSH port open            yes
  Report available         yes
  External interface up    ok
  Infiniband interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  cephfs mounted           /cephfs
  local raid mounted       /raid
  Test pod responding      dnsutils-beholder01
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
beholder02
  SSH port open            yes
  Report available         yes
  External interface up    ok
  Infiniband interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  cephfs mounted           /cephfs
  local raid mounted       /raid
  Test pod responding      dnsutils-beholder02
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
beholder03
  SSH port open            yes
  Report available         yes
  External interface up    ok
  Infiniband interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  cephfs mounted           /cephfs
  local raid mounted       /raid
  Test pod responding      dnsutils-beholder03
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
lolth
  SSH port open            yes
  Report available         yes
  External interface up    ok
  Infiniband interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  cephfs mounted           /cephfs
  local raid mounted       /raid
  Test pod responding      dnsutils-lolth
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes




Ceph OSD nodes

mindflayer01
  SSH port open            yes
  Report available         yes
  Infiniband interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  cephfs mounted           /cephfs
  local raid mounted       /raid
  Test pod responding      dnsutils-mindflayer01
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
mindflayer02
  SSH port open            yes
  Report available         yes
  Infiniband interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  cephfs mounted           /cephfs
  local raid mounted       /raid
  Test pod responding      dnsutils-mindflayer02
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
mindflayer03
  SSH port open            yes
  Report available         yes
  Infiniband interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  cephfs mounted           /cephfs
  local raid mounted       /raid
  Test pod responding      dnsutils-mindflayer03
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes




Compute nodes

vecna
  SSH port open            yes
  Report available         yes
  Infiniband interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  cephfs mounted           /cephfs/abyss
  local raid mounted       /raid
  Test pod responding      dnsutils-vecna
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
kiaransalee
  SSH port open            yes
  Report available         yes
  Infiniband interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  cephfs mounted           /cephfs/abyss
  local raid mounted       /raid
  Test pod responding      dnsutils-kiaransalee
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
belial
  SSH port open            yes
  Report available         yes
  Infiniband interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  cephfs mounted           /cephfs/abyss
  local raid mounted       /raid
  Test pod responding      dnsutils-belial
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
fierna
  SSH port open            yes
  Report available         yes
  Infiniband interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  cephfs mounted           /cephfs/abyss
  local raid mounted       /raid
  Test pod responding      dnsutils-fierna
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
demogorgon
  SSH port open            yes
  Report available         yes
  Infiniband interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  cephfs mounted           /cephfs/abyss
  local raid mounted       /raid
  Test pod responding      dnsutils-demogorgon
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
tiamat
  SSH port open            yes
  Report available         yes
  Infiniband interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  cephfs mounted           /cephfs/abyss
  local raid mounted       /raid
  Test pod responding      dnsutils-tiamat
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
asmodeus
  SSH port open            yes
  Report available         yes
  Infiniband interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  cephfs mounted           /cephfs/abyss
  local raid mounted       /raid
  Test pod responding      dnsutils-asmodeus
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
zariel
  SSH port open            yes
  Report available         yes
  Infiniband interface up  ok
  API servers reachable    1 2 3 4
  Ceph monitors reachable  1 2 3
  cephfs mounted           /cephfs/abyss
  local raid mounted       /raid
  Test pod responding      dnsutils-zariel
  Can reach kube-dns       10.96.0.10
  Pod can reach kube-dns   yes
  Pod can reach internet   yes
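
Every check in the network health section passes in this run. In a longer report, a minimal sketch like the one below can scan the host blocks and list any check that does not show the expected value. It assumes that the yes/ok checks fail with any other value and that the remaining checks fail only when empty. The report generator's actual failure strings do not appear here, and the `down` value in the sample is hypothetical.

```python
# Check labels exactly as the report prints them.
CHECKS = [
    "SSH port open", "Report available", "External interface up",
    "Infiniband interface up", "API servers reachable",
    "Ceph monitors reachable", "cephfs mounted", "local raid mounted",
    "Test pod responding", "Can reach kube-dns", "Pod can reach kube-dns",
    "Pod can reach internet",
]
# Assumption: these checks pass only with "yes" or "ok"; the others pass
# whenever they report some value (a mount point, pod name, IP, ...).
BOOLEAN_CHECKS = {
    "SSH port open", "Report available", "External interface up",
    "Infiniband interface up", "Pod can reach kube-dns", "Pod can reach internet",
}


def find_failures(report: str) -> list[tuple[str, str, str]]:
    """Return (host, check, value) for every check that looks failed."""
    failures, host = [], ""
    for raw in report.splitlines():
        line = raw.strip()
        if not line:
            continue
        label = next((c for c in CHECKS if line.startswith(c)), None)
        if label is None:
            if " " not in line:
                host = line  # a bare single word opens a new host block
            continue  # group headings such as "Compute nodes" are ignored
        value = line[len(label):].strip()
        if label in BOOLEAN_CHECKS:
            failed = value not in ("yes", "ok")
        else:
            failed = not value
        if failed:
            failures.append((host, label, value))
    return failures


# Hypothetical excerpt: the "down" value is invented to show a failure.
sample = """\
vecna
SSH port open yes
Infiniband interface up ok
kiaransalee
SSH port open yes
Infiniband interface up down
"""
print(find_failures(sample))  # [('kiaransalee', 'Infiniband interface up', 'down')]
```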




nVidia driver and GPU reports

dretch

belial

Fri Jan 10 07:42:47 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro RTX 6000                Off |   00000000:1B:00.0 Off |                  Off |
| 33%   32C    P8              4W /  260W |       2MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Quadro RTX 6000                Off |   00000000:1C:00.0 Off |                  Off |
| 33%   31C    P8             11W /  260W |       2MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Quadro RTX 6000                Off |   00000000:1D:00.0 Off |                  Off |
| 33%   31C    P8              4W /  260W |       2MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Quadro RTX 6000                Off |   00000000:1E:00.0 Off |                  Off |
| 33%   32C    P8              4W /  260W |       2MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  Quadro RTX 6000                Off |   00000000:3D:00.0 Off |                  Off |
| 33%   33C    P8             14W /  260W |       2MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  Quadro RTX 6000                Off |   00000000:3F:00.0 Off |                  Off |
| 33%   32C    P8              6W /  260W |       2MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  Quadro RTX 6000                Off |   00000000:40:00.0 Off |                  Off |
| 33%   32C    P8              4W /  260W |       2MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  Quadro RTX 6000                Off |   00000000:41:00.0 Off |                  Off |
| 33%   33C    P8              4W /  260W |       2MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

fierna

Fri Jan 10 07:42:49 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro RTX 6000                Off |   00000000:1B:00.0 Off |                  Off |
| 33%   31C    P8             19W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Quadro RTX 6000                Off |   00000000:1C:00.0 Off |                  Off |
| 33%   31C    P8             16W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Quadro RTX 6000                Off |   00000000:1D:00.0 Off |                  Off |
| 33%   32C    P8              4W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Quadro RTX 6000                Off |   00000000:1E:00.0 Off |                  Off |
| 33%   31C    P8              4W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  Quadro RTX 6000                Off |   00000000:3D:00.0 Off |                  Off |
| 33%   32C    P8             15W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  Quadro RTX 6000                Off |   00000000:3F:00.0 Off |                  Off |
| 33%   32C    P8              4W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  Quadro RTX 6000                Off |   00000000:40:00.0 Off |                  Off |
| 33%   32C    P8             12W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  Quadro RTX 6000                Off |   00000000:41:00.0 Off |                  Off |
| 33%   33C    P8             16W /  260W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

tiamat

Fri Jan 10 07:42:51 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-40GB          On  |   00000000:01:00.0 Off |                    0 |
| N/A   30C    P0             50W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB          On  |   00000000:41:00.0 Off |                    0 |
| N/A   31C    P0             53W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100-SXM4-40GB          On  |   00000000:81:00.0 Off |                    0 |
| N/A   30C    P0             50W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A100-SXM4-40GB          On  |   00000000:C1:00.0 Off |                    0 |
| N/A   29C    P0             51W /  400W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

vecna

Fri Jan 10 08:42:54 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100-SXM3-32GB           On  |   00000000:34:00.0 Off |                    0 |
| N/A   49C    P0            205W /  350W |   23938MiB /  32768MiB |     99%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla V100-SXM3-32GB           On  |   00000000:36:00.0 Off |                    0 |
| N/A   50C    P0            207W /  350W |   23962MiB /  32768MiB |     99%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Tesla V100-SXM3-32GB           On  |   00000000:39:00.0 Off |                    0 |
| N/A   59C    P0            223W /  350W |   23962MiB /  32768MiB |     99%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Tesla V100-SXM3-32GB           On  |   00000000:3B:00.0 Off |                    0 |
| N/A   62C    P0            230W /  350W |   23842MiB /  32768MiB |     99%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  Tesla V100-SXM3-32GB           On  |   00000000:57:00.0 Off |                    0 |
| N/A   52C    P0            230W /  350W |   23940MiB /  32768MiB |     99%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  Tesla V100-SXM3-32GB           On  |   00000000:59:00.0 Off |                    0 |
| N/A   67C    P0            240W /  350W |   23964MiB /  32768MiB |     99%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  Tesla V100-SXM3-32GB           On  |   00000000:5C:00.0 Off |                    0 |
| N/A   53C    P0            220W /  350W |   23964MiB /  32768MiB |     99%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  Tesla V100-SXM3-32GB           On  |   00000000:5E:00.0 Off |                    0 |
| N/A   66C    P0            223W /  350W |   23844MiB /  32768MiB |     99%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   8  Tesla V100-SXM3-32GB           On  |   00000000:B7:00.0 Off |                    0 |
| N/A   35C    P0             50W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   9  Tesla V100-SXM3-32GB           On  |   00000000:B9:00.0 Off |                    0 |
| N/A   35C    P0             49W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|  10  Tesla V100-SXM3-32GB           On  |   00000000:BC:00.0 Off |                    0 |
| N/A   38C    P0             52W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|  11  Tesla V100-SXM3-32GB           On  |   00000000:BE:00.0 Off |                    0 |
| N/A   39C    P0             49W /  350W |       1MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|  12  Tesla V100-SXM3-32GB           On  |   00000000:E0:00.0 Off |                    0 |
| N/A   54C    P0            226W /  350W |   23940MiB /  32768MiB |     97%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|  13  Tesla V100-SXM3-32GB           On  |   00000000:E2:00.0 Off |                    0 |
| N/A   53C    P0            231W /  350W |   23964MiB /  32768MiB |     97%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|  14  Tesla V100-SXM3-32GB           On  |   00000000:E5:00.0 Off |                    0 |
| N/A   70C    P0            257W /  350W |   23964MiB /  32768MiB |     97%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|  15  Tesla V100-SXM3-32GB           On  |   00000000:E7:00.0 Off |                    0 |
| N/A   67C    P0            246W /  350W |   23844MiB /  32768MiB |     97%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     81514      C   python                                      23932MiB |
|    1   N/A  N/A     81562      C   python                                      23956MiB |
|    2   N/A  N/A     82038      C   python                                      23956MiB |
|    3   N/A  N/A     82505      C   python                                      23836MiB |
|    4   N/A  N/A     82834      C   python                                      23932MiB |
|    5   N/A  N/A     83008      C   python                                      23956MiB |
|    6   N/A  N/A     83123      C   python                                      23956MiB |
|    7   N/A  N/A     83657      C   python                                      23836MiB |
|   12   N/A  N/A     80878      C   python                                      23932MiB |
|   13   N/A  N/A     80655      C   python                                      23956MiB |
|   14   N/A  N/A     81097      C   python                                      23956MiB |
|   15   N/A  N/A     81249      C   python                                      23836MiB |
+-----------------------------------------------------------------------------------------+

asmodeus

Fri Jan 10 07:42:56 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  |   00000000:01:00.0 Off |                    0 |
| N/A   31C    P0             62W /  500W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          On  |   00000000:41:00.0 Off |                    0 |
| N/A   33C    P0             60W /  500W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100-SXM4-80GB          On  |   00000000:81:00.0 Off |                    0 |
| N/A   31C    P0             60W /  500W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A100-SXM4-80GB          On  |   00000000:C1:00.0 Off |                    0 |
| N/A   30C    P0             61W /  500W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

zariel

Fri Jan 10 08:42:58 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.03             Driver Version: 535.216.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB          On  | 00000000:07:00.0 Off |                    0 |
| N/A   34C    P0              56W / 400W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB          On  | 00000000:0F:00.0 Off |                    0 |
| N/A   33C    P0              53W / 400W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM4-40GB          On  | 00000000:47:00.0 Off |                    0 |
| N/A   51C    P0             218W / 400W |  19375MiB / 40960MiB |     99%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM4-40GB          On  | 00000000:4E:00.0 Off |                    0 |
| N/A   54C    P0             246W / 400W |  16647MiB / 40960MiB |     99%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-SXM4-40GB          On  | 00000000:87:00.0 Off |                    0 |
| N/A   58C    P0             247W / 400W |  19375MiB / 40960MiB |     98%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-SXM4-40GB          On  | 00000000:90:00.0 Off |                    0 |
| N/A   57C    P0             244W / 400W |  16647MiB / 40960MiB |     98%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-SXM4-40GB          On  | 00000000:B7:00.0 Off |                    0 |
| N/A   43C    P0              53W / 400W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-SXM4-40GB          On  | 00000000:BD:00.0 Off |                    0 |
| N/A   43C    P0              59W / 400W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    2   N/A  N/A   1647322      C   python                                    19366MiB |
|    3   N/A  N/A   1709434      C   python                                    16638MiB |
|    4   N/A  N/A   1329388      C   python                                    19366MiB |
|    5   N/A  N/A   1660692      C   python                                    16638MiB |
+---------------------------------------------------------------------------------------+

demogorgon

Fri Jan 10 07:43:02 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40                     Off |   00000000:01:00.0 Off |                    0 |
|  0%   30C    P8             15W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A40                     Off |   00000000:25:00.0 Off |                    0 |
|  0%   31C    P8             24W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A40                     Off |   00000000:41:00.0 Off |                    0 |
|  0%   32C    P8             24W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A40                     Off |   00000000:61:00.0 Off |                    0 |
|  0%   30C    P8             15W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA A40                     Off |   00000000:81:00.0 Off |                    0 |
|  0%   30C    P8             15W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA A40                     Off |   00000000:A1:00.0 Off |                    0 |
|  0%   30C    P8             15W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA A40                     Off |   00000000:C1:00.0 Off |                    0 |
|  0%   33C    P8             24W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA A40                     Off |   00000000:E1:00.0 Off |                    0 |
|  0%   33C    P8             24W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

kiaransalee

Fri Jan 10 07:43:05 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08             Driver Version: 550.127.08     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          Off |   00000000:26:00.0 Off |                    0 |
| N/A   39C    P0            145W /  700W |   74602MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H100 80GB HBM3          Off |   00000000:2F:00.0 Off |                    0 |
| N/A   48C    P0            142W /  700W |   74602MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H100 80GB HBM3          Off |   00000000:46:00.0 Off |                    0 |
| N/A   47C    P0            143W /  700W |   74602MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H100 80GB HBM3          Off |   00000000:54:00.0 Off |                    0 |
| N/A   41C    P0            146W /  700W |   74602MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H100 80GB HBM3          Off |   00000000:A6:00.0 Off |                    0 |
| N/A   33C    P0             79W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H100 80GB HBM3          Off |   00000000:AF:00.0 Off |                    0 |
| N/A   39C    P0             83W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H100 80GB HBM3          Off |   00000000:C6:00.0 Off |                    0 |
| N/A   43C    P0            137W /  700W |   53294MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H100 80GB HBM3          Off |   00000000:CF:00.0 Off |                   On |
| N/A   33C    P0             76W /  700W |      90MiB /  81559MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices:                                                                            |
+------------------+----------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                     Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                  |        ECC|                       |
|==================+==================================+===========+=======================|
|  7    7   0   0  |              13MiB /  9984MiB    | 16      0 |  1   0    1    0    1 |
|                  |                 0MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  7    8   0   1  |              13MiB /  9984MiB    | 16      0 |  1   0    1    0    1 |
|                  |                 0MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  7    9   0   2  |              13MiB /  9984MiB    | 16      0 |  1   0    1    0    1 |
|                  |                 0MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  7   11   0   3  |              13MiB /  9984MiB    | 16      0 |  1   0    1    0    1 |
|                  |                 0MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  7   12   0   4  |              13MiB /  9984MiB    | 16      0 |  1   0    1    0    1 |
|                  |                 0MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  7   13   0   5  |              13MiB /  9984MiB    | 16      0 |  1   0    1    0    1 |
|                  |                 0MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  7   14   0   6  |              13MiB /  9984MiB    | 16      0 |  1   0    1    0    1 |
|                  |                 0MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A   3220045      C   /opt/conda/bin/python3.11                   74592MiB |
|    1   N/A  N/A   3220043      C   /opt/conda/bin/python3.11                   74592MiB |
|    2   N/A  N/A   3220044      C   /opt/conda/bin/python3.11                   74592MiB |
|    3   N/A  N/A   3220047      C   /opt/conda/bin/python3.11                   74592MiB |
|    6   N/A  N/A   3420301      C   /opt/conda/bin/python                       53284MiB |
+-----------------------------------------------------------------------------------------+