[gpfsug-discuss] No space left on device, but plenty of quota space for inodes and blocks
Rob Kudyba
rk3199 at columbia.edu
Thu Jun 6 22:34:53 BST 2024
>
> Are you seeing the issues across the whole file system or in certain
> areas?
>
Only with accounts in GPFS; local accounts and root do not get this.
> That sounds like inode exhaustion to me (and based on it not being block
> exhaustion as you’ve demonstrated).
>
>
>
> What does a “df -i /cluster” show you?
>
We bumped it up a few weeks ago:
df -i /cluster
Filesystem Inodes IUsed IFree IUse% Mounted on
cluster 276971520 154807697 122163823 56% /cluster
> Or if this is only in a certain area you can “cd” into that directory and
> run a “df -i .”
>
As root on a login node:
df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda2 20971520 169536 20801984 1% /
devtmpfs 12169978 528 12169450 1% /dev
tmpfs 12174353 1832 12172521 1% /run
tmpfs 12174353 77 12174276 1% /dev/shm
tmpfs 12174353 15 12174338 1% /sys/fs/cgroup
/dev/sda1 0 0 0 - /boot/efi
/dev/sda3 52428800 2887 52425913 1% /var
/dev/sda7 277368832 35913 277332919 1% /local
/dev/sda5 104857600 398 104857202 1% /tmp
tmpfs 12174353 1 12174352 1% /run/user/551336
tmpfs 12174353 1 12174352 1% /run/user/0
moto 276971520 154807697 122163823 56% /cluster
tmpfs 12174353 3 12174350 1% /run/user/441245
tmpfs 12174353 12 12174341 1% /run/user/553562
tmpfs 12174353 1 12174352 1% /run/user/525583
tmpfs 12174353 1 12174352 1% /run/user/476374
tmpfs 12174353 1 12174352 1% /run/user/468934
tmpfs 12174353 5 12174348 1% /run/user/551200
tmpfs 12174353 1 12174352 1% /run/user/539143
tmpfs 12174353 1 12174352 1% /run/user/488676
tmpfs 12174353 1 12174352 1% /run/user/493713
tmpfs 12174353 1 12174352 1% /run/user/507831
tmpfs 12174353 1 12174352 1% /run/user/549822
tmpfs 12174353 1 12174352 1% /run/user/500569
tmpfs 12174353 1 12174352 1% /run/user/443748
tmpfs 12174353 1 12174352 1% /run/user/543676
tmpfs 12174353 1 12174352 1% /run/user/451446
tmpfs 12174353 1 12174352 1% /run/user/497945
tmpfs 12174353 6 12174347 1% /run/user/554672
tmpfs 12174353 32 12174321 1% /run/user/554653
tmpfs 12174353 1 12174352 1% /run/user/30094
tmpfs 12174353 1 12174352 1% /run/user/470790
tmpfs 12174353 59 12174294 1% /run/user/553037
tmpfs 12174353 1 12174352 1% /run/user/554670
tmpfs 12174353 1 12174352 1% /run/user/548236
tmpfs 12174353 1 12174352 1% /run/user/547288
tmpfs 12174353 1 12174352 1% /run/user/547289
>
>
> You may need to allocate more inodes to an independent inode fileset
> somewhere. Especially with something as old as 4.2.3 you won’t have
> auto-inode expansion for the filesets.
>
Do we have to restart any service after upping the inode count?
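To make sure I understand the suggestion, here is roughly what I assume the
check and the change would look like (the device name "cluster" is taken from
our mmlsdisk command below; the fileset name "home" and the inode counts are
just placeholders, so please correct me if the syntax differs on 4.2.3):

# List filesets with their per-fileset allocated/maximum inode counts
/usr/lpp/mmfs/bin/mmlsfileset cluster -L

# Raise the maximum (and optionally preallocated) inodes for one independent
# fileset ("home" and the numbers are placeholders, not our real values)
/usr/lpp/mmfs/bin/mmchfileset cluster home --inode-limit 20000000:18000000

I'm guessing the new limit takes effect as soon as the command returns, without
restarting anything, but I'd like to confirm.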
>
> Best,
>
>
>
> J.D. Maloney
>
> Lead HPC Storage Engineer | Storage Enabling Technologies Group
>
> National Center for Supercomputing Applications (NCSA)
>
Hi JD, I took an intermediate LCI workshop with you at the Univ of Cincinnati!
>
>
> *From: *gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> on behalf of
> Rob Kudyba <rk3199 at columbia.edu>
> *Date: *Thursday, June 6, 2024 at 3:50 PM
> *To: *gpfsug-discuss at gpfsug.org <gpfsug-discuss at gpfsug.org>
> *Subject: *[gpfsug-discuss] No space left on device, but plenty of quota
> space for inodes and blocks
>
> We are running GPFS 4.2.3 on a DDN GridScaler, and users are getting the
> "No space left on device" message when trying to write to a file. In
> /var/adm/ras/mmfs.log the only recent errors are these:
>
>
>
> 2024-06-06_15:51:22.311-0400: mmcommon getContactNodes cluster failed.
> Return code -1.
> 2024-06-06_15:51:22.311-0400: The previous error was detected on node
> x.x.x.x (headnode).
> 2024-06-06_15:53:25.088-0400: mmcommon getContactNodes cluster failed.
> Return code -1.
> 2024-06-06_15:53:25.088-0400: The previous error was detected on node
> x.x.x.x (headnode).
>
>
>
> According to
> https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=messages-6027-615
>
>
>
>
> Check the preceding messages, and consult the earlier chapters of this
> document. A frequent cause for such errors is lack of space in /var.
>
>
>
> We have plenty of space left.
>
>
>
> /usr/lpp/mmfs/bin/mmlsdisk cluster
> disk          driver   sector     failure holds    holds                            storage
> name          type       size       group metadata data  status        availability pool
> ------------- -------- ------ ----------- -------- ----- ------------- ------------ ------------
> S01_MDT200_1  nsd        4096         200 Yes      No    ready         up           system
> S01_MDT201_1  nsd        4096         201 Yes      No    ready         up           system
> S01_DAT0001_1 nsd        4096         100 No       Yes   ready         up           data1
> S01_DAT0002_1 nsd        4096         101 No       Yes   ready         up           data1
> S01_DAT0003_1 nsd        4096         100 No       Yes   ready         up           data1
> S01_DAT0004_1 nsd        4096         101 No       Yes   ready         up           data1
> S01_DAT0005_1 nsd        4096         100 No       Yes   ready         up           data1
> S01_DAT0006_1 nsd        4096         101 No       Yes   ready         up           data1
> S01_DAT0007_1 nsd        4096         100 No       Yes   ready         up           data1
>
>
>
> /usr/lpp/mmfs/bin/mmdf headnode
> disk                disk size  failure holds    holds              free KB             free KB
> name                    in KB    group metadata data        in full blocks        in fragments
> --------------- ------------- -------- -------- ----- -------------------- -------------------
> Disks in storage pool: system (Maximum disk size allowed is 14 TB)
> S01_MDT200_1       1862270976      200 Yes      No       969134848 ( 52%)       2948720 ( 0%)
> S01_MDT201_1       1862270976      201 Yes      No       969126144 ( 52%)       2957424 ( 0%)
>                 -------------                       -------------------- -------------------
> (pool total)       3724541952                          1938260992 ( 52%)       5906144 ( 0%)
>
> Disks in storage pool: data1 (Maximum disk size allowed is 578 TB)
> S01_DAT0007_1     77510737920      100 No       Yes   21080752128 ( 27%)     897723392 ( 1%)
> S01_DAT0005_1     77510737920      100 No       Yes   14507212800 ( 19%)     949412160 ( 1%)
> S01_DAT0001_1     77510737920      100 No       Yes   14503620608 ( 19%)     951327680 ( 1%)
> S01_DAT0003_1     77510737920      100 No       Yes   14509205504 ( 19%)     949340544 ( 1%)
> S01_DAT0002_1     77510737920      101 No       Yes   14504585216 ( 19%)     948377536 ( 1%)
> S01_DAT0004_1     77510737920      101 No       Yes   14503647232 ( 19%)     952892480 ( 1%)
> S01_DAT0006_1     77510737920      101 No       Yes   14504486912 ( 19%)     949072512 ( 1%)
>                 -------------                       -------------------- -------------------
> (pool total)     542575165440                        108113510400 ( 20%)    6598146304 ( 1%)
>
>                 =============                       ==================== ===================
> (data)           542575165440                        108113510400 ( 20%)    6598146304 ( 1%)
> (metadata)         3724541952                          1938260992 ( 52%)       5906144 ( 0%)
>                 =============                       ==================== ===================
> (total)          546299707392                        110051771392 ( 22%)    6604052448 ( 1%)
>
> Inode Information
> -----------------
> Total number of used inodes in all Inode spaces:          154807668
> Total number of free inodes in all Inode spaces:           12964492
> Total number of allocated inodes in all Inode spaces:     167772160
> Total of Maximum number of inodes in all Inode spaces:    276971520
>
>
>
> On the head node:
>
>
>
> df -h
> Filesystem Size Used Avail Use% Mounted on
> /dev/sda4 430G 216G 215G 51% /
> devtmpfs 47G 0 47G 0% /dev
> tmpfs 47G 0 47G 0% /dev/shm
> tmpfs 47G 4.1G 43G 9% /run
> tmpfs 47G 0 47G 0% /sys/fs/cgroup
> /dev/sda1 504M 114M 365M 24% /boot
> /dev/sda2 100M 9.9M 90M 10% /boot/efi
> x.x.x.:/nfs-share 430G 326G 105G 76% /nfs-share
> cluster 506T 405T 101T 81% /cluster
> tmpfs 9.3G 0 9.3G 0% /run/user/443748
> tmpfs 9.3G 0 9.3G 0% /run/user/547288
> tmpfs 9.3G 0 9.3G 0% /run/user/551336
> tmpfs 9.3G 0 9.3G 0% /run/user/547289
>
>
>
> The login nodes have plenty of space in /var:
>
> /dev/sda3 50G 8.7G 42G 18% /var
>
>
>
> What else should we check? We are only at 81% on the GPFS-mounted file
> system, so that should leave plenty of room to write without these errors.
> Any recommended service(s) that we can restart?
>
>