[gpfsug-discuss] No space left on device, but plenty of quota space for inodes and blocks
Rob Kudyba
rk3199 at columbia.edu
Thu Jun 6 23:47:23 BST 2024
I guess I have my answer:
/usr/lpp/mmfs/bin/mmlsfileset cluster home -L
Filesets in file system 'cluster':
Name   Id  RootInode  ParentId  Created                    InodeSpace  MaxInodes  AllocInodes  Comment
home    1    1048579         0  Thu Nov 29 15:21:52 2018            1   20971520     20971520
However, on some of the other filesets the AllocInodes is 0?
/usr/lpp/mmfs/bin/mmlsfileset cluster groupa -L -i
Collecting fileset usage information ...
Filesets in file system 'moto':
Name    Id  RootInode  ParentId  Created                    InodeSpace  MaxInodes  AllocInodes  UsedInodes  Comment
stats    8     181207         0  Fri Nov 30 12:27:25 2018            0          0            0     7628733
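If the 'home' fileset really has hit its inode ceiling (MaxInodes fully allocated, and presumably mostly used), the likely fix is the one suggested further down the thread: raise that independent fileset's inode limit. A minimal sketch, assuming the device name is 'cluster' and an illustrative new limit of roughly 30 million inodes (pick a value that fits your growth, and double-check the option syntax against the 4.2.3 mmchfileset man page):

/usr/lpp/mmfs/bin/mmchfileset cluster home --inode-limit 31457280
/usr/lpp/mmfs/bin/mmlsfileset cluster home -L -i    # confirm the new MaxInodes and current usage

The file system's own maximum inode count (the mmdf totals quoted below) bounds how far the fileset limits can grow, so it is worth keeping an eye on that number as well.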
Yes, we realize it's old, and it'll be retired at the end of 2024.
On Thu, Jun 6, 2024 at 6:15 PM Fred Stock <stockf at us.ibm.com> wrote:
> You should check the inode counts for each of the filesets using the
> mmlsfileset command. You should check the local disk space on all the
> nodes.
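>
> Checking the per-fileset inode counts could look like this (a sketch; the -i
> option collects actual usage and may take a while on a large file system):
>
> /usr/lpp/mmfs/bin/mmlsfileset cluster -L -i
>
> An independent fileset whose used inodes are at or near its MaxInodes is out
> of inodes even though the file system as a whole still reports free space.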
>
>
>
> I presume you are aware that Scale 4.2.3 has been out of support for 4
> years.
>
>
>
> Fred
>
>
>
> Fred Stock, Spectrum Scale Development Advocacy
>
> stockf at us.ibm.com | 720-430-8821
>
>
>
>
>
>
>
> From: gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> on behalf of
> Rob Kudyba <rk3199 at columbia.edu>
> Date: Thursday, June 6, 2024 at 5:39 PM
> To: gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
> Subject: [EXTERNAL] Re: [gpfsug-discuss] No space left on device, but
> plenty of quota space for inodes and blocks
>
> Are you seeing the issues across the whole file system or in certain
> areas?
>
>
>
> Only with accounts in GPFS; local accounts and root do not get this.
>
>
>
> That sounds like inode exhaustion to me (and based on it not being block
> exhaustion as you’ve demonstrated).
>
>
>
> What does a “df -i /cluster” show you?
>
>
>
> We bumped it up a few weeks ago:
>
> df -i /cluster
> Filesystem Inodes IUsed IFree IUse% Mounted on
> cluster 276971520 154807697 122163823 56% /cluster
>
>
>
>
>
> Or if this is only in a certain area you can “cd” into that directory and
> run a “df -i .”
>
>
>
> As root on a login node:
>
> df -i
> Filesystem Inodes IUsed IFree IUse% Mounted on
> /dev/sda2 20971520 169536 20801984 1% /
> devtmpfs 12169978 528 12169450 1% /dev
> tmpfs 12174353 1832 12172521 1% /run
> tmpfs 12174353 77 12174276 1% /dev/shm
> tmpfs 12174353 15 12174338 1% /sys/fs/cgroup
> /dev/sda1 0 0 0 - /boot/efi
> /dev/sda3 52428800 2887 52425913 1% /var
> /dev/sda7 277368832 35913 277332919 1% /local
> /dev/sda5 104857600 398 104857202 1% /tmp
> tmpfs 12174353 1 12174352 1% /run/user/551336
> tmpfs 12174353 1 12174352 1% /run/user/0
> moto 276971520 154807697 122163823 56% /cluster
> tmpfs 12174353 3 12174350 1% /run/user/441245
> tmpfs 12174353 12 12174341 1% /run/user/553562
> tmpfs 12174353 1 12174352 1% /run/user/525583
> tmpfs 12174353 1 12174352 1% /run/user/476374
> tmpfs 12174353 1 12174352 1% /run/user/468934
> tmpfs 12174353 5 12174348 1% /run/user/551200
> tmpfs 12174353 1 12174352 1% /run/user/539143
> tmpfs 12174353 1 12174352 1% /run/user/488676
> tmpfs 12174353 1 12174352 1% /run/user/493713
> tmpfs 12174353 1 12174352 1% /run/user/507831
> tmpfs 12174353 1 12174352 1% /run/user/549822
> tmpfs 12174353 1 12174352 1% /run/user/500569
> tmpfs 12174353 1 12174352 1% /run/user/443748
> tmpfs 12174353 1 12174352 1% /run/user/543676
> tmpfs 12174353 1 12174352 1% /run/user/451446
> tmpfs 12174353 1 12174352 1% /run/user/497945
> tmpfs 12174353 6 12174347 1% /run/user/554672
> tmpfs 12174353 32 12174321 1% /run/user/554653
> tmpfs 12174353 1 12174352 1% /run/user/30094
> tmpfs 12174353 1 12174352 1% /run/user/470790
> tmpfs 12174353 59 12174294 1% /run/user/553037
> tmpfs 12174353 1 12174352 1% /run/user/554670
> tmpfs 12174353 1 12174352 1% /run/user/548236
> tmpfs 12174353 1 12174352 1% /run/user/547288
> tmpfs 12174353 1 12174352 1% /run/user/547289
>
>
>
> You may need to allocate more inodes to an independent inode fileset
> somewhere. Especially with something as old as 4.2.3, you won't have
> auto-inode expansion for the filesets.
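>
> As a rough sketch of what that allocation could look like (the option forms
> below follow the mmchfileset and mmchfs documentation as I recall it; verify
> against the 4.2.3 man pages and substitute your own device, fileset name, and
> limits):
>
> /usr/lpp/mmfs/bin/mmchfileset <device> <fileset> --inode-limit MaxInodes[:InodesToPreallocate]
>
> and, if the sum of the fileset limits starts approaching the file system's
> own maximum shown by mmdf, the file-system-wide ceiling can be raised with:
>
> /usr/lpp/mmfs/bin/mmchfs <device> --inode-limit MaxInodes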
>
>
>
> Do we have to restart any service after upping the inode count?
>
>
>
>
>
> Best,
>
>
>
> J.D. Maloney
>
> Lead HPC Storage Engineer | Storage Enabling Technologies Group
>
> National Center for Supercomputing Applications (NCSA)
>
>
>
> Hi JD, I took an intermediate LCI workshop with you at Univ of Cincinnati!
>
>
>
>
>
> From: gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> on behalf of
> Rob Kudyba <rk3199 at columbia.edu>
> Date: Thursday, June 6, 2024 at 3:50 PM
> To: gpfsug-discuss at gpfsug.org <gpfsug-discuss at gpfsug.org>
> Subject: [gpfsug-discuss] No space left on device, but plenty of quota
> space for inodes and blocks
>
> We are running GPFS 4.2.3 on a DDN GridScaler, and users are getting the "No space
> left on device" message when trying to write to a file. In /var/adm/ras/mmfs.log
> the only recent errors are these:
>
>
>
> 2024-06-06_15:51:22.311-0400: mmcommon getContactNodes cluster failed.
> Return code -1.
> 2024-06-06_15:51:22.311-0400: The previous error was detected on node
> x.x.x.x (headnode).
> 2024-06-06_15:53:25.088-0400: mmcommon getContactNodes cluster failed.
> Return code -1.
> 2024-06-06_15:53:25.088-0400: The previous error was detected on node
> x.x.x.x (headnode).
>
>
>
> According to
> https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=messages-6027-615
>
>
>
> Check the preceding messages, and consult the earlier chapters of this
> document. A frequent cause for such errors is lack of space in /var.
>
>
>
> We have plenty of space left.
>
>
>
> /usr/lpp/mmfs/bin/mmlsdisk cluster
> disk          driver   sector  failure  holds     holds                               storage
> name          type     size    group    metadata  data   status  availability  pool
> ------------- -------- ------  -------  --------  -----  ------  ------------  ------------
> S01_MDT200_1  nsd      4096    200      Yes       No     ready   up            system
> S01_MDT201_1  nsd      4096    201      Yes       No     ready   up            system
> S01_DAT0001_1 nsd      4096    100      No        Yes    ready   up            data1
> S01_DAT0002_1 nsd      4096    101      No        Yes    ready   up            data1
> S01_DAT0003_1 nsd      4096    100      No        Yes    ready   up            data1
> S01_DAT0004_1 nsd      4096    101      No        Yes    ready   up            data1
> S01_DAT0005_1 nsd      4096    100      No        Yes    ready   up            data1
> S01_DAT0006_1 nsd      4096    101      No        Yes    ready   up            data1
> S01_DAT0007_1 nsd      4096    100      No        Yes    ready   up            data1
>
>
>
> /usr/lpp/mmfs/bin/mmdf headnode
> disk                disk size  failure holds    holds              free KB             free KB
> name                    in KB    group metadata data        in full blocks        in fragments
> --------------- ------------- -------- -------- ----- -------------------- -------------------
> Disks in storage pool: system (Maximum disk size allowed is 14 TB)
> S01_MDT200_1       1862270976      200 Yes      No       969134848 ( 52%)       2948720 ( 0%)
> S01_MDT201_1       1862270976      201 Yes      No       969126144 ( 52%)       2957424 ( 0%)
>                 -------------                         -------------------- -------------------
> (pool total)       3724541952                          1938260992 ( 52%)         5906144 ( 0%)
>
> Disks in storage pool: data1 (Maximum disk size allowed is 578 TB)
> S01_DAT0007_1     77510737920      100 No       Yes   21080752128 ( 27%)     897723392 ( 1%)
> S01_DAT0005_1     77510737920      100 No       Yes   14507212800 ( 19%)     949412160 ( 1%)
> S01_DAT0001_1     77510737920      100 No       Yes   14503620608 ( 19%)     951327680 ( 1%)
> S01_DAT0003_1     77510737920      100 No       Yes   14509205504 ( 19%)     949340544 ( 1%)
> S01_DAT0002_1     77510737920      101 No       Yes   14504585216 ( 19%)     948377536 ( 1%)
> S01_DAT0004_1     77510737920      101 No       Yes   14503647232 ( 19%)     952892480 ( 1%)
> S01_DAT0006_1     77510737920      101 No       Yes   14504486912 ( 19%)     949072512 ( 1%)
>                 -------------                         -------------------- -------------------
> (pool total)     542575165440                         108113510400 ( 20%)    6598146304 ( 1%)
>
>                 =============                         ==================== ===================
> (data)           542575165440                         108113510400 ( 20%)    6598146304 ( 1%)
> (metadata)         3724541952                           1938260992 ( 52%)       5906144 ( 0%)
>                 =============                         ==================== ===================
> (total)          546299707392                         110051771392 ( 22%)    6604052448 ( 1%)
>
> Inode Information
> -----------------
> Total number of used inodes in all Inode spaces:          154807668
> Total number of free inodes in all Inode spaces:            12964492
> Total number of allocated inodes in all Inode spaces:      167772160
> Total of Maximum number of inodes in all Inode spaces:     276971520
>
>
>
> On the head node:
>
>
>
> df -h
> Filesystem Size Used Avail Use% Mounted on
> /dev/sda4 430G 216G 215G 51% /
> devtmpfs 47G 0 47G 0% /dev
> tmpfs 47G 0 47G 0% /dev/shm
> tmpfs 47G 4.1G 43G 9% /run
> tmpfs 47G 0 47G 0% /sys/fs/cgroup
> /dev/sda1 504M 114M 365M 24% /boot
> /dev/sda2 100M 9.9M 90M 10% /boot/efi
> x.x.x.:/nfs-share 430G 326G 105G 76% /nfs-share
> cluster 506T 405T 101T 81% /cluster
> tmpfs 9.3G 0 9.3G 0% /run/user/443748
> tmpfs 9.3G 0 9.3G 0% /run/user/547288
> tmpfs 9.3G 0 9.3G 0% /run/user/551336
> tmpfs 9.3G 0 9.3G 0% /run/user/547289
>
>
>
> The login nodes have plenty of space in /var:
>
> /dev/sda3 50G 8.7G 42G 18% /var
>
>
>
> What else should we check? We are only at 81% usage on the GPFS-mounted file
> system, so there should still be enough space to avoid these errors. Any
> recommended service(s) that we can restart?
>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
>