[gpfsug-discuss] mmbackup validating the backups
Wahl, Edward
ewahl at osc.edu
Fri Jan 16 14:31:11 GMT 2026
I've been hesitant to reply to this thread as I am not a TSM/Storage Protect expert even after dealing with it for more than a decade and a half, but we've been backing up some moderately sized GPFS file systems (8-16P) with it for many years now, so I do have some experience with it. We've had all the problems you can have: a corrupted DB2 database, a zillion tiny files pushing a backup past 24 hours, and the roughly weekly user who uploads or creates a file with the dreaded newline or other control characters in its name.
>If You run mmbackup on a large file system where a lot of changes are ongoing, we always see in the final report of mmbackup that some files failed.
As this thread started out asking about "validating" the backups let's start here and address a couple different issues from different folks.
Are you not using snapshots for the mmbackup? Other than 'bad characters', which GPFS allows and SP does not (GPFS is MUCH more permissive than TSM here, even with things like WILDCARDSARELITERAL/QUOTESARELITERAL set), there shouldn't really be any issues. I HIGHLY recommend using a snapshot, at minimum, for backups. That bypasses the 'file not found' failures at least, though it DOES introduce the snapshot-related issues you might see when compute nodes are too overloaded and will not pause for the snapshots to be created/deleted.
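In case it helps, the snapshot flow above boils down to three commands. This is only a dry-run sketch: the device name, mount point, and snapshot name are invented placeholders, and each command is echoed rather than executed so the sequence can be reviewed before trying it on a real cluster.

```shell
#!/usr/bin/env bash
# Dry-run sketch of a snapshot-based mmbackup cycle.
# ASSUMPTIONS: device 'fs0' mounted at /gpfs/fs0; the snapshot name is arbitrary.
FS=fs0
SNAP="backupsnap-$(date +%Y%m%d)"

run() { echo "+ $*"; }    # swap 'echo' for real execution when ready

run mmcrsnapshot "$FS" "$SNAP"                       # freeze a point in time
run mmbackup "/gpfs/$FS" -S "$SNAP" -t incremental   # back up from the snapshot
# Keep the snapshot around for validation; delete it once the checks pass:
run mmdelsnapshot "$FS" "$SNAP"
```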
Snapshots also give validation testing a perfect copy of the backed-up data to check against. Backups and snapshots are both 'point in time' views of a live/production file system, so validation is difficult without being at that point in time. Just keep the snapshot around as long as you can stand, for the tests. (I realize not everyone can keep snapshots for long due to inode/block consumption.) Maybe a scripted checksum ('xxx#sum') of everything backed up? Up to you how you do this.
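For the scripted-checksum idea, a minimal sketch might look like the following; the paths are examples, and `sha256sum` merely stands in for whichever checksum tool you prefer.

```shell
#!/usr/bin/env bash
# Build a checksum manifest of one tree (e.g. the retained snapshot),
# then verify another tree (e.g. a test restore) against it.
make_manifest() {   # $1 = tree to checksum, $2 = manifest file to write
  ( cd "$1" && find . -type f -print0 | xargs -0 sha256sum ) > "$2"
}
verify_tree() {     # $1 = tree to check, $2 = manifest file
  local manifest
  manifest=$(realpath "$2")
  ( cd "$1" && sha256sum --check --quiet "$manifest" )
}
```

For instance, `make_manifest /gpfs/fs0/.snapshots/backupsnap manifest.txt` at backup time, then `verify_tree /path/to/test-restore manifest.txt` after a trial restore (again, purely hypothetical paths).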
Regarding the comment about sending the data to multiple initial SP servers: that sounds like a major shadow-file problem waiting to happen. Which server was the shadow database built against last? Which is the source of truth? Either send to a single SP server and have it replicate (disaster recovery is your friend), and/or divide the file system up by filesets and send those to different servers. Using snapshots with multiple SP servers and GPFS filesets allows much better parallelization of the backup itself. As we already use independent filesets per 'project code' for quotas, this is quite helpful for us.
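The per-fileset split can be sketched roughly as below. The junction paths and SP server stanza names are invented, and the commands are echoed rather than run, so treat it as a shape, not a recipe.

```shell
#!/usr/bin/env bash
# Dry-run sketch: back up independent filesets in parallel, each to its
# own Storage Protect server. All names here are invented examples.
declare -A TARGETS=(
  [/gpfs/fs0/projA]=spserver1
  [/gpfs/fs0/projB]=spserver2
)
run() { echo "+ $*"; }    # swap 'echo' for real execution when ready

for junction in "${!TARGETS[@]}"; do
  run mmbackup "$junction" --scope inodespace \
      --tsm-servers "${TARGETS[$junction]}" -t incremental &
done
wait    # the per-fileset backups run concurrently
```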
Apologies if I used GPFS or TSM in here and folks were confused. Just replace those with Storage Scale and Storage Protect.
Ed Wahl
Ohio Supercomputer Center
________________________________
From: gpfsug-discuss <gpfsug-discuss-bounces at gpfsug.org> on behalf of Stephan Graf <st.graf at fz-juelich.de>
Sent: Friday, January 16, 2026 4:29 AM
To: gpfsug-discuss at gpfsug.org <gpfsug-discuss at gpfsug.org>
Subject: Re: [gpfsug-discuss] mmbackup validating the backups
Hi,
I also recommend using the '-q' option regularly to make sure you do not
end up in a kind of 'split brain' situation between the TSM DB content
and the shadow database.
If you run mmbackup on a large file system with a lot of ongoing
changes, the final report of mmbackup always shows that some files
failed.
As far as I remember, the usual situation was that the file had been
deleted by the user before dsmc was started.
But it is hard to check this for every file, so we ignore those failures.
However, something else could be the root cause of a file not being
backed up. In the past we saw this when users used strange characters
in their file names.
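As a concrete pre-check for that failure mode, something along these lines can flag suspicious names before the backup runs; treating 'strange' as 'contains control characters' is my assumption, and the paths are examples.

```shell
#!/usr/bin/env bash
# List files whose names contain control characters (including newlines),
# which the backup client may refuse or mis-handle.
scan_bad_names() {
  local dir=$1
  find "$dir" -print0 |
  while IFS= read -r -d '' path; do
    if [[ $path =~ [[:cntrl:]] ]]; then
      printf '%q\n' "$path"    # %q makes the offending bytes visible
    fi
  done
}
```

Run as, e.g., `scan_bad_names /gpfs/fs0 > badnames.txt` and chase down the owners.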
For this it would be nice to get a better report from mmbackup,
for example a list of all files that failed, perhaps with the reason:
- file not found
- dsmc backup failed
- ...
This is a question for the IBM devs: what do you think? Is it worth
opening an IBM Idea?
Stephan
On 1/16/26 07:53, Timm Stamer wrote:
> Hi,
>
> we're running mmbackup with the -q option once a quarter to be sure
> everything is backed up.
> I think this is much simpler than your approach, but maybe it does not
> cover all your needs.
>
>
> -q
> Performs a query operation before issuing mmbackup. The IBM Storage
> Protect server might have data stored already that is not recognized as
> having been backed up by mmbackup and its own shadow database. To
> properly compute the set of files that currently need to be backed up,
> mmbackup can perform an IBM Storage Protect query and process the
> results to update its shadow database. Use the -q switch to perform
> this query and then immediately commence the requested backup
> operation.
>
> https://www.ibm.com/docs/en/storage-scale/5.2.3?topic=reference-mmbackup-command
>
>
> [...]
> we=$(LC_TIME=C date +%A)
> dm=$(date +%d)
> dmonth=$(date +%m)
> checkMonths=("01" "04" "07" "10")
> [[ ${checkMonths[@]} =~ $dmonth ]] && [ "$we" = "Friday" ] && [ "$dm" -le 7 ] && QUERYBACKUP="-q"
>
> mmbackup ... ${QUERYBACKUP} ...
> [...]
>
>
> Kind regards
>
> Timm Stamer
>
>
>
>
> Am Donnerstag, dem 15.01.2026 um 18:21 +0000 schrieb Peter Childs:
>>
>> Hi All,
>>
>> We use mmbackup to backup our Scale Storage to two IBM Protect
>> Instances.
>>
>> We would like to validate our backups to ensure that the files we
>> have really are backed up and we don't have any problems.
>>
>> So far I've worked out that I can query the backups using "query
>> backup --detail" in Protect and get times and dates. From these I can
>> compare with `stat` and `ls` (or with mmapplypolicy, which should be
>> faster) to check that the three timestamps match, and check the
>> contents of each directory. (Using --subdir=yes looks great on paper,
>> but the output takes days to appear; checking file by file can be
>> done incrementally, and I can use mmapplypolicy to run the report.)
>>
>> Given the stats from Protect, the two backup servers are reporting
>> different occupancy figures, which suggests to me we may have some
>> inconsistencies (we're talking a 1/2 TB difference between the two
>> servers according to Protect's Query Occupancy), but I'm aware that
>> Protect's figures are estimates and not always accurate.
>>
>> Verifying the backup is always a good plan even if you're 101% sure
>> it's correct anyway (and I'm not, maybe). (Your backups are only as
>> good as the last time you recalled them, and all that.)
>>
>> We could use the shadow database, as this looks to be what Storage
>> Archive uses to confirm that a file is backed up before it is
>> archived.
>>
>> Does anyone know the format of the shadow database, i.e. which fields
>> are which? Knowing the format might at least let us work out what the
>> differences are, and would increase our confidence in the backup.
>>
>> Thanks in advance.
>>
>> Peter Childs
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at gpfsug.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
--
Stephan Graf
#GernePerDu
HPC, Cloud and Data Systems and Services
Jülich Supercomputing Centre
Phone: +49-2461-61-6578
Fax: +49-2461-61-6656
E-mail: st.graf at fz-juelich.de
WWW: http://www.fz-juelich.de/jsc/
---------------------------------------------------------------------------------------------
Forschungszentrum Jülich GmbH
52425 Jülich
Registered office: Jülich
Registered in the commercial register of the Amtsgericht Düren, no. HR B 3498
Chairman of the Supervisory Board: MinDir Stefan Müller
Board of Directors: Prof. Dr. Astrid Lambrecht (Chair),
Dr. Stephanie Bauer (Deputy Chair),
Prof. Dr. Ir. Pieter Jansens, Prof. Dr. Laurens Kuipers
---------------------------------------------------------------------------------------------