-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
feat: integrity check jobs (missing files, orphaned files, checksums) #24205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
refactor: rename checksum_fail to checksum_mismatched
refactor: combine the stream query
chore: lint & format
feat: update summary after job runs
| }); | ||
| }); | ||
|
|
||
| describe('POST /integrity/summary (& jobs)', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to separate test file
| }); | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to separate controller
| export class MaintenanceAuthDto { | ||
| username!: string; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create separate dto file
|
|
||
| case ManualJobName.IntegrityMissingFiles: { | ||
| return { name: JobName.IntegrityMissingFilesQueueAll }; | ||
| } | ||
|
|
||
| case ManualJobName.IntegrityOrphanFiles: { | ||
| return { name: JobName.IntegrityOrphanedFilesQueueAll }; | ||
| } | ||
|
|
||
| case ManualJobName.IntegrityChecksumFiles: { | ||
| return { name: JobName.IntegrityChecksumFiles }; | ||
| } | ||
|
|
||
| case ManualJobName.IntegrityMissingFilesRefresh: { | ||
| return { name: JobName.IntegrityMissingFilesQueueAll, data: { refreshOnly: true } }; | ||
| } | ||
|
|
||
| case ManualJobName.IntegrityOrphanFilesRefresh: { | ||
| return { name: JobName.IntegrityOrphanedFilesQueueAll, data: { refreshOnly: true } }; | ||
| } | ||
|
|
||
| case ManualJobName.IntegrityChecksumFilesRefresh: { | ||
| return { name: JobName.IntegrityChecksumFiles, data: { refreshOnly: true } }; | ||
| } | ||
|
|
||
| case ManualJobName.IntegrityMissingFilesDeleteAll: { | ||
| return { name: JobName.IntegrityReportDelete, data: { type: IntegrityReportType.MissingFile } }; | ||
| } | ||
|
|
||
| case ManualJobName.IntegrityOrphanFilesDeleteAll: { | ||
| return { name: JobName.IntegrityReportDelete, data: { type: IntegrityReportType.OrphanFile } }; | ||
| } | ||
|
|
||
| case ManualJobName.IntegrityChecksumFilesDeleteAll: { | ||
| return { name: JobName.IntegrityReportDelete, data: { type: IntegrityReportType.ChecksumFail } }; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this just be its own route? createJob only accepts the name right now
| Unique, | ||
| } from 'src/sql-tools'; | ||
|
|
||
| @Table('integrity_report') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should be called integrity_report_files or something, in case we want to then handle integrity_report_database_rows or other report types which don't deal with files?
| // debug: run on boot | ||
| setTimeout(() => { | ||
| void this.jobRepository.queue({ | ||
| name: JobName.IntegrityOrphanedFilesQueueAll, | ||
| data: {}, | ||
| }); | ||
|
|
||
| void this.jobRepository.queue({ | ||
| name: JobName.IntegrityMissingFilesQueueAll, | ||
| data: {}, | ||
| }); | ||
|
|
||
| void this.jobRepository.queue({ | ||
| name: JobName.IntegrityChecksumFiles, | ||
| data: {}, | ||
| }); | ||
| }, 1000); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this before merge.
|
|
||
| await this.eventRepository.emit('AssetTrashAll', { | ||
| assetIds: ids, | ||
| userId: '', // ??? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing user ID
| for (const property of properties) { | ||
| const reports = this.integrityRepository.streamIntegrityReportsByProperty(property, type); | ||
| for await (const report of chunk(reports, JOBS_LIBRARY_PAGINATION_SIZE)) { | ||
| // todo: queue sub-job here instead? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unimplemented
| // .onRef('integrity_report.path', '=', 'allPaths.path') | ||
| ) | ||
| .select([ | ||
| 'allPaths.path as path', | ||
| 'allPaths.assetId', | ||
| 'allPaths.fileAssetId', | ||
| 'integrity_report.path as reportId', | ||
| ]) | ||
| .stream(); | ||
| } | ||
|
|
||
| @GenerateSql({ params: [DummyValue.DATE, DummyValue.DATE], stream: true }) | ||
| streamAssetChecksums(startMarker?: Date, endMarker?: Date) { | ||
| return this.db | ||
| .selectFrom('asset') | ||
| .leftJoin('integrity_report', (join) => | ||
| join | ||
| .onRef('integrity_report.assetId', '=', 'asset.id') | ||
| // .onRef('integrity_report.path', '=', 'asset.originalPath') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stray comments
| eb.ref('asset.id').$castTo<string | null>().as('assetId'), | ||
| sql<string | null>`null::uuid`.as('fileAssetId'), | ||
| ]) | ||
| .unionAll( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May have a major performance impact
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current impact:
Type: Nested Loop (Left); ; Cost: 0.00 - 51.62
Type: Append; ; Cost: 0.00 - 41.43
Type: Seq Scan; Rel: asset ; Cost: 0.00 - 11.60
Type: Seq Scan; Rel: asset ; Cost: 0.00 - 12.00
Type: Seq Scan; Rel: asset_file ; Cost: 0.00 - 15.20
Type: Materialize; ; Cost: 0.00 - 1.05
Type: Seq Scan; Rel: integrity_report ; Cost: 0.00 - 1.05
Without the union:
Type: Hash Join (Left); ; Cost: 1.06 - 18.22
Type: Seq Scan; Rel: asset_file ; Cost: 0.00 - 15.20
Type: Hash; ; Cost: 1.05 - 1.05
Type: Seq Scan; Rel: integrity_report ; Cost: 0.00 - 1.05
Doesn't appear much performance is left on the table
- No indication the entire
assetorasset_filetable is loaded into memory but I could be wrong - Integrity report is loaded entirely into memory in either case
- Likely negligible performance difference between Nested Loop/Hash Join? This seems like something the database optimises itself on the fly, I'm not too knowledgeable on this -- https://stackoverflow.com/a/49024533
PR desc TODO
Description
Fixes # (issue)
How Has This Been Tested?
Screenshots (if appropriate)
Checklist:
src/services/uses repositories implementations for database calls, filesystem operations, etc.src/repositories/is pretty basic/simple and does not have any immich specific logic (that belongs insrc/services/)Please describe to which degree, if any, an LLM was used in creating this pull request.
...