@insertish (Contributor) commented Nov 26, 2025

PR desc TODO

Description

Fixes # (issue)

How Has This Been Tested?

  • Test A
  • Test B

Screenshots (if appropriate)

Checklist:

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation if applicable
  • I have no unrelated changes in the PR.
  • I have confirmed that any new dependencies are strictly necessary.
  • I have written tests for new code (if applicable)
  • I have followed naming conventions/patterns in the surrounding code
  • All code in src/services/ uses repositories implementations for database calls, filesystem operations, etc.
  • All code in src/repositories/ is pretty basic/simple and does not have any immich specific logic (that belongs in src/services/)

Please describe to which degree, if any, an LLM was used in creating this pull request.

...

@insertish changed the title from "feat: integrity check jobs (missing files, orphaned files, checksums, more?)" to "feat: integrity check jobs (missing files, orphaned files, checksums)" on Dec 2, 2025
});
});

describe('POST /integrity/summary (& jobs)', async () => {
Contributor Author

Move to separate test file

});
}
}

Contributor Author

Move to separate controller

export class MaintenanceAuthDto {
  username!: string;
}

Contributor Author

Create separate dto file

Comment on lines 36 to +71

case ManualJobName.IntegrityMissingFiles: {
  return { name: JobName.IntegrityMissingFilesQueueAll };
}

case ManualJobName.IntegrityOrphanFiles: {
  return { name: JobName.IntegrityOrphanedFilesQueueAll };
}

case ManualJobName.IntegrityChecksumFiles: {
  return { name: JobName.IntegrityChecksumFiles };
}

case ManualJobName.IntegrityMissingFilesRefresh: {
  return { name: JobName.IntegrityMissingFilesQueueAll, data: { refreshOnly: true } };
}

case ManualJobName.IntegrityOrphanFilesRefresh: {
  return { name: JobName.IntegrityOrphanedFilesQueueAll, data: { refreshOnly: true } };
}

case ManualJobName.IntegrityChecksumFilesRefresh: {
  return { name: JobName.IntegrityChecksumFiles, data: { refreshOnly: true } };
}

case ManualJobName.IntegrityMissingFilesDeleteAll: {
  return { name: JobName.IntegrityReportDelete, data: { type: IntegrityReportType.MissingFile } };
}

case ManualJobName.IntegrityOrphanFilesDeleteAll: {
  return { name: JobName.IntegrityReportDelete, data: { type: IntegrityReportType.OrphanFile } };
}

case ManualJobName.IntegrityChecksumFilesDeleteAll: {
  return { name: JobName.IntegrityReportDelete, data: { type: IntegrityReportType.ChecksumFail } };
}
Contributor Author

Should this just be its own route? createJob only accepts the name right now
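
If the refresh and delete-all variants stay on `createJob`, one option is to replace the switch with a lookup table, so that the name alone determines the payload. A minimal sketch: the enum member names mirror the diff above, but the declarations and string values here are local stand-ins rather than the real immich types, and `toJobItem` is a hypothetical helper. Only the missing-files entries are shown; the orphan and checksum entries would follow the same pattern.

```typescript
// Local stand-ins for the enums referenced in the diff; the string values are assumptions.
enum ManualJobName {
  IntegrityMissingFiles = 'integrity-missing-files',
  IntegrityMissingFilesRefresh = 'integrity-missing-files-refresh',
  IntegrityMissingFilesDeleteAll = 'integrity-missing-files-delete-all',
}

enum JobName {
  IntegrityMissingFilesQueueAll = 'IntegrityMissingFilesQueueAll',
  IntegrityReportDelete = 'IntegrityReportDelete',
}

enum IntegrityReportType {
  MissingFile = 'missing_file',
}

type JobItem = { name: JobName; data?: Record<string, unknown> };

// Record<ManualJobName, ...> makes the compiler reject a missing entry,
// which is the same exhaustiveness the switch statement provides.
const manualJobs: Record<ManualJobName, JobItem> = {
  [ManualJobName.IntegrityMissingFiles]: {
    name: JobName.IntegrityMissingFilesQueueAll,
  },
  [ManualJobName.IntegrityMissingFilesRefresh]: {
    name: JobName.IntegrityMissingFilesQueueAll,
    data: { refreshOnly: true },
  },
  [ManualJobName.IntegrityMissingFilesDeleteAll]: {
    name: JobName.IntegrityReportDelete,
    data: { type: IntegrityReportType.MissingFile },
  },
};

// Hypothetical helper: resolve a manual job name to the job that should be queued.
function toJobItem(name: ManualJobName): JobItem {
  return manualJobs[name];
}
```

A dedicated route would instead accept `refreshOnly` or the report type in the request body, which avoids adding three manual job names per integrity check.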

Unique,
} from 'src/sql-tools';

@Table('integrity_report')
Contributor Author

Maybe this should be called integrity_report_files or something, in case we want to then handle integrity_report_database_rows or other report types which don't deal with files?

Comment on lines +100 to +117
// debug: run on boot
setTimeout(() => {
  void this.jobRepository.queue({
    name: JobName.IntegrityOrphanedFilesQueueAll,
    data: {},
  });

  void this.jobRepository.queue({
    name: JobName.IntegrityMissingFilesQueueAll,
    data: {},
  });

  void this.jobRepository.queue({
    name: JobName.IntegrityChecksumFiles,
    data: {},
  });
}, 1000);
}
Contributor Author

Remove this before merge.


await this.eventRepository.emit('AssetTrashAll', {
  assetIds: ids,
  userId: '', // ???
Contributor Author

Missing user ID
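
One possible way to supply the user ID, assuming the owner of each affected asset can be looked up alongside its ID, is to group the asset IDs by owner and emit one event per owner. The `AssetRef` shape and the `groupByOwner` helper below are hypothetical and not taken from this PR.

```typescript
// Hypothetical shape: an asset ID paired with the ID of the user who owns it.
type AssetRef = { id: string; ownerId: string };

// Group asset IDs by owner so that each emitted event can carry a real userId.
function groupByOwner(assets: AssetRef[]): Map<string, string[]> {
  const byOwner = new Map<string, string[]>();
  for (const { id, ownerId } of assets) {
    const ids = byOwner.get(ownerId);
    if (ids) {
      ids.push(id);
    } else {
      byOwner.set(ownerId, [id]);
    }
  }
  return byOwner;
}
```

Each `[userId, assetIds]` entry of the returned map could then be passed to a separate `emit('AssetTrashAll', { assetIds, userId })` call.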

for (const property of properties) {
  const reports = this.integrityRepository.streamIntegrityReportsByProperty(property, type);
  for await (const report of chunk(reports, JOBS_LIBRARY_PAGINATION_SIZE)) {
    // todo: queue sub-job here instead?
Contributor Author

Unimplemented
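
A rough sketch of what the "queue sub-job" variant of this loop could look like, under stated assumptions: `chunk` here is a local stand-in for the helper used in the diff (its real signature is not shown), and the `IntegrityReportDeleteBatch` job name, the `SubJob` payload, and `queueDeleteBatches` are all hypothetical.

```typescript
// Local stand-in for the chunk helper: batch an async stream into arrays of at most `size` items.
async function* chunk<T>(source: AsyncIterable<T>, size: number): AsyncGenerator<T[]> {
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  // Flush the final, possibly shorter, batch.
  if (batch.length > 0) {
    yield batch;
  }
}

type Report = { id: string };

// Hypothetical sub-job payload; the job name is not part of this PR.
type SubJob = { name: 'IntegrityReportDeleteBatch'; data: { reportIds: string[] } };

// Queue one sub-job per batch instead of handling the batch inline.
// Returns the number of sub-jobs queued.
async function queueDeleteBatches(
  reports: AsyncIterable<Report>,
  size: number,
  queue: (job: SubJob) => Promise<void>,
): Promise<number> {
  let queued = 0;
  for await (const batch of chunk(reports, size)) {
    await queue({
      name: 'IntegrityReportDeleteBatch',
      data: { reportIds: batch.map((report) => report.id) },
    });
    queued++;
  }
  return queued;
}
```

Queuing per batch keeps the parent job short and lets the queue retry a failed batch on its own, at the cost of one extra job type to maintain.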

Comment on lines 173 to 191
      // .onRef('integrity_report.path', '=', 'allPaths.path')
    )
    .select([
      'allPaths.path as path',
      'allPaths.assetId',
      'allPaths.fileAssetId',
      'integrity_report.path as reportId',
    ])
    .stream();
}

@GenerateSql({ params: [DummyValue.DATE, DummyValue.DATE], stream: true })
streamAssetChecksums(startMarker?: Date, endMarker?: Date) {
  return this.db
    .selectFrom('asset')
    .leftJoin('integrity_report', (join) =>
      join
        .onRef('integrity_report.assetId', '=', 'asset.id')
        // .onRef('integrity_report.path', '=', 'asset.originalPath')
Contributor Author

Stray commented-out `.onRef` lines in both joins

    eb.ref('asset.id').$castTo<string | null>().as('assetId'),
    sql<string | null>`null::uuid`.as('fileAssetId'),
  ])
  .unionAll(
Contributor Author

May have a major performance impact

Contributor Author

Current impact:

Type: Nested Loop (Left); ; Cost: 0.00 - 51.62
	Type: Append; ; Cost: 0.00 - 41.43
		Type: Seq Scan; Rel: asset ; Cost: 0.00 - 11.60
		Type: Seq Scan; Rel: asset ; Cost: 0.00 - 12.00
		Type: Seq Scan; Rel: asset_file ; Cost: 0.00 - 15.20
	Type: Materialize; ; Cost: 0.00 - 1.05
		Type: Seq Scan; Rel: integrity_report ; Cost: 0.00 - 1.05

Without the union:

Type: Hash Join (Left); ; Cost: 1.06 - 18.22
	Type: Seq Scan; Rel: asset_file ; Cost: 0.00 - 15.20
	Type: Hash; ; Cost: 1.05 - 1.05
		Type: Seq Scan; Rel: integrity_report ; Cost: 0.00 - 1.05

It doesn't appear that much performance is left on the table:

  • There is no indication that the entire asset or asset_file table is loaded into memory, but I could be wrong
  • The integrity report is loaded entirely into memory in either case
  • The difference between Nested Loop and Hash Join is likely negligible. This seems like something the database optimises itself on the fly, though I'm not too knowledgeable on this: https://stackoverflow.com/a/49024533
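
Plan summaries in this shape can be derived from PostgreSQL's `EXPLAIN (FORMAT JSON)` output, whose nodes carry `Node Type`, `Join Type`, `Relation Name`, `Startup Cost`, `Total Cost`, and nested `Plans`. The sketch below assumes that is roughly how the listings above were produced. `formatPlan` is a hypothetical helper and not part of this PR.

```typescript
// Subset of the fields PostgreSQL emits per node for EXPLAIN (FORMAT JSON).
interface PlanNode {
  'Node Type': string;
  'Join Type'?: string;
  'Relation Name'?: string;
  'Startup Cost': number;
  'Total Cost': number;
  Plans?: PlanNode[];
}

// Flatten a plan tree into one tab-indented summary line per node.
function formatPlan(node: PlanNode, depth = 0): string[] {
  const join = node['Join Type'] ? ` (${node['Join Type']})` : '';
  const rel = node['Relation Name'] ? `Rel: ${node['Relation Name']} ` : '';
  const cost = `${node['Startup Cost'].toFixed(2)} - ${node['Total Cost'].toFixed(2)}`;
  const lines = [`${'\t'.repeat(depth)}Type: ${node['Node Type']}${join}; ${rel}; Cost: ${cost}`];
  // Children are listed directly beneath their parent, one level deeper.
  for (const child of node.Plans ?? []) {
    lines.push(...formatPlan(child, depth + 1));
  }
  return lines;
}
```

Running both query variants through the same formatter keeps the comparison consistent. Note that these costs are planner estimates; `EXPLAIN ANALYZE` on a realistically sized library would be needed to see actual timings.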
