Indexing "invalid" HNSW vectors does not trigger error until merge and/or CheckIndex #15540

@hossman

Description

IndexWriter will happily allow applications to index documents whose KnnByteVectorField (and presumably KnnFloatVectorField) instances hold "invalid" values.

These invalid vectors will not trigger an Exception from either IndexWriter.addDocument() or IndexWriter.commit() -- they will only cause problems down the road during index merges, or when running CheckIndex.

A trivial test case can be found in: lucene.invalid-vector-indexing-with-out-failure.test.patch (which uses COSINE sim and indexes new byte[] {0,0,0,...,0}... I'm not sure if similar problems will happen with other vector+sim combos and/or non-normalized vectors when using DOT_PRODUCT?)
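For readers wondering where the NaN in the assertion message below comes from, here is a minimal sketch of textbook cosine similarity applied to an all-zero byte vector. This is NOT Lucene's actual scoring code; the class and method names (ZeroVectorCosine, cosine) are made up for illustration, and the sketch only assumes the usual dot / (|a| * |b|) definition.

```java
public class ZeroVectorCosine {

  // Textbook cosine similarity over signed bytes.
  // Hypothetical helper for illustration, not Lucene's implementation.
  public static float cosine(byte[] a, byte[] b) {
    int dot = 0;
    int normA = 0;
    int normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    // With an all-zero vector both dot and the norm product are 0,
    // so this evaluates 0 / 0.0, which is NaN in IEEE 754 arithmetic.
    return (float) (dot / Math.sqrt((double) normA * normB));
  }

  public static void main(String[] args) {
    byte[] zero = new byte[4]; // {0, 0, 0, 0}
    byte[] other = {1, 2, 3, 4};

    float score = cosine(zero, other);
    System.out.println(score); // prints "NaN"

    // Every ordered comparison against NaN is false, so a NaN score
    // cannot be placed consistently in any sorted structure.
    System.out.println(score < 1.0f);  // prints "false"
    System.out.println(score >= 1.0f); // prints "false"
  }
}
```

Since neither comparison holds, a NaN score has no well-defined position relative to 1.0, which is consistent with the "Comparing NaN to [1.0]" message in the first stack trace.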

AFAIK this test should fail on any system regardless of seed.

The nature of the failure can be changed by modifying tests.asserts to influence whether:

The problem triggers an assertion in the KnnVectorsWriter.merge call stack.

   >     java.lang.AssertionError: Nodes are added in the incorrect order! Comparing NaN to [1.0]
   >         at __randomizedtesting.SeedInfo.seed([75D48A8DF27FDF07:8BF93348CADA4842]:0)
   >         at org.apache.lucene.util.hnsw.NeighborArray.addInOrder(NeighborArray.java:80)
   >         at org.apache.lucene.util.hnsw.HnswGraphBuilder.popToScratch(HnswGraphBuilder.java:461)
   >         at org.apache.lucene.util.hnsw.HnswGraphBuilder.addGraphNodeInternal(HnswGraphBuilder.java:286)
   >         at org.apache.lucene.util.hnsw.HnswGraphBuilder.addGraphNode(HnswGraphBuilder.java:325)
   >         at org.apache.lucene.util.hnsw.MergingHnswGraphBuilder.updateGraph(MergingHnswGraphBuilder.java:153)
   >         at org.apache.lucene.util.hnsw.MergingHnswGraphBuilder.build(MergingHnswGraphBuilder.java:128)
   >         at org.apache.lucene.util.hnsw.IncrementalHnswGraphMerger.merge(IncrementalHnswGraphMerger.java:214)
   >         at org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsWriter.mergeOneField(Lucene99HnswVectorsWriter.java:444)
   >         at org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsWriter.mergeOneField(PerFieldKnnVectorsFormat.java:128)
   >         at org.apache.lucene.codecs.KnnVectorsWriter.merge(KnnVectorsWriter.java:105)
   >         at org.apache.lucene.index.SegmentMerger.mergeVectorValues(SegmentMerger.java:272)
   >         at org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:315)
   >         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:159)
   >         at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5276)
   >         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4739)
   >         at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6538)
   >         at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:38)
   >         at org.apache.lucene.index.IndexWriter.executeMerge(IndexWriter.java:2333)
   >         at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2328)
   >         at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:2163)
   >         at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:2111)
   >         at org.apache.lucene.util.hnsw.TestZeroVectorHnswGraphIndexing.testIndexingAndMerging(TestZeroVectorHnswGraphIndexing.java:53)
...
    Reproduce with: gradlew :lucene:core:test --tests "org.apache.lucene.util.hnsw.TestZeroVectorHnswGraphIndexing.testIndexingAndMerging" -Ptests.asserts=true -Ptests.file.encoding=ISO-8859-1 -Ptests.gui=false "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1" -Ptests.jvms=5 -Ptests.seed=75D48A8DF27FDF07 -Ptests.vectorsize=512

OR ... The problem sneaks through all indexing & merging and only causes an issue when MockDirectoryWrapper.close() invokes CheckIndex

   >     org.apache.lucene.index.CheckIndex$CheckIndexException: Field "bytes" failed to search k nearest neighbors
   >         at __randomizedtesting.SeedInfo.seed([9F1658AC2C860676:613BE16914239133]:0)
   >         at app//org.apache.lucene.index.CheckIndex.checkByteVectorValues(CheckIndex.java:3162)
   >         at app//org.apache.lucene.index.CheckIndex.testVectors(CheckIndex.java:2855)
   >         at app//org.apache.lucene.index.CheckIndex.testSegment(CheckIndex.java:1123)
   >         at app//org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:823)
   >         at app//org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:593)
   >         at app//org.apache.lucene.tests.util.TestUtil.checkIndex(TestUtil.java:333)
   >         at app//org.apache.lucene.tests.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:917)
   >         at app//org.apache.lucene.util.hnsw.TestZeroVectorHnswGraphIndexing.testIndexingAndMerging(TestZeroVectorHnswGraphIndexing.java:56)
...
    Reproduce with: gradlew :lucene:core:test --tests "org.apache.lucene.util.hnsw.TestZeroVectorHnswGraphIndexing.testIndexingAndMerging" -Ptests.asserts=false -Ptests.file.encoding=US-ASCII -Ptests.gui=false "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1" -Ptests.jvms=5 -Ptests.seed=9F1658AC2C860676 -Ptests.vectorsize=128

Version and environment details

This problem affects main and branch_10x, going back (at least) as far as 10.3.2, where it was discovered due to a randomized Solr test that could inadvertently generate an "all zero" vector (SOLR-17736).
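Until something rejects these vectors at index time, a caller (such as a randomized test that might produce an "all zero" vector) could screen them before building the field. The following is only a sketch under that assumption: VectorGuard and requireNonZero are hypothetical names, not Lucene or Solr APIs, and it only covers the all-zero case described here, not any other vector+sim combinations that might turn out to be problematic.

```java
public class VectorGuard {

  // Returns true when every component is 0 (an empty array also counts).
  public static boolean isAllZero(byte[] vector) {
    for (byte b : vector) {
      if (b != 0) {
        return false;
      }
    }
    return true;
  }

  // Hypothetical application-side check: fail fast instead of letting
  // the vector reach the index and surface later at merge time.
  public static byte[] requireNonZero(byte[] vector) {
    if (isAllZero(vector)) {
      throw new IllegalArgumentException(
          "all-zero vector has no defined cosine similarity");
    }
    return vector;
  }

  public static void main(String[] args) {
    byte[] ok = {0, 0, 5};
    System.out.println(requireNonZero(ok).length); // prints "3"

    try {
      requireNonZero(new byte[8]);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The returned array could then be handed to the KnnByteVectorField constructor as usual; the point is only that the failure happens at the call site rather than during a later merge or CheckIndex run.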
