@renatgalimov

When the second vector fails to parse in ensure_vector_match(), the cleanup function for the first vector was called with 'a' (void**) instead of '*a' (void*). This caused sqlite3_free to be called with a stack address instead of the heap-allocated vector, resulting in a crash:

    malloc: Non-aligned pointer being freed
    Fatal error 6: Aborted

The fix dereferences the pointer correctly, matching how cleanup is done in other error paths.
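
The failure mode can be shown outside sqlite-vec. The following is a minimal sketch, not the project's code: parse_vector(), match_vectors(), and cleanup_fn are hypothetical names, and free() stands in for sqlite3_free so the example has no dependencies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cleanup callback type; the real code uses sqlite3_free. */
typedef void (*cleanup_fn)(void *);

static int cleanup_calls = 0;
static void counting_free(void *p) { cleanup_calls++; free(p); }

/* Hypothetical parser: copies `text` to the heap on success (returns 0),
 * fails (returns -1) on NULL or empty input. */
static int parse_vector(const char *text, void **out, cleanup_fn *cleanup) {
  assert(out != NULL && cleanup != NULL);
  if (text == NULL || text[0] == '\0') return -1;
  size_t n = strlen(text) + 1;
  char *buf = malloc(n);
  if (buf == NULL) return -1;
  memcpy(buf, text, n);
  *out = buf;
  *cleanup = counting_free;
  return 0;
}

/* Parses two vectors through out-parameters. If the second parse fails,
 * the first must be released with *a (the heap pointer), not with a
 * (the address of the caller's variable, typically on the stack). */
static int match_vectors(const char *ta, const char *tb, void **a, void **b) {
  cleanup_fn a_cleanup, b_cleanup;
  if (parse_vector(ta, a, &a_cleanup) != 0) return -1;
  if (parse_vector(tb, b, &b_cleanup) != 0) {
    a_cleanup(*a); /* correct; a_cleanup(a) would free a non-heap address */
    *a = NULL;
    return -1;
  }
  return 0;
}
```

On success the caller owns both allocations and releases them itself; on failure nothing is left for the caller to free.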

vlasky and others added 28 commits November 28, 2025 17:10
Implements vec0Rename() to properly rename vec0 virtual tables and all
associated shadow tables (chunks, info, rowids, vector_chunks, auxiliary,
metadatachunks, metadatatext).

Merged from upstream PR asg017#203 by wilbertharriman.
Fixes issue asg017#43.

Co-Authored-By: Wilbert Harriman <wilbert@harriman.id>
Implements distance_cosine_bit() to calculate cosine similarity for bit
vectors using popcount operations. Previously, cosine distance would error
on binary vectors.

Uses optimized u64 popcount when dimensions are divisible by 64, otherwise
falls back to u8 hamming table lookup.

Merged from upstream PR asg017#212 by wilbertharriman.

Co-Authored-By: Wilbert Harriman <wilbert@harriman.id>
Implements proper cleanup when deleting rows to prevent memory/storage leaks:
- vec0Update_Delete_ClearRowid(): Zeros out rowid slot in chunks.rowids blob
- vec0Update_Delete_ClearVectors(): Zeros out vector data in all vector_chunks
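
As a rough picture of what "zeroing a slot" means, here is a small sketch over in-memory buffers. The chunk_view struct and chunk_clear_slot() are hypothetical; the real functions operate on SQLite blobs in the shadow tables, which this example does not model.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of one chunk: one 8-byte rowid slot and
 * one fixed-size vector slot per row position. */
typedef struct {
  int64_t *rowids;        /* chunk_size entries */
  unsigned char *vectors; /* chunk_size * vector_bytes bytes */
  size_t chunk_size;
  size_t vector_bytes;
} chunk_view;

/* Zero both the rowid slot and the vector bytes of a deleted row so no
 * stale data stays behind in the chunk. */
static void chunk_clear_slot(chunk_view *c, size_t slot) {
  assert(c != NULL && slot < c->chunk_size);
  c->rowids[slot] = 0;
  memset(c->vectors + slot * c->vector_bytes, 0, c->vector_bytes);
}
```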

Additional improvements:
- Add -undefined dynamic_lookup flag for macOS builds (standard for SQLite extensions)
- Add test-snapshots-update Makefile target
- Add comprehensive tests to verify delete properly clears bytes
- Add .venv to .gitignore

Merged from upstream PR asg017#243 by marcus-pousette.
Fixes issues asg017#54, asg017#178, asg017#220.

Co-Authored-By: Marcus Pousette <marcus.pousette@gmail.com>
Updates GitHub Actions workflows to fix deprecated runner issues and
improve ARM64 builds:

Release workflow:
- Fix x86_64 Linux: ubuntu-20.04 → ubuntu-latest (20.04 is deprecated)
- Use native ARM64 runner: ubuntu-24.04-arm instead of cross-compilation
- Remove gcc-aarch64-linux-gnu cross-compiler dependency
- Build natively on ARM for better SIMD support and reliability

Test workflow:
- Fix x86_64 Linux: ubuntu-20.04 → ubuntu-latest (20.04 is deprecated)

Benefits:
- Prevents build failures from deprecated runners
- Native ARM builds are more reliable and faster
- Better testing of ARM NEON SIMD optimizations
- Easier debugging if ARM-specific issues arise

Merged from upstream PR asg017#228 by anuraaga.

Co-Authored-By: Anuraag (Rag) Agrawal <anuraaga@gmail.com>
Implements a custom 'optimize' command (similar to SQLite FTS5) that allows
reclaiming disk space after DELETE operations:

  INSERT INTO vec_table(vec_table) VALUES ('optimize');
  VACUUM;

How it works:
- Identifies fragmented chunks from deletions
- Migrates all vectors to new, contiguous chunks
- Preserves partition keys and metadata during migration
- Deletes old fragmented chunks
- Allows VACUUM to reclaim freed disk space

Implementation details:
- Adds hidden 'table_name' column to trigger special insert commands
- vec0Update_SpecialInsert_Optimize(): Main optimization logic
  - Iterates all rows and copies to new chunks
  - Copies metadata values to new chunk positions
  - Cleans up old chunks and vector data
- vec0Update_SpecialInsert_OptimizeCopyMetadata(): Handles metadata migration

Schema improvements:
- Change PRIMARY KEY → INTEGER PRIMARY KEY in shadow tables
- Makes rowid an alias instead of separate index
- Reduces storage overhead and improves performance

Use cases:
- After bulk deletions to reclaim disk space
- Periodic maintenance to defragment vector storage
- Before backups to minimize database file size

Caveats:
- Can be slow on large tables (rebuilds all chunks)
- Should be run during maintenance windows
- Not transaction-safe for concurrent reads
- Requires VACUUM afterward to actually free space

Merged from upstream PR asg017#210 by wilbertharriman.
Fixes issue asg017#185.

Co-Authored-By: Wilbert Harriman <wilbert@harriman.id>
Implements WHERE constraints on the distance column in KNN queries, enabling
cursor-based pagination and range queries. Based on upstream PR asg017#166 by Alex
Garcia with completion and enhancements.

Features:
- Supports GT, GE, LT, LE operators on distance column
- Works with all vector types (float32, int8, bit)
- Compatible with partition keys, metadata, and auxiliary columns
- Multiple constraints can be combined (e.g., distance >= 3.0 AND distance <= 6.0)

Implementation:
- Added VEC0_IDXSTR_KIND_KNN_DISTANCE_CONSTRAINT to idxStr encoding
- Distance filtering applied during KNN search before top-k selection
- Cast f64 to f32 for comparison to match internal precision

Enhancements over original PR:
- Fixed variable shadowing in inner loops (i -> j)
- Added comprehensive test coverage (15 tests)
- Fixed bit/int8 vector type handling in tests
- Documented precision handling and pagination caveats

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Alex Garcia <alex@alex.garcia>
Co-Authored-By: Claude <noreply@anthropic.com>
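
The filtering step described in the distance-constraint commit above can be sketched as follows. This is an illustration under assumptions, not the extension's code: dist_op, dist_constraint, distance_passes(), and filter_by_distance() are hypothetical names. It shows the two points the message calls out: bounds are narrowed from f64 to f32 before comparing, and candidates are filtered before any top-k selection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constraint representation for the four supported operators. */
typedef enum { DIST_GT, DIST_GE, DIST_LT, DIST_LE } dist_op;
typedef struct { dist_op op; double value; } dist_constraint;

/* Returns 1 if `distance` satisfies every constraint. The bound arrives
 * as a double (SQL REAL) and is cast to float so that it is compared at
 * the same precision as the stored distance. */
static int distance_passes(float distance, const dist_constraint *cs, size_t n_cs) {
  assert(cs != NULL || n_cs == 0);
  for (size_t j = 0; j < n_cs; j++) {
    float bound = (float)cs[j].value;
    int ok = 0;
    switch (cs[j].op) {
      case DIST_GT: ok = distance > bound; break;
      case DIST_GE: ok = distance >= bound; break;
      case DIST_LT: ok = distance < bound; break;
      case DIST_LE: ok = distance <= bound; break;
    }
    if (!ok) return 0;
  }
  return 1;
}

/* Writes the indices of passing candidates into `kept` (capacity n) and
 * returns how many passed; top-k selection would run on this subset. */
static size_t filter_by_distance(const float *distances, size_t n,
                                 const dist_constraint *cs, size_t n_cs,
                                 size_t *kept) {
  size_t n_kept = 0;
  for (size_t i = 0; i < n; i++)
    if (distance_passes(distances[i], cs, n_cs)) kept[n_kept++] = i;
  return n_kept;
}
```

Without the cast, a constraint such as distance <= 0.1 would reject a stored distance of 0.1f, because 0.1f widened to double is slightly larger than the double literal 0.1.
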
Enables direct installation from GitHub for multiple languages:

Package Configurations Added:
- Go bindings/go/cgo: CGO-based bindings with Auto() and serialization helpers
- pyproject.toml + setup.py: Python package (pip install git+...)
- package.json: Node.js package (npm install vlasky/sqlite-vec)
- sqlite-vec.gemspec: Ruby gem (gem install from git)
- Cargo.toml + src/lib.rs: Rust crate (cargo add --git)

All packages support:
- Installing latest code from main branch
- Installing specific versions via git tags (e.g., v0.2.0-alpha)

Documentation:
- Updated README.md with installation table showing both latest and versioned installs
- Added Python note about loadable extension support requirement (many Python builds lack --enable-loadable-sqlite-extensions)
- Recommended using 'uv' for virtual environments as it uses system Python with extension support
- Created CHANGELOG.md documenting all merged PRs and improvements
- Updated CLAUDE.md for fork (version, bindings, release process)
- Bumped VERSION to 0.2.0-alpha

README Improvements:
- Added vector type syntax examples (float[N], int8[N], bit[N])
- Expanded examples from 4 to 20 vectors for meaningful demonstrations
- Added explanation of MATCH operator
- Added note about k/LIMIT requirement with rationale
- Improved "What's New" section with distance constraints, pagination, and optimize examples
- All examples tested and verified with actual output
- Clarified KNN terminology (k parameter vs LIMIT)

Languages now installable from GitHub:
✅ Go (CGO bindings via go get)
✅ Python (via pip + git)
✅ Rust (via Cargo.toml git dependency)
✅ Node.js (via npm + git)
✅ Ruby (via Gemfile git dependency)

All package configurations tested and validated.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
__popcnt64 intrinsic is not available on 32-bit Windows.
Split u64 into two u32 values and use __popcnt on each half.
The ARM/ARM64 fallback was taking unsigned int instead of u64,
which truncated the upper 32 bits when called from distance_hamming_u64.
This fix makes it consistent with the x86 32-bit implementation.
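
Both popcount fixes above come down to the same idea: count the two 32-bit halves of a u64 separately, and never take the argument as a 32-bit type. A portable sketch, assuming a bit-twiddling popcount_u32() in place of the MSVC __popcnt intrinsic (all function names here are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Portable 32-bit popcount (SWAR); stands in for a hardware intrinsic. */
static unsigned popcount_u32(uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;
  return (x * 0x01010101u) >> 24;
}

/* 64-bit popcount built from two 32-bit halves, as on 32-bit targets. */
static unsigned popcount_u64_split(uint64_t x) {
  return popcount_u32((uint32_t)(x & 0xFFFFFFFFu)) +
         popcount_u32((uint32_t)(x >> 32));
}

/* The truncation bug: a 32-bit parameter silently drops the upper half
 * of any 64-bit value passed in. */
static unsigned popcount_truncating(unsigned int x) {
  return popcount_u32(x);
}
```
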
Implements two-path LIKE filtering with proper case-insensitive matching:
- Fast path: Prefix-only patterns ('pattern%') use 12-byte cache optimization
- Slow path: Complex patterns ('%pattern', 'a%b', etc.) use sqlite3_strlike()

Changes:
- Added VEC0_METADATA_OPERATOR_LIKE enum value
- Added SQLITE_INDEX_CONSTRAINT_LIKE handling in xBestIndex
- Implemented vec0_is_prefix_only_like_pattern() helper
- Implemented cache-optimized fast path using sqlite3_strnicmp() for case-insensitive matching
- Implemented full-string fetch slow path for complex patterns
- Added validation: LIKE only allowed on TEXT metadata columns

Tests:
- Added test_like() covering all pattern types (prefix, suffix, contains, wildcards)
- Added test_like_case_insensitive() verifying SQLite's case-insensitive semantics
- Added test_like_boundary_conditions() testing 12-byte cache boundary edge cases
- Updated test_knn() to demonstrate LIKE working
- All tests pass (75 passed, 4 skipped, 0 failures)
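
The choice between the two LIKE paths depends on recognising a prefix-only pattern. The sketch below is a guess at what such a check might look like, not the actual vec0_is_prefix_only_like_pattern(): is_prefix_only_like() and ascii_prefix_match_nocase() are hypothetical, ESCAPE clauses are ignored, and case folding is ASCII-only, as in SQLite's default LIKE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 when `pattern` is a literal prefix followed by one or more
 * trailing '%', with no '%' or '_' inside the prefix. On success the
 * prefix length is stored in *prefix_len (if non-NULL). */
static int is_prefix_only_like(const char *pattern, size_t *prefix_len) {
  assert(pattern != NULL);
  size_t n = strlen(pattern);
  size_t end = n;
  while (end > 0 && pattern[end - 1] == '%') end--;
  if (end == n) return 0; /* no trailing '%' */
  for (size_t i = 0; i < end; i++)
    if (pattern[i] == '%' || pattern[i] == '_') return 0;
  if (prefix_len != NULL) *prefix_len = end;
  return 1;
}

/* ASCII case-insensitive comparison of the first `len` bytes of `value`
 * against `prefix`; fails if `value` is shorter than `len`. */
static int ascii_prefix_match_nocase(const char *value, const char *prefix, size_t len) {
  for (size_t i = 0; i < len; i++) {
    unsigned char a = (unsigned char)value[i];
    unsigned char b = (unsigned char)prefix[i];
    if (a == '\0') return 0;
    if (a >= 'A' && a <= 'Z') a = (unsigned char)(a + 32);
    if (b >= 'A' && b <= 'Z') b = (unsigned char)(b + 32);
    if (a != b) return 0;
  }
  return 1;
}
```

A prefix no longer than the cached bytes can be decided from the cache alone; any pattern that fails the check would go to the slow path.
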
strtod() respects LC_NUMERIC locale, causing JSON parsing to fail in
non-C locales (French, German, etc.) where comma is the decimal separator.

Implemented custom locale-independent strtod_c() parser:
- Always uses '.' as decimal separator per JSON spec
- Handles sign, integer, fractional, and exponent parts
- No platform dependencies or thread-safety issues
- Simple and portable (~87 lines)

Added test_vec0_locale_independent() to verify parsing works under
non-C locales. All tests pass (73 passed, 4 skipped).

Fixes asg017#241 and asg017#168
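
To illustrate the approach, here is a much smaller parser than the one the commit describes. parse_double_c() is a hypothetical name, and the sketch makes simplifying assumptions: it is not a correctly rounded conversion, it accepts a leading '+' that JSON would reject, and it does not handle inf, nan, or hex floats. What it does show is that '.' is the only accepted decimal separator regardless of the process locale.

```c
#include <assert.h>
#include <stddef.h>

/* Parses [sign] digits [. digits] [e|E [sign] digits] starting at `s`.
 * Stores the first unparsed position in *end (if non-NULL); on failure
 * *end == s and 0.0 is returned. Never consults the C locale. */
static double parse_double_c(const char *s, const char **end) {
  assert(s != NULL);
  const char *p = s;
  int negative = 0;
  if (*p == '+' || *p == '-') { negative = (*p == '-'); p++; }

  double mantissa = 0.0;
  int digits = 0;  /* digits seen before and after the '.' */
  int dec_exp = 0; /* power of ten still to be applied */
  while (*p >= '0' && *p <= '9') {
    mantissa = mantissa * 10.0 + (*p - '0');
    p++; digits++;
  }
  if (*p == '.') {
    const char *q = p + 1;
    while (*q >= '0' && *q <= '9') {
      mantissa = mantissa * 10.0 + (*q - '0');
      dec_exp--; q++; digits++;
    }
    if (digits > 0) p = q;
  }
  if (digits == 0) { if (end != NULL) *end = s; return 0.0; }

  if (*p == 'e' || *p == 'E') {
    const char *q = p + 1;
    int exp_negative = 0, exp = 0, exp_digits = 0;
    if (*q == '+' || *q == '-') { exp_negative = (*q == '-'); q++; }
    while (*q >= '0' && *q <= '9') {
      if (exp < 10000) exp = exp * 10 + (*q - '0'); /* cap to avoid overflow */
      q++; exp_digits++;
    }
    if (exp_digits > 0) { dec_exp += exp_negative ? -exp : exp; p = q; }
  }
  for (; dec_exp > 0; dec_exp--) mantissa *= 10.0;
  for (; dec_exp < 0; dec_exp++) mantissa /= 10.0;

  if (end != NULL) *end = p;
  return negative ? -mantissa : mantissa;
}
```

Given the input "1,5", parsing stops at the comma, so a caller that expects the whole token to be consumed can report an error instead of silently reading a different number.
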
Remove platform-specific typedef fallback block that caused redefinition
errors on musl-based systems. Since stdint.h is already included and
provides uint8_t/uint16_t/uint64_t on all modern platforms, the fallback
is unnecessary.

Tested on Alpine Linux (musl) and Ubuntu (glibc).

Related to upstream PR asg017#199
Updates version across all package files and documentation.
Updates test snapshot for shadow table ordering.

Co-Authored-By: Claude <noreply@anthropic.com>
Implements GLOB pattern matching for TEXT metadata columns with:
- Prefix-only optimization for patterns like 'abc*'
- Full pattern support using sqlite3_strglob()
- Case-sensitive matching (unlike LIKE)
- Single-character wildcard '?' support
- Error handling for non-TEXT columns

Tests cover prefix patterns, complex patterns, case sensitivity,
boundary conditions, and error cases.

Co-Authored-By: Claude <noreply@anthropic.com>
…olumns

Implements Issue asg017#190 by adding syntactic support for IS operators without
adding full NULL support to the metadata system:
- IS behaves like = for non-NULL values
- IS NOT behaves like != for non-NULL values
- IS NULL always returns false (metadata doesn't support NULL)
- IS NOT NULL always returns true (metadata doesn't support NULL)

Implementation:
- Added operator constants: VEC0_METADATA_OPERATOR_IS, ISNOT, ISNULL, ISNOTNULL
- Modified vec0BestIndex to recognize the four SQLITE_INDEX_CONSTRAINT types
- Updated boolean validation to allow IS operators
- Implemented filtering for INTEGER, FLOAT, BOOLEAN, and TEXT metadata
- Added unreachable IS/ISNOT cases in text filtering to eliminate -Wswitch warnings

Tests added to test-metadata.py cover all metadata types, long text strings,
and verify IS/= and IS NOT/!= equivalence.

Co-Authored-By: Claude <noreply@anthropic.com>
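
The IS semantics listed above reduce to a small truth table. A sketch for an integer metadata value, using hypothetical names (md_op, md_int_matches) rather than the VEC0_METADATA_OPERATOR_* constants:

```c
#include <assert.h>

/* Hypothetical operator set mirroring the behaviour described above. */
typedef enum {
  MD_OP_EQ, MD_OP_NE, MD_OP_IS, MD_OP_ISNOT, MD_OP_ISNULL, MD_OP_ISNOTNULL
} md_op;

/* Because metadata values are never NULL, IS and IS NOT collapse to
 * = and !=, IS NULL never matches, and IS NOT NULL always matches. */
static int md_int_matches(md_op op, long long stored, long long target) {
  switch (op) {
    case MD_OP_EQ:
    case MD_OP_IS:        return stored == target;
    case MD_OP_NE:
    case MD_OP_ISNOT:     return stored != target;
    case MD_OP_ISNULL:    return 0;
    case MD_OP_ISNOTNULL: return 1;
  }
  assert(0 && "unknown operator");
  return 0;
}
```
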
Fixes 13 compilation warnings including 1 critical bug:

Critical fix:
- Line 6509: Changed metadataInIdx from size_t to int to fix logic bug
  where -1 wrapped to SIZE_MAX, causing error check to always fail

Sign comparison fixes:
- Line 6282: Added size_t casts for assert comparison
- Lines 6810, 6814: Cast sizeof results to int
- Line 7189: Cast sizeof result to i64
- Line 7445: Cast strlen result to int in assert

Uninitialized variable fixes:
- Line 7283: Initialize result to 0.0f
- Lines 8439-8440: Initialize n and offset to 0, add default case

Build now produces zero warnings. All tests pass.

Co-Authored-By: Claude <noreply@anthropic.com>
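
The critical fix above is an instance of a common C pitfall: a -1 sentinel stored in an unsigned type. A minimal sketch with hypothetical helper names (find_index, find_index_as_size_t, lookup_failed), not the code at the cited line:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup that reports "not found" as -1. */
static int find_index(const int *items, int n, int needle) {
  assert(items != NULL || n == 0);
  for (int i = 0; i < n; i++)
    if (items[i] == needle) return i;
  return -1;
}

/* Buggy pattern: converting the result to size_t turns -1 into SIZE_MAX,
 * so a later `idx < 0` error check can never be true. */
static size_t find_index_as_size_t(const int *items, int n, int needle) {
  return (size_t)find_index(items, n, needle);
}

/* Fixed pattern: keep the signed type so the sentinel survives. */
static int lookup_failed(const int *items, int n, int needle) {
  int idx = find_index(items, n, needle);
  return idx < 0;
}
```
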
Updates version across documentation files and header.

Changes in v0.2.2-alpha:
- GLOB operator for text metadata columns (asg017#191)
- IS/IS NOT/IS NULL/IS NOT NULL operators (asg017#190)
- All compilation warnings fixed

Co-Authored-By: Claude <noreply@anthropic.com>
Move the -lm flag from CFLAGS to a new LDLIBS variable and place it
at the end of the linker command. This ensures libm is properly linked
on Linux systems.

The linker processes arguments left-to-right, so library flags must
come after source files that reference their symbols. Previously,
-lm appeared before the source file, causing "undefined symbol: sqrtf"
errors on some Linux distributions.

Cherry-picked from upstream PR asg017#252

Co-Authored-By: wardviaene <ward.viaene@gmail.com>
- Add configurable install paths via INSTALL_PREFIX, INSTALL_LIB_DIR,
  INSTALL_INCLUDE_DIR, and INSTALL_BIN_DIR variables
- Add EXT_CFLAGS to capture user-provided CFLAGS and CPPFLAGS
- Hide internal symbols with -fvisibility=hidden, exposing only the
  public API (sqlite3_vec_init, sqlite3_vec_numpy_init,
  sqlite3_vec_static_blobs_init)
- Remove sudo from install target (users run sudo make install if needed)

Inspired by asg017#149

Co-Authored-By: Harry-Chen <cjc-2008@hotmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This supports upstream PR asg017#254 which needs to pass
-Wl,-z,max-page-size=16384 for Android 16KB page support. Using
LDFLAGS instead of CFLAGS prevents linker flags from leaking into
compile-only steps, which could cause warnings or errors with
stricter toolchains.

Adds EXT_LDFLAGS variable to the loadable and cli targets, enabling
users to pass linker-specific flags separately from compiler flags.

Co-Authored-By: Oscar Franco <ospfranco@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Complete unfinished sentence in KNN docs describing manual method
  trade-offs (slower, more space, but more flexible)
- Fill in TODO placeholders in Matryoshka docs with paper date, title,
  and explanation of the naming origin (Russian nesting dolls)

Cherry-picked from upstream PRs asg017#208 and asg017#209

Co-Authored-By: punkish <punkish@users.noreply.github.com>
/examples/simple-lua/ contains a demo script and runner.

Incorporates upstream PR asg017#237 with the following bugfixes:

Extension loading:
- Fix return value check: lsqlite3's load_extension returns true on
  success, not sqlite3.OK (which is 0). Changed from `if ok then` to
  `if ok and result then` to properly detect successful loads.
- Add vec0 naming paths alongside sqlite-vec paths for this fork.

IEEE 754 float serialization (float_to_bytes):
- Switch from half-round-up to round-half-to-even (banker's rounding)
  for IEEE 754 compliance. This prevents systematic bias when
  processing large datasets where half-values accumulate.
- Handle special cases: NaN, Inf, -Inf, and -0.0 which the original
  implementation did not support.
- Fix subnormal number encoding: corrected formula from 2^(exp+126)
  to 2^(exp+127) so minimum subnormal 2^(-149) encodes correctly.
- Add mantissa overflow carry: when rounding causes mantissa >= 2^23,
  carry into exponent field.
- Add exponent overflow handling: values too large now return ±Inf
  instead of producing corrupted output.
- Use epsilon comparison (1e-9) for 0.5 tie detection to handle
  floating-point precision issues.

JSON serialization (serialize_json):
- Error on NaN and Infinity values which are not valid JSON.
- Convert -0.0 to 0.0 for JSON compatibility.

Co-Authored-By: karminski <code.karminski@outlook.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Incorporates upstream PR asg017#215 with additional fix:
- Fix extra closing bracket (PR asg017#215)
- Fix incorrect variable name: `query` → `embedding`

Co-Authored-By: Nicolas Buduroi <nbuduroi@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Cargo.toml: 0.2.1-alpha → 0.2.4-alpha
- bindings/rust/Cargo.toml: 0.2.0-alpha → 0.2.4-alpha
- package.json: 0.2.1-alpha → 0.2.4-alpha
- pyproject.toml: 0.2.0a0 → 0.2.4a0
- sqlite-vec.gemspec: 0.2.0.alpha → 0.2.4.alpha

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When the second vector fails to parse in ensure_vector_match(), the cleanup function for the first vector was called with 'a' (void**) instead of '*a' (void*). This caused sqlite3_free to be called with a stack address instead of the heap-allocated vector, resulting in a crash:

    malloc: Non-aligned pointer being freed
    Fatal error 6: Aborted

The fix dereferences the pointer correctly, matching how cleanup is
done in other error paths.

This fix has a unit test that will crash without the patch.