fix: pass correct pointer to cleanup in ensure_vector_match error path #257
Open
renatgalimov wants to merge 29 commits into asg017:main from renatgalimov:patch-1
Implements vec0Rename() to properly rename vec0 virtual tables and all associated shadow tables (chunks, info, rowids, vector_chunks, auxiliary, metadatachunks, metadatatext).
Merged from upstream PR asg017#203 by wilbertharriman.
Fixes issue asg017#43.
Co-Authored-By: Wilbert Harriman <wilbert@harriman.id>
Implements distance_cosine_bit() to calculate cosine similarity for bit vectors using popcount operations. Previously, cosine distance would error on binary vectors. Uses optimized u64 popcount when dimensions are divisible by 64, otherwise falls back to u8 hamming table lookup.
Merged from upstream PR asg017#212 by wilbertharriman.
Co-Authored-By: Wilbert Harriman <wilbert@harriman.id>
Implements proper cleanup when deleting rows to prevent memory/storage leaks:
- vec0Update_Delete_ClearRowid(): Zeros out rowid slot in chunks.rowids blob
- vec0Update_Delete_ClearVectors(): Zeros out vector data in all vector_chunks
Additional improvements:
- Add -undefined dynamic_lookup flag for macOS builds (standard for SQLite extensions)
- Add test-snapshots-update Makefile target
- Add comprehensive tests to verify delete properly clears bytes
- Add .venv to .gitignore
Merged from upstream PR asg017#243 by marcus-pousette.
Fixes issues asg017#54, asg017#178, asg017#220.
Co-Authored-By: Marcus Pousette <marcus.pousette@gmail.com>
Updates GitHub Actions workflows to fix deprecated runner issues and improve ARM64 builds:
Release workflow:
- Fix x86_64 Linux: ubuntu-20.04 → ubuntu-latest (20.04 is deprecated)
- Use native ARM64 runner: ubuntu-24.04-arm instead of cross-compilation
- Remove gcc-aarch64-linux-gnu cross-compiler dependency
- Build natively on ARM for better SIMD support and reliability
Test workflow:
- Fix x86_64 Linux: ubuntu-20.04 → ubuntu-latest (20.04 is deprecated)
Benefits:
- Prevents build failures from deprecated runners
- Native ARM builds are more reliable and faster
- Better testing of ARM NEON SIMD optimizations
- Easier debugging if ARM-specific issues arise
Merged from upstream PR asg017#228 by anuraaga.
Co-Authored-By: Anuraag (Rag) Agrawal <anuraaga@gmail.com>
Implements a custom 'optimize' command (similar to SQLite FTS5) that allows
reclaiming disk space after DELETE operations:
INSERT INTO vec_table(vec_table) VALUES ('optimize');
VACUUM;
How it works:
- Identifies fragmented chunks from deletions
- Migrates all vectors to new, contiguous chunks
- Preserves partition keys and metadata during migration
- Deletes old fragmented chunks
- Allows VACUUM to reclaim freed disk space
Implementation details:
- Adds hidden 'table_name' column to trigger special insert commands
- vec0Update_SpecialInsert_Optimize(): Main optimization logic
- Iterates all rows and copies to new chunks
- Copies metadata values to new chunk positions
- Cleans up old chunks and vector data
- vec0Update_SpecialInsert_OptimizeCopyMetadata(): Handles metadata migration
Schema improvements:
- Change PRIMARY KEY → INTEGER PRIMARY KEY in shadow tables
- Makes rowid an alias instead of separate index
- Reduces storage overhead and improves performance
Use cases:
- After bulk deletions to reclaim disk space
- Periodic maintenance to defragment vector storage
- Before backups to minimize database file size
Caveats:
- Can be slow on large tables (rebuilds all chunks)
- Should be run during maintenance windows
- Not transaction-safe for concurrent reads
- Requires VACUUM afterward to actually free space
Merged from upstream PR asg017#210 by wilbertharriman.
Fixes issue asg017#185.
Co-Authored-By: Wilbert Harriman <wilbert@harriman.id>
Implements WHERE constraints on the distance column in KNN queries, enabling cursor-based pagination and range queries. Based on upstream PR asg017#166 by Alex Garcia with completion and enhancements.
Features:
- Supports GT, GE, LT, LE operators on distance column
- Works with all vector types (float32, int8, bit)
- Compatible with partition keys, metadata, and auxiliary columns
- Multiple constraints can be combined (e.g., distance >= 3.0 AND distance <= 6.0)
Implementation:
- Added VEC0_IDXSTR_KIND_KNN_DISTANCE_CONSTRAINT to idxStr encoding
- Distance filtering applied during KNN search before top-k selection
- Cast f64 to f32 for comparison to match internal precision
Enhancements over original PR:
- Fixed variable shadowing in inner loops (i -> j)
- Added comprehensive test coverage (15 tests)
- Fixed bit/int8 vector type handling in tests
- Documented precision handling and pagination caveats
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Alex Garcia <alex@alex.garcia>
Co-Authored-By: Claude <noreply@anthropic.com>
Enables direct installation from GitHub for multiple languages:
Package Configurations Added:
- Go bindings/go/cgo: CGO-based bindings with Auto() and serialization helpers
- pyproject.toml + setup.py: Python package (pip install git+...)
- package.json: Node.js package (npm install vlasky/sqlite-vec)
- sqlite-vec.gemspec: Ruby gem (gem install from git)
- Cargo.toml + src/lib.rs: Rust crate (cargo add --git)
All packages support:
- Installing latest code from main branch
- Installing specific versions via git tags (e.g., v0.2.0-alpha)
Documentation:
- Updated README.md with installation table showing both latest and versioned installs
- Added Python note about loadable extension support requirement (many Python builds lack --enable-loadable-sqlite-extensions)
- Recommended using 'uv' for virtual environments as it uses system Python with extension support
- Created CHANGELOG.md documenting all merged PRs and improvements
- Updated CLAUDE.md for fork (version, bindings, release process)
- Bumped VERSION to 0.2.0-alpha
README Improvements:
- Added vector type syntax examples (float[N], int8[N], bit[N])
- Expanded examples from 4 to 20 vectors for meaningful demonstrations
- Added explanation of MATCH operator
- Added note about k/LIMIT requirement with rationale
- Improved "What's New" section with distance constraints, pagination, and optimize examples
- All examples tested and verified with actual output
- Clarified KNN terminology (k parameter vs LIMIT)
Languages now installable from GitHub:
✅ Go (CGO bindings via go get)
✅ Python (via pip + git)
✅ Rust (via Cargo.toml git dependency)
✅ Node.js (via npm + git)
✅ Ruby (via Gemfile git dependency)
All package configurations tested and validated.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
__popcnt64 intrinsic is not available on 32-bit Windows. Split u64 into two u32 values and use __popcnt on each half.
The ARM/ARM64 fallback was taking unsigned int instead of u64, which truncated the upper 32 bits when called from distance_hamming_u64. This fix makes it consistent with the x86 32-bit implementation.
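The two popcount fixes above can be illustrated with a small, portable sketch. This is not the code from the patch: the function names are hypothetical, and a plain bit-clearing loop stands in for the `__popcnt` intrinsic. It only shows the shape of the fix, which is that the helper takes the full 64-bit value and counts each 32-bit half separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable 32-bit popcount (clears the lowest set bit each iteration).
 * Stand-in for a 32-bit intrinsic such as __popcnt. */
static unsigned popcount_u32_sketch(uint32_t x) {
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* 64-bit popcount built from two 32-bit halves. The parameter is a full
 * 64-bit value, so the upper 32 bits are not truncated at the call site. */
static unsigned popcount_u64_sketch(uint64_t x) {
    uint32_t lo = (uint32_t)(x & 0xFFFFFFFFu);
    uint32_t hi = (uint32_t)(x >> 32);
    return popcount_u32_sketch(lo) + popcount_u32_sketch(hi);
}

/* Hamming distance over n 64-bit words: XOR, then count differing bits. */
static unsigned hamming_u64_sketch(const uint64_t *a, const uint64_t *b, size_t n) {
    assert(a != NULL && b != NULL);
    unsigned distance = 0;
    for (size_t i = 0; i < n; i++) {
        distance += popcount_u64_sketch(a[i] ^ b[i]);
    }
    return distance;
}
```

If the helper took `unsigned int` instead of `uint64_t`, a word with bits set only in its upper half would count as zero, which is the truncation described in the ARM/ARM64 commit.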
Implements two-path LIKE filtering with proper case-insensitive matching:
- Fast path: Prefix-only patterns ('pattern%') use 12-byte cache optimization
- Slow path: Complex patterns ('%pattern', 'a%b', etc.) use sqlite3_strlike()
Changes:
- Added VEC0_METADATA_OPERATOR_LIKE enum value
- Added SQLITE_INDEX_CONSTRAINT_LIKE handling in xBestIndex
- Implemented vec0_is_prefix_only_like_pattern() helper
- Implemented cache-optimized fast path using sqlite3_strnicmp() for case-insensitive matching
- Implemented full-string fetch slow path for complex patterns
- Added validation: LIKE only allowed on TEXT metadata columns
Tests:
- Added test_like() covering all pattern types (prefix, suffix, contains, wildcards)
- Added test_like_case_insensitive() verifying SQLite's case-insensitive semantics
- Added test_like_boundary_conditions() testing 12-byte cache boundary edge cases
- Updated test_knn() to demonstrate LIKE working
- All tests pass (75 passed, 4 skipped, 0 failures)
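A minimal sketch of how the fast-path check might look, assuming a "prefix-only" pattern means a literal prefix followed by a single trailing '%' with no other '%' or '_' wildcards. The function names are hypothetical, LIKE escape handling is ignored, and the ASCII-only comparison via tolower() stands in for sqlite3_strnicmp(); the real vec0_is_prefix_only_like_pattern() may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 when pattern is a non-empty literal prefix followed by exactly
 * one trailing '%', with no other LIKE wildcards ('%' or '_'). */
static int is_prefix_only_like_sketch(const char *pattern) {
    size_t len = strlen(pattern);
    if (len < 2 || pattern[len - 1] != '%') {
        return 0;
    }
    for (size_t i = 0; i + 1 < len; i++) {
        if (pattern[i] == '%' || pattern[i] == '_') {
            return 0;
        }
    }
    return 1;
}

/* Case-insensitive (ASCII) prefix comparison for a prefix-only pattern. */
static int like_prefix_match_sketch(const char *value, const char *pattern) {
    assert(is_prefix_only_like_sketch(pattern));
    size_t prefix_len = strlen(pattern) - 1; /* drop the trailing '%' */
    for (size_t i = 0; i < prefix_len; i++) {
        if (value[i] == '\0') {
            return 0; /* value is shorter than the prefix */
        }
        if (tolower((unsigned char)value[i]) != tolower((unsigned char)pattern[i])) {
            return 0;
        }
    }
    return 1;
}
```

Patterns that fail the first check ('%pattern', 'a%b', and so on) would go to the slow path, which per the commit message uses sqlite3_strlike() on the full string.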
strtod() respects LC_NUMERIC locale, causing JSON parsing to fail in non-C locales (French, German, etc.) where comma is the decimal separator.
Implemented custom locale-independent strtod_c() parser:
- Always uses '.' as decimal separator per JSON spec
- Handles sign, integer, fractional, and exponent parts
- No platform dependencies or thread-safety issues
- Simple and portable (~87 lines)
Added test_vec0_locale_independent() to verify parsing works under non-C locales.
All tests pass (73 passed, 4 skipped).
Fixes asg017#241 and asg017#168
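For illustration, a locale-independent parser along these lines could look like the sketch below. It is not the strtod_c() from the commit: the name is hypothetical, it handles only sign, integer, fractional, and exponent parts, and it makes no attempt at correct rounding for long inputs or at inf/nan handling. The point is that '.' is hard-coded as the separator, so LC_NUMERIC never enters the picture.

```c
#include <assert.h>
#include <stddef.h>

static int is_ascii_digit(char c) { return c >= '0' && c <= '9'; }

/* Parses [sign] digits [ '.' digits ] [ (e|E) [sign] digits ].
 * On return, *end (if non-NULL) points at the first unparsed character;
 * when no digits are found, *end is set to s and 0.0 is returned. */
static double strtod_c_sketch(const char *s, const char **end) {
    assert(s != NULL);
    const char *p = s;
    int negative = 0;
    if (*p == '+' || *p == '-') {
        negative = (*p == '-');
        p++;
    }

    /* Accumulate all digits as one integer-valued mantissa and remember
     * how many of them came after the decimal point. */
    double mantissa = 0.0;
    int frac_digits = 0;
    int any_digits = 0;
    while (is_ascii_digit(*p)) {
        mantissa = mantissa * 10.0 + (*p - '0');
        any_digits = 1;
        p++;
    }
    if (*p == '.') {
        p++;
        while (is_ascii_digit(*p)) {
            mantissa = mantissa * 10.0 + (*p - '0');
            frac_digits++;
            any_digits = 1;
            p++;
        }
    }
    if (!any_digits) {
        if (end) *end = s;
        return 0.0;
    }

    /* Optional exponent; only consumed if at least one digit follows. */
    int exponent = 0;
    if (*p == 'e' || *p == 'E') {
        const char *q = p + 1;
        int exp_negative = 0;
        if (*q == '+' || *q == '-') {
            exp_negative = (*q == '-');
            q++;
        }
        if (is_ascii_digit(*q)) {
            int e = 0;
            while (is_ascii_digit(*q)) {
                if (e < 1000) e = e * 10 + (*q - '0'); /* clamp huge exponents */
                q++;
            }
            exponent = exp_negative ? -e : e;
            p = q;
        }
    }

    /* Apply the net power of ten in a single multiply or divide. */
    int net = exponent - frac_digits;
    int steps = net < 0 ? -net : net;
    double scale = 1.0;
    for (int i = 0; i < steps; i++) scale *= 10.0;
    double value = net < 0 ? mantissa / scale : mantissa * scale;

    if (end) *end = p;
    return negative ? -value : value;
}
```

With this shape, an input such as "3,5" parses as 3 and stops at the comma in every locale, whereas strtod() under a French or German locale would treat the comma as the decimal separator and stop at the '.' in valid JSON numbers.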
Remove platform-specific typedef fallback block that caused redefinition errors on musl-based systems. Since stdint.h is already included and provides uint8_t/uint16_t/uint64_t on all modern platforms, the fallback is unnecessary. Tested on Alpine Linux (musl) and Ubuntu (glibc). Related to upstream PR asg017#199
Updates version across all package files and documentation. Updates test snapshot for shadow table ordering. Co-Authored-By: Claude <noreply@anthropic.com>
Implements GLOB pattern matching for TEXT metadata columns with:
- Prefix-only optimization for patterns like 'abc*'
- Full pattern support using sqlite3_strglob()
- Case-sensitive matching (unlike LIKE)
- Single-character wildcard '?' support
- Error handling for non-TEXT columns
Tests cover prefix patterns, complex patterns, case sensitivity, boundary conditions, and error cases.
Co-Authored-By: Claude <noreply@anthropic.com>
…olumns
Implements Issue asg017#190 by adding syntactic support for IS operators without adding full NULL support to the metadata system:
- IS behaves like = for non-NULL values
- IS NOT behaves like != for non-NULL values
- IS NULL always returns false (metadata doesn't support NULL)
- IS NOT NULL always returns true (metadata doesn't support NULL)
Implementation:
- Added operator constants: VEC0_METADATA_OPERATOR_IS, ISNOT, ISNULL, ISNOTNULL
- Modified vec0BestIndex to recognize the four SQLITE_INDEX_CONSTRAINT types
- Updated boolean validation to allow IS operators
- Implemented filtering for INTEGER, FLOAT, BOOLEAN, and TEXT metadata
- Added unreachable IS/ISNOT cases in text filtering to eliminate -Wswitch warnings
Tests added to test-metadata.py cover all metadata types, long text strings, and verify IS/= and IS NOT/!= equivalence.
Co-Authored-By: Claude <noreply@anthropic.com>
Fixes 13 compilation warnings including 1 critical bug:
Critical fix:
- Line 6509: Changed metadataInIdx from size_t to int to fix logic bug where -1 wrapped to SIZE_MAX, causing error check to always fail
Sign comparison fixes:
- Line 6282: Added size_t casts for assert comparison
- Lines 6810, 6814: Cast sizeof results to int
- Line 7189: Cast sizeof result to i64
- Line 7445: Cast strlen result to int in assert
Uninitialized variable fixes:
- Line 7283: Initialize result to 0.0f
- Lines 8439-8440: Initialize n and offset to 0, add default case
Build now produces zero warnings. All tests pass.
Co-Authored-By: Claude <noreply@anthropic.com>
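The critical fix is easier to see in isolation. The sketch below is hypothetical code, not the lines from the patch: it assumes a lookup that returns -1 for "not found" and shows what happens to that sentinel when it is stored in a size_t versus an int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup: index of needle in items, or -1 when absent. */
static int find_index_sketch(const int *items, int count, int needle) {
    assert(items != NULL);
    for (int i = 0; i < count; i++) {
        if (items[i] == needle) {
            return i;
        }
    }
    return -1;
}

/* Storing the result in size_t converts -1 to SIZE_MAX, so a later
 * "index < 0 means error" check can never be true. */
static size_t index_as_size_t_sketch(const int *items, int count, int needle) {
    return (size_t)find_index_sketch(items, count, needle);
}

/* Keeping the result as int preserves the -1 sentinel. */
static int index_as_int_sketch(const int *items, int count, int needle) {
    return find_index_sketch(items, count, needle);
}
```

With the size_t version, a missing entry yields SIZE_MAX rather than a negative number, which matches the "-1 wrapped to SIZE_MAX" description in the commit message.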
Updates version across documentation files and header.
Changes in v0.2.2-alpha:
- GLOB operator for text metadata columns (asg017#191)
- IS/IS NOT/IS NULL/IS NOT NULL operators (asg017#190)
- All compilation warnings fixed
Co-Authored-By: Claude <noreply@anthropic.com>
Move the -lm flag from CFLAGS to a new LDLIBS variable and place it at the end of the linker command. This ensures libm is properly linked on Linux systems. The linker processes arguments left-to-right, so library flags must come after source files that reference their symbols. Previously, -lm appeared before the source file, causing "undefined symbol: sqrtf" errors on some Linux distributions. Cherry-picked from upstream PR asg017#252 Co-Authored-By: wardviaene <ward.viaene@gmail.com>
- Add configurable install paths via INSTALL_PREFIX, INSTALL_LIB_DIR, INSTALL_INCLUDE_DIR, and INSTALL_BIN_DIR variables
- Add EXT_CFLAGS to capture user-provided CFLAGS and CPPFLAGS
- Hide internal symbols with -fvisibility=hidden, exposing only the public API (sqlite3_vec_init, sqlite3_vec_numpy_init, sqlite3_vec_static_blobs_init)
- Remove sudo from install target (users run sudo make install if needed)
Inspired by asg017#149
Co-Authored-By: Harry-Chen <cjc-2008@hotmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This supports upstream PR asg017#254 which needs to pass -Wl,-z,max-page-size=16384 for Android 16KB page support. Using LDFLAGS instead of CFLAGS prevents linker flags from leaking into compile-only steps, which could cause warnings or errors with stricter toolchains. Adds EXT_LDFLAGS variable to the loadable and cli targets, enabling users to pass linker-specific flags separately from compiler flags. Co-Authored-By: Oscar Franco <ospfranco@users.noreply.github.com> Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Complete unfinished sentence in KNN docs describing manual method trade-offs (slower, more space, but more flexible)
- Fill in TODO placeholders in Matryoshka docs with paper date, title, and explanation of the naming origin (Russian nesting dolls)
Cherry-picked from upstream PRs asg017#208 and asg017#209
Co-Authored-By: punkish <punkish@users.noreply.github.com>
/examples/simple-lua/ contains a demo script and runner.
Incorporates upstream PR asg017#237 with the following bugfixes:
Extension loading:
- Fix return value check: lsqlite3's load_extension returns true on success, not sqlite3.OK (which is 0). Changed from `if ok then` to `if ok and result then` to properly detect successful loads.
- Add vec0 naming paths alongside sqlite-vec paths for this fork.
IEEE 754 float serialization (float_to_bytes):
- Switch from half-round-up to round-half-to-even (banker's rounding) for IEEE 754 compliance. This prevents systematic bias when processing large datasets where half-values accumulate.
- Handle special cases: NaN, Inf, -Inf, and -0.0 which the original implementation did not support.
- Fix subnormal number encoding: corrected formula from 2^(exp+126) to 2^(exp+127) so minimum subnormal 2^(-149) encodes correctly.
- Add mantissa overflow carry: when rounding causes mantissa >= 2^23, carry into exponent field.
- Add exponent overflow handling: values too large now return ±Inf instead of producing corrupted output.
- Use epsilon comparison (1e-9) for 0.5 tie detection to handle floating-point precision issues.
JSON serialization (serialize_json):
- Error on NaN and Infinity values which are not valid JSON.
- Convert -0.0 to 0.0 for JSON compatibility.
Co-Authored-By: karminski <code.karminski@outlook.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Incorporates upstream PR asg017#215 with additional fix:
- Fix extra closing bracket (PR asg017#215)
- Fix incorrect variable name: `query` → `embedding`
Co-Authored-By: Nicolas Buduroi <nbuduroi@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Cargo.toml: 0.2.1-alpha → 0.2.4-alpha
- bindings/rust/Cargo.toml: 0.2.0-alpha → 0.2.4-alpha
- package.json: 0.2.1-alpha → 0.2.4-alpha
- pyproject.toml: 0.2.0a0 → 0.2.4a0
- sqlite-vec.gemspec: 0.2.0.alpha → 0.2.4.alpha
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When the second vector fails to parse in ensure_vector_match(), the cleanup function for the first vector was called with 'a' (void**) instead of '*a' (void*). This caused sqlite3_free to be called with a stack address instead of the heap-allocated vector, resulting in a crash:
malloc: Non-aligned pointer being freed
Fatal error 6: Aborted
The fix dereferences the pointer correctly, matching how cleanup is
done in other error paths.
This fix has a unit test that will crash without the patch.
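The bug pattern can be reproduced in a few lines. The sketch below is a simplified stand-in, not the actual ensure_vector_match() code: the parser, the tracking cleanup function, and all names are hypothetical, and the signature is assumed only to have the same out-parameter shape (a `void **` for the vector plus a cleanup callback). It shows the corrected error path, where the heap block `*a` is released rather than the address of the caller's pointer variable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef void (*cleanup_fn)(void *);

/* Addresses recorded so the behaviour of the error path can be checked. */
static uintptr_t last_alloc_addr = 0;
static uintptr_t last_freed_addr = 0;

/* Cleanup callback that remembers which address it was asked to free. */
static void tracking_free(void *p) {
    last_freed_addr = (uintptr_t)p;
    free(p);
}

/* Hypothetical parser: copies text to the heap, fails on NULL or empty input. */
static int parse_vector_sketch(const char *text, void **out, cleanup_fn *cleanup) {
    if (text == NULL || text[0] == '\0') {
        return -1;
    }
    size_t n = strlen(text) + 1;
    void *buf = malloc(n);
    if (buf == NULL) {
        return -1;
    }
    memcpy(buf, text, n);
    last_alloc_addr = (uintptr_t)buf;
    *out = buf;
    *cleanup = tracking_free;
    return 0;
}

/* Parses two vectors; if the second fails, releases the first. */
static int ensure_pair_sketch(const char *text_a, const char *text_b,
                              void **a, void **b,
                              cleanup_fn *cleanup_a, cleanup_fn *cleanup_b) {
    assert(a != NULL && b != NULL && cleanup_a != NULL && cleanup_b != NULL);
    if (parse_vector_sketch(text_a, a, cleanup_a) != 0) {
        return -1;
    }
    if (parse_vector_sketch(text_b, b, cleanup_b) != 0) {
        /* a is a void** pointing at the caller's variable (often on the
         * stack); *a is the heap block. Passing a here would hand a stack
         * address to free(), which is the crash described above. */
        (*cleanup_a)(*a);
        *a = NULL;
        return -1;
    }
    return 0;
}
```

Since `void **` converts implicitly to `void *` in C, the buggy call `(*cleanup_a)(a)` compiles without a diagnostic, which is presumably why the mistake went unnoticed until the error path was exercised.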