Skip to content

Conversation

@Abdulkbk
Copy link
Contributor

fixes #10337 (partially)

Change Description

In this PR, we rewrite the query for IsPublicV1Node, which returns a boolean based on if a node has a public or private channel. This particularly fixes an issue sqlite backend as it struggles to efficiently use multiple indexes across seperate columns in one query.

Step to test

cd graph/db

go test -tags test_db_sqlite  -run TestNodeIsPublic -v # ensure we still correctly distinguish between pub & priv

@gemini-code-assist
Copy link

Summary of Changes

Hello @Abdulkbk, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a significant performance bottleneck in the IsPublicV1Node query, which was particularly inefficient for sqlite databases due to its struggle with complex OR conditions across multiple indexed columns. By restructuring the query to use UNION ALL, the change optimizes how the database utilizes indexes, leading to more efficient execution and resolving the identified performance issue.

Highlights

  • Query Optimization: The IsPublicV1Node query has been rewritten to improve its efficiency, specifically targeting performance issues with the sqlite backend.
  • SQL Query Refactoring: The original query's OR condition across joined tables (n.id = c.node_id_1 OR n.id = c.node_id_2) has been refactored into two separate SELECT statements combined with UNION ALL. This change allows for better utilization of database indexes.
  • Dependency Management: A local replace directive for github.com/lightningnetwork/lnd/sqldb has been added to go.mod, pointing to the local sqldb directory.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the IsPublicV1Node SQL query to improve performance, particularly for the SQLite backend. The change replaces an OR condition in a JOIN with a UNION ALL construct, which is a sound optimization strategy to help the query planner use indexes more effectively. The change is logical and well-explained. I have one minor suggestion regarding SQL formatting to ensure consistency.

@Abdulkbk Abdulkbk force-pushed the ispublicnode-perf branch 2 times, most recently from 9c698ba to ca19c8e Compare November 10, 2025 12:32
@ziggie1984
Copy link
Collaborator

can you create some stress tests to see whether union is actually better ?

@saubyk saubyk added this to the v0.21.0 milestone Nov 10, 2025
@saubyk saubyk added this to v0.21 Nov 10, 2025
@saubyk saubyk moved this to In progress in v0.21 Nov 10, 2025
@Abdulkbk
Copy link
Contributor Author

can you create some stress tests to see whether union is actually better ?

I added a benchmark as the first commit, the following is what I found after running:

go test -tags=test_db_sqlite -bench=BenchmarkIsPublicNode  > sqlite_or.txt

# then

go test -tags=test_db_sqlite -bench=BenchmarkIsPublicNode  > sqlite_union.txt

# finally
benchstat sqlite_or.txt sqlite_union.txt 
Query Type Iterations (b.N) Time per op (ns/op)
UNION ALL 388 3,009,488 ns/op (~3ms)
OR 289 4,063,903 ns/op (~4ms)

The fact that this benchmark is run with approximately 8k nodes means that, if the number of nodes is increased to around ~80k, the difference will become more noticeable.

Copy link
Contributor

@MPins MPins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @Abdulkbk, I ran the benchmark as is and got the following numbers.

go test -tags=test_db_sqlite -bench=BenchmarkIsPublicNode

*** SELECT WITH UNION ***

goos: linux
goarch: amd64
pkg: github.com/lightningnetwork/lnd/graph/db
cpu: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics     
BenchmarkIsPublicNode-16    	     122	   9719683 ns/op
--- BENCH: BenchmarkIsPublicNode-16
    test_sqlite.go:49: Creating new SQLite DB for testing
PASS
ok  	github.com/lightningnetwork/lnd/graph/db	119.003s

Then changed the go.mod line 216:

from replace github.com/lightningnetwork/lnd/sqldb => ./sqldb
to //replace github.com/lightningnetwork/lnd/sqldb => ./sqldb

And ran it again :

*** SELECT USING OR ***

goos: linux
goarch: amd64
pkg: github.com/lightningnetwork/lnd/graph/db
cpu: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics     
BenchmarkIsPublicNode-16    	      93	  11786240 ns/op
--- BENCH: BenchmarkIsPublicNode-16
    test_sqlite.go:49: Creating new SQLite DB for testing
PASS
ok  	github.com/lightningnetwork/lnd/graph/db	120.685s

The BenchmarkIsPublicNode using SELECT WITH UNION took 9.7ms and using SELECT WITH OR 11.7ms.

@Abdulkbk Abdulkbk force-pushed the ispublicnode-perf branch 2 times, most recently from 398364b to 004d49c Compare December 10, 2025 10:54
@Abdulkbk
Copy link
Contributor Author

Thanks @MPins, for taking a look. I've addressed the feedback.

Copy link
Contributor

@GustavoStingelin GustavoStingelin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just ran an EXPLAIN ANALYZE btw the approaches for PostgreSQL

    graph_explain_postgres_test.go:108: old_or_join EXPLAIN ANALYZE output:
        Result  (cost=15.65..15.66 rows=1 width=1) (actual time=0.036..0.036 rows=1 loops=1)
          Buffers: shared hit=8
          InitPlan 1 (returns $1)
            ->  Nested Loop  (cost=8.95..62.51 rows=8 width=0) (actual time=0.035..0.035 rows=1 loops=1)
                  Buffers: shared hit=8
                  ->  Index Scan using graph_nodes_unique on graph_nodes n  (cost=0.28..8.30 rows=1 width=8) (actual time=0.016..0.016 rows=1 loops=1)
                        Index Cond: (pub_key = '\x02470e769bf7c452d32e804976916efbe2b88fb3f66b554c6f615af7ecd30cc3dd'::bytea)
                        Buffers: shared hit=3
                  ->  Bitmap Heap Scan on graph_channels c  (cost=8.67..54.09 rows=12 width=16) (actual time=0.018..0.019 rows=1 loops=1)
                        Recheck Cond: ((n.id = node_id_1) OR (n.id = node_id_2))
                        Filter: ((bitcoin_1_signature IS NOT NULL) AND (version = 1))
                        Heap Blocks: exact=1
                        Buffers: shared hit=5
                        ->  BitmapOr  (cost=8.67..8.67 rows=12 width=0) (actual time=0.006..0.006 rows=0 loops=1)
                              Buffers: shared hit=4
                              ->  Bitmap Index Scan on graph_channels_node_id_1_idx  (cost=0.00..4.33 rows=6 width=0) (actual time=0.003..0.003 rows=0 loops=1)
                                    Index Cond: (n.id = node_id_1)
                                    Buffers: shared hit=2
                              ->  Bitmap Index Scan on graph_channels_node_id_2_idx  (cost=0.00..4.33 rows=6 width=0) (actual time=0.003..0.003 rows=8 loops=1)
                                    Index Cond: (n.id = node_id_2)
                                    Buffers: shared hit=2
        Planning Time: 0.329 ms
        Execution Time: 0.056 ms
    graph_explain_postgres_test.go:108: new_union_all EXPLAIN ANALYZE output:
        Result  (cost=4.71..4.72 rows=1 width=1) (actual time=0.028..0.029 rows=1 loops=1)
          Buffers: shared hit=11
          InitPlan 1 (returns $2)
            ->  Append  (cost=0.57..33.66 rows=8 width=4) (actual time=0.028..0.028 rows=1 loops=1)
                  Buffers: shared hit=11
                  ->  Nested Loop  (cost=0.57..16.77 rows=4 width=4) (actual time=0.020..0.020 rows=0 loops=1)
                        Buffers: shared hit=5
                        ->  Index Scan using graph_nodes_unique on graph_nodes n  (cost=0.28..8.30 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1)
                              Index Cond: (pub_key = '\x02470e769bf7c452d32e804976916efbe2b88fb3f66b554c6f615af7ecd30cc3dd'::bytea)
                              Buffers: shared hit=3
                        ->  Index Scan using graph_channels_node_id_1_idx on graph_channels c  (cost=0.29..8.41 rows=6 width=8) (actual time=0.017..0.017 rows=0 loops=1)
                              Index Cond: (node_id_1 = n.id)
                              Filter: ((bitcoin_1_signature IS NOT NULL) AND (version = 1))
                              Buffers: shared hit=2
                  ->  Nested Loop  (cost=0.57..16.77 rows=4 width=4) (actual time=0.008..0.008 rows=1 loops=1)
                        Buffers: shared hit=6
                        ->  Index Scan using graph_nodes_unique on graph_nodes n_1  (cost=0.28..8.30 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=1)
                              Index Cond: (pub_key = '\x02470e769bf7c452d32e804976916efbe2b88fb3f66b554c6f615af7ecd30cc3dd'::bytea)
                              Buffers: shared hit=3
                        ->  Index Scan using graph_channels_node_id_2_idx on graph_channels c_1  (cost=0.29..8.41 rows=6 width=8) (actual time=0.006..0.006 rows=1 loops=1)
                              Index Cond: (node_id_2 = n_1.id)
                              Filter: ((bitcoin_1_signature IS NOT NULL) AND (version = 1))
                              Buffers: shared hit=3
        Planning Time: 0.213 ms
        Execution Time: 0.050 ms

@Abdulkbk
Copy link
Contributor Author

Thanks for taking a look @GustavoStingelin. From the output, we can see that indeed the union_all performs better, took less time to plan and execute.

Copy link
Contributor

@MPins MPins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@ziggie1984
Copy link
Collaborator

yeah I think having a union call makes sense, can land after the cache PR if we still see major bottlenecks, however I think the cache will already solve our problems. The speed gain with a union is probably around ~20-30% given that node ids will be equally queried meaning that in the UnionCall we will likely also have to scan the table 2 times for 50% the node announcements we receive. I think planning plans are cached in PG so the planning time is not really super critical in a query.

@ziggie1984
Copy link
Collaborator

ziggie1984 commented Dec 19, 2025

Given @yyforyongyu comment and another analysis I run on a bigger channel graph, it seems these queries compare O(n) vs O(log n) so might might prioritize this one first before the cache.

   graph_sql_test.go:330:
        ======================================================================
    graph_sql_test.go:331: QUERY 1: Original with OR in JOIN
    graph_sql_test.go:332: ======================================================================
    graph_sql_test.go:353: Result  (cost=75.68..75.69 rows=1 width=1) (actual time=0.074..0.074 rows=1 loops=1)
    graph_sql_test.go:353:   Output: $0
    graph_sql_test.go:353:   Buffers: shared hit=41
    graph_sql_test.go:353:   InitPlan 1 (returns $0)
    graph_sql_test.go:353:     ->  Nested Loop  (cost=8.74..75.68 rows=1 width=0) (actual time=0.073..0.074 rows=1 loops=1)
    graph_sql_test.go:353:           Join Filter: ((n.id = c.node_id_1) OR (n.id = c.node_id_2))
    graph_sql_test.go:353:           Rows Removed by Join Filter: 100
    graph_sql_test.go:353:           Buffers: shared hit=41
    graph_sql_test.go:353:           ->  Bitmap Heap Scan on public.graph_channels c  (cost=4.40..51.43 rows=16 width=16) (actual time=0.041..0.054 rows=101 loops=1)
    graph_sql_test.go:353:                 Output: c.id, c.version, c.scid, c.node_id_1, c.node_id_2, c.outpoint, c.capacity, c.bitcoin_key_1, c.bitcoin_key_2, c.node_1_signature, c.node_2_signature, c.bitcoin_1_signature, c.bitcoin_2_signature
    graph_sql_test.go:353:                 Recheck Cond: (c.version = 1)
    graph_sql_test.go:353:                 Filter: (c.bitcoin_1_signature IS NOT NULL)
    graph_sql_test.go:353:                 Heap Blocks: exact=7
    graph_sql_test.go:353:                 Buffers: shared hit=38
    graph_sql_test.go:353:                 ->  Bitmap Index Scan on graph_channels_version_outpoint_idx  (cost=0.00..4.40 rows=16 width=0) (actual time=0.034..0.034 rows=2000 loops=1)
    graph_sql_test.go:353:                       Index Cond: (c.version = 1)
    graph_sql_test.go:353:                       Buffers: shared hit=31
    graph_sql_test.go:353:           ->  Materialize  (cost=4.33..22.31 rows=7 width=8) (actual time=0.000..0.000 rows=1 loops=101)
    graph_sql_test.go:353:                 Output: n.id
    graph_sql_test.go:353:                 Buffers: shared hit=3
    graph_sql_test.go:353:                 ->  Bitmap Heap Scan on public.graph_nodes n  (cost=4.33..22.28 rows=7 width=8) (actual time=0.002..0.002 rows=1 loops=1)
    graph_sql_test.go:353:                       Output: n.id
    graph_sql_test.go:353:                       Recheck Cond: (n.pub_key = '\x034704866e705c6b45fab5b7adc6f8ac431a6ca6b38d9f9038ca83bf1d801a3357'::bytea)
    graph_sql_test.go:353:                       Heap Blocks: exact=1
    graph_sql_test.go:353:                       Buffers: shared hit=3
    graph_sql_test.go:353:                       ->  Bitmap Index Scan on graph_nodes_unique  (cost=0.00..4.33 rows=7 width=0) (actual time=0.001..0.001 rows=1 loops=1)
    graph_sql_test.go:353:                             Index Cond: (n.pub_key = '\x034704866e705c6b45fab5b7adc6f8ac431a6ca6b38d9f9038ca83bf1d801a3357'::bytea)
    graph_sql_test.go:353:                             Buffers: shared hit=2
    graph_sql_test.go:353: Planning Time: 0.099 ms
    graph_sql_test.go:353: Execution Time: 0.092 ms
    graph_sql_test.go:357:
        ======================================================================
    graph_sql_test.go:358: QUERY 2: UNION ALL approach
    graph_sql_test.go:359: ======================================================================
    graph_sql_test.go:380: Result  (cost=87.23..87.24 rows=1 width=1) (actual time=0.049..0.050 rows=1 loops=1)
    graph_sql_test.go:380:   Output: $0
    graph_sql_test.go:380:   Buffers: shared hit=38
    graph_sql_test.go:380:   InitPlan 1 (returns $0)
    graph_sql_test.go:380:     ->  Append  (cost=26.77..147.70 rows=2 width=4) (actual time=0.049..0.050 rows=1 loops=1)
    graph_sql_test.go:380:           Buffers: shared hit=38
    graph_sql_test.go:380:           ->  Hash Join  (cost=26.77..73.84 rows=1 width=4) (actual time=0.049..0.049 rows=1 loops=1)
    graph_sql_test.go:380:                 Output: 1
    graph_sql_test.go:380:                 Inner Unique: true
    graph_sql_test.go:380:                 Hash Cond: (c.node_id_1 = n.id)
    graph_sql_test.go:380:                 Buffers: shared hit=38
    graph_sql_test.go:380:                 ->  Bitmap Heap Scan on public.graph_channels c  (cost=4.40..51.43 rows=16 width=8) (actual time=0.033..0.040 rows=101 loops=1)
    graph_sql_test.go:380:                       Output: c.id, c.version, c.scid, c.node_id_1, c.node_id_2, c.outpoint, c.capacity, c.bitcoin_key_1, c.bitcoin_key_2, c.node_1_signature, c.node_2_signature, c.bitcoin_1_signature, c.bitcoin_2_signature
    graph_sql_test.go:380:                       Recheck Cond: (c.version = 1)
    graph_sql_test.go:380:                       Filter: (c.bitcoin_1_signature IS NOT NULL)
    graph_sql_test.go:380:                       Heap Blocks: exact=7
    graph_sql_test.go:380:                       Buffers: shared hit=35
    graph_sql_test.go:380:                       ->  Bitmap Index Scan on graph_channels_version_outpoint_idx  (cost=0.00..4.40 rows=16 width=0) (actual time=0.028..0.028 rows=2000 loops=1)
    graph_sql_test.go:380:                             Index Cond: (c.version = 1)
    graph_sql_test.go:380:                             Buffers: shared hit=28
    graph_sql_test.go:380:                 ->  Hash  (cost=22.28..22.28 rows=7 width=8) (actual time=0.003..0.003 rows=1 loops=1)
    graph_sql_test.go:380:                       Output: n.id
    graph_sql_test.go:380:                       Buckets: 1024  Batches: 1  Memory Usage: 9kB
    graph_sql_test.go:380:                       Buffers: shared hit=3
    graph_sql_test.go:380:                       ->  Bitmap Heap Scan on public.graph_nodes n  (cost=4.33..22.28 rows=7 width=8) (actual time=0.002..0.002 rows=1 loops=1)
    graph_sql_test.go:380:                             Output: n.id
    graph_sql_test.go:380:                             Recheck Cond: (n.pub_key = '\x034704866e705c6b45fab5b7adc6f8ac431a6ca6b38d9f9038ca83bf1d801a3357'::bytea)
    graph_sql_test.go:380:                             Heap Blocks: exact=1
    graph_sql_test.go:380:                             Buffers: shared hit=3
    graph_sql_test.go:380:                             ->  Bitmap Index Scan on graph_nodes_unique  (cost=0.00..4.33 rows=7 width=0) (actual time=0.001..0.001 rows=1 loops=1)
    graph_sql_test.go:380:                                   Index Cond: (n.pub_key = '\x034704866e705c6b45fab5b7adc6f8ac431a6ca6b38d9f9038ca83bf1d801a3357'::bytea)
    graph_sql_test.go:380:                                   Buffers: shared hit=2
    graph_sql_test.go:380:           ->  Hash Join  (cost=26.77..73.84 rows=1 width=4) (never executed)
    graph_sql_test.go:380:                 Output: 1
    graph_sql_test.go:380:                 Inner Unique: true
    graph_sql_test.go:380:                 Hash Cond: (c_1.node_id_2 = n_1.id)
    graph_sql_test.go:380:                 ->  Bitmap Heap Scan on public.graph_channels c_1  (cost=4.40..51.43 rows=16 width=8) (never executed)
    graph_sql_test.go:380:                       Output: c_1.id, c_1.version, c_1.scid, c_1.node_id_1, c_1.node_id_2, c_1.outpoint, c_1.capacity, c_1.bitcoin_key_1, c_1.bitcoin_key_2, c_1.node_1_signature, c_1.node_2_signature, c_1.bitcoin_1_signature, c_1.bitcoin_2_signature
    graph_sql_test.go:380:                       Recheck Cond: (c_1.version = 1)
    graph_sql_test.go:380:                       Filter: (c_1.bitcoin_1_signature IS NOT NULL)
    graph_sql_test.go:380:                       ->  Bitmap Index Scan on graph_channels_version_outpoint_idx  (cost=0.00..4.40 rows=16 width=0) (never executed)
    graph_sql_test.go:380:                             Index Cond: (c_1.version = 1)
    graph_sql_test.go:380:                 ->  Hash  (cost=22.28..22.28 rows=7 width=8) (never executed)
    graph_sql_test.go:380:                       Output: n_1.id
    graph_sql_test.go:380:                       ->  Bitmap Heap Scan on public.graph_nodes n_1  (cost=4.33..22.28 rows=7 width=8) (never executed)
    graph_sql_test.go:380:                             Output: n_1.id
    graph_sql_test.go:380:                             Recheck Cond: (n_1.pub_key = '\x034704866e705c6b45fab5b7adc6f8ac431a6ca6b38d9f9038ca83bf1d801a3357'::bytea)
    graph_sql_test.go:380:                             ->  Bitmap Index Scan on graph_nodes_unique  (cost=0.00..4.33 rows=7 width=0) (never executed)
    graph_sql_test.go:380:                                   Index Cond: (n_1.pub_key = '\x034704866e705c6b45fab5b7adc6f8ac431a6ca6b38d9f9038ca83bf1d801a3357'::bytea)
    graph_sql_test.go:380: Planning Time: 0.098 ms
    graph_sql_test.go:380: Execution Time: 0.071 ms

here the OR query could not effectively use a bitmap and had to scan through the whole channel table.

@ziggie1984
Copy link
Collaborator

can you fix the linter and add 20.1 release notes here @Abdulkbk

@Abdulkbk Abdulkbk force-pushed the ispublicnode-perf branch 2 times, most recently from a0c35a1 to 98ec010 Compare December 19, 2025 22:03
In this commit we add a benchmark to test the performance of
IsPublicNode query.
In this commit we updated the IsPublicV1Node query to use UNION
instead of OR, since sqlite struggles to efficiently use
multiple indexes in a single query involving OR conditions across
different columns.

We use UNION ALL since the query doesn't care about duplicates.
Copy link
Collaborator

@ziggie1984 ziggie1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ziggie1984
Copy link
Collaborator

ziggie1984 commented Dec 24, 2025

@Abdulkbk could you analyze why the itest on windows failed ? It is not related to the PR but we should make sure we at least document flakes.

@lightninglabs-deploy
Copy link

@Abdulkbk, remember to re-request review from reviewers when ready

@ziggie1984 ziggie1984 removed the request for review from yyforyongyu January 5, 2026 07:04
Copy link
Collaborator

@ellemouton ellemouton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@ziggie1984 ziggie1984 modified the milestones: v0.21.0, v0.20.1 Jan 5, 2026
@ziggie1984 ziggie1984 removed this from v0.21 Jan 5, 2026
@ziggie1984 ziggie1984 added the backport-v0.20.x-branch This label is used to trigger the creation of a backport PR to the branch `v0.20.x-branch`. label Jan 5, 2026
@ziggie1984 ziggie1984 merged commit ac006d7 into lightningnetwork:master Jan 5, 2026
42 of 44 checks passed
@github-project-automation github-project-automation bot moved this to Done in lnd v0.20 Jan 5, 2026
@github-actions
Copy link

github-actions bot commented Jan 5, 2026

Successfully created backport PR for v0.20.x-branch:

ziggie1984 added a commit that referenced this pull request Jan 5, 2026
…20.x-branch

[v0.20.x-branch] Backport #10356: graph: fix inefficient query for IsPublicNode
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-v0.20.x-branch This label is used to trigger the creation of a backport PR to the branch `v0.20.x-branch`. performance sql

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[bug]: Graph SQL implementation results in some performance issues

7 participants