Skip to content

Conversation

@TheBlueMatt
Copy link
Contributor

No description provided.

@ldk-reviews-bot
Copy link

ldk-reviews-bot commented Nov 13, 2025

👋 Thanks for assigning @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

Copy link

@joostjager joostjager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this PR a response to a user report?

funding_amount_sats, \
announcement_signed \
) VALUES ($1, $2, $3) ON CONFLICT (short_channel_id) DO NOTHING", &[
) VALUES ($1, $2, $3) ON CONFLICT (short_channel_id) DO UPDATE SET funding_amount_sats = $2", &[

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this guarantee that everything will be backfilled and the described unwrap cannot fail anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, not fully, but it should make it much more robust. Surprisingly I'd never seen it, but other folks have 🤷‍♂️

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not fully, but it is safe still? I was looking for the mentioned unwrap, but wasn't completely sure which one it was.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its here -

let funding_sats = current_announcement_row.get::<_, i64>("funding_amount_sats") as u64;
it should be almost safe, since we're only fetching rows that are in our network graph (we make sure all the fields are set for anything in our network graph on startup or that is new), but I imagine there may still be some rare races hiding somewhere, not 100% sure.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a safe get then for the final 1% and maybe just skip the row?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kinda confused why its coming up now to begin with - the commit was landed months ago and I assume most users have updated, so this should just get the stragglers working again. We can revisit it if there's still issues.

The DB persistence loop is responsible for taking new gossip off of
the persistence queue and writing it to Postgres. If it gets
stalled, the logic pushing gossip onto the event queue might also
stall. In the next commit, we'll move the gossip persistence logic
into a separate tokio runtime to avoid deadlocks, but in order for
that to work it has to be the case that the gossip persistence
task can never block waiting on a task on the main runtime, which
would be possible via the `NetworkGraph` mutexes. Thus, here, we
move the graph writing to its own task in the main runtime.
When we have many peers but relatively few threads, and postgres
writes fall behind enough to block the async stream of gossip, we
can end up deadlocking on the `PeerManager` `peers` thread.
Specifically, because `PeerManager::process_events` holds a
`peers` (read) lock while fetching messages (during which time we
`block_in_place` trying to push gossip to the async stream), any
other tasks which try to touch the `peers` lock in write mode (e.g.
connecting to a peer or processing a disconnection) we'll hang
until some postgres writes finish. If we get enough threads
blocked or we block the tokio reactor and the postgres writes will
stall, fully deadlocking.

To avoid this, we here move the postgres writes to their own tokio
runtime, ensuring that that reactor cannot be blocked.
In c02f2ec we added
`funding_amount_sats` to `channel_announcements`, making it
`NOT NULL` for new DBs, but `NULL` for old ones. When querying,
however, we (indirectly) `unwrap`ed the values.

Here, we try to backfill by looking at the local network graph on
startup and updating the existing entries, rather than ignoring
conflicting writes.
@TheBlueMatt
Copy link
Contributor Author

Is this PR a response to a user report?

Yes, see discord.

@TheBlueMatt TheBlueMatt force-pushed the main branch 3 times, most recently from 736dc32 to 406cf7f Compare November 13, 2025 16:27
@TheBlueMatt TheBlueMatt force-pushed the main branch 2 times, most recently from 5c6eb95 to b9204d3 Compare November 13, 2025 16:56
joostjager
joostjager previously approved these changes Nov 13, 2025
Copy link

@joostjager joostjager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left few small non-blocking remarks.

GossipPersister::new(self.logger.clone()).await;
log_info!(self.logger, "Starting gossip db persistence listener");
tokio::spawn(async move { persister.persist_gossip().await; });
let runtime = Builder::new_multi_thread()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It disappointing that quite a few issues around async seem to require another runtime 😞 Maybe copy the explanation in the commit message to a code comment here. Don't think it'll be obvious otherwise why this is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, async is really such a strongly-colored function :(

funding_amount_sats, \
announcement_signed \
) VALUES ($1, $2, $3) ON CONFLICT (short_channel_id) DO NOTHING", &[
) VALUES ($1, $2, $3) ON CONFLICT (short_channel_id) DO UPDATE SET funding_amount_sats = $2", &[

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a safe get then for the final 1% and maybe just skip the row?

@TheBlueMatt
Copy link
Contributor Author

Pushed an additional comment.

@TheBlueMatt TheBlueMatt merged commit 1ef1e32 into lightningdevkit:main Nov 14, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants