OOM kill with 0.30.0 and Lighthouse 8.0.1 #227

@remyroy

Description

I've been having a lot of issues running checkpointz 0.30.0 on Mainnet with Lighthouse 8.0.1 since Fusaka. checkpointz consumes a large amount of memory and then gets OOM killed.

This is what my systemd service logs look like:

Started checkpointz (Mainnet).
time="2025-12-05T20:44:30Z" level=info msg="loading config" cfgFile=/etc/checkpointz/config-mainnet.yaml
time="2025-12-05T20:44:30Z" level=info msg="Starting Checkpointz server (v0.30.0-f15f76b)"
time="2025-12-05T20:44:30Z" level=info msg="Serving http at localhost:5555"
time="2025-12-05T20:44:30Z" level=info msg="Starting Finality provider in full mode" module=beacon/default
time="2025-12-05T20:44:30Z" level=error msg="Waiting for a healthy, non-syncing node before beginning.." error="no nodes found" module=beacon/default
time="2025-12-05T20:44:30Z" level=info msg="Starting beacon..." module=consensus/beacon upstream="Geth + Lighthouse (self-hosted)"
time="2025-12-05T20:44:30Z" level=info msg="Serving metrics at localhost:9092"
time="2025-12-05T20:44:30Z" level=info msg="Node has a new finalized checkpoint" epoch=411829 module=beacon/default node="Geth + Lighthouse (self-hosted)" reason=serving_updater root=0x0f490cfa63b3c59c2b0155cc9c789b45ca3bb7ea6e97efec16b392db04a152d9
time="2025-12-05T20:44:30Z" level=error msg="Failed to check finality" error="failed to decide majority finality: no majority finality found" module=beacon/default node="Geth + Lighthouse (self-hosted)" reason=serving_updater
time="2025-12-05T20:44:30Z" level=error msg="Subscriber error" error="failed to decide majority finality: no majority finality found" module=consensus/beacon topic=finality_checkpoint_updated upstream="Geth + Lighthouse (self-hosted)"
time="2025-12-05T20:44:30Z" level=info msg="Beacon started!" module=consensus/beacon upstream="Geth + Lighthouse (self-hosted)"
time="2025-12-05T20:44:35Z" level=error msg="Waiting for a healthy, non-syncing node before beginning.." error="no nodes found" module=beacon/default
time="2025-12-05T20:44:40Z" level=error msg="Failed to check for serving checkpoint" error="head finalized checkpoint is unknown" module=beacon/default
time="2025-12-05T20:44:40Z" level=info msg="New finalized head checkpoint" epoch=411829 module=beacon/default root=0x0f490cfa63b3c59c2b0155cc9c789b45ca3bb7ea6e97efec16b392db04a152d9
time="2025-12-05T20:44:40Z" level=info msg="Fetching bundle from node Geth + Lighthouse (self-hosted) with root 0x4d611d5b93fdab69013a7f0a2f961caca0c853f87cfe9595fe50038163079360" module=beacon/default
time="2025-12-05T20:44:40Z" level=info msg="Fetched beacon block" module=beacon/default root=0x4d611d5b93fdab69013a7f0a2f961caca0c853f87cfe9595fe50038163079360 slot=0 state_root=0x7e76880eb67bbdc86250aa578958e9d0675e64e714337855204fb5abaaf82c2b
time="2025-12-05T20:44:40Z" level=error msg="Failed to check for genesis bundle" error="failed to store block: genesis time is unknown" module=beacon/default
time="2025-12-05T20:44:40Z" level=info msg="Fetched genesis time" module=beacon/default
time="2025-12-05T20:44:45Z" level=info msg="Serving bundle is unknown, downloading" head_epoch=411829 head_root=0x0f490cfa63b3c59c2b0155cc9c789b45ca3bb7ea6e97efec16b392db04a152d9 module=beacon/default
time="2025-12-05T20:44:45Z" level=info msg="Downloading serving checkpoint" epoch=411829 fork_name=fulu module=beacon/default
time="2025-12-05T20:44:45Z" level=info msg="Fetching bundle from node Geth + Lighthouse (self-hosted) with root 0x0f490cfa63b3c59c2b0155cc9c789b45ca3bb7ea6e97efec16b392db04a152d9" module=beacon/default
time="2025-12-05T20:44:45Z" level=info msg="Fetched beacon block" module=beacon/default root=0x0f490cfa63b3c59c2b0155cc9c789b45ca3bb7ea6e97efec16b392db04a152d9 slot=13178528 state_root=0xe82f42ac75c691bc4f3a8cfffc1512a27d14a0da4c8108a7708d6a4f08ed5b18
time="2025-12-05T20:44:49Z" level=warning msg="failed to download and store deposit snapshot" error="status code: 404" module=beacon/default
time="2025-12-05T20:44:49Z" level=info msg="Successfully fetched bundle from Geth + Lighthouse (self-hosted)" block_root=0x0f490cfa63b3c59c2b0155cc9c789b45ca3bb7ea6e97efec16b392db04a152d9 module=beacon/default state_root=0xe82f42ac75c691bc4f3a8cfffc1512a27d14a0da4c8108a7708d6a4f08ed5b18
time="2025-12-05T20:44:49Z" level=info msg="Serving a new finalized checkpoint bundle" epoch=411829 module=beacon/default root=0x0f490cfa63b3c59c2b0155cc9c789b45ca3bb7ea6e97efec16b392db04a152d9
time="2025-12-05T20:44:55Z" level=info msg="Fetching bundle from node Geth + Lighthouse (self-hosted) with root 0x4d611d5b93fdab69013a7f0a2f961caca0c853f87cfe9595fe50038163079360" module=beacon/default
time="2025-12-05T20:44:56Z" level=info msg="Fetched beacon block" module=beacon/default root=0x4d611d5b93fdab69013a7f0a2f961caca0c853f87cfe9595fe50038163079360 slot=0 state_root=0x7e76880eb67bbdc86250aa578958e9d0675e64e714337855204fb5abaaf82c2b
time="2025-12-05T20:44:56Z" level=info msg="Downloaded and stored block for slot 0" module=beacon/default node="Geth + Lighthouse (self-hosted)" root=0x4d611d5b93fdab69013a7f0a2f961caca0c853f87cfe9595fe50038163079360 slot=0 state_root=0x7e76880eb67bbdc86250aa578958e9d0675e64e714337855204fb5abaaf82c2b
time="2025-12-05T20:44:57Z" level=info msg="Downloaded and stored block for slot 13178496" module=beacon/default node="Geth + Lighthouse (self-hosted)" root=0x542e84bfdd833ff01ce8e5332f196be583bf2c42becd1b99e5e71949c11577f3 slot=13178496 state_root=0xea5df9b7159ccd20bf4922a8c71da22737056f4fbfc51b92c3ae806f390d7701
time="2025-12-05T20:44:57Z" level=info msg="Successfully fetched bundle from Geth + Lighthouse (self-hosted)" block_root=0x4d611d5b93fdab69013a7f0a2f961caca0c853f87cfe9595fe50038163079360 module=beacon/default state_root=0x7e76880eb67bbdc86250aa578958e9d0675e64e714337855204fb5abaaf82c2b
time="2025-12-05T20:44:57Z" level=info msg="Fetched genesis bundle" module=beacon/default root=0x4d611d5b93fdab69013a7f0a2f961caca0c853f87cfe9595fe50038163079360
time="2025-12-05T20:44:58Z" level=info msg="Downloaded and stored block for slot 13178304" module=beacon/default node="Geth + Lighthouse (self-hosted)" root=0xfbaa6b16f7626bd42989089d486250e060a75ba42807d91638029ceb21072a6b slot=13178304 state_root=0xe96cbeb2bf97d700aedd648a127bf906bd3ca6776838e632f17cb31155abb4af
time="2025-12-05T20:44:58Z" level=info msg="Downloaded and stored block for slot 13178080" module=beacon/default node="Geth + Lighthouse (self-hosted)" root=0xa57931808f74b27331a1c07595e1e0bf42b42ab9e3215428928201d6f3e0bff5 slot=13178080 state_root=0x46b54cb091b719216c45901f1f0675e619bcefa032358d30ac4361a0a4ce6b11
time="2025-12-05T20:44:59Z" level=info msg="Downloaded and stored block for slot 13178016" module=beacon/default node="Geth + Lighthouse (self-hosted)" root=0x43b3706e73d37e79f190ba6581db8ec39cfab4c1db35cfd73168d077dffbda08 slot=13178016 state_root=0x4731a637d8e481a2814b34e7e2008655ba7c1b89fc514095ac9982861e950d4f
time="2025-12-05T20:45:00Z" level=info msg="Downloaded and stored block for slot 13177952" module=beacon/default node="Geth + Lighthouse (self-hosted)" root=0xa5b0b39c299141b078f7901694a460b87ee8385c56405ba2a2f4118f8424cff5 slot=13177952 state_root=0x4ae5a1518491dd33f33994da676dbaee97c8cebdd7b0081cecf64affff5fb04f
time="2025-12-05T20:45:02Z" level=info msg="Downloaded and stored block for slot 13178464" module=beacon/default node="Geth + Lighthouse (self-hosted)" root=0xccb8dd8daeb5f6bbe87251c0f30ef43df4b8065e053208e2b178dc1987efb64b slot=13178464 state_root=0x3827fea2b92f8db76e097cda06f2312ba5101dac9b798182435001b12e38c22f
time="2025-12-05T20:45:03Z" level=error msg="Failed to write response" error="write tcp 127.0.0.1:5555->127.0.0.1:52088: write: broken pipe" module=api
time="2025-12-05T20:45:03Z" level=info msg="Downloaded and stored block for slot 13178432" module=beacon/default node="Geth + Lighthouse (self-hosted)" root=0xf83df7b27c5aefbc0ce9cc975c6240773ccaefd8c1ba457a91b18069a54b7d42 slot=13178432 state_root=0x62661f39602cc47fea95f4dcd46b3468815528a57efe4d2079e5b4138ce30f8f
time="2025-12-05T20:45:06Z" level=info msg="Downloaded and stored block for slot 13178400" module=beacon/default node="Geth + Lighthouse (self-hosted)" root=0xec54ce04ceaa9c34fbe5dff9d06321ed24df29540acbd8a99b4912d494971ee6 slot=13178400 state_root=0x3a9e68f4e741e8d35203acc4c7e79364220bb45d640db9cbd9d4ce427fcd0194
time="2025-12-05T20:45:07Z" level=info msg="Downloaded and stored block for slot 13178208" module=beacon/default node="Geth + Lighthouse (self-hosted)" root=0xa566b80b1270e759be26fcb2c9da4bc77225067ab02e3b11cfff8df962d3f164 slot=13178208 state_root=0x17a13c6f685d2aec3112bd1ed0a6ea406bf7432185f49f7d54eee5fde1501ec6
time="2025-12-05T20:45:40Z" level=error msg="Failed to download historical block" error="GET failed with status 500: {\"code\":500,\"message\":\"UNHANDLED_ERROR: ExecutionLayerErrorPayloadReconstruction(0x48661abe4746309d1c0c75c1faa56a9160a574f8f960c052282c47ae6b4bf747, EngineError(Api { error: HttpClient(url: http://127.0.0.1:8551/, kind: request, detail: connection error: Connection reset by peer (os error 104)) }))\",\"stacktraces\":[]}" failure_count=1 module=beacon/default slot=13178144
time="2025-12-05T20:45:42Z" level=error msg="Failed to download historical block" error="GET failed with status 500: {\"code\":500,\"message\":\"UNHANDLED_ERROR: ExecutionLayerErrorPayloadReconstruction(0x6d0f5eebc3bf8ce06ade2ea0bac21fe8dcefd6d0a23fe89a58dc4dedde21191d, ApiError(HttpClient(url: http://127.0.0.1:8551/, kind: request, detail: error trying to connect: tcp connect error: Connection refused (os error 111))))\",\"stacktraces\":[]}" failure_count=1 module=beacon/default slot=13177984
time="2025-12-05T20:45:44Z" level=error msg="Failed to download historical block" error="GET failed with status 500: {\"code\":500,\"message\":\"UNHANDLED_ERROR: ExecutionLayerErrorPayloadReconstruction(0xc02cd255bd5cda2f870fbf3081d7d23d992b90ebd518038dfc9861554f603dec, ApiError(HttpClient(url: http://127.0.0.1:8551/, kind: request, detail: error trying to connect: tcp connect error: Connection refused (os error 111))))\",\"stacktraces\":[]}" failure_count=1 module=beacon/default slot=13178272
time="2025-12-05T20:45:45Z" level=error msg="Failed to download historical block" error="GET failed with status 500: {\"code\":500,\"message\":\"UNHANDLED_ERROR: ExecutionLayerErrorPayloadReconstruction(0xe10881cb17de66b8146617f6c8ca51c0585d8850afe934424754c282efe4fda8, ApiError(HttpClient(url: http://127.0.0.1:8551/, kind: request, detail: error trying to connect: tcp connect error: Connection refused (os error 111))))\",\"stacktraces\":[]}" failure_count=1 module=beacon/default slot=13178176
time="2025-12-05T20:45:48Z" level=error msg="Failed to download historical block" error="GET failed with status 500: {\"code\":500,\"message\":\"UNHANDLED_ERROR: ExecutionLayerErrorPayloadReconstruction(0x0957e94b38c2d25753b1e61d057354b2b0fb6a109d07c9c4b62b5c26849509ca, ApiError(HttpClient(url: http://127.0.0.1:8551/, kind: request, detail: error trying to connect: tcp connect error: Connection refused (os error 111))))\",\"stacktraces\":[]}" failure_count=1 module=beacon/default slot=13178368
time="2025-12-05T20:45:53Z" level=error msg="Failed to write response" error="write tcp 127.0.0.1:5555->127.0.0.1:52126: write: broken pipe" module=api
time="2025-12-05T20:45:55Z" level=error msg="Failed to write response" error="write tcp 127.0.0.1:5555->127.0.0.1:52174: write: broken pipe" module=api
time="2025-12-05T20:45:57Z" level=error msg="Failed to write response" error="write tcp 127.0.0.1:5555->127.0.0.1:52156: write: broken pipe" module=api
time="2025-12-05T20:45:58Z" level=error msg="Failed to write response" error="write tcp 127.0.0.1:5555->127.0.0.1:52160: write: broken pipe" module=api
time="2025-12-05T20:45:58Z" level=error msg="Failed to write response" error="write tcp 127.0.0.1:5555->127.0.0.1:52232: write: broken pipe" module=api
checkpointz.service: A process of this unit has been killed by the OOM killer.
checkpointz.service: Main process exited, code=killed, status=9/KILL
checkpointz.service: Failed with result 'oom-kill'.
checkpointz.service: Consumed 26.778s CPU time.
checkpointz.service: Scheduled restart job, restart counter is at 1.
Stopped checkpointz (Mainnet).
checkpointz.service: Consumed 26.778s CPU time.
Started checkpointz (Mainnet).
time="2025-12-05T20:46:11Z" level=info msg="loading config" cfgFile=/etc/checkpointz/config-mainnet.yaml
time="2025-12-05T20:46:11Z" level=info msg="Starting Checkpointz server (v0.30.0-f15f76b)"
time="2025-12-05T20:46:11Z" level=info msg="Serving http at localhost:5555"
time="2025-12-05T20:46:11Z" level=info msg="Starting Finality provider in full mode" module=beacon/default
time="2025-12-05T20:46:11Z" level=info msg="Serving metrics at localhost:9092"
time="2025-12-05T20:46:11Z" level=error msg="Waiting for a healthy, non-syncing node before beginning.." error="no nodes found" module=beacon/default
time="2025-12-05T20:46:11Z" level=info msg="Starting beacon..." module=consensus/beacon upstream="Geth + Lighthouse (self-hosted)"
time="2025-12-05T20:46:11Z" level=info msg="Node has a new finalized checkpoint" epoch=411830 module=beacon/default node="Geth + Lighthouse (self-hosted)" reason=serving_updater root=0xde7df949bc490b4bd22bed035881dbae8ed3fb27bc361b285fb1cfeaa9af23ba
time="2025-12-05T20:46:11Z" level=error msg="Failed to check finality" error="failed to decide majority finality: no majority finality found" module=beacon/default node="Geth + Lighthouse (self-hosted)" reason=serving_updater
time="2025-12-05T20:46:11Z" level=error msg="Subscriber error" error="failed to decide majority finality: no majority finality found" module=consensus/beacon topic=finality_checkpoint_updated upstream="Geth + Lighthouse (self-hosted)"
time="2025-12-05T20:46:11Z" level=info msg="Beacon started!" module=consensus/beacon upstream="Geth + Lighthouse (self-hosted)"
time="2025-12-05T20:46:16Z" level=error msg="Waiting for a healthy, non-syncing node before beginning.." error="no nodes found" module=beacon/default
time="2025-12-05T20:46:21Z" level=error msg="Waiting for a healthy, non-syncing node before beginning.." error="no nodes found" module=beacon/default
time="2025-12-05T20:46:26Z" level=error msg="Waiting for a healthy, non-syncing node before beginning.." error="no nodes found" module=beacon/default
time="2025-12-05T20:46:31Z" level=error msg="Waiting for a healthy, non-syncing node before beginning.." error="no nodes found" module=beacon/default
2025/12/05 20:46:32 Caught signal: terminated
Stopping checkpointz (Mainnet)...
checkpointz.service: Deactivated successfully.
Stopped checkpointz (Mainnet).

Anything I should change or check to improve this?
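
For triage, it can help to tally which error messages dominate a log like the one above. The following is a minimal sketch, assuming the `key=value` line format shown in the log; the names `parse_line` and `summarize_errors` are hypothetical helpers and not part of checkpointz.

```python
import re
from collections import Counter

# One key=value pair. The value is either a double-quoted string
# (backslash escapes allowed) or a bare token without whitespace.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Return the fields of one log line as a dict.

    Lines that do not start with time= (for example the systemd
    status messages) yield an empty dict.
    """
    if not line.startswith("time="):
        return {}
    fields = {}
    for key, value in PAIR.findall(line):
        if value.startswith('"'):
            # Drop the surrounding quotes; inner escapes are left as-is.
            value = value[1:-1]
        fields[key] = value
    return fields


def summarize_errors(lines):
    """Count level=error lines, grouped by their msg field."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line.strip())
        if fields.get("level") == "error":
            counts[fields.get("msg", "")] += 1
    return counts
```

One way to use it is to pass the output of `journalctl -u checkpointz.service -o cat` to `summarize_errors` line by line (for instance via `sys.stdin`) and print `counts.most_common()`. Quoted values are matched as a whole, so text such as `level=` inside an `error="..."` field does not overwrite the real level.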
