Ingesters stopped triggering tsdb compaction #6668

@jakirpatel

Describe the bug
Ingesters stopped triggering TSDB compactions, which caused OOM issues and data loss because nothing was pushed to remote storage (Google Cloud Storage).

To Reproduce

  1. Consul restarted after being OOM-killed.
  2. The ingester ring became unhealthy.

Expected behavior

  1. Ingesters should keep triggering TSDB compaction.

Environment:

  • Infrastructure: Kubernetes v1.26.7, Cortex v1.15.3
  • Deployment tool: Kustomize

Additional Context
Server (kernel) logs from the Consul host:

[Mon Mar 24 09:33:13 2025] Code: Bad RIP value.
[Mon Mar 24 09:33:13 2025] RSP: 002b:000000c00009df18 EFLAGS: 00010202
[Mon Mar 24 09:33:13 2025] RAX: 0000000000000000 RBX: 0000000000004e20 RCX: 00000000004698dd
[Mon Mar 24 09:33:13 2025] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000c00009df18
[Mon Mar 24 09:33:13 2025] RBP: 000000c00009df28 R08: 000000007645c2a4 R09: 00007ffea5d690b0
[Mon Mar 24 09:33:13 2025] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000439c60
[Mon Mar 24 09:33:13 2025] R13: 0000000000000000 R14: 00000000036e71dc R15: 0000000000000000
[Mon Mar 24 09:33:13 2025] Task in /kubepods/burstable/pod19144e2d-5344-4ea2-a161-fd1e4e57fab1/1f289fc88a99539f34d90c61b7eade3a341bd8fa0fe870c2f6f0f8001949efc4 killed as a result of limit of /kubepods/burstable/pod19144e2d-5344-4ea2-a161-fd1e4e57fab1
[Mon Mar 24 09:33:13 2025] memory: usage 524288kB, limit 524288kB, failcnt 1913986
[Mon Mar 24 09:33:13 2025] memory+swap: usage 524204kB, limit 9007199254740988kB, failcnt 0
[Mon Mar 24 09:33:13 2025] kmem: usage 21224kB, limit 9007199254740988kB, failcnt 0
[Mon Mar 24 09:33:13 2025] Memory cgroup stats for /kubepods/burstable/pod19144e2d-5344-4ea2-a161-fd1e4e57fab1: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[Mon Mar 24 09:33:13 2025] Memory cgroup stats for /kubepods/burstable/pod19144e2d-5344-4ea2-a161-fd1e4e57fab1/17b14f8338505345e097052aa04c04b3a0db60980bda3fe253e4cd58dcccff24: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:36KB inactive_file:0KB active_file:0KB unevictable:0KB
[Mon Mar 24 09:33:13 2025] Memory cgroup stats for /kubepods/burstable/pod19144e2d-5344-4ea2-a161-fd1e4e57fab1/3d01ee86d45aa5dc52c06cd2144b02dd652c5828c55b4a62c070c1cc766468ed: cache:227528KB rss:0KB rss_huge:0KB shmem:228068KB mapped_file:50688KB dirty:0KB writeback:0KB swap:0KB inactive_anon:3976KB active_anon:223684KB inactive_file:0KB active_file:0KB unevictable:0KB
[Mon Mar 24 09:33:13 2025] Memory cgroup stats for /kubepods/burstable/pod19144e2d-5344-4ea2-a161-fd1e4e57fab1/1f289fc88a99539f34d90c61b7eade3a341bd8fa0fe870c2f6f0f8001949efc4: cache:2616KB rss:271908KB rss_huge:0KB shmem:2196KB mapped_file:660KB dirty:0KB writeback:0KB swap:0KB inactive_anon:96KB active_anon:273872KB inactive_file:1076KB active_file:152KB unevictable:0KB
[Mon Mar 24 09:33:13 2025] Tasks state (memory values in pages):
[Mon Mar 24 09:33:13 2025] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[Mon Mar 24 09:33:13 2025] [  25913]     0 25913      242        1    28672        0          -998 pause
[Mon Mar 24 09:33:13 2025] [   2779]     0  2779       52        2    20480        0           985 docker-entrypoi
[Mon Mar 24 09:33:13 2025] [   2797]   100  2797   331313    79132  1064960        0           985 consul
[Mon Mar 24 09:33:13 2025] [  20375]     0 20375      397       16    45056        0           985 sh
[Mon Mar 24 09:33:13 2025] [  20609]     0 20609     1181       16    40960        0           985 curl
[Mon Mar 24 09:33:13 2025] [  20631]     0 20631      394       13    32768        0           985 grep
[Mon Mar 24 09:33:13 2025] [  20971]     0 20971      394        2    32768        0           985 sh
[Mon Mar 24 09:33:13 2025] Memory cgroup out of memory: Kill process 2797 (consul) score 1590 or sacrifice child
[Mon Mar 24 09:33:13 2025] Killed process 2797 (consul) total-vm:1325252kB, anon-rss:265244kB, file-rss:0kB, shmem-rss:51284kB
[Mon Mar 24 09:33:13 2025] oom_reaper: reaped process 2797 (consul), now anon-rss:0kB, file-rss:0kB, shmem-rss:51284kB
[Mon Mar 24 09:33:17 2025] TCP: request_sock_TCP: Possible SYN flooding on port 8500. Sending cookies.  Check SNMP counters.
[Mon Mar 24 14:58:40 2025] IPv6: ADDRCONF(NETDEV_UP): cali350c831b699: link is not ready
[Mon Mar 24 14:58:40 2025] IPv6: ADDRCONF(NETDEV_CHANGE): cali350c831b699: link becomes ready
[Mon Mar 24 16:28:41 2025] IPv6: ADDRCONF(NETDEV_UP): cali28c7cc3caa3: link is not ready
[Mon Mar 24 16:28:41 2025] IPv6: ADDRCONF(NETDEV_CHANGE): cali28c7cc3caa3: link becomes ready
[Mon Mar 24 16:58:39 2025] IPv6: ADDRCONF(NETDEV_UP): cali9cc360364f7: link is not ready
[Mon Mar 24 16:58:39 2025] IPv6: ADDRCONF(NETDEV_CHANGE): cali9cc360364f7: link becomes ready
[Mon Mar 24 18:28:42 2025] IPv6: ADDRCONF(NETDEV_UP): cali42981617b56: link is not ready
[Mon Mar 24 18:28:42 2025] IPv6: ADDRCONF(NETDEV_CHANGE): cali42981617b56: link becomes ready
[Mon Mar 24 19:58:42 2025] IPv6: ADDRCONF(NETDEV_UP): califadadf0982a: link is not ready
[Mon Mar 24 19:58:42 2025] IPv6: ADDRCONF(NETDEV_CHANGE): califadadf0982a: link becomes ready
[Mon Mar 24 20:28:41 2025] IPv6: ADDRCONF(NETDEV_UP): cali21ba95b6eca: link is not ready
[Mon Mar 24 20:28:41 2025] IPv6: ADDRCONF(NETDEV_CHANGE): cali21ba95b6eca: link becomes ready
[Mon Mar 24 22:28:43 2025] IPv6: ADDRCONF(NETDEV_UP): caliba27763a131: link is not ready
[Mon Mar 24 22:28:43 2025] IPv6: ADDRCONF(NETDEV_CHANGE): caliba27763a131: link becomes ready
[Mon Mar 24 22:58:39 2025] IPv6: ADDRCONF(NETDEV_UP): cali06cb01d420f: link is not ready
[Mon Mar 24 22:58:39 2025] IPv6: ADDRCONF(NETDEV_CHANGE): cali06cb01d420f: link becomes ready
[Mon Mar 24 22:58:40 2025] IPv6: ADDRCONF(NETDEV_UP): cali745f5a04bdc: link is not ready
[Mon Mar 24 22:58:40 2025] IPv6: ADDRCONF(NETDEV_CHANGE): cali745f5a04bdc: link becomes ready
[Tue Mar 25 00:58:41 2025] IPv6: ADDRCONF(NETDEV_UP): calif0e472cf564: link is not ready
[Tue Mar 25 00:58:41 2025] IPv6: ADDRCONF(NETDEV_CHANGE): calif0e472cf564: link becomes ready
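The figures in the OOM report above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original report: it copies three lines verbatim from the log and assumes 4 KiB memory pages (the log does not state the page size). The helper names `parse_task_row`, `parse_kill_line`, and `parse_limit_kb` are hypothetical and exist only for this illustration.

```python
import re

# Three lines copied verbatim from the kernel log above.
LIMIT_LINE = "[Mon Mar 24 09:33:13 2025] memory: usage 524288kB, limit 524288kB, failcnt 1913986"
TASK_LINE = "[Mon Mar 24 09:33:13 2025] [   2797]   100  2797   331313    79132  1064960        0           985 consul"
KILL_LINE = ("[Mon Mar 24 09:33:13 2025] Killed process 2797 (consul) "
             "total-vm:1325252kB, anon-rss:265244kB, file-rss:0kB, shmem-rss:51284kB")

PAGE_KB = 4  # assumption: 4 KiB pages; the log does not state the page size


def parse_task_row(line):
    """Parse one row of the 'Tasks state' table (total_vm and rss are in pages)."""
    match = re.search(
        r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)"
        r"\s+(?P<rss>\d+)\s+(?P<pgtables_bytes>\d+)\s+(?P<swapents>\d+)"
        r"\s+(?P<oom_score_adj>-?\d+)\s+(?P<name>\S+)\s*$",
        line,
    )
    if match is None:
        raise ValueError("not a task row: " + line)
    row = match.groupdict()
    name = row.pop("name")
    parsed = {key: int(value) for key, value in row.items()}
    parsed["name"] = name
    return parsed


def parse_kill_line(line):
    """Return the 'key:<n>kB' fields of a 'Killed process' line as integers (kB)."""
    return {key: int(value) for key, value in re.findall(r"([\w-]+):(\d+)kB", line)}


def parse_limit_kb(line):
    """Return (usage_kb, limit_kb) from the cgroup 'memory: usage ..., limit ...' line."""
    match = re.search(r"memory: usage (\d+)kB, limit (\d+)kB", line)
    if match is None:
        raise ValueError("not a memory limit line: " + line)
    return int(match.group(1)), int(match.group(2))


task = parse_task_row(TASK_LINE)
kill = parse_kill_line(KILL_LINE)
usage_kb, limit_kb = parse_limit_kb(LIMIT_LINE)

# Convert the page counts from the task table to kB.
rss_from_pages_kb = task["rss"] * PAGE_KB
vm_from_pages_kb = task["total_vm"] * PAGE_KB

# Sum the resident-memory breakdown reported on the 'Killed process' line.
rss_from_kill_kb = kill["anon-rss"] + kill["file-rss"] + kill["shmem-rss"]

limit_mib = limit_kb // 1024

print("pod memory limit:", limit_mib, "MiB, usage at limit:", usage_kb == limit_kb)
print("consul rss from task table:", rss_from_pages_kb, "kB")
print("consul rss from kill line: ", rss_from_kill_kb, "kB")
print("consul share of pod limit: ", round(100 * rss_from_pages_kb / limit_kb, 1), "%")
```

Under the 4 KiB assumption, the task-table values for the `consul` process agree with the `Killed process` line, and the pod-level cgroup was at its limit of 524288 kB when the kill happened. That suggests the Consul pod's memory limit is the first thing to review, though the log alone does not explain why ingesters stopped compacting afterwards.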
