

@adir-torq commented Dec 10, 2025

Fix: Serialize interval columns as HH:MM:SS strings during backfill

Problem

During backfill, Sequin uses SELECT queries and processes results through a loader that normalizes values for JSON serialization. The Postgrex.Interval struct returned by Postgrex lacked a Jason.Encoder implementation, causing the loader to emit null for interval columns. In contrast, live CDC decodes WAL tuples as text strings, so interval values were emitted correctly in streaming mode.

This discrepancy meant:

  • Historical data with interval columns couldn't be properly replicated via backfill
  • Transforms couldn't fix the issue since data arrived as null before transform execution
  • Inconsistent output between backfill and streaming modes

Solution

Added a Jason.Encoder implementation for Postgrex.Interval that serializes intervals as simple HH:MM:SS time strings. All interval components (months, days, seconds, and microseconds) are folded into a single total duration, so the hours field can exceed 24.

Conversion rules:

  • 1 month = 30 days (the same 30-day month PostgreSQL assumes in justify_days and when comparing intervals)
  • 1 day = 24 hours
  • Microseconds are appended as a fractional part with trailing zeros removed (e.g., .5 for 500000 microseconds)

Examples:

Interval                       Output
1 second                       "00:00:01"
1 hour 1 min 1 sec             "01:01:01"
1 day                          "24:00:00"
1 day + 1 hour                 "25:00:00"
1 month                        "720:00:00"
1 month + 2 days + 1h1m1s      "769:01:01"
-1 day                         "-24:00:00"
100 hours                      "100:00:00"
1.5 seconds                    "00:00:01.5"
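The conversion rules and examples above can be sketched in Python. This is an illustration only: the PR's actual implementation is Elixir in lib/sequin/postgrex/encoders.ex and is not shown here. The Interval dataclass is a hypothetical stand-in mirroring the months, days, secs and microsecs fields of Postgrex.Interval, and taking the sign from the summed total (rather than per component) is an assumption about how mixed-sign intervals would be handled.

```python
from dataclasses import dataclass

MICROS_PER_SEC = 1_000_000
SECS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # conversion rule from this PR: 1 month = 30 days


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for the fields of %Postgrex.Interval{}."""
    months: int = 0
    days: int = 0
    secs: int = 0
    microsecs: int = 0


def interval_to_hms(iv: Interval) -> str:
    """Fold every component into one duration and render it as [-]HH:MM:SS[.ffffff]."""
    # Total duration in microseconds, using 30-day months and 24-hour days.
    total_us = (
        (iv.months * DAYS_PER_MONTH + iv.days) * SECS_PER_DAY + iv.secs
    ) * MICROS_PER_SEC + iv.microsecs

    # Assumption: the sign comes from the summed total; the magnitude
    # is then formatted on its own.
    sign = "-" if total_us < 0 else ""
    total_secs, micros = divmod(abs(total_us), MICROS_PER_SEC)
    hours, rem = divmod(total_secs, 3600)
    minutes, seconds = divmod(rem, 60)

    # Hours are not capped at 24, so "720:00:00" is a valid result.
    out = f"{sign}{hours:02d}:{minutes:02d}:{seconds:02d}"
    if micros:
        # 500000 -> "500000" -> ".5" once trailing zeros are stripped.
        out += "." + f"{micros:06d}".rstrip("0")
    return out


print(interval_to_hms(Interval(months=1, days=2, secs=3661)))  # 769:01:01
print(interval_to_hms(Interval(secs=1, microsecs=500000)))     # 00:00:01.5
```

One consequence of the 30-day rule is that the output is lossy: Interval(months=1) and Interval(days=30) both render as "720:00:00".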

Changes

  • lib/sequin/postgrex/encoders.ex: Added Jason.Encoder implementation for Postgrex.Interval that converts intervals to HH:MM:SS format
  • test/sequin/postgrex/encoders_test.exs: Added comprehensive tests for interval encoding including edge cases (zero values, negative values, microseconds, large intervals, days, months)

Related Issues

Similar to the approach used for daterange in GitHub Issue #816.

Testing

mix test test/sequin/postgrex/encoders_test.exs

All 15 tests pass, covering:

  • Simple time intervals (HH:MM:SS format)
  • Zero intervals
  • Days converted to hours (24:00:00)
  • Months converted to hours (720:00:00)
  • Combined months, days, and time
  • Microsecond precision
  • Negative intervals
  • Large intervals (100+ hours)

@dosubot (bot) added the labels size:L ("This PR changes 100-499 lines, ignoring generated files.") and enhancement ("New feature or request") Dec 10, 2025
@adir-torq changed the title from "Issue 2094 Add interval encoder for backfill" to "Fix: Serialize interval columns as PostgreSQL strings during backfill" Dec 10, 2025
@acco (Contributor) commented Dec 15, 2025

cross-posting this:

Thanks! Thinking through what would be best here. We could return as an ISO 8601 duration string, eg PT3H15M. But those aren't super common, so could see it being confusing.

I'm leaning towards a JSON object of parts, eg:

{
  "months": 14,
  "days": 3,
  "microseconds": 3600000000
}

Which matches how Postgres stores it. I think this will be pretty clear to users. What do you think?
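A minimal Python sketch of the parts shape proposed in the comment above, again using a hypothetical Interval dataclass standing in for Postgrex.Interval. The function name is illustrative, and whether commit e2c0283 emits exactly these keys is not shown in this thread.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for the fields of %Postgrex.Interval{}."""
    months: int = 0
    days: int = 0
    secs: int = 0
    microsecs: int = 0


def interval_to_parts(iv: Interval) -> dict:
    # Keep months and days separate, as Postgres stores them, and fold
    # the time part (seconds plus microseconds) into one microsecond count.
    return {
        "months": iv.months,
        "days": iv.days,
        "microseconds": iv.secs * 1_000_000 + iv.microsecs,
    }


# 14 months, 3 days, 1 hour
print(json.dumps(interval_to_parts(Interval(months=14, days=3, secs=3600))))
# {"months": 14, "days": 3, "microseconds": 3600000000}
```

Unlike the HH:MM:SS string, this shape needs no 30-day or 24-hour assumption. PostgreSQL keeps months, days, and microseconds apart because the number of days in a month varies and a day can have 23 or 25 hours across a daylight-saving change, so consumers can apply whichever calendar rules they need.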

@acco (Contributor) commented Dec 25, 2025

Thanks again, @adir-torq! Fix in e2c0283, lmk if we can improve further.

@acco closed this Dec 25, 2025

2 participants