Skip to content

fn.select + groupBy combination doesn't work correctly #1189

@KyleAMathews

Description

@KyleAMathews

Summary

When using fn.select() with groupBy(), the query returns only group keys ({ __key_0: ... }) instead of the computed fields from the functional select. This prevents using $selected in having/fn.having clauses after fn.select + groupBy.

Current Behavior

const result = createLiveQueryCollection({
  query: (q) =>
    q
      .from({ sessions: sessionsCollection })
      .groupBy(({ sessions }) => sessions.customer_id)
      .fn.select((row) => ({
        taskId: row.sessions.customer_id,
        latestActivity: max(row.sessions.date),
        sessionCount: count(row.sessions.id),
      }))
      .orderBy(({ $selected }) => $selected.latestActivity),
})

// Returns: [{ __key_0: 1 }, { __key_0: 2 }]
// Expected: [{ taskId: 2, latestActivity: Date, sessionCount: 1 }, ...]

Expected Behavior

The query should return the computed fields from fn.select, similar to how regular .select() works with groupBy.

Root Cause Analysis

The issue is in how processGroupBy() handles queries with fnSelect:

  1. fn.select runs before groupBy (index.ts lines 223-236): Each row gets $selected with computed values from the functional select.

  2. processGroupBy() doesn't receive fnSelect (index.ts lines 260-267):

    pipeline = processGroupBy(
      pipeline,
      query.groupBy,
      query.having,
      query.select,      // ← Only passes regular select, not fnSelect
      query.fnHaving,
    )
  3. Without selectClause, only keys are returned (group-by.ts lines 242-247):

    } else {
      // No SELECT clause - just use the group keys
      for (let i = 0; i < groupByClause.length; i++) {
        finalResults[`__key_${i}`] = aggregatedRow[`__key_${i}`]
      }
    }
  4. The $selected from fnSelect is lost because the else branch ignores selectResults (which contains the pre-groupBy fnSelect values).

Additional Complexity

Even if we preserve $selected from fnSelect, there's a semantic issue:

  • With groupBy, multiple rows are grouped together
  • But fnSelect runs before groupBy, so each row has its own $selected values
  • Which row's $selected should be used for the group? The first one?

For aggregates like count(), sum(), max() used in fn.select:

  • These return expression objects when fnSelect runs (before groupBy)
  • The actual aggregation happens in the groupBy operator
  • But processGroupBy() doesn't know about these aggregates because they're inside a JS function, not an expression tree it can analyze

Failing Unit Tests

These tests demonstrate the expected behavior:

test(`orderBy can reference aggregate field from fn.select`, () => {
  const sessionStats = createLiveQueryCollection({
    startSync: true,
    query: (q) =>
      q
        .from({ sessions: sessionsCollection })
        .where(({ sessions }) => eq(sessions.status, `completed`))
        .groupBy(({ sessions }) => sessions.customer_id)
        .fn.select((row) => ({
          taskId: row.sessions.customer_id,
          latestActivity: max(row.sessions.date),
          sessionCount: count(row.sessions.id),
        }))
        .orderBy(({ $selected }) => $selected.latestActivity),
  })

  // FAILS: Returns [{ __key_0: 1 }, { __key_0: 2 }]
  expect(sessionStats.toArray).toEqual([
    { taskId: 2, latestActivity: new Date(`2023-02-01`), sessionCount: 1 },
    { taskId: 1, latestActivity: new Date(`2023-03-01`), sessionCount: 3 },
  ])
})

test(`HAVING can reference aggregate field from fn.select`, () => {
  const sessionStats = createLiveQueryCollection({
    startSync: true,
    query: (q) =>
      q
        .from({ sessions: sessionsCollection })
        .where(({ sessions }) => eq(sessions.status, `completed`))
        .groupBy(({ sessions }) => sessions.customer_id)
        .fn.select((row) => ({
          taskId: row.sessions.customer_id,
          latestActivity: max(row.sessions.date),
          sessionCount: count(row.sessions.id),
        }))
        .having(({ $selected }) => gt($selected.sessionCount, 2)),
  })

  // FAILS: Returns []
  expect(sessionStats.toArray).toEqual([
    { taskId: 1, latestActivity: new Date(`2023-03-01`), sessionCount: 3 },
  ])
})

test(`fn.having can reference aggregate field from fn.select`, () => {
  const sessionStats = createLiveQueryCollection({
    startSync: true,
    query: (q) =>
      q
        .from({ sessions: sessionsCollection })
        .where(({ sessions }) => eq(sessions.status, `completed`))
        .groupBy(({ sessions }) => sessions.customer_id)
        .fn.select((row) => ({
          taskId: row.sessions.customer_id,
          latestActivity: max(row.sessions.date),
          sessionCount: count(row.sessions.id),
          totalAmount: sum(row.sessions.amount),
        }))
        .fn.having(({ $selected }) => $selected.sessionCount > 2),
  })

  // FAILS: Returns 0 results
  expect(sessionStats.size).toBe(1)
  expect(sessionStats.toArray[0]?.taskId).toBe(1)
})

Possible Solutions

Option 1: Preserve fnSelect results for non-aggregate fields

  • Pass fnSelect flag to processGroupBy()
  • In the else branch, preserve $selected from the first row of each group
  • Limitation: Only works for non-aggregate computed fields

Option 2: Run fnSelect after groupBy

  • Change execution order so fnSelect runs after groupBy completes
  • The function would receive the aggregated row with computed aggregates
  • This would require changes to the pipeline structure

Option 3: Document as unsupported

  • Keep fn.select + groupBy as unsupported
  • Users should use regular .select() for aggregates with groupBy
  • fn.select works for non-grouped transformations

Workaround

For now, use regular .select() with groupBy for aggregates:

// This works:
.groupBy(({ sessions }) => sessions.customer_id)
.select(({ sessions }) => ({
  taskId: sessions.customer_id,
  latestActivity: max(sessions.date),
  sessionCount: count(sessions.id),
}))
.having(({ $selected }) => gt($selected.sessionCount, 2))

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions