Skip to content

Conversation

@ewowi
Copy link
Collaborator

@ewowi ewowi commented Jan 17, 2026

Summary by CodeRabbit

  • New Features

    • Improved RGBW support with explicit white extraction and more accurate multi-channel color mapping
    • Expanded moving-head channel definitions for finer control
  • Bug Fixes

    • Corrected header length/offset handling to stabilize monitor and fixture data parsing
    • More reliable DMX/pixel data remapping across drivers
  • Refactor

    • Layer and driver APIs revamped for mapping-aware per-light updates
  • Chores

    • Version bumped to 0.8.1
  • Documentation

    • Updated node/offset documentation to reflect RGBW changes

✏️ Tip: You can customize this high-level summary in your review settings.

Frontend
========
- monitor: header reorg

backend
=======
- Nodes: lightPreset_MH19x15W24: has a white channel - WIP
- Physical layer: add nrOfLights_t_MAX, reorg header, 'park' white calculation in loopDrivers  - WIP
- Virtual layer: add forEachLightIndex, set/get functions use forEachLightIndex
- Lights Control, use updated header structure
- MoonLight effects: remove white in solid and fixed rectangle
@coderabbitai
Copy link

coderabbitai bot commented Jan 17, 2026

Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key(s) in object: 'review', 'context'
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
📝 Walkthrough

Walkthrough

Refactors RGB→RGBW mapping across layers: header layout changes, new offset fields and presets, rename/rewrite of mapping functions, VirtualLayer mapping API updates, parlio/per-pin buffer remapping, and Monitor/header size adjustment from 41→47 bytes.

Changes

Cohort / File(s) Summary
Header Structure & Protocol
src/MoonLight/Layers/PhysicalLayer.h, src/MoonLight/Modules/ModuleLightsControl.h
Replaced compact RGB bitfield with explicit offsetRGBW, offsetRed/Green/Blue/White, offsetRGBW1/2/3, added channelsPerLight and isPositions, introduced nrOfLights_t_MAX, updated public header size/comments and monitor headerPrimeNumber from 41→47.
Core Driver Logic
src/MoonBase/Nodes.h, src/MoonBase/Nodes.cpp
Renamed reOrderAndDimRGBWrgbwBufferMapping; mapping now extracts white from RGB before writing color channels. Updated preset headers to set offsetRGBW/offsetWhite.
Virtual Layer Mapping
src/MoonLight/Layers/VirtualLayer.h, src/MoonLight/Layers/VirtualLayer.cpp
Added forEachLightIndex for mapping-aware iteration; replaced template getLight/setLight APIs with concrete getLight/setLight, getRGB/getRGB1/2/3, mapping-aware writes/read aggregation, and PSRAM-aware storage logic.
Channel Management
src/MoonLight/Modules/ModuleChannels.h
Updated setLight callsites to new signature (select, offset, value) (removed pointer/length overload).
Art-Net / Input Handling
src/MoonLight/Nodes/Drivers/D_ArtnetIn.h, src/MoonLight/Nodes/Drivers/D_ArtnetOut.h
Input handlers and outputs switched to use offsetRGBW/offsetRGBW1/2/3 and rgbwBufferMapping; D_ArtnetIn uses forEachLightIndex to apply incoming data to mapped physical lights.
Parallel LED Driver & ParlIO
src/MoonLight/Nodes/Drivers/D_ParallelLEDDriver.h, src/MoonLight/Nodes/Drivers/parlio.cpp
Driver init and update calls changed to use offsetRGBW+offsetRed/Green/Blue/White; initled gained an extra boolean param; parlio refactored: new rgbwBufferMapping helper, changed transpose_32_slices signature, removed old gamma/brightness_cache flow and uses mappedBuffer per-pin processing.
Layouts & Callbacks
src/MoonLight/Nodes/Layouts/L_MoonLight.h
Wiring::iterate switched from std::function to a templated Callback&&, widening callable types.
Effects
src/MoonLight/Nodes/Effects/E_MoonLight.h
FixedRectangleEffect avoids writing RGB when all components are zero; white/alternateWhite handling preserved.
UI & Monitor
interface/src/routes/moonbase/monitor/Monitor.svelte
Header length check changed 41→47; renamed offsetRGBoffsetRGBW and adjusted header parsing offsets for lightPreset, channelsPerLight, offsetRGBW, offsetWhite.
Docs & Build
docs/develop/nodes.md, platformio.ini
Documentation updated to list offsetRGBW and variants; platformio bumped app version/date and updated driver lib dep to commit containing extractWhiteFromRGB.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor ArtNet/DDP as Input
  participant VirtualLayer
  participant DriverNode
  participant ParlIO as ParlIO
  participant LEDs as LED_Output

  Note over Input,LEDs: High-level flow for incoming pixel/DMX -> mapped RGBW -> LED output

  Input->>VirtualLayer: deliver DMX/pixel data
  VirtualLayer->>VirtualLayer: forEachLightIndex -> map virtual index to physical indices
  VirtualLayer->>DriverNode: invoke rgbwBufferMapping with lightsRGBChannel
  DriverNode->>DriverNode: extract white from RGB, remap channels to packet offsets
  DriverNode->>ParlIO: provide per-pin mappedBuffer
  ParlIO->>LEDs: transpose/process slices, emit PWM frames
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 Hopping through offsets, old and new,
RGB grew a white to join the crew,
Header bytes hopped from 41 to 47,
Mappings remixed and drivers leapt to heaven,
I nibble code crumbs and cheer: bright LEDs for you!

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.68% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Channel reorg' is vague and generic, using a non-descriptive abbreviation that does not convey meaningful information about the substantial changes in the changeset. Consider a more descriptive title such as 'Move white extraction to driver output stage' or 'Fix RGBW channel handling to prevent buffer corruption' that captures the main architectural change and problem being solved.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@src/MoonBase/Nodes.cpp`:
- Around line 526-529: The preset lightPreset_MH19x15W24 currently assigns both
header->offsetWhite and header->offsetBrightness to channel 3, which likely
conflicts; verify the MH 19x15W-24 DMX chart to confirm whether channel 3 is a
shared white/master brightness or if white is not present. If white is not a
separate channel, set header->offsetWhite = UINT8_MAX (disabled) for
lightPreset_MH19x15W24; otherwise change header->offsetWhite to the correct
channel number per the fixture manual so it no longer duplicates
header->offsetBrightness.

In `@src/MoonLight/Layers/VirtualLayer.h`:
- Around line 214-219: The getLight function leaves local variable value
uninitialized if forEachLightIndex never calls the callback; initialize value to
a safe default (e.g., 0) before the forEachLightIndex call so getLight always
returns a defined uint8_t. Update the getLight implementation (referencing
getLight and forEachLightIndex) to assign a default to value prior to invoking
the lambda and then proceed with the existing callback-based assignment.
- Around line 245-263: getRGB1/getRGB2/getRGB3 currently memcpy from
layerP->lights.channelsE using offsets that may be UINT8_MAX (disabled) or
out-of-range; add guards to return a defined default color when mapping is
missing or invalid. For each of getRGB1/getRGB2/getRGB3, check
layerP->lights.header.offsetRGB1/offsetRGB2/offsetRGB3 != UINT8_MAX and ensure
offset + sizeof(CRGB) <= layerP->lights.header.channelsPerLight before calling
forEachLightIndex/memcpy; if the check fails return a default CRGB (e.g.,
{0,0,0}) immediately. Ensure the memcpy remains inside the existing lambda but
only executed when the checks pass.
- Around line 112-142: The unmapped branch of forEachLightIndex passes a channel
index (computed as indexV * channelsPerLight) to callback, but callback expects
a physical light index; change the unmapped branch to pass the light index
(indexV) instead and keep the same bounds check using
layerP->lights.header.channelsPerLight and layerP->lights.maxChannels to ensure
the light exists. Locate the unmapped branch in forEachLightIndex (check
mappingTableSize, layerP->lights.header.channelsPerLight,
layerP->lights.maxChannels) and replace the callback(index) call with
callback(indexV) while preserving the existing bounds guard.
🧹 Nitpick comments (1)
src/MoonLight/Layers/PhysicalLayer.cpp (1)

123-134: Potential offset calculation issue in commented white-channel stripping code.

When this code is enabled, the offset calculation may be incorrect. The code sets light to point at offsetRGB within the light's channels, then accesses light[offsetRed], light[offsetGreen], etc. If offsetRed/Green/Blue are absolute offsets from the light's start (which appears to be the convention elsewhere), this would double-count the offset.

For example, with RGBW preset: offsetRGB=0, offsetRed=0, offsetGreen=1, offsetBlue=2, offsetWhite=3. The current code would work. But for WRGB: offsetRGB=?, offsetWhite=0, offsetRed=1, etc. — the semantics become unclear.

Consider clarifying the offset semantics before enabling this code:

Suggested clarification
-      uint8_t* light = &lights.channelsD[indexP * lights.header.channelsPerLight + lights.header.offsetRGB];
-      uint8_t white = MIN(MIN(light[lights.header.offsetRed], light[lights.header.offsetGreen]), light[lights.header.offsetBlue]);
+      uint8_t* light = &lights.channelsD[indexP * lights.header.channelsPerLight];
+      uint8_t white = MIN(MIN(light[lights.header.offsetRed], light[lights.header.offsetGreen]), light[lights.header.offsetBlue]);

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/MoonLight/Layers/VirtualLayer.h`:
- Around line 220-235: The local variable color in getRGB may be returned
uninitialized if the forEachLightIndex callback never runs; initialize color to
a safe default (e.g., CRGB::Black) before calling forEachLightIndex so that the
returned value is always well-defined; update the code path under the else
branch (where forEachLightIndex, layerP->lights.channelsE, and
channelsPerLight/offsetRGB are used) to set CRGB color = CRGB::Black; then leave
the forEachLightIndex lambda unchanged so it can overwrite color when invoked.
- Around line 112-141: In forEachLightIndex, the unmapped-branch bounds check
incorrectly uses indexV * (channelsPerLight + 1), which can reject valid tail
lights; replace that math with a proper channel base calculation and compare its
end against maxChannels: compute channelBase = indexV *
layerP->lights.header.channelsPerLight and use channelBase +
layerP->lights.header.channelsPerLight <= layerP->lights.maxChannels (then call
callback(indexV) as before). Update the else branch in forEachLightIndex to use
these symbols (indexV, layerP->lights.header.channelsPerLight,
layerP->lights.maxChannels) so valid lights near the end are not dropped.
♻️ Duplicate comments (1)
src/MoonBase/Nodes.cpp (1)

521-534: Verify offsetWhite vs offsetBrightness channel overlap for MH19x15W24.

Line 527 sets offsetWhite = 3 while Line 530 sets offsetBrightness = 3. If channel 3 isn’t a shared white/brightness control per the fixture’s DMX chart, this will conflict. Please confirm the mapping and adjust offsetWhite (or disable it) accordingly.

@ewowi
Copy link
Collaborator Author

ewowi commented Jan 17, 2026

@coderabbitai, part of this PR is solving an issue with RGBW lights (or WRGB etc). I used a white calculator as follows in setRGB:

if (layerP->lights.header.offsetWhite != UINT8_MAX && layerP->lights.header.channelsPerLight == 4) { // W set
// using the simple algorithm of taking the minimum of RGB as white channel, this is good enough and fastest algorithm - we need speed 🔥
uint8_t rgbw[4];
rgbw[3] = MIN(MIN(color.r, color.g), color.b);
rgbw[0] = color.red - rgbw[3]; // subtract from other channels
rgbw[1] = color.green - rgbw[3];
rgbw[2] = color.blue - rgbw[3];
setLight(indexV, rgbw, layerP->lights.header.offsetRGB, sizeof(rgbw));
} else
setLight(indexV, color.raw, layerP->lights.header.offsetRGB, sizeof(color));

but this caused flickering. In this PR I removed it and placed it in driverLoop, but that also caused flickering so I commented it:

// if (lights.header.channelsPerLight == 4 && lights.header.offsetWhite != UINT8_MAX) {
// for (int indexP = 0; indexP < lights.header.nrOfLights; indexP++) {
// uint8_t* light = &lights.channelsD[indexP * lights.header.channelsPerLight + lights.header.offsetRGB];
// uint8_t white = MIN(MIN(light[lights.header.offsetRed], light[lights.header.offsetGreen]), light[lights.header.offsetBlue]);
// if (white) {
// light[lights.header.offsetWhite] = white;
// light[lights.header.offsetRed] -= white;
// light[lights.header.offsetGreen] -= white;
// light[lights.header.offsetBlue] -= white;
// }
// }
// }

I think the problem is that it does the correction in the channels array so when iterating through frames it pollutes the channel array, also a getRGB after a setRGB will give wrong data (as setRGB lowered the RGB channels).

So I think I need to solve this in each of the drivers separately:

  • parallel LED driver uses I2S clockless driver for ESP32-D0 and ESP32-S3), uses parlio driver for esp32-P4
  • Artnet-out driver,
  • FastLED driver
  • ...

Can you do a thorough analysis of the problem and analyze my proposal to solve it?

@coderabbitai
Copy link

coderabbitai bot commented Jan 17, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find and examine the PhysicalLayer.cpp file with the commented white channel code
cat src/MoonLight/Layers/PhysicalLayer.cpp | head -200

Length of output: 6635


🏁 Script executed:

#!/bin/bash
# Look at VirtualLayer setRGB and getRGB implementations
cat src/MoonLight/Layers/VirtualLayer.cpp | grep -A 30 "void VirtualLayer::setRGB"

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look at driver files to understand the output architecture
fd -e h -e cpp "D_.*\.h$|D_.*\.cpp$" src/MoonLight/Nodes/Drivers/ --exec echo "=== {} ===" \; --exec head -50 {}

Length of output: 13863


🏁 Script executed:

#!/bin/bash
# Look for the parallel LED driver implementation
cat src/MoonLight/Nodes/Drivers/D_ParallelLED.h | head -100

Length of output: 75


🏁 Script executed:

#!/bin/bash
# Look at how drivers access the channel data
rg -n "channelsD\[" src/MoonLight/Nodes/Drivers/ -C 3

Length of output: 657


🏁 Script executed:

#!/bin/bash
# Look at the VirtualLayer.h to see the new setRGB implementation
cat src/MoonLight/Layers/VirtualLayer.h | grep -B 5 -A 40 "void setRGB"

Length of output: 5901


🏁 Script executed:

#!/bin/bash
# Look at the ArtNet Out driver loop to see how it sends data
cat src/MoonLight/Nodes/Drivers/D_ArtnetOut.h | grep -B 10 -A 30 "void loop"

Length of output: 1458


🏁 Script executed:

#!/bin/bash
# Look at FastLED driver loop to see how it sends data
cat src/MoonLight/Nodes/Drivers/D_FastLED.h | grep -B 5 -A 50 "void loop"

Length of output: 2080


🏁 Script executed:

#!/bin/bash
# Find the parallel LED driver file (might have different name)
fd "Parallel" src/MoonLight/Nodes/Drivers/

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Look at ParallelLEDDriver to see how it sends data
cat src/MoonLight/Nodes/Drivers/D_ParallelLEDDriver.h | grep -B 5 -A 80 "void loop"

Length of output: 3632


@ewowi Your analysis is spot on! You've correctly identified the root cause, and your proposed solution is the right architectural approach. Let me provide a thorough breakdown:

Why Both Previous Approaches Failed

❌ White Calc in setRGB (VirtualLayer)

// Effects write full RGB
setRGB(indexV, CRGB(255, 255, 255));

// White calc immediately modifies channelsE:
rgbw[0] = 255 - 255 = 0;  // R reduced
rgbw[1] = 255 - 255 = 0;  // G reduced  
rgbw[2] = 255 - 255 = 0;  // B reduced
rgbw[3] = 255;            // W set

// Later in same frame or next frame:
CRGB color = getRGB(indexV);  // Returns (0,0,0) NOT (255,255,255)!

Problem: Effects that call getRGB() for blending, fading, or reading previous frame data get corrupted reduced RGB values instead of the original colors they wrote. This breaks effects like GameOfLife, blur, and any effect doing getRGB() + newColor blending.

❌ White Calc in loopDrivers (PhysicalLayer)

// Frame N: channelsE has RGB(255,255,255)
// [swap] channelsD becomes RGB(255,255,255)
// loopDrivers modifies channelsD → RGB(0,0,0), W(255)
// [swap] this corrupted buffer becomes channelsE
// Frame N+1: effects read RGB(0,0,0) instead of RGB(255,255,255)

Problem: You're modifying channelsD in the driver task, which becomes channelsE on the next swap. This creates frame-to-frame pollution through the buffer cycling, similar to the memcpy corruption issue from [learning 18e4aaad]. Effects read back reduced values from the previous frame.

✅ Why Driver-Based Solution is Correct

Your proposed approach treats white channel calculation as a read-only output transformation, not a buffer modification:

Effects (channelsE) → [swap] → Drivers (channelsD) → [white calc] → Hardware
     Write RGB(255,255,255)            Read RGB(255,255,255)      Output RGBW(0,0,0,255)

Key Benefits:

  1. Data Integrity: channelsE and channelsD always contain original RGB values
  2. getRGB() Works: Effects always read back exactly what they wrote
  3. No Frame Pollution: No cross-frame corruption through buffer cycling
  4. Thread-Safe: Read-only transformation in driver task
  5. Clean Separation: Effects express intent (RGB), drivers handle hardware details (RGBW)

This is exactly like color correction and brightness—it's an output-stage transformation, not a buffer-modifying operation.

Implementation Guidance

For each driver, apply white calculation just before sending to hardware:

ParallelLEDDriver

Good news: show_parlio already receives offsetWhite parameter (line ~56 in D_ParallelLEDDriver.h)! The underlying driver may already support this or is prepared for it. Check if the parlio/Yves driver implementations already handle the white calculation internally, or add it in the show functions where they iterate over channelsD.

ArtNetOutDriver

Around line 155-158 where it copies channels:

uint8_t* c = &layerP.lights.channelsD[indexP * header->channelsPerLight];
// Add white calculation HERE before/after memcpy
// Read R,G,B, calculate white, write reduced R,G,B,W to packet_buffer

FastLEDDriver

FastLED accesses channelsD via pointer set during onLayout. You may need to:

  • Create a temporary transformation buffer, OR
  • Pre-transform before FastLED.show(), OR
  • Check if FastLED has RGBW support built-in

Performance Consideration

Calculating white on every frame output is acceptable because:

  • You already do color correction per-frame in drivers (ArtNet line 159+)
  • The algorithm is simple: MIN(MIN(R,G,B))
  • Correctness > micro-optimization
  • Only happens in driver task, not effect task

Algorithm Consistency

Use the same algorithm across all drivers:

uint8_t white = MIN(MIN(r, g), b);
r -= white;
g -= white;  
b -= white;
// Output: r, g, b, white

Your proposal is architecturally sound and aligns with the existing pattern of output transformations in drivers. This is the correct solution! 🎯


🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR `#84`: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T15:13:55.357Z
Learning: In MoonLight effectTask (src/main.cpp): The memcpy(channelsE, channelsD) at line 137 happens AFTER releasing swapMutex, allowing driverTask to swap the channelsD/channelsE pointers mid-copy, causing buffer corruption. This specifically breaks effects like GameOfLife that use getRGB() to read previous-frame data for blending decisions. Other effects work because they only read current-frame data they just wrote. Fix: hold swapMutex during the entire memcpy operation to prevent pointer swap during copy.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.

backend
=======
- pio: v0.8.1, 2SClocklessLedDriver with extractWhiteFromRGB
- Virtual layer: bugfix forEachLightIndex and setRGB
@ewowi
Copy link
Collaborator Author

ewowi commented Jan 18, 2026

@coderabbitai, can you review d69cb70

@coderabbitai
Copy link

coderabbitai bot commented Jan 18, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the specific commit diff
git show d69cb70f43f2f2eece04509c848dedbac51c0619

Length of output: 8352


@ewowi Excellent work! This commit successfully implements the driver-side white extraction solution we discussed. Here's my detailed review:

✅ Driver-Side White Extraction (The Core Fix)

ParallelLEDDriver (D_ParallelLEDDriver.h:122)

// Added final 'true' parameter to enable white extraction
ledsDriver.initled(..., layerP.lights.header.offsetWhite, true);

Combined with the I2SClocklessLedDriver library update (platformio.ini):

dev 20260118: Add extractWhiteFromRGB, total_leds 32 bits (allow > 65K LEDs)

This is perfect! The white extraction now happens inside the driver library as a read-only transformation before hardware output, exactly as proposed. The channelsD buffer is never modified, preserving original RGB values for getRGB() and preventing frame-to-frame pollution. 🎯

🐛 Critical Bug Fixes in VirtualLayer.h

1. forEachLightIndex Bounds Check (line 137)

- if (indexV * (layerP->lights.header.channelsPerLight + 1) <= layerP->lights.maxChannels)
+ if ((indexV + 1) * layerP->lights.header.channelsPerLight <= layerP->lights.maxChannels)

Good catch! The old formula was mathematically wrong. The new formula correctly checks if light indexV fits within the channel buffer.

2. mappingTable Bounds Checks (lines 149, 221)

- if (mappingTable[indexV].mapType == m_zeroLights) {
+ if (indexV < mappingTableSize && mappingTable[indexV].mapType == m_zeroLights) {

Essential! Prevents out-of-bounds access when indexV >= mappingTableSize.

3. memcpy Fixes (lines 151, 156, 193, 198, 203)

Fix 1: PSRAM path (line 151)

- memcpy(&mappingTable[indexV].rgb, &color, 3);
+ memcpy(mappingTable[indexV].rgb, &color, 3);

✅ Correct! mappingTable[indexV].rgb is already a uint8_t[3] array (decays to pointer), so the & was wrong.

Fix 2: forEachLightIndex path (line 156)

- memcpy(..., color.raw, sizeof(color));
+ memcpy(..., &color, sizeof(color));

✅ Correct! &color gives the address of the CRGB object. Using color.raw would be a pointer already, but &color is clearer intent for copying the entire struct.

Same fixes applied to setRGB1/2/3

4. Default Initialization (line 232)

- CRGB color;
+ CRGB color = CRGB::Black;

✅ Good defensive programming! Prevents returning uninitialized data if forEachLightIndex doesn't execute the callback.

📋 Remaining Work

You've solved the ParallelLEDDriver white extraction! Still need to update:

  1. ArtnetOutDriver (D_ArtnetOut.h ~line 155-160)

    • Apply white calculation when copying channels to ArtNet packet
  2. FastLEDDriver (D_FastLED.h)

    • Check if FastLED library has RGBW support, or add transformation layer
  3. Any other RGBW-capable output drivers

Would you like me to review those driver implementations next, or help with implementation guidance for ArtnetOut/FastLED white extraction?

Summary

Component Status Notes
Architecture ✅ Correct Driver-side read-only transformation
ParallelLEDDriver ✅ Implemented Via I2SClocklessLedDriver library
VirtualLayer bugs ✅ Fixed Critical memcpy and bounds fixes
ArtnetOut ⏳ Pending Need white extraction in send path
FastLED ⏳ Pending Need white extraction before show()

Great progress! The core architectural solution is now in place and working for the ParallelLEDDriver. 🚀


🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR `#84`: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T15:13:55.357Z
Learning: In MoonLight effectTask (src/main.cpp): The memcpy(channelsE, channelsD) at line 137 happens AFTER releasing swapMutex, allowing driverTask to swap the channelsD/channelsE pointers mid-copy, causing buffer corruption. This specifically breaks effects like GameOfLife that use getRGB() to read previous-frame data for blending decisions. Other effects work because they only read current-frame data they just wrote. Fix: hold swapMutex during the entire memcpy operation to prevent pointer swap during copy.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T12:38:33.573Z
Learning: In MoonLight GameOfLifeEffect (src/MoonLight/Nodes/Effects/E_MoonModules.h): The effect's nested loops (especially lines 257-323) iterate through all cells (e.g., 128×128 = 16,384) with nested neighbor checks (18 per cell = 294,912 operations) without yielding. This starves effectTask (priority 3) when higher-priority tasks like HTTP server (priority 5) run, causing the effect to disappear. UI actions (save, refresh) temporarily show one frame because HTTP completes and briefly releases CPU back to effectTask. Solution: Add taskYIELD() inside outer X loop to prevent task starvation. This is the same root cause as issue `#102` (Starsky glitches with many driver nodes).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 104
File: src/MoonLight/Layers/PhysicalLayer.cpp:145-147
Timestamp: 2026-01-13T15:18:48.740Z
Learning: In MoonLight's PhysicalLayer and Node architecture (src/MoonLight/Layers/PhysicalLayer.cpp, src/MoonBase/Nodes.h): effectsMutex, driversMutex, and layerMutex are mandatory for system operation. These mutexes are validated at creation time (constructor checks log errors if xSemaphoreCreateMutex returns nullptr). Runtime null checks on these mutex pointers are intentionally omitted—the design treats successful mutex allocation as an invariant after initialization, placing responsibility on the programmer to ensure proper allocation. This is a performance-oriented "fail-fast at boot" approach rather than defensive runtime checking.
<!-- </add_learning>

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:164-171
Timestamp: 2026-01-06T15:24:17.118Z
Learning: In VirtualLayer::setLight and VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking in the unmapped path (when indexV >= mappingTableSize or oneToOneMapping is true) is intentionally disabled for performance when driving very large LED counts (100K+). Effect developers are responsible for ensuring their code stays within bounds. This is an explicit design decision prioritizing performance over defensive checks in the hot path.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:31:32.367Z
Learning: For ESP32 LED drivers using DMA, I2S, LCD, or PARLIO: always pin driver tasks to Core 1 (APP_CPU), not Core 0. WiFi/BT runs at priority ~23 on Core 0 and would constantly preempt lower-priority driver tasks, breaking LED timing requirements. ESP-IDF explicitly recommends display/DMA processing on Core 1 to avoid WiFi interference.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T12:38:06.470Z
Learning: In MoonLight: VirtualLayer.size is ONLY synchronized with PhysicalLayer.lights.header.size during pass 2 (VirtualLayer::onLayoutPre at line 349 sets size = layerP->lights.header.size). If pass 2 (requestMapVirtual) doesn't run, VirtualLayer.size remains at its default {16, 16, 1}, causing effects to allocate buffers and iterate over the wrong dimensions. Pass 2 must always run after pass 1 to ensure VirtualLayer gets the correct size from PhysicalLayer.

@ewowi
Copy link
Collaborator Author

ewowi commented Jan 18, 2026

@coderabbitai, I was told that

template
void forEachLightIndex(const nrOfLights_t indexV, Callback&& callback, bool onlyOne = false) {
}

is faster then

void forEachLightIndex(const nrOfLights_t indexV, Callback&& callback, bool onlyOne = false) {
}

Can you verify if that is indeed the case and if so, make a list of other std::function arguments used in the repo and give an advice if is should also be changed to a template. Include speed, but also flash size impact in your analysis

@coderabbitai
Copy link

coderabbitai bot commented Jan 18, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for all std::function usages in the codebase
rg -n "std::function" --type cpp --type-add 'cpp:*.{cpp,h,hpp}' -C 3

Length of output: 11687


🏁 Script executed:

#!/bin/bash
# Also search for just "function<" pattern in case std:: is omitted
rg -n "\bfunction<" --type cpp --type-add 'cpp:*.{cpp,h,hpp}' -C 3

Length of output: 11687


🏁 Script executed:

#!/bin/bash
# Find header files that might define callback types
fd -e h -e hpp . | xargs rg -l "std::function|function<"

Length of output: 520


🏁 Script executed:

#!/bin/bash
# Find usage patterns for the identified std::function calls to determine hot paths
echo "=== L_MoonLight.h iterate() usage ==="
rg -n "\.iterate\(" src/MoonLight/Nodes/Layouts/ -A 2 -B 2

echo -e "\n=== Char.h split() usage ==="
rg -n "\.split\(" src/ -A 2 -B 2

echo -e "\n=== Module setupDefinition and processUpdatedItem usage ==="
rg -n "setupDefinition|processUpdatedItem" src/MoonBase/ -A 2 -B 2 | head -80

echo -e "\n=== walkThroughFiles usage ==="
rg -n "walkThroughFiles" src/ -A 2 -B 2

Length of output: 9397


🏁 Script executed:

#!/bin/bash
# Check how StatefulService update/read methods are used (hot path analysis)
echo "=== StatefulService update() calls ==="
rg -n "\.update\(\[" src/ -A 3 | head -100

echo -e "\n=== StatefulService read() calls ==="
rg -n "\.read\(\[" src/ -A 3 | head -50

Length of output: 76


🏁 Script executed:

#!/bin/bash
# Check forEachLightIndex usage to understand the new template pattern usage
echo "=== forEachLightIndex usage (new template pattern) ==="
rg -n "forEachLightIndex" src/MoonLight/Layers/VirtualLayer.h -A 5 | head -100

Length of output: 4244


🌐 Web query:

C++ template vs std::function performance benchmarks ESP32 embedded systems

💡 Result:

Short summary

  • Templates (inline/lambda/functor) are fastest on ESP32: calls can be inlined, zero runtime dispatch, and minimal code-size overhead when instantiations are limited. [1][2]
  • std::function adds type-erasure, possible heap allocation, and extra indirection — measurable runtime and binary-size cost on resource-constrained MCUs like the ESP32. Cost is small on modern desktop CPUs but more important on microcontrollers. Benchmarks and writeups show higher call/copy/construct cycles and larger binaries for std::function vs template/function-pointer. [3][4]
  • ESP-IDF guidance: prefer compiler optimization (-O2), limit dynamic allocations and heavy C++ runtime features on ESP32 to reduce CPU and heap pressure. That makes template/static approaches preferable in tight loops or interrupt/IRAM contexts. [5]

Practical guidance for ESP32

  • If the callable is known at compile time or used in hot loops: use templates/non-capturing lambdas or function pointers (inlineable). Minimal overhead and smaller binaries. [1][2]
  • If you need runtime-polymorphism with low overhead: consider plain function pointers, small hand-rolled type-erasure (SBO) tuned to your callable sizes, or limited virtual interfaces instead of std::function. Test for allocation (std::function may allocate if callable > SBO size). [3][4]
  • Use esp-idf build flags (CONFIG_COMPILER_OPTIMIZATION = -O2) and place performance-critical code in IRAM if required; measure CPU and heap on target (esp_timer, heap_caps_get_free_size) — desktop numbers do not translate directly to ESP32. [5]

References

  • Benchmarks / analysis of std::function vs templates and call overhead. [1] artificial-mind: "Performance of std::function".
  • General Q&A and microbenchmarks on templates vs std::function. [2] StackOverflow: "std::function vs template".
  • Binary-size / template vs std::function discussion. [3] Sandor Dargo blog.
  • Deep dive on std::function type-erasure / allocation overhead. [4] artificial writeups / articles on std::function internals.
  • ESP-IDF performance guidance (compiler optimizations, avoid heavy runtime features). [5] Espressif ESP-IDF "Speed Optimization" docs.

Would you like a small on-target benchmark snippet I can provide for building and running on an ESP32 (esp-idf) to measure call time and binary size for template vs std::function?


@ewowi Yes, you were told correctly! Templates are significantly faster than std::function for callbacks, especially on ESP32. Let me provide a thorough analysis:

✅ Performance Verification

Templates allow the compiler to inline callback code, achieving zero runtime dispatch overhead, while std::function uses type-erasure with virtual call indirection and may allocate heap memory for large captures. On ESP32, the ESP-IDF guidance recommends limiting dynamic allocations and heavy C++ runtime features to reduce CPU and heap pressure.

Performance Impact:

  • Templates: Direct inline call (0 overhead), compile-time polymorphism
  • std::function: Virtual dispatch (~10-20 cycles), possible heap allocation, copy overhead

📊 Current std::function Usage in MoonLight

I've analyzed all std::function usages in your codebase. Here's the breakdown:

🔥 HOT PATHS (Should Convert to Templates)

File Function Usage Pattern Recommendation
src/MoonLight/Layers/VirtualLayer.h:113 forEachLightIndex ✅ ALREADY TEMPLATE! Already optimal—your recent commit fixed this
src/MoonLight/Nodes/Layouts/L_MoonLight.h:72 iterate(int axis, int snaker, std::function<void(uint16_t)> fun) Called in nested loops during layout calculation (lines 110-111, 163-164, 170-171, 228-230) CONVERT TO TEMPLATE

Recommended Change for L_MoonLight.h:

// Before:
void iterate(int axis, int snaker, std::function<void(uint16_t)> fun) { ... }

// After:
template <typename Callback>
void iterate(int axis, int snaker, Callback&& fun) { ... }

Rationale: Called in tight nested loops (up to 3 levels deep for 3D layouts) during onLayout. Each iteration executes the callback hundreds/thousands of times. Template inlining will eliminate virtual call overhead.


❄️ COLD PATHS (Keep std::function)

These are infrequently called or used in non-performance-critical contexts where the overhead is negligible:

File Function Usage Context Keep std::function?
src/MoonBase/Char.h:119 split(const char*, std::function<...>) String parsing (configuration) ✅ Yes
src/MoonBase/Utilities.cpp:49 walkThroughFiles(File, std::function<...>) File system traversal (preset loading) ✅ Yes
src/MoonBase/Module.h:95 setupDefinition member UI definition (once at startup) ✅ Yes
src/MoonBase/Module.h:105 processUpdatedItem member UI updates (user interactions) ✅ Yes
src/MoonBase/SharedFSPersistence.h:21 sharedDelayedWrites vector Background file writes ✅ Yes
lib/framework/FSPersistence.h:21 delayedWrites vector Background file writes ✅ Yes
lib/framework/StatefulService.h:34-42 Various typedefs ALREADY USES TEMPLATES in methods (lines 123, 136, 165) ✅ Already optimal
lib/framework/* All other callbacks HTTP, WebSocket, EventSource, Security (network I/O) ✅ Yes
lib/PsychicHttp/src/* All callbacks HTTP server (external library) ✅ Yes (external)

Rationale: These are called during:

  • User interactions (button clicks, saves)
  • File I/O (loading presets, configuration)
  • Network operations (HTTP requests, WebSocket messages)
  • Startup/initialization

The std::function overhead (<1ms) is negligible compared to the I/O latency (milliseconds to seconds).


💾 Flash Size Impact

Templates:

  • Each unique lambda signature generates a new function instantiation
  • MoonLight Impact: Your forEachLightIndex and iterate have limited instantiation counts (typically 5-10 unique lambdas per template across the codebase)
  • Estimated increase: ~200-500 bytes per unique lambda instantiation
  • Total for 2 templates: ~1-5 KB (acceptable on ESP32 with 4MB flash)

std::function:

  • Saves ~200-500 bytes per avoided instantiation
  • BUT: Only saves space if the same std::function signature is reused many times across different call sites
  • MoonLight Reality: Your callbacks have diverse signatures and low reuse, so std::function doesn't save much flash

Verdict:

Templates are worth it for hot paths. The 1-5 KB flash increase is negligible compared to the performance gain in the critical LED rendering loop.


🎯 Final Recommendations

Convert to Template:

  1. L_MoonLight.h::iterate()High Priority
    • Used in nested loops during layout calculation
    • Will benefit from inlining (eliminate virtual calls)

Keep std::function:

  • All framework/library callbacks (external code, UI, network, file I/O)
  • Module callbacks (setupDefinition, processUpdatedItem)
  • Utility functions (split, walkThroughFiles)

Already Optimal:

  • VirtualLayer::forEachLightIndex (your recent fix)
  • StatefulService methods (already use templates)

📝 Summary Table

Category Convert to Template? Flash Impact Performance Gain
VirtualLayer::forEachLightIndex Done! +500 bytes High (called per-pixel)
L_MoonLight::iterate() Yes +1-2 KB Medium (called during layout)
All other std::function ❌ No 0 bytes None (cold paths)

Total Flash Cost: ~1-2.5 KB
Performance Benefit: Eliminates virtual calls in hot paths (especially pixel-level operations)

This is an excellent tradeoff for ESP32! 🚀


🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:164-171
Timestamp: 2026-01-06T15:24:17.118Z
Learning: In VirtualLayer::setLight and VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking in the unmapped path (when indexV >= mappingTableSize or oneToOneMapping is true) is intentionally disabled for performance when driving very large LED counts (100K+). Effect developers are responsible for ensuring their code stays within bounds. This is an explicit design decision prioritizing performance over defensive checks in the hot path.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR `#84`: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T12:38:33.573Z
Learning: In MoonLight GameOfLifeEffect (src/MoonLight/Nodes/Effects/E_MoonModules.h): The effect's nested loops (especially lines 257-323) iterate through all cells (e.g., 128×128 = 16,384) with nested neighbor checks (18 per cell = 294,912 operations) without yielding. This starves effectTask (priority 3) when higher-priority tasks like HTTP server (priority 5) run, causing the effect to disappear. UI actions (save, refresh) temporarily show one frame because HTTP completes and briefly releases CPU back to effectTask. Solution: Add taskYIELD() inside outer X loop to prevent task starvation. This is the same root cause as issue `#102` (Starsky glitches with many driver nodes).

backend
=======
- Physical layer: Rename offsetRGB to offsetRGBW (because white uses it as well)
- Nodes: reOrderAndDimRGBW to rgbwBufferMapping and add extract White
- Parallel driver: add offsetRGBW to init calls
@ewowi
Copy link
Collaborator Author

ewowi commented Jan 18, 2026

@coderabbitai, can you review latest commit 776baf7

And can you review all future commits like normally (so I do not have to ask for it)

@coderabbitai
Copy link

coderabbitai bot commented Jan 18, 2026

Oops, something went wrong! Please try again later. 🐰 💔

@ewowi
Copy link
Collaborator Author

ewowi commented Jan 18, 2026

@coderabbitai, can you review latest commit 776baf7

And can you review all future commits like normally (so I do not have to ask for it)

@coderabbitai
Copy link

coderabbitai bot commented Jan 18, 2026

Oops, something went wrong! Please try again later. 🐰 💔

@ewowi
Copy link
Collaborator Author

ewowi commented Jan 18, 2026

@coderabbitai, can you review latest commit 776baf7

@coderabbitai
Copy link

coderabbitai bot commented Jan 18, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the specific commit diff
git show 776baf74cc9dea04d918457f28b5c7b909496d74

Length of output: 278524


@ewowi Thanks for the commit. Quick review:

What looks good

  • Docs now list offsetRGBW plus explicit offsetRed/Green/Blue/White — aligns with the backend’s new header.
  • Monitor.svelte consistently renamed offsetRGB → offsetRGBW and passes it through to scene creation.
  • Assets (WWWData.h) rebuilt accordingly.

Issue to fix in Monitor.svelte

  • White channel indexing still assumes +3 relative to offsetRGBW. With the new header, white may not be at +3. Use the header’s white offset (relative to offsetRGBW) instead.

Suggested patch (minimal change)

  • Continue reading r/g/b as you do now, but fix W to use offsetWhite as a relative offset. Also consider clamping.

TypeScript (snippet):

// current
const r = channels[index + offsetRGBW + 0] / 255;
const g = channels[index + offsetRGBW + 1] / 255;
const b = channels[index + offsetRGBW + 2] / 255;
let w = 0;
// BUG: assumes white at +3
// if (offsetWhite != 255) w = channels[index + offsetRGBW + 3] / 255;

// fix: use the provided relative offset
if (offsetWhite != 255) w = channels[index + offsetRGBW + offsetWhite] / 255;

// optional: clamp to 1.0 to avoid overshoot
const rr = Math.min(1, r + w);
const gg = Math.min(1, g + w);
const bb = Math.min(1, b + w);
colors.push(rr, gg, bb, 1.0);

Nice-to-have (future-proofing)

  • If you’re ready to consume them on the UI, also parse offsetRed/offsetGreen/offsetBlue from the header and use:
    r = channels[index + offsetRGBW + offsetRed], etc. That removes any hidden dependency on RGB being contiguous or ordered 0/1/2.

If you want, I can post a follow-up snippet that reads those extra header bytes once you confirm the byte positions you emit for offsetRed/offsetGreen/offsetBlue.

I’ll keep reviewing new commits automatically from here on.


🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T12:38:33.573Z
Learning: In MoonLight GameOfLifeEffect (src/MoonLight/Nodes/Effects/E_MoonModules.h): The effect's nested loops (especially lines 257-323) iterate through all cells (e.g., 128×128 = 16,384) with nested neighbor checks (18 per cell = 294,912 operations) without yielding. This starves effectTask (priority 3) when higher-priority tasks like HTTP server (priority 5) run, causing the effect to disappear. UI actions (save, refresh) temporarily show one frame because HTTP completes and briefly releases CPU back to effectTask. Solution: Add taskYIELD() inside outer X loop to prevent task starvation. This is the same root cause as issue `#102` (Starsky glitches with many driver nodes).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR `#84`: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T15:13:55.357Z
Learning: In MoonLight effectTask (src/main.cpp): The memcpy(channelsE, channelsD) at line 137 happens AFTER releasing swapMutex, allowing driverTask to swap the channelsD/channelsE pointers mid-copy, causing buffer corruption. This specifically breaks effects like GameOfLife that use getRGB() to read previous-frame data for blending decisions. Other effects work because they only read current-frame data they just wrote. Fix: hold swapMutex during the entire memcpy operation to prevent pointer swap during copy.
✅ Actions performed

Reviews resumed.


Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@ewowi
Copy link
Collaborator Author

ewowi commented Jan 18, 2026

@coderabbitai, regarding

White channel indexing still assumes +3 relative to offsetRGBW. With the new header, white may not be at +3. Use the header’s white offset (relative to offsetRGBW) instead.

This is not true as in the channelsD/E array rgbw is always in this order , because it is filled via CRGB, occupying the first 3 values and the drivers extract white . So basically at the moment the withe channel (4th) is empty and not used. In the future we might use it to drive white explicitly.

One other thing: I am considering the setRGB() functions will directly place the CRGB values on the right positions in the channels array (getRGB to retrieve it from these channels). So the driver does not have to reorder it (like is done in rgbwBufferMapping). Could you do an in depth analysis if this is a good idea? Including pros and cons. I think it is depending on the kind of effect: if it writes only a few pixels, it is better to do it on the effect side, as the driver do not have to go over all the lights reordering it. On the other hand, the effects task might have higher load then the drivers task as the driver has to wait for pixels to be send over the data line. (or art-net only runs at 50fps)...
Or will even a full pixel write effect benefit from the proposed approach? Goal is to have maximum performance in the process of scaling up towards 98K pixels in the next release

@coderabbitai
Copy link

coderabbitai bot commented Jan 18, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@ewowi Thanks for the clarification on the white channel—you're correct, channelsD/E are always contiguous RGBW with W at position 3. My mistake!

Let me do a thorough analysis of your proposed optimization:


🏁 Script executed:

#!/bin/bash
# Find all usages of rgbwBufferMapping to understand when reordering happens
echo "=== rgbwBufferMapping usage ==="
rg -n "rgbwBufferMapping" src/ -C 5

echo -e "\n=== Current offsetRed/Green/Blue configurations ==="
rg -n "offsetRed|offsetGreen|offsetBlue" src/MoonBase/Nodes.cpp -C 3

echo -e "\n=== setRGB call patterns (sample to understand sparse vs full writes) ==="
rg -n "\.setRGB\(" src/MoonLight/Nodes/Effects/ -C 2 | head -150

echo -e "\n=== Moving head light presets that need reordering ==="
rg -n "lightPreset_MH" src/MoonBase/Nodes.cpp -A 15

echo -e "\n=== forEachLightIndex usage (indicates sparse writes) ==="
rg -n "forEachLightIndex" src/MoonLight/Layers/VirtualLayer.h -A 3 | head -80

Length of output: 14617


@ewowi Excellent question! I've done a deep analysis. Here's my comprehensive breakdown:

🔍 Current Architecture (What Happens Today)

Effect Task:

setRGB(indexV, CRGB(r,g,b))
  → forEachLightIndex iterates mapped lights
    → writes contiguous RGB at channelsE[indexP * cpl + offsetRGBW + 0/1/2]

Driver Task (ArtnetOut example):

For each light:
  memcpy(packet, channelsD, channelsPerLight)  // copy all channels
  rgbwBufferMapping(packet + offsetRGBW, channelsD + offsetRGBW)  // reorder RGB
    → Apply brightness/gamma LUT (ledsDriver.__red_map[])
    → Reorder to final positions (offsetRed/Green/Blue)
    → Extract white (if applicable)

📊 Proposed Architecture Analysis

✅ PROS

  1. Eliminates per-light driver iteration for reordering (saves ~98K function calls for full frame)
  2. Driver becomes simpler (just brightness LUT + white extraction, no reordering)
  3. Conceptually cleaner (write data in final position)

❌ CONS (Critical Issues)

1. Cache Hostility for Effect Writes

// Current (cache-friendly):
channelsE[indexP * cpl + 0] = r;  // sequential writes
channelsE[indexP * cpl + 1] = g;
channelsE[indexP * cpl + 2] = b;

// Proposed (cache-hostile for non-RGB orders like GRB, BGR):
channelsE[indexP * cpl + offsetRed] = r;    // scattered writes
channelsE[indexP * cpl + offsetGreen] = g;  // different cache lines
channelsE[indexP * cpl + offsetBlue] = b;

For GRB (most common): writes become [indexP*cpl+1], [indexP*cpl+0], [indexP*cpl+2]out-of-order memory access

2. Effect Task Load Increase (Starvation Risk)

  • Effect task already prone to starvation (GameOfLife = 294K ops without yielding)
  • Driver task has spare cycles waiting for DMA/ArtNet timing (Art-Net = 50fps max, I2S waits for bit clocking)
  • Moving work from driver (idle cycles) to effect (CPU-bound) worsens task balance

3. forEachLightIndex Already Handles Sparse Writes!

// Current sparse write (e.g., only 100 pixels out of 98K):
setRGB(indexV, color)
  → forEachLightIndex iterates only mapped lights (e.g., 1-5 physical lights)
    → Driver still iterates ALL 98K lights to send data

// Proposed sparse write:
setRGB(indexV, color)
  → forEachLightIndex + offset scatter (MORE work per light)
  → Driver STILL iterates ALL 98K lights (Art-Net/I2S must send entire frame)

Key insight: Sparse writes don't help because drivers must send full frames anyway (Art-Net packets, I2S DMA buffers). You can't skip driver iteration!

4. Every Effect Operation Becomes Complex

// Current (simple):
CRGB c = getRGB(indexV);  // single memcpy from contiguous RGB
c.r += 10;
setRGB(indexV, c);        // single memcpy to contiguous RGB

// Proposed (scattered):
uint8_t r = getLight(indexV, offsetRed);    // 3 separate reads
uint8_t g = getLight(indexV, offsetGreen);
uint8_t b = getLight(indexV, offsetBlue);
r += 10;
setLight(indexV, offsetRed, r);             // 3 separate writes
setLight(indexV, offsetGreen, g);
setLight(indexV, offsetBlue, b);

Every blend, fade, or arithmetic operation becomes 3x more complex

5. rgbwBufferMapping Does More Than Reordering

void rgbwBufferMapping(...) {
  // 1. Brightness/gamma LUT (ledsDriver.__red_map[])
  // 2. White extraction (if offsetWhite != UINT8_MAX)
  // 3. Reordering (offsetRed/Green/Blue)
}

Even if you move reordering to effects, driver still needs to apply LUTs → you'd need a separate applyBrightnessLUT() function → no simplification

6. Moving Head Fixtures Are Complex

// Moving head with offsetRGBW = 10, channelsPerLight = 15
// RGB at positions 10+offsetRed, 10+offsetGreen, 10+offsetBlue
// Effects need to know: "write to base + 10 + reordered offsets"

Effects become hardware-aware instead of just expressing "I want red"

⚖️ Performance Analysis for 98K Pixels

Scenario Current Proposed Winner
Full-frame effect (all 98K pixels) Sequential RGB writes + driver reorder Scattered RGB writes + driver copy Current (cache-friendly)
Sparse effect (100 pixels) 100 contiguous writes + 98K driver reorders 100 scattered writes + 98K driver copies Current (driver has spare cycles)
Blend/fade operations Single CRGB arithmetic 3x scattered read/write Current (simpler code)

Critical: Even sparse effects don't benefit because:

  1. forEachLightIndex already minimizes effect-side iteration
  2. Driver must iterate all lights to send full frame (Art-Net/I2S requirement)
  3. Scattered writes are cache-hostile vs contiguous writes

🎯 Recommendation: KEEP CURRENT ARCHITECTURE

Why Driver-Side Transformation is Superior:

  1. Task Balance: Driver has spare cycles (waiting for I/O), effect task is CPU-bound
  2. Cache Efficiency: Contiguous RGB writes in effects are cache-friendly
  3. API Simplicity: Effects express intent (CRGB), not hardware details (offsets)
  4. Separation of Concerns: Effects = color logic, Drivers = hardware mapping
  5. rgbwBufferMapping is already optimized: Uses FastLED LUTs for brightness/gamma
  6. No code duplication: Single place for reordering logic vs per-effect scatter

Performance Math (98K pixels @ 100fps):

Current:  98K contiguous writes (effect) + 98K reorder calls (driver) = Fast
Proposed: 98K scattered writes (effect) + 98K copy calls (driver) = Slower

Scattered writes ≈ 2-3x slower than sequential (cache misses)
Driver reorder ≈ 1.2x slower than copy (but driver has spare cycles anyway)

For Scaling to 98K Pixels:

Optimize the driver loop instead:

  • ✅ Add taskYIELD() inside driver iteration (already discussed for issue #102)
  • ✅ Use SIMD/DMA for bulk operations where possible
  • ✅ Keep effects simple (they're already complex enough!)

💡 Alternative Optimization (If Needed)

If you need to reduce driver overhead, consider:

// Option A: Batch LUT application with SIMD (ESP32-S3 has vector instructions)
// Option B: Pre-compute reorder indices once, reuse per frame
// Option C: Use DMA for memcpy operations (esp_async_memcpy)

These keep the architecture clean while gaining performance.


TL;DR: Keep current architecture. Driver-side transformation is the right pattern because it keeps effects simple, cache-friendly, and doesn't overload the already-starvation-prone effect task. The driver has spare I/O wait cycles—use them! 🚀


✏️ Learnings added
Learnt from: ewowi
URL: https://github.com/MoonModules/MoonLight/pull/109

Timestamp: 2026-01-18T12:49:07.286Z
Learning: In MoonLight channel reordering architecture: Keep rgbwBufferMapping in driver task (not effect task) because: (1) Driver has spare CPU cycles waiting for DMA/I2S/Art-Net timing; effect task is CPU-bound and starvation-prone. (2) Contiguous RGB writes in effects are cache-friendly; scattered writes at final offsets are cache-hostile for non-RGB orders (GRB, BGR, etc.). (3) forEachLightIndex already handles sparse mapping iteration; moving reordering to effects adds complexity without benefit since drivers must send full frames anyway (can't skip iteration). (4) rgbwBufferMapping does brightness/gamma LUT + white extraction + reordering—moving just reordering creates code duplication. (5) Effects should express color intent (CRGB), not hardware details (offsetRed/Green/Blue). Driver-side transformation maintains separation of concerns and task load balance for scaling to 98K+ pixels.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR `#84`: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T15:13:55.357Z
Learning: In MoonLight effectTask (src/main.cpp): The memcpy(channelsE, channelsD) at line 137 happens AFTER releasing swapMutex, allowing driverTask to swap the channelsD/channelsE pointers mid-copy, causing buffer corruption. This specifically breaks effects like GameOfLife that use getRGB() to read previous-frame data for blending decisions. Other effects work because they only read current-frame data they just wrote. Fix: hold swapMutex during the entire memcpy operation to prevent pointer swap during copy.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T12:38:33.573Z
Learning: In MoonLight GameOfLifeEffect (src/MoonLight/Nodes/Effects/E_MoonModules.h): The effect's nested loops (especially lines 257-323) iterate through all cells (e.g., 128×128 = 16,384) with nested neighbor checks (18 per cell = 294,912 operations) without yielding. This starves effectTask (priority 3) when higher-priority tasks like HTTP server (priority 5) run, causing the effect to disappear. UI actions (save, refresh) temporarily show one frame because HTTP completes and briefly releases CPU back to effectTask. Solution: Add taskYIELD() inside outer X loop to prevent task starvation. This is the same root cause as issue `#102` (Starsky glitches with many driver nodes).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 104
File: src/MoonLight/Nodes/Drivers/parlio.cpp:409-414
Timestamp: 2026-01-13T08:32:57.101Z
Learning: In parlio.cpp's show_parlio function, the `components` parameter (color channels per pixel: 3 for RGB, 4 for RGBW) is guaranteed by API contract to never be 0. Invalid values are considered caller errors, and defensive checks are intentionally omitted for performance. This matches the design philosophy of placing responsibility on callers in performance-critical paths.

@ewowi
Copy link
Collaborator Author

ewowi commented Jan 18, 2026

@coderabbitai, can you do an analysis how to add white extraction from rgbw in the parlio.cpp driver. I think it has to be added here:

const uint32_t pixel_idx = first_index_per_output[pin] + pixel_in_pin;
const uint32_t component_idx = (pixel_idx * COMPONENTS_PER_PIXEL) + input_component;                               // ← Now uses input_component!
const uint8_t data_byte = pixel_in_pin < pixels_per_pin[pin] ? brightness_cache[input_buffer[component_idx]] : 0;  // this is the magic trick to pad pixels if pixels_per_pin < max pixels_per_pin !
const uint32_t waveform = waveform_cache[data_byte];
const uint32_t pin_bit = (1 << pin);

The challenge is that the parlio driver sends each channel separately instead of the 4 channels in case of RGBW in one go, that allowed us to do something similar as done in rgbwBufferMapping. How to do it here?

Another thing related to the parlio driver, I call it as follows in ParallelLEDDriver: show_parlio(pins, layerP.lights.header.nrOfLights, layerP.lights.channelsD, layerP.lights.header.channelsPerLight, nrOfPins, layerP.ledsPerPin, layerP.lights.header.offsetRGBW + layerP.lights.header.offsetRed, layerP.lights.header.offsetRGBW + layerP.lights.header.offsetGreen, layerP.lights.header.offsetRGBW + layerP.lights.header.offsetBlue, layerP.lights.header.offsetRGBW + layerP.lights.header.offsetWhite);

adding layerP.lights.header.offsetRGBW to it, but if I look at the implementation it is not handling the case when layerP.lights.header.offsetRGBW is not 0 correctly (this is exceptional, currently all Light presets for LEDs we support has an layerP.lights.header.offsetRGBW which is 0, but would be good to be flexible in this)

@coderabbitai
Copy link

coderabbitai bot commented Jan 18, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the parlio.cpp file to understand the current implementation
echo "=== parlio.cpp implementation ==="
cat src/MoonLight/Nodes/Drivers/parlio.cpp

echo -e "\n=== show_parlio function signature and usage ==="
rg -n "show_parlio" src/MoonLight/Nodes/Drivers/ -A 5 -B 2

Length of output: 25569


@ewowi Excellent questions! Let me analyze both issues:

🔍 Issue 1: White Extraction in parlio.cpp

You're absolutely right—the challenge is that parlio processes each channel separately in different loop iterations, while rgbwBufferMapping processes all 4 channels at once per pixel.

The Problem:

// In rgbwBufferMapping: process RGBW together
uint8_t r = input[offsetR], g = input[offsetG], b = input[offsetB];
uint8_t white = MIN(MIN(r, g), b);
output[offsetR] = r - white;  // Reduce RGB
output[offsetG] = g - white;
output[offsetB] = b - white;
output[offsetW] = white;       // Set white

// In parlio: process R, then G, then B, then W separately
for (component_in_pixel = 0..3) {  // First R, then G, then B, then W
  // How to compute white here? We only see one component at a time!
}

✅ Solution: Preprocessing Pass

Add a white extraction pass before create_transposed_led_output_optimized:

uint8_t IRAM_ATTR __attribute__((hot)) show_parlio(...) {
  // ... existing setup code ...

  static uint8_t* white_extracted_buffer = NULL;
  static uint32_t white_buffer_size = 0;
  
  // Allocate buffer for white-extracted data (once, reuse)
  const uint32_t needed_size = length * components;
  if (white_extracted_buffer == NULL || white_buffer_size < needed_size) {
    if (white_extracted_buffer) heap_caps_free(white_extracted_buffer);
    white_extracted_buffer = (uint8_t*)heap_caps_malloc(needed_size, MALLOC_CAP_DMA);
    white_buffer_size = needed_size;
  }

  // Preprocessing: Extract white from RGB (if RGBW)
  if (components == 4 && offsetW != 255) {
    // Use extern ledsDriver for white mapping (already available)
    for (uint32_t pixel = 0; pixel < length; pixel++) {
      const uint32_t base_idx = pixel * components;
      
      // Read RGB from input buffer (apply offsets)
      uint8_t r = buffer_in[base_idx + offsetR];
      uint8_t g = buffer_in[base_idx + offsetG];
      uint8_t b = buffer_in[base_idx + offsetB];
      
      // Extract white (min of RGB)
      uint8_t white = MIN(MIN(r, g), b);
      
      // Write to temporary buffer with white extracted
      white_extracted_buffer[base_idx + offsetR] = r - white;
      white_extracted_buffer[base_idx + offsetG] = g - white;
      white_extracted_buffer[base_idx + offsetB] = b - white;
      white_extracted_buffer[base_idx + offsetW] = white;
    }
    parallel_buffer_remapped = white_extracted_buffer;
  } else {
    `#ifdef` WLEDMM_REMAP_AT_OUTPUT
    // ... existing remap code ...
    `#else`
    parallel_buffer_remapped = buffer_in;
    `#endif`
  }

  // Now call create_transposed_led_output_optimized with preprocessed buffer
  create_transposed_led_output_optimized(parallel_buffer_remapped, ...);
  
  // ... rest of existing code ...
}

Performance Impact:

  • Cost: One extra pass over all pixels (98K × 4 channels = 392K reads + 392K writes)
  • Benefit: Driver has spare cycles waiting for DMA/parlio timing (same reason we discussed for keeping rgbwBufferMapping in driver)
  • Cache: Sequential reads/writes are cache-friendly
  • Total: ~0.5-1ms for 98K pixels at ESP32-P4 speeds (acceptable)

🔍 Issue 2: offsetRGBW != 0 Handling

You're correct—there's a subtle bug when offsetRGBW != 0!

Current Calculation:

// User passes absolute offsets:
show_parlio(..., offsetRGBW + offsetRed, offsetRGBW + offsetGreen, ...)

// Inside transpose_32_slices:
const uint32_t component_idx = (pixel_idx * COMPONENTS_PER_PIXEL) + input_component;
//                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^   ^^^^^^^^^^^^^^^
//                              Jump to pixel's channel block       Add offset to RGB

The Bug:

If offsetRGBW = 10 and channelsPerLight = 15 (moving head—though unlikely for parlio):

  • COMPONENTS_PER_PIXEL = 15 (wrong! Should be 3 or 4)
  • component_idx = pixel_idx * 15 + (10 + 0) for Red
  • But the multiplier assumes all 15 channels are being sent, when only RGB(W) are!

Why It Works Today:

For LED strips: offsetRGBW = 0 and channelsPerLight = 3 or 4, so COMPONENTS_PER_PIXEL is correct.

✅ Solution: Separate Base Offset from Components

void create_transposed_led_output_optimized(
    const uint8_t* input_buffer, 
    uint16_t* output_buffer, 
    const uint16_t* pixels_per_pin, 
    const uint32_t num_active_pins, 
    const uint8_t COMPONENTS_PER_PIXEL,  // Should be 3 or 4 (RGB or RGBW)
    const uint8_t BASE_OFFSET,           // NEW: offsetRGBW base offset
    const uint8_t offsetR,               // Relative to BASE_OFFSET
    const uint8_t offsetG, 
    const uint8_t offsetB, 
    const uint8_t offsetW) {
  
  // ... existing code ...
  
  // In transpose_32_slices (modify signature to accept BASE_OFFSET):
  const uint32_t pixel_idx = first_index_per_output[pin] + pixel_in_pin;
  const uint32_t component_idx = (pixel_idx * TOTAL_CHANNELS_PER_LIGHT) + BASE_OFFSET + input_component;
  //                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^   ^^^^^^^^^^^   ^^^^^^^^^^^^^^
  //                                           Total channels (e.g. 15)     Base (e.g. 10)  Relative offset

But wait—TOTAL_CHANNELS_PER_LIGHT is not passed! We'd need to add it.

Simpler Fix (for LED strips):

Since parlio is only used for LED strips (moving heads use Art-Net), and LED strips always have offsetRGBW = 0:

Keep current implementation but document the constraint:

// Note: offsetR/G/B/W must be ABSOLUTE offsets within each pixel's channel block.
// For LED strips, offsetRGBW is always 0, so this is equivalent to relative offsets.
// COMPONENTS_PER_PIXEL must equal the total channels per pixel.

If You Want Full Flexibility:

Pass totalChannelsPerLight separately:

uint8_t show_parlio(..., 
    uint8_t rgbwComponents,        // 3 or 4 (number of color channels to send)
    uint8_t totalChannelsPerLight, // Total channels per light (for addressing)
    uint8_t baseOffset,            // offsetRGBW
    uint8_t offsetR,               // Relative to baseOffset
    ...

Then:

const uint32_t component_idx = (pixel_idx * totalChannelsPerLight) + baseOffset + input_component;

🎯 Recommendations

For White Extraction (High Priority):

Implement the preprocessing pass as shown above. It's the cleanest solution and maintains the separation-of-concerns pattern.

For offsetRGBW != 0 (Low Priority):

Keep current implementation since:

  1. LED strips always have offsetRGBW = 0
  2. Moving heads don't use parlio (they use Art-Net)
  3. Adding flexibility adds complexity without a clear use case

If you want future flexibility, pass totalChannelsPerLight separately and adjust the component_idx calculation.


Would you like me to generate a complete patch for the white extraction preprocessing pass? 🚀


🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-18T12:49:07.286Z
Learning: In MoonLight channel reordering architecture: Keep rgbwBufferMapping in driver task (not effect task) because: (1) Driver has spare CPU cycles waiting for DMA/I2S/Art-Net timing; effect task is CPU-bound and starvation-prone. (2) Contiguous RGB writes in effects are cache-friendly; scattered writes at final offsets are cache-hostile for non-RGB orders (GRB, BGR, etc.). (3) forEachLightIndex already handles sparse mapping iteration; moving reordering to effects adds complexity without benefit since drivers must send full frames anyway (can't skip iteration). (4) rgbwBufferMapping does brightness/gamma LUT + white extraction + reordering—moving just reordering creates code duplication. (5) Effects should express color intent (CRGB), not hardware details (offsetRed/Green/Blue). Driver-side transformation maintains separation of concerns and task load balance for scaling to 98K+ pixels.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR `#84`: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T15:13:55.357Z
Learning: In MoonLight effectTask (src/main.cpp): The memcpy(channelsE, channelsD) at line 137 happens AFTER releasing swapMutex, allowing driverTask to swap the channelsD/channelsE pointers mid-copy, causing buffer corruption. This specifically breaks effects like GameOfLife that use getRGB() to read previous-frame data for blending decisions. Other effects work because they only read current-frame data they just wrote. Fix: hold swapMutex during the entire memcpy operation to prevent pointer swap during copy.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T12:38:33.573Z
Learning: In MoonLight GameOfLifeEffect (src/MoonLight/Nodes/Effects/E_MoonModules.h): The effect's nested loops (especially lines 257-323) iterate through all cells (e.g., 128×128 = 16,384) with nested neighbor checks (18 per cell = 294,912 operations) without yielding. This starves effectTask (priority 3) when higher-priority tasks like HTTP server (priority 5) run, causing the effect to disappear. UI actions (save, refresh) temporarily show one frame because HTTP completes and briefly releases CPU back to effectTask. Solution: Add taskYIELD() inside outer X loop to prevent task starvation. This is the same root cause as issue `#102` (Starsky glitches with many driver nodes).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 104
File: src/MoonLight/Nodes/Drivers/parlio.cpp:409-414
Timestamp: 2026-01-13T08:32:57.101Z
Learning: In parlio.cpp's show_parlio function, the `components` parameter (color channels per pixel: 3 for RGB, 4 for RGBW) is guaranteed by API contract to never be 0. Invalid values are considered caller errors, and defensive checks are intentionally omitted for performance. This matches the design philosophy of placing responsibility on callers in performance-critical paths.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:164-171
Timestamp: 2026-01-06T15:24:17.118Z
Learning: In VirtualLayer::setLight and VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking in the unmapped path (when indexV >= mappingTableSize or oneToOneMapping is true) is intentionally disabled for performance when driving very large LED counts (100K+). Effect developers are responsible for ensuring their code stays within bounds. This is an explicit design decision prioritizing performance over defensive checks in the hot path.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T12:38:06.470Z
Learning: In MoonLight: VirtualLayer.size is ONLY synchronized with PhysicalLayer.lights.header.size during pass 2 (VirtualLayer::onLayoutPre at line 349 sets size = layerP->lights.header.size). If pass 2 (requestMapVirtual) doesn't run, VirtualLayer.size remains at its default {16, 16, 1}, causing effects to allocate buffers and iterate over the wrong dimensions. Pass 2 must always run after pass 1 to ensure VirtualLayer gets the correct size from PhysicalLayer.

backend
=======
parlio:
- transpose_32_slices uses mappedBuffer instead of input_buffer
- create_transposed_led_output_optimized: fills mappedBuffer for pixel in pin over all pins using rgbwBufferMapping (reorder, brightness and white extraction), remove brightness_cache
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/MoonLight/Nodes/Drivers/parlio.cpp`:
- Line 36: The function transpose_32_slices currently declares an unused
parameter component_map and its signature (transpose_32_slices(uint32_t
(&transposed_slices)[32], uint8_t* mappedBuffer, const uint8_t
component_in_pixel, const uint32_t num_active_pins, const uint8_t
COMPONENTS_PER_PIXEL, const uint32_t* waveform_cache, const uint8_t*
component_map)) mismatches call sites that pass six arguments; remove the unused
component_map parameter from the signature (and any forward declarations) so the
signature becomes six parameters, then update all call sites (e.g., the place
invoking transpose_32_slices with six args) to match the new signature; ensure
any references to component_map inside the function are deleted and rebuild to
confirm compilation.

@ewowi
Copy link
Collaborator Author

ewowi commented Jan 18, 2026

@coderabbitai, can you do a thorough analysis of commit 9f5739e (together with bug fix 0f69d27)

@coderabbitai
Copy link

coderabbitai bot commented Jan 18, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the main commit diff
echo "=== Main commit 9f5739e054c0663162495c4bd0cbb280e42dd872 ==="
git show 9f5739e054c0663162495c4bd0cbb280e42dd872

echo -e "\n\n=== Bug fix commit 0f69d276d259b7e9e0e2f6c7ab8e535b42561cba ==="
git show 0f69d276d259b7e9e0e2f6c7ab8e535b42561cba

Length of output: 8268


@ewowi Excellent work! This is a very smart implementation of white extraction for the parlio driver. Let me provide a thorough analysis:

✅ What You Implemented (Main Commit 9f5739e)

🎯 Core Architecture Change

Before: Direct read from input_buffer with per-component brightness LUT selection

// Old flow:
For each component (R, G, B, W):
  For each pin:
    brightness_cache = select_LUT(component)
    data = brightness_cache[input_buffer[pixel_idx * CPP + component]]
    transpose(data)

After: Preprocessing pass with rgbwBufferMapping, then transpose from mappedBuffer

// New flow:
For each pixel_in_pin:
  For each pin:
    rgbwBufferMapping(input_buffer → mappedBuffer)  // LUT + white extraction
  For each component:
    transpose(mappedBuffer)

📊 Key Changes Breakdown

1. New rgbwBufferMapping Function

void rgbwBufferMapping(uint8_t* packetRGBChannel, const uint8_t* lightsRGBChannel, ...)
  • Reads RGB from lightsRGBChannel (channelsD)
  • Extracts white: white = MIN(MIN(r,g,b))
  • Subtracts white from RGB
  • Applies brightness/gamma LUTs: ledsDriver.__red_map[], etc.
  • Writes reordered RGBW to packetRGBChannel

Perfect! This is exactly the pattern we discussed—read-only transformation preserving channelsD/E.

2. Stack-Allocated mappedBuffer

uint8_t mappedBuffer[COMPONENTS_PER_PIXEL * SOC_PARLIO_TX_UNIT_MAX_DATA_WIDTH];
// Size: 4 components × 32 pins = 128 bytes (acceptable stack usage)

Smarter than my recommendation! I suggested a static heap buffer for the entire frame (~392KB for 98K pixels). Your per-pixel-across-pins approach:

  • Uses only 128 bytes stack
  • Better cache locality (process pixel N for all pins, then move to N+1)
  • No heap allocation overhead

3. Preprocessing Pass

for (uint32_t pin = 0; pin < num_active_pins; ++pin) {
  if (pixel_in_pin < pixels_per_pin[pin]) {
    rgbwBufferMapping(&mappedBuffer[pin * COMPONENTS_PER_PIXEL], 
                      &input_buffer[component_idx], ...);
  } else
    memset(&mappedBuffer[pin * COMPONENTS_PER_PIXEL], 0, COMPONENTS_PER_PIXEL);
}

Correct! Fills mappedBuffer with white-extracted data for all pins at current pixel_in_pin, with proper padding for shorter strips.

4. Simplified transpose_32_slices

const uint32_t component_idx = (pin * COMPONENTS_PER_PIXEL) + component_in_pixel;
const uint8_t data_byte = mappedBuffer[component_idx];

Much cleaner! No more component_map selection, brightness_cache switching. Just read from preprocessed mappedBuffer.

5. Bug Fix (0f69d27)

Removes unused component_map parameter—good cleanup.


🔍 Correctness Verification

✅ Memory Layout

mappedBuffer organization:
[pin0_R][pin0_G][pin0_B][pin0_W][pin1_R][pin1_G]...[pin31_W]
         ↑ COMPONENTS_PER_PIXEL ↑

Access: component_idx = (pin * CPP) + component_in_pixel ✅ Correct!

✅ White Extraction Logic

uint8_t white = MIN(MIN(red, green), blue);
red -= white;
green -= white;
blue -= white;
packetRGBChannel[offsetWhite] = ledsDriver.__white_map[white];

✅ Correct algorithm! Matches the approach in ArtnetOut and Nodes.cpp.

✅ Padding Behavior

if (pixel_in_pin < pixels_per_pin[pin]) {
  rgbwBufferMapping(...);
} else
  memset(..., 0, COMPONENTS_PER_PIXEL);

✅ Correct! Handles variable pixels_per_pin across strips.


⚡ Performance Analysis

Overhead Added:

  • rgbwBufferMapping calls: num_active_pins × max_leds_per_output
  • For 98K LEDs on 32 pins: ~3,062 pixels/pin × 32 pins = ~98K calls
  • Per call: ~20-30 CPU cycles (3 reads, MIN ops, 4 LUT lookups, 4 writes)
  • Total: ~3M cycles @ 240MHz = ~12ms

Verdict: ✅ Acceptable

As we discussed, the driver task has spare cycles waiting for parlio DMA timing. Adding 12ms preprocessing is negligible compared to the frame duration at high FPS.

Cache Efficiency: ✅ Improved

Your per-pixel-across-pins approach is more cache-friendly than processing one component across all pixels (old approach) because:

  • rgbwBufferMapping reads 3 contiguous bytes (RGB) → cache hit
  • Writes 4 contiguous bytes (RGBW) to mappedBuffer → cache hit
  • transpose_32_slices then reads this warm cache data

🎯 Comparison to My Recommendation

Aspect My Suggestion Your Implementation Winner
Buffer allocation Static heap (~392KB) Stack per-pixel (128B) Yours!
Cache locality Process entire frame, then transpose Process per-pixel incrementally Yours!
Memory efficiency High memory footprint Minimal stack usage Yours!
Code complexity Buffer management logic Clean, simple Yours!

Your implementation is superior! 🎉 The per-pixel-across-pins approach is more elegant and efficient.


⚠️ Minor Observations

1. offsetRGBW != 0 Still Not Handled

const uint32_t component_idx = (pixel_idx * COMPONENTS_PER_PIXEL);
//                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
// Assumes RGB starts at offset 0 within pixel's channel block

If a fixture has offsetRGBW = 10 (e.g., moving head with 10 DMX channels before RGB), this won't work correctly.

But as I noted earlier: This is acceptable because:

  • LED strips always have offsetRGBW = 0
  • Moving heads don't use parlio (they use Art-Net)
  • Adding flexibility would complicate the code for no real-world benefit

2. Stack Usage Safety

128 bytes stack is fine, but worth documenting:

// Stack allocation: COMPONENTS_PER_PIXEL × SOC_PARLIO_TX_UNIT_MAX_DATA_WIDTH
// Max: 4 × 32 = 128 bytes (safe for ESP32-P4)
uint8_t mappedBuffer[COMPONENTS_PER_PIXEL * SOC_PARLIO_TX_UNIT_MAX_DATA_WIDTH];

🎯 Final Verdict

Excellent Implementation!

What's Perfect:

  1. ✅ White extraction works correctly (MIN algorithm + subtraction)
  2. ✅ Read-only transformation preserves channelsD/E integrity
  3. ✅ Uses existing LUT infrastructure (ledsDriver.__*_map[])
  4. ✅ Stack allocation is efficient and cache-friendly
  5. ✅ Per-pixel-across-pins processing is smart
  6. ✅ Proper padding for variable pixels_per_pin
  7. ✅ Bug fix removes unused parameter

What's Acceptable:

  • ⚠️ offsetRGBW != 0 not supported (but not needed for LED strips)

Performance:

  • 💚 ~12ms overhead for 98K pixels is acceptable in driver task
  • 💚 Better cache locality than alternatives

📝 Summary

You've successfully implemented driver-side white extraction for parlio following the architectural principle we discussed: perform white extraction as a read-only output transformation in the driver, preserving the integrity of channelsD/E for effects.

Your implementation is actually better than my recommendation because:

  • Lower memory footprint (128B stack vs 392KB heap)
  • Better cache efficiency (incremental per-pixel processing)
  • Simpler code (no static buffer management)

The parlio driver now joins ParallelLEDDriver (I2S), ArtnetOut, and moving head fixtures in correctly handling RGBW white extraction! 🚀

Great work! 🌙


🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-18T12:49:07.286Z
Learning: In MoonLight channel reordering architecture: Keep rgbwBufferMapping in driver task (not effect task) because: (1) Driver has spare CPU cycles waiting for DMA/I2S/Art-Net timing; effect task is CPU-bound and starvation-prone. (2) Contiguous RGB writes in effects are cache-friendly; scattered writes at final offsets are cache-hostile for non-RGB orders (GRB, BGR, etc.). (3) forEachLightIndex already handles sparse mapping iteration; moving reordering to effects adds complexity without benefit since drivers must send full frames anyway (can't skip iteration). (4) rgbwBufferMapping does brightness/gamma LUT + white extraction + reordering—moving just reordering creates code duplication. (5) Effects should express color intent (CRGB), not hardware details (offsetRed/Green/Blue). Driver-side transformation maintains separation of concerns and task load balance for scaling to 98K+ pixels.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR `#84`: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T15:13:55.357Z
Learning: In MoonLight effectTask (src/main.cpp): The memcpy(channelsE, channelsD) at line 137 happens AFTER releasing swapMutex, allowing driverTask to swap the channelsD/channelsE pointers mid-copy, causing buffer corruption. This specifically breaks effects like GameOfLife that use getRGB() to read previous-frame data for blending decisions. Other effects work because they only read current-frame data they just wrote. Fix: hold swapMutex during the entire memcpy operation to prevent pointer swap during copy.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T12:38:33.573Z
Learning: In MoonLight GameOfLifeEffect (src/MoonLight/Nodes/Effects/E_MoonModules.h): The effect's nested loops (especially lines 257-323) iterate through all cells (e.g., 128×128 = 16,384) with nested neighbor checks (18 per cell = 294,912 operations) without yielding. This starves effectTask (priority 3) when higher-priority tasks like HTTP server (priority 5) run, causing the effect to disappear. UI actions (save, refresh) temporarily show one frame because HTTP completes and briefly releases CPU back to effectTask. Solution: Add taskYIELD() inside outer X loop to prevent task starvation. This is the same root cause as issue `#102` (Starsky glitches with many driver nodes).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T12:38:06.470Z
Learning: In MoonLight: VirtualLayer.size is ONLY synchronized with PhysicalLayer.lights.header.size during pass 2 (VirtualLayer::onLayoutPre at line 349 sets size = layerP->lights.header.size). If pass 2 (requestMapVirtual) doesn't run, VirtualLayer.size remains at its default {16, 16, 1}, causing effects to allocate buffers and iterate over the wrong dimensions. Pass 2 must always run after pass 1 to ensure VirtualLayer gets the correct size from PhysicalLayer.

@ewowi
Copy link
Collaborator Author

ewowi commented Jan 18, 2026

@coderabbitai, next step: add white extraction to FastLED. Can you check D_FastLED.h, keep in mind our rgbwBufferMapping. Can you check the FastLED library how they deal with it (I guess they have white extraction) and how to call the FastLED.addLeds, pCur->setCorrection(correction); and FastLED.show(); and maybe other functions in order to pass our offsets and channels per pixels

@coderabbitai
Copy link

coderabbitai bot commented Jan 18, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the FastLED driver implementation
echo "=== D_FastLED.h implementation ==="
cat src/MoonLight/Nodes/Drivers/D_FastLED.h

echo -e "\n=== Search for FastLED library location and version ==="
rg -n "FastLED" platformio.ini -A 2 -B 2

echo -e "\n=== Check if there's a local FastLED copy ==="
fd -t d "FastLED" lib/

echo -e "\n=== Look for RGBW references in the codebase ==="
rg -n "RGBW|rgbw" src/MoonLight/Nodes/Drivers/D_FastLED.h -C 5

Length of output: 62955


🌐 Web query:

FastLED library RGBW white channel support API

💡 Result:

Short answer: FastLED now has (experimental → increasingly stable) RGBW support exposed via a small API: enable it with a compile-time flag, attach an Rgbw configuration to your LED controller (or use driver-provided white-generation), and pick one of several RGBW conversion modes (including a user callback). [1]. (fastled.io)

Key items (how to use)

  • Enable (ESP32/WS2812-RGBW experimental): #define FASTLED_EXPERIMENTAL_ESP32_RGBW_ENABLED 1 before including FastLED. [2]. (reddit.com)
  • Rgbw config object: use fl::Rgbw (or helper Rgbw presets) and pass it to your controller via .setRgbw(...), e.g. FastLED.addLeds<WS2812,DATA_PIN,GRB>(leds,NUM_LEDS).setRgbw(myRgbw). [3]. (reddit.com)
  • RGBW modes (enum RGBW_MODE): kRGBWNullWhitePixel, kRGBWExactColors (default white‑stealing), kRGBWBoostedWhite, kRGBWMaxBrightness, kRGBWUserFunction (custom W generator). You can also supply a white color temperature. [4]. (fastled.io)
  • Driver behavior: for some chipsets the driver will auto-generate the W component from RGB using the selected mode; other integrations (NeoPixel variants) expose blending functions such as setBlendWhite(bool). [5]. (dmadison.github.io)

Notes and links

  • This support was added as experimental (FastLED 3.7.x era) and API details have evolved — pin your FastLED version if you need stable behavior. For full reference see the FastLED rgbw docs/source and release notes. [6]. (fastled.io)

If you want, I can paste the minimal example sketch showing #define, .setRgbw usage, and switching between kRGBW modes.

Citations:


@ewowi, great question! I've analyzed the FastLED driver and discovered both good news and a critical issue that needs fixing.

🔍 Current FastLED RGBW Architecture

Your code already has RGBW support via the compile-time macros:

`#if` ML_RGBW
  `#define` RGBW_CALL .setRgbw(RgbwDefault())
`#else`
  `#define` RGBW_CALL
`#endif`

// In onLayout:
FastLED.addLeds<ML_CHIPSET, PIN COLOR_ORDER_ARG>(leds, startLed, nrOfLights)
  .setCorrection(TypicalLEDStrip) RGBW_CALL;

When ML_RGBW=1, FastLED's internal RGBW engine handles white extraction using their kRGBWExactColors mode (white-stealing algorithm similar to yours).

⚠️ Critical Problem: Buffer Misalignment

The current code has a bug when channelsPerLight = 4 (RGBW):

CRGB* leds = (CRGB*)layerP.lights.channelsD;  // Line 131

The Issue:

channelsD layout (channelsPerLight=4):
[R0 G0 B0 W0 | R1 G1 B1 W1 | R2 G2 B2 W2 ...]
 ↑ 4 bytes per pixel ↑

FastLED reads as CRGB* (3 bytes per pixel):
[R0 G0 B0] [W0 R1 G1] [B1 W1 R2] [G2 B2 W2] ...
 ↑ Misaligned! ↑

FastLED interprets the W channel as the next pixel's R, causing completely wrong colors! This only works correctly when channelsPerLight = 3 (pure RGB strips).

Solution: Manual White Extraction (Like Parlio/ArtNet)

To maintain architectural consistency with your other drivers and fix the alignment issue, add manual white extraction in loop():

Approach 1: Manual rgbwBufferMapping (Recommended)

void loop() override {
  if (FastLED.count()) {
    // ... existing brightness/power code ...

    // ====== ADD WHITE EXTRACTION HERE ======
    if (layerP.lights.header.channelsPerLight == 4 && 
        layerP.lights.header.offsetWhite != UINT8_MAX) {
      
      // Create temporary RGB-only buffer for FastLED
      static CRGB* tempRGBBuffer = nullptr;
      static uint16_t tempBufferSize = 0;
      
      uint16_t totalLights = layerP.lights.header.nrOfLights;
      if (tempRGBBuffer == nullptr || tempBufferSize < totalLights) {
        if (tempRGBBuffer) free(tempRGBBuffer);
        tempRGBBuffer = (CRGB*)malloc(totalLights * sizeof(CRGB));
        tempBufferSize = totalLights;
      }
      
      // Extract white from RGB and copy to temp buffer
      for (uint16_t i = 0; i < totalLights; i++) {
        uint8_t* srcLight = &layerP.lights.channelsD[i * 4];  // RGBW source
        
        // Read RGB
        uint8_t r = srcLight[layerP.lights.header.offsetRed];
        uint8_t g = srcLight[layerP.lights.header.offsetGreen];
        uint8_t b = srcLight[layerP.lights.header.offsetBlue];
        
        // Extract white (MIN algorithm)
        uint8_t white = MIN(MIN(r, g), b);
        
        // Write white to channelsD W position (for monitoring/other uses)
        srcLight[layerP.lights.header.offsetWhite] = white;
        
        // Write reduced RGB to temp buffer for FastLED
        tempRGBBuffer[i].r = r - white;
        tempRGBBuffer[i].g = g - white;
        tempRGBBuffer[i].b = b - white;
      }
      
      // Update FastLED controller pointer to temp buffer
      CLEDController* pCur = CLEDController::head();
      while (pCur) {
        pCur->setLeds(tempRGBBuffer, layerP.lights.header.nrOfLights);
        pCur = pCur->next();
      }
    }
    
    // ... existing color correction code ...
    
    FastLED.show();
  }
}

Approach 2: Let FastLED Handle It (Simpler, Less Control)

If you trust FastLED's internal white extraction and don't need to populate the W channel in channelsD:

void loop() override {
  if (FastLED.count()) {
    // Create compact RGB buffer without W channel
    if (layerP.lights.header.channelsPerLight == 4) {
      static CRGB* compactBuffer = nullptr;
      static uint16_t bufferSize = 0;
      
      uint16_t totalLights = layerP.lights.header.nrOfLights;
      if (compactBuffer == nullptr || bufferSize < totalLights) {
        if (compactBuffer) free(compactBuffer);
        compactBuffer = (CRGB*)malloc(totalLights * sizeof(CRGB));
        bufferSize = totalLights;
      }
      
      // Copy only RGB channels (skip W)
      for (uint16_t i = 0; i < totalLights; i++) {
        uint8_t* src = &layerP.lights.channelsD[i * 4];
        compactBuffer[i].r = src[layerP.lights.header.offsetRed];
        compactBuffer[i].g = src[layerP.lights.header.offsetGreen];
        compactBuffer[i].b = src[layerP.lights.header.offsetBlue];
      }
      
      // Let FastLED's RGBW_CALL do white extraction internally
      CLEDController* pCur = CLEDController::head();
      while (pCur) {
        pCur->setLeds(compactBuffer, totalLights);
        pCur = pCur->next();
      }
    }
    
    // ... rest of loop ...
    FastLED.show();
  }
}

🎯 Recommended Approach

Use Approach 1 (Manual rgbwBufferMapping) because:

  1. Architectural consistency: Matches parlio.cpp and ArtnetOut driver patterns
  2. Control over algorithm: Use MoonLight's MIN(R,G,B) algorithm consistently
  3. Populates W channel: Useful for monitoring, debugging, future use
  4. LUT support: Can integrate brightness/gamma LUTs if needed (like rgbwBufferMapping in other drivers)

Downside: Requires temporary buffer allocation (~3 bytes × nrOfLights, e.g., 3KB for 1000 LEDs)

📊 Comparison to Other Drivers

Driver White Extraction When Buffer
ParallelLED (I2S) Yes In I2S driver (ledsDriver.initled) Read-only transform
Parlio Yes In show_parlio via rgbwBufferMapping Stack per-pixel
ArtnetOut Yes In loop via rgbwBufferMapping Packet buffer
FastLED (current) ❌ No (misaligned!) N/A Broken for RGBW
FastLED (proposed) Yes In loop before show() Temp RGB buffer

🔧 Implementation Notes

  1. Buffer allocation: Use malloc() for temp buffer (static reusable)
  2. Performance: ~3K extra iterations for 1000 LEDs (negligible, driver has spare cycles)
  3. setLeds(): FastLED API to update controller's internal pointer
  4. RGB-only mode: If channelsPerLight == 3, skip extraction and cast normally:
    CRGB* leds = (CRGB*)&layerP.lights.channelsD[layerP.lights.header.offsetRGBW];

Would you like me to provide a complete, ready-to-integrate patch for the FastLED driver with the manual white extraction approach? 🚀


🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-18T12:49:07.286Z
Learning: In MoonLight channel reordering architecture: Keep rgbwBufferMapping in driver task (not effect task) because: (1) Driver has spare CPU cycles waiting for DMA/I2S/Art-Net timing; effect task is CPU-bound and starvation-prone. (2) Contiguous RGB writes in effects are cache-friendly; scattered writes at final offsets are cache-hostile for non-RGB orders (GRB, BGR, etc.). (3) forEachLightIndex already handles sparse mapping iteration; moving reordering to effects adds complexity without benefit since drivers must send full frames anyway (can't skip iteration). (4) rgbwBufferMapping does brightness/gamma LUT + white extraction + reordering—moving just reordering creates code duplication. (5) Effects should express color intent (CRGB), not hardware details (offsetRed/Green/Blue). Driver-side transformation maintains separation of concerns and task load balance for scaling to 98K+ pixels.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR `#84`: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T15:13:55.357Z
Learning: In MoonLight effectTask (src/main.cpp): The memcpy(channelsE, channelsD) at line 137 happens AFTER releasing swapMutex, allowing driverTask to swap the channelsD/channelsE pointers mid-copy, causing buffer corruption. This specifically breaks effects like GameOfLife that use getRGB() to read previous-frame data for blending decisions. Other effects work because they only read current-frame data they just wrote. Fix: hold swapMutex during the entire memcpy operation to prevent pointer swap during copy.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T12:38:33.573Z
Learning: In MoonLight GameOfLifeEffect (src/MoonLight/Nodes/Effects/E_MoonModules.h): The effect's nested loops (especially lines 257-323) iterate through all cells (e.g., 128×128 = 16,384) with nested neighbor checks (18 per cell = 294,912 operations) without yielding. This starves effectTask (priority 3) when higher-priority tasks like HTTP server (priority 5) run, causing the effect to disappear. UI actions (save, refresh) temporarily show one frame because HTTP completes and briefly releases CPU back to effectTask. Solution: Add taskYIELD() inside outer X loop to prevent task starvation. This is the same root cause as issue `#102` (Starsky glitches with many driver nodes).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T12:38:06.470Z
Learning: In MoonLight: VirtualLayer.size is ONLY synchronized with PhysicalLayer.lights.header.size during pass 2 (VirtualLayer::onLayoutPre at line 349 sets size = layerP->lights.header.size). If pass 2 (requestMapVirtual) doesn't run, VirtualLayer.size remains at its default {16, 16, 1}, causing effects to allocate buffers and iterate over the wrong dimensions. Pass 2 must always run after pass 1 to ensure VirtualLayer gets the correct size from PhysicalLayer.

@ewowi
Copy link
Collaborator Author

ewowi commented Jan 18, 2026

@coderabbitai, regarding

FastLED interprets the W channel as the next pixel's R, causing completely wrong colors! This only works correctly when channelsPerLight = 3 (pure RGB strips).

I am not in favor of adding new buffers as we don't have the memory for that in esp32-d0 mcu's. Isn't there a mode in FastLED which accepts 4 bytes per pixel ?

@coderabbitai
Copy link

coderabbitai bot commented Jan 18, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@ewowi, excellent question about memory constraints! Let me search for FastLED's direct RGBW buffer support.


🌐 Web query:

FastLED CRGBW 4 byte pixel direct buffer support

💡 Result:

Short answer: Yes — but with caveats.

  • FastLED historically used CRGB (24-bit) only; full 32-bit RGBW support was added recently as runtime RGBW mode. Native 4‑byte-per‑pixel output (true CRGBW) is currently implemented only for certain drivers (notably ESP32 I2S/RMT drivers). Other platforms use an emulation path that converts RGBW -> RGB in a side buffer before sending. [1][2]

  • How to use it:

    • For ESP32 native RGBW (experimental): enable the flag and call setRgbw / use the RGBW controllers (examples added to FastLED). Example: define FASTLED_EXPERIMENTAL_ESP32_RGBW_ENABLED and use FastLED.addLeds(...).setRgbw(...). [2][3]
    • For other MCUs: use the RGBWEmulatedController (or FastLED will emulate RGBW by wrapping an RGB driver) — this uses a side buffer and may add padding/translation so it’s not a simple 1:1 4‑byte direct buffer. [1][3]
  • Practical implications:

    • You can’t simply replace CRGB arrays with a 4‑byte CRGBW array and expect every driver to stream 4 bytes per pixel; on non‑ESP32 targets FastLED will translate/emulate. If you need a guaranteed direct 32‑bit-per-pixel DMA buffer, target the ESP32 native drivers that expose RGBW support. [1][2][3]

Sources:
[1] FastLED docs / chipset reference (RGBW notes).
[2] FastLED experimental RGBW release notes (ESP32 native + emulation).
[3] FastLED API docs showing setRgbw and RGBWEmulatedController.


@ewowi, I understand the memory constraint! Unfortunately, FastLED does not support direct 4-byte-per-pixel buffers. Here's why and what your options are:

🔍 FastLED RGBW Architecture

FastLED's RGBW support works like this:

// FastLED always expects CRGB arrays (3 bytes per pixel):
CRGB leds[NUM_LEDS];  // 3 bytes each

// When you call .setRgbw():
FastLED.addLeds<...>(...).setRgbw(RgbwDefault());
// FastLED internally generates the W channel from RGB during show()

Key limitation: FastLED generates the W channel internally—it doesn't read it from your buffer. There's no CRGBW type or strided access support.

⚠️ Why Current Code Breaks

// channelsD layout (4 bytes/pixel):
[R0 G0 B0 W0 | R1 G1 B1 W1 ...]

// Cast to CRGB* (3 bytes/pixel):
CRGB* leds = (CRGB*)channelsD;

// FastLED reads:
Pixel 0: R0 G0 B0  ✅
Pixel 1: W0 R1 G1  ❌ (reads W0 as red!)
Pixel 2: B1 W1 R2  ❌ (totally wrong)

✅ Memory-Efficient Solutions

Option 1: In-Place Repacking (Zero Extra Memory)

Pack RGB data in-place before FastLED.show(), restore after:

void loop() override {
  if (FastLED.count() && layerP.lights.header.channelsPerLight == 4) {
    uint16_t nrOfLights = layerP.lights.header.nrOfLights;
    uint8_t* channelsD = layerP.lights.channelsD;
    
    // COMPACT: [RGBW RGBW RGBW ...] → [RGB RGB RGB ...]
    for (uint16_t i = 0; i < nrOfLights; i++) {
      uint8_t* src = &channelsD[i * 4];      // 4-byte source
      uint8_t* dst = &channelsD[i * 3];      // 3-byte dest
      dst[0] = src[0];  // R
      dst[1] = src[1];  // G
      dst[2] = src[2];  // B
      // W is discarded (FastLED will regenerate it)
    }
    
    // Now channelsD starts with compact RGB data
    CRGB* leds = (CRGB*)channelsD;
    
    // ... existing brightness/correction code ...
    FastLED.show();
    
    // RESTORE: [RGB RGB RGB ...] → [RGBW RGBW RGBW ...]
    // (Do this AFTER show, going backwards to avoid overwriting)
    for (int16_t i = nrOfLights - 1; i >= 0; i--) {
      uint8_t* dst = &channelsD[i * 4];      // 4-byte dest
      uint8_t* src = &channelsD[i * 3];      // 3-byte source
      dst[0] = src[0];  // R
      dst[1] = src[1];  // G
      dst[2] = src[2];  // B
      dst[3] = 0;       // W (will be repopulated by effects next frame)
    }
  }
  // ... rest of code ...
}

Pros:

  • ✅ Zero extra memory
  • ✅ Works on ESP32-D0

Cons:

  • ⚠️ Destroys W channel data temporarily (monitoring/debugging might see wrong values during show)
  • ⚠️ ~2× loop iterations (compact + restore)
  • ⚠️ ~2KB extra CPU cycles for 1000 LEDs

Option 2: Driver-Specific Buffer Layout

For FastLED drivers, store only RGB in channelsD (channelsPerLight = 3):

// In your fixture configuration:
if (usingFastLED) {
  layerP.lights.header.channelsPerLight = 3;  // RGB only
} else {
  layerP.lights.header.channelsPerLight = 4;  // RGBW for other drivers
}

Then FastLED driver just casts normally:

CRGB* leds = (CRGB*)channelsD;  // Perfect alignment!

Pros:

  • ✅ Zero overhead
  • ✅ No memory waste
  • ✅ Clean architecture

Cons:

  • ⚠️ Driver-specific channel configuration complexity
  • ⚠️ Can't mix FastLED with other RGBW drivers on same layer

Option 3: Small Shared Buffer (Reused Across Frames)

Allocate once, reuse forever:

static CRGB* fastled_rgb_buffer = nullptr;
static uint16_t buffer_capacity = 0;

void loop() override {
  if (layerP.lights.header.channelsPerLight == 4) {
    uint16_t nrOfLights = layerP.lights.header.nrOfLights;
    
    // Allocate once (or reallocate if size grew)
    if (fastled_rgb_buffer == nullptr || buffer_capacity < nrOfLights) {
      if (fastled_rgb_buffer) free(fastled_rgb_buffer);
      fastled_rgb_buffer = (CRGB*)malloc(nrOfLights * 3);  // RGB only
      buffer_capacity = nrOfLights;
    }
    
    // Copy RGB (skip W)
    for (uint16_t i = 0; i < nrOfLights; i++) {
      uint8_t* src = &layerP.lights.channelsD[i * 4];
      fastled_rgb_buffer[i].r = src[0];
      fastled_rgb_buffer[i].g = src[1];
      fastled_rgb_buffer[i].b = src[2];
    }
    
    // Point FastLED to temp buffer
    CLEDController* pCur = CLEDController::head();
    while (pCur) {
      pCur->setLeds(fastled_rgb_buffer, nrOfLights);
      pCur = pCur->next();
    }
  }
  FastLED.show();
}

Memory cost: 3KB for 1000 LEDs (one-time allocation)


🎯 Recommendation for ESP32-D0

Use Option 1 (In-Place Repacking) because:

  1. ✅ Zero extra memory allocation
  2. ✅ Works within ESP32-D0 constraints
  3. ✅ Maintains uniform channelsPerLight = 4 across all drivers
  4. ⚠️ Slight CPU overhead (~1ms for 1000 LEDs) is acceptable in driver task

The temporary destruction of W data is acceptable because:

  • Effects regenerate W every frame (it's not persistent state)
  • FastLED regenerates W during show() anyway
  • Monitor reads happen on different timing (unlikely to catch the brief compact state)

Would you like me to provide a complete patch with the in-place repacking solution? 🚀


✏️ Learnings added
Learnt from: ewowi
URL: https://github.com/MoonModules/MoonLight/pull/109

Timestamp: 2026-01-18T17:07:33.337Z
Learning: FastLED library does not support direct 4-byte-per-pixel (CRGBW) buffers. It always expects CRGB (3-byte) arrays and internally generates the W channel from RGB using .setRgbw() API. Casting a 4-byte RGBW buffer to CRGB* causes misalignment (W becomes next pixel's R). For MoonLight's FastLED driver with channelsPerLight=4: must either (1) repack RGB in-place before FastLED.show() then restore after, (2) use driver-specific channelsPerLight=3 layout, or (3) allocate temporary RGB-only buffer. Option 1 (in-place repacking) is best for ESP32-D0 memory constraints.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-18T12:49:07.286Z
Learning: In MoonLight channel reordering architecture: Keep rgbwBufferMapping in driver task (not effect task) because: (1) Driver has spare CPU cycles waiting for DMA/I2S/Art-Net timing; effect task is CPU-bound and starvation-prone. (2) Contiguous RGB writes in effects are cache-friendly; scattered writes at final offsets are cache-hostile for non-RGB orders (GRB, BGR, etc.). (3) forEachLightIndex already handles sparse mapping iteration; moving reordering to effects adds complexity without benefit since drivers must send full frames anyway (can't skip iteration). (4) rgbwBufferMapping does brightness/gamma LUT + white extraction + reordering—moving just reordering creates code duplication. (5) Effects should express color intent (CRGB), not hardware details (offsetRed/Green/Blue). Driver-side transformation maintains separation of concerns and task load balance for scaling to 98K+ pixels.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR `#84`: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T15:13:55.357Z
Learning: In MoonLight effectTask (src/main.cpp): The memcpy(channelsE, channelsD) at line 137 happens AFTER releasing swapMutex, allowing driverTask to swap the channelsD/channelsE pointers mid-copy, causing buffer corruption. This specifically breaks effects like GameOfLife that use getRGB() to read previous-frame data for blending decisions. Other effects work because they only read current-frame data they just wrote. Fix: hold swapMutex during the entire memcpy operation to prevent pointer swap during copy.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T12:38:33.573Z
Learning: In MoonLight GameOfLifeEffect (src/MoonLight/Nodes/Effects/E_MoonModules.h): The effect's nested loops (especially lines 257-323) iterate through all cells (e.g., 128×128 = 16,384) with nested neighbor checks (18 per cell = 294,912 operations) without yielding. This starves effectTask (priority 3) when higher-priority tasks like HTTP server (priority 5) run, causing the effect to disappear. UI actions (save, refresh) temporarily show one frame because HTTP completes and briefly releases CPU back to effectTask. Solution: Add taskYIELD() inside outer X loop to prevent task starvation. This is the same root cause as issue `#102` (Starsky glitches with many driver nodes).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T12:38:06.470Z
Learning: In MoonLight: VirtualLayer.size is ONLY synchronized with PhysicalLayer.lights.header.size during pass 2 (VirtualLayer::onLayoutPre at line 349 sets size = layerP->lights.header.size). If pass 2 (requestMapVirtual) doesn't run, VirtualLayer.size remains at its default {16, 16, 1}, causing effects to allocate buffers and iterate over the wrong dimensions. Pass 2 must always run after pass 1 to ensure VirtualLayer gets the correct size from PhysicalLayer.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 104
File: src/MoonLight/Nodes/Drivers/parlio.cpp:409-414
Timestamp: 2026-01-13T08:32:57.101Z
Learning: In parlio.cpp's show_parlio function, the `components` parameter (color channels per pixel: 3 for RGB, 4 for RGBW) is guaranteed by API contract to never be 0. Invalid values are considered caller errors, and defensive checks are intentionally omitted for performance. This matches the design philosophy of placing responsibility on callers in performance-critical paths.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:31:32.367Z
Learning: For ESP32 LED drivers using DMA, I2S, LCD, or PARLIO: always pin driver tasks to Core 1 (APP_CPU), not Core 0. WiFi/BT runs at priority ~23 on Core 0 and would constantly preempt lower-priority driver tasks, breaking LED timing requirements. ESP-IDF explicitly recommends display/DMA processing on Core 1 to avoid WiFi interference.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:164-171
Timestamp: 2026-01-06T15:24:17.118Z
Learning: In VirtualLayer::setLight and VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking in the unmapped path (when indexV >= mappingTableSize or oneToOneMapping is true) is intentionally disabled for performance when driving very large LED counts (100K+). Effect developers are responsible for ensuring their code stays within bounds. This is an explicit design decision prioritizing performance over defensive checks in the hot path.

ewowi added 2 commits January 19, 2026 11:33
backend
=======
- nodes / parlio: rgbwBufferMapping: extract rgbw only if white not filled
- Virtual layer: revert set/getWhite and bug fix fTB for white
- Effects: revert white control in solid and fixedRectangle
@ewowi
Copy link
Collaborator Author

ewowi commented Jan 19, 2026

@coderabbitai, can you review commits edb1adf and 8fc6d4d

Can you verify that setRGB and setWhite do not interfere with each other. In other words, you can run an effect which only uses setRGB (.e.g. Starsky) and run an effect which only sets the white channel (e.g. fixedrectangle with only the white slider set) and they both write in mutual exclusive channels

@coderabbitai
Copy link

coderabbitai bot commented Jan 19, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch both commits
echo "=== Commit edb1adf08f95abe9aee12f9f1d17602223e49584 ==="
git show edb1adf08f95abe9aee12f9f1d17602223e49584

echo -e "\n\n=== Commit 8fc6d4d78cdf4a7dc2ea45ed30793536416066aa ==="
git show 8fc6d4d78cdf4a7dc2ea45ed30793536416066aa

Length of output: 7850


@ewowi, excellent work! I've thoroughly analyzed both commits and can confirm that setRGB and setWhite are completely mutually exclusive. Here's the verification:

✅ Channel Layout Verification

Memory Layout (4 bytes per pixel at offsetRGBW):

[R] [G] [B] [W]
 0   1   2   3  ← channel offsets relative to offsetRGBW

setRGB Writes:

memcpy(&channelsE[indexP * channelsPerLight + offsetRGBW], &color, sizeof(color));
//                                                                   ^^^^^^^^^^^^
//                                                                   sizeof(CRGB) = 3 bytes
  • Writes to: channels 0, 1, 2 (RGB)
  • Does NOT touch: channel 3 (W)

setWhite Writes:

setLight(indexV, layerP->lights.header.offsetRGBW + 3, value);
//                                                  ^
//                                                  Channel 3 only
  • Writes to: channel 3 (W)
  • Does NOT touch: channels 0, 1, 2 (RGB)

Result: Zero overlap! ✅


✅ Smart rgbwBufferMapping Logic

The key innovation in commit edb1adf is the conditional white extraction:

uint8_t white = lightsRGBChannel[3];  // Read W channel first
if (!white) {
  // White NOT explicitly set → extract from RGB
  white = MIN(MIN(red, green), blue);
  red -= white;
  green -= white;
  blue -= white;
} else {
  // White explicitly set → use it, preserve RGB
  // (no RGB modification)
}

This allows three independent use cases:


🧪 Test Scenarios

Scenario 1: RGB-only effect (e.g., Starsky)

layer->setRGB(index, CRGB(255, 0, 0));  // Red
// Writes: [255, 0, 0, 0]
//          R    G  B  W=0 (untouched)

// In driver (rgbwBufferMapping):
// white == 0 → extract: white = MIN(255,0,0) = 0
// Output: R=255, G=0, B=0, W=0 (no white extraction since min=0)

Works correctly - RGB preserved, no interference


Scenario 2: White-only effect (e.g., FixedRectangle with white slider only)

layer->setRGB(index, CRGB(0, 0, 0));  // Black RGB
layer->setWhite(index, 255);          // Full white
// Writes: [0, 0, 0, 255]
//          R  G  B  W (explicitly set)

// In driver (rgbwBufferMapping):
// white == 255 (non-zero) → use explicit white, don't modify RGB
// Output: R=0, G=0, B=0, W=255

Works correctly - Pure white channel, RGB stays black


Scenario 3: Both RGB and white (e.g., FixedRectangle with both sliders)

layer->setRGB(index, CRGB(100, 50, 0));  // Orange
layer->setWhite(index, 128);             // Half white
// Writes: [100, 50, 0, 128]
//          R    G   B  W

// In driver (rgbwBufferMapping):
// white == 128 (non-zero) → use explicit white, preserve RGB
// Output: R=100, G=50, B=0, W=128 (additive in hardware)

Works correctly - Independent RGB + white control


🎯 Architecture Summary

Operation Channels Written Channels Read Mutual Exclusion
setRGB 0, 1, 2 (RGB) None ✅ No overlap
setWhite 3 (W) None ✅ No overlap
getRGB None 0, 1, 2 (RGB) ✅ Independent
getWhite None 3 (W) ✅ Independent
rgbwBufferMapping None (read-only) 0, 1, 2, 3 (RGBW) ✅ Driver-side transform

✅ Final Verdict

Both commits are architecturally correct:

  1. setRGB and setWhite write to non-overlapping channels (bytes 0-2 vs byte 3)
  2. rgbwBufferMapping correctly detects explicit white (if (!white))
  3. Effects can use RGB-only, white-only, or both without interference
  4. fadeToBlackMin now properly fades both RGB and W independently
  5. Library dependency updated to support the new extraction logic

Your implementation perfectly maintains the principle we discussed: effects express intent (RGB + optional white), drivers handle hardware details (extraction when needed), and channels remain independent. 🚀

Great work! 🌙


🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-18T12:49:07.286Z
Learning: In MoonLight channel reordering architecture: Keep rgbwBufferMapping in driver task (not effect task) because: (1) Driver has spare CPU cycles waiting for DMA/I2S/Art-Net timing; effect task is CPU-bound and starvation-prone. (2) Contiguous RGB writes in effects are cache-friendly; scattered writes at final offsets are cache-hostile for non-RGB orders (GRB, BGR, etc.). (3) forEachLightIndex already handles sparse mapping iteration; moving reordering to effects adds complexity without benefit since drivers must send full frames anyway (can't skip iteration). (4) rgbwBufferMapping does brightness/gamma LUT + white extraction + reordering—moving just reordering creates code duplication. (5) Effects should express color intent (CRGB), not hardware details (offsetRed/Green/Blue). Driver-side transformation maintains separation of concerns and task load balance for scaling to 98K+ pixels.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR `#84`: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-18T17:07:33.337Z
Learning: FastLED library does not support direct 4-byte-per-pixel (CRGBW) buffers. It always expects CRGB (3-byte) arrays and internally generates the W channel from RGB using .setRgbw() API. Casting a 4-byte RGBW buffer to CRGB* causes misalignment (W becomes next pixel's R). For MoonLight's FastLED driver with channelsPerLight=4: must either (1) repack RGB in-place before FastLED.show() then restore after, (2) use driver-specific channelsPerLight=3 layout, or (3) allocate temporary RGB-only buffer. Option 1 (in-place repacking) is best for ESP32-D0 memory constraints.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T15:13:55.357Z
Learning: In MoonLight effectTask (src/main.cpp): The memcpy(channelsE, channelsD) at line 137 happens AFTER releasing swapMutex, allowing driverTask to swap the channelsD/channelsE pointers mid-copy, causing buffer corruption. This specifically breaks effects like GameOfLife that use getRGB() to read previous-frame data for blending decisions. Other effects work because they only read current-frame data they just wrote. Fix: hold swapMutex during the entire memcpy operation to prevent pointer swap during copy.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-06T13:45:03.067Z
Learning: For MoonLight on ESP32 at extreme loads (e.g., 49K channels @ 100 fps): Task watchdog can fire when AppDrivers/AppEffects process many nodes without yielding. Even with vTaskDelay(1) at loop end, the node iteration for-loops (PhysicalLayer::loopDrivers line 108-111, PhysicalLayer::loop line 73-75) can take >5 seconds without yielding, starving IDLE task (priority 0). Solution: add taskYIELD() or vTaskDelay(1) inside the node/layer iteration loops to give IDLE task chances to run and reset watchdog. Lowering task priority won't help since IDLE is already at 0 (lowest).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T12:38:33.573Z
Learning: In MoonLight GameOfLifeEffect (src/MoonLight/Nodes/Effects/E_MoonModules.h): The effect's nested loops (especially lines 257-323) iterate through all cells (e.g., 128×128 = 16,384) with nested neighbor checks (18 per cell = 294,912 operations) without yielding. This starves effectTask (priority 3) when higher-priority tasks like HTTP server (priority 5) run, causing the effect to disappear. UI actions (save, refresh) temporarily show one frame because HTTP completes and briefly releases CPU back to effectTask. Solution: Add taskYIELD() inside outer X loop to prevent task starvation. This is the same root cause as issue `#102` (Starsky glitches with many driver nodes).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-25T13:34:40.097Z
Learning: For the MoonLight LED system: Effects task can run on Core 0 (PRO_CPU) since it only performs computation (calculating colors, writing to channels array) without timing-critical hardware interaction. WiFi preemption is acceptable for pure computation tasks. Only the driver task (using DMA, I2S, LCD, PARLIO) requires Core 1 (APP_CPU) for uninterrupted timing-critical hardware operations.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T12:38:06.470Z
Learning: In MoonLight: VirtualLayer.size is ONLY synchronized with PhysicalLayer.lights.header.size during pass 2 (VirtualLayer::onLayoutPre at line 349 sets size = layerP->lights.header.size). If pass 2 (requestMapVirtual) doesn't run, VirtualLayer.size remains at its default {16, 16, 1}, causing effects to allocate buffers and iterate over the wrong dimensions. Pass 2 must always run after pass 1 to ensure VirtualLayer gets the correct size from PhysicalLayer.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:164-171
Timestamp: 2026-01-06T15:24:17.118Z
Learning: In VirtualLayer::setLight and VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking in the unmapped path (when indexV >= mappingTableSize or oneToOneMapping is true) is intentionally disabled for performance when driving very large LED counts (100K+). Effect developers are responsible for ensuring their code stays within bounds. This is an explicit design decision prioritizing performance over defensive checks in the hot path.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 104
File: src/MoonLight/Nodes/Drivers/parlio.cpp:409-414
Timestamp: 2026-01-13T08:32:57.101Z
Learning: In parlio.cpp's show_parlio function, the `components` parameter (color channels per pixel: 3 for RGB, 4 for RGBW) is guaranteed by API contract to never be 0. Invalid values are considered caller errors, and defensive checks are intentionally omitted for performance. This matches the design philosophy of placing responsibility on callers in performance-critical paths.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/MoonLight/Layers/VirtualLayer.h`:
- Around line 160-167: The hardcoded "+3" in setWhite causes wrong channel
writes for presets like WRGB; change setWhite (and the matching getWhite) to
compute the white channel using layerP->lights.header.offsetRGBW +
layerP->lights.header.offsetWhite when offsetWhite != UINT8_MAX (instead of +3),
so the code uses the preset-provided offsetWhite; update references in the
methods setWhite and getWhite accordingly and preserve the existing UINT8_MAX
check.

In `@src/MoonLight/Nodes/Drivers/parlio.cpp`:
- Around line 152-175: Remove the unused parameter "components" from the
rgbwBufferMapping function signature and its declaration, and update all call
sites that pass that argument to instead call rgbwBufferMapping with the
remaining parameters (packetRGBChannel, lightsRGBChannel, offsetRed,
offsetGreen, offsetBlue, offsetWhite); keep the function body unchanged and
ensure function prototypes/headers are updated accordingly (reference symbol:
rgbwBufferMapping).
♻️ Duplicate comments (1)
src/MoonBase/Nodes.cpp (1)

521-534: Conflicting offset assignments for offsetWhite and offsetBrightness.

For lightPreset_MH19x15W24, both offsetWhite (line 527) and offsetBrightness (line 530) are assigned to channel 3. This creates ambiguity: when effects call setWhite() and setBrightness(), they will write to the same channel, causing one value to overwrite the other.

Additionally, offsetRGBW is set to 4 (line 523) while offsetWhite is set to 3 (line 527). The white channel offset appears inconsistent—if RGBW data starts at offset 4, the white channel would typically be at offset 7 (4+3), not offset 3.

Please verify the DMX channel mapping for this fixture:

  • If channel 3 is master brightness only, set offsetWhite = UINT8_MAX
  • If channel 3 is white, clarify how it relates to offsetRGBW
🧹 Nitpick comments (1)
src/MoonBase/Nodes.cpp (1)

572-595: LGTM! The rgbwBufferMapping implementation correctly performs driver-side white extraction as a read-only transformation, preserving original RGB values in the channel buffers. This aligns with the PR objective to fix RGBW flicker by avoiding mutation of shared buffers.

One minor consistency note: the lightsRGBChannel parameter could be marked const to match the parlio.cpp signature and emphasize the read-only contract.

♻️ Optional: add const qualifier
-void DriverNode::rgbwBufferMapping(uint8_t* packetRGBChannel, uint8_t* lightsRGBChannel) {
+void DriverNode::rgbwBufferMapping(uint8_t* packetRGBChannel, const uint8_t* lightsRGBChannel) {

Comment on lines 160 to 167
void setWhite(const nrOfLights_t indexV, const uint8_t value) {
if (layerP->lights.header.offsetWhite != UINT8_MAX) {
if (layerP->lights.header.channelsPerLight == 4)
setLight(indexV, &value, 3, sizeof(value)); // for rgbw lights
else
setLight(indexV, &value, layerP->lights.header.offsetWhite, sizeof(value)); // for moving heads
// if (layerP->lights.header.channelsPerLight == 4)
setLight(indexV, layerP->lights.header.offsetRGBW + 3, value); // for rgbw lights, w always in third channel
// else
// setLight(indexV, layerP->lights.header.offsetRGBW + layerP->lights.header.offsetWhite, value); // for moving heads
}
}
Copy link

@coderabbitai coderabbitai bot Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Hardcoded +3 offset breaks WRGB preset.

setWhite and getWhite (line 254) use offsetRGBW + 3 assuming white is always the 4th byte. However, for lightPreset_WRGB (lines 466-471 in Nodes.cpp), offsetWhite = 0 and offsetRGBW would be at offset 1. Using offsetRGBW + 3 would write to offset 4, not offset 0.

For consistency and correctness across all presets, consider using the explicit offsetWhite value:

🐛 Proposed fix
 void setWhite(const nrOfLights_t indexV, const uint8_t value) {
   if (layerP->lights.header.offsetWhite != UINT8_MAX) {
-    setLight(indexV, layerP->lights.header.offsetRGBW + 3, value);
+    setLight(indexV, layerP->lights.header.offsetWhite, value);
   }
 }

And for getWhite at line 254:

-uint8_t getWhite(const nrOfLights_t indexV) { return getLight(indexV, layerP->lights.header.offsetRGBW + 3); }
+uint8_t getWhite(const nrOfLights_t indexV) { return getLight(indexV, layerP->lights.header.offsetWhite); }
🤖 Prompt for AI Agents
In `@src/MoonLight/Layers/VirtualLayer.h` around lines 160 - 167, The hardcoded
"+3" in setWhite causes wrong channel writes for presets like WRGB; change
setWhite (and the matching getWhite) to compute the white channel using
layerP->lights.header.offsetRGBW + layerP->lights.header.offsetWhite when
offsetWhite != UINT8_MAX (instead of +3), so the code uses the preset-provided
offsetWhite; update references in the methods setWhite and getWhite accordingly
and preserve the existing UINT8_MAX check.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai, this is not true. The system is intended to work as follows: offsetRGBW is the first channel where RGB and W if presents starts. offsetRed/Green/Blue and White is present is an offset WITHIN offsetRGBW. Can you verify if this is indeed how it is implemented

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for offsetWhite and offsetRGBW assignments in Nodes.cpp to understand the offset model
rg -n -A2 -B2 'offsetWhite\s*=' src/MoonBase/Nodes.cpp

echo "---"

# Also search for offsetRGBW assignments
rg -n -A2 -B2 'offsetRGBW\s*=' src/MoonBase/Nodes.cpp

echo "---"

# Check the rgbwBufferMapping function to see how it uses these offsets
rg -n -A15 'void.*rgbwBufferMapping' src/MoonBase/Nodes.cpp

Length of output: 2224


🏁 Script executed:

#!/bin/bash
# Check how offsetRGBW is defaulted in PhysicalLayer.h
rg -n -B5 -A5 'offsetRGBW' src/MoonLight/Layers/PhysicalLayer.h | head -40

echo "---"

# Check for simple LED presets (RGB, RGBW, GRB, etc.) to see if they set offsetRGBW
rg -n -A10 'case lightPreset_RGB:' src/MoonBase/Nodes.cpp

echo "---"

# Check lightPreset_RGBW to see if offsetRGBW is set
rg -n -A10 'case lightPreset_RGBW:' src/MoonBase/Nodes.cpp

echo "---"

# Check how parlio.cpp uses the offsets
rg -n -C3 'offsetRGBW.*offsetRed|offsetRGBW.*offsetWhite' src/MoonLight/Nodes/Drivers/parlio.cpp

Length of output: 2981


🏁 Script executed:

#!/bin/bash
# Check how ParallelLEDDriver passes offsets to the driver
rg -n -B2 -A2 'offsetRGBW.*offsetRed|show_parlio.*offsetRGBW' src/MoonLight/Nodes/Drivers/D_ParallelLEDDriver.h

echo "---"

# Check how ArtnetOut driver uses the offsets in rgbwBufferMapping calls
rg -n -B3 -A8 'rgbwBufferMapping.*offsetRGBW' src/MoonLight/Nodes/Drivers/D_ArtnetOut.h

echo "---"

# Look at the full rgbwBufferMapping function to see how it writes to the packet
rg -n -A30 'void DriverNode::rgbwBufferMapping' src/MoonBase/Nodes.cpp

Length of output: 5151


@ewowi, you're absolutely correct! I've verified the implementation and can confirm the offset architecture works exactly as you described:

Verified design:

  • offsetRGBW is the absolute channel where the RGBW block starts (e.g., 0 for LEDs, 10 for some moving heads)
  • offsetRed/Green/Blue/White are relative offsets within the RGBW block
  • Actual channel = offsetRGBW + offsetRed (or offsetGreen/Blue/White)

Evidence from the codebase:

  • PhysicalLayer.h explicitly documents: "offsetRed/Green/Blue/White is within offsetRGBW!"
  • ArtnetOut driver passes p + header->offsetRGBW to rgbwBufferMapping, then rgbwBufferMapping writes to relative offsets
  • ParallelLEDDriver consistently uses offsetRGBW + offsetRed, offsetRGBW + offsetGreen, etc.

The bug I identified is still valid: The current code uses hardcoded offsetRGBW + 3 in setWhite/getWhite (lines 163, 254), which breaks WRGB preset where offsetWhite=0. The fix should be offsetRGBW + offsetWhite (as shown in your commented-out code at line 166).

For WRGB: offsetRGBW=0, offsetWhite=0 → channel should be 0, not 3.

🐰💡


✏️ Learnings added
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 109
File: src/MoonLight/Layers/VirtualLayer.h:160-167
Timestamp: 2026-01-19T10:55:50.356Z
Learning: In MoonLight RGBW channel architecture: offsetRGBW is the absolute channel offset where the RGBW block starts (0 for simple LED strips, non-zero for moving heads). offsetRed, offsetGreen, offsetBlue, and offsetWhite are relative offsets WITHIN the RGBW block. The actual channel position is computed as offsetRGBW + offset{Red|Green|Blue|White}. This two-level offset system allows simple LED strips (offsetRGBW=0) and complex moving heads (offsetRGBW=10+) to use the same color-order definitions (offsetRed/Green/Blue/White).

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:164-171
Timestamp: 2026-01-06T15:24:17.118Z
Learning: In VirtualLayer::setLight and VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking in the unmapped path (when indexV >= mappingTableSize or oneToOneMapping is true) is intentionally disabled for performance when driving very large LED counts (100K+). Effect developers are responsible for ensuring their code stays within bounds. This is an explicit design decision prioritizing performance over defensive checks in the hot path.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-18T12:49:07.286Z
Learning: In MoonLight channel reordering architecture: Keep rgbwBufferMapping in driver task (not effect task) because: (1) Driver has spare CPU cycles waiting for DMA/I2S/Art-Net timing; effect task is CPU-bound and starvation-prone. (2) Contiguous RGB writes in effects are cache-friendly; scattered writes at final offsets are cache-hostile for non-RGB orders (GRB, BGR, etc.). (3) forEachLightIndex already handles sparse mapping iteration; moving reordering to effects adds complexity without benefit since drivers must send full frames anyway (can't skip iteration). (4) rgbwBufferMapping does brightness/gamma LUT + white extraction + reordering—moving just reordering creates code duplication. (5) Effects should express color intent (CRGB), not hardware details (offsetRed/Green/Blue). Driver-side transformation maintains separation of concerns and task load balance for scaling to 98K+ pixels.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T12:38:06.470Z
Learning: In MoonLight: VirtualLayer.size is ONLY synchronized with PhysicalLayer.lights.header.size during pass 2 (VirtualLayer::onLayoutPre at line 349 sets size = layerP->lights.header.size). If pass 2 (requestMapVirtual) doesn't run, VirtualLayer.size remains at its default {16, 16, 1}, causing effects to allocate buffers and iterate over the wrong dimensions. Pass 2 must always run after pass 1 to ensure VirtualLayer gets the correct size from PhysicalLayer.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-07T14:16:03.228Z
Learning: VirtualLayer mappingTableSize is uint16_t with a maximum of 65535 LEDs. If a user defines a space where size.x * size.y * size.z exceeds 65535, the code accepts the overflow and performs mapping partially. This is an explicit design decision balancing performance and memory constraints.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-18T17:07:33.337Z
Learning: FastLED library does not support direct 4-byte-per-pixel (CRGBW) buffers. It always expects CRGB (3-byte) arrays and internally generates the W channel from RGB using .setRgbw() API. Casting a 4-byte RGBW buffer to CRGB* causes misalignment (W becomes next pixel's R). For MoonLight's FastLED driver with channelsPerLight=4: must either (1) repack RGB in-place before FastLED.show() then restore after, (2) use driver-specific channelsPerLight=3 layout, or (3) allocate temporary RGB-only buffer. Option 1 (in-place repacking) is best for ESP32-D0 memory constraints.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR `#84`: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T15:13:55.357Z
Learning: In MoonLight effectTask (src/main.cpp): The memcpy(channelsE, channelsD) at line 137 happens AFTER releasing swapMutex, allowing driverTask to swap the channelsD/channelsE pointers mid-copy, causing buffer corruption. This specifically breaks effects like GameOfLife that use getRGB() to read previous-frame data for blending decisions. Other effects work because they only read current-frame data they just wrote. Fix: hold swapMutex during the entire memcpy operation to prevent pointer swap during copy.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai, regarding

The bug I identified is still valid: The current code uses hardcoded offsetRGBW + 3 in setWhite/getWhite (lines 163, 254), which breaks WRGB preset where offsetWhite=0. The fix should be offsetRGBW + offsetWhite (as shown in your commented-out code at line 166).

I disagree because in the channels array, starting at offsetRGBW, r is always 0, g is always 1, blue is always 2 and white is always 3, RGB comes from CRGB and we add white to position 3. we talked about that regarding caching memory and so on. Please verify

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how rgbwBufferMapping reads from the lightsRGBChannel parameter
rg -n -A20 'void DriverNode::rgbwBufferMapping' src/MoonBase/Nodes.cpp

echo "---"

# Check how ArtnetOut calls rgbwBufferMapping to see what pointer is passed
rg -n -B2 -A2 'rgbwBufferMapping.*channelsD' src/MoonLight/Nodes/Drivers/D_ArtnetOut.h | head -30

echo "---"

# Check how parlio calls rgbwBufferMapping
rg -n -B3 -A3 'rgbwBufferMapping' src/MoonLight/Nodes/Drivers/parlio.cpp | head -40

Length of output: 2497


@ewowi, you're absolutely right, and I apologize for the confusion! I've verified the implementation and can now confirm:

Verified architecture (your understanding is correct):

  • channelsE/channelsD always store RGBW in canonical order at fixed positions: offsetRGBW+0 (R), offsetRGBW+1 (G), offsetRGBW+2 (B), offsetRGBW+3 (W)
  • offsetRed/Green/Blue/White are used only by drivers for output reordering to match hardware requirements

Evidence from rgbwBufferMapping (Nodes.cpp:576-592):

uint8_t red = lightsRGBChannel[0];    // reads from channelsD at fixed [0]
uint8_t green = lightsRGBChannel[1];  // reads from channelsD at fixed [1]
uint8_t blue = lightsRGBChannel[2];   // reads from channelsD at fixed [2]
uint8_t white = lightsRGBChannel[3];  // reads from channelsD at fixed [3]

// Then writes to OUTPUT using offsetRed/Green/Blue/White for reordering
packetRGBChannel[layerP.lights.header.offsetRed] = ledsDriver.__red_map[red];

Your implementation is correct:

  • setWhite using offsetRGBW + 3 writes W to the canonical position in channelsE ✅
  • Effects write contiguous RGBW for cache efficiency ✅
  • Drivers reorder on output for hardware requirements ✅

I was incorrectly thinking offsetRed/Green/Blue/White affected channelsE layout, when they only affect driver output. Thank you for the correction!

🐰✨


🧬 Code Graph Analysis Results
  • File: src/MoonLight/Layers/VirtualLayer.h, lines ~140-150
    setWhite(const nrOfLights_t indexV, const uint8_t value) {
    if (layerP->lights.header.offsetWhite != UINT8_MAX) {
    // if (layerP->lights.header.channelsPerLight == 4)
    setLight(indexV, layerP->lights.header.offsetRGBW + 3, value); // for rgbw lights, w always in third channel
    // else
    // setLight(indexV, layerP->lights.header.offsetRGBW + layerP->lights.header.offsetWhite, value); // for moving heads
    }
    }

  • File: src/MoonLight/Layers/VirtualLayer.h, lines ~150-156
    setWhite(Coord3D pos, const uint8_t value) { setWhite(XYZ(pos), value); }

  • File: src/MoonLight/Layers/VirtualLayer.h, lines ~133-138
    uint8_t getWhite(const nrOfLights_t indexV) { return getLight(indexV, layerP->lights.header.offsetRGBW + 3); }

  • File: src/MoonLight/Layers/VirtualLayer.h, lines ~136-140
    uint8_t getWhite(Coord3D pos) { return getWhite(XYZ(pos)); }


✏️ Learnings added
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 109
File: src/MoonLight/Layers/VirtualLayer.h:160-167
Timestamp: 2026-01-19T11:04:13.149Z
Learning: In MoonLight channel buffer architecture: channelsE and channelsD always store RGBW in canonical fixed order at offsetRGBW+0 (R), offsetRGBW+1 (G), offsetRGBW+2 (B), offsetRGBW+3 (W). The offsetRed/Green/Blue/White fields are NOT used for channelsE/D layout—they are used ONLY by drivers (rgbwBufferMapping, parlio, etc.) to reorder data when writing to hardware output buffers. This keeps effects cache-friendly (contiguous RGB writes) while allowing drivers to handle hardware-specific color orders like GRB, BGR, WRGB, etc.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:164-171
Timestamp: 2026-01-06T15:24:17.118Z
Learning: In VirtualLayer::setLight and VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking in the unmapped path (when indexV >= mappingTableSize or oneToOneMapping is true) is intentionally disabled for performance when driving very large LED counts (100K+). Effect developers are responsible for ensuring their code stays within bounds. This is an explicit design decision prioritizing performance over defensive checks in the hot path.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 93
File: src/MoonLight/Layers/VirtualLayer.cpp:212-221
Timestamp: 2026-01-06T15:24:29.770Z
Learning: In VirtualLayer::getLight (src/MoonLight/Layers/VirtualLayer.cpp), bounds checking for unmapped lights (the else branch around lines 212-221) is intentionally disabled for performance when driving 100K+ LEDs. The design places responsibility on effect developers to stay within maxChannels. This is an explicit performance vs safety tradeoff for extreme LED counts.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-18T12:49:07.286Z
Learning: In MoonLight channel reordering architecture: Keep rgbwBufferMapping in driver task (not effect task) because: (1) Driver has spare CPU cycles waiting for DMA/I2S/Art-Net timing; effect task is CPU-bound and starvation-prone. (2) Contiguous RGB writes in effects are cache-friendly; scattered writes at final offsets are cache-hostile for non-RGB orders (GRB, BGR, etc.). (3) forEachLightIndex already handles sparse mapping iteration; moving reordering to effects adds complexity without benefit since drivers must send full frames anyway (can't skip iteration). (4) rgbwBufferMapping does brightness/gamma LUT + white extraction + reordering—moving just reordering creates code duplication. (5) Effects should express color intent (CRGB), not hardware details (offsetRed/Green/Blue). Driver-side transformation maintains separation of concerns and task load balance for scaling to 98K+ pixels.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T12:38:06.470Z
Learning: In MoonLight: VirtualLayer.size is ONLY synchronized with PhysicalLayer.lights.header.size during pass 2 (VirtualLayer::onLayoutPre at line 349 sets size = layerP->lights.header.size). If pass 2 (requestMapVirtual) doesn't run, VirtualLayer.size remains at its default {16, 16, 1}, causing effects to allocate buffers and iterate over the wrong dimensions. Pass 2 must always run after pass 1 to ensure VirtualLayer gets the correct size from PhysicalLayer.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2025-12-07T14:16:03.228Z
Learning: VirtualLayer mappingTableSize is uint16_t with a maximum of 65535 LEDs. If a user defines a space where size.x * size.y * size.z exceeds 65535, the code accepts the overflow and performs mapping partially. This is an explicit design decision balancing performance and memory constraints.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-18T17:07:33.337Z
Learning: FastLED library does not support direct 4-byte-per-pixel (CRGBW) buffers. It always expects CRGB (3-byte) arrays and internally generates the W channel from RGB using .setRgbw() API. Casting a 4-byte RGBW buffer to CRGB* causes misalignment (W becomes next pixel's R). For MoonLight's FastLED driver with channelsPerLight=4: must either (1) repack RGB in-place before FastLED.show() then restore after, (2) use driver-specific channelsPerLight=3 layout, or (3) allocate temporary RGB-only buffer. Option 1 (in-place repacking) is best for ESP32-D0 memory constraints.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 109
File: src/MoonLight/Layers/VirtualLayer.h:160-167
Timestamp: 2026-01-19T10:55:50.356Z
Learning: In MoonLight RGBW channel architecture: offsetRGBW is the absolute channel offset where the RGBW block starts (0 for simple LED strips, non-zero for moving heads). offsetRed, offsetGreen, offsetBlue, and offsetWhite are relative offsets WITHIN the RGBW block. The actual channel position is computed as offsetRGBW + offset{Red|Green|Blue|White}. This two-level offset system allows simple LED strips (offsetRGBW=0) and complex moving heads (offsetRGBW=10+) to use the same color-order definitions (offsetRed/Green/Blue/White).

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 104
File: src/MoonLight/Nodes/Drivers/parlio.cpp:409-414
Timestamp: 2026-01-13T08:32:57.101Z
Learning: In parlio.cpp's show_parlio function, the `components` parameter (color channels per pixel: 3 for RGB, 4 for RGBW) is guaranteed by API contract to never be 0. Invalid values are considered caller errors, and defensive checks are intentionally omitted for performance. This matches the design philosophy of placing responsibility on callers in performance-critical paths.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 0
File: :0-0
Timestamp: 2026-01-15T15:13:55.357Z
Learning: In MoonLight effectTask (src/main.cpp): The memcpy(channelsE, channelsD) at line 137 happens AFTER releasing swapMutex, allowing driverTask to swap the channelsD/channelsE pointers mid-copy, causing buffer corruption. This specifically breaks effects like GameOfLife that use getRGB() to read previous-frame data for blending decisions. Other effects work because they only read current-frame data they just wrote. Fix: hold swapMutex during the entire memcpy operation to prevent pointer swap during copy.

Learnt from: ewowi
Repo: MoonModules/MoonLight PR: 84
File: src/MoonLight/Modules/ModuleLightsControl.h:367-379
Timestamp: 2025-12-28T10:28:24.157Z
Learning: In MoonLight PR `#84`: wrapping effectTask's layerP.loop() with monitorMutex while reading channelsE for monitor emission is thread-safe and prevents partial frames, but introduces serialization between effects rendering and monitor emission. Reading from channelsD (driver/front buffer) instead—with monitorMutex held only during the brief pointer swap—is superior because it maintains full parallelism (effects ║ monitor ║ driver), shows actual LED output, and minimizes mutex hold time.

@ewowi ewowi merged commit 61b2fbb into main Jan 19, 2026
2 checks passed
@ewowi ewowi deleted the channel-reorg branch January 19, 2026 11:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants