let's talk about encodings #21

@dankamongmen

What I see of the payload description here worries me.

I have found over the past year of Notcurses testing that performance in the large is absolutely dominated by the number of bytes transmitted. This is magnified when the application generating the data is remote. I collect a good amount of data on how many bytes are transmitted. Here's notcurses-demo's xray demo, which plays a video scaled to the terminal size:

Kitty

[schwarzgerat](0) $ ./notcurses-demo -p ../data/ x -d0

 notcurses 2.3.4 by nick black et al on Kitty
  70 rows (20px) 80 cols (10px) (87.50KiB) 48B crend 256 colors+RGB
  compiled with gcc-10.2.1 20210110, 16B little-endian cells
  terminfo from ncurses 6.2.20201114
  avformat 58.79.100 avutil 56.74.100 swscale 5.10.100

490 renders, 72.00ms (81.60µs min, 146.94µs avg, 202.88µs max)
490 rasters, 30.39ms (26.76µs min, 62.02µs avg, 74.57µs max)
490 writes, 9.46s (22.94µs min, 19.30ms avg, 40.38ms max)
892.36MiB (25B min, 1.82MiB avg, 1.84MiB max)
0 failed renders, 0 failed rasters, 0 refreshes
RGB emits:elides: def 950:106400 fg 6894:104161 bg 883:7785
Cell emits:elides: 116018/5371982 (97.89%) 99.12% 93.79% 89.81%
Sprixel emits:elides: 486/0 (0.00%)

             runtime│ frames│output(B)│    FPS│%r│%a│%w│TheoFPS║
══╤════════╤════════╪═══════╪═════════╪═══════╪══╪══╪══╪═══════╣
 1│    xray│  16.07s│    486│ 892.35Mi│   30.2│ 0│ 0│58│  50.83║
══╧════════╧════════╪═══════╪═════════╪═══════╧══╧══╧══╧═══════╝
              16.07s│    486│ 892.35Mi│
[schwarzgerat](0) $ 

Alacritty

[schwarzgerat](0) $ ./notcurses-demo -p ../data/ x -d0

 notcurses 2.3.4 by nick black et al on Alacritty
  70 rows (20px) 80 cols (10px) (87.50KiB) 48B crend 256 colors+RGB
  compiled with gcc-10.2.1 20210110, 16B little-endian cells
  terminfo from ncurses 6.2.20201114
  avformat 58.79.100 avutil 56.74.100 swscale 5.10.100

490 renders, 63.35ms (68.76µs min, 129.28µs avg, 228.41µs max)
490 rasters, 27.58ms (22.51µs min, 56.28µs avg, 115.39µs max)
490 writes, 457.11ms (20.04µs min, 932.87µs avg, 2.63ms max)
18.44MiB (25B min, 38.54KiB avg, 83.46KiB max)
0 failed renders, 0 failed rasters, 0 refreshes
RGB emits:elides: def 956:106394 fg 6919:104274 bg 911:7895
Cell emits:elides: 116156/5371844 (97.88%) 99.11% 93.78% 89.65%
Sprixel emits:elides: 486/0 (0.00%)

             runtime│ frames│output(B)│    FPS│%r│%a│%w│TheoFPS║
══╤════════╤════════╪═══════╪═════════╪═══════╪══╪══╪══╪═══════╣
 1│    xray│  16.37s│    486│  18.44Mi│   29.7│ 0│ 0│ 2│ 887.90║
══╧════════╧════════╪═══════╪═════════╪═══════╧══╧══╧══╧═══════╝
              16.37s│    486│  18.44Mi│
[schwarzgerat](0) $

Note that the Alacritty notcurses-demo was idle 98% of the time, while the Kitty one was idle 41% of the time. This is directly due to the 874MiB transmitted to Kitty that wasn't sent to Alacritty; we sent Kitty almost 50x(!) the total amount of data, due to chunked, Base64-encoded RGBA data as opposed to raw palette-indexed data with elided transparent pixels. Note that the Notcurses sixel implementation derives an entirely new palette for each frame of multiframe data.
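To put a rough number on the encoding overhead alone, here is a back-of-the-envelope sketch in Python. It assumes the Kitty graphics protocol's documented framing (base64 payload split into chunks of at most 4096 bytes, each wrapped in its own escape sequence) and approximates the per-chunk framing with a fixed guess. The one-byte-per-pixel figure is an idealized stand-in for palette-indexed data, not the real sixel wire size. `kitty_rgba_bytes`, `indexed_bytes`, and `header_guess` are illustrative names, not Notcurses functions.

```python
import base64
import math

CHUNK = 4096  # maximum base64 bytes per Kitty graphics chunk

def kitty_rgba_bytes(width: int, height: int, header_guess: int = 16) -> int:
    """Approximate bytes on the wire for one RGBA frame sent as base64 chunks."""
    raw = width * height * 4              # 4 bytes per RGBA pixel
    encoded = 4 * math.ceil(raw / 3)      # base64 maps 3 bytes to 4 characters
    chunks = math.ceil(encoded / CHUNK)
    # Each chunk carries an escape prefix, control keys, and a terminator.
    # header_guess is an assumed average size for that framing.
    return encoded + chunks * header_guess

def indexed_bytes(width: int, height: int) -> int:
    """Idealized palette-indexed frame: one byte per pixel, nothing elided."""
    return width * height

if __name__ == "__main__":
    # Sanity check of the 4/3 expansion against the standard library.
    assert len(base64.b64encode(bytes(300))) == 400
    w, h = 800, 1400  # 80 cols x 10px by 70 rows x 20px, as in the runs above
    rgba, idx = kitty_rgba_bytes(w, h), indexed_bytes(w, h)
    print(f"base64 RGBA: {rgba} B, indexed: {idx} B, ratio {rgba / idx:.2f}x")
```

Under these assumptions the encoding by itself explains a bit more than a 5x gap (4 bytes per pixel times 4/3). The rest of the measured difference has to come from other factors, such as the elided transparent pixels mentioned above.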

The source video is about 1MB:

[schwarzgerat](0) $ ffprobe ../data/notcursesIII.mkv 
ffprobe version N-102049-gcc7943e803 Copyright (c) 2007-2021 the FFmpeg developers
  built with gcc 10 (Debian 10.2.1-6)
  configuration: 
  libavutil      56. 74.100 / 56. 74.100
  libavcodec     58.137.100 / 58.137.100
  libavformat    58. 79.100 / 58. 79.100
  libavdevice    58. 14.100 / 58. 14.100
  libavfilter     7.111.100 /  7.111.100
  libswscale      5. 10.100 /  5. 10.100
  libswresample   3. 10.100 /  3. 10.100
Input #0, matroska,webm, from '../data/notcursesIII.mkv':
  Metadata:
    title           : Untitled Project
    creation_time   : 2021-03-20T16:07:17.000000Z
    ENCODER         : Lavf58.45.100
  Duration: 00:00:16.92, start: 0.000000, bitrate: 495 kb/s
  Stream #0:0: Video: hevc (Main), yuv420p(tv, bt709), 1920x1080 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1k tbn, 30 tbc (default)
    Metadata:
      DURATION        : 00:00:16.911000000
  Stream #0:1: Audio: aac (LC), 48000 Hz, stereo, fltp (default)
    Metadata:
      title           : Stereo
      DURATION        : 00:00:16.922000000
[schwarzgerat](0) $ 

so we have a 1920x1080 video at 1MB, becoming 18MiB on an 800x1417 Alacritty, becoming 892MiB on a 809x1417 Kitty.

I find this kinda disgusting, and would very much like to see it come down.

The most effective way to transmit visual data is of course using the advanced compression schemes of image-specific formats. So it's great that we're supporting containerized data, to be decoded by the terminal. I'd like to see a focus on WebP over PNG, but whatever. With WebP, we get a format that works well for both PNG-style and JPEG-style data, with transparency and other useful features. But that's neither here nor there.

Sometimes raw data is important, especially when modifying input data on the fly, and it would be desirable to transmit this in as little space as possible. This was probably the major focus of my STEGAP proposal.

Is avoiding C1 truly necessary, especially when this is a payload, especially when the size is transmitted, especially when we're in UTF-8 mode anyway? It seems to me that this protocol is a sufficiently large endeavor that one's state machine can take a small tweak. If we can stomp on everything but 0x1B, we can get close to no overhead (see link). If we can stomp on 0x1B, that's even better.
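For a sense of scale, a minimal byte-stuffing sketch follows. It is not the linked scheme or the STEGAP proposal, only an illustration of the idea: if 0x1B is the single byte that must never appear in the payload, escaping it (plus one marker byte) costs about 2/256, or under 1%, on uniformly distributed data, versus base64's fixed 33%. The marker value 0x1A and the function names are arbitrary choices for this example.

```python
ESC = 0x1B    # the one byte the payload must never contain
MARK = 0x1A   # hypothetical stuffing marker, chosen arbitrarily

def stuff(data: bytes) -> bytes:
    """Replace ESC and MARK with two-byte sequences; pass everything else through."""
    out = bytearray()
    for b in data:
        if b == ESC:
            out += bytes((MARK, 0x01))
        elif b == MARK:
            out += bytes((MARK, 0x02))
        else:
            out.append(b)
    return bytes(out)

def unstuff(data: bytes) -> bytes:
    """Invert stuff(); raises ValueError on a malformed sequence."""
    out = bytearray()
    it = iter(data)
    for b in it:
        if b != MARK:
            out.append(b)
            continue
        code = next(it, None)
        if code == 0x01:
            out.append(ESC)
        elif code == 0x02:
            out.append(MARK)
        else:
            raise ValueError("malformed stuffed stream")
    return bytes(out)

if __name__ == "__main__":
    sample = bytes(range(256)) * 4
    packed = stuff(sample)
    assert ESC not in packed and unstuff(packed) == sample
    overhead = len(packed) / len(sample) - 1
    print(f"{len(sample)} B -> {len(packed)} B ({overhead:.2%} overhead)")
```

The worst case for a scheme like this is 2x (a payload made entirely of the two escaped values), so it trades base64's predictable expansion for a data-dependent one.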

Another issue I'd like to bring up is a bit unorthodox, and maybe this isn't important if we have a sufficient z-indexing/layering story: rather than using pure row-major order across the entirety of the graphic, it would be convenient for me (and maybe for terminal authors) to transmit row-major order within a cell's area. I.e. if the cell-pixel geometry is 20 tall, 10 wide, and i have a graphic that is 50 tall and 20 wide, i'd like to transmit:

  • 20 rows of 10 pixels (0, 0) - (9, 19)
  • 20 rows of 10 pixels (10, 0) - (19, 19)
  • 20 rows of 10 pixels (0, 20) - (9, 39)
  • 20 rows of 10 pixels (10, 20) - (19, 39)
  • 10 rows of 10 pixels (0, 40) - (9, 49)
  • 10 rows of 10 pixels (10, 40) - (19, 49)
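The ordering listed above can be generated mechanically. This is a small sketch with a hypothetical helper name (`cell_tiles`), assuming pixel bounds are inclusive and cells are visited left to right, then top to bottom:

```python
from typing import Iterator, Tuple

def cell_tiles(width: int, height: int,
               cellw: int, cellh: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield inclusive (x0, y0, x1, y1) pixel bounds, one tile per cell, cell-major."""
    for y0 in range(0, height, cellh):
        y1 = min(y0 + cellh, height) - 1      # the bottom cell row may be short
        for x0 in range(0, width, cellw):
            x1 = min(x0 + cellw, width) - 1   # the right cell column may be narrow
            yield (x0, y0, x1, y1)

if __name__ == "__main__":
    # Cell geometry 10 wide by 20 tall, graphic 20 wide by 50 tall.
    for x0, y0, x1, y1 in cell_tiles(20, 50, 10, 20):
        print(f"{y1 - y0 + 1} rows of {x1 - x0 + 1} pixels ({x0}, {y0}) - ({x1}, {y1})")
```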

why? because here's my methodology:

  • get input and convert it to a standalone RGBA matrix (ncvisual_from_*())
  • optionally scale the RGBA (ncvisual_render())
  • march through row-major image
    • at each pixel, determine what cell we're in
    • look at TAM (transparency-annihilation matrix) to see if our cell is "cut out"
    • if we're cut out, encode a 0 alpha / transparent pixel in this pixel's place
    • encode to kitty/sixel (row-major byte stream; indexing into this byte stream is complex)
  • at some point later, this glyph is written out ("rasterized") as part of a frame
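The marching step in the list above can be sketched roughly as follows. This is not the Notcurses implementation: the RGBA matrix is modeled as nested Python lists, the TAM is reduced to a set of cut-out cell coordinates, and the final kitty/sixel encoding is left out. `apply_cutouts` and its parameters are names invented for the illustration.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int, int, int]  # (r, g, b, a)

def apply_cutouts(rgba: List[List[Pixel]], cellw: int, cellh: int,
                  cut: Set[Tuple[int, int]]) -> bytes:
    """March a row-major RGBA matrix, zeroing alpha in cut-out cells; return raw bytes."""
    out = bytearray()
    for y, row in enumerate(rgba):
        for x, (r, g, b, a) in enumerate(row):
            cell = (x // cellw, y // cellh)   # which cell this pixel falls in
            if cell in cut:                   # stand-in for the TAM lookup
                a = 0                         # emit a transparent pixel instead
            out += bytes((r, g, b, a))
    return bytes(out)

if __name__ == "__main__":
    image = [[(10, 20, 30, 255)] * 4 for _ in range(2)]   # 4x2 opaque image
    raw = apply_cutouts(image, cellw=2, cellh=2, cut={(1, 0)})
    print(list(raw[3::4]))   # alpha channel, row-major
```

The output is still a row-major stream, which is exactly why a cutout in a later frame touches many separate ranges: one short run per pixel row of the affected cell.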

now, maybe in the next frame, i have a cell i need cut out from the graphic (again, z-indexing/layering can let me work around the need to do so). if i have a row-major byte stream (or even worse, a row-major sixel stream), this is something of an ungodly mess. i mean, i can do it--i have code to do it--but it's a ton of in-place editing and mangling.

if i was transmitting cell-major, and then row-major within that, i could blast or rebuild one region at a time. given that we've already encoded to an arbitrary byte stream at this point, we needn't worry about what would otherwise be cache-suboptimal behavior (if we encode to cell-major, we work cell-major).
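To illustrate why cell-major order makes a region easy to blast, here is a hedged sketch of the indexing on a raw RGBA buffer (before any transport encoding). With cell-major layout each cell occupies one contiguous byte range, so a cutout becomes a single slice assignment. `cell_span` and `blank_cell` are hypothetical helpers, and 4 bytes per pixel is an assumption.

```python
from typing import Tuple

def cell_span(cx: int, cy: int, width: int, height: int,
              cellw: int, cellh: int, bpp: int = 4) -> Tuple[int, int]:
    """Return (byte offset, byte length) of cell (cx, cy) in a cell-major buffer."""
    x0, y0 = cx * cellw, cy * cellh
    if not (0 <= x0 < width and 0 <= y0 < height):
        raise IndexError("cell outside graphic")
    w = min(cellw, width - x0)     # edge cells may be narrower
    h = min(cellh, height - y0)    # ...or shorter
    # Pixels before this cell: every cell row above, plus the cells to its left.
    before = y0 * width + h * x0
    return before * bpp, w * h * bpp

def blank_cell(buf: bytearray, cx: int, cy: int, width: int, height: int,
               cellw: int, cellh: int) -> None:
    """Make one cell fully transparent by zeroing its contiguous byte range."""
    off, length = cell_span(cx, cy, width, height, cellw, cellh)
    buf[off:off + length] = bytes(length)

if __name__ == "__main__":
    buf = bytearray(b"\xff" * (20 * 50 * 4))   # 20x50 graphic, all opaque white
    blank_cell(buf, 1, 2, 20, 50, 10, 20)      # cut out the bottom-right cell
    print(cell_span(1, 2, 20, 50, 10, 20))
```

Whether these spans line up with boundaries in the encoded stream depends on the encoding. With base64, for example, a cell's byte length would need to be a multiple of 3 for its encoded form to be independently replaceable.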

Thoughts?

tl;dr: base64 encoding our RGBA is not free, not one bit

Just to see how graphics tend to dominate TUI bandwidth, even when lightly used, here's full sample output from notcurses-demo. The only demos making extensive use of bitmaps are xray, view, and yield:

Alacritty

[schwarzgerat](0) $ ./notcurses-demo -p ../data/ -d0

 notcurses 2.3.4 by nick black et al on Alacritty
  70 rows (20px) 80 cols (10px) (87.50KiB) 48B crend 256 colors+RGB
  compiled with gcc-10.2.1 20210110, 16B little-endian cells
  terminfo from ncurses 6.2.20201114
  avformat 58.79.100 avutil 56.74.100 swscale 5.10.100

9747 renders, 1.20s (68.52µs min, 122.78µs avg, 23.15ms max)
9747 rasters, 363.70ms (22.44µs min, 37.31µs avg, 137.04µs max)
9747 writes, 11.43s (19.55µs min, 1.17ms avg, 58.39ms max)
412.90MiB (12B min, 43.38KiB avg, 1.02MiB max)
0 failed renders, 0 failed rasters, 14 refreshes
RGB emits:elides: def 59919:771879 fg 7130908:4164985 bg 8410862:2598805
Cell emits:elides: 11825686/97022661 (89.14%) 92.80% 36.87% 23.60%
Sprixel emits:elides: 1991/130 (6.13%)

             runtime│ frames│output(B)│    FPS│%r│%a│%w│TheoFPS║
══╤════════╤════════╪═══════╪═════════╪═══════╪══╪══╪══╪═══════╣
 1│   intro│193.25ms│     41│   2.57Mi│  212.2│ 5│ 0│51│ 366.55║
 2│    xray│  16.64s│    486│  18.44Mi│   29.2│ 0│ 0│ 2│ 886.44║
 3│   eagle│   1.30s│    159│  14.91Mi│  122.3│ 1│ 0│21│ 513.32║
 4│   trans│  4.42ms│      9│ 156.95Ki│ 2038.3│23│ 6│51│  2.53K║
 5│  normal│118.67ms│    136│   2.85Mi│ 1146.0│14│ 3│38│  2.03K║
 6│  chunli│ 49.30ms│     53│   1.54Mi│ 1075.1│14│ 3│43│  1.78K║
 7│ highcon│603.78ms│   2760│ 677.62Ki│ 4571.2│52│21│10│  5.41K║
 8│  dragon│ 19.23ms│     17│  84.05Ki│  884.2│ 7│ 4│ 6│  4.81K║
 9│mojibake│212.38ms│    471│   9.46Mi│ 2217.7│25│ 7│58│  2.42K║
10│     box│307.04ms│    100│   8.85Mi│  325.7│17│ 1│64│ 392.56║
11│  keller│516.82ms│     24│   3.32Mi│   46.4│ 3│ 0│32│ 128.44║
12│   yield│  23.53s│    209│  58.35Mi│    8.9│ 0│ 0│20│  43.55║
13│    grid│   1.94s│    768│ 126.93Mi│  396.8│ 4│ 0│81│ 459.38║
14│ animate│421.20ms│   1668│   2.78Mi│ 3960.1│42│12│14│  5.71K║
15│    reel│  5.49ms│      1│  85.93Ki│  182.2│ 2│ 0│36│ 460.72║
16│whiteout│  9.03ms│     14│ 122.18Ki│ 1550.6│18│ 5│19│  3.55K║
17│uniblock│ 29.01ms│     70│ 870.82Ki│ 2413.0│27│ 9│23│  4.07K║
18│    view│  12.57s│    753│ 119.95Mi│   59.9│ 0│ 0│23│ 245.28║
19│   luigi│ 42.09ms│    113│   1.73Mi│ 2684.8│31│ 8│45│  3.16K║
20│ sliders│ 45.84ms│    201│ 489.23Ki│ 4384.7│50│13│17│  5.40K║
21│ fallin'│118.74ms│    415│   2.41Mi│ 3495.1│42│10│24│  4.47K║
22│  jungle│ 38.81ms│      4│ 104.75Ki│  103.1│ 1│ 0│11│ 750.61║
23│  qrcode│   1.28s│   1024│  18.30Mi│  799.6│ 8│ 2│17│  2.81K║
24│     zoo│  8.42ms│      3│ 170.70Ki│  356.5│ 4│ 1│34│ 879.41║
25│   outro│   3.31s│    244│  17.81Mi│   73.6│ 0│ 0│10│ 647.73║
══╧════════╧════════╪═══════╪═════════╪═══════╧══╧══╧══╧═══════╝
              63.31s│   9743│ 412.89Mi│
[schwarzgerat](0) $

these three demos were responsible for ~48% of the bytes, but only 14.9% of the frames.

Kitty


[schwarzgerat](0) $ ./notcurses-demo -p ../data/ -d0

 notcurses 2.3.4 by nick black et al on Kitty
  70 rows (20px) 80 cols (10px) (87.50KiB) 48B crend 256 colors+RGB
  compiled with gcc-10.2.1 20210110, 16B little-endian cells
  terminfo from ncurses 6.2.20201114
  avformat 58.79.100 avutil 56.74.100 swscale 5.10.100

9745 renders, 1.17s (69.03µs min, 120.19µs avg, 1.75ms max)
9745 rasters, 392.57ms (22.50µs min, 40.28µs avg, 125.58µs max)
9745 writes, 19.99s (20.08µs min, 2.05ms avg, 179.71ms max)
1.90GiB (6B min, 204.64KiB avg, 5.55MiB max)
0 failed renders, 0 failed rasters, 14 refreshes
RGB emits:elides: def 60562:765349 fg 7365396:4020342 bg 8275773:2814661
Cell emits:elides: 11901150/96871498 (89.06%) 92.67% 35.31% 25.38%
Sprixel emits:elides: 1654/124 (6.97%)

             runtime│ frames│output(B)│    FPS│%r│%a│%w│TheoFPS║
══╤════════╤════════╪═══════╪═════════╪═══════╪══╪══╪══╪═══════╣
 1│   intro│240.41ms│     41│  12.64Mi│  170.5│ 3│ 0│83│ 194.84║
 2│    xray│  16.53s│    486│ 892.36Mi│   29.4│ 0│ 0│56│  51.77║
 3│   eagle│   1.14s│    159│  14.83Mi│  139.3│ 1│ 0│11│  1.03K║
 4│   trans│  5.20ms│      9│ 146.03Ki│ 1729.4│19│ 5│56│  2.11K║
 5│  normal│128.03ms│    136│   2.87Mi│ 1062.3│13│ 3│39│  1.86K║
 6│  chunli│ 54.96ms│     53│   1.54Mi│  964.3│12│ 3│49│  1.48K║
 7│ highcon│605.88ms│   2760│ 665.68Ki│ 4555.4│51│22│10│  5.39K║
 8│  dragon│ 24.71ms│     17│  86.11Ki│  688.1│ 5│ 3│ 4│  5.21K║
 9│mojibake│485.05ms│    471│   9.75Mi│  971.0│12│ 3│79│  1.01K║
10│     box│490.91ms│    100│  28.29Mi│  203.7│ 3│ 0│83│ 231.49║
11│  keller│263.80ms│     24│  11.02Mi│   91.0│ 1│ 0│48│ 181.65║
12│   yield│   8.77s│    197│ 549.28Mi│   22.5│ 0│ 0│57│  38.59║
13│    grid│   2.55s│    768│ 127.00Mi│  300.6│ 4│ 0│81│ 346.48║
14│ animate│440.58ms│   1668│   2.78Mi│ 3785.9│41│12│16│  5.32K║
15│    reel│  4.16ms│      1│  85.64Ki│  240.2│ 3│ 0│13│  1.40K║
16│whiteout│ 11.28ms│     14│ 123.10Ki│ 1240.9│14│ 4│36│  2.24K║
17│uniblock│ 28.95ms│     70│ 890.58Ki│ 2417.6│26│ 8│24│  4.00K║
18│    view│  10.78s│    753│ 251.32Mi│   69.9│ 0│ 0│14│ 443.48║
19│   luigi│ 60.38ms│    113│   1.76Mi│ 1871.6│23│ 6│47│  2.41K║
20│ sliders│ 49.50ms│    201│ 487.45Ki│ 4060.6│46│13│23│  4.91K║
21│ fallin'│142.52ms│    425│   2.47Mi│ 2982.1│36│ 9│30│  3.85K║
22│  jungle│ 37.45ms│      4│ 104.42Ki│  106.8│ 1│ 0│ 7│  1.13K║
23│  qrcode│   1.31s│   1024│  18.36Mi│  783.5│ 8│ 2│19│  2.61K║
24│     zoo│ 13.05ms│      3│ 171.83Ki│  230.0│ 2│ 0│ 8│  1.95K║
25│   outro│   4.07s│    244│  18.53Mi│   60.0│ 0│ 0│ 3│  1.31K║
══╧════════╧════════╪═══════╪═════════╪═══════╧══╧══╧══╧═══════╝
              48.23s│   9741│   1.90Gi│
[schwarzgerat](0) $ 

here, these three demos accounted for ~87%(!) of transmitted bytes, but only 14.7% of the frames.

i hope that this demonstrates the value of cutting down image bytes.
