Commit d62beae

revision of C struct
1 parent c83496c commit d62beae

include/binsparse/c_bindings/binsparse_matrix.h

Lines changed: 209 additions & 18 deletions
@@ -67,19 +67,63 @@ bc_type_code ;
6767
// pointer[k] index[k] Name and description
6868
// ---------- -------- --------------------
6969
//
70-
// NULL non-NULL "Index": some entries present.
70+
// NULL NULL "ELLPACK:4", because p is a simple stride [0 4 8 ...]
71+
// a possible extension. a 2D ELLPACK:4 matrix has
72+
// 4 entries in each row, in any columns.
73+
//
74+
// axis 0: nothing, name "L:4", one number: 4
75+
//
76+
// axis 1: index, [ size 40 ]
77+
//
78+
// . x . x . x . . . x
79+
// . . x . x . x x . .
80+
// x . x x . . x . . .
81+
// . x . . x . . x x .
82+
// . . . x . x . x x .
83+
// . . . . . x x x . x
84+
// . . . x x x x . . .
85+
// . x x . x x . . . .
86+
// . . . . . . x x x x
87+
// . . . x x x . x . .
88+
// x x x . . . . . x .
89+
//
90+
//
91+
// NULL non-NULL "Index": some entries present. (Erik: "Sparse", not compressed, coo)
7192
// indices need not be in order, nor unique.
7293
// size of index [k] array is nindex [k].
7394
// in_order [k] can be true or false.
7495
//
75-
// non-NULL non-NULL "Hyper": some entries present.
96+
// NULL non-NULL "Hyper_ELLPACK:4", because p is a simple stride [0 4 8 ...]
97+
//
98+
// rows: 0 2 5, each has 4 entries
99+
//
100+
// . x . x . x . . . x
101+
// . . . . . . . . . .
102+
// x . x x . . x . . .
103+
// . . . . . . . . . .
104+
// . . . . . . . . . .
105+
// . . . . . x x x . x
106+
// . . . . . . . . . .
107+
// . . . . . . . . . .
108+
// . . . . . . . . . .
109+
// . . . . . . . . . .
110+
// . . . . . . . . . .
111+
//
112+
// (hyper-ELL:4, index)
113+
//
114+
// axis 0: index = [0 2 5]
115+
//
116+
// axis 1: index = [ size 12 ]
117+
// any given row is empty, or has exactly 4 entries.
118+
//
119+
// non-NULL non-NULL "Hyper": some entries present. (Erik: "DC" "doubly compressed")
76120
// indices must be in order and unique.
77121
// index [k] has size nindex [k]
78122
// pointer [k] has size nindex [k]+1 and must be
79123
// monotonically non-decreasing.
80124
// in_order [k] must be true.
81125
//
82-
// non-NULL NULL "Sparse": all entries present.
126+
// non-NULL NULL "Sparse": all entries present. (Erik: "Compressed" or C)
83127
// pointer [k] has size dim [k]+1.
84128
// nindex [k] not used (or can be set to
85129
// dim [k] for consistency).
@@ -319,14 +363,153 @@ bc_type_code ;
319363
// axis, since all objects to the right have the same size.
320364
//
321365
// (5) Like rule 1, once "Index" appears, the remaining formats to the right
322-
// must be "Index" or "Full". This is because "Index" has no pointer so
323-
// all formats to the right must have a known size, or be a list like
324-
// (Index, Index, Full) where the total size is given nindex [...].
366+
// must be "Sparse", "Index" or "Full". This is because "Index" has no
367+
// pointer so all formats to the right must have a known size, or be a
368+
// list like (Index, Index, Full) where the total size is given by
369+
// nindex [...]. "Sparse" has known size: it is the entire dimension.
370+
//
371+
// (6) (..., Hyper, Sparse, ...) can be defined but is not useful.
372+
// The same can be done with (..., Index, Sparse, ...) by just deleting
373+
// the pointer for the Hyper axis. The pointer vector contains a
374+
// list of constant stride (see below).
375+
376+
/*
377+
10-by-10-by-10: suppose the 1st dimension is empty except for 0,2,5
378+
suppose the axis order is 0,1,2 (all "by row")
379+
380+
axis 0: entry 0: a 2D matrix, containing 5 entries (say by row)
381+
. . . . . . . . . .
382+
. . x . . . . . . .
383+
. . . . . . . . . .
384+
. . . x x . x . . .
385+
. . . . . . . . . .
386+
. . . . . . . . . .
387+
. . . . . . x . . .
388+
. . . . . . . . . .
389+
. . . . . . . . . .
390+
. . . . . . . . . .
391+
392+
axis 0: entry 2: a 2D matrix, containing 7 entries
393+
. . . . . . . . . .
394+
. . . . . . . . . .
395+
. . . . . . . . . .
396+
. x . . . . . . x .
397+
. . . . . . . . . .
398+
. . . . . . . . . .
399+
. . . . . x x x x .
400+
. . . . . . . . . .
401+
. x . . . . . . . .
402+
. . . . . . . . . .
403+
404+
axis 0: entry 5: a 2D matrix, containing 3 entries
405+
. . . . . . . . . .
406+
x . . x . . . . . .
407+
. . . . . x . . . .
408+
. . . . . . . . . .
409+
. . . . . . . . . .
410+
. . . . . . . . . .
411+
. . . . . . . . . .
412+
. . . . . . . . . .
413+
. . . . . . . . . .
414+
. . . . . . . . . .
415+
416+
(Hyper, Sparse, Index): can be specified but has some useless info
417+
0 2D matrix, 10-by-10, CSR, 5 entries
418+
2 2D matrix, 10-by-10, CSR, 7 entries
419+
5 2D matrix, 10-by-10, CSR, 3 entries
420+
421+
axis0: index(0) = [0, 2, 5], pointer(0) = [0 10 20 31=end], len = 3
422+
Note that pointer(0), an array of size 3+1, is useless since
423+
the next axis is "Sparse" so each has fixed size (of 10 each)
424+
425+
axis1: pointer(1) = an array of size 31, since there are 3 objects
426+
in the axis0 dimension. Each object is a pointer of size 10
427+
plus one end marker.
428+
429+
pointer(1) = [ 0 0 1 1 4 4 4 5 5 5 5 5 5 5 7 7 7 11 11 12 12 12 14 15 15 15 15 15 15 15 15 ]
430+
431+
0 1 2 3 4 5 6 7 8 9 -
432+
. . . . . . . . . . 0 <= pointer for this 2D slice
433+
. . x . . . . . . . 0
434+
. . . . . . . . . . 1
435+
. . . x x . x . . . 1
436+
. . . . . . . . . . 4
437+
. . . . . . . . . . 4
438+
. . . . . . x . . . 4
439+
. . . . . . . . . . 5
440+
. . . . . . . . . . 5
441+
. . . . . . . . . . 5
442+
443+
0 1 2 3 4 5 6 7 8 9 -
444+
. . . . . . . . . . 5
445+
. . . . . . . . . . 5
446+
. . . . . . . . . . 5
447+
. x . . . . . . x . 5
448+
. . . . . . . . . . 7
449+
. . . . . . . . . . 7
450+
. . . . . x x x x . 7
451+
. . . . . . . . . . 11
452+
. x . . . . . . . . 11
453+
. . . . . . . . . . 12
454+
455+
0 1 2 3 4 5 6 7 8 9 -
456+
. . . . . . . . . . 12
457+
x . . x . . . . . . 12
458+
. . . . . x . . . . 14
459+
. . . . . . . . . . 15
460+
. . . . . . . . . . 15
461+
. . . . . . . . . . 15
462+
. . . . . . . . . . 15
463+
. . . . . . . . . . 15
464+
. . . . . . . . . . 15
465+
. . . . . . . . . . 15
466+
15 <= end marker
467+
468+
axis2: index(2) = [ 2 3 4 6 6 1 8 5 6 7 8 1 0 3 5 ]
469+
an array of size 15
470+
471+
10-by-10-by-10
472+
(Index, Sparse, Index)
473+
Erik: (S-C-S)
474+
0 2D matrix, 10-by-10, CSR, 5 entries
475+
2 2D matrix, 10-by-10, CSR, 7 entries
476+
5 2D matrix, 10-by-10, CSR, 3 entries
477+
478+
same as above, but drop pointer(0) as not needed. So
479+
this is better than (Hyper, Sparse, Index).
480+
481+
Consider duplicates:
482+
483+
10-by-10-by-10
484+
(Index, Sparse, Index): with duplicate in axis 0.
485+
Erik: (S-C-S)
486+
0 2D matrix, 10-by-10, CSR, 5 entries
487+
5 2D matrix, 10-by-10, CSR, 7 entries
488+
5 2D matrix, 10-by-10, CSR, 3 entries
489+
490+
Here, the A(5,:,:) matrix is specified twice, so
491+
A(5,:,:) is the sum of both 2D matrices, with a total
492+
of 7 to 10 entries. A dup operator can be specified,
493+
or implied.
494+
495+
consider a 10-by-20-by-30-by-40 tensor:
496+
497+
(Index, Index, Hyper, Index) ugly with hack: requires look-ahead,
498+
group order to indices. Not allowed in this proposed format.
499+
500+
(Index, Sparse, Hyper, Index) fine
501+
502+
(Index, Index, Sparse, Index) fine
503+
504+
etc.
505+
506+
(Sparse, Hyper, Index, Index) fine
507+
*/
325508

326509
/*
327510
LANGUAGE OF VALID FORMATS
328511
329-
These 5 rules lead to a simple finite-state machine that descibes the language
512+
These 6 rules lead to a simple finite-state machine that describes the language
330513
of valid formats. The starting state (0th rank) can be any of the four
331514
formats. Each state has a self-loop (not shown). The end state of the
332515
language must be Index or Full.
@@ -337,14 +520,14 @@ language must be Index or Full.
337520
| fixed size
338521
339522
"Sparse" "Index"
340-
(pointer present -------------------> no pointer
523+
(pointer present <------------------> no pointer
341524
no index. index present
342525
size is size is
343-
dim [k] <---\ /---> nindex[k]
344-
\ \ / \
345-
\ \ / \
346-
\ \ / \
347-
\ \ / \
526+
dim [k] /---> nindex[k]
527+
\ / \
528+
\ / \
529+
\ / \
530+
\ / \
348531
\ "Hyper" / ---> "Full"
349532
\-----> (both pointer no pointer
350533
and index. no index
@@ -356,34 +539,42 @@ language must be Index or Full.
356539
NO INDEX | INDEX IS PRESENT | NO INDEX
357540
must be | in order if axis[k].in_order | must be
358541
in order | is true, unordered if false | in order
542+
but I would say "Hyper" must
543+
be in order with no duplicates.
359544
360545
361546
That is, the format can start with any mix of Sparse and/or Hyper (or none of
362547
them), in any order. These formats have pointers so the size of the objects to
363548
the right of them can vary in size.
364549
365-
The Sparse and Hyper formats have a pointer, so the objects they describe to
366-
the right of them in axis k+1 have variable sizes.
550+
The Sparse and Hyper formats have a pointer, so if the axis k is Sparse or Hyper,
551+
the objects they describe to the right of them in axis k+1 can have variable
552+
sizes (any format, but only Hyper has variable size).
367553
368554
The Index and Full formats have no pointer, so the objects they describe
369-
in their axes and the axes to the right of them must have a fixed size.
555+
in their axes and the next axis to the right of them must have a fixed size
556+
(that is, Sparse, Index, or Full, but not Hyper).
370557
371558
The Sparse and Full formats have no index, so their own size must be dim [k]
372559
if they describe the kth axis. "Sparse" is short-hand for a dense list of
373560
objects, each of variable size. "Full" is short-hand for a dense list of
374561
objects of fixed size.
375562
563+
Regarding duplicates/out-of-order: I think only the Index type of axis
564+
should allow for duplicates and out-of-order indices. Duplicates are
565+
meant to be summed, in any axis.
566+
376567
*/
377568

378569
// rank = 3
379570
//
380-
// describe some for future extensions. 12 possible formats:
571+
// possible formats:
381572

382573
// (Index , Index , Index) all COO
574+
// (Index , Sparse, Index) 1D list of 2D CSR/CSC matrices
383575

384576
// (Hyper , Index , Index) 1D hyperlist of 2D COO matrices
385577
// (Hyper , Hyper , Index) 1D hyperlist of 2D hypersparse mtx
386-
// (Hyper , Sparse, Index) 1D hyperlist of 2D CSR/CSC matrices
387578

388579
// (Sparse, Index , Index) 1D dense array of 2D COO matrices
389580
// (Sparse, Hyper , Index) 1D dense array of 2D hypersparse
