Commit 1180868
## Description
This follows the same approach as #58659.
The `Unique` aggregator uses the `pyarrow.compute.unique` function
internally, which does not work with non-hashable types such as lists.
As I did for `ApproximateTopK`, we now use pickle to serialize and
deserialize elements.
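
The pickle round trip described above can be sketched in plain Python. This is a minimal illustration of the idea, not the code in this commit: the helper name `unique_via_pickle` is hypothetical, and a `set` of bytes stands in for the `pyarrow.compute.unique` call on the serialized values.

```python
import pickle
from typing import Any, Iterable, List


def unique_via_pickle(values: Iterable[Any], ignore_nulls: bool = True) -> List[Any]:
    """Deduplicate values that may be unhashable (e.g. lists).

    Hypothetical helper for illustration only. Each element is pickled to
    bytes, the bytes are deduplicated, and the survivors are unpickled.
    """
    seen = set()
    result = []
    for value in values:
        if value is None and ignore_nulls:
            continue  # drop nulls before deduplicating
        key = pickle.dumps(value)  # bytes are hashable even when value is not
        if key not in seen:
            seen.add(key)
            result.append(pickle.loads(key))  # deserialize back to the element
    return result


rows = [[1, 2], [3], [1, 2], None, [3], None]
print(unique_via_pickle(rows))                      # [[1, 2], [3]]
print(unique_via_pickle(rows, ignore_nulls=False))  # [[1, 2], [3], None]
```

One caveat of this approach: equality is decided on the serialized bytes, so values that compare equal in Python but pickle differently (for example `1` and `1.0`) are treated as distinct.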
Other improvements:
- The `ignore_nulls` flag previously had no effect. It now works as
intended.
- The datasets `unique` API now forces `ignore_nulls=False` for
backwards compatibility. `ignore_nulls` defaults to `True`, so without
this override the behavior of the datasets `unique` API would change
now that the flag actually works.
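
The backwards-compatibility point can be illustrated with a small sketch. The names `UniqueSketch` and `dataset_unique` are hypothetical stand-ins, not Ray's classes; the sketch only assumes what the description states, namely that the aggregator defaults to `ignore_nulls=True` and the dataset-level API pins it to `False`.

```python
from typing import Any, Iterable, List


class UniqueSketch:
    """Hypothetical stand-in for an aggregator with an ignore_nulls flag."""

    def __init__(self, ignore_nulls: bool = True):
        # Default mirrors the description: nulls are ignored unless told otherwise.
        self.ignore_nulls = ignore_nulls

    def aggregate(self, values: Iterable[Any]) -> List[Any]:
        out: List[Any] = []
        for value in values:
            if value is None and self.ignore_nulls:
                continue  # the flag now actually filters nulls
            if value not in out:  # equality-based check; fine for a small sketch
                out.append(value)
        return out


def dataset_unique(values: Iterable[Any]) -> List[Any]:
    """Hypothetical dataset-level helper that keeps the old, null-preserving output."""
    # Pin ignore_nulls=False so callers keep seeing None among the unique values.
    return UniqueSketch(ignore_nulls=False).aggregate(values)


column = ["a", None, "b", "a"]
print(UniqueSketch().aggregate(column))  # ['a', 'b']
print(dataset_unique(column))            # ['a', None, 'b']
```

Pinning the flag at the dataset level keeps existing callers unaffected, while direct users of the aggregator get the corrected default behavior.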
## Related issues
This PR replaces #58538
## Additional information
[Design doc on my Notion](https://www.notion.so/kyuds/Unique-Aggregator-Improvements-2b67a80e48eb80de9820edf9d4996e0a?source=copy_link)
---------
Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>
Signed-off-by: kyuds <kyuseung1016@gmail.com>
1 parent 456d190 commit 1180868
File tree
4 files changed, +98 -7 lines:
- python/ray/data
  - tests