Commit bc20ec9

New version after major refactoring
1 parent 5aa62af commit bc20ec9

9 files changed: +467 -132 lines changed

README.md

Lines changed: 30 additions & 28 deletions
@@ -12,27 +12,29 @@ python3 -m pip install multisort
 None

 ### Performance
-Average over 10 iterations with 500 rows.
+Average over 10 iterations with 1000 rows.
 Test | Secs
 ---|---
-cmp_func|0.0054
-pandas|0.0061
-reversor|0.0149
-msorted|0.0179
+superfast|0.0005
+multisort|0.0035
+pandas|0.0079
+cmp_func|0.0138
+reversor|0.037

-As you can see, if the `cmp_func` is by far the fastest methodology as long as the number of cells in the table are 500 rows for 5 columns. However for larger data sets, `pandas` is the performance winner and scales extremely well. In such large dataset cases, where performance is key, `pandas` should be the first choice.
+Hands down the fastest is the `superfast` methodology shown below. You do not need this library to accomplish this as it's just core python.

-The surprising thing from testing is that `cmp_func` far outperforms `reversor` which which is the only other methodology for multi-columnar sorting that can handle `NoneType` values.
+`multisort` from this library gives reasonable performance for large data sets; e.g. it's better than pandas up to about 5,500 records. It is also much simpler to read and write, and it has error handling that does its best to give useful error messages.

 ### Note on `NoneType` and sorting
-If your data may contain None, it would be wise to ensure your sort algorithm is tuned to handle them. This is because sorted uses `<` comparisons; which is not supported by `NoneType`. For example, the following error will result: `TypeError: '>' not supported between instances of 'NoneType' and 'str'`.
+If your data may contain None, it would be wise to ensure your sort algorithm is tuned to handle them. This is because sorted uses `<` comparisons; which is not supported by `NoneType`. For example, the following error will result: `TypeError: '>' not supported between instances of 'NoneType' and 'str'`. All examples given on this page are tuned to handle `None` values.

 ### Methodologies
 Method|Descr|Notes
 ---|---|---
-cmp_func|Multi column sorting in the model `java.util.Comparator`|Fastest for small to medium size data
-reversor|Enable multi column sorting with column specific reverse sorting|Medium speed. [Source](https://stackoverflow.com/a/56842689/286807)
-msorted|Simple one-liner designed after `multisort` [example from python docs](https://docs.python.org/3/howto/sorting.html#sort-stability-and-complex-sorts)|Slowest of the bunch but not by much
+multisort|Simple one-liner designed after `multisort` [example from python docs](https://docs.python.org/3/howto/sorting.html#sort-stability-and-complex-sorts)|Second fastest of the bunch but most configurable and easy to read.
+cmp_func|Multi column sorting in the model `java.util.Comparator`|Reasonable speed|Enable multi column sorting with column specific reverse sorting|Medium speed. [Source](https://stackoverflow.com/a/56842689/286807)
+superfast|NoneType safe sample implementation of multi column sorting as mentioned in [example from python docs](https://docs.python.org/3/howto/sorting.html#sort-stability-and-complex-sorts)|Fastest by orders of magnitude but a bit more complex to write.
+



@@ -49,39 +51,39 @@ rows_dict = [
 ]
 ```

-### `msorted`
+### `multisort`
 Sort rows_dict by _grade_, descending, then _attend_, ascending and put None first in results:
 ```
-from multisort import msorted
-rows_sorted = msorted(rows_dict, [
-    ('grade', {'reverse': False, 'none_first': True})
+from multisort import multisort
+rows_sorted = multisort(rows_dict, [
+    ('grade', {'reverse': False})
     ,'attend'
 ])

 ```
-
 Sort rows_dict by _grade_, descending, then _attend_ and call upper() for _grade_:
 ```
-from multisort import msorted
-rows_sorted = msorted(rows_dict, [
-    ('grade', {'reverse': False, 'clean': lambda s:None if s is None else s.upper()})
+from multisort import multisort
+rows_sorted = multisort(rows_dict, [
+    ('grade', {'reverse': False, 'clean': lambda s: None if s is None else s.upper()})
     ,'attend'
 ])

 ```
-`msorted` parameters:
+`multisort` parameters:
 option|dtype|description
 ---|---|---
 `key`|int or str|Key to access data. int for tuple or list
 `spec`|str, int, list|Sort specification. Can be as simple as a column key / index
 `reverse`|bool|Reverse order of final sort (default = False)

-`msorted` `spec` options:
+`multisort` `spec` options:
 option|dtype|description
 ---|---|---
 reverse|bool|Reverse sort of column
-clean|func|Function / lambda to clean the value
-none_first|bool|If True, None will be at top of sort. Default is False (bottom)
+clean|func|Function / lambda to clean the value. These calls can cause a significant slowdown.
+required|bool|Default True. If false, will substitute None or default if key not found (not applicable for list or tuple rows)
+default|any|Value to substitute if required==False and key does not exist or None is found. Can be used to achieve similar functionality to pandas `na_position`



@@ -134,7 +136,7 @@ rows_obj = [
 ]
 ```

-### `msorted`
+### `multisort`
 (Same syntax as with 'dict' example)


@@ -177,11 +179,11 @@ rows_tuple = [
 (COL_IDX, COL_NAME, COL_GRADE, COL_ATTEND) = range(0,4)
 ```

-### `msorted`
+### `multisort`
 Sort rows_tuple by _grade_, descending, then _attend_, ascending and put None first in results:
 ```
-from multisort import msorted
-rows_sorted = msorted(rows_tuple, [
+from multisort import multisort
+rows_sorted = multisort(rows_tuple, [
     (COL_GRADE, {'reverse': False, 'none_first': True})
     ,COL_ATTEND
 ])
@@ -218,6 +220,6 @@ rows_sorted = sorted(rows_tuple, key=cmp_func(cmp_student), reverse=True)
 ### Tests / Samples
 Name|Descr|Other
 ---|---|---
-tests/test_msorted.py|msorted unit tests|-
+tests/test_multisort.py|multisort unit tests|-
 tests/performance_tests.py|Tunable performance tests using asyncio | requires pandas
 tests/hand_test.py|Hand testing|-
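
Note on the README change: the new text calls the `superfast` approach "just core python" but the hunks shown here do not include it. The following is only a minimal sketch of what a None-safe, multi-pass stable sort in core Python might look like, following the idea in the Python sorting HOWTO. The helper names `none_last_key` and `sort_multi` and the sample rows are illustrative assumptions, not code from this commit.

```python
# Hypothetical sample rows; None grades must not raise TypeError when sorted.
rows = [
    {"name": "joh", "grade": "C", "attend": 100},
    {"name": "jan", "grade": "A", "attend": 80},
    {"name": "dav", "grade": "B", "attend": 85},
    {"name": "bob", "grade": None, "attend": 85},
    {"name": "jim", "grade": "F", "attend": 55},
    {"name": "joe", "grade": None, "attend": 55},
]


def none_last_key(column):
    """Build a key function whose result never compares None with a value."""
    def key(row):
        value = row[column]
        # The leading bool decides the order between None and non-None rows,
        # so the second element is only compared between two real values.
        return (value is None, value)
    return key


def sort_multi(rows, specs):
    """Sort by several columns; specs is [(column, reverse), ...], most significant first."""
    result = list(rows)
    # list.sort is stable, so sorting by the least significant column first
    # and the most significant column last yields a multi-column ordering.
    for column, reverse in reversed(specs):
        result.sort(key=none_last_key(column), reverse=reverse)
    return result


# grade descending (None rows end up first because of reverse), then attend ascending
rows_sorted = sort_multi(rows, [("grade", True), ("attend", False)])
names = [r["name"] for r in rows_sorted]
```

Because each pass is a plain `list.sort` with a cheap key, this style avoids per-comparison Python callbacks, which is one plausible reason for the speed gap reported in the performance table.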

dot.vscode/launch.json

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,20 @@
1+
{
2+
// Use IntelliSense to learn about possible attributes.
3+
// Hover to view descriptions of existing attributes.
4+
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
5+
"version": "0.2.0",
6+
"configurations": [
7+
{
8+
"name": "Python: Current File",
9+
"type": "python",
10+
"request": "launch",
11+
"console": "integratedTerminal",
12+
"justMyCode": true,
13+
// "program": "tests/hand_test.py",
14+
// "program": "tests/performance_tests.py",
15+
// "program": "tests/perf_tests_2.py",
16+
"program": "tests/test_multisort.py",
17+
// "args": ["DictTests.test_list_of_dicts"]
18+
}
19+
]
20+
}

dot.vscode/settings.json

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
{
2+
"python.envFile": "${workspaceFolder}/dev.env",
3+
"python.linting.pylintEnabled": false,
4+
"python.linting.flake8Enabled": true,
5+
"python.linting.enabled": true
6+
}

pyproject.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
[tool.poetry]
22
name = "multisort"
3-
version = "0.1.1"
3+
version = "0.1.2"
44
description = "NoneType Safe Multi Column Sorting For Python"
55
license = "MIT"
66
authors = ["Timothy C. Quinn"]

src/multisort/__init__.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
from .multisort import msorted, cmp_func, reversor
1+
from .multisort import multisort, cmp_func, reversor

src/multisort/multisort.py

Lines changed: 86 additions & 54 deletions
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@
1111
cmp_func = cmp_to_key
1212

1313

14-
# .: msorted :.
14+
# .: multisort :.
1515
# spec is a list one of the following
1616
# <key>
1717
# (<key>,)
@@ -21,70 +21,102 @@
2121
# <opts> dict. Options:
2222
# reverse: opt - reversed sort (defaults to False)
2323
# clean: opt - callback to clean / alter data in 'field'
24-
# none_first: opt - If True, None will be at top of sort. Default is False (bottom)
25-
class Comparator:
26-
@classmethod
27-
def new(cls, *args):
28-
if len(args) == 1 and isinstance(args[0], (int,str)):
29-
_c = Comparator(spec=args[0])
24+
def multisort(rows, spec, reverse:bool=False):
25+
key=clean=rows_sorted=default=None
26+
col_reverse=False
27+
required=True
28+
for s_c in reversed([spec] if isinstance(spec, (int, str)) else spec):
29+
if isinstance(s_c, (int, str)):
30+
key = s_c
3031
else:
31-
_c = Comparator(spec=args)
32-
return cmp_to_key(_c._compare_a_b)
32+
if len(s_c) == 1:
33+
key = s_c[0]
34+
elif len(s_c) == 2:
35+
key = s_c[0]
36+
s_opts = s_c[1]
37+
assert not s_opts is None and isinstance(s_opts, dict), f"Invalid Spec. Second value must be a dict. Got {getClassName(s_opts)}"
38+
col_reverse = s_opts.get('reverse', False)
39+
clean = s_opts.get('clean', None)
40+
default = s_opts.get('default', None)
41+
required = s_opts.get('required', True)
3342

34-
def __init__(self, spec):
35-
if isinstance(spec, (int, str)):
36-
self.spec = ( (spec, False, None, False), )
37-
else:
38-
a=[]
39-
for s_c in spec:
40-
if isinstance(s_c, (int, str)):
41-
a.append((s_c, None, None, False))
42-
else:
43-
assert isinstance(s_c, tuple) and len(s_c) in (1,2),\
44-
f"Invalid spec. Must have 1 or 2 params per record. Got: {s_c}"
45-
if len(s_c) == 1:
46-
a.append((s_c[0], None, None, False))
47-
elif len(s_c) == 2:
48-
s_opts = s_c[1]
49-
assert not s_opts is None and isinstance(s_opts, dict), f"Invalid Spec. Second value must be a dict. Got {getClassName(s_opts)}"
50-
a.append((s_c[0], s_opts.get('reverse', False), s_opts.get('clean', None), s_opts.get('none_first', False)))
51-
52-
self.spec = a
53-
54-
def _compare_a_b(self, a, b):
55-
if a is None: return 1
56-
if b is None: return -1
57-
for k, desc, clean, none_first in self.spec:
43+
def _sort_column(row): # Throws MSIndexError, MSKeyError
44+
ex1=None
5845
try:
5946
try:
60-
va = a[k]; vb = b[k]
47+
v = row[key]
6148
except Exception as ex:
62-
va = getattr(a, k); vb = getattr(b, k)
63-
64-
except Exception as ex:
65-
raise KeyError(f"Key {k} is not available in object(s) given a: {a.__class__.__name__}, b: {a.__class__.__name__}")
49+
ex1 = ex
50+
v = getattr(row, key)
51+
except Exception as ex2:
52+
if isinstance(row, (list, tuple)): # failfast for tuple / list
53+
raise MSIndexError(ex1.args[0], row, ex1)
6654

67-
if clean:
68-
va = clean(va)
69-
vb = clean(vb)
55+
elif required:
56+
raise MSKeyError(ex2.args[0], row, ex2)
7057

71-
if va != vb:
72-
if va is None: return -1 if none_first else 1
73-
if vb is None: return 1 if none_first else -1
74-
if desc:
75-
return -1 if va > vb else 1
7658
else:
77-
return 1 if va > vb else -1
59+
if default is None:
60+
v = None
61+
else:
62+
v = default
63+
64+
if default:
65+
if v is None: return default
66+
return clean(v) if clean else v
67+
else:
68+
if v is None: return True, None
69+
if clean: return False, clean(v)
70+
return False, v
71+
72+
try:
73+
if rows_sorted is None:
74+
rows_sorted = sorted(rows, key=_sort_column, reverse=col_reverse)
75+
else:
76+
rows_sorted.sort(key=_sort_column, reverse=col_reverse)
77+
78+
79+
except Exception as ex:
80+
msg=None
81+
row=None
82+
key_is_int=isinstance(key, int)
83+
84+
if isinstance(ex, MultiSortBaseExc):
85+
row = ex.row
86+
if isinstance(ex, MSIndexError):
87+
msg = f"Invalid index for {row.__class__.__name__} row of length {len(row)}. Row: {row}"
88+
else: # MSKeyError
89+
msg = f"Invalid key/property for row of type {row.__class__.__name__}. Row: {row}"
90+
else:
91+
msg = ex.args[0]
92+
93+
raise MultiSortError(f"""Sort failed on key {"int" if key_is_int else "str '"}{key}{'' if key_is_int else "' "}. {msg}""", row, ex)
94+
95+
96+
return reversed(rows_sorted) if reverse else rows_sorted
97+
7898

79-
return 0
99+
class MultiSortBaseExc(Exception):
100+
def __init__(self, msg, row, cause):
101+
self.message = msg
102+
self.row = row
103+
self.cause = cause
104+
105+
class MSIndexError(MultiSortBaseExc):
106+
def __init__(self, msg, row, cause):
107+
super(MSIndexError, self).__init__(msg, row, cause)
80108

109+
class MSKeyError(MultiSortBaseExc):
110+
def __init__(self, msg, row, cause):
111+
super(MSKeyError, self).__init__(msg, row, cause)
81112

82-
def msorted(rows, spec, reverse:bool=False):
83-
if isinstance(spec, (int, str)):
84-
_c = Comparator.new(spec)
85-
else:
86-
_c = Comparator.new(*spec)
87-
return sorted(rows, key=_c, reverse=reverse)
113+
class MultiSortError(MultiSortBaseExc):
114+
def __init__(self, msg, row, cause):
115+
super(MultiSortError, self).__init__(msg, row, cause)
116+
def __str__(self):
117+
return self.message
118+
def __repr__(self):
119+
return f"<MultiSortError> {self.__str__()}"
88120

89121
# For use in the multi column sorted syntax to sort by 'grade' and then 'attend' descending
90122
# dict example:
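
Note on the `multisort.py` change: two details in the new `multisort` function may be worth a second look. The final `reversed(rows_sorted)` hands back an iterator rather than a list when `reverse=True`, and the `if default:` test treats falsy defaults such as `0` or `""` as if no default were given. The sketch below reproduces both behaviors with core Python only. `key_truthy` and `key_explicit` are simplified, hypothetical stand-ins for the key logic in the diff, not the library's code.

```python
rows_sorted = sorted([3, 1, 2])

# reversed() returns a lazy, single-use iterator, not a list.
flipped = reversed(rows_sorted)
flipped_is_list = isinstance(flipped, list)

# Slicing gives a real list in reverse order.
flipped_list = rows_sorted[::-1]


def key_truthy(v, default):
    """Simplified stand-in for the `if default:` branch in the diff."""
    if default:
        return default if v is None else v
    # Fallback used when no (truthy) default is set: None rows sort last.
    return (v is None, v)


def key_explicit(v, default):
    """Same shape, but testing `default is not None` so 0 or "" still count."""
    if default is not None:
        return default if v is None else v
    return (v is None, v)


# With default=0 the truthiness test falls through to the tuple form,
# while the explicit None test substitutes the default as requested.
truthy_result = key_truthy(None, 0)
explicit_result = key_explicit(None, 0)
```

Whether returning an iterator is intended is not stated in the commit; callers that index or re-iterate the result would need to wrap it in `list(...)`.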
