Commit 0658c43

Revert "revert branch UNPICK"
This reverts commit 0edb249.
1 parent 0edb249 commit 0658c43

3 files changed, 59 additions and 5 deletions

docs/source/user-guide/dataframe/index.rst

Lines changed: 38 additions & 0 deletions
@@ -126,6 +126,44 @@ DataFusion's DataFrame API offers a wide range of operations:
     # Drop columns
     df = df.drop("temporary_column")

+String Columns and Expressions
+------------------------------
+
+Some ``DataFrame`` methods accept plain strings when an argument refers to an
+existing column. These include:
+
+* :py:meth:`~datafusion.DataFrame.select`
+* :py:meth:`~datafusion.DataFrame.sort`
+* :py:meth:`~datafusion.DataFrame.drop`
+* :py:meth:`~datafusion.DataFrame.join` (``on`` argument)
+* :py:meth:`~datafusion.DataFrame.aggregate` (grouping columns)
+
+For such methods, you can pass column names directly:
+
+.. code-block:: python
+
+    df.sort('id')
+
+The same operation can also be written with an explicit column expression:
+
+.. code-block:: python
+
+    from datafusion import col
+    df.sort(col('id'))
+
+Whenever an argument represents an expression—such as in
+:py:meth:`~datafusion.DataFrame.filter` or
+:py:meth:`~datafusion.DataFrame.with_column`—use ``col()`` to reference columns
+and wrap constant values with ``lit()`` (also available as ``literal()``):
+
+.. code-block:: python
+
+    from datafusion import col, lit
+    df.filter(col('age') > lit(21))
+
+Without ``lit()`` DataFusion would treat ``21`` as a column name rather than a
+constant value.
+
 Terminal Operations
 -------------------

python/datafusion/dataframe.py

Lines changed: 8 additions & 5 deletions
@@ -521,19 +521,22 @@ def aggregate(
         aggs = [e.expr for e in aggs]
         return DataFrame(self.df.aggregate(group_by, aggs))

-    def sort(self, *exprs: Expr | SortExpr) -> DataFrame:
-        """Sort the DataFrame by the specified sorting expressions.
+    def sort(self, *exprs: Expr | SortExpr | str) -> DataFrame:
+        """Sort the DataFrame by the specified sorting expressions or column names.

         Note that any expression can be turned into a sort expression by
-        calling its` ``sort`` method.
+        calling its ``sort`` method.

         Args:
-            exprs: Sort expressions, applied in order.
+            exprs: Sort expressions or column names, applied in order.

         Returns:
             DataFrame after sorting.
         """
-        exprs_raw = [sort_or_default(expr) for expr in exprs]
+        exprs_raw = []
+        for expr in exprs:
+            expr_obj = Expr.column(expr).sort() if isinstance(expr, str) else expr
+            exprs_raw.append(sort_or_default(expr_obj))
         return DataFrame(self.df.sort(*exprs_raw))

     def cast(self, mapping: dict[str, pa.DataType[Any]]) -> DataFrame:
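
The string handling added to ``sort`` in this hunk can be illustrated in isolation. What follows is a minimal sketch of the same normalization pattern, not the DataFusion implementation: ``Column``, ``SortKey``, and ``normalize_sort_args`` are hypothetical stand-ins invented for this example, and the real ``Expr``, ``SortExpr``, and ``sort_or_default`` may behave differently in detail.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column expression."""

    name: str

    def sort(self, ascending: bool = True) -> SortKey:
        # Turn the column into an explicit sort key.
        return SortKey(self, ascending)


@dataclass(frozen=True)
class SortKey:
    """Hypothetical stand-in for a sort expression."""

    column: Column
    ascending: bool = True


def normalize_sort_args(*exprs: str | Column | SortKey) -> list[SortKey]:
    """Accept column names, columns, or sort keys and return sort keys only."""
    normalized = []
    for expr in exprs:
        if isinstance(expr, str):
            # A plain string is taken to be a column name.
            expr = Column(expr)
        if isinstance(expr, Column):
            # Columns without an explicit direction sort ascending here
            # (an assumption of this sketch).
            expr = expr.sort()
        normalized.append(expr)
    return normalized


keys = normalize_sort_args("a", Column("b").sort(ascending=False))
```

In this sketch, passing ``"a"`` and passing ``Column("a")`` yield the same sort key, which mirrors the equivalence that ``test_sort_string_and_expression_equivalent`` checks against the real API.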

python/tests/test_dataframe.py

Lines changed: 13 additions & 0 deletions
@@ -268,6 +268,19 @@ def test_sort(df):
     assert table.to_pydict() == expected


+def test_sort_string_and_expression_equivalent(df):
+    from datafusion import col
+
+    result_str = df.sort("a").to_pydict()
+    result_expr = df.sort(col("a")).to_pydict()
+    assert result_str == result_expr
+
+
+def test_filter_string_unsupported(df):
+    with pytest.raises(AttributeError):
+        df.filter("a > 1")
+
+
 def test_drop(df):
     df = df.drop("c")
