3 files changed: +59 -5 lines changed
docs/source/user-guide/dataframe
@@ -126,6 +126,44 @@ DataFusion's DataFrame API offers a wide range of operations:
     # Drop columns
     df = df.drop("temporary_column")
 
+String Columns and Expressions
+------------------------------
+
+Some ``DataFrame`` methods accept plain strings when an argument refers to an
+existing column. These include:
+
+* :py:meth:`~datafusion.DataFrame.select`
+* :py:meth:`~datafusion.DataFrame.sort`
+* :py:meth:`~datafusion.DataFrame.drop`
+* :py:meth:`~datafusion.DataFrame.join` (``on`` argument)
+* :py:meth:`~datafusion.DataFrame.aggregate` (grouping columns)
+
+For such methods, you can pass column names directly:
+
+.. code-block:: python
+
+    df.sort('id')
+
+The same operation can also be written with an explicit column expression:
+
+.. code-block:: python
+
+    from datafusion import col
+    df.sort(col('id'))
+
+Whenever an argument represents an expression—such as in
+:py:meth:`~datafusion.DataFrame.filter` or
+:py:meth:`~datafusion.DataFrame.with_column`—use ``col()`` to reference columns
+and wrap constant values with ``lit()`` (also available as ``literal()``):
+
+.. code-block:: python
+
+    from datafusion import col, lit
+    df.filter(col('age') > lit(21))
+
+Without ``lit()`` DataFusion would treat ``21`` as a column name rather than a
+constant value.
+
 Terminal Operations
 -------------------
 
@@ -521,19 +521,22 @@ def aggregate(
         aggs = [e.expr for e in aggs]
         return DataFrame(self.df.aggregate(group_by, aggs))
 
-    def sort(self, *exprs: Expr | SortExpr) -> DataFrame:
-        """Sort the DataFrame by the specified sorting expressions.
+    def sort(self, *exprs: Expr | SortExpr | str) -> DataFrame:
+        """Sort the DataFrame by the specified sorting expressions or column names.
 
         Note that any expression can be turned into a sort expression by
-        calling its` ``sort`` method.
+        calling its ``sort`` method.
 
         Args:
-            exprs: Sort expressions, applied in order.
+            exprs: Sort expressions or column names, applied in order.
 
         Returns:
             DataFrame after sorting.
         """
-        exprs_raw = [sort_or_default(expr) for expr in exprs]
+        exprs_raw = []
+        for expr in exprs:
+            expr_obj = Expr.column(expr).sort() if isinstance(expr, str) else expr
+            exprs_raw.append(sort_or_default(expr_obj))
         return DataFrame(self.df.sort(*exprs_raw))
 
     def cast(self, mapping: dict[str, pa.DataType[Any]]) -> DataFrame:
@@ -268,6 +268,19 @@ def test_sort(df):
     assert table.to_pydict() == expected
 
 
+def test_sort_string_and_expression_equivalent(df):
+    from datafusion import col
+
+    result_str = df.sort("a").to_pydict()
+    result_expr = df.sort(col("a")).to_pydict()
+    assert result_str == result_expr
+
+
+def test_filter_string_unsupported(df):
+    with pytest.raises(AttributeError):
+        df.filter("a > 1")
+
+
 def test_drop(df):
     df = df.drop("c")
 