docs/source/user-guide/common-operations/joins.rst
Lines changed: 7 additions & 2 deletions
@@ -108,6 +108,8 @@ Disambiguating Columns
 
 When the join key exists in both DataFrames under the same name, the result contains two columns with that name. Assign a name to each DataFrame to use as a prefix and avoid ambiguity.
 
+When you create a DataFrame with a ``name`` argument, that name is used as a prefix in ``col("name.column")`` to reference specific columns.
+
 .. ipython:: python
 
     from datafusion import col
@@ -116,7 +118,9 @@ When the join key exists in both DataFrames under the same name, the result cont
     joined = left.join(right, on="id")
     joined.select(col("l.id"), col("r.id"))
 
-You can remove the duplicate column after joining.
+Note that the columns in the result appear in the same order as specified in the ``select()`` call.
+
+You can remove the duplicate column after joining. Note that ``drop()`` returns a new DataFrame (DataFusion's API is immutable).
 
 .. ipython:: python
 
@@ -126,7 +130,8 @@ Automatic Deduplication
 ----------------------
 
 Use the ``deduplicate`` argument of :py:meth:`DataFrame.join` to automatically
-drop the duplicate join column from the right DataFrame.
+drop the duplicate join column from the right DataFrame. Unlike PySpark which uses a ``_`` suffix by default,
+DataFusion uses the ``__right_<col>`` naming convention for conflicting columns when not using deduplication.