Commit 616e8b3

Merge branch 'frame_1' of https://github.com/zhangbowen-coder/pandas into frame_1
2 parents e45e123 + 7f94d67 commit 616e8b3

7 files changed: 81 additions and 21 deletions

doc/source/whatsnew/v3.0.0.rst

Lines changed: 29 additions & 8 deletions
@@ -80,8 +80,8 @@ and how to adapt your code to the new default.
 
 .. _whatsnew_300.enhancements.copy_on_write:
 
-Copy-on-Write
-^^^^^^^^^^^^^
+Consistent copy/view behaviour with Copy-on-Write
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The new "copy-on-write" behaviour in pandas 3.0 brings changes in behavior in
 how pandas operates with respect to copies and views. A summary of the changes:
@@ -101,9 +101,10 @@ copy or a view depended on the exact operation performed, which was often
 confusing).
 
 Because every single indexing step now behaves as a copy, this also means that
-"chained assignment" (updating a DataFrame with multiple setitem steps) will
-stop working. Because this now consistently never works, the
-``SettingWithCopyWarning`` is removed.
+**"chained assignment"** (updating a DataFrame with multiple setitem steps)
+**will stop working**. Because this now consistently never works, the
+``SettingWithCopyWarning`` is removed, and defensive ``.copy()`` calls to
+silence the warning are no longer needed.
 
 The new behavioral semantics are explained in more detail in the
 :ref:`user guide about Copy-on-Write <copy_on_write>`.
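
Note on the hunk above: a minimal sketch of what "chained assignment will stop working" means in practice, and of the single-step form that replaces it. The DataFrame contents and column names are made up for illustration, and the chained form is left commented out because pandas warns about it.

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Chained assignment: two setitem steps in a row. Under Copy-on-Write the
# first step (df["a"]) behaves as a copy, so the write never reaches df.
# df["a"][df["b"] > 4] = 0

# Single-step assignment through .loc updates df itself.
df.loc[df["b"] > 4, "a"] = 0
print(df)
```

The `.loc` form does the row selection and the column write in one indexing step, so it does not depend on whether an intermediate object is a copy or a view.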
@@ -130,10 +131,18 @@ and will be removed in pandas 4.0.
 
 .. _whatsnew_300.enhancements.col:
 
-``pd.col`` syntax can now be used in :meth:`DataFrame.assign` and :meth:`DataFrame.loc`
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Initial support for ``pd.col()`` syntax to create expressions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-You can now use ``pd.col`` to create callables for use in dataframe methods which accept them. For example, if you have a dataframe
+This release introduces :func:`col` to refer to a DataFrame column by name
+and build up expressions.
+
+This can be used as a simplified syntax to create callables for use in
+methods such as :meth:`DataFrame.assign`. In practice, where you would
+have to use a ``lambda`` function before, you can now use ``pd.col()``
+instead.
+
+For example, if you have a dataframe
 
 .. ipython:: python
 
@@ -151,6 +160,18 @@ you can now write:
 
     df.assign(c = pd.col('a') + pd.col('b'))
 
+The expression object returned by :func:`col` supports all standard operators
+(like ``+``, ``-``, ``*``, ``/``, etc.) and all Series methods and namespaces
+(like ``pd.col("name").sum()``, ``pd.col("name").str.upper()``, etc.).
+
+Currently, the ``pd.col()`` syntax can be used in any place which accepts a
+callable that takes the calling DataFrame as first argument and returns a
+Series, like ``lambda df: df[col_name]``.
+This includes :meth:`DataFrame.assign`, :meth:`DataFrame.loc`, and getitem/setitem.
+
+It is expected that the support for ``pd.col()`` will be expanded to more methods
+in future releases.
+
 New Deprecation Policy
 ^^^^^^^^^^^^^^^^^^^^^^
 pandas 3.0.0 introduces a new 3-stage deprecation policy: using ``DeprecationWarning`` initially, then switching to ``FutureWarning`` for broader visibility in the last minor version before the next major release, and then removal of the deprecated functionality in the major release. This was done to give downstream packages more time to adjust to pandas deprecations, which should reduce the amount of warnings that a user gets from code that isn't theirs. See `PDEP 17 <https://pandas.pydata.org/pdeps/0017-backwards-compatibility-and-deprecation-policy.html>`_ for more details.
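
Note on the ``pd.col()`` hunks in this file: a small sketch comparing the ``lambda`` form with the expression form that the release note describes. The DataFrame is made up for illustration, and the ``pd.col`` branch is guarded with ``hasattr`` on the assumption that the snippet may run on a pandas version older than 3.0.0, where ``pd.col`` does not exist.

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"a": [1, 1, 2], "b": [4, 5, 6]})

# Callable form: assign() calls the lambda with the calling DataFrame.
with_lambda = df.assign(c=lambda d: d["a"] + d["b"])
print(with_lambda)

# Expression form from the release note; skipped when pd.col is missing.
if hasattr(pd, "col"):
    with_col = df.assign(c=pd.col("a") + pd.col("b"))
    print(with_col)
```

Both forms defer evaluation until ``assign`` supplies the DataFrame, which is why the release note says ``pd.col()`` can be used wherever a ``lambda df: ...`` callable is accepted.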

pandas/core/col.py

Lines changed: 2 additions & 0 deletions
@@ -260,6 +260,8 @@ def col(col_name: Hashable) -> Expression:
     :meth:`DataFrame.assign` or :meth:`DataFrame.loc`, can also accept
     ``pd.col(col_name)``.
 
+    .. versionadded:: 3.0.0
+
     Parameters
     ----------
     col_name : Hashable

pandas/core/indexes/base.py

Lines changed: 17 additions & 5 deletions
@@ -341,8 +341,13 @@ class Index(IndexOpsMixin, PandasObject):
         Data type for the output Index. If not specified, this will be
         inferred from `data`.
         See the :ref:`user guide <basics.dtypes>` for more usages.
-    copy : bool, default False
-        Copy input data.
+    copy : bool, default None
+        Whether to copy input data, only relevant for array, Series, and Index
+        inputs (for other input, e.g. a list, a new array is created anyway).
+        Defaults to True for array input and False for Index/Series.
+        Set to False to avoid copying array input at your own risk (if you
+        know the input data won't be modified elsewhere).
+        Set to True to force copying Series/Index input up front.
     name : object
         Name to be stored in the index.
     tupleize_cols : bool (default: True)
@@ -482,7 +487,7 @@ def __new__(
         cls,
         data=None,
         dtype=None,
-        copy: bool = False,
+        copy: bool | None = None,
         name=None,
         tupleize_cols: bool = True,
     ) -> Self:
@@ -499,9 +504,16 @@ def __new__(
         if not copy and isinstance(data, (ABCSeries, Index)):
             refs = data._references
 
+        if isinstance(data, (ExtensionArray, np.ndarray)):
+            # GH 63306
+            if copy is not False:
+                if dtype is None or astype_is_view(data.dtype, dtype):
+                    data = data.copy()
+                    copy = False
+
         # range
         if isinstance(data, (range, RangeIndex)):
-            result = RangeIndex(start=data, copy=copy, name=name)
+            result = RangeIndex(start=data, copy=bool(copy), name=name)
             if dtype is not None:
                 return result.astype(dtype, copy=False)
             # error: Incompatible return value type (got "MultiIndex",
@@ -569,7 +581,7 @@ def __new__(
             data = com.asarray_tuplesafe(data, dtype=_dtype_obj)
 
         try:
-            arr = sanitize_array(data, None, dtype=dtype, copy=copy)
+            arr = sanitize_array(data, None, dtype=dtype, copy=bool(copy))
         except ValueError as err:
             if "index must be specified when data is not list-like" in str(err):
                 raise cls._raise_scalar_data_error(data) from err
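
Note on the ``copy`` change in this file: a rough sketch of how the new default could be observed from user code. The array contents are made up. Whether the default construction shares memory with the input depends on the installed pandas version (the docstring above says array input is copied by default after this change), so the snippet only prints that result rather than asserting it.

```python
import numpy as np
import pandas as pd

# Hypothetical input array, only for illustration.
arr = np.array([0, 1, 2])

# Explicit copy=True: the Index gets its own buffer.
idx_copy = pd.Index(arr, copy=True)

# Default construction: per the updated docstring, array input is copied
# unless copy=False is passed; older versions may share memory instead.
idx_default = pd.Index(arr)
print("default shares memory:", np.shares_memory(arr, idx_default.to_numpy()))

# Mutating the input afterwards does not affect the explicitly copied Index.
arr[0] = 99
print(idx_copy.tolist())
```

Passing ``copy=False`` remains the opt-out described in the docstring, at the caller's own risk if the input array is modified elsewhere.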

pandas/core/indexes/datetimes.py

Lines changed: 2 additions & 2 deletions
@@ -1280,7 +1280,7 @@ def date_range(
         timezone-naive unless timezone-aware datetime-likes are passed.
     normalize : bool, default False
         Normalize start/end dates to midnight before generating date range.
-    name : str, default None
+    name : Hashable, default None
         Name of the resulting DatetimeIndex.
     inclusive : {"both", "neither", "left", "right"}, default "both"
         Include boundaries; Whether to set each bound as closed or open.
@@ -1524,7 +1524,7 @@ def bdate_range(
         Asia/Beijing.
     normalize : bool, default False
         Normalize start/end dates to midnight before generating date range.
-    name : str, default None
+    name : Hashable, default None
         Name of the resulting DatetimeIndex.
     weekmask : str or None, default None
         Weekmask of valid business days, passed to ``numpy.busdaycalendar``,

pandas/core/indexes/timedeltas.py

Lines changed: 1 addition & 1 deletion
@@ -284,7 +284,7 @@ def timedelta_range(
         Number of periods to generate.
     freq : str, Timedelta, datetime.timedelta, or DateOffset, default 'D'
         Frequency strings can have multiples, e.g. '5h'.
-    name : str, default None
+    name : Hashable, default None
         Name of the resulting TimedeltaIndex.
     closed : str, default None
         Make the interval closed with respect to the given frequency to

pandas/tests/copy_view/index/test_index.py

Lines changed: 25 additions & 0 deletions
@@ -5,6 +5,7 @@
     DataFrame,
     Index,
     Series,
+    array,
 )
 import pandas._testing as tm
 from pandas.tests.copy_view.util import get_array
@@ -150,3 +151,27 @@ def test_index_values():
     idx = Index([1, 2, 3])
     result = idx.values
     assert result.flags.writeable is False
+
+
+def test_constructor_copy_input_ndarray_default():
+    arr = np.array([0, 1])
+    idx = Index(arr)
+    assert not np.shares_memory(arr, get_array(idx))
+
+
+def test_constructor_copy_input_ea_default():
+    arr = array([0, 1], dtype="Int64")
+    idx = Index(arr)
+    assert not tm.shares_memory(arr, idx.array)
+
+
+def test_series_from_temporary_index_readonly_data():
+    # GH 63370
+    arr = np.array([0, 1], dtype=np.dtype(np.int8))
+    arr.flags.writeable = False
+    ser = Series(Index(arr))
+    assert not np.shares_memory(arr, get_array(ser))
+    assert ser._mgr._has_no_reference(0)
+    ser[[False, True]] = np.array([0, 2], dtype=np.dtype(np.int8))
+    expected = Series([0, 2], dtype=np.dtype(np.int8))
+    tm.assert_series_equal(ser, expected)

web/pandas/community/blog/pandas-3.0-release-candidate.md

Lines changed: 5 additions & 5 deletions
@@ -14,18 +14,18 @@ release candidate now](#call-to-action-test-the-release-candidate).
 
 pandas 3.0 introduces several major enhancements:
 
-- **Dedicated string data type by default**: String columns are now inferred as
+- **Dedicated string data type by default**: string columns are now inferred as
   the new `str` dtype instead of `object`, providing better performance and type
   safety
 - **Consistent copy/view behaviour with Copy-on-Write (CoW)** (a.k.a. getting
-  rid of the SettingWithCopyWarning): More predictable and consistent behavior
+  rid of the SettingWithCopyWarning): more predictable and consistent behavior
   for all operations, with improved performance through avoiding unnecessary
   copies
-- **New `pd.col` syntax**: Initial support for `pd.col()` as a simplified syntax
+- **New `pd.col` syntax**: initial support for `pd.col()` as a simplified syntax
   for creating callables in `DataFrame.assign`
 
-Together with a lot of other improvements and bug fixes. You can find the
-complete list of changes in our
+Further, pandas 3.0 includes a lot of other improvements and bug fixes. You can
+find the complete list of changes in our
 [release notes](https://pandas.pydata.org/docs/dev/whatsnew/v3.0.0.html).
 
 ## Important changes requiring code updates
