Merge branch 'frame_1' of https://github.com/zhangbowen-coder/pandas into frame_1

zhangbowen-coder · zhangbowen-coder · commit 616e8b3f1289 · 2025-12-17T13:03:19.000+08:00
diff --git a/doc/source/whatsnew/v3.0.0.rst b/doc/source/whatsnew/v3.0.0.rst
@@ -80,8 +80,8 @@ and how to adapt your code to the new default.
 
 .. _whatsnew_300.enhancements.copy_on_write:
 
-Copy-on-Write
-^^^^^^^^^^^^^
+Consistent copy/view behaviour with Copy-on-Write
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The new "copy-on-write" behaviour in pandas 3.0 brings changes in behavior in
 how pandas operates with respect to copies and views. A summary of the changes:
@@ -101,9 +101,10 @@ copy or a view depended on the exact operation performed, which was often
 confusing).
 
 Because every single indexing step now behaves as a copy, this also means that
-"chained assignment" (updating a DataFrame with multiple setitem steps) will
-stop working. Because this now consistently never works, the
-``SettingWithCopyWarning`` is removed.
+**"chained assignment"** (updating a DataFrame with multiple setitem steps)
+**will stop working**. Because this now consistently never works, the
+``SettingWithCopyWarning`` is removed, and defensive ``.copy()`` calls to
+silence the warning are no longer needed.
 
 The new behavioral semantics are explained in more detail in the
 :ref:`user guide about Copy-on-Write <copy_on_write>`.
@@ -130,10 +131,18 @@ and will be removed in pandas 4.0.
 
 .. _whatsnew_300.enhancements.col:
 
-``pd.col`` syntax can now be used in :meth:`DataFrame.assign` and :meth:`DataFrame.loc`
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Initial support for ``pd.col()`` syntax to create expressions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-You can now use ``pd.col`` to create callables for use in dataframe methods which accept them. For example, if you have a dataframe
+This release introduces :func:`col` to refer to a DataFrame column by name
+and build up expressions.
+
+This can be used as a simplified syntax to create callables for use in
+methods such as :meth:`DataFrame.assign`. In practice, wherever you
+previously had to use a ``lambda`` function, you can now use ``pd.col()``
+instead.
+
+For example, if you have a DataFrame
 
 .. ipython:: python
 
@@ -151,6 +160,18 @@ you can now write:
 
     df.assign(c = pd.col('a') + pd.col('b'))
 
+The expression object returned by :func:`col` supports all standard operators
+(such as ``+``, ``-``, ``*`` and ``/``) and all Series methods and namespaces
+(for example ``pd.col("name").sum()`` or ``pd.col("name").str.upper()``).
+
+Currently, the ``pd.col()`` syntax can be used in any place that accepts a
+callable taking the calling DataFrame as its first argument and returning a
+Series, such as ``lambda df: df[col_name]``.
+This includes :meth:`DataFrame.assign`, :meth:`DataFrame.loc`, and getitem/setitem.
+
+Support for ``pd.col()`` is expected to be expanded to more methods in future
+releases.
+
 New Deprecation Policy
 ^^^^^^^^^^^^^^^^^^^^^^
 pandas 3.0.0 introduces a new 3-stage deprecation policy: using ``DeprecationWarning`` initially, then switching to ``FutureWarning`` for broader visibility in the last minor version before the next major release, and then removal of the deprecated functionality in the major release. This was done to give downstream packages more time to adjust to pandas deprecations, which should reduce the amount of warnings that a user gets from code that isn't theirs. See `PDEP 17 <https://pandas.pydata.org/pdeps/0017-backwards-compatibility-and-deprecation-policy.html>`_ for more details.
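The chained-assignment change in the release note above can be illustrated with a short sketch. This example is illustrative and not taken from the pandas docs. The commented-out line is the chained form that no longer updates ``df`` under Copy-on-Write. The ``.loc`` call is the single-step form, which updates ``df`` in every pandas version. The last two lines show why a defensive ``.copy()`` is unnecessary: modifying a filtered subset changes only the subset.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Chained assignment: df["a"] is one indexing step, [...] = 0 is a second.
# Under Copy-on-Write the first step behaves as a copy, so in pandas 3.0
# this would not update df; it is left commented out here.
# df["a"][df["b"] > 4] = 0

# Single-step assignment that updates df in place:
df.loc[df["b"] > 4, "a"] = 0

# A filtered subset is independent of df; no defensive .copy() is needed.
subset = df[df["b"] > 4]
subset["a"] = 100  # changes only subset, never df
```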
diff --git a/pandas/core/col.py b/pandas/core/col.py
@@ -260,6 +260,8 @@ def col(col_name: Hashable) -> Expression:
     :meth:`DataFrame.assign` or :meth:`DataFrame.loc`, can also accept
     ``pd.col(col_name)``.
 
+    .. versionadded:: 3.0.0
+
     Parameters
     ----------
     col_name : Hashable
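The following sketch illustrates the ``col`` docstring and the ``pd.col()`` release note above by comparing the ``lambda`` spelling with the ``pd.col()`` spelling. The column names and data are made up for the example. The ``hasattr`` guard is an addition so that the sketch also runs on pandas versions before 3.0, which do not provide ``pd.col``.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Pre-3.0 spelling: a callable that receives the calling DataFrame.
with_lambda = df.assign(c=lambda d: d["a"] + d["b"])

# pandas 3.0 spelling: pd.col builds an expression that assign evaluates
# against the calling DataFrame, replacing the lambda above.
if hasattr(pd, "col"):
    with_col = df.assign(c=pd.col("a") + pd.col("b"))
    assert with_col.equals(with_lambda)
```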
diff --git a/pandas/core/indexes/base.py b/pandas/core/indexes/base.py
@@ -341,8 +341,13 @@ class Index(IndexOpsMixin, PandasObject):
         Data type for the output Index. If not specified, this will be
         inferred from `data`.
         See the :ref:`user guide <basics.dtypes>` for more usages.
-    copy : bool, default False
-        Copy input data.
+    copy : bool or None, default None
+        Whether to copy input data. Only relevant for array, Series, and Index
+        inputs (for other input, e.g. a list, a new array is created anyway).
+        If None, array input is copied and Series/Index input is not.
+        Set to False to avoid copying array input at your own risk (only if you
+        know the input data won't be modified elsewhere).
+        Set to True to force copying Series/Index input up front.
     name : object
         Name to be stored in the index.
     tupleize_cols : bool (default: True)
@@ -482,7 +487,7 @@ def __new__(
         cls,
         data=None,
         dtype=None,
-        copy: bool = False,
+        copy: bool | None = None,
         name=None,
         tupleize_cols: bool = True,
     ) -> Self:
@@ -499,9 +504,16 @@ def __new__(
         if not copy and isinstance(data, (ABCSeries, Index)):
             refs = data._references
 
+        if isinstance(data, (ExtensionArray, np.ndarray)):
+            # GH 63306
+            if copy is not False:
+                if dtype is None or astype_is_view(data.dtype, dtype):
+                    data = data.copy()
+                    copy = False
+
         # range
         if isinstance(data, (range, RangeIndex)):
-            result = RangeIndex(start=data, copy=copy, name=name)
+            result = RangeIndex(start=data, copy=bool(copy), name=name)
             if dtype is not None:
                 return result.astype(dtype, copy=False)
             # error: Incompatible return value type (got "MultiIndex",
@@ -569,7 +581,7 @@ def __new__(
                 data = com.asarray_tuplesafe(data, dtype=_dtype_obj)
 
         try:
-            arr = sanitize_array(data, None, dtype=dtype, copy=copy)
+            arr = sanitize_array(data, None, dtype=dtype, copy=bool(copy))
         except ValueError as err:
             if "index must be specified when data is not list-like" in str(err):
                 raise cls._raise_scalar_data_error(data) from err
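The ``copy`` rules in the ``Index`` docstring above can be modelled as a small decision function. This is a hypothetical sketch: ``copies_input`` and the ``InputKind`` labels are not part of pandas. It covers only the ``dtype=None`` case and answers a single question: will the new Index share memory with its input?

```python
from typing import Literal, Optional

# Hypothetical labels for the input categories named in the docstring.
InputKind = Literal["ndarray", "extension_array", "series", "index", "other"]


def copies_input(kind: InputKind, copy: Optional[bool]) -> bool:
    """Return True if the new Index will not share memory with its input.

    Hypothetical model of the documented ``copy`` semantics for
    ``Index(data, copy=...)`` with ``dtype=None``; not a pandas API.
    """
    if kind in ("ndarray", "extension_array"):
        # None (the new default) and True both copy array input up front;
        # only an explicit False lets the Index reuse the caller's array.
        return copy is not False
    if kind in ("series", "index"):
        # None means "don't copy": Copy-on-Write reference tracking protects
        # the data instead. Only an explicit True forces a copy.
        return copy is True
    # Lists and other inputs are always converted into a freshly allocated
    # array, so the Index never shares memory with them.
    return True
```

With this model, ``copies_input("ndarray", None)`` is ``True`` and ``copies_input("series", None)`` is ``False``, which mirrors the "Defaults to True for array input and False for Index/Series" rule.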
diff --git a/pandas/core/indexes/datetimes.py b/pandas/core/indexes/datetimes.py
@@ -1280,7 +1280,7 @@ def date_range(
         timezone-naive unless timezone-aware datetime-likes are passed.
     normalize : bool, default False
         Normalize start/end dates to midnight before generating date range.
-    name : str, default None
+    name : Hashable, default None
         Name of the resulting DatetimeIndex.
     inclusive : {"both", "neither", "left", "right"}, default "both"
         Include boundaries; Whether to set each bound as closed or open.
@@ -1524,7 +1524,7 @@ def bdate_range(
         Asia/Beijing.
     normalize : bool, default False
         Normalize start/end dates to midnight before generating date range.
-    name : str, default None
+    name : Hashable, default None
         Name of the resulting DatetimeIndex.
     weekmask : str or None, default None
         Weekmask of valid business days, passed to ``numpy.busdaycalendar``,
diff --git a/pandas/core/indexes/timedeltas.py b/pandas/core/indexes/timedeltas.py
@@ -284,7 +284,7 @@ def timedelta_range(
         Number of periods to generate.
     freq : str, Timedelta, datetime.timedelta, or DateOffset, default 'D'
         Frequency strings can have multiples, e.g. '5h'.
-    name : str, default None
+    name : Hashable, default None
         Name of the resulting TimedeltaIndex.
     closed : str, default None
         Make the interval closed with respect to the given frequency to
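The ``name : Hashable`` docstring changes above can be illustrated with a small sketch. The tuple name ``("sensor", 1)`` is an arbitrary example: it is hashable but not a ``str``, so the previous ``name : str`` wording did not describe it, although both range functions accept it.

```python
import pandas as pd

# A tuple is a valid (hashable) Index name, even though it is not a string.
dates = pd.date_range("2024-01-01", periods=3, freq="D", name=("sensor", 1))
deltas = pd.timedelta_range(start="1 day", periods=3, name=("sensor", 1))
```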
diff --git a/pandas/tests/copy_view/index/test_index.py b/pandas/tests/copy_view/index/test_index.py
@@ -5,6 +5,7 @@
     DataFrame,
     Index,
     Series,
+    array,
 )
 import pandas._testing as tm
 from pandas.tests.copy_view.util import get_array
@@ -150,3 +151,27 @@ def test_index_values():
     idx = Index([1, 2, 3])
     result = idx.values
     assert result.flags.writeable is False
+
+
+def test_constructor_copy_input_ndarray_default():
+    arr = np.array([0, 1])
+    idx = Index(arr)
+    assert not np.shares_memory(arr, get_array(idx))
+
+
+def test_constructor_copy_input_ea_default():
+    arr = array([0, 1], dtype="Int64")
+    idx = Index(arr)
+    assert not tm.shares_memory(arr, idx.array)
+
+
+def test_series_from_temporary_index_readonly_data():
+    # GH 63370
+    arr = np.array([0, 1], dtype=np.dtype(np.int8))
+    arr.flags.writeable = False
+    ser = Series(Index(arr))
+    assert not np.shares_memory(arr, get_array(ser))
+    assert ser._mgr._has_no_reference(0)
+    ser[[False, True]] = np.array([0, 2], dtype=np.dtype(np.int8))
+    expected = Series([0, 2], dtype=np.dtype(np.int8))
+    tm.assert_series_equal(ser, expected)
diff --git a/web/pandas/community/blog/pandas-3.0-release-candidate.md b/web/pandas/community/blog/pandas-3.0-release-candidate.md
@@ -14,18 +14,18 @@ release candidate now](#call-to-action-test-the-release-candidate).
 
 pandas 3.0 introduces several major enhancements:
 
-- **Dedicated string data type by default**: String columns are now inferred as
+- **Dedicated string data type by default**: string columns are now inferred as
   the new `str` dtype instead of `object`, providing better performance and type
   safety
 - **Consistent copy/view behaviour with Copy-on-Write (CoW)** (a.k.a. getting
-  rid of the SettingWithCopyWarning): More predictable and consistent behavior
+  rid of the SettingWithCopyWarning): more predictable and consistent behavior
   for all operations, with improved performance through avoiding unnecessary
   copies
-- **New `pd.col` syntax**: Initial support for `pd.col()` as a simplified syntax
+- **New `pd.col` syntax**: initial support for `pd.col()` as a simplified syntax
   for creating callables in `DataFrame.assign`
 
-Together with a lot of other improvements and bug fixes. You can find the
-complete list of changes in our
+Further, pandas 3.0 includes a lot of other improvements and bug fixes. You can
+find the complete list of changes in our
 [release notes](https://pandas.pydata.org/docs/dev/whatsnew/v3.0.0.html).
 
 ## Important changes requiring code updates