Skip to content

DataFrame(::PyPandasDataFrame) converts date & datetime to bytes #293

@tomdstone

Description

@tomdstone

When doing some work involving dataframes in python via PythonCall, it seems like DataFrame(PyTable(p)) where p is a pandas data table converts the date and datetime columns into byte vectors. Is this issue related to the issue #265 with milliseconds vs microseconds, or due to a missing part of the DataFrame(::PyPandasDataFrame) implementation?

Here are a few minimal examples, in a conda environment with pandas.

using PythonCall
using Dates
using DataFrames

a = DataFrame(x = [now()])  # julia dataframe
b = pytable(a)              # pandas dataframe
c = PyTable(b)              # PyPandasDataFrame
d = DataFrame(c)            # julia dataframe again

This results in:

julia> c
1×1 PyPandasDataFrame
                        x
0 2023-04-13 14:36:13.939

julia> d
1×1 DataFrame
 Row │ x
     │ PyArray…
─────┼───────────────────────────────────
   1 │ UInt8[0xc0, 0x62, 0x31, 0x8c, 0x…

The same thing happens when initially defining b as a pandas dataframe, so the microsecond issue in #265 seems to not be the problem?

julia> b = pd.DataFrame([[dt.datetime.now()]])
Python DataFrame:
                           0
0 2023-04-13 14:46:57.940077

julia> c = PyTable(b)
1×1 PyPandasDataFrame       
                           0
0 2023-04-13 14:46:57.940077

julia> d = DataFrame(c)
1×1 DataFrame
 Row │ 0
     │ PyArray…
─────┼───────────────────────────────────
   1 │ UInt8[0xc8, 0xf9, 0xa5, 0x7d, 0x…

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions