prefix arg to pandas.read_csv is deprecated #2

@ChickenProp

Deprecated as of pandas 1.4.0; see https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html. It will be removed in a future version, dunno when.

Per pandas-dev/pandas#43396, the reason seems to be that it conflicted with the names and header arguments and didn't add much value. (I think it only took effect if you passed header=None and didn't pass names.) Instead of prefix='foo_', you're now supposed to run df.columns = [f'foo_{col}' for col in df.columns] after read_csv.
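For reference, here's a minimal sketch of that replacement on a headerless CSV. The inline data and the foo_ prefix are just for illustration; nothing here is p9-cli code.

```python
import io

import pandas as pd

# Illustrative headerless data; any file or buffer would do.
csv_text = "1,2,3\n4,5,6\n"

# With header=None and no names, pandas numbers the columns 0, 1, 2.
df = pd.read_csv(io.StringIO(csv_text), header=None)

# The suggested replacement for prefix='foo_': rename after reading.
df.columns = [f"foo_{col}" for col in df.columns]

print(list(df.columns))  # ['foo_0', 'foo_1', 'foo_2']
```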

But p9-cli doesn't give the user a way to do that. By default columns are just numbered 0,1,..., and plotnine.aes works best if the column names are valid python identifiers. So if you have no header, your options to replace prefix=col seem to be:

  • Provide names,=col0 ,=col1 ,=col2 ... - annoying.
  • Instead of x=col0 y=col1, use x='data[0]' y='data[1]' - undocumented in p9-cli, more verbose, and I'm not sure it works in all the same ways.

If I don't like these, options for p9-cli seem to be:

  • Implement prefix ourselves, by removing it from the kwargs passed to read_csv and then renaming the columns afterwards.
    • Either always rename if prefix is passed (different from read_csv), or only if header and names are both None (might be helpful if e.g. the header in the file is numeric; unlikely to cause problems?).
  • If header and names are both None, automatically rename columns to add a prefix. (Obvious choices are c, col or col_. Following q I think I like c.)
  • Some combo. Perhaps: if header and names are both None, automatically add prefix. Look at the prefix kwarg to choose the prefix, sensible default if not given. If they're not both None, and there's a prefix kwarg, add a prefix anyway. I think I like this best.
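A rough sketch of the combo option, to make the rules concrete. The function name read_csv_with_prefix and the constant DEFAULT_PREFIX are made up for this example and are not existing p9-cli names. It also assumes that a missing header kwarg means pandas' default of "infer" (so not None); p9-cli may want to treat that case differently.

```python
import io

import pandas as pd

# Hypothetical default, following the "I think I like c" idea above.
DEFAULT_PREFIX = "c"


def read_csv_with_prefix(source, **kwargs):
    """Read a CSV, handling `prefix` ourselves instead of passing it to pandas.

    Hypothetical helper: a sketch of the "combo" option, not p9-cli's code.
    """
    # Remove prefix so read_csv never sees the deprecated argument.
    prefix = kwargs.pop("prefix", None)

    # Assumption: an absent `header` means pandas' default "infer", not None.
    headerless = (
        kwargs.get("header", "infer") is None and kwargs.get("names") is None
    )

    df = pd.read_csv(source, **kwargs)

    # No header and no names: always add a prefix, defaulting if none given.
    if headerless and prefix is None:
        prefix = DEFAULT_PREFIX

    # Otherwise only rename when the user explicitly asked for a prefix.
    if prefix is not None:
        df.columns = [f"{prefix}{col}" for col in df.columns]
    return df


# Headerless file, no prefix given: falls back to the default.
auto = read_csv_with_prefix(io.StringIO("1,2\n3,4\n"), header=None)
print(list(auto.columns))  # ['c0', 'c1']

# File with a header row and an explicit prefix: prefix is added anyway.
named = read_csv_with_prefix(io.StringIO("a,b\n1,2\n"), prefix="x_")
print(list(named.columns))  # ['x_a', 'x_b']
```

One thing this makes visible: with an explicit prefix and a real header, the behaviour differs from what read_csv used to do, which is the tradeoff mentioned in the first option.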

Could also move this prefix arg outside of --csv, which would improve consistency, and possibly also apply to --dataset and (if supported in future) reading from sqlite tables and stuff. I think I'll leave it there at least for now though.
