csv.Sniffer fails on CR (\r) line endings due to hardcoded split('\n') #142188

@ChuheLin

Bug report

Bug description:

The csv.Sniffer._guess_delimiter method currently uses data.split('\n') to split lines.
This prevents it from correctly processing CSV data that uses Classic Mac (\r) line endings.

The Issue

Because split('\n') ignores \r, valid multi-line data using CR is interpreted as a single line.
This can cause Sniffer to fail (raising csv.Error: "Could not determine delimiter") or, with mixed line endings (e.g., concatenated streams), to silently detect the wrong delimiter.
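The line-splitting difference can be seen without involving Sniffer at all. The following is a minimal sketch using only built-in string methods on the two samples from this report; the variable names `lines_split` and `lines_universal` are illustrative only.

```python
# CR-only sample from this report
sample_cr = "Name,Age\rAlice,30"

# split('\n') finds no '\n', so the whole sample stays one "line"
lines_split = sample_cr.split('\n')
print(lines_split)       # ['Name,Age\rAlice,30']

# splitlines() treats '\r', '\n' and '\r\n' as line boundaries
lines_universal = sample_cr.splitlines()
print(lines_universal)   # ['Name,Age', 'Alice,30']

# Mixed sample: split('\n') glues the first two rows together
sample_mixed = "User,ID\rAlice,001\nBob,002"
print(sample_mixed.split('\n'))    # ['User,ID\rAlice,001', 'Bob,002']
print(sample_mixed.splitlines())   # ['User,ID', 'Alice,001', 'Bob,002']
```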

Reproduction

import csv

# Scenario 1: Pure CR data (Classic Mac / Legacy Systems)
# Sniffer sees 1 line instead of 2. Fails to determine delimiter.
sample_cr = "Name,Age\rAlice,30" 
try:
    csv.Sniffer().sniff(sample_cr)
except csv.Error as e:
    print(f"CR Failure: {e}")

# Scenario 2: Mixed endings (e.g. concatenated strings)
# Sniffer merges "User,ID" and "Alice,001" into one line and silently picks the wrong delimiter.
# It detects '0' instead of ',' because ',' appears twice in the merged line but once in the
# other line, while '0' appears twice in both.
sample_mixed = "User,ID\rAlice,001\nBob,002"
dialect = csv.Sniffer().sniff(sample_mixed)
print(f"Mixed Failure (Detected): {dialect.delimiter!r}")

Proposed Fix

Replace data.split('\n') with data.splitlines(). This aligns Sniffer's behavior with csv.reader, which correctly handles universal newlines (\r, \n, \r\n).

I have implemented the fix and added regression tests locally. Submitting a PR shortly.
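Until such a change lands, callers can work around the problem by normalizing line endings before sniffing. The sketch below is not the patch described above: `normalize_newlines` is a hypothetical helper name, and it only rewrites `\r\n` and lone `\r`. Note also that `str.splitlines()` splits on more than the three universal newlines (for example `\x0b`, `\x0c`, `\x85`, `\u2028`), so the two approaches are not strictly equivalent.

```python
import csv


def normalize_newlines(text):
    # Hypothetical helper (not part of the csv module):
    # map '\r\n' first, then any remaining lone '\r', to '\n'.
    return text.replace('\r\n', '\n').replace('\r', '\n')


# Mixed-ending sample from the reproduction above
sample_mixed = "User,ID\rAlice,001\nBob,002"

normalized = normalize_newlines(sample_mixed)
print(normalized.split('\n'))   # ['User,ID', 'Alice,001', 'Bob,002']

# With three properly separated rows, ',' occurs once per row
dialect = csv.Sniffer().sniff(normalized)
print(repr(dialect.delimiter))
```

Normalizing up front leaves Sniffer itself untouched, which may be useful when supporting Python versions that will not receive a fix.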

CPython versions tested on:

CPython main branch, 3.11

Operating systems tested on:

Windows

Metadata

Assignees: No one assigned
Labels: pending (The issue will be closed if no feedback is provided), stdlib (Standard Library Python modules in the Lib/ directory)
Status: Done
Milestone: No milestone
Development: No branches or pull requests