Pasting non-BMP emoji into the REPL in Windows Terminal fails with UnicodeEncodeError #140788

@tjol

Bug report

Bug description:

Pasting non-BMP Unicode characters (such as 🔬) into the Python 3.14.0 REPL running in Windows Terminal causes the REPL to raise a UnicodeEncodeError.

Steps to reproduce:

  1. Start Python 3.14 on Windows, in Windows Terminal.
  2. (Optional) Evaluate '\U0001F52C' to get a string containing a non-BMP character (it prints successfully).
  3. Copy the string '🔬' (either from here or from the terminal).
  4. Paste it into the terminal.

Expected result: the string appears in the terminal and is equivalent to '\U0001F52C'

Actual result: a traceback appears:

Traceback (most recent call last):
  File "C:\Users\ThomasJollans\AppData\Local\Programs\Python\Python314\Lib\_pyrepl\readline.py", line 395, in multiline_input
    return reader.readline()
           ~~~~~~~~~~~~~~~^^
  File "C:\Users\ThomasJollans\AppData\Local\Programs\Python\Python314\Lib\_pyrepl\reader.py", line 748, in readline
    self.handle1()
    ~~~~~~~~~~~~^^
  File "C:\Users\ThomasJollans\AppData\Local\Programs\Python\Python314\Lib\_pyrepl\reader.py", line 731, in handle1
    self.do_cmd(cmd)
    ~~~~~~~~~~~^^^^^
  File "C:\Users\ThomasJollans\AppData\Local\Programs\Python\Python314\Lib\_pyrepl\reader.py", line 661, in do_cmd
    self.refresh()
    ~~~~~~~~~~~~^^
  File "C:\Users\ThomasJollans\AppData\Local\Programs\Python\Python314\Lib\_pyrepl\reader.py", line 638, in refresh
    self.screen = self.calc_screen()
                  ~~~~~~~~~~~~~~~~^^
  File "C:\Users\ThomasJollans\AppData\Local\Programs\Python\Python314\Lib\_pyrepl\completing_reader.py", line 261, in calc_screen
    screen = super().calc_screen()
  File "C:\Users\ThomasJollans\AppData\Local\Programs\Python\Python314\Lib\_pyrepl\reader.py", line 315, in calc_screen
    colors = list(gen_colors(self.get_unicode()))
  File "C:\Users\ThomasJollans\AppData\Local\Programs\Python\Python314\Lib\_pyrepl\utils.py", line 109, in gen_colors
    for color in gen_colors_from_token_stream(gen, line_lengths):
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ThomasJollans\AppData\Local\Programs\Python\Python314\Lib\_pyrepl\utils.py", line 169, in gen_colors_from_token_stream
    for prev_token, token, next_token in token_window:
                                         ^^^^^^^^^^^^
  File "C:\Users\ThomasJollans\AppData\Local\Programs\Python\Python314\Lib\_pyrepl\utils.py", line 370, in prev_next_window
    window = deque((None, next(iterator)), maxlen=3)
                          ~~~~^^^^^^^^^^
  File "C:\Users\ThomasJollans\AppData\Local\Programs\Python\Python314\Lib\tokenize.py", line 582, in _generate_tokens_from_c_tokenizer
    for info in it:
                ^^
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 1-2: surrogates not allowed
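
The "surrogates not allowed" message suggests that the pasted character reaches the tokenizer as two separate UTF-16 surrogate code units rather than as a single code point. The following is a minimal sketch under that assumption; it does not use `_pyrepl` at all, and the variable names are purely illustrative. It builds such a string by hand, shows that encoding it to UTF-8 fails in the same way, and shows that the two halves can be recombined into the intended character.

```python
# Assumption: the paste arrives as the two UTF-16 code units of U+1F52C
# (high surrogate D83D, low surrogate DD2C) instead of one code point.
pasted = "'\ud83d\udd2c'"

error_message = None
try:
    # The tokenizer in the traceback needs UTF-8; lone surrogates cannot be encoded.
    pasted.encode("utf-8")
except UnicodeEncodeError as exc:
    error_message = str(exc)
    print(error_message)

# Illustrative repair: round-trip through UTF-16 with "surrogatepass" so the
# two halves are written out as code units and decoded back as one pair.
repaired = pasted.encode("utf-16", "surrogatepass").decode("utf-16")
print(repaired == "'\U0001F52C'")
```

If this assumption holds, a fix would need to combine surrogate pairs somewhere on the Windows console input path before the buffer is handed to the tokenizer. Where exactly that belongs is not something this sketch establishes.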

Note: in Python 3.13, the string would be replaced by '??' upon pasting. In Python 3.12, it would be replaced by '??' upon evaluation. Neither of these is ideal, but both are possibly better than not being able to paste at all.

CPython versions tested on:

3.14

Operating systems tested on:

Windows

Labels: OS-windows, stdlib, topic-repl, type-bug
