Incorrect parsing of JSON strings with surrogate pair escape sequences

The JSON string `"\ud83d\udc95"` contains a single code point (U+1F495), not two.

This is because the spec allows a character outside the Basic Multilingual Plane to be escaped as a pair of 16-bit `\uXXXX` values, called a "surrogate pair".
                                                                                                      
From RFC 4627:                                                                                        
                                                                                                      
    To escape an extended character that is not in the Basic Multilingual                               
    Plane, the character is represented as a twelve-character sequence,                                 
    encoding the UTF-16 surrogate pair.  So, for example, a string                                      
    containing only the G clef character (U+1D11E) may be represented as                                
    "\uD834\uDD1E".

But SWI-Prolog's JSON parser reads such a string as two separate (invalid) characters, one per escape, instead of combining the pair.
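For reference, the expected decoding can be sketched in a few lines. This is an illustration in Python, not the SWI-Prolog fix itself; `combine_surrogates` is a hypothetical helper name, and Python's standard `json` module is used only to show what a parser that follows the RFC returns for the string above.

```python
import json


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point.

    Hypothetical helper for illustration; not taken from any JSON library.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 bits of the code point's offset above U+FFFF.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The example from this issue and the G clef example from RFC 4627.
issue_code_point = combine_surrogates(0xD83D, 0xDC95)
g_clef_code_point = combine_surrogates(0xD834, 0xDD1E)

# Python's json module combines the two escapes into a single character.
decoded = json.loads('"\\ud83d\\udc95"')
```

A parser that follows the RFC should therefore yield a string of length one for both examples, rather than two lone surrogates.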

I have fixed this in my fork and will submit a pull request.

Incorrect parsing of JSON strings with surrogate pair escape sequences #158
