octal and hex escapes in char and string declarations require further support #15

@markjenkins

Octal (\NNN) and hex escape (\xNN) sequences in char and string declarations require further support.

Test C code:

#include <stdio.h>

#define NUM_TEST_SINGLE_CHAR 19
char test_single_char[NUM_TEST_SINGLE_CHAR] = {
  '\0',  // octal
  '\00', // octal
  '\000',// octal
  '\x0', // hex
  '\x00',// hex
  
  '\1',  // octal
  '\01', // octal
  '\001',// octal
  '\x1', // hex
  '\x01',// hex

  '\t',
  '\11', // octal
  '\011',// octal
  '\x9', // hex
  '\x09',// hex

  'z',
  '\172',// octal
  '\x7a',// hex
  '\x7A',// hex
};


#define NUM_TEST_STR_SINGLE_CHAR 19
char * test_str_single_char[NUM_TEST_STR_SINGLE_CHAR] = {
  "\0",  // octal
  "\00", // octal
  "\000",// octal
  "\x0", // hex
  "\x00",// hex
  
  "\1",  // octal
  "\01", // octal
  "\001",// octal
  "\x1", // hex
  "\x01",// hex

  "\t",
  "\11", // octal
  "\011",// octal
  "\x9", // hex
  "\x09",// hex

  "z",
  "\172",// octal
  "\x7a",// hex
  "\x7A",// hex
};

#define NUM_TEST_STR_MULTI_CHAR 13
char * test_str_multi_char[] = {
  "Tab\tGap",
  "Tab\11Gap",       // octal
  "Tab\011Gap",      // octal
  "Tab\x9Gap",       // hex, works only because G is not a hex digit
  "Tab\x9" "Gap",    // hex, but clearer and less scary
  "Tab\x09Gap", // hex, two digits
  "Tab\x9" "animals", // hex, restarting the quoting is a necessity ('a' is a hex digit)
  "zzzAre sleepy time",
  "z\x7a\x7A" "Are sleepy time", // restarting the quoting is necessary ('A' is a hex digit)
  "z z z Are sleepy time",
  "\172 \x7a \x7A Are sleepy time", // no restart of quoting required
  "time for zzzz",
  "time for \x7a\172\x7Az",
};

int main(){
  int i=0;
  for (i=0; i<NUM_TEST_SINGLE_CHAR; i++){
    // print the char in single quotes with a trailing newline
    // literal \x prefix, then %.2x to print the value as 2-digit hex
    printf("'\\x%.2x'\n", test_single_char[i]);
  }
  printf("\n");
  for (i=0; i<NUM_TEST_STR_SINGLE_CHAR; i++){
    // print the first char in double quotes with a trailing newline
    // literal \x prefix, then %.2x to print the value as 2-digit hex
    printf("\"\\x%.2x\"\n", *test_str_single_char[i]);
  }
  printf("\n");
  
  for (i=0; i<NUM_TEST_STR_MULTI_CHAR; i++){
    printf("%s\n", test_str_multi_char[i]);
  }
  
  return 0;
}

For the hex escape sequences, parsing the test code leads to cpre2_parse() invoking cparser.simple_escape_char with 'x' as the argument, but hex escapes are not the simple single-character kind that simple_escape_char is designed for. Likewise, the handling for '\0' and "\0" doesn't recognize that these particular sequences are octal escapes.

Additional states are required in cpre2_parse().

The expected output of the test program is:

'\x00'
'\x00'
'\x00'
'\x00'
'\x00'
'\x01'
'\x01'
'\x01'
'\x01'
'\x01'
'\x09'
'\x09'
'\x09'
'\x09'
'\x09'
'\x7a'
'\x7a'
'\x7a'
'\x7a'

"\x00"
"\x00"
"\x00"
"\x00"
"\x00"
"\x01"
"\x01"
"\x01"
"\x01"
"\x01"
"\x09"
"\x09"
"\x09"
"\x09"
"\x09"
"\x7a"
"\x7a"
"\x7a"
"\x7a"

Tab	Gap
Tab	Gap
Tab	Gap
Tab	Gap
Tab	Gap
Tab	Gap
Tab	animals
zzzAre sleepy time
zzzAre sleepy time
z z z Are sleepy time
z z z Are sleepy time
time for zzzz
time for zzzz

I have some initial code to address the hex escapes in double-quoted strings. After this issue is opened, I'll reference the issue number.
