Move string concatenation for C into the parser

GCC - jsm28 [138bc75d-0d04-0410-961f-82ee72b054a4] - 7 November 2019 01:01 EST

This patch is another piece of preparation for C2x attributes support.

C2x attributes require unbounded lookahead in the parser. The token sequence '[[' that starts a C2x attribute is also valid in Objective-C in some of the same contexts. To decide whether those tokens start an attribute, the parser must find the matching closing brackets and check whether they are consecutive tokens (']]').
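To make the lookahead requirement concrete, here is a minimal sketch of that check on a simplified token array. It is not GCC's implementation. The token kinds and the function name double_bracket_starts_attribute are hypothetical, and the scan balances only square brackets; a real parser works on its full token set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token kinds; a real lexer has many more.  */
enum tok_kind { TOK_OPEN_SQUARE, TOK_CLOSE_SQUARE, TOK_OTHER };

/* Return true if the '[' '[' at toks[pos] and toks[pos + 1] starts a
   C2x attribute: the ']' matching the second '[' must be immediately
   followed by another ']'.  The scan is bounded only by the number of
   tokens available, hence "unbounded lookahead".  */
bool
double_bracket_starts_attribute (const enum tok_kind *toks, size_t n,
                                 size_t pos)
{
  assert (pos + 1 < n);
  assert (toks[pos] == TOK_OPEN_SQUARE && toks[pos + 1] == TOK_OPEN_SQUARE);

  int depth = 2;  /* Both opening brackets are still unmatched.  */
  for (size_t i = pos + 2; i < n; i++)
    {
      if (toks[i] == TOK_OPEN_SQUARE)
        depth++;
      else if (toks[i] == TOK_CLOSE_SQUARE && --depth == 1)
        /* This ']' closes the inner '['; an attribute needs the outer
           '[' to be closed by the very next token.  */
        return i + 1 < n && toks[i + 1] == TOK_CLOSE_SQUARE;
    }
  return false;  /* Ran out of tokens before the inner '[' was closed.  */
}
```

For the Objective-C message send [[obj foo] bar], the inner ']' is followed by bar, so the function returns false. For [[ a [ 1 ] ]], the nested brackets are skipped and it returns true.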

Unbounded lookahead means lexing an unbounded number of tokens before they are parsed. c_lex_one_token does various context-sensitive processing of tokens that cannot be done at lookahead time. That processing depends on information, such as whether particular identifiers are typedefs, that may differ between the time the lookahead happens and the time the tokens are actually parsed. (Recall that more or less arbitrary C code, including declarations and statements, can appear inside expressions in GNU C.)
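A simplified model shows why such processing must wait. In GNU C, a statement expression such as ({ typedef int T; T x; ... }) can declare a typedef after later tokens have already been lexed for lookahead. In this sketch, tokens are buffered in raw form and peeking never consults scope information. An identifier is classified as a typedef name only when it is consumed. All names (struct parser, peek_nth, declare_typedef, consume_identifier) are hypothetical and do not reflect GCC's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical raw token: just the spelling, as produced at lex time.  */
struct raw_token
{
  const char *spelling;
};

enum id_kind { ID_ORDINARY, ID_TYPENAME };

#define MAX_TYPEDEFS 16

/* Hypothetical parser state: a buffer of already-lexed tokens plus a
   flat (scope-free) table of typedef names declared so far.  */
struct parser
{
  const struct raw_token *buf;  /* Tokens lexed so far, including lookahead.  */
  size_t n;                     /* Number of buffered tokens.  */
  size_t next;                  /* Index of the next unconsumed token.  */
  const char *typedefs[MAX_TYPEDEFS];
  size_t n_typedefs;
};

/* Lookahead: inspects only the raw form, so it is safe at any distance.  */
const struct raw_token *
peek_nth (const struct parser *p, size_t k)
{
  return p->next + k < p->n ? &p->buf[p->next + k] : NULL;
}

/* Called when a typedef declaration has actually been parsed.  */
void
declare_typedef (struct parser *p, const char *name)
{
  assert (p->n_typedefs < MAX_TYPEDEFS);
  p->typedefs[p->n_typedefs++] = name;
}

/* Consumption: the identifier is classified now, against the typedefs
   known at the moment it is parsed, not when it was lexed.  */
enum id_kind
consume_identifier (struct parser *p)
{
  assert (p->next < p->n);
  const char *s = p->buf[p->next++].spelling;
  for (size_t i = 0; i < p->n_typedefs; i++)
    if (strcmp (p->typedefs[i], s) == 0)
      return ID_TYPENAME;
  return ID_ORDINARY;
}
```

Suppose T is peeked before its typedef is parsed and consumed afterwards. It is then correctly reported as a type name. Classifying it at peek time would have called it an ordinary identifier.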

Most of that context-sensitive processing is not a problem, because it is not needed for lookahead and so can be deferred until the tokens lexed during lookahead are parsed. However, the earliest piece of context-sensitive processing is the handling of string literals based on flags passed to c_lex_with_flags. These flags determine whether adjacent literals are concatenated and whether translation to the execution character set occurs.

Because the choice of whether to translate to the execution character set is context-sensitive, this means that unbounded lookahead requires the C parser to move to the approach used by the C++ parser, where string literals are generally not translated or concatenated from within c_lex_with_flags, but only later in the parser once it knows whether translation is needed. (Translation requires the tokens in their form before concatenation.)
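Below is a minimal sketch of what handling strings "later in the parser" involves. It assumes the lexer hands over each literal's body separately, and a parser-side routine interprets each piece and then joins them, with the translate flag chosen by the caller. The helper names are hypothetical, and only \x, \n and backslash-plus-character escapes are handled. to_exec_charset is a toy stand-in for a real character-set conversion: it upper-cases letters. The hex escape shows why interpretation must see each literal before concatenation. "\x4" "1" denotes the two characters 0x04 and '1', whereas the single literal "\x41" denotes 'A'.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in (hypothetical) for conversion to the execution character
   set: it upper-cases ASCII letters so the effect is easy to see.  */
static unsigned char
to_exec_charset (unsigned char c)
{
  return (unsigned char) toupper (c);
}

/* Value of one hex digit; the caller has already checked isxdigit.  */
static unsigned
hex_value (char c)
{
  return isdigit ((unsigned char) c)
         ? (unsigned) (c - '0')
         : (unsigned) (tolower ((unsigned char) c) - 'a' + 10);
}

/* Interpret the body of ONE literal (the text between the quotes),
   appending bytes to OUT[LEN..CAP).  A \x escape consumes every hex
   digit that follows it, but only within this body.  No range checking.
   Returns the new length.  */
static size_t
interpret_piece (const char *s, bool translate, unsigned char *out,
                 size_t len, size_t cap)
{
  while (*s != '\0')
    {
      unsigned char c;
      bool numeric = false;  /* Numeric escapes are not converted.  */
      if (s[0] == '\\' && s[1] == 'x')
        {
          unsigned v = 0;
          for (s += 2; isxdigit ((unsigned char) *s); s++)
            v = v * 16 + hex_value (*s);
          c = (unsigned char) v;
          numeric = true;
        }
      else if (s[0] == '\\' && s[1] == 'n')
        {
          c = '\n';
          s += 2;
        }
      else if (s[0] == '\\' && s[1] != '\0')
        {
          c = (unsigned char) s[1];  /* \\, \" and similar.  */
          s += 2;
        }
      else
        c = (unsigned char) *s++;
      assert (len < cap);
      out[len++] = (translate && !numeric) ? to_exec_charset (c) : c;
    }
  return len;
}

/* Parser-side handling of N adjacent string tokens: interpret each one
   separately, then concatenate.  TRANSLATE comes from the caller's
   context, which is known only once the parser reaches these tokens.  */
size_t
parse_adjacent_strings (const char *const *pieces, size_t n, bool translate,
                        unsigned char *out, size_t cap)
{
  size_t len = 0;
  for (size_t i = 0; i < n; i++)
    len = interpret_piece (pieces[i], translate, out, len, cap);
  return len;
}
```

With translate false, the pieces {"\\x4", "1"} yield two bytes, 0x04 and '1'. Joining the raw spellings first would have produced the single byte 0x41.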

Thus, this patch makes that change to the C parser. As in C++, flags in the parser are still used for two special cases: the handling of an initial #pragma pch_preprocess, and arranging for strings inside attributes not to be translated. The latter is now more logically correct. As in the C++ parser, the flags are saved and restored, rather than assuming that the state outside the attribute was always to translate string literals. That assumption might not hold in corner cases involving declarations and attributes inside attributes.
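The difference between assuming the outer state and saving and restoring it can be modelled in a few lines. This is a hypothetical sketch, not GCC code. A recursive call stands in for an attribute nested (via a declaration) inside another attribute. Each function returns the flag value that a string after the nested attribute, but still inside the outer one, would be parsed with.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cut-down parser state: only the flag matters here.  */
struct parser
{
  bool translate_strings_p;
};

/* Model of the old assumption: disable translation inside the attribute
   and, on exit, assume the outside context always translated.
   DEPTH > 0 stands for a further attribute nested inside this one.  */
bool
attribute_assume_outer (struct parser *p, int depth)
{
  p->translate_strings_p = false;
  if (depth > 0)
    attribute_assume_outer (p, depth - 1);
  bool later = p->translate_strings_p;
  p->translate_strings_p = true;  /* The assumption that can be wrong.  */
  return later;
}

/* Model of save-and-restore, as the commit describes.  */
bool
attribute_save_restore (struct parser *p, int depth)
{
  bool saved = p->translate_strings_p;
  p->translate_strings_p = false;
  if (depth > 0)
    {
      attribute_save_restore (p, depth - 1);
      /* The nested call restored the value this level set.  */
      assert (!p->translate_strings_p);
    }
  bool later = p->translate_strings_p;
  p->translate_strings_p = saved;
  return later;
}
```

With one level of nesting, the first model returns true. The inner attribute's exit switched translation back on while the outer attribute was still being parsed. The second model returns false and leaves the flag as it found it, even when the outer context had translation disabled.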

The consequent change to pragma_lex to use c_parser_string_literal makes it disallow wide strings and disable translation in that context. This also follows C++ and is more logically correct than the previous behavior, which had no special handling for either. Translation to the execution character set is always disabled when string constants are handled in the GIMPLE parser.
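The per-context decisions can be pictured as a pair of flags passed to one general string-handling routine. The struct, constants and function below are a hypothetical sketch. In particular, treating every prefixed literal as "wide" is an assumption of this sketch, not a statement of GCC's exact rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Encoding prefixes a string-literal token can carry.  */
enum str_prefix { PREFIX_NONE, PREFIX_U8, PREFIX_LOWER_U, PREFIX_UPPER_U,
                  PREFIX_L };

/* Hypothetical per-context settings for one general string routine.  */
struct string_ctx
{
  bool translate;  /* Convert to the execution character set?  */
  bool wide_ok;    /* Accept a prefixed ("wide") literal?  */
};

/* Ordinary C expressions: translated, any prefix accepted.  */
static const struct string_ctx expression_ctx = { true, true };

/* #pragma arguments after this patch: untranslated, no wide strings.  */
static const struct string_ctx pragma_ctx = { false, false };

/* Return true if a literal with PREFIX is acceptable in CTX.  Treating
   every non-plain prefix as "wide" is an assumption of this sketch.  */
bool
string_prefix_ok (const struct string_ctx *ctx, enum str_prefix prefix)
{
  assert (ctx != 0);
  return ctx->wide_ok || prefix == PREFIX_NONE;
}
```

Because the check is driven by a flag rather than living in asm-specific code, every caller that passes wide_ok as false gets the same behavior and the same diagnostic.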

Although the handling of strings is now a lot closer to that in C++, there are still some differences, in particular regarding the handling of locations. See c-c++-common/Wformat-pr88257.c, which has different expected multiline diagnostic output for C and C++, for example; I'm not sure whether the C or C++ output is better there (C++ has a more complete range than C, C mentions a macro definition location that C++ doesn't), but I tried to keep the locations the same as those previously used by the C front end, as far as possible, to minimize the testsuite changes needed, rather than possibly making them closer to those used with C++.

The only changes needed for tests of user-visible diagnostics were to the wording of one diagnostic, which now matches C++. This is a consequence of checking for wide strings via a flag in a general string-handling function rather than in a function specific to asm. However, although locations are extremely similar to what they were before, I couldn't make them completely identical in all cases. (My understanding of the implementation reason for the differences is as follows. lex_string uses src_loc from each cpp_token, whereas the C parser uses the virtual location from cpp_get_token_with_location as called by c_lex_with_flags. Passing that location through linemap_resolve_location with LRK_MACRO_DEFINITION_LOCATION, as this patch does, produces something very close to what lex_string uses, but it is not completely identical in some cases.)

This results in changes being needed to two of the gcc.dg/plugin tests that use a plugin to test details of how string locations are handled. Because the tests being changed are for ICEs and the only change is to the details of the particular non-user-visible error that code gives in cases it can't handle (one involving __FILE__, one involving a string literal from stringizing), I think it's OK to change that non-user-visible error and that the new errors are no worse than the old ones. So these particular errors are now different for C and C++ (some other messages in those tests already had differences between C and C++).

Bootstrapped with no regressions on x86_64-pc-linux-gnu.

- c-parser.c:
  - (c_parser): Remove lex_untranslated_string. Add lex_joined_string and translate_strings_p.
  - (c_lex_one_token): Pass 0 or C_LEX_STRING_NO_JOIN to c_lex_with_flags.
  - (c_parser_string_literal): New function.
  - (c_parser_static_assert_declaration_no_semi): Use c_parser_string_literal. Do not set lex_untranslated_string.
  - (c_parser_asm_string_literal): Use c_parser_string_literal.
  - (c_parser_simple_asm_expr): Do not set lex_untranslated_string.
  - (c_parser_gnu_attributes): Set and restore translate_strings_p instead of lex_untranslated_string.
  - (c_parser_asm_statement): Do not set lex_untranslated_string.
  - (c_parser_asm_operands): Likewise.
  - (c_parser_has_attribute_expression): Set and restore translate_strings_p instead of lex_untranslated_string.
  - (c_parser_postfix_expression): Use c_parser_string_literal.
  - (pragma_lex): Likewise.
  - (c_parser_pragma_pch_preprocess): Set lex_joined_string.
  - (c_parse_file): Set translate_strings_p.
- gimple-parser.c (c_parser_gimple_postfix_expression) (c_parser_gimple_or_rtl_pass_list): Use c_parser_string_literal.
- c-parser.h (c_parser_string_literal): Declare function.

- gcc.dg/asm-wide-1.c, gcc.dg/diagnostic-token-ranges.c, gcc.dg/plugin/diagnostic-test-string-literals-1.c, gcc.dg/plugin/diagnostic-test-string-literals-2.c: Update expected diagnostics.

a65d5ae024a Move string concatenation for C into the parser.
gcc/c/ChangeLog | 25 ++
gcc/c/c-parser.c | 274 ++++++++++++++++-----
gcc/c/c-parser.h | 1 +
gcc/c/gimple-parser.c | 9 +-
gcc/testsuite/ChangeLog | 7 +
gcc/testsuite/gcc.dg/asm-wide-1.c | 18 +-
gcc/testsuite/gcc.dg/diagnostic-token-ranges.c | 2 +-
.../plugin/diagnostic-test-string-literals-1.c | 3 +-
.../plugin/diagnostic-test-string-literals-2.c | 3 +-
9 files changed, 261 insertions(+), 81 deletions(-)
