Re: [sv-bc] 1339: (RESEND)`define behavior on trimming leading and trailing spaces in macros

From: Greg Jaxon <Greg.Jaxon_at_.....>
Date: Mon Nov 19 2007 - 11:22:59 PST
Alsop, Thomas R wrote:

> Here is the additional wording.  I took some of it from the ANSI C
> preprocessing document that Steven pointed us to. I am not a wizard yet
> on LRM word-smith’ing so any advice before we vote on this would be welcome.
> 
> Thanks, -Tom

You need more from the ANSI C standard - specifically the whole step-by-step
operational definition approach.  Here are some questions I have for your
definition:

   A) Is the backslash escape for newline applied before or after other
      uses of backslash as escape (for example in quoted strings, or escaped
      identifiers)?  If I want a quoted newline in the replacement, what
      should I write? (see below for some alternatives)

   B) Is backslash-newline whitespace?  I always assumed it was, but you treat
      it separately, why?

   C) Can the backslash-newline ligature be the terminating whitespace
      of an escaped identifier?  If so, will the identifier end with a backslash
      or not?

   D) The first sentence defines "macro text" as being arbitrary stuff
      on the same "line"; veterans who know the Unix convention of escaped newlines
      can factor this in as just more arbitrary bytes.  But your additional
      sentences describe the "macro replacement string", which I feel is a misuse
      of the well-defined term "string".  I think both of the terms "text" and "string"
      are misleading, and the LRM should instead define the "macro_replacement formula",
      since it clearly contains free variables.  But ultimately the trimming
      effort belies the original definition of this text as "arbitrary".

> The macro text can be any arbitrary text specified on the same line as
> the text macro name. If more than one line is necessary to specify the
> text, the newline shall be preceded by a backslash ( \ ). The first
> newline not preceded by a backslash shall end the macro text. The
> newline preceded by a backslash shall be replaced in the expanded macro
> with a newline (but without the preceding backslash character).

Which raises question (A): what tokenization happens after this replacement,
and what backslash substitutions happen before it?  Which text below expands to the
newline character?

`define ascii_NL "\\
"

or

`define ascii_NL "\\\
"
?
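
To make question (A) concrete, here is a rough Python model of two possible
readings of the quoted rule.  It is only a sketch: neither function comes from
the LRM or from the ANSI C text, and the names macro_text_literal,
macro_text_pairs_first, FIRST and SECOND are made up for illustration.  The
first reading does nothing but splice lines; the second assumes a backslash
always pairs with the character after it, so that two backslashes are consumed
together before any newline is examined.

```python
def macro_text_literal(body: str) -> str:
    """Splice only: backslash+newline becomes newline; a bare newline ends the text."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] == "\n":
            out.append("\n")      # keep the newline, drop the backslash
            i += 2
        elif ch == "\n":
            break                 # first newline not preceded by a backslash
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def macro_text_pairs_first(body: str) -> str:
    """Assumed alternative: a backslash always pairs with the next character."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # Only a backslash paired with a newline is a line continuation;
            # any other pair (including two backslashes) is copied through.
            out.append("\n" if nxt == "\n" else ch + nxt)
            i += 2
        elif ch == "\n":
            break
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Text following the macro name in the two definitions above.
FIRST = '"\\\\\n"\n'      # quote, 2 backslashes, newline, quote, newline
SECOND = '"\\\\\\\n"\n'   # quote, 3 backslashes, newline, quote, newline

for name, body in (("FIRST", FIRST), ("SECOND", SECOND)):
    print(name, repr(macro_text_literal(body)), repr(macro_text_pairs_first(body)))
```

Under the splice-only reading both definitions continue onto the second line,
leaving one backslash before the newline in the first case and two in the
second.  Under the pairing reading the first definition ends at the newline
with an unterminated quote, and only the second one continues.  The two
readings disagree, which is why the order of these steps needs to be written
down.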

In Unix conventions, a "line" is defined as arbitrary text delimited by unescaped newlines.
I'd prefer to see that definition once very early in the lexical convention section,
and then simply not deal with it, except in notes or examples to illustrate the concept.

Similarly "whitespace" comprises space, the horizontal and vertical tabs, and newline,
maybe carriage return - and possibly others.  Isn't there a standard covering this?

As to whether the committee should back down from "arbitrary" text to "trimmed" text,
I would personally recommend trimming /leading/ whitespace, but NOT /trailing/ whitespace.
The first is done in the interest of free vertical alignment, to make
   `define A 1
equal
   `define \A 1
, and to prevent any macro from expanding to mere whitespace.

The second is done to finesse this whole complication about escaped identifiers.
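
A small Python sketch of why trailing whitespace matters here.  It assumes
that expansion is plain text substitution followed by re-tokenization (a tool
that substitutes token sequences would behave differently), and it relies on
the rule that an escaped identifier runs from its backslash up to the next
whitespace character.  The WHITESPACE set, the helper names and the macro text
are all hypothetical.

```python
WHITESPACE = " \t\v\f\r\n"   # assumed set; the exact list is an open question


def trim_leading(text: str) -> str:
    """Drop whitespace before the first character of the macro text."""
    return text.lstrip(WHITESPACE)


def trim_both(text: str) -> str:
    """Drop whitespace on both ends, as in the proposed wording."""
    return text.strip(WHITESPACE)


def escaped_identifier_at(source: str, start: int) -> str:
    """Return the escaped identifier beginning at source[start].

    It extends from the backslash up to, but not including, the next
    whitespace character (or the end of the source).
    """
    end = start
    while end < len(source) and source[end] not in WHITESPACE:
        end += 1
    return source[start:end]


# Hypothetical macro text: an escaped identifier followed by one space.
MACRO_TEXT = "  \\bus+index "

# The macro is used immediately before a semicolon.
leading_only = trim_leading(MACRO_TEXT) + ";"
both_sides = trim_both(MACRO_TEXT) + ";"

print(repr(escaped_identifier_at(leading_only, 0)))   # '\\bus+index'
print(repr(escaped_identifier_at(both_sides, 0)))     # '\\bus+index;'
```

With only leading whitespace trimmed, the space that terminates the escaped
identifier survives and the semicolon stays a separate token.  With both ends
trimmed, the semicolon is absorbed into the identifier.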

> Any white-space characters preceding or following the macro replacement
> string are not considered part of the replacement. Additionally, for
> multi-line macros any trailing white-space between the last token on a
> line and the newline before a backslash is not considered part of the
> replacement.

That "Additionally" clause is probably just a note on your original definition,
not an actual addition to the rules.  I oppose the trimming of trailing
whitespace.  However, I don't vote, so don't fret about it if you can get
consensus otherwise.

Greg


Received on Mon Nov 19 11:23:32 2007
