Re: [sv-bc] Trimming whitespace from macro actuals

From: Steven Sharp <sharp_at_.....>
Date: Sat Sep 08 2007 - 20:02:07 PDT
>From: "Brad Pierce" <Brad.Pierce@synopsys.com>

>Is there any difference in meaning between the following macro
>invocations?
>
>   `M1(wire1,wire2)
>   `M1( wire1 , wire2 )
>
>Are the whitespaces trimmed before invocation?

Yes.  The LRM says that whitespace at the beginning and end of
the arguments is removed.  So these are equivalent.  In Verilog,
it didn't really matter much anyway, since whitespace was
irrelevant outside string literals, and arguments are not
substituted inside string literals.

In SV, whitespace starts mattering.  The `" mechanism allows
argument substitution inside string literals, so any whitespace
included in the argument becomes visible.  The `` mechanism
could also be affected by whitespace.
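
A minimal sketch of why trimming becomes visible with `", written as a
small Python model rather than SV.  The macro body and the helper name
stringize are made up for illustration, and this is not how any
particular tool implements it.

```python
def stringize(body: str, formal: str, actual: str, trim: bool = True) -> str:
    """Substitute `actual` for `formal` in `body`, then turn each `" into "."""
    if trim:
        # LRM rule: whitespace at the beginning and end of an actual is removed.
        actual = actual.strip()
    # Naive single-formal replacement; enough for the one-letter body below.
    return body.replace(formal, actual).replace('`"', '"')


# Hypothetical macro:  `define Q(x) `"<x>`"
BODY = '`"<x>`"'

print(stringize(BODY, "x", " wire1 "))              # "<wire1>"
print(stringize(BODY, "x", " wire1 ", trim=False))  # "< wire1 >"
```

Without `" the two results would differ only in whitespace outside a
string literal, which the language ignores.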

>What if the whitespaces are being used to terminate an escaped
>identifier?
>
>  `define paste(x,y) x``y
>   assign `paste(\7@ ,8) = 1'b1; 
>
>According to 3.7.1 in the V2005 LRM
>
>  "Neither the leading backslash character nor the terminating white
>space is considered to be part of the identifier."
>
>Another example --
>
>  assign `paste(\7@ ,\8@ ) = 1'b1;


With escaped identifiers, we have to be careful about whether we
are referring to the identifier name or the syntax used to specify
that identifier name.  The terminating white space is not part of
the identifier name, but it is part of the syntax used to specify
that name.

This starts to require macro expansion to be specified in more
detail than was ever required before the `` operator was added.

Does macro expansion substitute the exact text of the macro argument
and then operate on that text?  Or does it tokenize the macro
arguments somehow and operate on those tokens?  Or does it tokenize
them and then convert back to text?

Note that some form of token recognition is going on during macro
argument processing.  It recognizes the comma separators between
the arguments and the right parenthesis at the end.  It recognizes
that a comma inside quotes is part of a string literal, not an
argument separator, so it is essentially recognizing that a string
literal is an atomic unit.  It recognizes that commas inside curly
braces in a concatenation or inside parentheses in a function call
are not argument separators.  It recognizes that the right parenthesis
at the end of a function call is not the end of the macro invocation.
It recognizes that a comma, quote, curly bracket or parenthesis inside
an escaped identifier is part of the identifier, and not to be processed
like an argument separator, or part of a string literal, concatenation
or function argument.  But this could still be character-level processing,
without forming higher-level tokens.
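
The character-level recognition described above could be sketched
roughly as follows.  This is a Python model under my own assumptions,
not the LRM's algorithm: the function name split_actuals is invented,
mismatched brackets are not diagnosed, and comments in the source text
are ignored.

```python
def split_actuals(text: str, start: int = 0):
    """Split macro actuals, starting just after the opening '('.

    Returns (raw_actuals, index_after_closing_paren).  Works purely on
    characters; no higher-level tokens are formed.
    """
    args, cur, depth, i = [], [], 0, start
    while i < len(text):
        c = text[i]
        if c == '\\':
            # Escaped identifier: copy everything up to the next whitespace.
            j = i
            while j < len(text) and not text[j].isspace():
                j += 1
            cur.append(text[i:j])
            i = j
            continue
        if c == '"':
            # String literal: copy up to the closing quote as one unit.
            j = i + 1
            while j < len(text) and text[j] != '"':
                j += 2 if text[j] == '\\' else 1
            cur.append(text[i:j + 1])
            i = j + 1
            continue
        if c in '({[':
            depth += 1
        elif c == ')' and depth == 0:
            # Unnested right parenthesis: end of the macro invocation.
            args.append(''.join(cur))
            return args, i + 1
        elif c in ')}]':
            depth -= 1
        elif c == ',' and depth == 0:
            # Top-level comma: argument separator.
            args.append(''.join(cur))
            cur = []
            i += 1
            continue
        cur.append(c)
        i += 1
    raise ValueError("unterminated macro invocation")


args, end = split_actuals(r"\7@ ,\8@ ) = 1'b1;")
print(args)   # ['\\7@ ', '\\8@ ']
```

The actuals are returned untrimmed, so whatever rule is chosen for the
white space after an escaped identifier can be applied afterwards.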

In your first example,

   assign `paste(\7@ ,8) = 1'b1; 

If it is pure text processing, the space terminating the escaped
identifier is stripped like any whitespace after an argument.  So
you get

  assign \7@8 = 1'b1;
  
This seems to work as you would expect.  But if you don't happen to
provide any whitespace where you use the macro, things go wrong.  If
you write

  assign `paste(\7@ ,8)= 1'b1;
  
then you get

  assign \7@8= 1'b1;
  
This makes the = part of the escaped identifier, which makes the
statement a syntax error.

In your second example,

  assign `paste(\7@ ,\8@ ) = 1'b1;
  
pure text processing would give you

  assign \7@\8@ = 1'b1;
  
This makes the \ of the second escaped identifier part of the
identifier name, which probably isn't what you wanted.  You
wanted the identifier name to be the concatenation of the two separate
identifier names, "7@8@", which would be represented as \7@8@ .  I
assume this is what you were exploring with these examples.
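
The pure-text outcomes above can be reproduced with a small Python
model.  It assumes the simplest possible reading: trim each actual,
substitute it for the formal, then delete the `` characters.  The
helper names expand_text and escaped_name are invented for this sketch.

```python
import re


def expand_text(body, formals, actuals):
    """Pure text expansion: trim actuals, substitute, then delete ``."""
    table = {f: a.strip() for f, a in zip(formals, actuals)}
    pattern = re.compile(r'\b(' + '|'.join(map(re.escape, formals)) + r')\b')
    out = pattern.sub(lambda m: table[m.group(1)], body)
    return out.replace('``', '')


def escaped_name(text):
    """Name of the escaped identifier at the start of text:
    everything after the backslash up to the first whitespace."""
    assert text.startswith('\\')
    return text[1:].split(None, 1)[0]


PASTE = 'x``y'   # body of `define paste(x,y) x``y

first = expand_text(PASTE, ['x', 'y'], ['\\7@ ', '8'])
second = expand_text(PASTE, ['x', 'y'], ['\\7@ ', '\\8@ '])

print(escaped_name(first + " = 1'b1;"))    # 7@8
print(escaped_name(first + "= 1'b1;"))     # 7@8=
print(escaped_name(second + " = 1'b1;"))   # 7@\8@
```

The second and third results are the two failures described above:
the = is swallowed by the name, and the \ of the second identifier
ends up inside the name.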

If macro expansion is expected to preserve escaped identifiers in the
macro arguments, some form of tokenizing seems required.  And if you
want `` to be able to glue together escaped identifiers, then you
cannot just substitute the text for the tokens and then delete the ``
characters.  In your second example, that would give

  assign \7@ \8@ = 1'b1;
  
This fails to glue the two identifiers together to create one
identifier.

To make that work, `` has to operate at the token level: after
\7@ and \8@ have been recognized as identifiers named "7@" and "8@",
it glues the identifier names together into "7@8@".  Then if the result
is to be converted back into pure text again, the text representation
for the identifier would need to be an escaped identifier.
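
A rough Python sketch of that token-level approach, under several
assumptions of mine: each actual is a single identifier (or, as in the
first example, a bare number glued onto one), keywords are ignored,
and the helper names ident_name, emit_ident and paste_tokens are
invented here.

```python
import re

# A name that can be written without escaping (keywords are not checked).
SIMPLE = re.compile(r'[A-Za-z_][A-Za-z0-9_$]*\Z')


def ident_name(actual: str) -> str:
    """Identifier name of an actual: no leading backslash, no terminator."""
    text = actual.strip()
    return text[1:] if text.startswith('\\') else text


def emit_ident(name: str) -> str:
    """Text form of a name: escaped, with its terminating space, if needed."""
    return name if SIMPLE.match(name) else '\\' + name + ' '


def paste_tokens(left: str, right: str) -> str:
    """Glue two identifier names, then convert the result back to text."""
    return emit_ident(ident_name(left) + ident_name(right))


# No white space written before '=' at the point of use:
print("assign " + paste_tokens('\\7@ ', '\\8@ ') + "= 1'b1;")  # assign \7@8@ = 1'b1;
print("assign " + paste_tokens('\\7@ ', '8') + "= 1'b1;")      # assign \7@8 = 1'b1;
print("assign " + paste_tokens('wire', '1') + "= 1'b1;")       # assign wire1= 1'b1;
```

Because the terminating space is produced when the glued name is
converted back to text, the result no longer depends on whether the
user happened to leave white space after the macro invocation.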

Alternatively, a set of special case rules for text processing where
escaped identifiers are concerned might be sufficient.  For example,
don't trim the first trailing space after an escaped identifier in
a macro argument.  But if a macro argument ending with white space
is being substituted just before ``, then remove the white space at
that point.
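
One literal reading of those two rules, sketched as a Python model.
The names trim_actual and expand_special and the ASSIGN-style macro
body are hypothetical, and string literals containing a backslash are
not handled.

```python
import re


def trim_actual(actual: str) -> str:
    """Trim an actual, but keep the first white space character that
    terminates a trailing escaped identifier."""
    text = actual.lstrip()
    stripped = text.rstrip()
    if stripped == text:
        return text                      # nothing trailing to trim
    last_run = stripped.split()[-1]      # last run of non-whitespace
    if '\\' in last_run:                 # it contains an escaped identifier
        return stripped + text[len(stripped)]
    return stripped


def expand_special(body, formals, actuals):
    """Text expansion with the two special-case rules applied."""
    table = {f: trim_actual(a) for f, a in zip(formals, actuals)}
    names = '|'.join(map(re.escape, formals))
    pattern = re.compile(r'\b(' + names + r')\b(``)?')

    def sub(m):
        text = table[m.group(1)]
        # Just before ``: remove the white space kept by trim_actual.
        return text.rstrip() if m.group(2) else text

    return pattern.sub(sub, body).replace('``', '')


# Hypothetical macro:  `define ASSIGN(x) assign x= 1'b1;
print(expand_special("assign x= 1'b1;", ['x'], ['\\7@ ']))
# assign \7@ = 1'b1;

# `paste(\7@ ,\8@ )= 1'b1;
print(expand_special('x``y', ['x', 'y'], ['\\7@ ', '\\8@ ']) + "= 1'b1;")
# \7@\8@ = 1'b1;
```

Under this reading the = is no longer swallowed, but in the paste case
the \ of the second identifier still ends up inside the name, so the
rules would need something more to glue two escaped identifiers.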

I haven't thought this through all the way.  But it does seem to me
that the existing LRM text is inadequate.  I don't think this was an
issue before SV added `" and ``.  And I don't think we can look at
how C handles this with its stringizing and token-pasting operators,
since C doesn't have escaped identifiers.

Steven Sharp
sharp@cadence.com

