RE: [sv-bc] Trimming whitespace from macro actuals

From: Brad Pierce <Brad.Pierce_at_.....>
Date: Sat Sep 08 2007 - 23:30:22 PDT

Steven,

I looked through every occurrence of "white space" and "whitespace" in
Draft 3a, but did not find the following rule --

> The LRM says that whitespace at the beginning and end of the arguments
> is removed.

Maybe you were thinking of the LRM's discussion of macro formals?

Nevertheless, your analysis of the problem is useful and interesting.  I
agree with you that Verilog's \-and-whitespace escape brackets make the
problem more complicated than C had to deal with. 

>And I don't think we can look at how C handles this with its stringizing
>and token-pasting operators, since C doesn't have escaped identifiers.

-- Brad

-----Original Message-----
From: Steven Sharp [mailto:sharp@cadence.com] 
Sent: Saturday, September 08, 2007 8:02 PM
To: sv-bc@eda-stds.org; Brad.Pierce@synopsys.COM
Subject: Re: [sv-bc] Trimming whitespace from macro actuals


>From: "Brad Pierce" <Brad.Pierce@synopsys.com>

>Is there any difference in meaning between the following macro 
>invocations?
>
>   `M1(wire1,wire2)
>   `M1( wire1 , wire2 )
>
>Are the whitespaces trimmed before invocation?

Yes.  The LRM says that whitespace at the beginning and end of the
arguments is removed.  So these are equivalent.  In Verilog, it didn't
really matter much anyway, since whitespace was irrelevant outside
string literals, and arguments are not substituted inside string
literals.

In SV, whitespace starts mattering.  The `" mechanism allows argument
substitution inside string literals, so any whitespace included in the
argument becomes visible.  The `` mechanism could also be affected by
whitespace.

>What if the whitespaces are being used to terminate an escaped 
>identifier?
>
>  `define paste(x,y) x``y
>   assign `paste(\7@ ,8) = 1'b1;
>
>According to 3.7.1 in the V2005 LRM
>
>  "Neither the leading backslash character nor the terminating white 
>space is considered to be part of the identifier."
>
>Another example --
>
>  assign `paste(\7@ ,\8@ ) = 1'b1;


With escaped identifiers, we have to be careful about whether we are
referring to the identifier name or the syntax used to specify that
identifier name.  The terminating white space is not part of the
identifier name, but it is part of the syntax used to specify that name.

This starts to require macro expansion to be specified in more detail
than was ever required before the `` operator was added.

Does macro expansion substitute the exact text of the macro argument and
then operate on that text?  Or does it tokenize the macro arguments
somehow and operate on those tokens?  Or does it tokenize them and then
convert back to text?

Note that some form of token recognition is going on during macro
argument processing.  It recognizes the comma separators between the
arguments and the right parenthesis at the end.  It recognizes that a
comma inside quotes is part of a string literal, not an argument
separator, so it is essentially recognizing that a string literal is an
atomic unit.  It recognizes that commas inside curly braces in a
concatenation or inside parentheses in a function call are not argument
separators.  It recognizes that the right parenthesis at the end of a
function call is not the end of the macro invocation.
It recognizes that a comma, quote, curly bracket or parenthesis inside
an escaped identifier is part of the identifier, and not to be processed
like an argument separator, or part of a string literal, concatenation
or function argument.  But this could still be character-level
processing, without forming higher-level tokens.
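
The character-level scanning described above can be sketched roughly as
follows.  This is only an illustrative model in Python, not anything
taken from the LRM or from a real tool; the function name split_actuals
and its return convention are made up for the example.  It splits the
text after the macro's opening parenthesis into raw actuals, treating
string literals, bracketed groups and escaped identifiers as opaque,
and it does not trim any whitespace.

```python
def split_actuals(text):
    """Split the text following a macro's opening '(' into raw actuals.

    Hypothetical helper. Returns (actuals, end), where end is the index
    just past the ')' that closes the macro invocation.
    """
    closers = {"(": ")", "{": "}", "[": "]"}
    stack = []            # closing brackets we are still waiting for
    args, cur = [], []
    i = 0
    while i < len(text):
        c = text[i]
        if c == "\\":
            # Escaped identifier: everything up to the next whitespace,
            # including any commas, quotes or brackets inside it.
            j = i
            while j < len(text) and not text[j].isspace():
                j += 1
            cur.append(text[i:j])
            i = j
            continue
        if c == '"':
            # String literal: copy through the closing quote as one unit.
            j = i + 1
            while j < len(text) and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            cur.append(text[i:j + 1])
            i = j + 1
            continue
        if c in closers:
            stack.append(closers[c])
        elif stack and c == stack[-1]:
            stack.pop()
        elif not stack and c == ",":
            # Top-level comma: argument separator.
            args.append("".join(cur))
            cur = []
            i += 1
            continue
        elif not stack and c == ")":
            # Top-level right parenthesis: end of the invocation.
            args.append("".join(cur))
            return args, i + 1
        cur.append(c)
        i += 1
    raise ValueError("unterminated macro invocation")


# The text after "`paste(" in the first example below.
print(split_actuals("\\7@ ,8) = 1'b1;"))
```

Note that the sketch never forms tokens; it only tracks enough state to
know which commas and parentheses belong to the invocation itself.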

In your first example,

   assign `paste(\7@ ,8) = 1'b1; 

If it is pure text processing, the space terminating the escaped
identifier is stripped like any whitespace after an argument.  So you
get

  assign \7@8 = 1'b1;
  
This seems to work as you would expect.  But if you don't happen to
provide any whitespace where you use the macro, things go wrong.  If you
write

  assign `paste(\7@ ,8)= 1'b1;
  
then you get

  assign \7@8= 1'b1;
  
Which makes the = part of the escaped identifier and makes this a syntax
error.

In your second example,

  assign `paste(\7@ ,\8@ ) = 1'b1;
  
pure text processing would give you

  assign \7@\8@ = 1'b1;
  
Which makes the \ on the second escaped identifier become part of the
name of the identifier, which probably isn't what you wanted.  You
wanted the identifier name to be the concatenation of the two separate
identifier names, "7@8@", which would be represented as \7@8@ .  I
assume this is what you were exploring with these examples.
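
To make the pure-text reading concrete, here is a small Python model of
it.  It is a sketch under the assumption that each actual is trimmed on
both sides and the two texts are simply joined; the helper names
paste_text and escaped_name are invented for illustration.  The second
helper reports which name a lexer would read for the first escaped
identifier in the expanded line.

```python
def paste_text(x, y):
    """Pure-text model of the paste macro: trim each actual, then join."""
    return x.strip() + y.strip()


def escaped_name(text):
    """Name read for the first escaped identifier in text:
    everything after the backslash up to the next whitespace."""
    start = text.index("\\") + 1
    end = start
    while end < len(text) and not text[end].isspace():
        end += 1
    return text[start:end]


# First example, with and without a space before the "=".
ok = "assign " + paste_text("\\7@ ", "8") + " = 1'b1;"
bad = "assign " + paste_text("\\7@ ", "8") + "= 1'b1;"
print(escaped_name(ok))    # 7@8
print(escaped_name(bad))   # 7@8=

# Second example: the second backslash ends up inside the name.
both = "assign " + paste_text("\\7@ ", "\\8@ ") + " = 1'b1;"
print(escaped_name(both))  # 7@\8@
```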

If macro expansion is expected to preserve escaped identifiers in the
macro arguments, some form of tokenizing seems required.  And if you
want `` to be able to glue together escaped identifiers, then you cannot
just substitute the text for the tokens and then delete the ``
characters.  In your second example, that would give

  assign \7@ \8@ = 1'b1;
  
Which fails to glue the two identifiers together to create one
identifier.

To make that work, `` has to operate at the token level, after the \7@
and \8@ have been recognized as identifiers named "7@" and "8@", and
glue the identifier names together into "7@8@".  Then if the result is
to be converted back into pure text again, the text representation for
the identifier would need to be an escaped identifier.

Alternatively, a set of special-case rules for text processing where
escaped identifiers are concerned might be sufficient.  For example,
don't trim the first trailing space after an escaped identifier in a
macro argument.  But if a macro argument ending with white space is
being substituted just before ``, then remove the white space at that
point.

I haven't thought this through all the way.  But it does seem to me that
the existing LRM text is inadequate.  I don't think this was an issue
before SV added `" and ``.  And I don't think we can look at how C
handles this with its stringizing and token-pasting operators, since C
doesn't have escaped identifiers.

Steven Sharp
sharp@cadence.com


Received on Sat Sep 8 23:31:03 2007
