Re: [sv-bc] Trimming whitespace from macro actuals

From: Coffin, Eric <eric_coffin_at_.....>
Date: Mon Sep 10 2007 - 14:42:04 PDT

Steven brings up a number of points that have been discussed at some
length internally at Mentor. Given that many verification methodologies
require macro use, and that the LRM's treatment of macros seems to fall
a little short, I would like to see the macro section reworked in the
next release of the LRM.

We're open to any changes which unambiguously define how to create and 
use macros. Some experimentation leads me to prefer an implementation in 
which macro text is tokenized, especially if macros with optional 
arguments become standard.

For what it's worth, C's preprocessor strips all whitespace from both
the left and the right of the token-pasting operator (##). What if the
macro body contains an escaped identifier terminated by a newline, and
the next two characters in the input stream are tick-tick (``)? There
are a lot of complications here that will require a carefully explored
set of rules.
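A minimal Python sketch of the question above, assuming a hypothetical C-like rule that deletes each `` together with all whitespace (newlines included) on both sides of it. The function names c_style_paste and escaped_name are made up for illustration; this is not any tool's actual preprocessor. It shows that the newline terminating an escaped identifier disappears along with the ``, so whatever follows becomes part of the escaped name.

```python
import re


def c_style_paste(body: str) -> str:
    """Hypothetical C-like rule: delete each `` plus all whitespace
    (newlines included) on both sides of it."""
    return re.sub(r"\s*``\s*", "", body)


def escaped_name(text: str) -> str:
    """Name of the first escaped identifier in text: the characters after
    the backslash up to, but not including, the next whitespace."""
    start = text.index("\\")
    end = start + 1
    while end < len(text) and not text[end].isspace():
        end += 1
    return text[start + 1:end]


# Macro body text: an escaped identifier ended by a newline, then ``.
body = "\\bus@\n``_q = 1'b1;"
pasted = c_style_paste(body)

print(escaped_name(body))    # bus@    (the newline terminates the name)
print(pasted)                # \bus@_q = 1'b1;
print(escaped_name(pasted))  # bus@_q  (the terminator is gone)
```

Whether that result is the intended one is exactly what a set of carefully explored rules would have to decide.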

Eric Coffin
eric_coffin@mentor.com

Brad Pierce wrote:
> Steven,
>
> I looked through every occurrence of "white space" and "whitespace" in
> Draft 3a, but did not find the following rule --
>
>> The LRM says that whitespace at the beginning and end of the arguments
>> is removed.
>
> Maybe you were thinking of the LRM's discussion of macro formals?
>
> Nevertheless, your analysis of the problem is useful and interesting.  I
> agree with you that Verilog's \-and-whitespace escape brackets make the
> problem more complicated than C had to deal with. 
>
>> And I don't think we can look at how C handles this with its
>> stringizing and token-pasting operators, since C doesn't have
>> escaped identifiers.
>
> -- Brad
>
> -----Original Message-----
> From: Steven Sharp [mailto:sharp@cadence.com] 
> Sent: Saturday, September 08, 2007 8:02 PM
> To: sv-bc@eda-stds.org; Brad.Pierce@synopsys.COM
> Subject: Re: [sv-bc] Trimming whitespace from macro actuals
>
>
>   
>> From: "Brad Pierce" <Brad.Pierce@synopsys.com>
>>     
>
>   
>> Is there any difference in meaning between the following macro 
>> invocations?
>>
>>   `M1(wire1,wire2)
>>   `M1( wire1 , wire2 )
>>
>> Are the whitespaces trimmed before invocation?
>>     
>
> Yes.  The LRM says that whitespace at the beginning and end of the
> arguments is removed.  So these are equivalent.  In Verilog, it didn't
> really matter much anyway, since whitespace was irrelevant outside
> string literals, and arguments are not substituted inside string
> literals.
>
> In SV, whitespace starts to matter.  The `" mechanism allows argument
> substitution inside string literals, so any whitespace included in the
> argument becomes visible.  The `` mechanism could also be affected by
> whitespace.
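The following Python sketch illustrates the two points above. It assumes the trimming rule as Steven states it (Brad could not find it in Draft 3a), and the macro body and the expand helper are hypothetical. With trimming, `M1(wire1,wire2) and `M1( wire1 , wire2 ) expand identically. Without trimming, the extra spaces show up inside the `"...`" string.

```python
import re


def expand(body: str, formals: list[str], actuals: list[str],
           trim: bool = True) -> str:
    """Toy macro expander (hypothetical, not any tool's preprocessor)."""
    if trim:
        # The rule Steven cites: drop leading/trailing whitespace of actuals.
        actuals = [a.strip() for a in actuals]
    for formal, actual in zip(formals, actuals):
        # Whole-word substitution everywhere, including between `" marks.
        body = re.sub(rf"\b{re.escape(formal)}\b", lambda m, a=actual: a, body)
    # After substitution each `" becomes an ordinary double quote.
    return body.replace('`"', '"')


# Hypothetical: `define M1(a,b) assign y = a & b; initial $display(`"a,b`");
M1_BODY = 'assign y = a & b; initial $display(`"a,b`");'

tight = expand(M1_BODY, ["a", "b"], ["wire1", "wire2"])       # `M1(wire1,wire2)
spaced = expand(M1_BODY, ["a", "b"], [" wire1 ", " wire2 "])  # `M1( wire1 , wire2 )
untrimmed = expand(M1_BODY, ["a", "b"], [" wire1 ", " wire2 "], trim=False)

print(tight == spaced)  # True
print(untrimmed)        # the spaces now appear inside the string literal
```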
>
>   
>> What if the whitespaces are being used to terminate an escaped 
>> identifier?
>>
>>  `define paste(x,y) x``y
>>   assign `paste(\7@ ,8) = 1'b1;
>>
>> According to 3.7.1 in the V2005 LRM
>>
>>  "Neither the leading backslash character nor the terminating white 
>> space is considered to be part of the identifier."
>>
>> Another example --
>>
>>  assign `paste(\7@ ,\8@ ) = 1'b1;
>>     
>
>
> With escaped identifiers, we have to be careful about whether we are
> referring to the identifier name or the syntax used to specify that
> identifier name.  The terminating white space is not part of the
> identifier name, but it is part of the syntax used to specify that name.
>
> This starts to require macro expansion to be specified in more detail
> than was ever required before the `` operator was added.
>
> Does macro expansion substitute the exact text of the macro argument and
> then operate on that text?  Or does it tokenize the macro arguments
> somehow and operate on those tokens?  Or does it tokenize them and then
> convert back to text?
>
> Note that some form of token recognition is going on during macro
> argument processing.  It recognizes the comma separators between the
> arguments and the right parenthesis at the end.  It recognizes that a
> comma inside quotes is part of a string literal, not an argument
> separator, so it is essentially recognizing that a string literal is an
> atomic unit.  It recognizes that commas inside curly braces in a
> concatenation or inside parentheses in a function call are not argument
> separators.  It recognizes that the right parenthesis at the end of a
> function call is not the end of the macro invocation.  It recognizes
> that a comma, quote, curly brace or parenthesis inside an escaped
> identifier is part of the identifier, and is not to be processed like
> an argument separator or as part of a string literal, concatenation
> or function argument list.  But this could still be character-level
> processing, without forming higher-level tokens.
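The argument scanning described above can be done without forming tokens. Below is a minimal Python sketch of such a character-level scanner. It assumes the text starts just after the macro's opening parenthesis, and it ignores comments and nested macro calls. The name split_macro_args is hypothetical. The scanner tracks nesting of (), {} and [], copies string literals whole, and treats an escaped identifier as everything up to the next whitespace.

```python
def split_macro_args(text: str, start: int) -> tuple[list[str], int]:
    """Return (raw actuals, index just past the closing ")").

    text[start] is the first character after the macro's opening "(".
    Hypothetical character-level scanner: no higher-level tokens are built.
    """
    args, current, depth, i = [], [], 0, start
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # String literal: commas and brackets inside it are not special.
            j = i + 1
            while j < len(text) and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            current.append(text[i:j + 1])
            i = j + 1
            continue
        if ch == "\\":
            # Escaped identifier: every character up to the next whitespace,
            # so a comma or ")" inside it belongs to the identifier.
            j = i + 1
            while j < len(text) and not text[j].isspace():
                j += 1
            current.append(text[i:j])
            i = j
            continue
        if ch == ")" and depth == 0:
            # Unnested ")" ends the macro invocation.
            args.append("".join(current))
            return args, i + 1
        if ch in "({[":
            depth += 1
        elif ch in ")}]":
            depth -= 1
        elif ch == "," and depth == 0:
            # Unnested comma separates actual arguments.
            args.append("".join(current))
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    raise ValueError("unterminated macro invocation")


# The space after \p,q is needed: without it the ")" would join the name.
line = '`M(f(a, b), {c, d}, "x, y", \\p,q ) rest'
args, end = split_macro_args(line, line.index("(") + 1)
print(args)  # ['f(a, b)', ' {c, d}', ' "x, y"', ' \\p,q ']
```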
>
> In your first example,
>
>    assign `paste(\7@ ,8) = 1'b1; 
>
> If it is pure text processing, the space terminating the escaped
> identifier is stripped like any whitespace after an argument.  So you
> get
>
>   assign \7@8 = 1'b1;
>   
> This seems to work as you would expect.  But if you don't happen to
> provide any whitespace after the macro invocation, things go wrong.  If
> you write
>
>   assign `paste(\7@ ,8)= 1'b1;
>   
> then you get
>
>   assign \7@8= 1'b1;
>   
> This makes the = part of the escaped identifier, which makes this a
> syntax error.
>
> In your second example,
>
>   assign `paste(\7@ ,\8@ ) = 1'b1;
>   
> pure text processing would give you
>
>   assign \7@\8@ = 1'b1;
>   
> This makes the \ of the second escaped identifier part of the name of
> the resulting identifier, which probably isn't what you wanted.  You
> wanted the identifier name to be the concatenation of the two separate
> identifier names, "7@8@", which would be represented as \7@8@ .  I
> assume this is what you were exploring with these examples.
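Here is a small Python model of the pure-text processing described in both examples. It assumes that each actual is trimmed, substituted into the body of `define paste(x,y) x``y, and that the `` characters are then deleted. The helpers paste_as_text and escaped_name are hypothetical. Lexing the result shows the = being absorbed in the first example and the second backslash being absorbed in the second.

```python
def paste_as_text(x: str, y: str) -> str:
    """Hypothetical pure-text model of `define paste(x,y) x``y:
    trim each actual, substitute it, then delete the `` characters."""
    return "{x}``{y}".format(x=x.strip(), y=y.strip()).replace("``", "")


def escaped_name(text: str) -> str:
    """Name of the first escaped identifier: characters after the
    backslash up to, but not including, the next whitespace."""
    start = text.index("\\")
    end = start + 1
    while end < len(text) and not text[end].isspace():
        end += 1
    return text[start + 1:end]


ex1 = "assign " + paste_as_text("\\7@ ", "8") + " = 1'b1;"       # space after ")"
ex1_tight = "assign " + paste_as_text("\\7@ ", "8") + "= 1'b1;"  # no space after ")"
ex2 = "assign " + paste_as_text("\\7@ ", "\\8@ ") + " = 1'b1;"

print(escaped_name(ex1))        # 7@8
print(escaped_name(ex1_tight))  # 7@8=   (the = joined the name)
print(escaped_name(ex2))        # 7@\8@  (the second backslash joined the name)
```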
>
> If macro expansion is expected to preserve escaped identifiers in the
> macro arguments, some form of tokenizing seems required.  And if you
> want `` to be able to glue together escaped identifiers, then you cannot
> just substitute the argument text for the formals and then delete the ``
> characters.  In your second example, that would give
>
>   assign \7@ \8@ = 1'b1;
>   
> This fails to glue the two identifiers together into one identifier.
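A Python sketch of the case above, assuming that the exact, untrimmed actual text is substituted and that the `` characters are then simply deleted (both helpers are hypothetical). Because the terminating space of \7@ survives, a lexer still sees two separate escaped identifiers.

```python
def paste_untrimmed(x: str, y: str) -> str:
    """Hypothetical model: substitute the actuals' exact text (no
    trimming), then delete the `` characters."""
    return "{x}``{y}".format(x=x, y=y).replace("``", "")


def escaped_names(text: str) -> list[str]:
    """Names of all escaped identifiers in text, in order of appearance."""
    names, i = [], 0
    while i < len(text):
        if text[i] == "\\":
            j = i + 1
            while j < len(text) and not text[j].isspace():
                j += 1
            names.append(text[i + 1:j])
            i = j
        else:
            i += 1
    return names


line = "assign " + paste_untrimmed("\\7@ ", "\\8@ ") + " = 1'b1;"
print(line)                 # assign \7@ \8@  = 1'b1;
print(escaped_names(line))  # ['7@', '8@']  (two identifiers, not one)
```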
>
> To make that work, `` has to operate at the token level: after \7@ and
> \8@ have been recognized as identifiers named "7@" and "8@", it glues
> the identifier names together into "7@8@".  Then, if the result is to
> be converted back into plain text, its text representation would need
> to be an escaped identifier.
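A token-level `` can be sketched in Python as follows. The sketch assumes each actual holds exactly one identifier token. It glues the identifier names, not their source text, and re-emits the result as an escaped identifier with its terminating space. Emitting the plain form when the glued name is already a legal simple identifier is a design choice of this sketch, and keywords are ignored. Both names, paste_tokens and identifier_name, are hypothetical.

```python
import re

# Verilog simple identifier: letter or underscore, then letters, digits, _ or $.
SIMPLE_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def identifier_name(actual: str) -> str:
    """Name of the single identifier token in a macro actual."""
    text = actual.strip()
    if text.startswith("\\"):
        # Escaped form: neither the backslash nor the terminating
        # whitespace is part of the name.
        return text[1:]
    return text


def paste_tokens(x: str, y: str) -> str:
    """Hypothetical token-level ``: glue the two identifier names."""
    name = identifier_name(x) + identifier_name(y)
    if SIMPLE_ID.fullmatch(name):  # keywords not checked in this sketch
        return name
    # Not a legal simple identifier: re-emit it in escaped form,
    # including the whitespace that terminates it.
    return "\\" + name + " "


# Even with no whitespace after the invocation, the = stays separate.
print("assign " + paste_tokens("\\7@ ", "\\8@ ") + "= 1'b1;")  # assign \7@8@ = 1'b1;
print("assign " + paste_tokens("\\7@ ", "8") + "= 1'b1;")      # assign \7@8 = 1'b1;
```

In this model, both of Brad's examples come out as intended, independent of the whitespace at the call site.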
>
> Alternatively, a set of special-case text-processing rules for escaped
> identifiers might be sufficient.  For example, don't trim the first
> trailing space after an escaped identifier in a macro argument.  But if
> a macro argument ending with white space is being substituted just
> before ``, then remove the white space at that point.
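A Python sketch of the two special-case rules just proposed. The macro `define lhs(x) assign x= 1'b1; and all function names are hypothetical. The first rule keeps the terminator of a trailing escaped identifier during ordinary substitution. The second rule still lets \7@ be pasted directly onto 8.

```python
def trim_actual(actual: str) -> str:
    """Hypothetical rule: trim an actual, but keep the first whitespace
    character after a trailing escaped identifier (its terminator)."""
    left = actual.lstrip()
    core = left.rstrip()
    tokens = core.split()
    if tokens and tokens[-1].startswith("\\") and len(left) > len(core):
        return left[:len(core) + 1]
    return core


def expand_lhs(x: str) -> str:
    # Hypothetical macro: `define lhs(x) assign x= 1'b1;
    return "assign " + trim_actual(x) + "= 1'b1;"


def expand_paste(x: str, y: str) -> str:
    # `define paste(x,y) x``y : x is substituted just before ``, so its
    # trailing whitespace is removed there before the `` is deleted.
    return trim_actual(x).rstrip() + trim_actual(y)


print(expand_lhs("\\7@ "))         # assign \7@ = 1'b1;
print(expand_paste("\\7@ ", "8"))  # \7@8
```

Applied to the second example, this sketch still produces \7@\8@ followed by a space. On its own, therefore, it does not glue two escaped identifiers into one.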
>
> I haven't thought this through all the way.  But it does seem to me that
> the existing LRM text is inadequate.  I don't think this was an issue
> before SV added `" and ``.  And I don't think we can look at how C
> handles this with its stringizing and token-pasting operators, since C
> doesn't have escaped identifiers.
>
> Steven Sharp
> sharp@cadence.com
>
>
>   
