Re: [sv-bc] Possible SV macro expansion algorithm

From: Greg Jaxon <Greg.Jaxon_at_.....> Date: Tue Nov 20 2007 - 21:49:45 PST

Coffin, Eric wrote:
> Embedded comments start with [ecoffin:]
> 
>> Coffin, Eric wrote:
> [ecoffin:]  The ASCII/non-ASCII representation has greater significance
> than just the `" feature.

It can also be significant for ( `` ) token gluing, depending on how macro
bodies undergoing substitution of actuals for formals are tokenized
(either accepting or ignoring escaped identifiers).  But your example
illustrates my concern precisely.

> [ecoffin:]  The ongoing dialog about white space (what is
> it, when to trim it, what function it has) comes into play.  For
> example, the LRM states (1800/D4 section 5.6.1) that for an escaped
> identifier the trailing white space is not a part of the identifier
> itself.  Knowing how to treat white space and how to represent an
> escaped identifier in a macro expansion are important.  Consider the
> following trio of macros and their usage.  What are reasonable expansions?
> 
> `define escapedIdent \Tuesday
> `define variant_A `"Tuesday`"
> `define variant_B `"`escapedIdent`"
> string S1 = `variant_A;
> string S2 = `variant_B;
> 
> Most likely SV users would agree that S1's initializer is "Tuesday", but
> what about S2's initializer?  Should it be "Tuesday", "\Tuesday ",
> "\Tuesday", or a macro expansion error?  The answer depends on whether you
> treat the macro bodies as ASCII, and thus maintain the escaped
> identifier's leading backslash and trailing whitespace, or whether you
> treat the macro bodies as lists of tokens.

Because of `", these choices affect the bytes in S2.
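
To make the difference concrete, here is a minimal Python sketch of the two readings of `variant_B. It is only an illustration under stated assumptions: the MACROS table, expand_text and expand_tokens are hypothetical names, macro bodies are held as raw text, and the "token view" assumes that an escaped identifier is stringified in its plain spelling (since \Tuesday and Tuesday name the same identifier). Neither function is taken from any SV implementation.

```python
import re

# Hypothetical model: macro bodies stored as raw text, keyed by macro name.
MACROS = {
    "escapedIdent": "\\Tuesday",      # `define escapedIdent \Tuesday
}

MACRO_USE = re.compile(r'`([A-Za-z_]\w*)')

def expand_text(body, macros):
    # Text view: splice each macro body in verbatim, then turn `" into ".
    out = MACRO_USE.sub(lambda m: macros[m.group(1)], body)
    return out.replace('`"', '"')

def expand_tokens(body, macros):
    # Token view (assumption): the escaped identifier is one identifier
    # token, so it is spelled without its backslash when stringified.
    def canonical(m):
        text = macros[m.group(1)]
        return text[1:].rstrip() if text.startswith("\\") else text
    out = MACRO_USE.sub(canonical, body)
    return out.replace('`"', '"')

variant_B = '`"`escapedIdent`"'
print(expand_text(variant_B, MACROS))     # prints "\Tuesday"
print(expand_tokens(variant_B, MACROS))   # prints "Tuesday"
```

The third candidate, "\Tuesday " with the trailing blank, would fall out of the text view if the stored body kept the whitespace that terminates the escaped identifier.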

  I don't see a plausible macro /expansion/ error, unless `variant_A is
  undefined after \Tuesday consumes the newline leaving no terminator
  for the first macro.  (Yuck!)  Maybe you mean macro definition error.

  The line "`define escapedIdent \Tuesday" only needs to be fully "tokenized"
  up to the character after the macro identifier.  If this introduces a formal
  argument list, then that list is of course tokenized, but here we can
  choose NOT to accept full Verilog "identifier"s, we are free to describe
  more classical C-style preprocessor identifiers.   The reasons one wants
  escaped identifiers for the physical components being modeled really don't
  apply to the preprocessing language.  Allowing a many-to-one mapping
  from text strings to token sequences would always be problematic, since
  the `" rules seem to ask for fairly strict text substitution.  And my final
  argument for a dedicated "macro_formal_arg_identifier" is that it leaves no
  doubt that there is exactly one point in your algorithm where the macro
  body is tokenized to look for just a few traditionally delimited words
    - not to activate a lot of reduction rules best left to the core parser, and
    - not to allow macro actuals to reference macro formals.
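
A rough Python sketch of that single tokenization point follows. It assumes a "macro_formal_arg_identifier" is a plain C-style word; FORMAL_WORD and substitute_formals are hypothetical names used only for illustration. Because the body is scanned once and the substituted text is never rescanned for formals, an actual cannot end up referencing another formal.

```python
import re

# Assumption: a formal argument name is a C-style word, not a full
# Verilog identifier (so no escaped identifiers as formals).
FORMAL_WORD = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')

def substitute_formals(body, bindings):
    # One left-to-right pass: each whole word that names a formal is
    # replaced by its actual text; the inserted text is not rescanned.
    return FORMAL_WORD.sub(lambda m: bindings.get(m.group(0), m.group(0)), body)

# The actual for x is the text "y", but it is not replaced again by 1.
print(substitute_formals("x + y", {"x": "y", "y": "1"}))   # prints y + 1
```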

  This view favors the expansion "\Tuesday".  It is neutral to negative on the
  subject of trimming of leading whitespace, and it is negative on trimming
  trailing whitespace.   This is the classical view that macro processing manipulates
  text and knows very little about the language into which that text is being forged.
  To preserve generality, it wants to allow you to juxtapose and glue text
  together before the core language even tokenizes it.  Similar concerns then apply to
  macro invocations, where the actual arguments are also not fully tokenized.

>>> C++ treats macro expansion as a text-to-token transformation.

  This is a more advanced view integrating the meta and object language in
  a way that SV users probably also expect - ignoring all monster cases.

  C++ has a table of many-to-one spellings of its language tokens.  I
  don't know if it "canonicalizes" all token streams before expanding
  macros, or leaves it raw, or just squeezes whitespace down to
  a minimum.

  If SV were to fully tokenize macro formal arg lists and macro bodies,
  it would have to answer a question C++ doesn't confront: many-to-one
  spellings for identifiers.

     `define escFormal( \F[1] , \Tuesday ) \
     `"Tuesday F[1] \Tuesday \F[1] `"

  If the formal list and the body are "tokenized" for SV parsing, we might get

     `escFormal(A,B)  expanding to  "B F[1] BA"
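
The fully tokenized reading can be sketched in Python as below. The token classes, the key helper and expand_tokenized are assumptions made for this illustration (an escaped identifier runs from the backslash to the next whitespace, and \Tuesday is treated as the same identifier as Tuesday); the `" delimiters and the exact spacing of the result are left out.

```python
import re

# Assumed token classes: escaped identifier (backslash up to whitespace),
# simple identifier, or any other single non-space character.
TOKEN = re.compile(r'\\\S+|[A-Za-z_][A-Za-z0-9_$]*|\S')

def key(tok):
    # Canonical key: an escaped identifier and its plain spelling
    # name the same identifier.
    return tok[1:] if tok.startswith("\\") else tok

def expand_tokenized(formals, body, actuals):
    bind = {key(f): a for f, a in zip(formals, actuals)}
    return [bind.get(key(t), t) for t in TOKEN.findall(body)]

formals = ["\\F[1]", "\\Tuesday"]
body = "Tuesday F[1] \\Tuesday \\F[1] "
print(" ".join(expand_tokenized(formals, body, ["A", "B"])))
# prints: B F [ 1 ] B A
```

Note that the unescaped F[1] in the body is three tokens plus F, none of which matches the formal \F[1], while the plain word Tuesday does match the formal \Tuesday.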

  SV has a little bit of context-dependent lexical analysis that could complicate
  this.  The string "\n " and the quoted escaped identifier `"\n `" bear watching.

  There is also a danger of double tokenizing to be avoided.  Notice that the
  canonical form of an escaped identifier can appear in a context where it will
  look like another escaped identifier.  Will such things /always/ be reduced again,
  or would the user have to "``" glue it to a trailing whitespace-character?

  Multiple token reductions could come from two sources:

    1) After token gluing, something has to "retokenize" the graft to see what
       congealed.

    2) If the macro actuals are not tokenized before substitution, they'll
       need to be tokenized before parsing.  Your system is very careful to
       arrange BOTH treatments: literal substitution for `` operands and
       pre-tokenizing for non-glued expansions.

  So when your system is used:  `escFormal(\\once ,\\twice )    (as defined above)
  expands to:  "\twice F[1] \twice\once".   If the body of escFormal HAD NOT
  begun and ended with `", the expansion would not be a single string token, but
  a sequence of seven tokens:
       \\twice
       F
       [
       1
       ]
       \\twice
       \\once

  i.e. NO additional reduction of escaped identifiers occurs.

  The motivation for using this text-to-token approach has to be to reach
  a useful definition for token gluing.  So I think we need to work a hard
  example.  When one operand is an already recognized token and the other
  is raw actual argument text I think that the recognized token reverts to
  its original text form before the compound is retokenized.  If a doubly
  escaped identifier is the operand of a gluing operator with a macro formal
           \\twice ``macro_formal
  and the macro's actual is [2], I believe we want one token (not two or three)
  as the result and it should be \\twice[2] .  This cannot happen with the
  raw text methods, and it won't happen correctly unless your algorithm makes
  it explicit.  But by considering token streams and choosing this nice
  ordering of immediate and deferred expansions you have the descriptive
  tools to build this the right way.  That /might/ involve special treatment
  of the whitespace in the macro actual adjacent to the `` glue operator,
  but too many special rules raise implementation costs.
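
As a small illustration of the gluing rule suggested above, the Python sketch below reverts the recognized token to its text, joins it to the raw actual, and tokenizes the compound exactly once. TOKEN and glue are hypothetical names, and dropping the whitespace next to the `` operator is an assumption, not a settled rule.

```python
import re

# Same assumed token classes as a simple SV-like lexer would use.
TOKEN = re.compile(r'\\\S+|[A-Za-z_][A-Za-z0-9_$]*|\S')

def glue(left_token_text, right_raw_text):
    # The left operand goes back to its source spelling, whitespace
    # adjacent to `` is dropped (assumption), and the joined text is
    # tokenized a single time.
    joined = left_token_text.rstrip() + right_raw_text.strip()
    return TOKEN.findall(joined)

tokens = glue("\\\\twice ", "[2]")
print(len(tokens), tokens[0])      # prints: 1 \\twice[2]
```

Tokenizing the two operands separately instead would give four tokens ( \\twice, [, 2, ] ), which is the outcome the explicit rule is meant to rule out.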

  Hybrids of these opposite viewpoints are possible, but seem arbitrary and confusing
  to me.  Multiple reductions of escape sequences must be avoided - they cannot
  happen INSIDE the fixed-point expansion loop.  Your algorithm makes the advanced
  approach fairly safe and as intuitive as this topic will ever be (i.e. just barely).

Good work,
Greg

>>>     
>>
>> Is your proposal (below) different from C++ pre-processing in
>> any major respect?
>>   
> 
> [ecoffin:] I tried to follow C++'s treatment of macro expansion as
> closely as I could.
>> In the "... looking for identifiers matching formal argument names"
>> activity, there is an implied tokenization of the unexpanded body.
>> We need to specify the rules for that tokenization, in particular
>> whether escaped identifiers are to be recognized.
>>
>> On first review, though, this looks a lot better than the existing
>> section (23.2 of 1800-2005)!
>>
>> Greg
>>
>>
>>   
>>> *********************************************************************
>>>
>>> Here is a rough outline of a possible way to expand macros that might
>>> give some consistency to the various SV implementations out there.
>>>
>>> Order of actions to expand a macro:
>>>
>>>  - After the macro use has been identified in the SV source text,
>>>    gather the use's actual arguments.
>>>
>>>  - Independently expand all actual arguments, but do not substitute
>>>    them into the macro body.  If the macro use did not specify an
>>>    actual and a default value was specified then expand the default
>>>    text.  Some SV implementations first expand and then substitute,
>>>    while others do not.  Note that all arguments should be expanded
>>>    even if they are not used within the macro body.
>>>
>>>  - Walk through the macro body looking for identifiers matching
>>>    formal argument names.  Replace any macro formal argument with its
>>>    expanded actual text, unless the macro formal is adjacent to a
>>>    tick-tick (``).  If the formal arg is next to a tick-tick, then
>>>    literally substitute the (unexpanded) actual text for the formal arg.
>>>
>>>  - do {
>>>      - Perform token-pasting upon the expansion's body.  Token
>>>        pasting should have no effect upon the `" and the `\`" macro
>>>        operators.  Furthermore, token pasting ignores any white space,
>>>        and will not paste comments, nor paste across comments.
>>>      - Rescan the resulting body for any more macros to expand.
>>>        Expand them.  Do not expand `" or `\`".
>>>      } while the expansion body changes
>>>
>>>  - Expand the special macro-operators, tick-quote `" and
>>>    tick-slash-tick-quote `\`"
>>>
>>> -Eric
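
The order of actions outlined above can be sketched in Python roughly as follows. This is a simplified model under several assumptions, not the proposal itself: macro bodies are plain text, formals are C-style words, default arguments, comments and string literals inside actuals are ignored, recursive macros are not guarded against, and the names split_args, expand_use, expand and the sample macros cat, wrap and one are all hypothetical.

```python
import re

USE = re.compile(r'(?<!`)`([A-Za-z_]\w*)')   # a macro use such as `foo
WORD = re.compile(r'[A-Za-z_]\w*')           # a candidate formal name

def split_args(text, start):
    # Step 1: gather comma-separated actuals from the '(' at text[start].
    if text[start:start + 1] != "(":
        raise ValueError("expected '(' after macro name")
    depth, args, cur = 0, [], []
    for i in range(start, len(text)):
        c = text[i]
        if c in "([{":
            depth += 1
            if depth == 1:
                continue              # drop the opening parenthesis itself
        elif c in ")]}":
            depth -= 1
            if depth == 0:
                args.append("".join(cur).strip())
                return args, i + 1
        elif c == "," and depth == 1:
            args.append("".join(cur).strip())
            cur = []
            continue
        cur.append(c)
    raise ValueError("unterminated macro argument list")

def expand_use(formals, body, raw, macros):
    # Step 2: expand every actual independently, whether or not it is used.
    cooked = [expand(a, macros) for a in raw]
    bind = {f: (r, c) for f, r, c in zip(formals, raw, cooked)}

    # Step 3: a formal touching `` receives the unexpanded actual text.
    def sub(m):
        if m.group(0) not in bind:
            return m.group(0)
        r, c = bind[m.group(0)]
        s = m.string
        glued = (s[:m.start()].rstrip().endswith("``")
                 or s[m.end():].lstrip().startswith("``"))
        return r if glued else c
    body = WORD.sub(sub, body)

    # Step 4: paste, rescan, and repeat until the body stops changing.
    while True:
        before = body
        body = re.sub(r'\s*``\s*', '', body)
        body = expand(body, macros)
        if body == before:
            break

    # Step 5: only now expand `\`" and `" (longer operator first).
    return body.replace('`\\`"', '\\"').replace('`"', '"')

def expand(text, macros):
    # Replace each macro use in text, scanning left to right.
    out, pos = [], 0
    while True:
        m = USE.search(text, pos)
        if m is None:
            out.append(text[pos:])
            return "".join(out)
        end = m.end()
        if m.group(1) in macros:
            formals, body = macros[m.group(1)]
            raw, end = split_args(text, end) if formals else ([], end)
            out.append(text[pos:m.start()])
            out.append(expand_use(formals, body, raw, macros))
        else:
            out.append(text[pos:end])     # unknown name: keep as written
        pos = end

macros = {
    "escapedIdent": ([], "\\Tuesday"),
    "variant_A": ([], '`"Tuesday`"'),
    "variant_B": ([], '`"`escapedIdent`"'),
    "cat": (["a", "b"], "a``b"),
    "wrap": (["v"], "(v)"),
    "one": ([], "1"),
}
print(expand("string S2 = `variant_B;", macros))   # string S2 = "\Tuesday";
```

Because this sketch keeps bodies as raw text, it lands on the "\Tuesday" reading of S2; a token-based implementation of the same ordering could reasonably produce "Tuesday" instead.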