[sv-ec] Review of Mantis 890 (clocking blocks)

From: Jonathan Bromley <jonathan.bromley_at_.....>
Date: Thu Sep 21 2006 - 08:11:37 PDT
Since I was cast in the role of chief troublemaker on clocking
blocks at the last sv-ec meeting, I thought I'd try to live 
up to that...


Background
~~~~~~~~~~
At the last SV-EC meeting (Monday Sept 11) there was an incomplete
discussion of Mantis 890.  Mehdi very sensibly suggested that Doug
Warmke's proposal SV-890-3.pdf should be reviewed point by point.
This note attempts to do that.

Since I can't easily edit the PDF document, I've copied relevant
fragments of its text here with what I hope is self-evident markup;
my observations and proposed amendments are indented and have the
marginal mark [JB].  Apologies in advance for any inconvenience.

Many of my comments are "friendly amendments" - rewording, proposed
clarifications and so on.  I've tried to capture the sense of last
week's meeting as well as various other emails that went before it.

There is, I think, only one potentially controversial point, relating
to clause 15.12 where Doug proposed an addition to the text that I
find hard to accept.

In a nutshell, the difficulty is that clocking blocks work well only
in one specific use case: as a bridge (I think Arturo Salz called it
a "trampoline") between the scheduling regimes in a program and in
a design (modules and interfaces).  The rather complicated interaction
between Active, Reactive and NBA regions of the scheduler, together
with the sampling behaviour of clockings, makes this work reliably
and without races.  In short, a clocking block has two "ends" -
a "signal end" that hooks into design code, and a "testbench end"
that should be manipulated only by program code.  Any other
use model gives rise to many opportunities for races or unexpected
behaviour.
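
To make that use model concrete, here is a minimal sketch (all names are
mine, purely for illustration): the clocking event and the signals that the
clocking samples and drives are design objects, and the clockvars are
touched only from program code.

  interface bus_if(input logic clk);
    logic req, ack;
    clocking cb @(posedge clk);      // "signal end": event and signals in the design
      output req;
      input  ack;
    endclocking
  endinterface

  module dut(bus_if b);
    // ordinary design code, scheduled in the Active/NBA regions
    always @(posedge b.clk) b.ack <= b.req;
  endmodule

  program tb(bus_if b);
    // "testbench end": the clockvars are read and written only here
    initial begin
      b.cb.req <= 1'b1;              // synchronous drive, matures at the next cb event
      repeat (3) @(b.cb);            // one event for the drive to mature, one for the
                                     // DUT register, one more for ack to be sampled
      if (b.cb.ack !== 1'b1) $display("unexpected ack value");
    end
  endprogram

  module top;
    logic clk = 0;
    always #5 clk = ~clk;
    bus_if b(clk);
    dut    u_dut(b);
    tb     u_tb(b);
  endmodule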

The offending proposal in SV-890-3 is a workaround to make clockings
behave sensibly when the "testbench end" is manipulated by module
code instead of program code.  It's been suggested that this matches
the sample() behaviour of covergroups, but I think that's a spurious
comparison; sampling a covergroup affects only the coverage data,
but updating a clocking block's sampled inputs could have extensive
knock-on effects throughout the rest of the testbench, and I would need a
lot of convincing that this workaround assures freedom from races.
Furthermore, I suspect the proposal is completely broken in the
case of #0 input sampling; I have tried to discuss that issue in
more detail in the appropriate place below.

Thanks for your consideration.


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Comments from Jonathan Bromley <jonathan.bromley@doulos.com>
on document SV-890-3.pdf associated with Mantis item 890
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

15.2 Clocking Block Declaration
[snip]

[JB] This change seems fine.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

15.10 Cycle delay
...
What constitutes a cycle is determined by the default clocking in effect
(see 15.11). If no default clocking has been specified for the current
module, interface, or program then the compiler shall issue an error.

Example:
## 5; // wait 5 cycles (clocking events) using the default clocking
## (j + 1); // wait j+1 cycles (clocking events) using the default clocking

<insert>
If a ## cycle delay operator is executed at a simulation time that does
not correspond to a default clocking event (perhaps due to the use of a #
delay control or an asynchronous @ event control), the processing of the
cycle delay is postponed until the time of the next default clocking
event. Thus a ##1 cycle delay shall always be guaranteed to wait at least
one full clock cycle.
</insert>

[JB] This formulation is mostly clear, but has some strange effects.
     (Once again I'm not the only one who's unhappy here; existing
     implementations don't fully match the described behaviour.)
     It leads to behaviour that is completely at odds with the usual
     behaviour of Verilog @ timing controls - if I say "@(posedge clk)"
     at a time that's halfway between two clock events, I expect
     to wait for half a cycle rather than 1.5 cycles.  And, in particular,
     it makes life very difficult if you want to do something on the
     very first clock event.  Surely if I write

       initial begin
         ##1 sig <= expr;

     my intent was that 'sig' should be driven at the FIRST clock, not
     the second?  I realise that it may now be too late to rescind this
     decision.  To rescue the situation, can we use ##0 to mean "wait
     until the current-or-next clocking event"?  If so, all is well
     (despite the discontinuity with regular @).
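
     To make the difference concrete, here is a small sketch (my own code,
     assuming posedge events at t = 5, 15, 25, ... and the proposed wording
     for ##):

       module timing_demo;
         bit clk = 0;
         always #5 clk = ~clk;             // posedge events at t = 5, 15, 25, ...

         default clocking cb @(posedge clk);
         endclocking

         initial begin
           #10;                            // t = 10, halfway between two edges
           @(posedge clk);                 // regular @: waits half a cycle
           $display("@  unblocked at t=%0t", $time);   // 15
         end

         initial begin
           #10;                            // also t = 10
           ##1;                            // proposed text: postponed to the edge at
                                           // t = 15, then one further full cycle
           $display("## unblocked at t=%0t", $time);   // 25, i.e. 1.5 cycles later
         end

         initial #30 $finish;
       endmodule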

     There's a further ambiguity here.  If I use the clocking block's name
     as an event, using the @cb event control, do I get *exactly* the same
     behaviour as ##1?  I guess so, but, especially in view of the problems
     I outline above, I think this should be made explicit.

     Finally, using the phrase "default clocking event" in this context
     is clearly wrong.  If I say
       ##1 cb.out <= ...
     then the ##1 is a cycle of cb, which is not necessarily the same
     as a cycle of the default clocking.

     So, my conclusions: If we wish to keep the current proposals of
     SV-890-3 here,
     (1) it is essential that we explicitly define the behaviour of
         ##0, so that there's a way of reaching the next-or-current
         clocking event;
     (2) there should be a note clarifying the stark difference in
         behaviour between ## and the regular @ event control, and
         clearly stating the equivalence (if any) between @cb and ##1.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In 15.12, MODIFY the text as follows:

[JB] I have a number of issues with this, which I'll take one piece
     at a time...

15.12 Input sampling
All clocking block inputs (input or inout) are sampled at the
corresponding clocking event. If the input skew is not an explicit #0,
then the value sampled corresponds to the signal value at the Postponed
region of the time step skew time-units prior to the clocking event (see
Figure 15-1 in 15.3).
<strikeout>
If the input skew is an explicit #0, then the value sampled
corresponds to the signal value in the Observed region.
</strikeout>
If the input skew is an explicit #0, several additional considerations
shall govern input sampling. First, the value sampled corresponds to the
signal value in the Observed region.

[JB] OK so far.

<insert>
Next, when the clocking event occurs, the sampled value shall be
updated and available for reading the instant the clocking event
takes place and before any other statements are executed.

[JB] This new stipulation appears to be necessary to legitimize
     the approach taken in some vendors' verification methodologies
     that don't use program blocks for the test bench.  It apparently
     aims to sidestep the write/read race condition that pertains if
     you have a clocking whose clocking event is on a design variable
     and whose inputs are examined in design code.  Can we be confident
     that this new stipulation is (a) appropriate, (b) general?  It is
     almost equivalent to creating a new scheduler region (Pre-active?!).
     If we accept this new behaviour, it is absurd to accept the
     caveat that follows:

<insert>
Finally, if the clocking event occurs due to activity on a
program object, there is a race condition between the update
of the clocking block input's value and the execution of
program code that reads that value.
</insert>

[JB] The internal contradictions here are in my opinion insupportable.
     In effect it says:

       Clocking event on a design object, clocking inputs read
       in design code:
          NO RACE because of special treatment of clocking inputs.

       Clocking event on a program object, clocking inputs read
       in program code:
         RACE because update of the clocking input happens in
         the same scheduler region as reading of that input.

     There is a fundamental problem here.  Clocking inputs are updated
     as a result of occurrence of their clocking event; this is sure
     to race with reading of the clocking input, *unless* the clocking
     event is on a design variable but the clocking inputs are read in
     program code.  This is, as I understand it, precisely the scenario
     for which clockings were originally designed and in which they
     can be expected to work reliably without races.  I don't really
     understand the need to shoe-horn them into other scenarios where
     straightforward module code would do just as well.

     I also completely fail to understand how this approach can yield
     meaningful behaviour when #0 input sampling is specified, because
     it implies that the clocking block's sampled input values should
     be updated BEFORE the Observed region where the sampling is
     specified to take place!  I discuss this in more detail below.

     I would prefer to see this new stipulation (clocking inputs update
     before anything else happens) completely removed, and in its place
     a warning added that clocking inputs can be read in a race-free way
     only if all the following conditions are met:
     * the clocking event occurs in the design regions of the scheduler
     * the clocking input observes a design net or variable
     * the clocking input is read only from code running in the program
       regions of the scheduler
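
     By way of illustration, a minimal sketch of those conditions (the code
     and names are mine, purely hypothetical):

       interface mon_if(input logic clk, input logic d);
         clocking cb @(posedge clk);     // clocking event on a design signal
           input d;                      // observes a design variable
         endclocking
       endinterface

       program reader(mon_if m);
         // Race-free: program code resumes in the Reactive regions, after
         // the sampled value has been updated.
         initial repeat (3) begin
           @(m.cb);
           $display("t=%0t  program sees d = %b", $time, m.cb.d);
         end
       endprogram

       module bad_reader(mon_if m);
         // Not race-free: module code triggered by the same event runs in
         // the Active regions, so whether it sees the newly sampled value
         // or the previous one depends on the disputed ordering above.
         always @(m.cb)
           $display("t=%0t  module  sees d = %b", $time, m.cb.d);
       endmodule

       module top;
         logic clk = 0, d = 0;
         always #5 clk = ~clk;
         initial begin #7 d = 1; #30 $finish; end
         mon_if     m(clk, d);
         reader     u_good(m);
         bad_reader u_bad(m);
       endmodule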

     I also wish to see a note to the effect that input #0 sampling
     has unusual behaviour.  It samples its input signal *after* the
     design regions have iterated, and therefore (in most cases)
     *after* the clocking event has occurred.  It seems to me that
     this works sensibly only if the sampled "input #0" is read in
     program code rather than in design code.  Reading it from design
     code will introduce an additional cycle's delay before the result
     is visible.

     input #0 and output #0 appear to have been intended to provide
     the useful effect of giving, to signals read or driven through a
     clocking, exactly the same timing behaviour as you would see
     from a program that reads and drives those signals without an
     intervening clocking block.  Insisting that input samples be updated
     instantaneously on the clocking event will break that model, since
     the sampled value will be updated before the Observed region; this
     update will presumably obtain the value that was sampled in the
     Observed region of the *previous* clocking event's timestep.
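
     For what it's worth, here is a sketch of the equivalence I believe was
     intended (hypothetical code, reflecting my reading of the intent rather
     than any text in the proposal):

       interface stat_if(input logic clk);
         logic [7:0] status;
         clocking cb0 @(posedge clk);
           input #0 status;     // sampled in the Observed region, i.e. after the
                                // design regions of the clock-edge time step settle
         endclocking
       endinterface

       program p(stat_if s);
         initial begin
           @(s.cb0);
           // Program code resumes in the Reactive regions, after the Observed
           // sample has been taken, so the clockvar and a direct read of the
           // signal should agree here.
           $display("cb0.status = %0h  direct = %0h", s.cb0.status, s.status);
         end
       endprogram

       module top;
         logic clk = 0;
         always #5 clk = ~clk;
         stat_if s(clk);
         initial begin s.status = 8'h00; #7 s.status = 8'hA5; end  // design-side activity
         p u_p(s);
       endmodule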

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

15.14 Synchronous drives

Clocking block outputs (output or inout) are used to drive values onto
their corresponding signals, but at a specified time. That is, the
corresponding signal changes value at the indicated clocking event as
modified by the output skew.
<insert>
For zero skew clocking block outputs with no cycle delay, synchronous
drives shall schedule new values in the NBA region of the current time
unit. This has the effect of causing the big loop in Figure 9-1 to iterate
from the reactive/re-inactive regions back into the NBA region of the
current time unit. For clocking block outputs with non-zero skew or non-
zero cycle delay, the corresponding signal shall be scheduled to change
value in the NBA region of a future time unit.
</insert>

Examples:
[snip]
Regardless of when the drive statement executes (due to event_count
delays), the driven value is assigned to the corresponding signal
only at the time specified by the output skew.

[JB] In the last sentence, the parenthetical remark is entirely
     bewildering and should be removed.  In fact, given the various
     other changes and clarifications proposed, I suspect the whole
     sentence could be removed without loss.

<insert>
It is possible for a drive statement to execute asynchronously at a time
that does not correspond to its associated clocking event. Such drive
statements shall be processed as if they had executed at the time of the
next clocking event. Any values read on the right hand side of the drive
statement are read immediately, but the processing of the statement is
delayed until the time of the next clocking event. This has implications
on synchronous drive resolution (See 15.14.2) and ## cycle delay
scheduling.
Note: The synchronous drive syntax does not allow intra-assignment delays
like a regular procedural assignment does.

[JB] This is good.  However, with apologies for the pedantry,
     can we please reword the final "Note" sentence as follows?

        Note: Unlike blocking and nonblocking procedural
        assignment, the synchronous drive syntax does not
        allow intra-assignment delays.
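
     For example (a hypothetical sketch; the names are mine):

       module m;
         bit clk;
         int sig, q, expr = 1;
         always #5 clk = ~clk;

         clocking cbd @(posedge clk);
           output sig;
         endclocking

         initial begin
           cbd.sig <= ##2 expr;     // intra-assignment *cycle* delay: allowed by the
                                    // synchronous drive syntax
           q       <= #5 expr;      // intra-assignment *time* delay: allowed in an
                                    // ordinary nonblocking assignment ...
           // cbd.sig <= #5 expr;   // ... but not in a synchronous drive
           #40 $finish;
         end
       endmodule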

15.14.1 Drives and nonblocking assignments
<strikeout>
Synchronous signal drives are processed as nonblocking assignments.
</strikeout>
<insert>
Note: While the non-blocking assignment operator is used in the
synchronous drive syntax, these assignments are different than non-
blocking variable assignments. The intention of using this operator is to
remind readers of certain similarities shared by synchronous drives and
non-blocking assignments. One main similarity is that variables and wires
connected to clocking block outputs and inouts are driven in the NBA
region.
</insert>
Another key NBA-like feature of inout clocking block variables signals and
synchronous drives is that a drive does not change the clocking block
input. This is because reading the input always yields the last sampled
value, and not the driven value.

[JB] Excellent.

<insert>
One difference between synchronous drives and classic NBA assignments is
that transport delay is not performed by synchronous drives (except in the
presence of the intra-assignment cycle delay operator). Another key
difference is drive value resolution, discussed in the next section.
</insert>

[JB] It seems to me that synchronous drive *does* perform transport
     delay, albeit in a rather unusual way:  first there is a transport
     delay from the execution of the drive to its maturation, and then
     there is a second transport delay associated with the clocking
     output's skew.  I suspect it would be better to remove entirely
     the sentence about transport delay.
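
     A hypothetical sketch of what I mean by two transport delays (output
     skew #2, posedge events at t = 5, 15, 25, ...):

       module skew_demo;
         bit clk = 0;
         logic out;
         always #5 clk = ~clk;

         clocking cb @(posedge clk);
           output #2 out;           // output skew of two time units
         endclocking

         initial begin
           #12;                     // between edges
           cb.out <= 1'b1;          // first "transport": the drive matures at the next
                                    // clocking event, t = 15; second "transport": the
                                    // #2 output skew moves the signal change to t = 17
           #20 $finish;
         end
       endmodule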

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

15.14.2 Drive value resolution
...
The driven value of nibble is 4'b0xx1, regardless of whether nibble is a
reg or a wire.
<insert>
If a given clocking output is driven by more than one assignment in the
same time unit, but the assignments are scheduled to mature at different
future times due to the use of cycle delay, then no drive value resolution
shall be performed.  The drives shall be applied with classic Verilog NBA
transport delay semantics in this case.
If a given clocking output is driven asynchronously at different time
units within the same clock cycle, then drive value resolution is performed
as if all such assignments were made at the same time unit in which the next
clocking event occurs.
</insert>

[JB] I don't think this is as helpful as it could be.  It describes
     the behaviour from the point of view of the clocking drive,
     whereas it is clearer and more general to describe it from the
     point of view of the cycle in which the assignment(s) mature.
     I'd like to suggest the following re-wording, which is somewhat
     heavy going but seems to me to be more precise:

     <proposed LRM text>
     Assignment to a clocking output using the syntax
       clocking_name.output_name <= [##N] expr;
     is known as a clocking drive.  A clocking drive shall take
     effect (mature) at a current or future clocking event, as follows:
     * If the intra-assignment cycle delay ##N is absent or N
       is zero, the drive shall mature at the next occurrence of
       the clocking event, or immediately if the clocking
       event has occurred in the current timestep.
     * If N is greater than zero, the drive shall mature at
       the corresponding future clocking event.
     In this way, all clocking drives shall mature in the timestep of
     a clocking event of their clocking block, even if they executed
     asynchronously to that clocking event.
     At each clocking event of a clocking block, each clocking output
     of that clocking block shall be treated as follows:
     (a) _Scheduling of assignment to the clocking output_
         If one or more clocking drives to that output mature on the
         current clocking event, a single nonblocking assignment to that
         output shall be scheduled for the current or future timestep
         specified by its output skew.
         If no clocking drive to that output matures on the current
         clocking event, no such assignment to that output shall be scheduled.
     (b) _Value assigned to the clocking output_
         If exactly one clocking drive to that output matures, the value
         assigned as described in (a) above shall be the value evaluated
         by that clocking drive when it executed.  However, if two or more
         clocking drives to that output mature, the value assigned shall
         be determined by resolving all those drives' values, as if each of
          those values had been driven onto the same net of wire type by a
         continuous assign statement of (strong0, strong1) drive strength.
     </proposed LRM text>
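
     To illustrate the intended distinction, building on the clause's
     existing pe.nibble example (the statements below are my own additions):

       // Both drives execute in the same time step and mature at the same
       // (next) clocking event of pe, so their values are resolved as if
       // driven onto a wire: 4'b0101 with 4'b0011 gives 4'b0xx1.
       pe.nibble <= 4'b0101;
       pe.nibble <= 4'b0011;

       // These drives mature at different clocking events (##1 versus ##2),
       // so no resolution takes place; each value is applied at its own
       // maturation time with ordinary NBA transport semantics.
       pe.nibble <= ##1 4'b0101;
       pe.nibble <= ##2 4'b0011;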

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[JB] I'm happy with the remaining proposed modifications to clause 15.
     However, the implicit NBAs described in 15.14.2 above have an impact
     on clause 16 (program) which should be mentioned also in clause 15.

     Clocking outputs are updated by NBA (or, in the case of clocking
     outputs that are nets, by continuous assign from an implicit variable
     that's updated by NBA).  Consequently, as I understand it,

         ** it can in no circumstances be legal for any
            program variable to be a clocking output    **

     because that would be equivalent to writing a program variable
     by NBA, and we know that to be a bad idea.

     And, for the avoidance of any argument, let's note that a program's
     output port that is a variable is a program variable, not a design
     variable.  You *can* get design variables visible through ports of
     a program, by passing them through ref ports, so this is not a
     limitation.  But we don't want the program to be able to read,
     directly, one of its own variables that has been updated in the
     NBA region by a clocking block - even if that variable happens
     also to be one of its output ports that's driving a design signal.
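
     A hypothetical sketch of the distinction I am drawing (the first form
     is the one I believe must be rejected):

       program bad(input logic clk, output logic d);   // d is a program variable
         clocking cb @(posedge clk);
           output d;       // would have a program variable updated by NBA -
         endclocking       // the combination I argue must not be legal
       endprogram

       program ok(input logic clk, ref logic d);       // d is a design variable,
         clocking cb @(posedge clk);                   // seen through a ref port
           output d;       // here the NBA updates a design variable - no problem
         endclocking
       endprogram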

-- 
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, Hampshire, BH24 1AW, UK
Tel: +44 (0)1425 471223                   Email: jonathan.bromley@doulos.com
Fax: +44 (0)1425 471573                           Web: http://www.doulos.com

The contents of this message may contain personal views which 
are not the views of Doulos Ltd., unless specifically stated.