RE: [sv-ec] Review of Mantis 890 (clocking blocks)

From: Warmke, Doug <doug_warmke_at_.....> Date: Thu Sep 21 2006 - 23:43:58 PDT

SV-EC,

Thanks very much to everyone for giving 890 the attention it's
been craving for a long time.  This is all good discussion.

To help simplify processing of the issues, I split the Mantis item
into two.  890 will continue to center on clocking blocks, and the
new 1604 has been created to deal with programs.

More responses forthcoming...
Doug

> -----Original Message-----
> From: owner-sv-ec@server.eda.org 
> [mailto:owner-sv-ec@server.eda.org] On Behalf Of Jonathan Bromley
> Sent: Thursday, September 21, 2006 8:12 AM
> To: sv-ec@server.eda.org
> Subject: [sv-ec] Review of Mantis 890 (clocking blocks)
> 
> Since I was cast in the role of chief troublemaker on clocking
> blocks at the last sv-ec meeting, I thought I'd try to live 
> up to that...
> 
> 
> Background
> ~~~~~~~~~~
> At the last SV-EC meeting (Monday Sept 11) there was an incomplete
> discussion of Mantis 890.  Mehdi very sensibly suggested that Doug
> Warmke's proposal SV-890-3.pdf should be reviewed point by point.
> This note attempts to do that.
> 
> Since I can't easily edit the PDF document, I've copied relevant
> fragments of its text here with what I hope is self-evident markup;
> my observations and proposed amendments are indented and have the
> marginal mark [JB].  Apologies in advance for any inconvenience.
> 
> Many of my comments are "friendly amendments" - rewording, proposed
> clarifications and so on.  I've tried to capture the sense of last
> week's meeting as well as various other emails that went before it.
> 
> There is, I think, only one potentially controversial point, relating
> to clause 15.12 where Doug proposed an addition to the text that I
> find hard to accept.
> 
> In a nutshell, the difficulty is that clocking blocks work well only
> in one specific use case: as a bridge (I think Arturo Salz called it
> a "trampoline") between the scheduling regimes in a program and in
> a design (modules and interfaces).  The rather complicated interaction
> between Active, Reactive and NBA regions of the scheduler, together
> with the sampling behaviour of clockings, makes this work reliably
> and without races.  In short, a clocking block has two "ends" -
> a "signal end" that hooks into design code, and a "testbench end"
> that should be manipulated only by program code.  Any other
> use model gives rise to many opportunities for races or unexpected
> behaviour.
> 
> The offending proposal in SV-890-3 is a workaround to make clockings
> behave sensibly when the "testbench end" is manipulated by module
> code instead of program code.  It's been suggested that this matches
> the sample() behaviour of covergroups, but I think that's a spurious
> comparison; sampling a covergroup affects only the coverage data,
> but updating a clocking block's sampled inputs could have extensive
> knock-on throughout the rest of the testbench and I would need a
> lot of convincing that this workaround assures freedom from races.
> Furthermore, I suspect the proposal is completely broken in the
> case of #0 input sampling; I have tried to discuss that issue in
> more detail in the appropriate place below.
> 
> Thanks for your consideration.
> 
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~~~~~~~~~~~~
> Comments from Jonathan Bromley <jonathan.bromley@doulos.com>
> on document SV-890-3.pdf associated with Mantis item 890
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~~~~~~~~~~~~
> 
> 15.2 Clocking Block Declaration
> [snip]
> 
> [JB] This change seems fine.
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~~~~~~~~~~~~
> 
> 15.10 Cycle delay
> ...
> What constitutes a cycle is determined by the default clocking in effect
> (see 15.11). If no default clocking has been specified for the current
> module, interface, or program then the compiler shall issue an error.
> 
> Example:
> ## 5; // wait 5 cycles (clocking events) using the default clocking
> ## (j + 1); // wait j+1 cycles (clocking events) using the default clocking
> 
> <insert>
> If a ## cycle delay operator is executed at a simulation time that does
> not correspond to a default clocking event (perhaps due to the use of a #
> delay control or an asynchronous @ event control), the processing of the
> cycle delay is postponed until the time of the next default clocking
> event. Thus a ##1 cycle delay shall always be guaranteed to wait at least
> one full clock cycle.
> </insert>
> 
> [JB] This formulation is mostly clear, but has some strange effects.
>      (Once again I'm not the only one who's unhappy here; existing
>      implementations don't fully match the described behaviour.)
>      It leads to behaviour that is completely at odds with the usual
>      behaviour of Verilog @ timing controls - if I say "@(posedge clk)"
>      at a time that's halfway between two clock events, I expect
>      to wait for half a cycle rather than 1.5 cycles.  And, in particular,
>      it makes life very difficult if you want to do something on the
>      very first clock event.  Surely if I write
> 
>        initial begin
>          ##1 sig <= expr;
> 
>      my intent was that 'sig' should be driven at the FIRST clock, not
>      the second?  I realise that it may now be too late to rescind this
>      decision.  To rescue the situation, can we use ##0 to mean "wait
>      until the current-or-next clocking event"?  If so, all is well
>      (despite the discontinuity with regular @).
> 
>      There's a further ambiguity here.  If I use the clocking block's name
>      as an event, using the @cb event control, do I get *exactly* the same
>      behaviour as ##1?  I guess so, but, especially in view of the problems
>      I outline above, I think this should be made explicit.
> 
>      Finally, using the phrase "default clocking event" in this context
>      is clearly wrong.  If I say
>        ##1 cb.out <= ...
>      then the ##1 is a cycle of cb, which is not necessarily the same
>      as a cycle of the default clocking.
> 
>      So, my conclusions: If we wish to keep the current proposals of
>      SV-890-3 here,
>      (1) it is essential that we explicitly define the behaviour of
>          ##0, so that there's a way of reaching the next-or-current
>          clocking event;
>      (2) there should be a note clarifying the stark difference in
>          behaviour between ## and the regular @ event control, and
>          clearly stating the equivalence (if any) between @cb and ##1.
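> 
> A minimal sketch of the contrast discussed in this section, assuming a
> hypothetical clock 'clk' with a 20-unit period and a default clocking
> named 'cb' (neither name comes from SV-890-3).  The ##1 timing noted in
> the comments is what the proposed wording implies, not an observed result;
> as remarked above, existing implementations may not match it.
> 
> ```systemverilog
> module cycle_delay_sketch;
>   bit clk;
>   always #10 clk = ~clk;        // rising edges at t = 10, 30, 50, ...
> 
>   // Hypothetical default clocking; it needs no signals for this purpose.
>   default clocking cb @(posedge clk);
>   endclocking
> 
>   initial begin
>     #15;                         // t = 15, between two clocking events
>     @(posedge clk);              // regular event control: next edge, t = 30
>     $display("@(posedge clk) resumed at t=%0t", $time);
>   end
> 
>   initial begin
>     #15;                         // same starting point, separate process
>     ##1;                         // proposed text: processing is postponed
>                                  // to the edge at t = 30, then one full
>                                  // cycle is waited, i.e. until t = 50
>     $display("##1 resumed at t=%0t", $time);
>   end
> 
>   initial #100 $finish;
> endmodule
> ```
> 
> Replacing '##1' with '@(cb)' in the second process is a quick way to
> check whether a given tool treats the two as equivalent.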
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~~~~~~~~~~~~
> 
> In 15.12, MODIFY the text as follows:
> 
> [JB] I have a number of issues with this, which I'll take one piece
>      at a time...
> 
> 15.12 Input sampling
> All clocking block inputs (input or inout) are sampled at the
> corresponding clocking event. If the input skew is not an explicit #0,
> then the value sampled corresponds to the signal value at the Postponed
> region of the time step skew time-units prior to the clocking event (see
> Figure 15-1 in 15.3).
> <strikeout>
> If the input skew is an explicit #0, then the value sampled
> corresponds to the signal value in the Observed region.
> </strikeout>
> If the input skew is an explicit #0, several additional considerations
> shall govern input sampling. First, the value sampled corresponds to the
> signal value in the Observed region.
> 
> [JB] OK so far.
> 
> <insert>
> Next, when the clocking event occurs, the sampled value shall be
> updated and available for reading the instant the clocking event
> takes place and before any other statements are executed.
> </insert>
> 
> [JB] This new stipulation appears to be necessary to legitimize
>      the approach taken in some vendors' verification methodologies
>      that don't use program blocks for the test bench.  It apparently
>      aims to sidestep the write/read race condition that pertains if
>      you have a clocking whose clocking event is on a design variable
>      and whose inputs are examined in design code.  Can we be confident
>      that this new stipulation is (a) appropriate, (b) general?  It is
>      almost equivalent to creating a new scheduler region (Pre-active?!).
>      If we accept this new behaviour, it is absurd to accept the
>      caveat that follows:
> 
> <insert>
> Finally, if the clocking event occurs due to activity on a
> program object, there is a race condition between the update
> of the clocking block input's value and the execution of
> program code that reads that value.
> </insert>
> 
> [JB] The internal contradictions here are in my opinion insupportable.
>      In effect it says:
> 
>        Clocking event on a design object, clocking inputs read
>        in design code:
>           NO RACE because of special treatment of clocking inputs.
> 
>        Clocking event on a program object, clocking inputs read
>        in program code:
>          RACE because update of the clocking input happens in
>          the same scheduler region as reading of that input.
> 
>      There is a fundamental problem here.  Clocking inputs are updated
>      as a result of occurrence of their clocking event; this is sure
>      to race with reading of the clocking input, *unless* the clocking
>      event is on a design variable but the clocking inputs are read in
>      program code.  This is, as I understand it, precisely the scenario
>      for which clockings were originally designed and in which they
>      can be expected to work reliably without races.  I don't really
>      understand the need to shoe-horn them into other scenarios where
>      straightforward module code would do just as well.
> 
>      I also completely fail to understand how this approach can yield
>      meaningful behaviour when #0 input sampling is specified, because
>      it implies that the clocking block's sampled input values should
>      be updated BEFORE the Observed region where the sampling is
>      specified to take place!  I discuss this in more detail below.
> 
>      I would prefer to see this new stipulation (clocking inputs update
>      before anything else happens) completely removed, and in its place
>      a warning added that clocking inputs can be read in a race-free way
>      only if all the following conditions are met:
>      * the clocking event occurs in the design regions of the scheduler
>      * the clocking input observes a design net or variable
>      * the clocking input is read only from code running in the program
>        regions of the scheduler
> 
>      I also wish to see a note to the effect that input #0 sampling
>      has unusual behaviour.  It samples its input signal *after* the
>      design regions have iterated, and therefore (in most cases)
>      *after* the clocking event has occurred.  It seems to me that
>      this works sensibly only if the sampled "input #0" is read in
>      program code rather than in design code.  Reading it from design
>      code will introduce an additional cycle's delay before the result
>      is visible.
> 
>      input #0 and output #0 appear to have been intended to provide
>      the useful effect of giving, to signals read or driven through a
>      clocking, exactly the same timing behaviour as you would see
>      from a program that reads and drives those signals without an
>      intervening clocking block.  Insisting that input samples be updated
>      instantaneously on the clocking event will break that model, since
>      the sampled value will be updated before the Observed region; this
>      update will presumably obtain the value that was sampled in the
>      Observed region of the *previous* clocking event's timestep.
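> 
> A sketch of the use model that satisfies the three race-free conditions
> listed above: the clock and the observed signal live in module code, and
> the sampled values are read only from a program.  All names ('top_sketch',
> 'tb_sketch', 'cb', 'count', 'count_now') are hypothetical, and the region
> comments restate the 15.12 text rather than any tool's measured behaviour.
> 
> ```systemverilog
> module top_sketch;
>   bit         clk;
>   logic [3:0] count = 0;
> 
>   always #5 clk = ~clk;                      // design clock
>   always @(posedge clk) count <= count + 1;  // design code, updates by NBA
> 
>   // Clocking event and observed variable both belong to the design.
>   clocking cb @(posedge clk);
>     input #1step count;              // value just before the clock edge
>     input #0     count_now = count;  // value seen in the Observed region
>   endclocking
> 
>   tb_sketch tb();                    // testbench program instance
> endmodule
> 
> program tb_sketch;
>   // Only program code reads the "testbench end" of the clocking block.
>   initial begin
>     repeat (3) begin
>       @(top_sketch.cb);
>       $display("t=%0t  #1step sample=%0d  #0 sample=%0d", $time,
>                top_sketch.cb.count, top_sketch.cb.count_now);
>     end
>   end  // the program's last initial block ends here, ending simulation
> endprogram
> ```
> 
> Moving the $display loop into an always block in 'top_sketch' gives the
> module-code variant whose race behaviour is in dispute here.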
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~~~~~~~~~~~~
> 
> 15.14 Synchronous drives
> 
> Clocking block outputs (output or inout) are used to drive values onto
> their corresponding signals, but at a specified time. That is, the
> corresponding signal changes value at the indicated clocking event as
> modified by the output skew.
> <insert>
> For zero skew clocking block outputs with no cycle delay, synchronous
> drives shall schedule new values in the NBA region of the current time
> unit. This has the effect of causing the big loop in Figure 9-1 to iterate
> from the reactive/re-inactive regions back into the NBA region of the
> current time unit. For clocking block outputs with non-zero skew or
> non-zero cycle delay, the corresponding signal shall be scheduled to
> change value in the NBA region of a future time unit.
> </insert>
> 
> Examples:
> [snip]
> Regardless of when the drive statement executes (due to event_count
> delays), the driven value is assigned to the corresponding signal
> only at the time specified by the output skew.
> 
> [JB] In the last sentence, the parenthetical remark is entirely
>      bewildering and should be removed.  In fact, given the various
>      other changes and clarifications proposed, I suspect the whole
>      sentence could be removed without loss.
> 
> <insert>
> It is possible for a drive statement to execute asynchronously at a time
> that does not correspond to its associated clocking event. Such drive
> statements shall be processed as if they had executed at the time of the
> next clocking event. Any values read on the right hand side of the drive
> statement are read immediately, but the processing of the statement is
> delayed until the time of the next clocking event. This has implications
> on synchronous drive resolution (See 15.14.2) and ## cycle delay
> scheduling.
> Note: The synchronous drive syntax does not allow intra-assignment delays
> like a regular procedural assignment does.
> </insert>
> 
> [JB] This is good.  However, with apologies for the pedantry,
>      can we please reword the final "Note" sentence as follows?
> 
>         Note: Unlike blocking and nonblocking procedural
>         assignment, the synchronous drive syntax does not
>         allow intra-assignment delays.
> 
> 15.14.1 Drives and nonblocking assignments
> <strikeout>
> Synchronous signal drives are processed as nonblocking assignments.
> </strikeout>
> <insert>
> Note: While the non-blocking assignment operator is used in the
> synchronous drive syntax, these assignments are different than
> non-blocking variable assignments. The intention of using this operator is
> to remind readers of certain similarities shared by synchronous drives and
> non-blocking assignments. One main similarity is that variables and wires
> connected to clocking block outputs and inouts are driven in the NBA
> region.
> </insert>
> Another key NBA-like feature of inout clocking block variables signals and
> synchronous drives is that a drive does not change the clocking block
> input. This is because reading the input always yields the last sampled
> value, and not the driven value.
> 
> [JB] Excellent.
> 
> <insert>
> One difference between synchronous drives and classic NBA assignments is
> that transport delay is not performed by synchronous drives (except in the
> presence of the intra-assignment cycle delay operator). Another key
> difference is drive value resolution, discussed in the next section.
> </insert>
> 
> [JB] It seems to me that synchronous drive *does* perform transport
>      delay, albeit in a rather unusual way:  first there is a transport
>      delay from the execution of the drive to its maturation, and then
>      there is a second transport delay associated with the clocking
>      output's skew.  I suspect it would be better to remove entirely
>      the sentence about transport delay.
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~~~~~~~~~~~~
> 
> 15.14.2 Drive value resolution
> ...
> The driven value of nibble is 4'b0xx1, regardless of whether nibble is a
> reg or a wire.
> <insert>
> If a given clocking output is driven by more than one assignment in the
> same time unit, but the assignments are scheduled to mature at different
> future times due to the use of cycle delay, then no drive value resolution
> shall be performed.  The drives shall be applied with classic Verilog NBA
> transport delay semantics in this case.
> If a given clocking output is driven asynchronously at different time
> units within the same clock cycle, then drive value resolution is
> performed as if all such assignments were made at the same time unit in
> which the next clocking event occurs.
> </insert>
> 
> [JB] I don't think this is as helpful as it could be.  It describes
>      the behaviour from the point of view of the clocking drive,
>      whereas it is clearer and more general to describe it from the
>      point of view of the cycle in which the assignment(s) mature.
>      I'd like to suggest the following re-wording, which is somewhat
>      heavy going but seems to me to be more precise:
> 
>      <proposed LRM text>
>      Assignment to a clocking output using the syntax
>        clocking_name.output_name <= [##N] expr;
>      is known as a clocking drive.  A clocking drive shall take
>      effect (mature) at a current or future clocking event, as follows:
>      * If the intra-assignment cycle delay ##N is absent or N
>        is zero, the drive shall mature at the next occurrence of
>        the clocking event, or immediately if the clocking
>        event has occurred in the current timestep.
>      * If N is greater than zero, the drive shall mature at
>        the corresponding future clocking event.
>      In this way, all clocking drives shall mature in the timestep of
>      a clocking event of their clocking block, even if they executed
>      asynchronously to that clocking event.
>      At each clocking event of a clocking block, each clocking output
>      of that clocking block shall be treated as follows:
>      (a) _Scheduling of assignment to the clocking output_
>          If one or more clocking drives to that output mature on the
>          current clocking event, a single nonblocking assignment to that
>          output shall be scheduled for the current or future timestep
>          specified by its output skew.
>          If no clocking drive to that output matures on the current
>          clocking event, no such assignment to that output shall be
>          scheduled.
>      (b) _Value assigned to the clocking output_
>          If exactly one clocking drive to that output matures, the value
>          assigned as described in (a) above shall be the value evaluated
>          by that clocking drive when it executed.  However, if two or more
>          clocking drives to that output mature, the value assigned shall
>          be determined by resolving all those drives' values, as if each of
>          those values had been driven on to the same net of wire type by a
>          continuous assign statement of (strong0, strong1) drive strength.
>      </proposed LRM text>
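> 
> A sketch of rule (b) above for two drives that mature on the same clocking
> event.  The names 'pe', 'clk' and 'nibble' and the two driven values are
> assumptions chosen so that wire-style resolution gives the 4'b0xx1 quoted
> in 15.14.2: the values agree on bits 3 and 0 and conflict on bits 2 and 1.
> A tool that follows a different resolution rule may display another value.
> 
> ```systemverilog
> module drive_resolution_sketch;
>   bit         clk;
>   logic [3:0] nibble;
> 
>   always #5 clk = ~clk;
> 
>   clocking pe @(posedge clk);
>     output nibble;               // default output skew
>   endclocking
> 
>   initial begin
>     @(pe);
>     // Two clocking drives to the same output in the same timestep.
>     // No ##N is given, so both mature on the current clocking event.
>     pe.nibble <= 4'b0101;
>     pe.nibble <= 4'b0011;
> 
>     @(pe);                       // one cycle later, inspect the signal
>     $display("nibble = %b", nibble);
>     $finish;
>   end
> endmodule
> ```
> 
> Writing the second drive as 'pe.nibble <= ##1 4'b0011;' makes the two
> drives mature on different clocking events, which is the case where no
> resolution is expected.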
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~~~~~~~~~~~~
> 
> [JB] I'm happy with the remaining proposed modifications to clause 15.
>      However, the implicit NBAs described in 15.14.2 above have an impact
>      on clause 16 (program) which should be mentioned also in clause 15.
> 
>      Clocking outputs are updated by NBA (or, in the case of clocking
>      outputs that are nets, by continuous assign from an implicit variable
>      that's updated by NBA).  Consequently, as I understand it,
> 
>          ** it can in no circumstances be legal for any
>             program variable to be a clocking output    **
> 
>      because that would be equivalent to writing a program variable
>      by NBA, and we know that to be a bad idea.
> 
>      And, for the avoidance of any argument, let's note that a program's
>      output port that is a variable is a program variable, not a design
>      variable.  You *can* get design variables visible through ports of
>      a program, by passing them through ref ports, so this is not a
>      limitation.  But we don't want the program to be able to read,
>      directly, one of its own variables that has been updated in the
>      NBA region by a clocking block - even if that variable happens
>      also to be one of its output ports that's driving a design signal.
> 
> -- 
> Jonathan Bromley, Consultant
> 
> DOULOS - Developing Design Know-how
> VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services
> 
> Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, Hampshire, BH24 1AW, UK
> Tel: +44 (0)1425 471223                   Email: jonathan.bromley@doulos.com
> Fax: +44 (0)1425 471573                           Web: http://www.doulos.com