Since I was cast in the role of chief troublemaker on clocking blocks at the last SV-EC meeting, I thought I'd try to live up to that...

Background
~~~~~~~~~~

At the last SV-EC meeting (Monday Sept 11) there was an incomplete discussion of Mantis 890. Mehdi very sensibly suggested that Doug Warmke's proposal SV-890-3.pdf should be reviewed point by point. This note attempts to do that.

Since I can't easily edit the PDF document, I've copied relevant fragments of its text here with what I hope is self-evident markup; my observations and proposed amendments are indented and have the marginal mark [JB]. Apologies in advance for any inconvenience.

Many of my comments are "friendly amendments": rewording, proposed clarifications and so on. I've tried to capture the sense of last week's meeting as well as various other emails that went before it.

There is, I think, only one potentially controversial point, relating to clause 15.12, where Doug proposed an addition to the text that I find hard to accept.

In a nutshell, the difficulty is that clocking blocks work well only in one specific use case: as a bridge (I think Arturo Salz called it a "trampoline") between the scheduling regimes in a program and in a design (modules and interfaces). The rather complicated interaction between the Active, Reactive and NBA regions of the scheduler, together with the sampling behaviour of clockings, makes this work reliably and without races. In short, a clocking block has two "ends": a "signal end" that hooks into design code, and a "testbench end" that should be manipulated only by program code. Any other use model gives rise to many opportunities for races or unexpected behaviour.

The offending proposal in SV-890-3 is a workaround to make clockings behave sensibly when the "testbench end" is manipulated by module code instead of program code.
It's been suggested that this matches the sample() behaviour of covergroups, but I think that's a spurious comparison: sampling a covergroup affects only the coverage data, whereas updating a clocking block's sampled inputs could have extensive knock-on effects throughout the rest of the testbench, and I would need a lot of convincing that this workaround assures freedom from races. Furthermore, I suspect the proposal is completely broken in the case of #0 input sampling; I have tried to discuss that issue in more detail in the appropriate place below.

Thanks for your consideration.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Comments from Jonathan Bromley <jonathan.bromley@doulos.com>
on document SV-890-3.pdf associated with Mantis item 890
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

15.2 Clocking Block Declaration

[snip]

[JB] This change seems fine.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

15.10 Cycle delay

...

What constitutes a cycle is determined by the default clocking in effect (see 15.11). If no default clocking has been specified for the current module, interface, or program, then the compiler shall issue an error.

Example:

    ## 5;       // wait 5 cycles (clocking events) using the default clocking
    ## (j + 1); // wait j+1 cycles (clocking events) using the default clocking

<insert>
If a ## cycle delay operator is executed at a simulation time that does not correspond to a default clocking event (perhaps due to the use of a # delay control or an asynchronous @ event control), the processing of the cycle delay is postponed until the time of the next default clocking event. Thus a ##1 cycle delay shall always be guaranteed to wait at least one full clock cycle.
</insert>

[JB] This formulation is mostly clear, but has some strange effects. (Once again I'm not the only one who's unhappy here; existing implementations don't fully match the described behaviour.)
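To make the effect concrete, the toy Python model below encodes the proposed ## rule. It is a sketch under stated assumptions, not simulator behaviour. Clocking events happen at the times in a hypothetical `edges` list. A ##N executed exactly on an event time counts N further events. A ##N executed between events is first postponed to the next event, as the inserted text specifies. `@(posedge clk)` is modelled as waiting for the next edge, and `##0` is given the "current-or-next clocking event" meaning suggested in this review, which SV-890-3 itself does not define.

```python
# Toy model of the ## cycle-delay rule proposed in SV-890-3 for clause 15.10.
# Assumptions (illustrative, not LRM text): clocking events occur at the sorted
# times in `edges`; scheduler regions within a timestep are ignored.
from bisect import bisect_right


def at_event(edges, now):
    """@(posedge clk): wait for the first clocking event strictly after `now`."""
    return edges[bisect_right(edges, now)]


def cycle_delay(edges, now, n):
    """##n under the proposed rule; n == 0 uses the suggested current-or-next meaning."""
    if now in edges:
        start = edges.index(now)          # executed on a clocking event
    else:
        start = bisect_right(edges, now)  # postponed to the next clocking event
    return edges[start + n]


edges = [10, 20, 30, 40]
print(at_event(edges, 15))        # 20: half a cycle, as with @(posedge clk)
print(cycle_delay(edges, 15, 1))  # 30: one and a half cycles
print(cycle_delay(edges, 0, 1))   # 20: "initial ##1" misses the first edge at 10
print(cycle_delay(edges, 0, 0))   # 10: ##0 as current-or-next reaches the first edge
```

The model deliberately ignores where in a timestep the ## executes. That detail matters when the delay starts in the same timestep as the clocking event, so treat the results as timestep-level only.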
The proposed formulation leads to behaviour that is completely at odds with the usual behaviour of Verilog @ timing controls: if I say "@(posedge clk)" at a time that's halfway between two clock events, I expect to wait for half a cycle rather than 1.5 cycles. In particular, it makes life very difficult if you want to do something on the very first clock event. Surely if I write

    initial begin
      ##1 sig <= expr;
    end

my intent was that 'sig' should be driven at the FIRST clock, not the second?

I realise that it may now be too late to rescind this decision. To rescue the situation, can we use ##0 to mean "wait until the current-or-next clocking event"? If so, all is well (despite the discontinuity with regular @).

There's a further ambiguity here. If I use the clocking block's name as an event, using the @cb event control, do I get *exactly* the same behaviour as ##1? I guess so, but, especially in view of the problems I outline above, I think this should be made explicit.

Finally, using the phrase "default clocking event" in this context is clearly wrong. If I say

    ##1 cb.out <= ...

then the ##1 is a cycle of cb, which is not necessarily the same as a cycle of the default clocking.

So, my conclusions: if we wish to keep the current proposals of SV-890-3 here,

(1) it is essential that we explicitly define the behaviour of ##0, so that there's a way of reaching the current-or-next clocking event;

(2) there should be a note clarifying the stark difference in behaviour between ## and the regular @ event control, and clearly stating the equivalence (if any) between @cb and ##1.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In 15.12, MODIFY the text as follows:

[JB] I have a number of issues with this, which I'll take one piece at a time...

15.12 Input sampling

All clocking block inputs (input or inout) are sampled at the corresponding clocking event.
If the input skew is not an explicit #0, then the value sampled corresponds to the signal value at the Postponed region of the time step skew time-units prior to the clocking event (see Figure 15-1 in 15.3).

<strikeout>
If the input skew is an explicit #0, then the value sampled corresponds to the signal value in the Observed region.
</strikeout>

If the input skew is an explicit #0, several additional considerations shall govern input sampling. First, the value sampled corresponds to the signal value in the Observed region.

[JB] OK so far.

<insert>
Next, when the clocking event occurs, the sampled value shall be updated and available for reading the instant the clocking event takes place and before any other statements are executed.
</insert>

[JB] This new stipulation appears to be necessary to legitimize the approach taken in some vendors' verification methodologies that don't use program blocks for the testbench. It apparently aims to sidestep the write/read race condition that arises if you have a clocking whose clocking event is on a design variable and whose inputs are examined in design code.

Can we be confident that this new stipulation is (a) appropriate and (b) general? It is almost equivalent to creating a new scheduler region (Pre-active?!). If we accept this new behaviour, it is absurd to accept the caveat that follows:

<insert>
Finally, if the clocking event occurs due to activity on a program object, there is a race condition between the update of the clocking block input's value and the execution of program code that reads that value.
</insert>

[JB] The internal contradictions here are, in my opinion, insupportable. In effect it says:

  * Clocking event on a design object, clocking inputs read in design code: NO RACE, because of special treatment of clocking inputs.

  * Clocking event on a program object, clocking inputs read in program code: RACE, because update of the clocking input happens in the same scheduler region as reading of that input.
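The two cases above differ only in whether the update of the clocking input and the code that reads it run in the same scheduler region. The toy Python model below illustrates this. Its region list is heavily simplified, and `possible_reads` is a hypothetical helper. Within one region the order of processes is unspecified, so the model enumerates both orders. Across regions, within a single pass of the scheduler, the region order alone decides what the reader sees.

```python
# Toy model of the read/write race discussed for clause 15.12.
# Assumptions: a simplified, single pass through the scheduler regions;
# within one region the two processes may run in either order.
from itertools import permutations

REGIONS = ["Preponed", "Active", "Inactive", "NBA", "Observed", "Reactive", "Postponed"]


def possible_reads(update_region, read_region, old, new):
    """Return the set of values a reader might see for a clocking input."""
    u, r = REGIONS.index(update_region), REGIONS.index(read_region)
    if u != r:
        return {new} if u < r else {old}  # region order decides: no race
    seen = set()
    for order in permutations(["update", "read"]):  # same region: either order
        value = old
        for step in order:
            if step == "update":
                value = new
            else:
                seen.add(value)
    return seen


# Clocking event on a design signal, input read in program code: deterministic.
print(possible_reads("Active", "Reactive", 0, 1))    # {1}
# Clocking event on a program signal, input read in program code: race.
print(possible_reads("Reactive", "Reactive", 0, 1))  # {0, 1}
# Clocking event on a design signal, input read in design code: race,
# unless the special treatment proposed in SV-890-3 is added.
print(possible_reads("Active", "Active", 0, 1))      # {0, 1}
```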
There is a fundamental problem here. Clocking inputs are updated as a result of occurrence of their clocking event; this is sure to race with reading of the clocking input, *unless* the clocking event is on a design variable but the clocking inputs are read in program code. This is, as I understand it, precisely the scenario for which clockings were originally designed and in which they can be expected to work reliably without races. I don't really understand the need to shoe-horn them into other scenarios where straightforward module code would do just as well.

I also completely fail to understand how this approach can yield meaningful behaviour when #0 input sampling is specified, because it implies that the clocking block's sampled input values should be updated BEFORE the Observed region where the sampling is specified to take place! I discuss this in more detail below.

I would prefer to see this new stipulation (clocking inputs update before anything else happens) completely removed, and in its place a warning added that clocking inputs can be read in a race-free way only if all the following conditions are met:

  * the clocking event occurs in the design regions of the scheduler;
  * the clocking input observes a design net or variable;
  * the clocking input is read only from code running in the program regions of the scheduler.

I also wish to see a note to the effect that input #0 sampling has unusual behaviour. It samples its input signal *after* the design regions have iterated, and therefore (in most cases) *after* the clocking event has occurred. It seems to me that this works sensibly only if the sampled "input #0" is read in program code rather than in design code. Reading it from design code will introduce an additional cycle's delay before the result is visible.
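The timing difference behind the #0 note, and behind the concern about instantaneous update, can be sketched with another toy Python model. The assumptions are loud ones. `history` (a hypothetical name) maps each timestep to the value a design signal has settled to once the design regions finish iterating, so it stands in for both the Observed and the Postponed value. Program activity is ignored. `instant_update_value` encodes this review's reading of the proposal: an update "the instant the clocking event takes place" can only expose the Observed sample of the previous clocking event.

```python
# Toy model of clocking-block input sampling (clause 15.12).
# Assumption: history[t] is the value a design signal has settled to at the
# end of the design regions in timestep t, and it holds until the next entry.
from bisect import bisect_right


def settled_value(history, t):
    """Value of the signal after the design regions of timestep t."""
    times = sorted(history)
    i = bisect_right(times, t)
    if i == 0:
        raise ValueError("no value recorded at or before t")
    return history[times[i - 1]]


def sampled_input(history, event_time, skew):
    """Value read through a clocking input after the clocking event.

    skew > 0: Postponed region of the timestep `skew` units before the event.
    skew == 0 (explicit #0): Observed region of the event's own timestep,
    i.e. after the design has already reacted to this clocking event.
    """
    return settled_value(history, event_time - skew)


def instant_update_value(history, edges, event_time):
    """Reading of the proposal: updating an explicit-#0 input the instant the
    clocking event fires can only expose the previous event's Observed sample."""
    earlier = [e for e in edges if e < event_time]
    return sampled_input(history, earlier[-1], 0) if earlier else None


# A register q loaded by NBA on clock edges at 10 and 20.
history = {0: 0, 10: 1, 20: 2}
edges = [10, 20]
print(sampled_input(history, 20, 1))             # 1: value before the edge
print(sampled_input(history, 20, 0))             # 2: value after the design reacted
print(instant_update_value(history, edges, 20))  # 1: stale, from the edge at 10
```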
The input #0 and output #0 skews appear to have been intended to give signals read or driven through a clocking exactly the same timing behaviour as you would see from a program that reads and drives those signals without an intervening clocking block. Insisting that input samples be updated instantaneously on the clocking event will break that model, since the sampled value will be updated before the Observed region; this update will presumably obtain the value that was sampled in the Observed region of the *previous* clocking event's timestep.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

15.14 Synchronous drives

Clocking block outputs (output or inout) are used to drive values onto their corresponding signals, but at a specified time. That is, the corresponding signal changes value at the indicated clocking event as modified by the output skew.

<insert>
For zero-skew clocking block outputs with no cycle delay, synchronous drives shall schedule new values in the NBA region of the current time unit. This has the effect of causing the big loop in Figure 9-1 to iterate from the reactive/re-inactive regions back into the NBA region of the current time unit. For clocking block outputs with non-zero skew or non-zero cycle delay, the corresponding signal shall be scheduled to change value in the NBA region of a future time unit.
</insert>

Examples:

[snip]

Regardless of when the drive statement executes (due to event_count delays), the driven value is assigned to the corresponding signal only at the time specified by the output skew.

[JB] In the last sentence, the parenthetical remark is entirely bewildering and should be removed. In fact, given the various other changes and clarifications proposed, I suspect the whole sentence could be removed without loss.

<insert>
It is possible for a drive statement to execute asynchronously at a time that does not correspond to its associated clocking event.
Such drive statements shall be processed as if they had executed at the time of the next clocking event. Any values read on the right-hand side of the drive statement are read immediately, but the processing of the statement is delayed until the time of the next clocking event. This has implications for synchronous drive resolution (see 15.14.2) and ## cycle delay scheduling.

Note: The synchronous drive syntax does not allow intra-assignment delays like a regular procedural assignment does.
</insert>

[JB] This is good. However, with apologies for the pedantry, can we please reword the final "Note" sentence as follows?

  Note: Unlike blocking and nonblocking procedural assignment, the synchronous drive syntax does not allow intra-assignment delays.

15.14.1 Drives and nonblocking assignments

<strikeout>
Synchronous signal drives are processed as nonblocking assignments.
</strikeout>

<insert>
Note: While the nonblocking assignment operator is used in the synchronous drive syntax, these assignments are different from nonblocking variable assignments. The intention of using this operator is to remind readers of certain similarities shared by synchronous drives and nonblocking assignments. One main similarity is that variables and wires connected to clocking block outputs and inouts are driven in the NBA region.
</insert>

Another key NBA-like feature of inout clocking block variables signals and synchronous drives is that a drive does not change the clocking block input. This is because reading the input always yields the last sampled value, and not the driven value.

[JB] Excellent.

<insert>
One difference between synchronous drives and classic NBA assignments is that transport delay is not performed by synchronous drives (except in the presence of the intra-assignment cycle delay operator). Another key difference is drive value resolution, discussed in the next section.
</insert>

[JB] It seems to me that synchronous drive *does* perform transport delay, albeit in a rather unusual way: first there is a transport delay from the execution of the drive to its maturation, and then there is a second transport delay associated with the clocking output's skew. I suspect it would be better to remove the sentence about transport delay entirely.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

15.14.2 Drive value resolution

...

The driven value of nibble is 4'b0xx1, regardless of whether nibble is a reg or a wire.

<insert>
If a given clocking output is driven by more than one assignment in the same time unit, but the assignments are scheduled to mature at different future times due to the use of cycle delay, then no drive value resolution shall be performed. The drives shall be applied with classic Verilog NBA transport delay semantics in this case.

If a given clocking output is driven asynchronously at different time units within the same clock cycle, then drive value resolution is performed as if all such assignments were made at the same time unit in which the next clocking event occurs.
</insert>

[JB] I don't think this is as helpful as it could be. It describes the behaviour from the point of view of the clocking drive, whereas it is clearer and more general to describe it from the point of view of the cycle in which the assignment(s) mature. I'd like to suggest the following rewording, which is somewhat heavy going but seems to me to be more precise:

<proposed LRM text>

Assignment to a clocking output using the syntax

    clocking_name.output_name <= [##N] expr;

is known as a clocking drive. A clocking drive shall take effect (mature) at a current or future clocking event, as follows:

  * If the intra-assignment cycle delay ##N is absent or N is zero, the drive shall mature at the next occurrence of the clocking event, or immediately if the clocking event has occurred in the current timestep.
  * If N is greater than zero, the drive shall mature at the corresponding future clocking event.

In this way, all clocking drives shall mature in the timestep of a clocking event of their clocking block, even if they executed asynchronously to that clocking event.

At each clocking event of a clocking block, each clocking output of that clocking block shall be treated as follows:

(a) _Scheduling of assignment to the clocking output_
If one or more clocking drives to that output mature on the current clocking event, a single nonblocking assignment to that output shall be scheduled for the current or future timestep specified by its output skew. If no clocking drive to that output matures on the current clocking event, no such assignment to that output shall be scheduled.

(b) _Value assigned to the clocking output_
If exactly one clocking drive to that output matures, the value assigned as described in (a) above shall be the value evaluated by that clocking drive when it executed. However, if two or more clocking drives to that output mature, the value assigned shall be determined by resolving all those drives' values, as if each of those values had been driven onto the same net of wire type by a continuous assign statement of (strong0, strong1) drive strength.

</proposed LRM text>

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[JB] I'm happy with the remaining proposed modifications to clause 15. However, the implicit NBAs described in 15.14.2 above have an impact on clause 16 (program) that should also be mentioned in clause 15.

Clocking outputs are updated by NBA (or, in the case of clocking outputs that are nets, by continuous assign from an implicit variable that's updated by NBA). Consequently, as I understand it,

  ** it can in no circumstances be legal for any program variable to be a clocking output **

because that would be equivalent to writing a program variable by NBA, and we know that to be a bad idea.
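The maturation and resolution rules in the proposed 15.14.2 text above can be exercised with a small Python model. It is a sketch under assumptions that are not in the proposal. Clocking events occur at the sorted times in a hypothetical `edges` list. Values are equal-width 4-state bit strings such as "0101". A drive with ##N, N > 0, matures N clocking events after the event at which a ##0 drive executed at the same time would mature; this is one reading of "the corresponding future clocking event". The helper names are illustrative only. For instance, two hypothetical drives of 0101 and 0011 that mature together resolve to 0xx1, the kind of result quoted for nibble.

```python
# Toy model of clocking-drive maturation and (strong0, strong1) resolution.
# Assumptions: sorted clocking-event times in `edges`; equal-width 4-state
# bit strings; ##N (N > 0) counts N events past the ##0 maturation point.
from bisect import bisect_left
from collections import defaultdict


def maturation_index(edges, exec_time, n):
    """Index into `edges` of the clocking event at which a drive matures."""
    # ##0 or no ##: the clocking event in this timestep if there is one,
    # otherwise the next one; then N further events for ##N.
    return bisect_left(edges, exec_time) + n


def resolve_bit(bits):
    """Resolve one bit driven by several equal-strength drivers on a wire."""
    driven = {b for b in bits if b != "z"}  # z contributes nothing
    if not driven:
        return "z"
    if driven == {"0"}:
        return "0"
    if driven == {"1"}:
        return "1"
    return "x"  # 0 against 1, or any x


def clocking_output_assignments(edges, drives, output_skew=0):
    """Map assignment time -> value for one clocking output.

    drives: iterable of (exec_time, n, value) tuples. One nonblocking
    assignment is scheduled per clocking event at which at least one drive
    matures; several drives maturing together are resolved bitwise.
    """
    matured = defaultdict(list)
    for exec_time, n, value in drives:
        matured[maturation_index(edges, exec_time, n)].append(value)
    result = {}
    for idx, values in sorted(matured.items()):
        resolved = "".join(resolve_bit(column) for column in zip(*values))
        result[edges[idx] + output_skew] = resolved
    return result


edges = [10, 20, 30]
# Two drives maturing on the same clocking event are resolved bitwise.
print(clocking_output_assignments(edges, [(20, 0, "0101"), (20, 0, "0011")]))
# Asynchronous drives within one cycle mature together at the next event.
print(clocking_output_assignments(edges, [(12, 0, "0101"), (17, 0, "0011")]))
# Drives maturing on different events are not resolved against each other.
print(clocking_output_assignments(edges, [(10, 1, "1111"), (10, 2, "0000")]))
```

The first two calls both give `{20: '0xx1'}` and the third gives `{20: '1111', 30: '0000'}`. Grouping drives by the clocking event at which they mature, rather than by the time each one executed, lets a single function cover both the cycle-delay case and the asynchronous case. That is the point of the rewording.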
For the avoidance of any argument, note also that a program's output port that is a variable is a program variable, not a design variable. You *can* make design variables visible through ports of a program, by passing them through ref ports, so this is not a limitation. But we don't want the program to be able to read, directly, one of its own variables that has been updated in the NBA region by a clocking block, even if that variable happens also to be one of its output ports that's driving a design signal.

--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., Church Hatch, 22 Market Place, Ringwood, Hampshire, BH24 1AW, UK
Tel: +44 (0)1425 471223   Email: jonathan.bromley@doulos.com
Fax: +44 (0)1425 471573   Web: http://www.doulos.com

The contents of this message may contain personal views which are not the views of Doulos Ltd., unless specifically stated.

Received on Thu Sep 21 08:26:34 2006