[sv-bc] Special characters in strings - Mantis 1507

From: Bresticker, Shalom <shalom.bresticker_at_.....>
Date: Sun May 11 2008 - 04:36:41 PDT

5.9 says, "Nonprinting and other special characters are preceded with a
backslash."

There is an example, "Hello world\n", which is considered a
12-character string. That is, "\n" is considered a single character,
even though it takes more than one character to write it.

Is that true for all the characters in Table 5-1?

What if a backslash is followed by a character that does not appear in
Table 5-1? Is it one character or two? 20.2.1.1 says, "An escaped
character not appearing in Table 20-1 shall cause the character to be
printed by itself," but that relates to printing and not necessarily to
the literal itself.

What about "%%"? As a display format, it has a special meaning, to just
display "%". But is it one character or two?

There is a need for clarification in the LRM. Mantis 1507 more or less
covers this issue.

I started checking with four different simulators and here is what I
found.

1. The special characters appearing in Table 5-1 and Table 20-1 that
consist of a backslash followed by a single character are stored in
a string literal as a single character, namely the ASCII code
corresponding to the special character denoted. That is, "\n" is stored
with the ASCII code corresponding to Newline (Line Feed, x0A), "\t" is
stored with the ASCII code corresponding to Tab (x09), etc.

2. A backslash followed by a character not appearing in those tables is
also stored as a single character, but as the ASCII code corresponding
to that character, the same as if not escaped. So, for example, "\m" is
stored as x6D, the same as "m", and "\o" is stored as x6F, but "\n" is
stored as x0A, not as x6E.
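
Observations 1 and 2 can be modelled with a short Python sketch. This is
only an illustration of the behavior described above, not simulator
code: the name decode_simple and the SPECIAL table are hypothetical, the
table is assumed to hold just the single-character escapes of Table 5-1,
and octal escapes are deliberately left out.

```python
# Assumed contents of Table 5-1 for backslash + one character.
SPECIAL = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}


def decode_simple(literal_body: str) -> bytes:
    """Return the bytes stored for the text between the quotes.

    A backslash plus a listed character becomes the special character;
    a backslash plus any other character becomes that character itself.
    Octal escapes are not handled in this sketch.
    """
    out = bytearray()
    i = 0
    while i < len(literal_body):
        ch = literal_body[i]
        if ch == "\\" and i + 1 < len(literal_body):
            nxt = literal_body[i + 1]
            out.append(ord(SPECIAL.get(nxt, nxt)))
            i += 2  # the escape consumed two source characters
        else:
            out.append(ord(ch))
            i += 1
    return bytes(out)


# Raw strings keep the backslash so the sketch sees the source text.
print(len(decode_simple(r"Hello world\n")))  # 12
print(decode_simple(r"\m").hex())            # 6d
print(decode_simple(r"\n").hex())            # 0a
```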

3. What about the new special characters added in 1800-2005 3.6: \v, \f,
\a? So far as I can see, none of the simulators implements them yet, and
they are just treated as normal, i.e., "\v" is the same as "v", etc. One
simulator gives a warning that these special characters are not
implemented yet.

4. Presumably one would expect them to be stored as their special
characters. I.e., "\a" would be stored with the ASCII code corresponding
to "Bell" and not the same as "a". This means that moving between
Verilog and SystemVerilog modes would change the behavior in this case,
not just when printing them as strings, but also in their internal
storage value, and thus their printed value in non-string formats, etc.
It changes the string length as well. Admittedly, this is normally a
corner case.
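
The change in stored value between the two modes can be illustrated
with a small Python model. The tables V1364 and SV1800 and the function
stored_codes are hypothetical names; the sketch assumes that the only
difference between the modes is the addition of \v, \f and \a, and it
ignores octal and hexadecimal escapes.

```python
# Assumed single-character escapes in each mode.
V1364 = {"n": 0x0A, "t": 0x09, "\\": 0x5C, '"': 0x22}
SV1800 = dict(V1364, v=0x0B, f=0x0C, a=0x07)


def stored_codes(body: str, table: dict) -> list:
    """Return the character codes stored for the text between the quotes."""
    codes = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # Unknown escapes fall back to the character itself.
            codes.append(table.get(nxt, ord(nxt)))
            i += 2
        else:
            codes.append(ord(body[i]))
            i += 1
    return codes


# The same source text yields a different stored value in each mode.
print([hex(c) for c in stored_codes(r"\a", V1364)])   # ['0x61']
print([hex(c) for c in stored_codes(r"\a", SV1800)])  # ['0x7']
```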

5. 20.2.1.1 includes a new line from Mantis 1101: "An escaped character
not appearing in Table 20-1 shall cause the character to be printed by
itself. For example, a string argument "\a" shall print simply "a"."
This was fine for 1364, but unfortunately "\a" is now also a special
character, so the example should be changed to some other character.
Stu, please note.

6. What about octal codes, a backslash followed by numerals? "\0"
becomes x00, "\7" becomes x07, but "\8" becomes the ASCII code for the
character "8". OK. That is according to the rule that a backslashed
non-special character becomes the same as the character without the
backslash.

7. What about an octal number greater than \377? The LRM says that a
tool "may" issue an error. One tool did; the others truncated the octal
number to its least significant 8 bits.
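
The numeric cases in observations 6 and 7 can be sketched in Python as
follows. The function decode_numeric_escape and its strict flag are
hypothetical; the sketch assumes an octal escape takes one to three
octal digits, and it models both observed responses to a value above
\377 (an error, or truncation to the low 8 bits).

```python
OCTAL_DIGITS = "01234567"


def decode_numeric_escape(rest: str, strict: bool = False):
    """Decode the text that follows a backslash.

    `rest` must be non-empty. Returns (stored_code, characters_consumed).
    """
    n = 0
    while n < 3 and n < len(rest) and rest[n] in OCTAL_DIGITS:
        n += 1
    if n == 0:
        # Not an octal digit (for example "8"): store the character itself.
        return ord(rest[0]), 1
    value = int(rest[:n], 8)
    if value > 0o377:
        if strict:
            raise ValueError("octal escape exceeds \\377")
        value &= 0xFF  # keep only the least significant 8 bits
    return value, n


print(decode_numeric_escape("0"))    # (0, 1)
print(decode_numeric_escape("7"))    # (7, 1)
print(decode_numeric_escape("8"))    # (56, 1), i.e. x38, the character "8"
print(decode_numeric_escape("477"))  # (63, 3), i.e. x13F truncated to x3F
```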

8. What about hexadecimal codes, \x followed by a one- or two-digit
code? It seems that, as with the other special characters added in
1800-2005, none of the simulators implements this yet. So they just
look at this as "\x", which is the same as "x", followed by a digit
string. As with
other new special characters, implementing this changes the behavior and
internal stored value from 1364.
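
A Python sketch of the difference follows. It is only a model: the name
decode_hex_aware and the sv_mode flag are hypothetical, only the \x
escape is treated specially (other escapes such as \n are not modelled
here), and the handling of "\x" with no hex digit after it is an
assumption.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_hex_aware(body: str, sv_mode: bool) -> bytes:
    """Return the stored bytes, treating \\x as special only in sv_mode."""
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if sv_mode and nxt == "x":
                # Collect one or two hex digits after "\x".
                j = i + 2
                while j < len(body) and j < i + 4 and body[j] in HEX_DIGITS:
                    j += 1
                if j > i + 2:
                    out.append(int(body[i + 2:j], 16))
                    i = j
                    continue
            # Unimplemented or unknown escape: store the character itself.
            out.append(ord(nxt))
            i += 2
        else:
            out.append(ord(body[i]))
            i += 1
    return bytes(out)


print(decode_hex_aware(r"\x41", sv_mode=False))  # b'x41' (three characters)
print(decode_hex_aware(r"\x41", sv_mode=True))   # b'A'   (one character)
```

In this model the stored length changes as well as the value, since
three characters collapse into one once \x is implemented.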

9. Oh, what about "%%"? In contrast to backslashed characters, this
really is stored as 2 "%" characters, and therefore does not and should
not appear in Table 5-1. It only becomes a single "%" character when
displayed. This is the only difference between Table 5-1 and Table 20-1.
Probably there should be only one table and "%%" should be described
separately.
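
The distinction between storage and display for "%%" can be shown with
a minimal Python model. The function names are hypothetical, and the
display model handles only "%%"; real format specifiers such as %d are
outside the scope of this sketch.

```python
def stored_length(body: str) -> int:
    """Number of characters stored for the text between the quotes.

    "%%" is not an escape in the literal, so both characters are kept.
    """
    return len(body)


def displayed_text(fmt: str) -> str:
    """Model only the "%%" rule of a display format: print a single "%"."""
    return fmt.replace("%%", "%")


literal = "100%%"
print(stored_length(literal))        # 5
print(displayed_text(literal))       # 100%
print(len(displayed_text(literal)))  # 4
```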

Regards,
Shalom

Shalom Bresticker
Intel Jerusalem LAD DA
+972 2 589-6582
+972 54 721-1033

Received on Sun May 11 04:39:57 2008