RE: [sv-bc] Special characters in strings - Mantis 1507

From: Bresticker, Shalom <shalom.bresticker_at_.....>
Date: Mon May 12 2008 - 00:58:40 PDT
Here are a few more curiosities:
 
10. You can escape the quotation mark and write "\"". This is a
one-character string containing the quotation mark. The LRM defines this
for text macros, but not for string literals. Because a backslash escapes
a following quotation mark, you also have to write a double backslash to
end a string literal with a backslash; otherwise the single backslash
escapes the closing quotation mark.
 
11. Because the special characters like \n are really a single
character, they are only recognized if you write them together in a
single string literal. For example, if you try to concatenate two string
literals, one that ends with a backslash and one that begins with 'n',
it will stay two characters, a backslash followed by an 'n'.
 
12. In contrast to special characters like \n, %% is recognized as a
single percent character only if identified as a format string. So
$display("%%"); displays a single %, whereas $display("%s", "%%");
displays %%. Generally, unless displayed via a %s format, a string
literal in a $display may be interpreted as a format string with
everything that that implies.
 
13. What about a concatenation as a format string, e.g.,
$display({"a","b"});? Here some simulators identify it as a string and
display "ab", whereas others convert it to a numeric value and display a
number.
 
14. If you end a format string with a % symbol, $display("a%");, some
tools will print the % symbol, even though only one appears and not two,
and others issue an error that a format character is missing following
the % symbol.
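
As a rough illustration of items 12 and 14, here is a toy Python model of a format expander. It is only a sketch under stated assumptions, not any simulator's implementation: the name render and the strict flag are hypothetical, and only %% and %s are handled.

```python
def render(fmt, *args, strict=False):
    """Expand a format string that uses only %% and %s (toy model)."""
    out = []
    pending = list(args)
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(fmt):
            # Trailing % with no format character (item 14).
            if strict:
                raise ValueError("format character missing after %")
            out.append("%")
            i += 1
            continue
        spec = fmt[i + 1]
        if spec == "%":
            # %% collapses to % only while scanning a format string.
            out.append("%")
        elif spec == "s":
            # The argument text is copied verbatim, so its %% stays %%.
            out.append(pending.pop(0))
        else:
            raise ValueError("unsupported format character: " + spec)
        i += 2
    return "".join(out)
```

In this model, render("%%") gives a single % while render("%s", "%%") gives %%, matching item 12. The strict flag stands in for the two tool behaviors in item 14: the lenient setting prints the trailing %, the strict one reports an error.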
 
Shalom


________________________________

	From: owner-sv-bc@server.eda.org [mailto:owner-sv-bc@server.eda.org] On Behalf Of Bresticker, Shalom
	Sent: Sunday, May 11, 2008 2:37 PM
	To: sv-bc
	Cc: Geoffrey.Coram
	Subject: [sv-bc] Special characters in strings - Mantis 1507
	
	

	5.9 says, "Nonprinting and other special characters are preceded
with a backslash." 

	There is an example of "Hello world\n", which is considered a
12-character string. That is, "\n" is considered a single character,
even though it takes more than 1 character to write it.

	Is that true for all the characters in Table 5-1? 

	What if a backslash is followed by a character that does not
appear in Table 5-1? Is it one character or two? 20.2.1.1 says, "An
escaped character not appearing in Table 20-1 shall cause the character
to be printed by itself," but that relates to printing and not
necessarily to the literal itself.

	What about "%%"? As a display format, it has a special meaning,
to just display "%". But is it one character or two? 

	There is need for clarification in the LRM. Mantis 1507 more or
less covers this issue. 

	I started checking with four different simulators and here is
what I found. 

	1. The special characters appearing in Table 5-1 and Table 20-1
consisting of a backslash followed by a single character, are stored in
a string literal as a single character, the ASCII code corresponding to
the special character denoted. That is, "\n" is stored with the ASCII
code corresponding to Newline (line feed, x0A), "\t" is stored with the
ASCII code corresponding to Tab (x09), etc.

	2. A backslash followed by a character not appearing in those
tables is also stored as a single character, but as the ASCII code
corresponding to that character, the same as if not escaped. So, for
example, "\m" is stored as x6D, the same as "m", and "\o" is stored as
x6F, but "\n" is stored as x0A, not as x6E.

	3. What about the new special characters added in 1800-2005 3.6:
\v, \f, \a? So far as I can see, none of the simulators implements them
yet, and they are just treated as normal, i.e., "\v" is the same as "v",
etc. One simulator gives a warning that these special characters are not
implemented yet.

	4. Presumably one would expect them to be stored as their
special characters. I.e., "\a" would be stored with the ASCII code
corresponding to "Bell" and not the same as "a". This means that moving
between Verilog and SystemVerilog modes would change the behavior in
this case, not just when printing them as strings, but also in their
internal storage value, and thus their printed value in non-string
formats, etc. It changes the string length as well. Admittedly, this is
normally a corner case.

	5. 20.2.1.1 includes a new line from Mantis 1101: "An escaped
character not appearing in Table 20-1 shall cause the character to be
printed by itself. For example, a string argument "\a" shall print
simply "a"." This was fine for 1364, but unfortunately "\a" is now also
a special character, so the example should be changed to some other
character. Stu, please note.

	6. What about octal codes, a backslash followed by numerals?
"\0" becomes x00, "\7" becomes x07, but "\8" becomes the ASCII code for
the character "8". OK. That is according to the rule that a backslashed
non-special character becomes the same as the character without the
backslash.

	7. What about an octal number greater than \377? The LRM says
that a tool "may" issue an error. One tool did; the others truncated the
octal number to 8 bits by dropping the high-order bits, i.e., kept the
least significant 8 bits.

	8. What about hexadecimal codes, \x followed by a one- or two-digit
code? It seems that, like the other special characters added in
1800-2005, none of the simulators implements this yet. So they just treat
this as "\x", which is the same as "x", followed by a digit string.
As with other new special characters, implementing this changes the
behavior and internal stored value from 1364.

	9. Oh, what about "%%"? In contrast to backslashed characters,
this really is stored as 2 "%" characters, and therefore does not and
should not appear in Table 5-1. It only becomes a single "%" character
when displayed. This is the only difference between Table 5-1 and Table
20-1. Probably there should be only one table and "%%" should be
described separately.
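
The storage rules in items 1 through 8 can be summarized in a toy Python model. This is a sketch of the behavior reported above, not any simulator's code. The function name decode_literal, the sv_mode and strict flags, and the exact table contents are assumptions made for illustration: the base table is taken to hold \n, \t, \\ and \", and sv_mode adds \v, \f, \a and \x as one would expect from 1800-2005.

```python
# Assumed single-character escapes of Table 5-1.
ESCAPES_1364 = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}
# Assumed additions from 1800-2005 (items 3 and 4).
ESCAPES_1800 = {**ESCAPES_1364, "v": "\v", "f": "\f", "a": "\a"}
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_literal(text, sv_mode=False, strict=False):
    """Return the characters stored for the text between the quotes."""
    escapes = ESCAPES_1800 if sv_mode else ESCAPES_1364
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\" or i + 1 == len(text):
            out.append(ch)
            i += 1
            continue
        # Collect up to three octal digits after the backslash (items 6, 7).
        j = i + 1
        while j < len(text) and j < i + 4 and text[j] in "01234567":
            j += 1
        digits = text[i + 1:j]
        if digits:
            value = int(digits, 8)
            if value > 0o377:
                if strict:
                    raise ValueError("octal escape exceeds \\377: " + digits)
                value &= 0xFF  # keep the least significant 8 bits
            out.append(chr(value))
            i = j
            continue
        nxt = text[i + 1]
        if sv_mode and nxt == "x":
            # \x followed by one or two hex digits (item 8).
            k = i + 2
            while k < len(text) and k < i + 4 and text[k] in HEX_DIGITS:
                k += 1
            if k > i + 2:
                out.append(chr(int(text[i + 2:k], 16)))
                i = k
                continue
        # Known escape: one special character (item 1).
        # Unknown escape: the character itself (item 2).
        out.append(escapes.get(nxt, nxt))
        i += 2
    return "".join(out)
```

Under these assumptions, decode_literal(r"Hello world\n") has length 12, and decode_literal(r"\m") is the same as "m". The mode dependence in item 4 also shows up: decode_literal(r"\x41") is the three characters "x41" by default but the single character "A" with sv_mode=True, so both the stored value and the length change.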

Received on Mon May 12 01:06:22 2008
