checkpoint
This commit is contained in:
parent
2634795b5f
commit
78e51a8c66
314 changed files with 48199 additions and 300 deletions
96
man/man7/utf.html
Normal file
@ -0,0 +1,96 @@
<head>
<title>utf(7) - Plan 9 from User Space</title>
<meta content="text/html; charset=utf-8" http-equiv=Content-Type>
</head>
<body bgcolor=#ffffff>
<table border=0 cellpadding=0 cellspacing=0 width=100%>
<tr height=10><td>
<tr><td width=20><td>
<tr><td width=20><td><b>UTF(7)</b><td align=right><b>UTF(7)</b>
<tr><td width=20><td colspan=2>
<br>
<p><font size=+1><b>NAME </b></font><br>

<table border=0 cellpadding=0 cellspacing=0><tr height=2><td><tr><td width=20><td>

UTF, Unicode, ASCII, rune – character set and format<br>

</table>
<p><font size=+1><b>DESCRIPTION </b></font><br>

<table border=0 cellpadding=0 cellspacing=0><tr height=2><td><tr><td width=20><td>

The Plan 9 character set and representation are based on the Unicode
Standard and on the ISO multibyte UTF-8 encoding (Universal Character
Set Transformation Format, 8 bits wide). The Unicode Standard
represents its characters in 16 bits; UTF-8 represents such values
in an 8-bit byte stream. Throughout this
manual, UTF-8 is shortened to UTF.
<table border=0 cellpadding=0 cellspacing=0><tr height=5><td></table>

In Plan 9, a <i>rune</i> is a 16-bit quantity representing a Unicode
character. Internally, programs may store characters as runes.
However, any external manifestation of textual information, in
files or at the interface between programs, uses a machine-independent,
byte-stream encoding called UTF.
<table border=0 cellpadding=0 cellspacing=0><tr height=5><td></table>

UTF is designed so that the characters of the 7-bit ASCII set (values
hexadecimal 00 to 7F) appear only as themselves in the encoding. Runes with
values above 7F appear as sequences of two or more bytes with
values only from 80 to FF.
<table border=0 cellpadding=0 cellspacing=0><tr height=5><td></table>
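
Because bytes 00 to 7F never occur inside a multibyte sequence, a byte-oriented search for an ASCII character is safe on UTF text. The following is a minimal sketch of that property in portable C; the function name count_slashes is hypothetical and is not part of any Plan 9 library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a Plan 9 API): count '/' characters in a
 * NUL-terminated UTF string using only byte operations. This is safe
 * because every byte of a multibyte sequence lies in 80..FF, so a
 * byte equal to '/' (hex 2F) can only be a real slash.
 */
size_t
count_slashes(const char *s)
{
	size_t n = 0;

	assert(s != NULL);
	/* strchr finds the next '/' byte; step past it and search again */
	for (; (s = strchr(s, '/')) != NULL; s++)
		n++;
	return n;
}
```

The same reasoning applies to any other byte-level scan for ASCII delimiters, such as newlines or spaces.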

The UTF encoding of the Unicode Standard is backward compatible
with ASCII: programs presented only with ASCII work on Plan 9
even if not written to deal with UTF, as do programs that deal
with uninterpreted byte streams. However, programs that perform
semantic processing on ASCII graphic characters must convert
from UTF to runes in order to work properly with non-ASCII input.
See <a href="../man3/rune.html"><i>rune</i>(3)</a>.
<table border=0 cellpadding=0 cellspacing=0><tr height=5><td></table>

With numbers written in binary, a rune x is converted to a multibyte
UTF sequence as follows:
<table border=0 cellpadding=0 cellspacing=0><tr height=5><td></table>

01. x in [00000000.0bbbbbbb] → 0bbbbbbb<br>
10. x in [00000bbb.bbbbbbbb] → 110bbbbb, 10bbbbbb<br>
11. x in [bbbbbbbb.bbbbbbbb] → 1110bbbb, 10bbbbbb, 10bbbbbb<br>

<table border=0 cellpadding=0 cellspacing=0><tr height=5><td></table>

Conversion 01 provides a one-byte sequence that spans the ASCII
character set in a compatible way. Conversions 10 and 11 represent
higher-valued characters as sequences of two or three bytes with
the high bit set. Plan 9 does not support the 4-, 5-, and 6-byte
sequences proposed by X/Open. When there are
multiple ways to encode a value, for example rune 0, the shortest
encoding is used.
<table border=0 cellpadding=0 cellspacing=0><tr height=5><td></table>
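
The three conversions can be sketched in portable C as follows. This is an illustration under stated assumptions, not the code behind rune(3): the name utf_encode and the use of uint16_t for a 16-bit rune are hypothetical choices made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the rune(3) interface): encode one 16-bit
 * rune x into buf, which must hold at least 3 bytes. Returns the
 * number of bytes written. Testing the ranges in increasing order
 * always selects the shortest encoding.
 */
int
utf_encode(unsigned char *buf, uint16_t x)
{
	assert(buf != NULL);
	if (x <= 0x7F) {		/* conversion 01: 7 bits, one byte */
		buf[0] = (unsigned char)x;
		return 1;
	}
	if (x <= 0x7FF) {		/* conversion 10: 11 bits, two bytes */
		buf[0] = (unsigned char)(0xC0 | (x >> 6));
		buf[1] = (unsigned char)(0x80 | (x & 0x3F));
		return 2;
	}
	/* conversion 11: 16 bits, three bytes */
	buf[0] = (unsigned char)(0xE0 | (x >> 12));
	buf[1] = (unsigned char)(0x80 | ((x >> 6) & 0x3F));
	buf[2] = (unsigned char)(0x80 | (x & 0x3F));
	return 3;
}
```

For example, rune hexadecimal 00E9 falls under conversion 10 and yields the bytes C3 A9, while rune 20AC falls under conversion 11 and yields E2 82 AC.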

In the inverse mapping, any sequence except those described above
is incorrect and is converted to rune hexadecimal 0080.<br>
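
A minimal sketch of the inverse mapping in portable C is shown here. The names utf_decode and RUNE_ERROR are hypothetical, and two behaviors are assumptions rather than statements from this page: a malformed sequence consumes exactly one byte, and a sequence longer than the shortest encoding of its value is treated as incorrect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { RUNE_ERROR = 0x80 };	/* error rune named in the text; constant name is assumed */

/*
 * Hypothetical helper (not the rune(3) interface): decode one rune
 * from the n bytes at s (n >= 1) into *r and return the number of
 * bytes consumed. Incorrect input yields RUNE_ERROR and, by
 * assumption, consumes one byte.
 */
int
utf_decode(uint16_t *r, const unsigned char *s, size_t n)
{
	unsigned c, x;

	assert(r != NULL && s != NULL && n > 0);
	c = s[0];
	if (c < 0x80) {			/* 0bbbbbbb */
		*r = (uint16_t)c;
		return 1;
	}
	if ((c & 0xE0) == 0xC0 && n >= 2 && (s[1] & 0xC0) == 0x80) {
		/* 110bbbbb, 10bbbbbb */
		x = ((c & 0x1F) << 6) | (s[1] & 0x3Fu);
		if (x >= 0x80) {	/* reject non-shortest forms */
			*r = (uint16_t)x;
			return 2;
		}
	} else if ((c & 0xF0) == 0xE0 && n >= 3 &&
	    (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
		/* 1110bbbb, 10bbbbbb, 10bbbbbb */
		x = ((c & 0x0F) << 12) | ((s[1] & 0x3Fu) << 6) | (s[2] & 0x3Fu);
		if (x >= 0x800) {	/* reject non-shortest forms */
			*r = (uint16_t)x;
			return 3;
		}
	}
	*r = RUNE_ERROR;
	return 1;
}
```

Consuming a single byte on error lets a caller resynchronize at the next byte, so one damaged sequence does not hide the well-formed text that follows it.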

</table>
<p><font size=+1><b>SEE ALSO </b></font><br>

<table border=0 cellpadding=0 cellspacing=0><tr height=2><td><tr><td width=20><td>

<a href="../man1/ascii.html"><i>ascii</i>(1)</a>, <a href="../man1/tcs.html"><i>tcs</i>(1)</a>, <a href="../man3/rune.html"><i>rune</i>(3)</a>, <i>The Unicode Standard</i>.<br>

</table>

<td width=20>
<tr height=20><td>
</table>
<!-- TRAILER -->
<table border=0 cellpadding=0 cellspacing=0 width=100%>
<tr height=15><td width=10><td><td width=10>
<tr><td><td>
<center>
<a href="../../"><img src="../../dist/spaceglenda100.png" alt="Space Glenda" border=1></a>
</center>
</table>
<!-- TRAILER -->
</body></html>