kwabenasapong

Untitled

Apr 20th, 2023
ASCII characters (code points 0-127, U+0000 to U+007F) are represented using a single byte with the same binary value as the ASCII code, so the high bit is always 0.

Code points 128-2047 (U+0080 to U+07FF) are represented using two bytes. The first byte starts with the bits 110, followed by the 5 most significant bits of the 11-bit code point value. The second byte starts with the bits 10, followed by the remaining 6 bits.

Code points 2048-65535 (U+0800 to U+FFFF) are represented using three bytes. The first byte starts with the bits 1110, followed by the 4 most significant bits of the 16-bit code point value. The second and third bytes each start with the bits 10 and carry 6 of the remaining 12 bits, most significant first. The surrogate range 55296-57343 (U+D800 to U+DFFF) is excluded and must not be encoded.

Code points 65536-1114111 (U+10000 to U+10FFFF) are represented using four bytes. The first byte starts with the bits 11110, followed by the 3 most significant bits of the 21-bit code point value. The second, third, and fourth bytes each start with the bits 10 and carry 6 of the remaining 18 bits, most significant first.

Byte sequences that do not conform to any of the above rules are invalid and must not appear in UTF-8 encoded strings. This includes a continuation byte (10xxxxxx) with no lead byte before it, a lead byte followed by too few continuation bytes, and an overlong encoding that uses more bytes than the code point requires.
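
The rules above can be sketched in Python as follows. This is a minimal illustration, not a replacement for the built-in str.encode; the names encode_code_point and sequence_length are hypothetical helpers made up for this example. The second function only inspects the bit prefix of a lead byte, so it does not catch every invalid sequence (overlong forms, for instance, need a further check on the decoded value).

```python
def encode_code_point(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 (hypothetical helper)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if cp < 0x80:
        # One byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (cp >> 6),
                      0b10000000 | (cp & 0x3F)])
    if cp < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (cp >> 12),
                      0b10000000 | ((cp >> 6) & 0x3F),
                      0b10000000 | (cp & 0x3F)])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0b11110000 | (cp >> 18),
                  0b10000000 | ((cp >> 12) & 0x3F),
                  0b10000000 | ((cp >> 6) & 0x3F),
                  0b10000000 | (cp & 0x3F)])


def sequence_length(lead: int):
    """Return the sequence length implied by a lead byte's bit prefix,
    or None if the byte cannot start a sequence (hypothetical helper)."""
    if (lead & 0b10000000) == 0b00000000:
        return 1
    if (lead & 0b11100000) == 0b11000000:
        return 2
    if (lead & 0b11110000) == 0b11100000:
        return 3
    if (lead & 0b11111000) == 0b11110000:
        return 4
    # 10xxxxxx is a continuation byte; 11111xxx is never used.
    return None
```

For example, encode_code_point(0x20AC) (the euro sign) falls in the three-byte range and yields the bytes E2 82 AC, while sequence_length(0x80) returns None because a continuation byte cannot start a sequence.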