IntegerSerializer format documentation

Jul 16th, 2015
  1. The data consists of a sequence of blocks, representing the contents of the matrix.
  2. The blocks come in three kinds: single blocks, run blocks, and end blocks. The data consists of an arbitrary sequence of zero or more single and run blocks, in any order, followed by a single end block indicating the end of the data. The blocks are placed one after the other, with no separators.
  3. All the blocks contain a sequence of numbers, whose meaning is defined by the block itself. All of the numbers are in the same format, to be described below, and they all represent 64-bit signed integers. In addition, each number has a "kind" bit, which has a specific meaning according to the block.
  4. The first number of a block determines the type of block it is: if the first number has the kind bit set, it is a run block; and if it has the kind bit clear, it is a single block. However, if the first number is a 0 and has the kind bit clear, it is an end block.
  6. Numbers:
  7. All numbers represent 64-bit integers, but they are encoded in as few bytes as possible to save space. Numbers might take up any number of bytes from one to nine, other than eight. (In the following description, bits are numbered in right-to-left order starting from 0, so that bit 0 is the least significant bit, and, in a single byte, bit 7 is the most significant bit.)
  8. In the first seven bytes, bit 7 is a continuation bit, which indicates whether the value continues in the next byte: if this bit is set, the number continues in the following byte; if it is clear, the number ends in that byte. This allows the length of the number to be determined. (If the continuation bit of the seventh byte is set, the number must have nine bytes, since eight bytes are impossible. The last two bytes have no continuation bit.)
  9. Bit 6 of the first byte is the kind bit; its meaning depends on where the number appears (block type for the first number of a block, absolute or relative for a coordinate), as described in this specification. Bit 5 of the first byte is the sign bit: if this bit is set, a one's complement (i.e., bitwise NOT) must be applied to the encoded value in order to obtain the actual value. Encoded values are 63-bit unsigned: the sign bit provides the 64th bit and the possibility of encoding negative numbers.
  10. The remaining bits of all bytes (5 for the first byte, 7 for the following six bytes, and 8 for the last two) are data bits, in which the number is encoded. The encoding is little-endian: bit 0 of the first byte is bit 0 of the encoded number, bit 0 of the second byte (if it exists) is bit 5 of the encoded number, bit 0 of the third byte (if it exists, again) is bit 12 of the encoded number, and so on.
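As an illustration of the number format described above, here is a minimal Python sketch of a decoder. The function name `decode_number` and its return shape are assumptions made for this example and are not part of the format; truncated input is not handled gracefully and simply raises an IndexError.

```python
def decode_number(data, pos=0):
    """Decode one number starting at data[pos].

    Returns (value, kind, next_pos), where value is a signed integer,
    kind is the kind bit, and next_pos is the offset just past the number.
    """
    first = data[pos]
    pos += 1
    more = bool(first & 0x80)      # bit 7: continuation
    kind = bool(first & 0x40)      # bit 6: kind bit (first byte only)
    negative = bool(first & 0x20)  # bit 5: sign bit (first byte only)
    encoded = first & 0x1F         # bits 0-4: lowest five data bits
    shift = 5
    count = 1
    # Bytes two to seven carry seven data bits plus a continuation bit.
    while more and count < 7:
        byte = data[pos]
        pos += 1
        count += 1
        encoded |= (byte & 0x7F) << shift
        shift += 7
        more = bool(byte & 0x80)
    if more:
        # The seventh byte asked for more: exactly two bytes follow,
        # each with eight data bits and no continuation bit.
        encoded |= data[pos] << shift
        encoded |= data[pos + 1] << (shift + 8)
        pos += 2
    # The sign bit requests a one's complement of the 63-bit encoded value.
    value = ~encoded if negative else encoded
    return value, kind, pos


print(decode_number(bytes([0x92, 0x01])))        # (50, False, 2)
print(decode_number(bytes([0xF2, 0xD4, 0x03])))  # (-14995, True, 3)
```

Returning the next offset lets a caller read consecutive numbers without knowing their lengths in advance, which matters because blocks are stored with no separators.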
  12. Coordinates:
  13. All blocks (other than the end block) contain coordinates, which represent positions in the data matrix. Coordinates are specified as two numbers, the first representing the X coordinate and the second representing the Y coordinate. For each coordinate, the encoded number may be absolute (i.e., the coordinate itself, without any transformation applied) or relative (i.e., the difference between the current coordinate and the previous coordinate on the same axis). If the kind bit is set, the coordinate is relative; if it is clear, the coordinate is absolute. For the very first coordinates in the file, where no previous coordinates exist, the "previous coordinate" is taken to be 64 on both axes.
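The absolute/relative rule above can be sketched in Python as follows. The names `resolve_pair` and `INITIAL_PREVIOUS` are hypothetical, and the inputs are assumed to be already-decoded (value, kind) tuples. The sketch also assumes that "previous coordinate" means the most recently resolved coordinate on that axis, even across blocks, which is how the example data set later in this document behaves. No 64-bit wraparound is applied, since the text does not say how overflow should be treated.

```python
INITIAL_PREVIOUS = 64  # stated default before any coordinates have been read


def resolve_pair(x_number, y_number,
                 previous=(INITIAL_PREVIOUS, INITIAL_PREVIOUS)):
    """Turn two decoded numbers into an absolute (x, y) position.

    Each number is a (value, kind) tuple; kind=True means "relative".
    """
    resolved = []
    for (value, relative), prev in zip((x_number, y_number), previous):
        # Relative numbers are offsets from the previous coordinate on
        # the same axis; absolute numbers are used unchanged.
        resolved.append(prev + value if relative else value)
    return tuple(resolved)


first = resolve_pair((1, True), (-3, True))              # both relative to 64
mixed = resolve_pair((5, False), (67, True), (52, -55))  # X absolute, Y relative
print(first, mixed)  # (65, 61) (5, 12)
```

Each axis is resolved independently, so a pair may mix an absolute X with a relative Y, or the other way round.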
  15. Blocks:
  17. Single blocks:
  18. These blocks represent a single point in the data matrix, and are indicated by having the kind bit clear in their first number. They contain three numbers, the first one representing the value to introduce in the data matrix, and the remaining two being the coordinates where that value should be inserted.
  20. Run blocks:
  21. These blocks represent a group of points in the data matrix that contain the same value, and may contain any odd number of numbers other than one. They are indicated by having the kind bit set in their first number. The first number indicates the value to introduce in the data matrix for each coordinate pair in the block, and the following numbers, in pairs, represent coordinates; each pair of numbers represents one set of coordinates at which the value stated initially must be inserted. The coordinates are terminated by two 0x40 bytes, each representing a value of 0 with the kind bit set; these terminators should not be interpreted as coordinates.
  23. End block:
  24. This block represents the end of the data. It is used to allow for other types of data to be concatenated after the end. The end block is a single 0x00 byte, representing a number 0 with the kind bit clear.
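For illustration, the three block kinds above could be written out with a Python sketch like the one below. All names here (`encode_number`, `single_block`, `run_block`, `END_BLOCK`) are hypothetical. The sketch always emits absolute coordinates, which is a simplification chosen for this example: it also avoids producing a relative +0, +0 pair, whose shortest encoding would be the same two 0x40 bytes as the run terminator.

```python
def encode_number(value, kind=False):
    """Encode a signed 64-bit integer in as few bytes as the format allows."""
    if not -2**63 <= value < 2**63:
        raise ValueError("value does not fit in 64 signed bits")
    negative = value < 0
    encoded = ~value if negative else value  # 63-bit unsigned payload
    out = [(encoded & 0x1F) | (0x40 if kind else 0) | (0x20 if negative else 0)]
    encoded >>= 5
    # Bytes two to seven: seven data bits each, continuation in bit 7.
    while encoded and len(out) < 7:
        out[-1] |= 0x80
        out.append(encoded & 0x7F)
        encoded >>= 7
    if encoded:
        # More than 47 bits: nine-byte form with two full trailing bytes.
        out[-1] |= 0x80
        out += [encoded & 0xFF, (encoded >> 8) & 0xFF]
    return bytes(out)


def single_block(value, x, y):
    """Value with the kind bit clear, followed by absolute X and Y."""
    if value == 0:
        raise ValueError("a leading 0 with the kind bit clear is the end block")
    return encode_number(value) + encode_number(x) + encode_number(y)


def run_block(value, coords):
    """Value with the kind bit set, absolute pairs, then the terminator."""
    body = b"".join(encode_number(x) + encode_number(y) for x, y in coords)
    return encode_number(value, kind=True) + body + b"\x40\x40"


END_BLOCK = b"\x00"

stream = single_block(5, 3, -3) + run_block(2, [(50, -53)]) + END_BLOCK
print(stream.hex(" "))  # 05 03 22 42 92 01 b4 01 40 40 00
```

The check in `single_block` reflects a consequence of the block-type rule: a value of 0 cannot be stored in a single block, because that byte sequence would be read as the end of the data.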
  27. Example data set:
  28. 0000: 03 41 62 05 03 22 42 92 - 01 B4 01 41 61 41 40 40
  29. 0010: 40 F2 D4 03 05 C3 02 40 - 40 01 00 14 00
  31. Interpretation:
  32. 03 41 62: Single block, coordinates are relative +1, -3, value is 3. Since the "previous coordinates" default to 64 for the first block, this represents position (65, 61).
  33. 05 03 22: Single block, coordinates are absolute 3, -3, value is 5. Absolute coordinates mean that this is just (3, -3).
  34. 42: Beginning of a run block, value is 2:
  35. 92 01 B4 01: coordinates are absolute, (50, -53)
  36. 41 61: coordinates are relative +1, -2; so (51, -55)
  37. 41 40: coordinates are relative +1, +0; so (52, -55)
  38. 40 40: end of run block
  39. F2 D4 03: Beginning of another run block, value is -14995:
  40. 05 C3 02: coordinates are mixed absolute/relative: X is absolute 5, Y is relative +67; so (5, 12)
  41. 40 40: end of run block
  42. 01 00 14: Single block, coordinates are absolute, (0, 20), value is 1
  43. 00: End block
  45. Therefore, the data would be:
  46. [65, 61] = 3
  47. [3, -3] = 5
  48. [50, -53] = 2
  49. [51, -55] = 2
  50. [52, -55] = 2
  51. [5, 12] = -14995
  52. [0, 20] = 1
  53. every other coordinate = 0
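The interpretation above can be checked mechanically. The following Python sketch decodes the example bytes into a dictionary of positions; the names `decode_number`, `decode_matrix` and `EXAMPLE` are made up for this illustration. It assumes that a run block ends at the first pair boundary where the next two bytes are both 0x40, and that the "previous coordinates" carry over from one block to the next.

```python
def decode_number(data, pos):
    """Decode one number at data[pos]; return (value, kind, next_pos)."""
    first = data[pos]
    pos += 1
    encoded, shift, count = first & 0x1F, 5, 1
    more = bool(first & 0x80)
    while more and count < 7:
        byte = data[pos]
        pos += 1
        count += 1
        encoded |= (byte & 0x7F) << shift
        shift += 7
        more = bool(byte & 0x80)
    if more:  # nine-byte form: two trailing bytes with 8 data bits each
        encoded |= (data[pos] | (data[pos + 1] << 8)) << shift
        pos += 2
    value = ~encoded if first & 0x20 else encoded
    return value, bool(first & 0x40), pos


def decode_matrix(data):
    """Decode a block stream into {(x, y): value}; stops at the end block."""
    points = {}
    prev = [64, 64]  # initial "previous coordinates"
    pos = 0

    def read_pair(pos):
        # Resolve X then Y, updating the running previous coordinates.
        for axis in (0, 1):
            number, relative, pos = decode_number(data, pos)
            prev[axis] = prev[axis] + number if relative else number
        return (prev[0], prev[1]), pos

    while True:
        value, kind, pos = decode_number(data, pos)
        if kind:  # run block: pairs until the 0x40 0x40 terminator
            while data[pos:pos + 2] != b"\x40\x40":
                xy, pos = read_pair(pos)
                points[xy] = value
            pos += 2
        elif value == 0:  # end block
            return points
        else:  # single block
            xy, pos = read_pair(pos)
            points[xy] = value


EXAMPLE = bytes.fromhex(
    "03 41 62 05 03 22 42 92 01 B4 01 41 61 41 40 40"
    "40 F2 D4 03 05 C3 02 40 40 01 00 14 00"
)
for position, value in decode_matrix(EXAMPLE).items():
    print(list(position), "=", value)
```

Positions that never appear in the returned dictionary correspond to the "every other coordinate = 0" line above.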