path: root/docs/PORTABLE_STORAGE.md
author: Jeffrey <jeffryan@tamu.edu> 2022-04-23 14:15:28 -0500
committer: Jeffrey Ryan <jeffreyryan@tutanota.com> 2022-05-04 11:08:48 -0500
commit: fc9b77d855bf1014b6278dfc42853e8e19f3e832 (patch)
tree: b1f15ee37afad5d9d5e90c6bd08a858df6f36077 /docs/PORTABLE_STORAGE.md
parent: Docs: Add documentation for EPEE Portable Storage (diff)
download: monero-fc9b77d855bf1014b6278dfc42853e8e19f3e832.tar.xz
Changes to PORTABLE_STORAGE.md
* More information about array entries (especially nesting)
* Varint encoding examples
* Expanded string and integer encoding information
Diffstat (limited to 'docs/PORTABLE_STORAGE.md')
-rw-r--r-- docs/PORTABLE_STORAGE.md | 112
1 files changed, 77 insertions, 35 deletions
diff --git a/docs/PORTABLE_STORAGE.md b/docs/PORTABLE_STORAGE.md
index 0b3643749..675ca818c 100644
--- a/docs/PORTABLE_STORAGE.md
+++ b/docs/PORTABLE_STORAGE.md
@@ -15,15 +15,19 @@ documentation. Unfortunately, whilst the rest of the library is fairly
straightforward to decipher, the Portable Storage is less-so. Hence this
document.
-## Preliminaries
+## String and Integer Encoding
-### String and integer encoding
+### Integers
-#### varint
+With few exceptions, integers serialized in epee portable storage format are serialized
+as little-endian.
-Varints are used to pack integers in an portable and space optimized way. The
-lowest 2 bits store the amount of bytes required, which means the largest value
-integer that can be packed into 1 byte is 63 (6 bits).
+### Varints
+
+Varints are used to pack integers in a portable and space-optimized way.
+Varints are stored as little-endian integers, with the lowest 2 bits encoding
+the number of bytes required (see the table below), which means the largest
+integer that can be packed into 1 byte is 63 (6 bits).
+
+#### Byte Sizes
| Lowest 2 bits | Size value | Value range |
|---------------|---------------|-----------------------------------|
@@ -32,20 +36,47 @@ integer that can be packed into 1 byte is 63 (6 bits).
| b10 | 4 bytes | 16384 to 1073741823 |
| b11 | 8 bytes | 1073741824 to 4611686018427387903 |
-#### string
+#### Representations of Example Values
+| Value | Byte Representation (hex) |
+|----------------------|---------------------------|
+| 0 | 00 |
+| 7 | 1c |
+| 101 | 95 01 |
+| 17,000 | A2 09 01 00 |
+| 7,942,319,744 | 03 BA 98 65 07 00 00 00 |
+
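The size table and example values above can be reproduced with a short Python sketch. The helper names `encode_varint` and `decode_varint` are hypothetical (they are not part of epee); the sketch assumes the size marker occupies the lowest 2 bits and the value is shifted left by 2 before being written little-endian, which is what the example rows imply.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a portable storage varint (sketch)."""
    if value < 0:
        raise ValueError("varints cannot encode negative values")
    # Size marker 0..3 selects a width of 1, 2, 4 or 8 bytes.
    for mark, size in enumerate((1, 2, 4, 8)):
        # Two bits of the encoded integer are taken by the size marker.
        if value < 1 << (8 * size - 2):
            return ((value << 2) | mark).to_bytes(size, "little")
    raise ValueError("value too large for an 8-byte varint")


def decode_varint(data: bytes) -> tuple:
    """Return (value, bytes_consumed) for the varint at the start of data."""
    size = 1 << (data[0] & 0x03)  # marker 0..3 -> 1, 2, 4, 8 bytes
    if len(data) < size:
        raise ValueError("truncated varint")
    return int.from_bytes(data[:size], "little") >> 2, size


print(encode_varint(101).hex(" "))
```

The decoder only needs the first byte to know how many bytes to read, which is why the marker lives in the lowest bits of a little-endian integer.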
+### Strings
+
+These are simply length (varint) prefixed char strings without a null
+terminator (though one can always add one if desired). There is no
+specific encoding enforced, and in fact, many times binary blobs are
+stored as these strings. This type should not be confused with the keys
+in sections, as those are restricted to a maximum length of 255 and
+do not use varints to encode the length.
-These are simply length (varint) prefixed char strings.
+ "Howdy" => 14 48 6F 77 64 79
-## Packet format
+### Section Keys
+
+These are similar to strings except that they are length limited to 255
+bytes, and use a single byte at the front of the string to describe the
+length (as opposed to a varint).
+
+ "Howdy" => 05 48 6F 77 64 79
+
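To make the difference between strings and section keys concrete, here is a minimal Python sketch of both encodings. The function names are hypothetical, and the UTF-8 encoding of the key is an assumption made for illustration; the format itself does not enforce an encoding.

```python
def encode_varint(value: int) -> bytes:
    """Varint helper (sketch): size marker in the lowest 2 bits, little-endian."""
    for mark, size in enumerate((1, 2, 4, 8)):
        if 0 <= value < 1 << (8 * size - 2):
            return ((value << 2) | mark).to_bytes(size, "little")
    raise ValueError("value out of range for a varint")


def encode_string(data: bytes) -> bytes:
    """Strings: varint length prefix, raw bytes, no null terminator."""
    return encode_varint(len(data)) + data


def encode_section_key(name: str) -> bytes:
    """Section keys: a single length byte, so at most 255 bytes of name."""
    raw = name.encode("utf-8")  # assumption: keys are plain ASCII/UTF-8 text
    if len(raw) > 255:
        raise ValueError("section keys are limited to 255 bytes")
    return bytes([len(raw)]) + raw


print(encode_string(b"Howdy").hex(" "))
print(encode_section_key("Howdy").hex(" "))
```

The same five characters produce a different first byte in each case: the string prefix is the length shifted left by 2 bits, while the key prefix is the length itself.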
+## Binary Format Specification
### Header
-A packet starts with a header:
+The format must always start with the following header:
+
+| Field | Type | Value |
+|------------------|----------|------------|
+| Signature Part A | UInt32 | 0x01011101 |
+| Signature Part B | UInt32 | 0x01020101 |
+| Version | UInt8 | 0x01 |
-| Header | Type | Value |
-|---------------|-----------|-----------------------|
-| Signature | 8 bytes | 0x0111010101010201| |
-| Version | byte | 0x01 |
+In total, the 9-byte header will look like this (in hex): `01 11 01 01 01 01 02 01 01`
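As a quick check of the byte layout, the header can be built in Python with `struct`. This is only a sketch: the constant and function names are hypothetical, and it assumes both signature parts are written little-endian like other integers in the format.

```python
import struct

SIGNATURE_A = 0x01011101  # Signature Part A
SIGNATURE_B = 0x01020101  # Signature Part B
FORMAT_VERSION = 0x01


def build_header() -> bytes:
    # "<" selects little-endian with no padding: two UInt32 plus one UInt8.
    return struct.pack("<IIB", SIGNATURE_A, SIGNATURE_B, FORMAT_VERSION)


def has_valid_header(blob: bytes) -> bool:
    """Check that a blob starts with the expected 9-byte header."""
    return blob[:9] == build_header()


print(build_header().hex(" "))
```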
### Section
@@ -63,18 +94,12 @@ Which is followed by the section's name-value [entries](#Entry) sequentially:
| Entry | Type |
|-------------------|-----------------------|
-| Name | string<sup>1</sup> |
+| Name | section key |
| Type | byte |
-| Count<sup>2</sup> | varint |
+| Count<sup>1</sup> | varint |
| Value(s) | (type dependent data) |
-
-<sup>1</sup> Note, the string used for the entry name is not prefixed with a
-varint, it is prefixed with a single byte to specify the length of the name.
-This means an entry name cannot be more that 255 chars, which seems a reasonable
-restriction.
-
-<sup>2</sup> Note, this is only present if the entry type has the array flag
+<sup>1</sup> Note, this is only present if the entry type has the array flag
(see below).
#### Entry types
@@ -90,7 +115,7 @@ The types defined are:
#define SERIALIZE_TYPE_UINT32 6
#define SERIALIZE_TYPE_UINT16 7
#define SERIALIZE_TYPE_UINT8 8
-#define SERIALIZE_TYPE_DUOBLE 9
+#define SERIALIZE_TYPE_DOUBLE 9
#define SERIALIZE_TYPE_STRING 10
#define SERIALIZE_TYPE_BOOL 11
#define SERIALIZE_TYPE_OBJECT 12
@@ -103,11 +128,14 @@ The entry type can be bitwise OR'ed with a flag:
#define SERIALIZE_FLAG_ARRAY 0x80
```
-This signals there are multiple *values* for the entry. When we are dealing with
-an array, the next value is a varint specifying the array length followed by
-the array item values. For example:
+This signals there are multiple *values* for the entry. Since only one bit is
+reserved for specifying an array, we cannot directly represent nested arrays.
+However, you can place each of the inner arrays inside of a section, and make
+the outer array type `SERIALIZE_TYPE_OBJECT | SERIALIZE_FLAG_ARRAY`.
+Immediately following the type code byte is a varint specifying the length of
+the array. Finally, all the elements are serialized in sequence with no padding
+and without any type information. For example:
-<p style="padding-left:1em; font:italic larger serif">name, type, count,
+<p style="padding-left:1em; font:italic larger serif">type, count,
value<sub>1</sub>, value<sub>2</sub>,..., value<sub>n</sub></p>
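The array layout can be illustrated with a small Python sketch that serializes a complete entry holding an array of booleans. The function names are hypothetical, and the sketch assumes each boolean is stored as a single byte (`01` or `00`), which is not stated in this section.

```python
SERIALIZE_TYPE_BOOL = 11
SERIALIZE_FLAG_ARRAY = 0x80


def encode_varint(value: int) -> bytes:
    """Varint helper (sketch): size marker in the lowest 2 bits, little-endian."""
    for mark, size in enumerate((1, 2, 4, 8)):
        if 0 <= value < 1 << (8 * size - 2):
            return ((value << 2) | mark).to_bytes(size, "little")
    raise ValueError("value out of range for a varint")


def encode_bool_array_entry(name: str, values: list) -> bytes:
    key = name.encode("utf-8")
    if len(key) > 255:
        raise ValueError("section keys are limited to 255 bytes")
    out = bytes([len(key)]) + key                               # name (section key)
    out += bytes([SERIALIZE_TYPE_BOOL | SERIALIZE_FLAG_ARRAY])  # type byte with array flag
    out += encode_varint(len(values))                           # count
    out += bytes(1 if v else 0 for v in values)                 # values, assumed 1 byte each
    return out


print(encode_bool_array_entry("array_of_bools", [True, False, True, True]).hex(" "))
```

Note that the type byte appears once for the whole array; the individual elements carry no type information of their own.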
#### Entry values
@@ -123,17 +151,31 @@ Note, I have not yet seen the type `SERIALIZE_TYPE_ARRAY` in use. My assumption
is this would be used for *untyped* arrays and so subsequent entries could be of
any type.
-## Monero specifics
-
-### Entry values
+### Overall example
+
+Let's put it all together and see what an entire object would look like serialized. To represent our data, let's create a JSON object (since it's a format
+that most will be familiar with):
+
+```json
+{
+ "short_quote": "Give me liberty or give me death!",
+ "long_quote": "Monero is more than just a technology. It's also what the technology stands for.",
+ "signed_32bit_int": 20140418,
+ "array_of_bools": [true, false, true, true],
+ "nested_section": {
+ "double": -6.9,
+ "unsigned_64bit_int": 11111111111111111111
+ }
+}
+```
-#### Strings
+This would translate to:
-These are prefixed with a varint to specify the string length.
+![Epee binary storage format example](/docs/images/storage_binary_example.png)
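For readers who cannot view the image, the two string entries from the JSON object can be serialized with the following Python sketch. It covers only individual entries (not the section header or the other value types), and the function names are hypothetical.

```python
SERIALIZE_TYPE_STRING = 10


def encode_varint(value: int) -> bytes:
    """Varint helper (sketch): size marker in the lowest 2 bits, little-endian."""
    for mark, size in enumerate((1, 2, 4, 8)):
        if 0 <= value < 1 << (8 * size - 2):
            return ((value << 2) | mark).to_bytes(size, "little")
    raise ValueError("value out of range for a varint")


def encode_string_entry(name: str, value: bytes) -> bytes:
    key = name.encode("utf-8")
    if len(key) > 255:
        raise ValueError("section keys are limited to 255 bytes")
    return (
        bytes([len(key)]) + key             # name: single length byte, then the key
        + bytes([SERIALIZE_TYPE_STRING])    # type byte, no array flag
        + encode_varint(len(value)) + value  # value: varint length, then raw bytes
    )


short_quote = b"Give me liberty or give me death!"
long_quote = (b"Monero is more than just a technology. "
              b"It's also what the technology stands for.")

print(encode_string_entry("short_quote", short_quote).hex(" "))
print(encode_string_entry("long_quote", long_quote).hex(" "))
```

The short quote fits in 63 bytes, so its length prefix is a 1-byte varint; the long quote does not, so its prefix grows to 2 bytes.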
-#### Integers
+## Monero specifics
-These are stored little endian byte order.
+### Entry values
#### Hashes, Keys, Blobs