Primary Keys
The primary key is always the first column of any container.
To get started, download and install the database using the command below:
curl -fsSL https://tytodb.pages.dev/installer.sh | bash
To use it, write commands, queries, or transactions through a connection handler such as the TytoDB-Rust-Client. To configure the database or manage the program files manually, look in ~/TytoDB, since the database uses $HOME/TytoDB to store its files.
The database is written in Rust and C, which makes it portable across processors and systems. However, the program relies on io_uring, which means the Linux kernel must be version 5.1 or newer. If you are on an ARM device or want to compile the program yourself, be aware that the C part of the code (io.c) uses a global #include that may need to be available or adjusted for your system.
TytoDB is a high-performance, non-relational column-row database designed with a strong emphasis on ACID compliance. It excels in equality searches involving the primary key and is optimized for speed in database scans, as well as in insert and delete operations.
In TytoDB, data is stored in containers, which are the files that hold the database's data on disk. The disk layout of a container involves three key components: the metadata layout, the row structure, and tombstones.
Container files store metadata at the start of the file, organized as a tuple of two arrays: a string array holding the column names and a byte array holding the column type IDs.
When reading the metadata, the two arrays are interpreted by "matching" them: the first column takes element zero of the string array as its name and element zero of the byte array as its type, and so on for the remaining columns.
Rows are stored right after the metadata as fixed-size elements (the size is determined by the column types). They are laid out as a sequential list (akin to a Vec or array), which means reads from disk can be performed using a relative pointer (the index into that pseudo-array).
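Because every row occupies the same number of bytes, locating a row is simple pointer arithmetic. The sketch below illustrates the idea under assumed names (`metadata_len`, `row_size`, `row_offset`); it is not TytoDB's actual code.

```rust
/// Hypothetical sketch: because every row of a container has the same
/// on-disk size, the byte offset of row `index` can be computed directly.
fn row_offset(metadata_len: u64, row_size: u64, index: u64) -> u64 {
    // Rows start right after the metadata block and are laid out
    // back-to-back like elements of a Vec.
    metadata_len + index * row_size
}

fn main() {
    // e.g. the metadata occupies 256 bytes and each row is 120 bytes wide:
    assert_eq!(row_offset(256, 120, 0), 256); // first row
    assert_eq!(row_offset(256, 120, 3), 616); // fourth row
}
```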
After deleting a row, the space on disk is filled with "full" bytes (`0xFF`, also represented as `255u8`), marking that row as a tombstone. Afterwards, the pointer to that row is pushed into an in-memory `BTreeSet`, called the graveyard, so the space can be recycled later by having new data overwrite it. However, this does not guarantee that the space will be reused with every insert. When a row is inserted, it either gets the index at the end of the file if the graveyard is empty or pops a pointer from the structure. This pointer can be lost if the program rolls back or crashes. An operation called Vacuum is available to clean up the tombstones left behind.
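The recycling behaviour described above can be sketched roughly as follows; the `pick_slot` helper and its signature are hypothetical, not TytoDB's API, and which tombstone gets reused first is an assumption.

```rust
use std::collections::BTreeSet;

/// Illustrative sketch of tombstone recycling: pop a recycled slot from the
/// graveyard if one exists, otherwise append at the end of the file.
fn pick_slot(graveyard: &mut BTreeSet<u64>, row_count: &mut u64) -> u64 {
    if let Some(&slot) = graveyard.iter().next() {
        // Reuse a tombstoned index (the lowest one, in this sketch).
        graveyard.remove(&slot);
        slot
    } else {
        // No tombstone available: extend the pseudo-array.
        let slot = *row_count;
        *row_count += 1;
        slot
    }
}

fn main() {
    let mut graveyard: BTreeSet<u64> = BTreeSet::from([2, 7]);
    let mut row_count = 10u64;
    assert_eq!(pick_slot(&mut graveyard, &mut row_count), 2);  // recycled
    assert_eq!(pick_slot(&mut graveyard, &mut row_count), 7);  // recycled
    assert_eq!(pick_slot(&mut graveyard, &mut row_count), 10); // appended
}
```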
Regarding data types, "Alba types" is the name given to all the individual types a column can have in the database (Alba was the name of the query language the database had in its first versions). The table below shows all the types available within the program and their properties.
| AlbaType ID | Type Name | Max Content Size | Disk Size | Description |
|---|---|---|---|---|
| 0 | NONE | – | 0 bytes | Absence of value (used only in metadata) |
| 1 | CHAR | 1 Unicode scalar (4 bytes) | 4 bytes | Rust `char` (UTF-32) |
| 2 | INT | 32-bit signed integer | 4 bytes | Rust `i32` |
| 3 | BIGINT | 64-bit signed integer | 8 bytes | Rust `i64` |
| 4 | BOOL | Boolean | 1 byte | Rust `bool` |
| 5 | FLOAT | 64-bit IEEE-754 | 8 bytes | Rust `f64` |
| 6 | TEXT | up to `MAX_STR_LEN` | dynamic | Rust `String`, variable-length (deprecated) |
| 7 | NANO-STRING | up to 10 bytes | 10 + 8 = 18 bytes | Small fixed-max string with 8-byte length prefix |
| 8 | SMALL-STRING | up to 100 bytes | 100 + 8 = 108 bytes | Fixed-max small string with length prefix |
| 9 | MEDIUM-STRING | up to 500 bytes | 500 + 8 = 508 bytes | Fixed-max medium string with length prefix |
| 10 | BIG-STRING | up to 2,000 bytes | 2,000 + 8 = 2,008 bytes | Fixed-max large string with length prefix |
| 11 | LARGE-STRING | up to 3,000 bytes | 3,000 + 8 = 3,008 bytes | Fixed-max very large string with length prefix |
| 12 | NANO-BYTES | up to 10 bytes | 10 + 8 = 18 bytes | Small fixed-max blob with length prefix |
| 13 | SMALL-BYTES | up to 1,000 bytes | 1,000 + 8 = 1,008 bytes | Fixed-max small blob with length prefix |
| 14 | MEDIUM-BYTES | up to 10,000 bytes | 10,000 + 8 = 10,008 bytes | Fixed-max medium blob with length prefix |
| 15 | BIG-BYTES | up to 100,000 bytes | 100,000 + 8 = 100,008 bytes | Fixed-max large blob with length prefix |
| 16 | LARGE-BYTES | up to 1,000,000 bytes | 1,000,000 + 8 = 1,000,008 bytes | Fixed-max very large blob with length prefix |
TytoDB performs its indexing with an on-disk hashmap, which stores two-item `u64` tuples: the first 64-bit unsigned integer is the primary key (hashed using the `ahash` algorithm), and the second is the pointer to the row in the container. Indexing is used in most operations, such as insert, delete, edit, and search. During an insert, it guarantees the uniqueness of a primary key (which is crucial for the indexing to keep working); for edit and delete operations, it can sometimes (depending on the query plan) be used to determine which rows must be affected by the operation.
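For illustration, an index entry conceptually pairs the hashed primary key with a row pointer. The sketch below assumes the `ahash` crate (the algorithm named above) is available as a dependency; the `IndexEntry` struct and `hash_primary_key` function are hypothetical names, not TytoDB's source.

```rust
use ahash::AHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical shape of one index entry:
/// (hash of the primary key, pointer to the row in the container).
struct IndexEntry {
    key_hash: u64,
    row_ptr: u64,
}

/// Hash a primary-key value the way the index conceptually does.
fn hash_primary_key<T: Hash>(pk: &T) -> u64 {
    let mut hasher = AHasher::default();
    pk.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let entry = IndexEntry {
        key_hash: hash_primary_key(&42i32),
        row_ptr: 7, // row index 7 inside the container
    };
    println!("hash={:x} -> row {}", entry.key_hash, entry.row_ptr);
}
```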
Creates a container. It takes as arguments a tuple of two arrays (a string array of column names and a byte array of column types, used to generate the metadata) and a string that specifies the container name.
Creates a row in a container. It takes three arguments: an array of column names, an array of values, and the container name. The two arrays define the row layout; it is not necessary to provide values for every column when creating a row. Columns that are not mentioned are assigned their type's zero value. This makes the explicit definition of the primary key desirable, although not required.
Performs a search in the specified container and edits the rows it returns (the edits go into the MVCC). The operation has four parameters: an array of strings containing the names of the columns being edited, an array with the new values, the name of the container the operation runs on, and a condition set that filters the search so that only matching rows are edited.
Performs a search in the specified container and deletes the rows the search returns. It requires two arguments: the container name to specify where the operation will run, and the condition set for the search.
Deletes a container. It receives a container name as an argument.
Uses the `query_type` internal function to determine what sort of search must be done based on the condition set; the possible sorts are either scans or indexed searches. An indexed search uses the index hashmap to get a row, while a scan walks the dataset, returning the rows matching the conditions (all rows if no conditions are entered). The operation requires three arguments: a string array that determines which columns are fetched, a string (the container name) that determines the container the operation runs on, and the condition set used to filter the rows.
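The real query planner is internal to TytoDB, but the kind of decision `query_type` makes can be approximated as: an equality condition on the primary key permits an indexed lookup, anything else falls back to a scan. The types and the heuristic below are illustrative assumptions, not the actual implementation.

```rust
/// Illustrative condition: column name and comparison operator
/// (the comparison value is omitted for brevity).
#[allow(dead_code)]
enum Operator { Equal, Higher /* ... */ }
struct Condition {
    column: String,
    op: Operator,
}

enum QueryPlan { IndexedSearch, Scan }

/// Rough sketch of the decision: an equality condition on the primary-key
/// column lets the index hashmap be used; otherwise the dataset is scanned.
fn query_type(conditions: &[Condition], primary_key: &str) -> QueryPlan {
    let indexed = conditions
        .iter()
        .any(|c| c.column == primary_key && matches!(c.op, Operator::Equal));
    if indexed { QueryPlan::IndexedSearch } else { QueryPlan::Scan }
}

fn main() {
    let conds = vec![Condition { column: "id".into(), op: Operator::Equal }];
    assert!(matches!(query_type(&conds, "id"), QueryPlan::IndexedSearch));
    assert!(matches!(query_type(&[], "id"), QueryPlan::Scan));
}
```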
The Commit operation flushes data from the MVCC to the container file. It first iterates through the MVCC, gathering the data within it and detecting what must be written. Every element in the MVCC structure carries a flag that marks whether it is a deletion. If it is a deletion, the operation writes an array of "full" bytes (`0xFF` or `255u8`) with the same size as the original data into the file at the row location and removes the entries in the index file that share the deleted row's key. Non-deletion entries are either insertions or edits (the process is very similar in practice). For insertions, the commit writes the serialized row at the location it must occupy and inserts the pointer to that row into the hashmap, using the primary key as the key. Edits work the same way; the only difference is that before writing the index entry into the hashmap, the entries with the old primary key are removed to avoid collisions of any sort.
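A condensed sketch of that commit flow is shown below; the `MvccEntry` type, the in-memory `Vec<u8>` standing in for the container file, and the helper names are all hypothetical stand-ins, not TytoDB's code.

```rust
use std::collections::HashMap;

/// Hypothetical MVCC entry: where the row lives, its hashed primary key,
/// the serialized row bytes, and whether this change is a deletion.
struct MvccEntry {
    row_ptr: u64,
    primary_key_hash: u64,
    serialized_row: Vec<u8>,
    is_deletion: bool,
}

/// Condensed sketch of the commit flow: deletions become 0xFF tombstones and
/// lose their index entry, while insertions/edits write the row bytes and
/// point the index at them. A `Vec<u8>` stands in for the container file.
fn commit(entries: &[MvccEntry], file: &mut Vec<u8>, index: &mut HashMap<u64, u64>) {
    for e in entries {
        let start = e.row_ptr as usize;
        let end = start + e.serialized_row.len();
        if e.is_deletion {
            file[start..end].fill(0xFF);       // overwrite with "full" bytes
            index.remove(&e.primary_key_hash); // drop the index entry
        } else {
            // Insert or edit: a real edit would first remove the entry for
            // the old primary key to avoid collisions.
            file[start..end].copy_from_slice(&e.serialized_row);
            index.insert(e.primary_key_hash, e.row_ptr);
        }
    }
}

fn main() {
    let mut file = vec![0u8; 16];
    let mut index = HashMap::new();
    let entries = vec![MvccEntry {
        row_ptr: 4,
        primary_key_hash: 0xABCD,
        serialized_row: vec![1, 2, 3, 4],
        is_deletion: false,
    }];
    commit(&entries, &mut file, &mut index);
    assert_eq!(file[4..8].to_vec(), vec![1, 2, 3, 4]);
    assert_eq!(index.get(&0xABCD), Some(&4));
}
```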
Clears the MVCC and graveyard, effectively discarding all pending changes and reverting the database to its state before the transaction began.
Vacuum is a function that, when called by the database, runs on a container and cleans the tombstones in its file. The execution starts by scanning the entire dataset and pushing into a bitmap a boolean that indicates whether each row is a tombstone. Afterwards, two variables are declared: "cursor" and "back-cursor". The cursor starts with a value of zero, and the back-cursor starts at `N - 1`, where `N` is the bitmap length (count of elements). A loop runs until `cursor > back-cursor`, increasing the cursor by one until it finds a tombstone in the bitmap at the index the cursor holds. Then the loop state changes, and it starts decreasing the back-cursor until it finds a non-tombstone. After finding both a tombstone and a non-tombstone, both positions are pushed into a vector that stores tuples of numbers (cursor and back-cursor positions).
After the loop that manipulates cursor and back-cursor ends, a for loop iterates through the vector of tuples, performing the swaps on disk and in the bitmap. After each swap, the references in the index map are updated, followed by an `fsync`. Although this slows the process down, it ensures data safety, since a crash or power outage during this operation could be catastrophic, especially because there is no backup mechanism like the one commit has with the MVCC.
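The cursor pass can be pictured with the simplified sketch below, which only plans the swap pairs from a tombstone bitmap and leaves out the disk writes, index updates, and fsyncs; the names and details are illustrative, not TytoDB's implementation.

```rust
/// Simplified sketch of the Vacuum cursor pass: walk a tombstone bitmap from
/// both ends and pair each leading tombstone with a trailing live row, so the
/// live row can later be swapped into the tombstone's slot on disk.
fn plan_swaps(tombstone: &[bool]) -> Vec<(usize, usize)> {
    let mut swaps = Vec::new();
    if tombstone.is_empty() {
        return swaps;
    }
    let mut cursor = 0usize;
    let mut back_cursor = tombstone.len() - 1;
    while cursor < back_cursor {
        if !tombstone[cursor] {
            cursor += 1;          // advance until a tombstone is found
        } else if tombstone[back_cursor] {
            back_cursor -= 1;     // retreat until a live row is found
        } else {
            swaps.push((cursor, back_cursor));
            cursor += 1;
            back_cursor -= 1;
        }
    }
    swaps
}

fn main() {
    // true = tombstone, false = live row
    let bitmap = [false, true, false, true, false, false];
    assert_eq!(plan_swaps(&bitmap), vec![(1, 5), (3, 4)]);
}
```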
MVCC (Multi-Version Concurrency Control) is a mutex-protected hashmap that every container keeps in memory. Insertions into this structure are made so that the Commit operation can later flush the data changes from it to the container file. After every insertion into the MVCC, an append is made to the MVCC WAL (Write-Ahead Log), which gets an `fsync` after every append. Both the append and the `fsync` run asynchronously to avoid blocking the operation that performed the push.
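A very rough sketch of that structure is shown below, with an in-memory `Vec<u8>` standing in for the WAL file and a spawned thread standing in for the asynchronous append plus fsync; none of the names come from TytoDB's source.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Very rough sketch of the MVCC idea: a mutex-protected map of pending
/// changes, with every insertion mirrored to a write-ahead log off-thread
/// so the caller is not blocked by the append + fsync.
type Mvcc = Arc<Mutex<HashMap<u64, Vec<u8>>>>;

fn push_change(mvcc: &Mvcc, key: u64, row: Vec<u8>, wal: Arc<Mutex<Vec<u8>>>) {
    // 1. Record the pending change in memory.
    mvcc.lock().unwrap().insert(key, row.clone());

    // 2. Append to the WAL asynchronously (a real implementation would
    //    write to a file and fsync it; a Vec stands in for the log here).
    let _wal_writer = thread::spawn(move || {
        wal.lock().unwrap().extend_from_slice(&row);
        // file.sync_all() would go here in a real WAL
    });
}

fn main() {
    let mvcc: Mvcc = Arc::new(Mutex::new(HashMap::new()));
    let wal = Arc::new(Mutex::new(Vec::new()));
    push_change(&mvcc, 1, vec![0xDE, 0xAD], wal.clone());
    assert!(mvcc.lock().unwrap().contains_key(&1));
}
```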
The database is network-based and listens on a port on a host (both set in the configuration), using TCP to operate. It employs FalcoTCP to handle the low-level networking as well as the cryptography of the data being transmitted and of the connections. When a client sends bytes to the server, the server tries to interpret them; that is how commands are submitted to TytoDB. To produce the bytes the database receives and interprets, a connection handler built on top of FalcoTCP must be used. That handler exposes an API/UI (whichever method is chosen) to help generate the byte instructions.
The configurations change the way the database does certain things. The configuration file is found in the database's directory (`$HOME/TytoDB/`), named `settings.yaml`, and contains several elements:
- `min_columns`: The minimum count of columns every container must have on creation (will not affect existing containers).
- `max_columns`: The maximum count of columns every container can have on creation (will not affect existing containers).
- `ip`: The host the database will listen to.
- `port`: The port on the host the database will be listening to.
- `vacuum`: The schedule of the "Vacuum" function. This setting in the YAML file is a list where every element is a string tuple: the first item is the container name (where the vacuum will happen), and the second is when the vacuum must happen, which can follow 4 different patterns:
In the `X unit` pattern, `X` is a number and `unit` is a unit of time (available units: "seconds", "minutes", "hours", "years", or "decades"). When using this schedule pattern, the function will be run every `X * unit`. For example, if `X` is 10 and the unit is "years", then a vacuum will happen every 10 years. The other patterns appear in the example below.

General Example:
vacuum:
- ["users", "once"]
- ["logs", "12:00:00"]
- ["data", "5 minutes"]
This document specifies the low-level binary protocol for TytoDB operations. It is derived directly from the serialization and deserialization logic in the Rust TytoDB Client source code (`commands.rs`, `types.rs`, `dynamic_int.rs`) and is intended for developers creating compatible connection handlers or alternative database implementations.
To efficiently encode the size of variable-length data, TytoDB uses a custom `DynamicInteger` format. This structure consists of a 1-byte flag that signals the size of the subsequent integer, which holds the actual length. This allows for compact representation of small numbers while supporting lengths up to 2^64-1. All integers are written in little-endian byte order.
The encoding is determined as follows:
- If the length fits in a `u8` (0-255), the stream will contain the flag `0x00` followed by the 1-byte length.
- If the length fits in a `u16`, the stream will contain the flag `0x01` followed by the 2-byte length.
- If the length fits in a `u32`, the stream will contain the flag `0x02` followed by the 4-byte length.
- Otherwise, the stream will contain the flag `0x03` followed by the 8-byte `u64` length.
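The rules above can be expressed as a short encoder. The function name below is illustrative, but the byte layout (flag byte plus little-endian length) follows the specification as described.

```rust
/// Encode a length as a DynamicInteger: a 1-byte size flag followed by the
/// smallest little-endian integer that can hold the value.
fn encode_dynamic_integer(len: u64) -> Vec<u8> {
    let mut out = Vec::new();
    if len <= u8::MAX as u64 {
        out.push(0x00);
        out.push(len as u8);
    } else if len <= u16::MAX as u64 {
        out.push(0x01);
        out.extend_from_slice(&(len as u16).to_le_bytes());
    } else if len <= u32::MAX as u64 {
        out.push(0x02);
        out.extend_from_slice(&(len as u32).to_le_bytes());
    } else {
        out.push(0x03);
        out.extend_from_slice(&len.to_le_bytes());
    }
    out
}

fn main() {
    assert_eq!(encode_dynamic_integer(42), vec![0x00, 42]);
    assert_eq!(encode_dynamic_integer(300), vec![0x01, 0x2C, 0x01]); // 300 = 0x012C
}
```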
All data values are serialized using the `AlbaTypes` format. Each serialized value begins with a 1-byte Type ID, immediately followed by the encoded data. All multi-byte numerical values are encoded in little-endian byte order.

- Strings (`0x00`): A 1-byte Type ID (`0x00`), followed by a `DynamicInteger` representing the string's byte length, followed by the raw UTF-8 bytes of the string.
- Bytes (`0x0b`): A 1-byte Type ID (`0x0b`), followed by a `DynamicInteger` for the length, followed by the raw bytes.
- Integers (`U8`-`U128`, `I32`, `I64`): A 1-byte Type ID (e.g., `0x01` for U8, `0x0a` for I64), followed by the fixed-size, little-endian bytes of the number (1, 2, 4, 8, or 16 bytes).
- Floats (`F32`, `F64`): A 1-byte Type ID (`0x06` or `0x07`), followed by the 4 or 8 bytes of the IEEE 754 representation.
- Booleans (`0x08`): A 1-byte Type ID (`0x08`), followed by a single byte: `0x01` for true, `0x00` for false.
This reusable block is used to define filters for `EditRow`, `DeleteRow`, and `Search` commands. It is a sequence of two distinct parts: the conditions themselves and the logic that combines them.

First, a single `u8` byte specifies the number of condition tuples (D). This is followed by D condition tuples. Each tuple is serialized by writing the condition's column name (as a `u8`-prefixed string), followed by the 1-byte ID of the logical operator (e.g., `Equal`, `Higher`), followed by the value for comparison, which is serialized as a complete `AlbaType` structure.

After all condition tuples, a second `u8` byte specifies the number of logic operators (L). This is followed by L logic tuples. Each logic tuple consists of a 1-byte index pointing to a condition and a 1-byte operator (`0x01` for 'AND', `0x00` for 'OR') that links it to the preceding condition.
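A minimal Condition Block with one condition and no logic tuples could be assembled as below. Note that the numeric IDs of the comparison operators are not specified in this document, so `EQUAL_OPERATOR_ID` is a placeholder, and the function name is illustrative.

```rust
/// Placeholder: the document does not list the numeric IDs of the comparison
/// operators, so this constant is purely illustrative.
const EQUAL_OPERATOR_ID: u8 = 0x00;

/// Sketch of a Condition Block with a single equality condition and no logic
/// tuples, following the layout described above.
fn encode_condition_block(column: &str, value_albatype: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    out.push(1u8);                          // D: number of condition tuples
    out.push(column.len() as u8);           // u8-prefixed column name
    out.extend_from_slice(column.as_bytes());
    out.push(EQUAL_OPERATOR_ID);            // 1-byte logical operator ID
    out.extend_from_slice(value_albatype);  // fully serialized AlbaType value
    out.push(0u8);                          // L: number of logic tuples
    out
}

fn main() {
    // Value: AlbaType boolean `true` (Type ID 0x08, then 0x01).
    let block = encode_condition_block("active", &[0x08, 0x01]);
    assert_eq!(block[0], 1);                             // one condition
    assert_eq!(&block[2..8], "active".as_bytes());       // column name bytes
}
```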
The byte stream begins with the opcode `0x00`. It is followed by the container name, encoded as a single `u8` for its length and then the name's raw ASCII bytes. Next, a single `u8` specifies the number of columns (C). This is followed by a sequence of C column name definitions, where each name is individually prefixed by its `u8` length. Finally, a contiguous block of C bytes is appended, where each byte is the `u8` Type ID for the corresponding column in the order they were defined.
The stream starts with opcode `0x01`. It is followed by the container name, encoded as a `u8`-prefixed string. Next, a `u8` indicates the number of columns (C) for which values are being provided. This is followed by C column names, each encoded as a `u8`-prefixed string. Finally, a sequence of C values is appended, with each value fully serialized according to its `AlbaType` structure (Type ID + data).
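Putting the pieces together, a CreateRow instruction could be assembled as in the sketch below. The function is illustrative and expects values that are already serialized as `AlbaType` byte sequences.

```rust
/// Sketch of a CreateRow instruction following the layout above: opcode 0x01,
/// u8-prefixed container name, column count, u8-prefixed column names, then
/// one fully serialized AlbaType value per column (passed in pre-encoded here).
fn encode_create_row(container: &str, columns: &[&str], values: &[Vec<u8>]) -> Vec<u8> {
    assert_eq!(columns.len(), values.len());
    let mut out = vec![0x01];                 // opcode
    out.push(container.len() as u8);          // u8-prefixed container name
    out.extend_from_slice(container.as_bytes());
    out.push(columns.len() as u8);            // C: number of columns provided
    for name in columns {
        out.push(name.len() as u8);           // u8-prefixed column name
        out.extend_from_slice(name.as_bytes());
    }
    for value in values {
        out.extend_from_slice(value);         // Type ID + data, already encoded
    }
    out
}

fn main() {
    // One column, "active", with an AlbaType boolean `true` (0x08, 0x01).
    let cmd = encode_create_row("users", &["active"], &[vec![0x08, 0x01]]);
    assert_eq!(cmd[0], 0x01); // CreateRow opcode
    assert_eq!(cmd[1], 5);    // length of "users"
}
```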
The stream begins with opcode `0x02`, followed by the `u8`-prefixed container name. After the name, a `u8` specifies the number of changes (C). This is followed by C pairs of a `u8`-prefixed column name and a fully serialized `AlbaType` value. After the changes block, a complete Condition Block is appended to specify which rows to update.
The instruction starts with opcode `0x03`, followed by the `u8`-prefixed container name. A single byte flag follows; if it is `0x01`, a complete Condition Block is appended. If the flag is `0x00`, no further data is present, indicating a deletion of all rows (if supported by the server's query plan).
This instruction starts with opcode `0x04`. It is immediately followed by the raw ASCII bytes of the container name. It is critical to note that this is an exceptional case: the container name is not prefixed with its length.
The search instruction is uniquely structured. It begins with opcode `0x05`. This is followed by a `u8` indicating the number of columns to fetch (C), then C instances of `u8`-prefixed column names. After the column names, a complete Condition Block is appended. Following the conditions, an 8-byte `u64` specifies the length (L) of the final data block. This final block contains the target container, encoded as a standard `u8`-prefixed string.
Starts with opcode `0x06`. A 1-byte flag follows: if `0x01`, a `u8`-prefixed container name is appended to commit changes only for that container. If `0x00`, no more data follows, and the commit applies to all containers with pending changes.
Starts with opcode `0x07`. A 1-byte flag follows: if `0x01`, a `u8`-prefixed container name is appended to roll back changes for that container only. If `0x00`, the rollback applies to all containers.
The instruction begins with opcode `0x08`, followed by a `u8`-prefixed container name. Next, a `u8` specifies the number of columns (C), followed by C `u8`-prefixed column names. After the schema information, a 4-byte `u32` indicates the number of rows (R) in the batch. Finally, a flattened sequence of R * C values, each serialized as a complete `AlbaType`, is appended.
This instruction starts with opcode `0x09`. It is followed by a 4-byte signed integer (`i32`). The absolute value of this integer represents the number of commands (C) in the batch. If the integer is negative, the entire batch should be executed as a single transaction. The integer is followed by C command blocks. Each block consists of a 4-byte `u32` indicating the length of the subsequent command's byte stream, followed by the complete byte stream of that command (including its own opcode and data).
The serialization logic imposes maximum sizes tied to the following constants: `MAX_CONTAINER_NAME_LENGTH` (container names), `MAX_CONTAINER_COLUMN_LENGTH` (column names), and `u8::MAX` for counts encoded as a single `u8`.