Update btree article with new math delimiters

Signed-off-by: Danila Fedorin <danila.fedorin@gmail.com>
This commit is contained in:
Danila Fedorin 2024-05-13 19:07:44 -07:00
parent 20d8b18a9b
commit 15beddf96b

View File

@ -41,7 +41,7 @@ The root node of this tree is 5, its left child is 2, and its right child is 7.
In this tree, the root node is 1, and the right child is 2. None of the nodes have a left child. Looking through this tree would be as slow as looking through a list - we'd have to look through all the numbers before we find 9. No good.
__Although the average efficiency of a Binary Search Tree is \\(O(\log n)\\), meaning that for a tree with \\(n\\) nodes, it will only need to examine up to \\(logn\\) of these nodes, it can nonetheless be as inefficient as \\(O(n)\\), meaning that it will have to examine every node in the tree.__
__Although the average efficiency of a Binary Search Tree is \(O(\log n)\), meaning that for a tree with \(n\) nodes, it will only need to examine up to \(logn\) of these nodes, it can nonetheless be as inefficient as \(O(n)\), meaning that it will have to examine every node in the tree.__
This isn't good enough, and many clever algorithms have been invented to speed up the lookup of the tree by making sure that it remains _balanced_ - that is, it _isn't_ arranged like a simple list. Some of these algorithms include [Red-Black Trees](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree), [AVL Trees](https://en.wikipedia.org/wiki/AVL_tree), and, of course, B-Trees.
@ -71,7 +71,7 @@ The "nodes" in the BTreeDB are called "blocks" and are one of three types - "ind
To be able to read a Starbound BTree, the first thing that needs to be done is to read the general information about the tree. For this, we read the very beginning of the file, called the _header_. [GitHub user blixt has made an excellent table of values that make up the header](https://github.com/blixt/py-starbound/blob/master/FORMATS.md#btreedb5). The ones that matter to us are `Whether to use root node #2 instead` (hereby referred to as "swap", `Root node #1 block index` and `Root node #2 block index`. We also need to know the key size and block size of the B-Tree to be able to jump around in the file.
Once the header has been parsed (this is an exercise left up to the reader), the next step is to find the root node. This is following exactly the general lookup algorithm for a B-Tree. The index in the file (by bytes) of a block is \\(headerSize + blockSize \times n\\), where \\(n\\) is the block number. Depending on the value of `swap` (whether to use the second root node), we use either the index of root node 1 or 2 for \\(n\\), and move to that position.
Once the header has been parsed (this is an exercise left up to the reader), the next step is to find the root node. This is following exactly the general lookup algorithm for a B-Tree. The index in the file (by bytes) of a block is \(headerSize + blockSize \times n\), where \(n\) is the block number. Depending on the value of `swap` (whether to use the second root node), we use either the index of root node 1 or 2 for \(n\), and move to that position.
Next, we proceed to identify the node that we've found. The first two bytes in that node are either the ASCII values of 'F', 'L', or 'I', representing, you guessed it, "Free", "Leaf", or "Index". If the node is an index node, we need to search it for the next node we need to examine. To do so, we first read two values from the node, two 32-bit integers. The first is the number of keys in the block, and the next is the block number of the first child node.
@ -89,7 +89,7 @@ As you can see, the number of children is one more than the number of keys, and
Simply, if the value we're searching for is bigger than the first key only, we go to the second child, if it's bigger than the second key, we go to the third child, etc. If our value is not bigger than any of the keys, we go to the 1st child. After we move to the index of the new child, we once again examine its type, and if it's still "II", we repeat the process.
Once we reach a "leaf" node, we're very close. After the two ASCII characters describing its type, the leaf node will contain a 32-bit integer describing the number of key-data pairs it has. Each key-data pair is made up of the key, a variable integer describing the length of the data, and the data itself. We examine one pair after another, carefully making sure that we don't exceed the end of the block, located at \\(headerSize + blockSize \times (n + 1) - 4\\). The reason that we subtract four from this equation is to save space for the address of the next block. As I mentioned above, leaf nodes, if their data is bigger than the block size, contain the block number of the next leaf node to which we can continue if we reach the end of the current leaf. If we do reach the end of the current leaf, we read the 32-bit integer telling us the number of the next block, jump to its index, and, after reading the first two bytes to ensure it's actually a leaf, continue reading our data. Once that's done, voila! We have our bytes.
Once we reach a "leaf" node, we're very close. After the two ASCII characters describing its type, the leaf node will contain a 32-bit integer describing the number of key-data pairs it has. Each key-data pair is made up of the key, a variable integer describing the length of the data, and the data itself. We examine one pair after another, carefully making sure that we don't exceed the end of the block, located at \(headerSize + blockSize \times (n + 1) - 4\). The reason that we subtract four from this equation is to save space for the address of the next block. As I mentioned above, leaf nodes, if their data is bigger than the block size, contain the block number of the next leaf node to which we can continue if we reach the end of the current leaf. If we do reach the end of the current leaf, we read the 32-bit integer telling us the number of the next block, jump to its index, and, after reading the first two bytes to ensure it's actually a leaf, continue reading our data. Once that's done, voila! We have our bytes.
If we're using this method to read a Starbound world file, we need to also know that the data is zlib-compressed. I won't go far into the details, as it appears like both Python and Java provide libraries for the decompression of such data.