<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title><![CDATA[Louis' imperfect blog]]></title><description><![CDATA[Louis' imperfect blog's RSS feed]]></description><link>https://louissven.xyz</link><copyright>copyright Louis Sven Goulet 2023-2026</copyright><pubDate>2025-08-12</pubDate><ttl>1800</ttl><item><title><![CDATA[How I do (type-safe) container types in C – type-safe(r) container types]]></title><link>https://louissven.xyz/article/how_I_do_container_types_in_C.md</link><description><![CDATA[<!DOCTYPE html><html><head><base href="/"><link rel="stylesheet" href="data/site.css"><!--<meta name="viewport" content="width=device-width">--><link rel="apple-touch-icon" sizes="180x180" href="/data/favicon_io/apple-touch-icon.png"><link rel="icon" type="image/png" sizes="32x32" href="/data/favicon_io/favicon-32x32.png"><link rel="icon" type="image/png" sizes="16x16" href="/data/favicon_io/favicon-16x16.png"><link rel="manifest" href="/data/favicon_io/site.webmanifest"><title>How I do (type-safe) container types in C</title><meta charset="UTF-8"><meta name="description" content="type-safe(r) container types"><meta name="author" content="Louis"><link rel="stylesheet" href="/data/highlight/styles/nord.min.css"><script src="/data/highlight/highlight.min.js"></script><script>hljs.highlightAll();</script></head><body><header><div id="header_top_div"><a href="/" class="header_element">home</a><a href="/articles" class="header_element">articles</a><a href="/data-policy" class="header_element">data policy</a><a href="https://github.com/lorlouis" class="header_element">github</a><a href="/rss" class="header_element">rss</a></div></header><main><h1>How I do (type-safe) container types in C</h1>
<p>Recently, after seeing two articles on how to achieve container-types in C,
I decided I'd also write one.</p>
<p><a href="https://uecker.codeberg.page/2025-07-20.html">Martin Uecker's article</a>
| <a href="https://uecker.codeberg.page/2025-07-20.html">HN thread</a></p>
<p><a href="https://danielchasehooper.com/posts/typechecked-generic-c-data-structures/">Daniel Hooper's article</a>
| <a href="https://lobste.rs/s/s4po4y/how_i_write_type_safe_generic_data">lobste.rs thread</a></p>
<h2>Why am I not satisfied with these two articles</h2>
<p>The only correct reason is that I suffer from the not-invented-here syndrome,
but I have some complaints that would make me not want to use these implementations.</p>
<h3>Uecker's way</h3>
<blockquote>
<pre><code class="language-C">#define vec(T) struct vec_##T { ssize_t N; T data[/* .N */]; }
</code></pre>
<p>-- <em>Martin Uecker, Generic Containers in C: vec</em></p>
</blockquote>
<p>This is how I started doing &quot;generics&quot; in C and I quickly ran into issues with
having the macro define the name of the Vec.</p>
<p>For &quot;simple types&quot; this works great, <code>vec(int)</code> would expand to:</p>
<pre><code class="language-C">struct vec_int { ssize_t N; int data[]; }
</code></pre>
<p>But for more complex types this would break down pretty quickly.</p>
<pre><code class="language-C">struct MyValue { int a; int b; };

vec(struct MyValue)
</code></pre>
<p>Would expand to:</p>
<pre><code class="language-C">struct vec_struct MyValue { ssize_t N; struct MyValue data[]; }
</code></pre>
<p>And it would result in invalid C. This can be worked around by <code>typedef</code>-ing the
<code>struct</code> instead but it would force me to <code>typedef</code> pointers to values and I don't
like how it &quot;pollutes&quot; my namespace. I'm also not that imaginative so if I can
<em>not</em> name something I usually go that route.</p>
<p>Overall it's not a bad way of doing things, but my real gripe comes with the
way that the logic is implemented.</p>
<blockquote>
<pre><code class="language-C">#define vec_push(T, v, x)                                  \
   ({                                                      \
       vec(T) **_vp = (v);                                 \
       ssize_t _N = (*_vp)-&gt;N + 1;                         \
       ssize_t _S = _N * (ssize_t)sizeof((*_vp)-&gt;data[0])  \
               + (ssize_t)sizeof(vec(T));                  \
       if (!(*_vp = realloc(*_vp, _S))) abort();           \
       (*_vp)-&gt;N++;                                        \
       (*_vp)-&gt;data[_N - 1] = (x);                         \
   })
</code></pre>
<p>-- <em>Martin Uecker, Generic Containers in C: vec</em></p>
</blockquote>
<p>Having done that in the past, I now tend to avoid including too much logic
inside my macros because I find that they lead to cryptic error messages and
sometimes variable name clashes. (Again, I suck at naming things.)
I've spent too much time grep-ing through the output of <code>cpp</code> and I've now switched
to doing something else.</p>
<h3>Hooper's way</h3>
<p>I find that Hooper does container types in a very similar way to myself.</p>
<blockquote>
<pre><code class="language-C">#define List(type) union { \
    ListNode *head; \
    type *payload; \
}
</code></pre>
<p>-- <em>Daniel Hooper, Type Safe Generic Data Structures in C</em></p>
</blockquote>
<p>Defining an unnamed union avoids the complex type problem we ran into with the
other implementation, but as Hooper points out, without doing anything else,
this would result in type errors when expanding the macro more than once.</p>
<p>From Hooper's article:</p>
<blockquote>
<pre><code class="language-C">List(Foo) a;
List(Foo) b = a; // error

void my_function(List(Foo) list);
my_function(a); // error: incompatible type
</code></pre>
<p>Even though the variables have identical type definitions, the compiler
still errors because they are two distinct definitions.
A <code>typedef</code> avoids the issue:</p>
<pre><code class="language-C">typedef List(Foo) ListFoo; // this makes it all work

ListFoo a;
ListFoo b = a; // ok

void my_function(ListFoo list);
my_function(a); // ok

List(Foo) local_foo_list; // still works 
</code></pre>
<p>-- <em>Daniel Hooper, Type Safe Generic Data Structures in C</em></p>
</blockquote>
<p>I personally don't like how expansions of the same macro won't point back to
the same type. C23 &quot;fixed&quot; this behaviour with its named record equivalence
rule, but for it to work we would need to make the name of the type part of the
macro and we would run into the same issue with complex types.</p>
<h2>My way</h2>
<p>I do it much in the same way as Hooper. I declare a &quot;base implementation&quot;
of my datastructure that every generic version will wrap.</p>
<pre><code class="language-C">// aligned on eights (or fours on 32 bit machines)
struct Vec {
    size_t len;
    size_t cap;
    void *data;
};
</code></pre>
<p>And a macro to define a new type of that datastructure.</p>
<pre><code class="language-C">#define VecDef(_type) \
    typedef struct { \
        struct Vec inner; \
        _type *phantom[0]; \
    }
</code></pre>
<p>Inserting the typedef directly in the macro allows me to define this type and
to export it rather than re-expanding the same macro every time. The main
drawback of this approach is that typedefs cannot be forward declared but
<code>structs</code> can.
Take note that the <code>phantom</code> field is a zero sized array of pointers to <code>_type</code>,
this way I can forward declare <code>_type</code>. Also as a bonus, the zero size array is
a zero-sized type (duh) and, in this case, adds no additional padding.</p>
<pre><code class="language-C">VecDef(int) IntVec;
VecDef(struct Pos) PosVec;
</code></pre>
<p>To then get some type safety, I make use of C11's <code>_Generic</code> keyword.</p>
<pre><code class="language-C">#define vecPush(vec, data) _Generic((data), typeof(**((vec)-&gt;phantom)): \
    vec_push(&amp;(vec)-&gt;inner, sizeof(**((vec)-&gt;phantom)), &amp;(data)))
</code></pre>
<p>By using <code>_Generic</code> here I'm able to check that the type passed in matches
exactly the type expected.</p>
<pre><code class="language-C">VecDef(int) IntVec;
IntVec a = {};

char b = 10;
// Controlling expression type 'char' not compatible
// with any generic association type
vecPush(&amp;a, b);
</code></pre>
<p>Hooper's way of type checking might be superior since the compiler
will tell you which type it expected instead of just saying the type is incompatible.</p>
<p>He uses the ternary operator to assert that both the
parameter and the inner type match.</p>
<pre><code class="language-C">1 ? (param) : *(vec)-&gt;type
</code></pre>
<p>Sadly, when it comes to reading a value out, I haven't found a way to have as
much control. I instead cast the pointer, which works great for pointer types,
but I open myself up to C's implicit conversion rules if I dereference that pointer.</p>
<pre><code class="language-C">#define vecGetPtr(vec, idx) ((typeof(*(vec)-&gt;phantom))vec_get_ptr(\
    &amp;(vec)-&gt;inner, sizeof(**((vec)-&gt;phantom)), idx))

IntVec a = {};
// incompatible pointer types
double *r1 = vecGetPtr(&amp;a, 1);
// C silent type casting, meh
double r2 = *vecGetPtr(&amp;a, 1);
</code></pre>
<p>This dovetails pretty nicely with C23's <code>auto</code> keyword, where I basically never
have to worry about type mismatch.</p>
<pre><code class="language-C">// r3 will always be the correct type
auto r3 = *vecGetPtr(&amp;a, 1);
</code></pre>
<p>I've found that this technique works pretty well and I've been able to build
all the reusable data structures I've needed with it:</p>
<p>A hashmap</p>
<pre><code class="language-C">#define HMapDef(type) \
    typedef struct { \
        struct HMap inner; \
        type *phantom[0]; \
    }
</code></pre>
<p>A Queue</p>
<pre><code class="language-C">#define QueueDef(type) \
    typedef struct { \
        struct Queue inner; \
        type *phantom[0]; \
    }
</code></pre>
<p>And many more.</p>
<p>The only generic data structure I use that is not written in this way is
my implementation of a priority queue. I'm planning to rewrite it this way in
order to make it type-safe, I just haven't taken the time to do it yet.</p>
<p><a href="https://gist.github.com/lorlouis/ba227cf544fe917aae0365b41e8c2d04">github gist with code examples used in this article</a></p>
</main><footer><div id="page_link_div"></div><p id="copyright">Found a typo?<a href="https://www.github.com/lorlouis/blog"> open a pr!</a><br>copyright Louis Sven Goulet 2023-2026</p></footer></body></html>]]></description></item><item><title><![CDATA[I wrote a bug and It made me reflect on OOP – It's not an OOP bashing post, surprisingly]]></title><link>https://louissven.xyz/article/i_wrote_a_bug.md</link><description><![CDATA[<!DOCTYPE html><html><head><base href="/"><link rel="stylesheet" href="data/site.css"><!--<meta name="viewport" content="width=device-width">--><link rel="apple-touch-icon" sizes="180x180" href="/data/favicon_io/apple-touch-icon.png"><link rel="icon" type="image/png" sizes="32x32" href="/data/favicon_io/favicon-32x32.png"><link rel="icon" type="image/png" sizes="16x16" href="/data/favicon_io/favicon-16x16.png"><link rel="manifest" href="/data/favicon_io/site.webmanifest"><title>I wrote a bug and It made me reflect on OOP</title><meta charset="UTF-8"><meta name="description" content="It's not an OOP bashing post, surprisingly"><meta name="author" content="Louis"><link rel="stylesheet" href="/data/highlight/styles/nord.min.css"><script src="/data/highlight/highlight.min.js"></script><script>hljs.highlightAll();</script></head><body><header><div id="header_top_div"><a href="/" class="header_element">home</a><a href="/articles" class="header_element">articles</a><a href="/data-policy" class="header_element">data policy</a><a href="https://github.com/lorlouis" class="header_element">github</a><a href="/rss" class="header_element">rss</a></div></header><main><h1>I wrote a bug and It made me reflect on OOP</h1>
<h2>The feature</h2>
<p>I wrote a piece of code that would search for matches in a text block and try
to correlate text position with line numbers.</p>
<pre><code class="language-rust">let re = Regex::new(&quot;substr&quot;).unwrap();
// impl Iterator&lt;Item=usize&gt;
let positions = re.find_iter(text_block).map(|m| m.start());

let line_ends = [10, 14, 30];

for p in positions {
    let line_idx = line_ends.upper_bound(&amp;p);
    ...
}
</code></pre>
<p>The implementation was made to be generic and accept any iterator of <code>usize</code> as
a source. This made testing easier and I didn't have to specify the whole type
of <code>re.find_iter(text_block).map(|m| m.start())</code> which is quite wordy.</p>
<h2>A performance improvement</h2>
<p>A few days after writing this patch, I found myself profiling the system, and
this part of the system turned out to be a bit slow. You might have already
noticed, but the regex always returns positions in sorted order: <code>1, 3, 6, 67...</code>. This made it pretty easy to ignore parts of the line-end array that had
already been searched.</p>
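<pre><code class="language-rust">let re = Regex::new(&quot;substr&quot;).unwrap();
// impl Iterator&lt;Item=usize&gt;
let positions = re.find_iter(text_block).map(|m| m.start());

let line_ends = [10, 14, 30];

let mut off = 0;

for p in positions {
    // `upper_bound` returns an index relative to the sub-slice,
    // so add `off` back to get an index into `line_ends`
    let line_idx = off + line_ends[off..].upper_bound(&amp;p);
    off = line_idx;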
    ...
}
</code></pre>
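<p>As a rough, self-contained sketch of the loop above: slices in the standard library have no <code>upper_bound</code> method, so this version assumes it behaves like <code>partition_point</code> with a "less than or equal" predicate. The function names <code>upper_bound</code> and <code>line_indices</code> are made up for illustration.</p>

```rust
/// Number of line ends that are `<= pos`, i.e. the index of the first
/// line end strictly greater than `pos` (assumed `upper_bound` semantics).
fn upper_bound(line_ends: &[usize], pos: usize) -> usize {
    line_ends.partition_point(|&end| end <= pos)
}

/// Maps sorted match positions to line indices, only searching the part
/// of `line_ends` that earlier (smaller) positions have not ruled out.
fn line_indices(positions: impl Iterator<Item = usize>, line_ends: &[usize]) -> Vec<usize> {
    let mut off = 0;
    let mut lines = Vec::new();
    for p in positions {
        // the search result is relative to the sub-slice, so add `off` back
        let line_idx = off + upper_bound(&line_ends[off..], p);
        off = line_idx;
        lines.push(line_idx);
    }
    lines
}
```

<p>With line ends <code>[10, 14, 30]</code>, the sorted positions <code>1, 3, 12, 20</code> map to line indices <code>0, 0, 1, 2</code>.</p>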
<h2>The bug</h2>
<p>Requirements changed, along with some code, and a new way to search for text was
added. It was still an iterator of <code>usize</code> and mostly returned positions in
increasing order, so it appeared to work great. It was faster than the previous
regex method, but it would sometimes return unsorted results. Sadly, none of the
test cases caught that behaviour, so we ended up missing some line indices from
the returned value.</p>
<pre><code class="language-rust">let positions = SuffixFinder::new(&quot;ends with this&quot;, text_block);

let line_ends = [10, 14, 30];

let mut off = 0;

for p in positions {
    // `p` could sometimes be lower than the position
    // present at `line_ends[off]` because the
    // positions are not returned in sorted order
    let line_idx = off + line_ends[off..].upper_bound(&amp;p);
    off = line_idx;
    ...
}
</code></pre>
<p>The fix was pretty simple but I kept thinking about this bug.</p>
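<p>To make the failure concrete, here is a small sketch that runs the optimised lookup on unsorted input. It again uses <code>partition_point</code> as a stand-in for <code>upper_bound</code>. The guard in the second function is only one possible way to handle unsorted input; it is an assumption, not necessarily the fix that shipped.</p>

```rust
/// The sorted-input optimisation, kept deliberately naive: it trusts that
/// `positions` is sorted and never looks behind `off`.
fn lines_assuming_sorted(positions: &[usize], line_ends: &[usize]) -> Vec<usize> {
    let mut off = 0;
    positions
        .iter()
        .map(|&p| {
            off += line_ends[off..].partition_point(|&end| end <= p);
            off
        })
        .collect()
}

/// Hypothetical guard: fall back to a full search whenever a position
/// is smaller than the previous one.
fn lines_with_guard(positions: &[usize], line_ends: &[usize]) -> Vec<usize> {
    let mut off = 0;
    let mut last = 0;
    positions
        .iter()
        .map(|&p| {
            if p < last {
                off = 0; // input went backwards, the skipped prefix matters again
            }
            last = p;
            off += line_ends[off..].partition_point(|&end| end <= p);
            off
        })
        .collect()
}
```

<p>For positions <code>12, 3, 20</code> and line ends <code>[10, 14, 30]</code>, the naive version reports lines <code>1, 1, 2</code> while the guarded one reports <code>1, 0, 2</code>.</p>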
<h2>Why is OOP relevant to this discussion</h2>
<p>When I was learning programming, a huge part of the curriculum was dedicated to
so-called &quot;object-oriented design&quot;. If I were to model the previous problem in
terms of object inheritance and interfaces, it would look like this.</p>
<pre><code class="language-text">               _____________________________________________
              |              Interface Searcher             |
              | * fn find_line(line_ends: [usize]) -&gt; usize |
              |___~_line_ends.upper_bound(self.p)___________|
 ___________________//________________________     ____\\______
|           Interface SortedSearcher          |   |SuffixFinder|
| * fn last_pos() -&gt; usize                    |
| * fn find_line(line_ends: [usize]) -&gt; usize |
|   ~ line_ends[self.last_pos()..]            |
|___~__________.upper_bound(self.p)___________|
           _____//____
          |RegexFinder|

fn find_position_line(s: Interface Searcher, line_ends: [usize]) -&gt; [usize]
~   let lines = Vec::new();
~   loop
~       let p = s.find_line(line_ends);
~       if p == usize::MAX
~           return lines
~       lines.push(p)
</code></pre>
<p>The optimisation would be implemented using specialisations and only trigger
when an object inherits from <code>SortedSearcher</code>. I would probably have had to
write a newtype that would have wrapped the regex type since it comes from a
library.</p>
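<p>Translated to Rust traits, the diagram could look roughly like the sketch below. Rust has no stable specialisation, so a default method that one implementor overrides stands in for it. Both finder types are stand-ins that hold precomputed positions, and <code>partition_point</code> replaces <code>upper_bound</code>; none of this comes from the real code base.</p>

```rust
trait Searcher {
    /// Match positions, in whatever order the searcher produces them.
    fn positions(&self) -> Vec<usize>;

    /// Default: a full binary search for every position, safe for any order.
    fn find_lines(&self, line_ends: &[usize]) -> Vec<usize> {
        self.positions()
            .into_iter()
            .map(|p| line_ends.partition_point(|&end| end <= p))
            .collect()
    }
}

/// Stand-in for the suffix based search, may yield unsorted positions.
struct SuffixFinder(Vec<usize>);

impl Searcher for SuffixFinder {
    fn positions(&self) -> Vec<usize> {
        self.0.clone()
    }
}

/// Stand-in for the regex based search, positions are always sorted.
struct RegexFinder(Vec<usize>);

impl Searcher for RegexFinder {
    fn positions(&self) -> Vec<usize> {
        self.0.clone()
    }

    /// Plays the role of `SortedSearcher`: skip what was already searched.
    fn find_lines(&self, line_ends: &[usize]) -> Vec<usize> {
        let mut off = 0;
        self.positions()
            .into_iter()
            .map(|p| {
                off += line_ends[off..].partition_point(|&end| end <= p);
                off
            })
            .collect()
    }
}
```

<p>The point of the design is that the sorted fast path can only be reached through a type that promises sorted output, so a new searcher gets the safe default unless it opts in.</p>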
<p>Generally, I dislike this type of modelling. It tends to result in poor data
locality due to a lot of objects having to be allocated in languages like Java.
And, in my opinion, it makes the code harder to think about: the
<code>find_position_line</code> function now depends on two abstract classes implementing
parts of its logic, so it requires a lot of jumping around in the code, and
every time logic is updated in the <code>Searcher</code>, <code>SortedSearcher</code>'s
implementation needs to be checked and/or updated.</p>
<p>But had I taken the time to model multiple levels of interfaces and written
the logic inside those interfaces (dependency injection), I would not have
written that bug, and that bugs <em>me</em>.</p>
</main><footer><div id="page_link_div"></div><p id="copyright">Found a typo?<a href="https://www.github.com/lorlouis/blog"> open a pr!</a><br>copyright Louis Sven Goulet 2023-2026</p></footer></body></html>]]></description></item><item><title><![CDATA[You don't have to boot from just 512 bytes – As long as you boot from a CD]]></title><link>https://louissven.xyz/article/your_stage_1_bootloader_can_be_as_large_as_you_want.md</link><description><![CDATA[<!DOCTYPE html><html><head><base href="/"><link rel="stylesheet" href="data/site.css"><!--<meta name="viewport" content="width=device-width">--><link rel="apple-touch-icon" sizes="180x180" href="/data/favicon_io/apple-touch-icon.png"><link rel="icon" type="image/png" sizes="32x32" href="/data/favicon_io/favicon-32x32.png"><link rel="icon" type="image/png" sizes="16x16" href="/data/favicon_io/favicon-16x16.png"><link rel="manifest" href="/data/favicon_io/site.webmanifest"><title>You don't have to boot from just 512 bytes</title><meta charset="UTF-8"><meta name="description" content="As long as you boot from a CD"><meta name="author" content="Louis"><link rel="stylesheet" href="/data/highlight/styles/nord.min.css"><script src="/data/highlight/highlight.min.js"></script><script>hljs.highlightAll();</script></head><body><header><div id="header_top_div"><a href="/" class="header_element">home</a><a href="/articles" class="header_element">articles</a><a href="/data-policy" class="header_element">data policy</a><a href="https://github.com/lorlouis" class="header_element">github</a><a href="/rss" class="header_element">rss</a></div></header><main><h1>You don't have to boot from just 512 bytes</h1>
<h2>Wait, what?</h2>
<p>Conventional wisdom says that you can only boot from the first sector of a
floppy (512 bytes) or something that looks and behaves like the first sector of
a floppy. But it doesn't have to be the case as long as you boot from a CD.</p>
<h3>The &quot;normal&quot; booting process</h3>
<p>Historically the IBM PC did not ship with a hard drive; it had a BASIC
interpreter in its ROM and up to 2 floppy disk drives. If you wanted a proper
operating system, the PC had to boot from a floppy containing an OS. The BIOS
looked for the magic numbers <code>[0x55, 0xAA]</code> at the end of each floppy's first
segment to detect if it could boot from it. Once a bootable drive was found,
the segment was loaded into memory at address <code>0x7C00</code>, and the CPU started
executing at that address. When hard drives came along, a similar technique was
used to boot from the <a href="https://en.wikipedia.org/wiki/Master_boot_record">MBR</a>,
but only 446 bytes were available<sup><a href="#user-content-fn-1" id="user-content-fnref-1" data-footnote-ref="" aria-describedby="footnote-label">1</a></sup> compared to the floppies' 510.</p>
<p>Unsurprisingly, most modern PCs still support this booting mechanism. However,
some manufacturers have started to remove support for legacy BIOS booting in
favour of UEFI, but that's a story/rant for another time.</p>
<h2>A Minimal Bootable ISO</h2>
<h3>A tiny bit of context</h3>
<p>An ISO file is <em>just</em> a file containing an ISO 9660 file system, which is the
file system that CDs use. PCs boot off CDs via the <code>El Torito</code><sup><a href="#user-content-fn-2" id="user-content-fnref-2" data-footnote-ref="" aria-describedby="footnote-label">2</a></sup>
extension to the ISO 9660 standard.</p>
<p>The format is pretty straightforward:</p>
<pre><code class="language-no-hi">An ISO 9660 with EL TORITO extension
  (the bits to boot a PC at least)
  Offset
  0x0000_ _____________
         |    ....     |
         |  &lt;unused&gt;   |
         |    ....     |
  0x8000_|_____________|
  0x8800_|_primary_vol_|
         |_boot_record_| --.
         |    ....     |    |
         &lt;other volumes&gt;    |  addr
         |    ....     |    | of boot
         |_____________|    | catalog
         |__terminator_|    |
     .-- |_boot_catalog| &lt;-´
     `-&gt; |__boot_image_|
         |    ....     |
         |&lt;rest of the |
         | file system&gt;|
         |    ....     |
          ¯¯¯¯¯¯¯¯¯¯¯¯¯
</code></pre>
<ul>
<li>
<p>The first 0x8000 bytes are unused, go wild and use them however you want.
These bytes were left unused by the specification to allow for other booting
systems to work on CDs. When &quot;burning&quot; an ISO to a thumb drive, this section
generally contains an MBR or the UEFI equivalent.</p>
</li>
<li>
<p>The CD is segmented into fixed-sized segments, in most cases, 2048 bytes each.</p>
</li>
<li>
<p>The first segment used is at offset 0x8000, called the <code>Primary Volume Descriptor</code>.</p>
</li>
<li>
<p>The second segment at offset 0x8800 <em>may</em> be a <code>Boot Record</code>.</p>
</li>
<li>
<p>It is not required to read the CD's filesystem to boot from a CD.</p>
</li>
</ul>
<p>That last point piqued my interest. If you only care about finding something
that looks like a floppy and booting it, you can ignore most of the filesystem.
I wanted to see just how little of the spec I had to implement to build a
<code>Minimal Bootable ISO</code>.</p>
<h3>El Torito basics</h3>
<p>El Torito defines two sections of the CD: the <code>Booting Catalog</code>, which is
composed of multiple entries containing information about one or more bootable
payloads, and the <code>Boot Record Volume</code>, which the BIOS uses to find the boot
catalog.</p>
<h4>Boot Record Volume Descriptor</h4>
<pre><code class="language-no-hi">          Boot Record Volume Descriptor
 _______________________________________________
|Offset_|__type___|____________Desc_____________|
|_0x000_|___u8____|__boot_record_indicator_=_0__|
| 0x001 |         |                             |
|  ...  | [u8; 5] | ISO-9660 identifier =&quot;CD001&quot;|
|_0x005_|_________|_____________________________|
|_0x006_|___u8____|_________version_=_1_________|
| 0x007 |         |   Boot system identifier    |
|  ...  | [u8;32] | =&quot;EL TORITO SPECIFICATION&quot;  |
|_0x026_|_________|_____________________________|
| 0x027 |         |                             |
|  ...  | [u8;32] |     Unused, &quot;must&quot; be 0     |
|_0x046_|_________|_____________________________|
| 0x047 |         |Sector id of the boot catalog|
|  ...  |   u32   |   sec_id * 2048 = offset    |
|_0x04a_|_________|_____________________________|
| 0x04b |         |                             |
|  ...  |[u8;1973]|     Unused, &quot;must&quot; be 0     |
| 0x7ff |         |                             |
 ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
    * All multi byte numbers are in little endian
</code></pre>
<p>The only value that changes is the <code>sector id of the boot catalog</code> (bytes <code>0x47</code>
to <code>0x4a</code>). Everything else is either zeroed or a magic value of some sort.</p>
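<p>The table above can be turned into bytes with very little code. The sketch below is not taken from the linked repository; <code>boot_record</code> and <code>SECTOR_SIZE</code> are names made up for illustration, and the layout follows the offsets listed in the table.</p>

```rust
const SECTOR_SIZE: usize = 2048;

/// Builds the Boot Record Volume Descriptor sector described in the table.
/// `catalog_sector` is the sector id of the boot catalog.
fn boot_record(catalog_sector: u32) -> [u8; SECTOR_SIZE] {
    let mut sector = [0u8; SECTOR_SIZE]; // every unused byte stays 0
    sector[0x00] = 0; // boot record indicator
    sector[0x01..0x06].copy_from_slice(b"CD001"); // ISO-9660 identifier
    sector[0x06] = 1; // version
    let system_id = b"EL TORITO SPECIFICATION"; // 23 bytes, zero padded to 32
    sector[0x07..0x07 + system_id.len()].copy_from_slice(system_id);
    // sector id of the boot catalog, little endian
    sector[0x47..0x4b].copy_from_slice(&catalog_sector.to_le_bytes());
    sector
}
```

<p>Calling <code>boot_record(18)</code> would point the BIOS at a boot catalog stored in sector 18, i.e. at byte offset <code>18 * 2048</code>.</p>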
<h4>The Boot Catalog</h4>
<p>The boot catalog defines where the boot payload(s) are located.
It is stored across one or more segments and is composed of a
series of entries.</p>
<pre><code class="language-no-hi">                    The Boot Catalog
bytes 0x00 ........ 0x1f
 0x00 [Validation Entry] &lt;- makes sure the data is not corrupted
 0x20 [  Initial Entry ] &lt;- contains info about a boot payload
 0x40 [ Section Header ] &lt;- info about section entries (optional)
 0x60 [ Section Entry 1] &lt;- info about a boot image 1 (optional)
 0x80 [  Entry Ext 1   ] &lt;- 13 bytes of data* (optional)
  --  |       :        |
 0x?? [ Section Entry N]
 0x?? [  Entry Ext N   ]

    * Multiple `Entry Ext` can be chained together.
</code></pre>
<p>For an ISO containing only one boot payload, we only need to consider the
<code>Validation Entry</code> and the <code>Initial Entry</code>.</p>
<h3>The Validation Entry</h3>
<p>The validation entry is used to detect if the content is corrupted.</p>
<pre><code class="language-no-hi">              Validation Entry
 ______________________________________________
|Offset|__type___|____________Desc_____________|
|_0x00_|___u8____|________header_id_=_1________|
|_0x01_|___u8____|____platform_id_=(0|1|2)_____|
| 0x02 |   u16   |     Unused, &quot;must&quot; be 0     |
|_0x03_|_________|_____________________________|
| 0x04 |         |                             |
|  ..  | [u8;24] |      manufacturer id        |
|_0x1b_|_________|_____________________________|
| 0x1c |   u16   |      checksum reserved      |
|_0x1d_|_________|_____________________________|
|_0x1e_|___u8____|____________0x55_____________|
| 0x1f |   u8    |            0xaa             |
 ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
</code></pre>
<p><code>Platform id</code> is interesting because it was originally defined as:</p>
<pre><code class="language-rust">#[repr(u8)]
enum PlatformId {
    x86     = 0x0,
    PowerPC = 0x1,
    Mac     = 0x2,
}
</code></pre>
<p>But the Mac platform (pre-Intel, in this case) never implemented booting
off a CD using El Torito. Although not in the standard, <code>0xef</code> is commonly used
to identify bootable images that rely on UEFI.</p>
<p>This is the enum I ended up using:</p>
<pre><code class="language-rust">#[repr(u8)]
pub enum Platform {
    X86 = 0,
    PPC = 1,
    Mac = 2, // mac is never used ?
    UEFI = 0xef, // not part of the spec..
}
</code></pre>
<p>The other noteworthy field is <code>checksum reserved</code>. A checksum is computed by
summing up the whole entry (all 32 bytes) as a list of little-endian <code>u16</code>. This
reserved <code>u16</code> is chosen so that the sum wraps around to zero.</p>
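<p>A minimal sketch of that checksum, assuming the sum covers the 32 bytes of the validation entry read as little-endian words, which is how the El Torito specification words it. <code>validation_entry</code> and <code>word_sum</code> are illustrative names, and the manufacturer id is simply left zeroed.</p>

```rust
/// Wrapping sum of `bytes` read as little-endian 16 bit words.
fn word_sum(bytes: &[u8]) -> u16 {
    bytes
        .chunks_exact(2)
        .fold(0u16, |acc, w| acc.wrapping_add(u16::from_le_bytes([w[0], w[1]])))
}

/// Builds a validation entry whose words sum to zero.
fn validation_entry(platform_id: u8) -> [u8; 32] {
    let mut entry = [0u8; 32];
    entry[0x00] = 1; // header id
    entry[0x01] = platform_id;
    entry[0x1e] = 0x55;
    entry[0x1f] = 0xaa;
    // pick the reserved word so that the total wraps around to 0
    let checksum = 0u16.wrapping_sub(word_sum(&entry));
    entry[0x1c..0x1e].copy_from_slice(&checksum.to_le_bytes());
    entry
}
```

<p>For the x86 platform id (0) with an empty manufacturer id, the reserved word works out to the bytes <code>0xaa, 0x55</code>, right before the <code>0x55, 0xaa</code> signature.</p>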
<h3>The Initial Entry</h3>
<p>The second entry in the catalog is the initial entry; it contains info on a
segment containing a bare metal 16-bit &quot;real mode&quot; executable and how to load
it into memory.</p>
<pre><code class="language-no-hi">                Initial Entry
 ______________________________________________
|Offset|__type___|____________Desc_____________|
|_0x00_|___u8____|_boot_indicator_=(0x88|0x00)_|
|_0x01_|___u8____|___boot_media_type_=(0..=4)__|
| 0x02 |   u16   |      Load Segment addr      |
|_0x03_|_________|_____________________________|
|_0x04_|___u8____|_________system_type_________|
|_0x05_|___u8____|_____Unused_&quot;must&quot;_be_0______|
| 0x06 |   u16   |       Sector Count          |
|_0x07_|_________|_____________________________|
| 0x08 |         |    Block address of the     |
|  ..  |   u32   |         bootloader          |
|_0x0b_|_________|_____________________________|
| 0x0c |         |                             |
|  ..  | [u8;20] |     Unused &quot;must&quot; be 0      |
| 0x1f |         |                             |
 ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
</code></pre>
<p>A boot indicator of value <code>0x88</code> marks the entry as bootable, which, in
practice, is almost always set. The boot media type lets the BIOS expose this
sector to the executable as if it were a floppy, a hard drive or a CD. This
lets older operating systems like DOS boot and read data from a CD as if it
were a floppy without needing any extra drivers.</p>
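<p>The initial entry can be serialised the same way as the other structures. This is only a sketch based on the offsets in the table; <code>initial_entry</code> and its parameter names are invented here, and the media type is passed as a raw byte rather than a proper enum.</p>

```rust
/// Builds a bootable Initial Entry following the table above.
/// `media_type` is the raw boot media type (0 to 4), `sector_count` the number
/// of emulated sectors to load and `load_block` the block address of the bootloader.
fn initial_entry(media_type: u8, sector_count: u16, load_block: u32) -> [u8; 32] {
    let mut entry = [0u8; 32]; // unused bytes stay 0
    entry[0x00] = 0x88; // boot indicator: bootable
    entry[0x01] = media_type;
    // 0x02..0x04: load segment, left at 0 to request the default
    // 0x04: system type, left at 0
    entry[0x06..0x08].copy_from_slice(&sector_count.to_le_bytes());
    entry[0x08..0x0c].copy_from_slice(&load_block.to_le_bytes());
    entry
}
```

<p>Assuming media type 2 stands for a 1.44 MB floppy, <code>initial_entry(2, 4, 19)</code> describes a payload of four emulated sectors stored at block 19.</p>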
<h4>Sector count</h4>
<p><code>Sector count</code> tells the BIOS how many sectors of the emulated device it should
load into memory. This lets you load more than one floppy segment. In CD mode,
this would let you load up to <code>128MB</code> of data into memory, crushing the meagre
<code>510B</code> (if that) stage 1 bootloaders need to restrict themselves to.</p>
<h2>Booting a payload larger than 512 bytes</h2>
<p>I uploaded the code I used to build my <code>Minimal Bootable ISO</code> to GitHub under
<a href="https://github.com/lorlouis/iso9660">https://github.com/lorlouis/iso9660</a>. Calling <code>make</code> will create a disk image
and run it via QEMU.
<a href="https://github.com/lorlouis/iso9660/blob/main/src/bin/bootable.rs"><code>src/bin/bootable.rs</code></a>
contains the steps to create the ISO. The steps loosely resemble:</p>
<ol>
<li>Create a primary header</li>
</ol>
<pre><code class="language-rust">let primary_header = VD {
    ty: VDType::PrimaryVD,
    version: 1,
};
</code></pre>
<p>This is needed as <a href="https://www.seabios.org/SeaBIOS">SeaBIOS</a>, the default i386
BIOS implementation in QEMU, checks to see if what's in the CD drive <em>really</em>
is a CD.</p>
<ol start="2">
<li>Create a boot record of the El Torito variety</li>
</ol>
<pre><code class="language-rust">let boot_record = BootRecord::el_torito(18);
</code></pre>
<p><code>18</code> here denotes sector 18, where the boot catalog will be placed. The
first 16 sectors (0 to 15) are unused, the primary volume descriptor uses sector 16,
and sector 17 is the boot record, which leaves sector 18 free.</p>
<ol start="3">
<li>Create a validation entry</li>
</ol>
<pre><code class="language-rust">let validation = ValidationEntry {
    header_id: 1,
    platform_id: Platform::X86,
    manufacturer_id: None,
};
</code></pre>
<p>SeaBIOS does not check the sector's checksum so the <code>checksum reserved</code>
field is filled with 0s.</p>
<ol start="4">
<li>Create the initial entry</li>
</ol>
<pre><code class="language-rust">let initial = InitialEntry {
    boot_indicator: BootIndicator::Bootable,
    boot_media: BootMedia::Floppy1_44,
    load_segment: 0, // ie default value (I know it should be an option)
    sys_type: 0,  // no idea what it's supposed to be, idk it felt right
    sector_count: 4, // hmm interesting
    virtual_disk_addr: 19, // the last segment
};
</code></pre>
<p><code>sector_count</code> is set to 4 because of the boot media emulation. Floppy sectors
are 512 bytes long, and a CD sector is 2048 bytes long, so 4 emulated sectors cover
exactly one CD sector. It is possible to load more than that, but I did not see any
need for this proof of concept.</p>
<ol start="5">
<li>The last step is to concatenate the files into an ISO</li>
</ol>
<pre><code class="language-make"># create the 20 sectors required
dd if=/dev/zero of=$(ISO_FILE) count=20 bs=2048
# copy iso data into sectors 16, 17 and 18
dd if=$(ISO_DATA) of=$(ISO_FILE) seek=16 count=3 bs=2048 conv=notrunc
# copy stage 1
dd if=$(STAGE1_BIN) of=$(ISO_FILE) seek=$((19*4)) count=4 bs=512 conv=notrunc
</code></pre>
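<p>The same layout can be expressed without <code>dd</code>. The sketch below mirrors the three commands above in memory; <code>assemble_iso</code> is a hypothetical helper, not something that exists in the linked repository.</p>

```rust
const SECTOR_SIZE: usize = 2048;

/// Lays out a 20 sector image: `iso_data` (primary volume descriptor,
/// boot record, boot catalog) starts at sector 16, `stage1` at sector 19.
fn assemble_iso(iso_data: &[u8], stage1: &[u8]) -> Vec<u8> {
    assert!(iso_data.len() <= 3 * SECTOR_SIZE, "iso data must fit in sectors 16 to 18");
    assert!(stage1.len() <= SECTOR_SIZE, "stage 1 must fit in sector 19");

    // 20 zeroed sectors, like the first `dd` call
    let mut image = vec![0u8; 20 * SECTOR_SIZE];

    let iso_start = 16 * SECTOR_SIZE;
    image[iso_start..iso_start + iso_data.len()].copy_from_slice(iso_data);

    // same byte offset as seeking 19 * 4 blocks of 512 bytes
    let stage1_start = 19 * SECTOR_SIZE;
    image[stage1_start..stage1_start + stage1.len()].copy_from_slice(stage1);

    image
}
```

<p>Writing the returned buffer to a file should give the same 40960 byte image as the <code>dd</code> recipe, as long as the inputs are the same.</p>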
<p>The executable I loaded in the last sector was generated from this assembly:</p>
<pre><code class="language-x86asm">org 0x7c00 ; address at which the bios will load this executable
bits 16 ; 16 bit mode

    ; initialise pointers
    mov ax, 0
    mov ds, ax ; data segment 0
    mov ss, ax ; stack segment 0
    mov es, ax ; extra segment 0?
    mov sp, 0x7c00 ; set stack pointer at the start of this executable

_start:
    mov si, hello
    call puts
    jmp other ; jump into code after the 512th byte

; si=pointer to a NUL-terminated string
puts:
    lodsb
    or al, al
    jz .done
    call putc
    jmp puts
.done:
    ret

; al=char
putc:
    mov ah, 0eh
    int 10h
    ret

hello: db 'hello world!', 10, 13, 0
hello_len: equ $-hello

meme: db 'hello meme!', 0
meme_len: equ $-meme

times 510 - ($ - $$) db 0 ; fill with 0s until bytes 511
db 0x55, 0xaa ; mark the sector as bootable by setting the bytes 511 and 512

other:
    mov si, meme
    call puts
    hlt

times 2048 - ($ - $$) db 0 ; fill the rest of the disk sector with 0s
</code></pre>
<h2>Is this even remotely useful?</h2>
<p><strong>No.</strong></p>
<p>Most of the questions asking how to boot off more than 512 bytes come from
people trying to avoid writing a multi-stage bootloader, even though there are
many benefits to separating your bootloader into stages. This article details a
quirk of booting off a CD on the PC platform. None of it applies to ISOs burnt
to USB drives or booting from a hard drive and thus would still require you to
implement multi-stage booting.</p>
<section data-footnotes="" class="footnotes"><h2 id="footnote-label" class="sr-only">Footnotes</h2>
<ol>
<li id="user-content-fn-1">
<p><a href="https://en.wikipedia.org/wiki/Master_boot_record#Sector_layout">https://en.wikipedia.org/wiki/Master_boot_record#Sector_layout</a> <a href="#user-content-fnref-1" data-footnote-backref="" aria-label="Back to content" class="data-footnote-backref">↩</a></p>
</li>
<li id="user-content-fn-2">
<p><a href="https://pdos.csail.mit.edu/6.828/2014/readings/boot-cdrom.pdf">https://pdos.csail.mit.edu/6.828/2014/readings/boot-cdrom.pdf</a> <a href="#user-content-fnref-2" data-footnote-backref="" aria-label="Back to content" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>
</main><footer><div id="page_link_div"></div><p id="copyright">Found a typo?<a href="https://www.github.com/lorlouis/blog"> open a pr!</a><br>copyright Louis Sven Goulet 2023-2026</p></footer></body></html>]]></description></item><item><title><![CDATA[Building a blog with Rust in 2023 – (the stupid way)]]></title><link>https://louissven.xyz/article/building_a_blog_with_rust_in_2023.md</link><description><![CDATA[<!DOCTYPE html><html><head><base href="/"><link rel="stylesheet" href="data/site.css"><!--<meta name="viewport" content="width=device-width">--><link rel="apple-touch-icon" sizes="180x180" href="/data/favicon_io/apple-touch-icon.png"><link rel="icon" type="image/png" sizes="32x32" href="/data/favicon_io/favicon-32x32.png"><link rel="icon" type="image/png" sizes="16x16" href="/data/favicon_io/favicon-16x16.png"><link rel="manifest" href="/data/favicon_io/site.webmanifest"><title>Building a blog with Rust in 2023</title><meta charset="UTF-8"><meta name="description" content="(the stupid way)"><meta name="author" content="Louis"><link rel="stylesheet" href="/data/highlight/styles/nord.min.css"><script src="/data/highlight/highlight.min.js"></script><script>hljs.highlightAll();</script></head><body><header><div id="header_top_div"><a href="/" class="header_element">home</a><a href="/articles" class="header_element">articles</a><a href="/data-policy" class="header_element">data policy</a><a href="https://github.com/lorlouis" class="header_element">github</a><a href="/rss" class="header_element">rss</a></div></header><main><h1>Building a blog with rust in 2023</h1>
<p>As per the name, this blog is <em>imperfect</em>. I built it with the idea of making it
as easy as possible for me to get something released without <em>bikeshedding</em> too
much. Building it in Rust made it easier to do that, but it still came at a
cost.</p>
<h2>Requirements</h2>
<p>There were a few things I was not willing to compromise on:</p>
<ul>
<li>I don't want to have to write HTML to write blog posts, markdown all the way</li>
<li>I don't want to have to deal with NGINX (too many knobs to tune)</li>
<li>I don't want to rely on any JavaScript (I'm already making you read bad
prose, I can't also have you run awful JS)</li>
<li>No PHP (it's personal, I just don't like writing PHP)</li>
<li>HTTP and HTTPS</li>
</ul>
<h2>Nice to have</h2>
<ul>
<li>Only one binary with minimal configuration</li>
<li>Adding an article should not require me to restart the server</li>
<li>It should run on Linode's base tier machine AKA: a toaster</li>
<li>I'd like to be able to mix code and templates (PHP style)</li>
<li>No need for docker</li>
<li>It should be readable in <a href="http://links.twibright.com/">Links</a></li>
</ul>
<h2>The options I considered</h2>
<h3>Any static site generator built around Markdown</h3>
<p>Let's be honest. This blog is mainly composed of static pages. I could have
gotten away with using a static website generator and hosting the
HTML files on GitHub or something. But I wanted to avoid having individual HTML
pages for each page of &lt;/articles&gt;. I looked into a couple of options, but
honestly, it looked more fun to build my own thing than to use someone else's.</p>
<h3>Briefly considering Python</h3>
<p>I looked into using Python, more specifically
<a href="https://flask.palletsprojects.com/en/2.2.x/">Flask</a>, as I have used it in the
past, but in my experience maintaining Python projects, they tend to <em>rot</em>. With
Python lacking a way to pin down a version of a dependency, and Pip being
generally unhygienic, I found that trying to get a Python project deployed
outside a Docker container is more complicated than it should be. And I did not
want to have to use Docker. I know there are ways to make it work, but I did not
want to end up in dependency hell, and there was something else I wanted to try...</p>
<h3>What about <del>re</del>writing it in Rust?</h3>
<p>I've been working in Rust for the last year or so at my <code>$JOB</code>, and I must
say, it has grown on me. Some parts of the language are not as mature as I'd
like them to be (custom allocators, async traits, etc.), but overall I'd say
it's a good replacement for projects where you'd typically reach for C++ or
Java. I've heard a few people talking about using it to build server backends,
and I wanted to learn more about the state of Rust frameworks for the web.</p>
<h2>The popular Rust web frameworks</h2>
<p>Early on, I learned about different web frameworks, mostly through <a href="https://github.com/flosse/rust-web-framework-comparison">Flosse's
Rust web framework comparison</a>. The list makes it easy to know if a given
framework supports a common feature, and I encourage anyone who thinks about
using Rust to build web applications to give it a look. It contains a list of
both frontend frameworks and backend frameworks. The frontend frameworks compile
to WebAssembly; even though it's not JavaScript, I still wanted to steer clear
of requiring the user to run code to display this website.</p>
<h3>Rocket</h3>
<p>Even though <a href="https://rocket.rs/">Rocket</a> is marked as &quot;outdated&quot; in <a href="https://github.com/flosse/rust-web-framework-comparison">Flosse's
list</a>, development
seems to still be going strong. In fact, the most recent commits are only about
2 weeks old, at the time of writing. Rocket is very much a &quot;batteries included&quot;
type of framework. I was quickly able to get an early version of the
&lt;/articles&gt; page going, but I ended up not using it because I found the number
of dependencies a bit too high and the build times (on my old decaying laptop)
too slow for my liking. It looks to be a great framework that comes with
everything you would need to build complex websites with forms and stuff, but
it felt overkill for my usage.</p>
<h3>Warp</h3>
<p>I was looking for a framework that would be a tad smaller. Having used
<a href="https://docs.rs/warp/latest/warp/">warp</a> at my <code>$JOB</code> before, I briefly
considered it. Warp is built on top of Rust generics and its type system. This
means that a lot of it feels like magic: just add a few filters and some
<code>serde::Deserialize</code> implementing types, and you'll have a working API
endpoint in no time... Except that warp, due to its <em>liberal</em> use of generics,
contributes a lot to the overall time it takes to build our projects. But my
biggest gripe with warp is that when things go wrong (which is at compile time,
at least) it generates compile errors that compete with some of the worst C++
template errors I've had the displeasure of seeing.</p>
<h3>Actix Web</h3>
<p><a href="https://actix.rs/docs/whatis">Actix Web</a> describes itself as a
&quot;micro-framework&quot; (much like Flask is often described). It handles routing,
HTTP/1, HTTP/2, HTTPS and typed URL query parameters (<code>q?key=value</code>). The only
thing it was missing for my needs was templating. It required an <code>async main</code>
since it's built on top of Tokio, but it does not manage every part of the
program the same way Rocket would. To me, it felt easier to compose with other
libraries, so I stuck with it.</p>
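<p>To give an idea of what a &quot;typed&quot; query means, here is a standard-library-only
sketch. It does not use Actix Web (which does this for you through its serde-based
extractors); the <code>ArticleQuery</code> struct and the <code>parse_query</code>
helper are hypothetical names of mine, and the sketch skips percent-decoding
entirely.</p>
<pre><code class="language-rust">use std::collections::HashMap;

/// Hypothetical typed view of a query such as `page=2&amp;tag=rust`.
#[derive(Debug, PartialEq)]
struct ArticleQuery {
    page: u32,
    tag: Option&lt;String&gt;,
}

/// Parses the part after `?` into an `ArticleQuery`.
/// `page` is required and must be a number; `tag` is optional.
fn parse_query(raw: &amp;str) -&gt; Result&lt;ArticleQuery, String&gt; {
    // Split `a=1&amp;b=2` into key/value pairs, ignoring pairs without `=`.
    let pairs: HashMap&lt;&amp;str, &amp;str&gt; = raw
        .split('&amp;')
        .filter_map(|pair| pair.split_once('='))
        .collect();

    let page = pairs
        .get(&quot;page&quot;)
        .ok_or(&quot;missing `page`&quot;)?
        .parse::&lt;u32&gt;()
        .map_err(|e| format!(&quot;bad `page`: {e}&quot;))?;

    Ok(ArticleQuery {
        page,
        tag: pairs.get(&quot;tag&quot;).map(|s| s.to_string()),
    })
}

fn main() {
    let query = parse_query(&quot;page=2&amp;tag=rust&quot;).unwrap();
    assert_eq!(
        query,
        ArticleQuery { page: 2, tag: Some(&quot;rust&quot;.to_string()) }
    );
    // A value of the wrong type is rejected before any handler code runs.
    assert!(parse_query(&quot;page=two&quot;).is_err());
    println!(&quot;{query:?}&quot;);
}
</code></pre>
<p>The appeal of having the framework do this is that the handler only ever sees
a value that already has the right shape.</p>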
<h2>HTML templating</h2>
<p>When I was experimenting with Rocket, I also tried
<a href="https://docs.rs/handlebars/latest/handlebars/">handlebars</a> as the main
templating engine. It worked well, but to me, it felt awkward to have the code
and the format in 2 different places. The last time I did any kind of web
development was back in college, most of which was done with PHP. Although I
don't really like PHP, there was one thing I really liked (and apparently other
people don't): you can mix HTML and code.</p>
<h3>typed-html</h3>
<p>When I found <a href="https://github.com/bodil/typed-html">typed-html</a>, it seemed to be
exactly what I was looking for: I could embed HTML with the <code>html!</code> macro
and use Rust expressions within that macro to build web pages
server-side. The first page I built was the &lt;/articles&gt; page, and I quickly ran
into a limitation of typed-html. Because its goal is to make it easy to
build correct HTML through type safety, it won't allow you to use a code block
as the first child of certain tags.</p>
<pre><code class="language-html">&lt;!-- Not allowed --&gt;
&lt;head&gt; { /* rust code */ } &lt;/head&gt;

&lt;!-- Ok  --&gt;
&lt;head&gt;
    &lt;h1&gt;&quot;A title&quot;&lt;/h1&gt;
    { /* rust code */ }
&lt;/head&gt;
</code></pre>
<p>This is done so that it can guarantee a certain level of correctness, i.e. no
<code>&lt;head&gt;</code>s in <code>&lt;head&gt;</code>s. I wanted to have functions to define common headers and
common footers for each page, but this limitation made it pretty awkward. One
thing I took away from the experiment (probably the wrong one based on the
library's name): I could use Rust macros to embed arbitrary tokens in my Rust
code.</p>
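<p>Here is a minimal sketch of that idea, using declarative
<code>macro_rules!</code> and nothing outside the standard library. It is not how
typed-html works, and the toy <code>html!</code> and <code>html_inner!</code> macros
below are my own illustration: they only understand bare tags (no attributes),
string literals and <code>{ }</code> expression blocks, and they do no validation
at all.</p>
<pre><code class="language-rust">/// Internal helper: consumes the tokens one piece at a time and appends
/// the matching text to the `String` named by `$out`.
macro_rules! html_inner {
    // No tokens left: stop recursing.
    ($out:ident;) =&gt; {};
    // Closing tag, e.g. `&lt;/p&gt;`.
    ($out:ident; &lt; / $tag:ident &gt; $($rest:tt)*) =&gt; {
        $out.push_str(concat!(&quot;&lt;/&quot;, stringify!($tag), &quot;&gt;&quot;));
        html_inner!($out; $($rest)*);
    };
    // Opening tag, e.g. `&lt;p&gt;`.
    ($out:ident; &lt; $tag:ident &gt; $($rest:tt)*) =&gt; {
        $out.push_str(concat!(&quot;&lt;&quot;, stringify!($tag), &quot;&gt;&quot;));
        html_inner!($out; $($rest)*);
    };
    // Code block: any Rust expression whose value implements `ToString`.
    ($out:ident; { $e:expr } $($rest:tt)*) =&gt; {
        $out.push_str(&amp;$e.to_string());
        html_inner!($out; $($rest)*);
    };
    // Plain text, written as a string literal.
    ($out:ident; $text:literal $($rest:tt)*) =&gt; {
        $out.push_str($text);
        html_inner!($out; $($rest)*);
    };
}

/// Entry point: evaluates to a `String` containing the rendered markup.
macro_rules! html {
    ($($body:tt)*) =&gt; {{
        let mut out = String::new();
        html_inner!(out; $($body)*);
        out
    }};
}

fn main() {
    let title = &quot;An imperfect blog&quot;;
    let page = html! {
        &lt;main&gt;
            &lt;h1&gt; { title } &lt;/h1&gt;
            &lt;p&gt; &quot;posts: &quot; { 1 + 2 } &lt;/p&gt;
        &lt;/main&gt;
    };
    assert_eq!(
        page,
        &quot;&lt;main&gt;&lt;h1&gt;An imperfect blog&lt;/h1&gt;&lt;p&gt;posts: 3&lt;/p&gt;&lt;/main&gt;&quot;
    );
    println!(&quot;{page}&quot;);
}
</code></pre>
<p>Since this toy version recurses once per tag, literal or block, a large page
would run into the compiler's macro recursion limit unless
<code>recursion_limit</code> is raised.</p>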
<h2>Building my own version of the <code>html!</code> macro</h2>
<p>I wanted to be able to reference variables and evaluate expressions, not just
enumerations, in <code>{ }</code> brackets. I found typed-html pretty limiting, and I
hit maximum recursion a few times while trying to build fairly simple pages, so I
had to write my own. I won't go into details, but I made the code available on
GitHub under <a href="https://github.com/lorlouis/html_template">https://github.com/lorlouis/html_template</a>. The code is
definitely not perfect, but it worked well enough to build this blog.</p>
<p>With that, I had all the elements I needed to build this blog.</p>
<p>Here's part of the code I use to turn a Markdown file into an article:</p>
<pre><code class="language-rust">...
let body: Root = html!{
    &lt;!DOCTYPE html&gt;
    &lt;html&gt;
        &lt;head&gt;
            {common_head(real_title.clone(), author.cloned(), blurb.cloned())}
        &lt;/head&gt;
        &lt;body&gt;
            &lt;header&gt;
            { common_header() }
            &lt;/header&gt;
            &lt;main&gt;
            { markdown.to_html() }
            &lt;/main&gt;
            &lt;footer&gt;
            { common_footer() }
            &lt;/footer&gt;
        &lt;/body&gt;
    &lt;/html&gt;
}.into();
...
</code></pre>
<h2>Final Thoughts</h2>
<p>In the end, I built a fairly unsophisticated blog using mostly pre-existing
libraries. The downside of this approach is that, even though I was paying
attention to not pull in too many dependencies, I now depend on 168 external
crates. Using Actix Web made routing and handling query parameters
really easy. I'm also glad I built
<a href="https://github.com/lorlouis/html_template">html_template</a> as it was the first
time I had ever used Rust's proc-macros, and it made building HTML pages
in-code much easier.</p>
</main><footer><div id="page_link_div"></div><p id="copyright">Found a typo?<a href="https://www.github.com/lorlouis/blog"> open a pr!</a><br>copyright Louis Sven Goulet 2023-2026</p></footer></body></html>]]></description></item><item><title><![CDATA[An imperfect blog – for the sake of getting it out-there]]></title><link>https://louissven.xyz/article/an_imperfect_blog.md</link><description><![CDATA[<!DOCTYPE html><html><head><base href="/"><link rel="stylesheet" href="data/site.css"><!--<meta name="viewport" content="width=device-width">--><link rel="apple-touch-icon" sizes="180x180" href="/data/favicon_io/apple-touch-icon.png"><link rel="icon" type="image/png" sizes="32x32" href="/data/favicon_io/favicon-32x32.png"><link rel="icon" type="image/png" sizes="16x16" href="/data/favicon_io/favicon-16x16.png"><link rel="manifest" href="/data/favicon_io/site.webmanifest"><title>An imperfect blog</title><meta charset="UTF-8"><meta name="description" content="for the sake of getting it out-there"><meta name="author" content="Louis"><link rel="stylesheet" href="/data/highlight/styles/nord.min.css"><script src="/data/highlight/highlight.min.js"></script><script>hljs.highlightAll();</script></head><body><header><div id="header_top_div"><a href="/" class="header_element">home</a><a href="/articles" class="header_element">articles</a><a href="/data-policy" class="header_element">data policy</a><a href="https://github.com/lorlouis" class="header_element">github</a><a href="/rss" class="header_element">rss</a></div></header><main><h1>An imperfect blog</h1>
<p>I have trouble finishing projects. My <code>Projects/</code> directory is filled
with half-finished parsers, barely working game engines, three slightly different
versions of a web server and more than a dozen other side projects I started
but never finished. This blog is no exception; I started considering having
my own website two years ago, and in the meantime, the only things I built towards
that goal are:</p>
<ol>
<li>A web server written in C that I gave up on while trying to handle multiple
connections per thread</li>
<li>A few CGI scripts to learn how CGI works</li>
<li>And a barely functioning templating system built around the C preprocessor</li>
</ol>
<p>As you can see, I made zero progress on <strong>actually</strong> writing a blog. I thought
about it and decided I should start finishing projects more often.</p>
<h2>The average life cycle of a project</h2>
<pre><code class="language-no-hi">      Get a project idea
              ¦ &lt;---------------.
              v                  \
 Try and build the &quot;hard&quot; part    \
        /             \            ¦
   [success]       [failure]       ¦
       v               v          /
Get bored and move   learn more  /
to another project   on the subject

    ??? -&gt; Finished project
</code></pre>
<p>Whenever I start a new project, I tend to try to build what I consider the
&quot;hardest&quot; part first. This means that when I made a 2.5D &quot;engine&quot; <em>à la</em>
Wolfenstein, I only wrote the bare minimum to get to the code that renders a maze.
This meant that I learned a lot about how to write a raycaster, and I even
got it to &quot;pan&quot; the camera up and down. But once I got that working, I completely
lost interest and started a new project to learn about some other concept I
thought was interesting at the time.</p>
<h2>Chasing perfection</h2>
<p>Usually, when I work on a project, it is to learn about some concept I
heard of recently. 99% of my projects are never intended to be shared, but that
was not always the case. I used to put whatever I was working on, finished
or not, in a public repo on my GitHub. As I learned more, I became critical of
my code. It allowed me to improve my craft considerably, but it came at a cost:
rereading a piece of code a few months after having written it makes me ashamed.</p>
<h2>Getting myself to finish <em>something</em></h2>
<p>I don't want to push myself to finish every project I start; otherwise I'll end
up not starting any new projects. I think I should prioritise certain projects,
such as this blog, and get them released.</p>
<p>If you can read this, I managed to get something imperfect out the door.</p>
</main><footer><div id="page_link_div"></div><p id="copyright">Found a typo?<a href="https://www.github.com/lorlouis/blog"> open a pr!</a><br>copyright Louis Sven Goulet 2023-2026</p></footer></body></html>]]></description></item></channel></rss>