Adventures in Preformatted Text

Published 9th November, 2021 in Technology on A Place For My Head. Tagged Blogging, Colophon, CSS, Eleventy, HTML, JavaScript, Markup & Web Development. #21 in the Colophon: Finding A Place For My Head series.

I improved the markup for terminal output. Take a Markdown block like this:

```output
terminal output
```

It normally produces markup similar to:

<pre class="language-output"><code class="language-output">terminal output</code></pre>

But quoted output belongs in samp rather than code. To fix this, I’ve added a post-processing step. The same Markdown now produces:

<pre class="output"><samp class="">terminal output</samp></pre>

This can be styled differently from code if I so choose. (The empty class attribute is a bit of laziness on my part and could be omitted.)
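As a rough illustration, a post-processing step of this kind might look like the sketch below. It is not the code from this site: the function name `codeToSamp` is hypothetical, and it assumes the markup is available as a hast-style tree of plain objects (`type`, `tagName`, `properties`, `children`, with `className` stored as an array of strings).

```javascript
// Sketch only: rewrite <pre><code class="language-output"> into
// <pre class="output"><samp>. Assumes hast-shaped nodes (an assumption,
// not something the post confirms).
function codeToSamp(node) {
  if (node.type === 'element' && node.tagName === 'pre') {
    const [child] = node.children || [];
    const classes =
      (child && child.properties && child.properties.className) || [];
    if (child && child.tagName === 'code' && classes.includes('language-output')) {
      node.properties = { ...node.properties, className: ['output'] };
      child.tagName = 'samp';
      // Drop the class entirely instead of leaving an empty attribute.
      child.properties = {};
      return node;
    }
  }
  // Otherwise keep looking further down the tree.
  for (const childNode of node.children || []) {
    codeToSamp(childNode);
  }
  return node;
}
```

Blocks in any other language are left untouched, since only a `code` child carrying the `language-output` class is rewritten.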

One complication is that a shell session might interleave commands typed by the user with the output, so it might be more correct to mark parts of the session as samp and parts as kbd. However, Markdown has no support for anything like this, and I can’t imagine a straightforward syntax for it. Nor do I think the current markup is incorrect, strictly speaking: technically, the shell session includes the result of the user pressing keys.

Line numbers and code blocks

I wanted to add line numbers to code blocks. The hitch was that I would need to do it after converting them to HTML, by adding span elements around each line. I expected this to require a complicated algorithm that kept track of the current stack. Then I realized I could do it in a more roundabout way, with a remark plugin that:

1. Converts the contents of each pre into a string.
2. Splits it into lines.
3. Uses an existing library to parse each line, automatically closing open tags at the end.
4. Converts those parsed lines back into strings.
5. Parses those strings into the hast format remark needs.
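To make the split-then-repair idea concrete, here is a dependency-free sketch of the first four steps. The helper `balanceLine` is a hypothetical stand-in for the library step (it is not how any of the libraries mentioned here work internally), `wrapLines` and the `line` class name are likewise made up for illustration, and the regular expression assumes simple highlighter output: escaped text, no void elements, and no `>` inside attribute values.

```javascript
// Hypothetical stand-in for the "existing library" step: repair one line
// fragment by dropping stray closing tags and closing whatever is still open.
function balanceLine(line) {
  const open = []; // stack of tag names opened on this line
  const tagPattern = /<(\/?)([a-zA-Z][\w-]*)[^>]*>/g;
  let out = '';
  let last = 0;
  for (const match of line.matchAll(tagPattern)) {
    const [tag, slash, name] = match;
    out += line.slice(last, match.index);
    last = match.index + tag.length;
    if (!slash) {
      open.push(name);
      out += tag;
    } else if (open.length > 0 && open[open.length - 1] === name) {
      open.pop();
      out += tag;
    }
    // A closing tag with no opener on this line is silently dropped.
  }
  out += line.slice(last);
  // Close anything left open, innermost first.
  while (open.length > 0) {
    out += `</${open.pop()}>`;
  }
  return out;
}

// Split highlighted HTML into lines and wrap each repaired line in a span.
function wrapLines(html) {
  return html
    .split('\n')
    .map((line) => `<span class="line">${balanceLine(line)}</span>`)
    .join('\n');
}
```

One consequence of repairing each line in isolation is that a token spanning several lines, such as a block comment, keeps its highlighting only on the line where its tag opens.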

I tried this approach with sanitize-html, which didn’t work because it double-escaped >, and then clean-html, which didn’t preserve whitespace. Through trial and error, I determined I could instead parse the incomplete lines with parse5, the output from which can be directly converted to hast.

This worked to wrap the lines in spans. Then I added the numbers. For styling purposes, I found myself adding another layer of spans around the contents of each line, and I had to make the remark plugin communicate with the CSS through style attributes and Custom Properties (though this would have been unnecessary if CSS subgrid, as in grid-template-columns: subgrid, were supported). Still, I managed to get it working:

[Image: two formatted code blocks with line numbers.]
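The post doesn't say what the plugin passed to the CSS through the style attribute. One plausible candidate is the width of the number column, so the sketch below assumes that. The function name `numberLines`, the class names, and the `--line-number-digits` property are all hypothetical.

```javascript
// Sketch under stated assumptions: number each line, wrap its contents in a
// second span for styling, and expose the digit count of the largest line
// number to CSS as a custom property on the <pre>.
function numberLines(lines) {
  const digits = String(lines.length).length;
  const body = lines
    .map(
      (line, index) =>
        '<span class="line">' +
        `<span class="line-number">${index + 1}</span>` +
        `<span class="line-content">${line}</span>` +
        '</span>'
    )
    .join('\n');
  return `<pre style="--line-number-digits: ${digits}">${body}</pre>`;
}
```

A stylesheet could then size the number column with something like `width: calc(var(--line-number-digits) * 1ch)`, which keeps the columns aligned without the plugin having to know anything else about the layout.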

This revealed three new issues. First, my main reason for wanting to split the blocks into lines was so I could use white-space: pre-wrap and avoid horizontal scrollbars, but the blocks looked very odd with that change. Second, when I reverted to the default of white-space: pre, the blocks collapsed into a single line. Why? Because, unaware that it was parsing preformatted text, parse5 discarded the newline characters I had added between the spans. Third, I had to use word-break: break-all to make it wrap the lines properly, which is acceptable in an editor like Emacs, which can display continuation markers in the margins, but doesn’t look right on the web, which can’t.

Now, there are solutions. For example, I could explicitly wrap the lines in pre elements before passing them to parse5 and extract only the children when returning them to remark… but why do that? The result is unappealing. Considering the lengths I went to only to arrive at a disappointing conclusion, I decided to undo the changes, though I made sure to keep them in the repository history for future inspiration.