Adventures in Preformatted Text
I improved the markup generated for terminal output. A fenced block like this:
```output
terminal output
```
Normally produces markup similar to:
<pre class="language-output"><code language="language-output">terminal output</code></pre>
But terminal output is quoted program output, not code, so it should use `samp` instead of `code`. To fix this, I’ve added a post-processing step. The same Markdown now produces:
<pre class="output"><samp class="">terminal output</samp></pre>
Which can be styled differently from code if I so choose. (The empty class attribute is a bit of
laziness on my part and could be omitted.)
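A post-processing step along these lines could look like the following. This is a minimal sketch rather than the code actually used on this site: it assumes the tree is made of plain hast-style objects, the function name `markOutputBlocks` is hypothetical, and it drops the class from `samp` instead of leaving it empty.

```javascript
// Hypothetical post-processing pass over a hast-style tree of plain objects.
// Rewrites <pre class="language-output"><code> into <pre class="output"><samp>.
function markOutputBlocks(node) {
  if (node.type === "element" && node.tagName === "pre") {
    const classes = (node.properties && node.properties.className) || [];
    if (classes.includes("language-output")) {
      // Replace the language class on the <pre> with a plain "output" class.
      node.properties = { ...node.properties, className: ["output"] };
      for (const child of node.children || []) {
        if (child.type === "element" && child.tagName === "code") {
          child.tagName = "samp";
          // Drop the class entirely instead of leaving it empty.
          const { className, ...rest } = child.properties || {};
          child.properties = rest;
        }
      }
      return node;
    }
  }
  // Not an output block: keep looking further down the tree.
  for (const child of node.children || []) {
    markOutputBlocks(child);
  }
  return node;
}
```

Blocks in any other language are left untouched, so ordinary code keeps its `code` element.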
One complication is that a shell session might intermingle commands typed by the user with the program's output, so it might be more correct to mark up parts of the session with `samp` and parts with `kbd`. However, Markdown has no support for anything like this, and I can't imagine a straightforward syntax for it. I also don't think the current markup is strictly incorrect: technically, the shell session includes the result of the user pressing keys.
Line numbers and code blocks
I wanted to add line numbers to code blocks. The hitch was that I would need to do it after converting them to HTML, by adding `span` elements around each line. I expected this to require a complicated algorithm that kept track of the current stack of open elements. Then I realized I could do it
in a more roundabout way, with a remark plugin that:
- Converts the contents of each `pre` into a string.
- Splits it into lines.
- Uses an existing library to parse each line, automatically closing open tags at the end.
- Converts those parsed lines back into strings.
- Parses those strings into the hast format remark needs.
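The goal of these steps is that every line ends up as a self-contained fragment, even when a highlighting `span` crosses a line break. The sketch below illustrates that effect without any library, by tracking open tags by hand. It stands in for the library-based parsing step and is not the plugin itself. The function name `splitHighlightedLines` is hypothetical, and it assumes the input contains only paired tags, with `<` and `>` escaped in text and no `>` inside attribute values.

```javascript
// Hypothetical helper: split highlighted HTML into lines, closing any open
// tags at the end of each line and reopening them at the start of the next.
function splitHighlightedLines(html) {
  // Tokens are either a tag (<...>) or a run of text between tags.
  const tokens = html.match(/<\/?[^>]+>|[^<]+/g) || [];
  const openTags = []; // stack of { name, source } for currently open tags
  const lines = [""];

  for (const token of tokens) {
    if (token.startsWith("</")) {
      // Closing tag: pop the stack and keep the tag on the current line.
      openTags.pop();
      lines[lines.length - 1] += token;
    } else if (token.startsWith("<")) {
      // Opening tag: remember it so it can be reopened on later lines.
      const name = token.match(/^<([^\s>]+)/)[1];
      openTags.push({ name, source: token });
      lines[lines.length - 1] += token;
    } else {
      // Text: every newline ends the current line and starts a new one.
      token.split("\n").forEach((part, index) => {
        if (index > 0) {
          // Close open tags innermost first.
          for (let i = openTags.length - 1; i >= 0; i--) {
            lines[lines.length - 1] += `</${openTags[i].name}>`;
          }
          // Reopen them, outermost first, on the new line.
          lines.push(openTags.map((tag) => tag.source).join(""));
        }
        lines[lines.length - 1] += part;
      });
    }
  }
  return lines;
}
```

For example, `splitHighlightedLines('<span class="c">a\nb</span> c')` yields two balanced fragments: `<span class="c">a</span>` and `<span class="c">b</span> c`.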
I tried this approach with sanitize-html, which didn’t work because it double-escaped `>`, and then clean-html, which didn’t preserve whitespace. Through trial and error, I determined I could instead parse the incomplete lines with parse5, the output from which can be directly converted to hast.
This worked to wrap the lines in `span` elements. Then I added the numbers. I found myself adding another layer of `span` elements around the contents of each line for styling purposes, and had to make the remark plugin communicate with the CSS through `style` attributes and Custom Properties (though this would have been unnecessary if CSS subgrid were supported), but I managed to get it working.
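To illustrate the kind of handoff involved, here is a minimal sketch and not the code from the repository. It assumes the plugin passes the width of the widest line number to the stylesheet; the function name `numberLines`, the class names `line` and `line-content`, and the custom property `--line-number-digits` are all hypothetical.

```javascript
// Hypothetical helper: wrap each line's hast children in two layers of spans
// and expose the number of digits in the largest line number to the CSS.
function numberLines(lines) {
  // Number of digits in the largest line number, e.g. 2 for a 12-line block.
  const digits = String(lines.length).length;
  return {
    type: "element",
    tagName: "code",
    // The custom property travels to the stylesheet via the style attribute.
    properties: { style: `--line-number-digits: ${digits}` },
    children: lines.map((children, index) => ({
      type: "element",
      tagName: "span",
      // In hast, the `dataLine` property serializes to a `data-line` attribute.
      properties: { className: ["line"], dataLine: String(index + 1) },
      children: [
        {
          type: "element",
          tagName: "span",
          properties: { className: ["line-content"] },
          children,
        },
      ],
    })),
  };
}

// CSS that could consume these values (illustrative only):
//   .line::before {
//     content: attr(data-line);
//     display: inline-block;
//     width: calc(var(--line-number-digits) * 1ch);
//   }
```

The inner `line-content` span gives the stylesheet a separate hook for the text of each line, apart from its number.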
This revealed three new issues. First, my main reason for wanting to split the blocks into lines was so I could use `white-space: pre-wrap` and avoid horizontal scrollbars, but the blocks looked very odd with that change. Second, when I reverted to the default of `white-space: pre`, the blocks collapsed into a single line. Why? Because, unaware that it was parsing preformatted text, parse5 discarded the newline characters I had added between the spans. Third, I had to use `word-break: break-all` to make it wrap the lines properly, which is acceptable in an editor like Emacs, which can display continuation markers in the margins, but doesn’t look right on the web, which can’t.
Now, there are solutions. For example, I could explicitly wrap the lines in pre elements before
passing them to parse5 and extract only the children when returning them to remark… but why do that?
The result is unappealing. Considering the lengths I went to only to arrive at a disappointing
conclusion, I decided to undo the changes, though I made sure to keep them in the repository
history for future inspiration.