> em dashes are, in most locales, not to be surrounded by a space

This is definitely not the case for at least French and Russian, which means markup renderers would now have to guess the text's language or force authors to declare it in some metadata header. And it gets even more complicated when block quotes in other languages are included.

It’s not hard, and it doesn’t need language awareness. I described how to detect it: if there’s no space before an end-of-line em dash, suppress the space that would otherwise replace the segment break.
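A minimal sketch of that rule in Python, assuming the renderer has already split a paragraph into its source lines. The function name `join_segments` is hypothetical, and real CSS white-space processing covers many more cases than this:

```python
def join_segments(lines: list[str]) -> str:
    """Join one paragraph's source lines, turning each segment break into a
    space unless the previous line ends in an em dash with no space before it."""
    parts = [line.strip() for line in lines]
    if not parts:
        return ""
    out = parts[0]
    for prev, line in zip(parts, parts[1:]):
        # "A—" at end of line: the dash is attached, so suppress the space.
        # "mot —" (French/Russian style): the dash is spaced, so keep the space.
        unspaced_dash = len(prev) >= 2 and prev.endswith("—") and not prev[-2].isspace()
        out += ("" if unspaced_dash else " ") + line
    return out


print(join_segments(["A—", "B"]))               # A—B
print(join_segments(["un mot —", "un autre"]))  # un mot — un autre
```

The check only looks at the character before the dash, so spaced and unspaced styles both come out right without the renderer knowing the text's language.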

There seem to be some locales or styles that use asymmetric spacing. From the Zen of Python—note the different spacing depending on context and position within the sentence:

    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    [...]
    Namespaces are one honking great idea -- let's do more of those!

More folks should define their own lightweight markup languages! It’s fun and makes your writing and notes feel more like your own.

I created a convention for defining sub-notes (with frontmatter) in a Markdown note and have found it really helpful over the past few years.

Unicode has U+200B ZERO WIDTH SPACE for that purpose. In HTML and hence Markdown you can also use `<wbr>`. If you’re using a custom setup anyway, you can have it inserted automatically by regex replacement as a pre-rendering step.
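A rough sketch of that regex pre-rendering step in Python; `add_break_opportunities` is a hypothetical helper name. It only inserts U+200B after an em dash that is directly followed by a non-space character, i.e. it adds a line-break opportunity; on its own it does not control whether a newline in the source renders as a space.

```python
import re

# An em dash directly followed by something that is neither whitespace nor
# an existing U+200B (so running the function twice changes nothing more).
_DASH_BEFORE_TEXT = re.compile(r"—(?=[^\s\u200b])")


def add_break_opportunities(text: str) -> str:
    """Insert U+200B ZERO WIDTH SPACE after each unspaced em dash."""
    return _DASH_BEFORE_TEXT.sub("—\u200b", text)
```

Spaced dashes ("A — B") are left alone, since the lookahead rejects a following space.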

I think you’ve misunderstood something? This is about suppressing the turning of a segment break into a space, not about line break opportunities.

> Unicode has U+200B ZERO WIDTH SPACE for that purpose.

ZWSP is not at all “for that purpose”. If you mean this:

  A—&ZeroWidthSpace;
  B
Well, I am mildly surprised to find that no extra space is added in Gecko or Blink. But in WebKit, a space is still added, since this is part of the “UA-defined” bit I quoted.

And if you’re willing to do preprocessing, you can just merge the lines; that would actually work.

> In HTML and hence Markdown you can also use `<wbr>`.

I fail to see how <wbr> is relevant.

Indeed, I skimmed a bit and misread “unable to break” to mean that you wanted a line-break opportunity but the renderer didn’t allow one when a letter directly follows an em dash. But it’s the other way around: you want a line break in the source after an em dash not to turn into a space in the rendering. This could likewise be handled by regex replacement as a pre-rendering step.
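A minimal sketch of such a pre-rendering pass in Python, run over the source before handing it to an unmodified Markdown renderer. `merge_dash_breaks` is a hypothetical name, and as written it would also touch code blocks, which a real pass should probably skip:

```python
import re

# A line break (with any surrounding spaces/tabs) that comes right after an
# unspaced em dash ("A—") and is followed by more text on the next line.
# Blank lines are not matched, so paragraph breaks survive.
_DASH_LINE_BREAK = re.compile(r"(?<=\S—)[ \t]*\n[ \t]*(?=\S)")


def merge_dash_breaks(source: str) -> str:
    """Join lines that end in an unspaced em dash with the following line."""
    return _DASH_LINE_BREAK.sub("", source)


print(merge_dash_breaks("A—\nB"))  # A—B
```

The lookbehind is what keeps the spaced French and Russian style untouched: a dash preceded by a space never matches.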

More generally, I see markup languages and the details of how they are rendered as largely orthogonal. You don’t necessarily need to invent a different markup language in order to adjust the rendering.

I'm a dyed-in-the-wool, responsive, readable, internationalizable, accessible, standards-based enshyenist:

Instead of using an unbreakable em dash to rigidly and unbreakably connect two phrases by their last and first words, I prefer using an en dash, followed by a shy hyphen, and then another en dash, to elegantly hyphenate words connected by em dashes when they don't fit on the line. ;)

&ndash;&shy;&ndash;
