I'm a dyed-in-the-wool, responsive, readable, internationalizable, accessible, standards-based, enshyenist:

Instead of using an unbreakable em dash to rigidly and unbreakably connect two phrases by their last and first words, I prefer using an en dash, followed by a shy hyphen, and then another en dash, to elegantly hyphenate words connected by em dashes when they don't fit on the line. ;)

–­–

reply
Few fonts will render this nicely; the dashes are unlikely to join. Also if it does break at the soft hyphen, you’ve got an extraneous hyphen added on the first line.

If I were doing that, I’d probably use a zero-width space instead of a soft hyphen. Same break opportunity, removes the extraneous undesirable hyphen if it breaks, but introduces a new word boundary so that wordwise selection can now split your wonky dash. Therefore I suggest <span style=user-select:all>–&ZeroWidthSpace;–</span> because if you’re going to do something ridiculous you might as well embrace the ridiculosity.

reply
More folks should define their own lightweight markup languages! It’s fun and makes your writing and notes feel more like your own.

I created a convention for defining sub-notes (with frontmatter) in a Markdown note and have found it really helpful over the past few years.

reply
I used to do this with RST, though a backslash is needed at the end of the line to escape the newline.
reply
I don’t like reStructuredText’s backslash behaviour, because it means two completely different things. Or arguably three. Normally it means to interpret the next character literally, but if it’s followed by whitespace (typically space or newline) it instead removes that next character. Except… not entirely in the case of newline, because it’s character-level markup, and at the end of a block it just does nothing. In

  a\
   b
you might expect to get “a b” or an error, but actually you get a single-item definition list with term “a” and definition “b”, just the same as if you had omitted the backslash.

A far more logical meaning of a trailing backslash is to escape the newline, meaning, in HTML terms, insert <br>. That’s what I chose in my LML, and I later learned CommonMark chose that too.
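
For illustration only, here is a minimal Python sketch of that "trailing backslash means <br>" rule. It is not my LML's actual code and not a CommonMark implementation; the function names are made up, and it assumes the input is already split into the lines of one paragraph. A backslash on the last line of the block is left as a literal backslash, and an even run of trailing backslashes is treated as escaped backslashes rather than a break (left as-is here, not collapsed).

```python
import html


def ends_with_unescaped_backslash(line: str) -> bool:
    """True if the line ends in an odd number of backslashes."""
    stripped = line.rstrip("\\")
    return (len(line) - len(stripped)) % 2 == 1


def render_paragraph(lines: list[str]) -> str:
    """Render the lines of one paragraph to HTML (hypothetical helper)."""
    parts = []
    for i, line in enumerate(lines):
        is_last = i == len(lines) - 1
        if not is_last and ends_with_unescaped_backslash(line):
            # Trailing backslash before a newline: emit a hard line break.
            parts.append(html.escape(line[:-1]) + "<br>\n")
        else:
            # Ordinary line; keep the newline between lines as-is.
            parts.append(html.escape(line) + ("" if is_last else "\n"))
    return "<p>" + "".join(parts) + "</p>"


print(render_paragraph(["a\\", "b"]))
```

The point of the sketch is that the backslash has exactly one meaning at end of line, and the only special case is the end of the block.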

reply
> meaning of a trailing backslash is to escape the newline

That's what it does in this example. Don't have to use other cases, and don't believe I did.

reply
In hindsight “escape” was a poor choice of word, but I did explain it and you omitted that from your quote: “meaning, in HTML terms, insert <br>”. And that’s not what reStructuredText does. Rather, at the end of a line, backslash acts like a line continuation character (… that only works in certain circumstances). That behaviour is common in programming languages, at least inside string literals, but such languages don’t use backslash to mean “escape the next character”; they have a fixed set of escape sequences like \n or \uXXXX.
reply
> em dashes are, in most locales, not to be surrounded by a space

This is definitely not the case for at least French and Russian, which means markup renderers now have to guess text language or force authors to declare such in some metadata header. And it gets even more complicated with inclusion of block quotes in different languages.

reply
It’s not hard and doesn’t need language awareness; I described how to detect it: if there’s no space before an end-of-line em dash, suppress the segment-break-replacing space.
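
As a rough Python sketch of that detection rule (the names are hypothetical, and it assumes the renderer already has the paragraph as a list of source lines): join lines with a space as usual, except when the previous line ends in an em dash with no whitespace before it.

```python
EM_DASH = "\u2014"


def ends_with_tight_em_dash(line: str) -> bool:
    """True if the line ends in an em dash with a non-space character before it."""
    return len(line) >= 2 and line.endswith(EM_DASH) and not line[-2].isspace()


def join_paragraph(lines: list[str]) -> str:
    """Join source lines, suppressing the space after a tight end-of-line em dash."""
    out = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if out and not ends_with_tight_em_dash(out):
            # Normal case: the segment break becomes a single space.
            out += " "
        out += line
    return out


print(join_paragraph(["one\u2014", "two"]))   # unspaced style: no space inserted
print(join_paragraph(["un \u2014", "deux"]))  # spaced style: space is kept
```

No language metadata is consulted; the author's own spacing before the dash decides the outcome.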
reply
There seem to be some locales or styles that use asymmetric spacing. From the Zen of Python—note different spacing based on context and position within the sentence:

    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    [...]
    Namespaces are one honking great idea -- let's do more of those!
reply
You have missed a joke: https://bugs.python.org/issue3364.
reply
Well, so I have!
reply
Unicode has U+200B ZERO WIDTH SPACE for that purpose. In HTML and hence Markdown you can also use `<wbr>`. If you’re using a custom setup anyway, you can have it be inserted automatically by regex replacement, as a pre-rendering step.
reply
I think you’ve misunderstood something? This is about suppressing the turning of a segment break into a space, not about line break opportunities.

> Unicode has U+200B ZERO WIDTH SPACE for that purpose.

ZWSP is not at all “for that purpose”. If you mean this:

  A—&ZeroWidthSpace;
  B
Well, I am mildly surprised to find that no extra space is added in Gecko or Blink. But in WebKit, a space is still added, because this is part of the “UA-defined” bit I quoted.

And if you’re willing to do preprocessing, you can just merge the lines; that’d actually work.

> In HTML and hence Markdown you can also use `<wbr>`.

I fail to see how <wbr> is relevant.

reply
Indeed, I skimmed a bit and misread “unable to break” to mean that you wanted a line-break opportunity but the renderer didn’t allow for it when a letter directly follows an em dash. But it’s the other way around: you want a line break in the source after an em dash not to translate into a space in the rendering. This would likewise be possible to handle by regex replacement as a pre-rendering step.
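
Something like the following Python sketch is what I have in mind for the pre-rendering step. The function name is made up, and it naively runs over the whole source text, so it assumes there are no code blocks or other literal regions that must be left untouched.

```python
import re

# A newline (with optional surrounding spaces/tabs) that directly follows
# "<non-space><em dash>" and is followed by more text on the next line.
_TIGHT_DASH_BREAK = re.compile(r"(?<=\S\u2014)[ \t]*\n[ \t]*(?=\S)")


def merge_after_em_dash(source: str) -> str:
    """Remove source line breaks that follow an unspaced end-of-line em dash."""
    return _TIGHT_DASH_BREAK.sub("", source)


print(merge_after_em_dash("A\u2014\nB"))  # lines are merged
print(merge_after_em_dash("A \u2014\nB"))  # left alone: space before the dash
```

Blank lines are not crossed, since the lookahead requires a non-space character right after the line break, so paragraph boundaries survive.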

More generally, I see markup languages and the details of how they are rendered as largely orthogonal. You don’t necessarily need to invent a different markup language in order to adjust the rendering.

reply
> More generally, I see markup languages and the details of how they are rendered as largely orthogonal. You don’t necessarily need to invent a different markup language in order to adjust the rendering.

There’s not much to a markup language beyond how it’s rendered. If you don’t ever want to render it to something other than plain text, just write plain text however you desire. The reason for choosing a particular markup language is to express intended semantics (for plain-text and rendered use), and to render it. The semantics aspect is legitimate, so I won’t say the language and rendering are identical or parallel, but they’re definitely nothing like orthogonal. If you’re using a CommonMark pipeline, any preprocessing you do means you’re not actually writing in CommonMark, but an incompatible variant of it. You may well deem it worthwhile, but it’s no longer the same markup language.

reply