by Aachen 10 hours ago |

by cogman10 9 hours ago |

> Who would ever want this?

The main case I can think of is wanting some forum functionality. Perhaps you want to allow your users to be able to write in markdown. This would provide an extra layer of protection as you could take the HTML generated from the markdown and further lock it down to only an allowed set of elements like `h1`. Just in case someone tried some of the markdown escape hatches that you didn't expect.

by Aachen 9 hours ago |

> This would provide an extra layer of protection

I think this might be the answer. There's no point to it by itself (either you separate data and code or you don't and let the user do anything to your page), but if you're already using a sanitiser and you can't use `textContent` because (such as with Markdown) there'll be HTML tags in the output, then this could be extra hardening. Thanks!

by iLoveOncall 5 hours ago |

You'd never want to store the processed HTML anyway; this is website building 101.

by efilife 3 hours ago |

I store both, to serve processed HTML faster, and to be able to rebuild it just in case. Is this ok?

by piccirello 8 hours ago |

`setHTML` is meant as a replacement for `innerHTML`. In the use case you describe, you would have never wanted `innerHTML` anyway. You'd want `innerText` or `textContent`.

by iLoveOncall 5 hours ago |

But that's why setHTML isn't at all a replacement for innerHTML.

You still need innerHTML when you want to inject HTML tags in the page, and you could already use innerText when you didn't want to.

Having something in between is seriously useless.

by Dylan16807 2 hours ago |

> You still need innerHTML when you want to inject HTML tags in the page

What makes you say this?

by itishappy 9 hours ago |

> If the default configuration of setHTML() is too strict (or not strict enough) for a given use case, developers can provide a custom configuration that defines which HTML elements and attributes should be kept or removed.

by Aachen 9 hours ago |

Injecting markup into someone else's website isn't what I'd call too strict a default configuration

If you mean to convey that it's possible to configure it to filter properly, let me introduce you to `textContent`, which is older than Firefox (it's so old that I'm struggling to find a date)

by itishappy 9 hours ago |

That's the whole point of setHTML.

How would I set a header level using textContent?

by Aachen 9 hours ago |

The traditional way: separating data and code

    document.createElement("h1").textContent = `Hello, ${username}!`

If you allow <h1> in the setHTML configuration or use the default, users with the tag in their username also always get it rendered as markup

by itishappy 8 hours ago |

It sounds like you're arguing against a specific use case, rather than the technology itself. If you don't want arbitrary markup in usernames, setHTML would absolutely be the wrong choice, but that's not really a good argument against setHTML.

by matsemann 9 hours ago |

Which is why you only use it where you want to allow some kind of html..?

by byproxy 9 hours ago |

> but this still allows arbitrary markup to the page (even <style> CSS rules) if I'm reading the docs correctly.

If that's true, seems like it's still a security risk given what you can do with CSS these days: https://news.ycombinator.com/item?id=47132102

by circuit10 8 hours ago |

You can use selectors to gain some information about things like input fields, e.g. https://www.invicti.com/blog/web-security/private-data-stole...

Or I guess you could completely restyle and change the text of UI elements so it looks like the user is doing one thing when they're actually doing something completely different like sending you money

by qingcharles 8 hours ago |

Back in 2002 (?) I got banned from a certain auction site because I managed to inject HTML into my username that made it so once I had bid the "Bid" button disappeared for all subsequent users.

by jerf 9 hours ago |

If I'm reading this right,

    .setHTML("<h1>Hello</h1>", new Sanitizer({}))

will strip all elements out. That's not too difficult.

Plus this is defense-in-depth. Backends will still need to sanitize usernames against some standard anyhow (not many systems out there should take arbitrary Unicode input as usernames), and backends SHOULD (in the RFC sense [1]) still HTML-escape anything they output that they don't want to be raw HTML.

[1]: https://www.rfc-editor.org/rfc/rfc2119
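
To make the HTML-escaping point concrete, here is a minimal sketch of what a backend might do before writing a username into markup. The helper name `escapeHtml` and the sample username are illustrative assumptions; they do not come from the thread or from any particular framework.

```javascript
// Illustrative helper (hypothetical name): escape the five characters that
// are significant in HTML text and quoted attribute values.
function escapeHtml(text) {
  const replacements = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  // Single pass, so an "&" produced by one replacement is never escaped twice.
  return String(text).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

// A username containing markup ends up as inert text inside the paragraph.
const username = "<h1>Bob</h1>";
const greeting = `<p>Hello, ${escapeHtml(username)}!</p>`;
console.log(greeting);
```

Escaping on output like this keeps data and markup separate on the server side; it complements, rather than replaces, whatever the client does when inserting content into the DOM.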

by evilpie 7 hours ago |

You aren't reading it right.

    new Sanitizer({})

This Sanitizer will allow everything by default, but setHTML will still block elements/attributes that can lead to XSS.

You might want something like:

    new Sanitizer({ replaceWithChildrenElements: ["h1"], elements: [], attributes: [] })

This will replace <h1> elements with their children (i.e. text in this case), but disallow all other elements and attributes.
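
As a rough illustration of the difference between keeping an element, replacing it with its children, and dropping it entirely, here is a toy model over plain objects. It is not the Sanitizer API's algorithm and it does not parse HTML; the function `filterTree` and its `keep`/`unwrap` options are hypothetical names chosen for this sketch.

```javascript
// Toy model only: nodes are plain objects, not real DOM nodes.
// A node is either a string (text) or an object { tag, children }.
function filterTree(nodes, { keep = [], unwrap = [] }) {
  const out = [];
  for (const node of nodes) {
    if (typeof node === "string") {
      out.push(node); // text always survives
    } else if (keep.includes(node.tag)) {
      // Keep the element, but still filter what is inside it.
      out.push({ tag: node.tag, children: filterTree(node.children, { keep, unwrap }) });
    } else if (unwrap.includes(node.tag)) {
      // Drop the tag itself and splice its filtered children in its place.
      out.push(...filterTree(node.children, { keep, unwrap }));
    }
    // Any other element is dropped together with its contents.
  }
  return out;
}

const input = [
  { tag: "h1", children: ["Hello ", { tag: "script", children: ["alert(1)"] }, "Bob"] },
];
const result = filterTree(input, { keep: [], unwrap: ["h1"] });
console.log(JSON.stringify(result));
```

In this model the `h1` wrapper disappears while its text stays, and the nested `script` node is removed along with its contents, which is the general shape of the behaviour described above.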

by benmmurphy 8 hours ago |

I think the use case for setHTML is user content that contains rich text, which you want to display safely. So this is not an alternative to escaping text or inserting text into the DOM, but rather a method for displaying rich text. For example, maybe you have an editor that produces em and strong tags, so now you can just whitelist those tags and use setHTML to safely put that rich text into the DOM without worrying about all the possible HTML parsing edge cases.
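
A minimal sketch of that rich-text case, assuming the target exposes `setHTML(html, options)` and that `options.sanitizer` accepts a configuration object; that option shape is an assumption about the current draft and is worth checking against the spec. The helper `renderRichText`, the `RICH_TEXT_CONFIG` constant, and the plain-text fallback are all hypothetical choices made for illustration.

```javascript
// Hypothetical allowlist for a simple rich-text field: only <em> and
// <strong>, and no attributes at all.
const RICH_TEXT_CONFIG = { elements: ["em", "strong"], attributes: [] };

// `target` is expected to be a DOM element. Returns which path was taken.
function renderRichText(target, html) {
  if (typeof target.setHTML === "function") {
    // Assumed call shape: the sanitizer configuration goes in an options object.
    target.setHTML(html, { sanitizer: RICH_TEXT_CONFIG });
    return "setHTML";
  }
  // Conservative fallback when setHTML is unavailable: show the markup as
  // literal text instead of parsing it.
  target.textContent = html;
  return "textContent";
}
```

Falling back to `textContent` loses the formatting but never parses untrusted markup, which seemed the safer default for a sketch; a real page might prefer a userland sanitizer there instead.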

by embedding-shape 10 hours ago |

> So you can still inject <h1> or <br><br><br>... etc into your username, in the given example

How exactly, given that setHTML sanitizes the input? If you don't want to have any HTML tags allowed, seems you can configure that already? https://wicg.github.io/sanitizer-api/#built-in-safe-default-...

by Aachen 9 hours ago |

> How exactly, given that setHTML sanitizes the input?

The article says that the output is:

    <h1>Hello my name is</h1>

So it keeps (non-script) html tags (and presumably also attributes) in the input. Idk how you're asking "how" since it's the default behavior

Stripping HTML tags completely has always been possible with the drop-in replacement `textContent`. Making a custom configuration object for that is much more roundabout

by embedding-shape 9 hours ago |

Yes, because that's the default configuration, if you don't want that, stop using the default configuration? It's still sanitizing away the common XSS holes, hence it's a safer alternative to .innerHTML, and a more flexible alternative to .innerText

by Aachen 9 hours ago |

Shouldn't use innerText anyway (nonstandard, worse performance, tries to parse the HTML and gives you unexpected behavior if e.g. a style is set that makes an element invisible but still has text inside, doesn't work on all DOM nodes...)

I can see how it's a way of allowing some tags like bold and italic without needing a library or some custom parser, but I didn't understand what the point of this default could be and so why it exists (a sibling comment proposed a plausible answer: hardening on top of another solution)

> Yes, because that's the default configuration, if you don't want that, stop using the default configuration?

"don't use it if it's not what you want" is perhaps the silliest possible answer to the question "what's the use-case for this"

by embedding-shape 9 hours ago |

> Shouldn't use innerText anyway (nonstandard, worse performance, tries to parse the HTML and gives you unexpected behavior if e.g. a style is set that makes an element invisible but still has text inside, doesn't work on all DOM nodes...)

Maybe you meant .innerHTML? .innerText AFAIK doesn't try to parse HTML (why would it?), and I don't understand what you mean by nonstandard: both .innerHTML and .innerText are part of the standards, and I think they have been for a long time.

> but I didn't understand what the point of this default could be and so why it exists (a sibling comment proposed a plausible answer: hardening on top of another solution) [...] the question "what's the use-case for this"

I guess maybe third time could be the charm: it's for preventing XSS holes that are very common when people use .innerHTML

by Aachen 9 hours ago |

> maybe third time could be the charm: it's for preventing XSS holes

That information is in the question, so sadly no this still doesn't make sense to me because I don't understand any scenario in which this is what the developer wants. You always still need more code (to filter the right tags) or can just use textContent (separating data and code completely, imo the recommended solution)

> Maybe you meant .innerHTML? .innerText AFAIK doesn't try to parse HTML (why would it?)

No, I didn't mean that, yes it does, and no I don't know why it is this way. If you don't believe me and don't want to check it out for yourself, I'm not sure what more I can say

by lelanthran 5 hours ago |

> I don't understand any scenario in which this is what the developer wants.

Client-side includes.

by benregenspan 9 hours ago |

It seems like the goal of the default configuration is preventing script injection while being otherwise very permissive. Basically, "safer than innerHTML, even when used very lazily". But I would expect guidance to evolve saying that it almost never makes sense to use the default and instead to specify a configuration that makes contextual sense for a given field.

The default might be suitable for something like an internal blog where you want to allow people to sometimes go crazy with `<style>` tags etc, just not inject scripts, but I would expect it to almost always make sense to define a specific allowed tag and attribute list, as is usually done with the userland predecessors to this API.

by lelanthran 5 hours ago |

> Who would ever want this?

Your lack of imagination is disturbing :-)

https://github.com/lelanthran/ZjsComponent

by kccqzy 8 hours ago |

There’s innerText if you don’t want markup. Or more verbosely, document.createTextNode followed by whatever.appendChild.

by afavour 8 hours ago |

> Who would ever want this?

Anyone who wants to provide some level of flexibility but within bounds. Say, you want to allow <strong> and <em> in a forum post but not <script>. It's not too difficult to imagine uses.

by goatlover 6 hours ago |

Forums would already have code that sanitizes user input when it's submitted. Users aren't directly setting html elements.

by afavour 6 hours ago |

And is that sanitization perfect? Kept up to date?

With a safe API like this one that's tied to the browser's own interpretation of HTML (i.e. it is perfectly placed to know exactly what is and isn't dangerous given it is the one rendering it) wouldn't it be much better to rely on that?

by dheera 7 hours ago |

> So you can still inject <h1> or <br><br><br>... etc into your username

Are we taking out all the fun of the web? I absolutely loved the <marquee> names people had in the early days of Facebook, it was all harmless fun.

If injection of frontend code takes down your backend, your backend sucks, fix it.
