[1] https://developer.mozilla.org/en-US/docs/Web/API/Element/set...
But for html snippets you can pretty much just check that tags follow a couple simple rules between <> and that they're closed or not closed correctly.
> set a global property somewhere (as a web developer) that disallows[…] `innerHTML`
// Make any assignment to element.innerHTML throw.
Object.defineProperty(Element.prototype, "innerHTML", {
  set() { throw new Error("No!"); },
});
(Not that you should actually do this. Anyone who has to resort to it in their codebase has deeper problems.)
Good idea to ship that one first, when it's easier to implement and is going to be the unsafe fallback going forward.
Oddly though, the Sanitizer API that it's built on doesn't appear to be in Safari. https://developer.mozilla.org/en-US/docs/Web/API/Sanitizer
Or how any potential driver is familiar with seat belts which is why everybody wears them and nobody’s been thrown from a car since they were invented.
The issue isn’t that the word “safe” doesn’t appear in safe variants, it’s that “unsafe” makes your intentions clear: “I know this is unsafe, but it’s fine because of X and Y”.
Legacy and backwards compatibility hampers this, but going forward…
The mythical refactor where all deprecated code is replaced with modern code. I'm not sure it has ever happened.
I don't have an alternative of course, adding new methods while keeping the old ones is the only way to edit an append-only standard like the web.
(Assuming transpilers have stopped outputting it, which I'm not confident about.)
For example, esbuild will emit var when targeting ESM, for performance and minification reasons. Because ESM has its own inherent scope barrier, this is fine, but it won't apply the same optimizations when targeting (e.g.) IIFE, because it's not fine in that context.
Having an alternative to innerHTML means you can ban it from new code through linting.
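A rough sketch of what that lint ban could look like, assuming ESLint and its built-in `no-restricted-syntax` rule. The selector, the message, and the `banInnerHTML` name are illustrative and not taken from any project mentioned here:

```javascript
// Hypothetical ESLint flat-config entry. In a real project this object would be
// one element of the array exported from eslint.config.js.
const banInnerHTML = {
  rules: {
    "no-restricted-syntax": [
      "error",
      {
        // Matches assignments such as `el.innerHTML = value` and `el.innerHTML += value`.
        selector: "AssignmentExpression[left.property.name='innerHTML']",
        message: "Assigning to innerHTML is banned in new code; use textContent or setHTML().",
      },
    ],
  },
};
```

This only catches the plain `el.innerHTML = ...` form. Computed access like `el["innerHTML"]` would need a second selector, so treat it as a nudge rather than a guarantee.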
But I can see what you mean, even if it would still be better for it to print the code that does what you want (uses a few Wh) than to do the actual transformation itself (prone to mistakes and injection attacks, and uses however many tokens your input data is).
And, in your opinion, this is one of those cases?
Maybe the last 10 years saw so much more modern code than the last cumulative 40+ years of coding and so modern code is statistically more likely to be output? Or maybe they assign higher weights to more recent commits/sources during training? Not sure, but it seems to be good at picking this up. And you can always feed the info into its context window until then.
>> Maybe the last 10 years saw so much more modern code than the last cumulative 40+ years of coding and so modern code is statistically more likely to be output?
The rate of change has made defining "modern" even more difficult and the timeframe brief, plus all that new code is based on old code, so it's more like a leaning tower than some sort of solid foundation.
Huh? It's been a decade.
It was, and there is: setting elementNode.textContent is safe for untrusted inputs, and setting elementNode.innerHTML is unsafe for untrusted inputs. The former will escape everything, and the latter won't escape anything.
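To make the difference concrete without a browser, here is a rough stand-in for what "treated as text" amounts to. The `escapeHtml` helper is hypothetical: setting textContent does no string replacement at all, it just creates a text node, so this only approximates how that node would look if serialized back to markup (and it escapes quotes too, which text-node serialization does not need).

```javascript
// Hypothetical helper: escape the characters that are significant in HTML.
// "&" must be replaced first so the entities added afterwards are not re-escaped.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const untrusted = '<img src=x onerror="alert(1)">';

// Roughly what ends up in the markup when the string is handled as text.
const asText = escapeHtml(untrusted);

// What innerHTML would be handed: the raw string, to be parsed as markup.
const asMarkup = untrusted;
```

`asText` contains no `<` at all, so there is nothing left for the parser to turn into an element. `asMarkup` is still a live `<img>` tag with an event handler.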
You are right that these "sanitizers" are fundamentally confused:
> "HTML sanitization" is never going to be solved because it's not solvable.
> There's no getting around knowing whether any arbitrary string is legitimate markup from a trusted source or some untrusted input that needs to be treated like text. This is a hard requirement.
<https://news.ycombinator.com/item?id=46222923>
The Web platform folks who are responsible for getting fundamental APIs standardized and implemented natively are in a position to know better, and they should know better. This API should not have made it past proposal stage and should not have been added to browsers.
It is not a hard requirement that untrusted input is "treated like text". And this API lets you customize exactly what tags/attributes are allowed in the untrusted input. That's way better than telling everyone to write their own; it's not trivial.
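For illustration, a minimal sketch of that customization. It assumes the `{ sanitizer: { elements, attributes } }` option shape described in the MDN Sanitizer API docs, which may differ between browser versions. The allowlist, `commentSanitizer`, and `renderUntrusted` are made-up names for this example:

```javascript
// Assumed allowlist for illustration only; pick your own for a real app.
const commentSanitizer = {
  elements: ["p", "b", "i", "a"],
  attributes: ["href"],
};

// Render untrusted markup into `el`, falling back to plain text where
// setHTML is not implemented.
function renderUntrusted(el, html) {
  if (typeof el.setHTML === "function") {
    el.setHTML(html, { sanitizer: commentSanitizer });
    return "sanitized";
  }
  // No sanitizer available: show the markup as literal text instead of parsing it.
  el.textContent = html;
  return "text";
}
```

The fallback is deliberately the boring one: if the browser can't sanitize, the user sees literal tags rather than the page running someone else's markup.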
It's also not a hard requirement that I defend the position that there's a hard requirement for untrusted input to be treated like text. That isn't my position, and it's not what I wrote.
Given that it is not a hard requirement that untrusted input be treated like text, it wouldn't make sense for anyone to claim that it is, and therefore it doesn't make sense for someone, presented with what I did write, to strenuously argue with me that such a tortured, implausible, uncharitable, nonsensical interpretation of what I wrote was something that I have to account for (versus the interpretation that does match what I wrote and is actually true and makes sense).
You are, willfully or not, misconstruing what I have written.
> That's way better than telling everyone to write their own; it's not trivial.
Right, it's not trivial. It's so far the opposite of trivial that it's (as I said the first time—and again, just now) not solvable.
No one should be writing their own.
No one should be trying to write their own.
No one should be using this API at all.
And no one should have pushed for its implementation.
It's a bad API.
Briefly though, if you have an untrusted string then you need to either treat it like text or sanitize it. I don't see any other options.
So if people shouldn't use this sanitizer or write their own, then the only option left is treating the string as text. But you're vehemently arguing that's not what you said.
What's the other way to use an untrusted string? Other than "don't", but that means not taking input and only works for toy apps.
"Everyone who files a tax return should know whether they need to pay at least $1000 in unpaid taxes to the IRS."
"Everyone who files a tax return needs to pay at least $1000 in unpaid taxes to the IRS."
> You divided strings going into HTML into two categories, where one category uses textContent and the other category uses innerHTML.
No, I didn't:
> setting elementNode.textContent is safe for untrusted inputs, and setting elementNode.innerHTML is unsafe for untrusted inputs
That's what I wrote: a statement containing two claims (both true—and not even in the part of my comment that you actually quoted and pretended to be replying to).
Those claims are different but not in a way that analogizes to the HTML conversation.
Anyway, I see you edited your previous post after I wrote my reply.
If you weren't trying to divide things into two categories, you wrote it very confusingly. When you say how to handle trusted strings, then say how to handle untrusted strings, then say "There's no getting around knowing whether any arbitrary string is legitimate markup from a trusted source or some untrusted input that needs to be treated like text. This is a hard requirement.", it really sounds like that's supposed to cover all cases.
Me thinking you were using two categories is an honest mistake, not malicious misquoting.
And reading your original post that way is the interpretation that makes it stronger. If there are more categories then setHTML is no longer "fundamentally confused". Your argument against it falls apart.
Content-Security-Policy: require-trusted-types-for 'script'
…then it blocks you from passing regular strings to the methods that don't sanitize.
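With that header in place, the unsanitized sinks only accept values produced by a Trusted Types policy. A minimal sketch, assuming the browser's `trustedTypes.createPolicy` API. The policy name `text-only` and the `createTextOnlyPolicy` wrapper are made up, and the factory is passed in as `tt` only so the sketch can be exercised outside a browser:

```javascript
// `tt` is expected to be the browser's global `trustedTypes` object.
function createTextOnlyPolicy(tt) {
  // "text-only" is a hypothetical policy name.
  return tt.createPolicy("text-only", {
    // Turn any input into inert text by escaping the markup-significant characters.
    createHTML: (input) =>
      String(input)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;"),
  });
}

// In a browser with the header above, usage would look roughly like:
//   const policy = createTextOnlyPolicy(trustedTypes);
//   container.innerHTML = policy.createHTML(userInput);
```

The point of the policy object is that every string reaching innerHTML has to pass through one auditable function instead of being assembled ad hoc at each call site.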
But I agree, my default approach has usually been to only use innerText if it has untrusted content:
So if their demo is this:
container.setHTML(`<h1>Hello, ${name}</h1>`);
Mine would be:
let greetingHeader = document.createElement("h1");
greetingHeader.innerText = `Hello, ${name}`;
container.append(greetingHeader);
Edit: I don't mean this flippantly. If I want to render, say, my blog entry on your site, will I need to select every markup element from a dropdown list of custom elements that only accept text, a la WordPress?
(It's a joke, but it is also 100% XSS, SQL injection, etc. safe and future proof)
What is safe depends on where the sanitized HTML is going, on what you're doing with it.
It isn't possible to "sanitize HTML" after collecting it so that, when you use it in the future, it will be safe. "Safe" is defined by the use.
But it is possible to sanitize it before using it, when you know what the use will be.
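A small sketch of what "safe is defined by the use" means in practice: the same untrusted string needs different handling depending on whether it lands in element text, in a quoted attribute, or in a URL query. The helper names are hypothetical and the escaping is deliberately minimal, not a complete encoder for any of these contexts:

```javascript
const untrusted = 'a"b <c> & d';

// For use as element text: escape the characters that could start markup or an entity.
function escapeForHtmlText(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// For use inside a double-quoted attribute: the quote must be escaped as well.
function escapeForQuotedAttribute(s) {
  return escapeForHtmlText(s).replace(/"/g, "&quot;");
}

const forText = escapeForHtmlText(untrusted);
const forAttribute = escapeForQuotedAttribute(untrusted);

// For use as a URL query value: percent-encoding, a different scheme entirely.
const forUrl = encodeURIComponent(untrusted);
```

The three outputs differ, and none of them is correct in the other two contexts, which is why cleaning the string once at collection time and storing it doesn't work.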
Don't even try to allow inline <svg> from untrusted sources! (and then you still must sanitise any svg files you host)
Even with this being a native API, there are still two parsers that need to be maintained. What a native API achieves is to shift the onus for maintaining synchronicity between the two onto the browser makers. That's not nothing, but it's also not the sort of free lunch that some people naively believe it is.
There's setHTML and setHTMLUnsafe. That seems about as clear as you can get.