by jedimastert17 hours ago |

by wongarsu15 hours ago|

Back in the day it was reasonably common for CMSs and forums to only have an index.php, and to route entirely by query string (in form-urlencoded form, people were not savages). So you would have index.php?p=home and index.php?p=shop. Or index.php?action=showthread&forum=42&thread=17976. It should be immediately obvious that in that scheme 404 is indeed the correct answer to unknown query parameters.

In fact lots of sites still work like that, they just hide it behind a couple of rewrite rules in apache/nginx for SEO reasons.

by Semiapies14 hours ago|

If you're routing like it's 1999, sure, 404.

On the other hand, if it's a CRUD app and you're filtering a list of entities by various field values? Returning that no items matched your selection (or an empty list, if an API) makes more sense than a 404, which would be more appropriate for an attempt to pull up a nonexistent entity URI.

by rswail3 hours ago|

    204 No Content

for nothing found is both not an error (because 2xx code) but also indicates there was nothing found to match the request.

If it's an API, a 200 with an empty JSON object or array in the body is legitimate as well, but a 204 is explicit.

by Sander_Marechal13 hours ago|

There is no reason you can't return that "no items matched your selection" with a 404 HTTP response code instead of a 200.

by evanspa28 minutes ago|

Another reason not to return a 404 in that case: chances are there will be monitoring tooling in place that treats a 404 as an "error" and surfaces it in your alerting, which is not ideal; it will just be noise.

by onion2k5 hours ago|

You can return whatever HTTP response code you want, but if you care about knowing whether your site is working, being able to look at the logs and tell "That user requested a page that doesn't exist" apart from "That user requested a page that exists but had no results" is quite useful. In coding terms it's the difference between a null and an empty array.

by ah15084 hours ago|

In this case I don't think the status should depend on the number of results. "Here are your results: []" is a valid response body when there are no results. Returning 404 when there are no results (GET /books?title=a, for instance) is misleading: the caller may think that /books is a nonexistent route and conclude that books are reachable via another URI. To me, the query string has no influence on the response status.

/books/1 could return 200 or 404 depending on whether book #1 exists. Here it makes sense, because if /books/1 does not exist the API must say so explicitly. However, 404 belongs to the 4XX family, which means "client error": is it an error to ask for a nonexistent book? If you walk into a bookshop and ask for a book they don't have, you did not "make a mistake". It's not as if you asked for a chainsaw. But in an API, especially one with hypermedia, you are not supposed to request a resource that does not exist (unless the API gave you a link to a resource that was deleted before you tried to reach it).

by kilburn2 hours ago|

If you enter a bookshop and you ask for a book that does not exist then it's definitely your mistake.

If you ask for a book they don't have it's a different matter.

In any case, when you ask for a book in a library you are using their "search" endpoint. The equivalent of opening a books/1 URL would be asking for a specific copy of a book by serial number or similar. Then it's clear that you made a mistake if you do that with a nonexistent serial number...

by threatofrain12 hours ago|

A response code of 204 seems more appropriate but the problem is you're not allowed to send further information, which would make that descriptive response... not descriptive enough.

by Xirdus6 hours ago|

Code 204 is just code 200 with the "yes the body really is zero bytes this is not an error it's supposed to be like this" bit set.

by IgorPartola8 hours ago|

I think of it like this:

If /users/ returns a 404 in an API, that means this resource does not exist. As in, it is not a part of the API.

If /users/123 returns a 404, that means this user record does not exist.

Yes this means that a 404 is context dependent but in a way that makes it easier for a human to think of and reason about.

by onion2k5 hours ago|

Yes, and this is obvious if /users/ exists and returns a 400 when the ID is required. That way you can tell the difference between /users/ being there and expecting an ID, and it not being there at all.

by stouset13 hours ago|

The point was that returning a 404 for unexpected query strings doesn’t just happen to be okay per the specs; there is significant historical precedent for doing so, based on application design that was common in the past.

by brightball13 hours ago|

Yea, empty response at a valid path. Isn’t 204 the code for it?

Lots of REST libraries that I’ve used treat any 4xx response as an error, so generating a 404 for an empty list would just create more headaches.

by HumanOstrich10 hours ago|

Libraries that automatically throw errors for status codes in the 400 and 500 ranges are pretty obnoxious (looking at you, axios). It adds unnecessary overhead, complexity, and bad ergonomics by hijacking control flow from the app.

Responses with status codes in the 400 range are client errors, so the client shouldn't retry the same request. So a 404 is appropriate despite how annoying a library might be at handling it. Depending on which language/ecosystem you are using, there are likely more sane alternatives.

by mnahkies4 hours ago|

Completely agree on the axios part. One implication of that is you can't statically type the error response shapes (since exceptions can't be typed), whereas with fetch you can have a discriminated union based on the status code (eg: https://github.com/mnahkies/openapi-code-generator/blob/main...)

Although I do feel like I've seen too many instances of a 404 being used for an empty collection where it would make more sense to return `[]` and treat it as an expected (successful) state.

by vlovich1239 hours ago|

Generally true although 429 is often used for rate limiting so a back off and retry is appropriate. 409, 412, 428 may also be retriable depending on the specific semantics of the given situation. 421 apparently shows up commonly in HTTP/2 connection reuse and is retriable. 423 and 425 too potentially.

It would have been nice if there were an actual grouping of retriable and non-retriable codes, but in reality it’s a complete mess.

But at a minimum beware of 429. That’s not a permanent outage and is a frequent one you might get that needs a careful retry.

by sk5t12 hours ago|

204 might be acceptable if you aren’t returning an entity body to describe what is missing, but do wish to indicate the request was successful.

by burnished11 hours ago|

I think the author is comfortable creating headaches for people tacking query strings onto URLs

by nofriend11 hours ago|

> It should be immediately obvious that in that scheme 404 is indeed the correct answer to unknown query parameters

That's not obvious at all. If I receive json data that contains a property I'm not aware of, i don't reject the entire document for that reason. In the case of query strings, extra query parameters might be used by other parts of the stack besides yours, so rejecting the entire document because someone somewhere else is trying to pass information to itself is the wrong approach.

by saimiam10 hours ago|

> other parts of the stack

As a web developer, you’re like the guy standing with a clipboard outside a fancy club, checking whether the people requesting entry are allowed or not. Basically, level 1 security.

If someone is not on the list, your job is to default to declining them access, not granting them access assuming level 2 security will handle them at a deeper layer.

It’s possible that the teams you work with expect fuzzy behaviour from the website but that’s a choice, not a practice.

by simondotau8 hours ago|

The first layer of any web security should never be checking someone against a list, unless this can be done in less than a few milliseconds. It should only be sanity checking for basic compliance. In the analogy, this first layer should be denying entry to obviously drunk people, zebras, and a stampede of protesters.

by nofriend10 hours ago|

>It’s possible that the teams you work with expect fuzzy behaviour from the website but that’s a choice, not a practice.

This is how the vast majority of websites work. The practical reason is obvious: when we model the behaviour our code depends on, we want to create the simplest possible model that allows our code to work as expected. Placing requirements on it that our code doesn't actually depend on is useless, unneeded, complexity.

> As a web developer, you’re the like the guy standing with a clipboard outside a fancy club checking if people requesting entry are allowed or not. Basically, level 1 security.

there is no security benefit to filtering out unneeded url parameters.

by chii4 hours ago|

> there is no security benefit to filtering out unneeded url parameters.

there is - security in depth.

If a url parameter would've been a vulnerability because something lower down the stack misinterprets it (and the param wasn't necessary for your app in the first place), then you've just left a window open for the exploit.

If the set of url params is known ahead of time (which I claim should be true), then you could make adding unknown params an error.

by larusso6 hours ago|

> there is no security benefit to filtering out unneeded url parameters.

What about passing extra data to fill the server's memory, either with known junk or with a script / executable to use with a zero-day in an internal component or something?

To misuse the nightclub analogy: it’s like checking that bags are no larger than A4 and disallowing knives and other weapons.

by bandrami8 hours ago|

At the risk of naming an Eldritch horror, IIRC it was ColdFusion that first adopted something like an MVC-in-querystring routing system in the late 90s or early 00s, and that eventually spread when FCGI caught on and users of other languages got used to long-running middleware processes. It seemed hella elegant at the time.

by Ekaros2 hours ago|

No, 400 is correct for a bad request. Unknown query parameters are clearly a client error.

by bartread1 hours ago|

All 4xx errors are client errors.

400 is the general “bad request” client error, indicating something is wrong with the request but not being specific about what.

404 is simply a more specific client error: it means the client asked for a resource that couldn’t be found.

by sroussey13 hours ago|

Oh no, looks like my old forum software urls.

by panzi11 hours ago|

watch?v=oHg5SJYRHA0

by kaelwd7 hours ago|

item?id=48076173

by qingcharles5 hours ago|

Ooo.. burn.

by chrismorgan9 hours ago|

> in form-urlencoded form, people were not savages

Oh yeah? I remember a lot of semicolons from Perl and other CGI stuff where we would now use ampersands, back in the day, both in the path and in the query. (Sometimes the ? itself would be written ;.)

by otabdeveloper46 hours ago|

Correct. In fact, the semicolon is part of the URI scheme standard, and the ampersand is just some ad-hoc thing that got adopted naturally without any standardization effort.

by chrismorgan9 hours ago|

Yeah, URLs really don’t have much in the way of semantics. Path is clearly intended for hierarchical data and query for non-hierarchical data, and there are strong customs, some commonly supported or even enforced by libraries, but no actual rules. Ultimately, it’s just a string that the server can decide what to do with.

The really funny thing about this is that, when I was worrying about possible side effects if I responded 404, I somehow completely forgot how much of the web’s history the path has been useless for. Paths have won. No one really starts new things with URLs like /item?id=… any more. Yay!

by fpoling6 hours ago|

Wikipedia web server treats anything after /wiki/ literally as the name of the article.

So en.wikipedia.org/wiki/// is the article about C++ style comments

by chrismorgan5 hours ago|

Oh, magnificent. Lovely high-profile example to add about empty path segments being meaningful.

by chii4 hours ago|

i wonder if it ought to be `/wiki/%2F%2F` instead...

by dylan6048 hours ago|

Wouldn't a generic 400 be better? It's not that the page wasn't found, but that you've sent something that was not an accepted request. "Fix your request and try again" is how I've read it, and that's how I use it in the APIs I provide. I prefer it over 406 since it's not my end that can't process it. If your query string is tacking on extra stuff, whether to try to break things or just because your request wasn't crafted per the docs, then it's on you.

by Ekaros1 hours ago|

406 would be wrong in my view, as it is meant for when the client sends an Accept: header that the server cannot fulfil. HTTP status codes get quite specific when you read the actual description and not just the name.

by qiller13 hours ago|

Interestingly, quite a few places that should treat query strings transparently make a lot of assumptions about their structure. We ran into that when picking a new CDN, some providers didn't handle repeat parameters (?a=1&a=2) correctly.

by sroussey13 hours ago|

What do you mean by correctly?

by kstrauser13 hours ago|

Incorrectly would be processing the query string and deduping keys. Correctly would be passing it through as-is, or at least only lightly processing it, like normalizing escaping or such.

by sroussey12 hours ago|

Indeed I would expect pass through with no changes.

Though there are “smart” CDNs that will resize images etc.; all bets are off for those.

by nofriend11 hours ago|

Standards are just commonly accepted behaviour that somebody chose to write down somewhere. There are a great number of commonly accepted behaviours that nobody's ever bothered to encode into a formal standard, but where failure to follow the accepted practice will result in widespread breakage. There are also a great many "standards" that you would be a fool to follow to the letter. In the OP case, the only thing that will break is people trying to visit their site, who will presumably simply press the back button on their browser and go about their day. They can decide for themselves if that is an acceptable casualty. But it isn't definitionally acceptable because no standard says it isn't (nor would it suddenly become unacceptable because a standard said it was...)

by socalgal24 hours ago|

The No-Vary-Search (proposal?)

https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/...

effectively lets you specify what parts of a query are relevant. So for example

url?a=b&c=d matches url?c=d&a=b in terms of caching
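
This is not how No-Vary-Search works (it is a response header that the cache interprets), but the order-insensitive matching it enables can be sketched on the server side as a normalised cache key. The `cacheKey` helper is hypothetical and relies on `url.Values.Encode` sorting by key:

```go
package main

import (
	"fmt"
	"net/url"
)

// cacheKey builds a key in which query parameter order does not matter.
// url.Values.Encode sorts by key; values of a repeated key keep their order.
func cacheKey(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q, err := url.ParseQuery(u.RawQuery)
	if err != nil {
		return "", err
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	k1, _ := cacheKey("https://example.com/url?a=b&c=d")
	k2, _ := cacheKey("https://example.com/url?c=d&a=b")
	fmt.Println(k1 == k2, k1)
}
```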

by bawolff12 hours ago|

> I was pretty geared up to have a contrarian opinion until I looked at the standards but they're actually pretty clear, a 404 could be a proper response to unexpected query string; query string is as much part of the URL API as the path is and I think pretty much everyone can acknowledge that just tacking random stuff onto the path would be ill advised and undefined behavior.

This feels like a "technically correct is the best kind of correct" situation. Technically, yes, web servers may respond 404 if they don't understand a query parameter, but in practice that is not how URLs are normally conceptualized.

by ompogUe13 hours ago|

Something I discovered looking back at some old sites: "pages" defined by URL params don't always make it into the Wayback Machine.

by nrds15 hours ago|

Wait until you realize that the difference between path and query string is entirely arbitrary and decided by the server. Query strings should never have existed. They are an implementation detail of CGI webservers that leaked all over everything and now smells really bad.

by mikeocool15 hours ago|

I dunno, it seems like the fact that we arrived at a fairly standard structure for URL paths that works pretty well is not a bad outcome.

Seems a lot better than the other potential world we could have lived in, where paths were a black box and every web server/framework invented its own structure for them.

by hamburglar14 hours ago|

My next website is going to have the path portion of the URL be a base64 encoded ASN.1 blob.

by chrismorgan9 hours ago|

So long as it starts with a slash, go ahead! See how long it takes for someone to figure it out.

It’s your website. Have fun with it! Do dumb things! :-)

by rkeene28 hours ago|

Make sure you use URL-safe base64, or the portions that look like a path can get mangled:

MII//epi

Is converted to MII/epi

by gritzko14 hours ago|

In my current project I use URIs to refer to absolutely any entity in a git(-ish) repo. Files, branches, revisions, diffs, anything. URI turns out to be a really good addressing scheme for everything. Surprise. But the most used and abused element is always the path. Query takes a lot of that mess away. Might have been unmanageable otherwise.

https://github.com/gritzko/beagle

by gritzko13 hours ago|

In fact, GitHub URIs are a good example of overusing paths: https://github.com/gritzko/beagle/blob/a7e17290a39250092055f...

  - user gritzko,
  - project beagle, 
  - view blob, 
  - commit a7e17290a39250092055fcda5ae7015868dabdb4, 
  - file path VERBS.md

... all concatenated indiscriminately.
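
Parsed mechanically, the meaning of each segment comes purely from its position. A Go sketch assuming the /{user}/{project}/{view}/{commit}/{file} layout listed above, which is a reading of this one URL and not GitHub's actual router; `strings.SplitN` with n=5 keeps slashes inside the file path intact:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// blobRef holds the positional segments of a GitHub-style blob URL path.
type blobRef struct {
	User, Project, View, Commit, File string
}

// parseBlobPath splits "/user/project/view/commit/file/path" into its parts.
// SplitN with n=5 leaves the remainder unsplit, so nested file paths stay whole.
func parseBlobPath(p string) (blobRef, error) {
	parts := strings.SplitN(strings.TrimPrefix(p, "/"), "/", 5)
	if len(parts) != 5 {
		return blobRef{}, errors.New("expected /user/project/view/commit/file")
	}
	return blobRef{parts[0], parts[1], parts[2], parts[3], parts[4]}, nil
}

func main() {
	ref, err := parseBlobPath("/gritzko/beagle/blob/a7e17290a39250092055fcda5ae7015868dabdb4/VERBS.md")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", ref)
}
```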

by altairprime12 hours ago|

That’s not an indiscriminate hierarchy.

Grouping data by user is common and normal in computing: /home laid precedent decades ago.

Project directories are an extremely common grouping within a user’s work sets. Yeah, some of us just dump random files in $HOME, but this is still a sensible tier two path component.

The choice to make ‘view metadata-wrapped content in browser HTML output’ the default rather than ‘view raw file contents’ the default is legitimate for their usage. One could argue that using custom http headers would be preferable to a path element (to the exclusion of JavaScript being able to access them, iirc?) or that the path element blob should be moved into the domain component or should prefix rather than suffix the operands; all valid choices, but none implicitly better or worse here.

Object hash is obviously mandatory for git permalinks, and is perhaps the only mandatory component here. (But notably, that’s not the same as a commit hash.) However, such paths could arguably be interpreted as maximally user-hostile.

File path, interestingly enough, is completely disposable if one refers to a specific result object hash within a commit, but if the prior object hash was required to be a commit, then this is a valid unique identifier for the filesystem-tree contents of that commit. You could use the object hash instead of the full path within the commit hash, but that’s a pretty user-hostile way to go about this.

So, then, which part of the ordering and path selections do you consider indiscriminate, and why?

by em-bee10 hours ago|

actually, instead of the object hash, you could also use the commit-hash. then the filename would be mandatory, but the url would be more readable and usable: give me the file VERBS.md as it is at commit <hash>

by masklinn6 hours ago|

That's actually what it is here, a7e17290a39250092055fcda5ae7015868dabdb4 is a commit's oid: https://github.com/gritzko/beagle/commit/a7e17290a3925009205...

by deepsun9 hours ago|

But the path misses param names (or types?). E.g who said the hex-encoded part is a commit hash? Maybe it's a tree hash, or just weird ref.

Query strings are more verbose, as they force you to give each param a name.

by altairprime8 hours ago|

Which target audience of github needs extra verbosity in the commit hash, though? Once you know it you know it; if you don’t know git you aren’t the target audience; etc. Saying /user=foo is no better than ?user=foo if your audience can work it out without confusion from your unadorned paths. We have a great deal of history with filesystems showing that people are capable of keeping up with paths that lack key names if exposed to and familiar with them, and if the filesystem isn’t being constantly randomized.

by em-bee13 hours ago|

what would be a better way of doing that? i am not disagreeing, but i just can't think of any way to improve on this. put everything into the query part? i prefer to use the query only for optional arguments. in this example the blob argument is the only thing that doesn't fit in my opinion.

by arjvik12 hours ago|

Every object in git (commit, tree, revision of a single file) has a hash that is guaranteed unique within a repository (otherwise many more things than a web UI would break) and likely also globally. I can understand wanting to isolate repositories to prevent hash collisions from causing problems, but within a repo everything has a universally unique ID.

edit: for instance, that specific VERBS.md is represented by the blob 3b9a46854589abb305ea33360f6f6d8634649108.

by em-bee11 hours ago|

that's not what i meant. i was trying to suggest that the string "blob" does not fit. why is it there? why is it needed?

    https://github.com/gritzko/beagle/a7e17290a39250092055fcda5ae7015868dabdb4/VERBS.md

this should be sufficient to represent the file.

"blob" is like a descriptor of the value that follows. it would be like doing this:

    https://github.com/user/gritzko/project/beagle/blob/a7e17290a39250092055fcda5ae7015868dabdb4/file/VERBS.md

this actually irks me every time i see it in a github url

by masklinn5 hours ago|

> this should be sufficient to represent the file.

Except it's not, because the oid can be a short hash (https://github.com/gritzko/beagle/blob/a7e172/VERBS.md) and that means you're at risk of colliding with every other top-level entry in the repository, so you're restricting the naming of those toplevel entries, for no reason.

So namespacing git object lookups is perfectly sensible, and doing so with the type you're looking for (rather than e.g. `git` to indicate traversal of the git db) probably simplifies routing, and to the extent that it is any use makes the destination clearer for people reading the link.

by sowbug10 hours ago|

They are following the /key/value/key/value pattern, but the first two pairs in a GitHub URL are fixed to user and project, which lets them omit the key names. I could see them not being willing to hardcode the third pair to blob.

Back when GitHub URLs were kind of cool, github.com/user/gritzko/project/beagle would have been much less cool than just github.com/gritzko/beagle.

by masklinn5 hours ago|

> They are following the /key/value/key/value pattern

They are not. There's just a routing layer below the repository.

by iainmerrick13 hours ago|

Back in the day there was an attempt to introduce "matrix URIs" as a more structured alternative to query strings: https://www.w3.org/DesignIssues/MatrixURIs.html

Of course there's nothing to stop you using URIs like this (I think Angular does, or did at one point?) but I don't think the rules for relative matrix URIs were ever figured out and standardised, so browsers don't do anything useful with them.

by pverheggen12 hours ago|

Not entirely arbitrary - forms that use the GET method instead of POST will append form values as query params.

For sites without Javascript, it's great for things like search boxes, tables with sorting/filtering, etc. instead of POST, since it preserves your query in the URL.

https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...

by msandford11 hours ago|

It has always amazed me how much trouble the SPA folks are willing to go to in order to slowly rebuild just normal boring URLs with querystrings because users demand deep linking and back buttons and the like.

Or you could accept that you're probably going to need a round trip to the server and use a normal URL and it's fine.

For all but the absolute biggest websites in the world, anyhow. At Facebook or Google scale yeah it's needed.

by halayli13 hours ago|

Nothing you said here is correct. Paths, query strings, and fragments are all well defined entities. https://datatracker.ietf.org/doc/html/rfc3986#section-3.3

by sroussey13 hours ago|

“It’s a string between ? and #” isn’t well defined. Or it is, and it says very little.

by gpvos15 hours ago|

Query strings existed before CGI did, and the way they're defined to be filled in from web forms is quite useful; I wouldn't want to need Javascript to fit that into path format. There's nothing wrong about having things decided by the server; I don't get that part of your argument at all.

by cobbzilla14 hours ago|

Maybe dumb question: how does the server “decide” anything other than what file to serve? Today we have many choices but back in the day CGI was the first standard way to do it.

So yes query parameters existed before CGI but to use them you had to hack your server to do something with them (iirc NCSA web servers had some magic hacks for queries). CGI drove standardization.

by losvedir21 minutes ago|

TCP has been around a long time. Listen, read, send, you're good to go. It's just software so you can make it do anything.

But you're asking about the relationship between popular primarily file serving servers like Apache and their relationship to high level code to create custom responses? Yeah, CGI was the first big standard there that I remember, though it was a bit before my time. But that's only one possible architecture.

These days, most web apps have the web server built in, and so the custom code you're writing works with the full request directly. There may be a lightweight web server in front (or multiple), like nginx, to manage connections, but they will largely just proxy the whole thing through.

by stirfish14 hours ago|

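    package main

    import (
        "fmt"
        "net/http"
        "time"
    )

    // specialHandler answers 404 on Tuesdays and 200 otherwise.
    func specialHandler(w http.ResponseWriter, r *http.Request) {
        if time.Now().Weekday() == time.Tuesday {
            http.NotFound(w, r)
            return
        }
        fmt.Fprintln(w, "server made a decision")
    }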

Your server can make decisions however you program it to, you know? It's just software.

Forgive the phone-posting.

by cobbzilla11 hours ago|

and what server software is running this code in 1995?

by lispwitch10 hours ago|

CL-HTTP or AOLserver

by cobbzilla10 hours ago|

sure looks like VB there, what’s the plugin? Didn’t see anything like that before.

by heavensteeth5 hours ago|

That's Go.

by cobbzilla2 hours ago|

Which runs on what computer in 1995?

by jolmg15 hours ago|

It's arbitrary to a degree like the difference between using an attribute or child element in XML, but it's not entirely arbitrary. If you want to include data in the URL that's not part of the hierarchy of the path, query strings are good for that.

by paulddraper14 hours ago|

How do you figure?

Paths are hierarchical; query strings are name/value.

(Note I speak of common usage.)

You can create a different convention, but that one is pretty dang useful.

by TZubiri12 hours ago|

WHATWG is for HTML; try the IETF HTTP RFCs.
