Back in the day it was reasonably common for CMSs and forums to have only an index.php and to route entirely by query string (in form-urlencoded form, people were not savages). So you would have index.php?p=home and index.php?p=shop. Or index.php?action=showthread&forum=42&thread=17976. It should be immediately obvious that in that scheme 404 is indeed the correct answer to unknown query parameters.

In fact lots of sites still work like that; they just hide it behind a couple of rewrite rules in Apache/nginx for SEO reasons.
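
A minimal sketch of that single-entry-point scheme, in Go rather than PHP. The page table, the `lookupPage` helper and the default to "home" are assumptions made up for illustration; the point is only that when the query string is the router, an unknown value has nothing to resolve to and 404 falls out naturally.

```go
package main

import (
	"fmt"
	"net/http"
)

// pages is a hypothetical page table; the keys stand in for index.php?p=<name>.
var pages = map[string]string{
	"home": "Welcome home",
	"shop": "Things for sale",
}

// lookupPage maps the value of the "p" parameter to a body and a status code.
// The query string is the only router here, so an unknown name is a 404.
func lookupPage(p string) (string, int) {
	body, ok := pages[p]
	if !ok {
		return "no such page", http.StatusNotFound
	}
	return body, http.StatusOK
}

// indexHandler plays the role of the single index.php entry point.
func indexHandler(w http.ResponseWriter, r *http.Request) {
	p := r.URL.Query().Get("p")
	if p == "" {
		p = "home" // assumed default when no page is requested
	}
	body, status := lookupPage(p)
	w.WriteHeader(status)
	fmt.Fprintln(w, body)
}

func main() {
	// indexHandler could be mounted with http.HandleFunc("/index.php", indexHandler);
	// here we only exercise the lookup.
	for _, p := range []string{"home", "shop", "nope"} {
		_, status := lookupPage(p)
		fmt.Printf("index.php?p=%s -> %d\n", p, status)
	}
}
```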

reply
If you're routing like it's 1999, sure, 404.

On the other hand, if it's a CRUD app and you're filtering a list of entities by various field values? Returning that no items matched your selection (or an empty list, if an API) makes more sense than a 404, which would be more appropriate for an attempt to pull up a nonexistent entity URI.

reply

    204 No Content
for nothing found is not an error (because it is a 2xx code) but still indicates there was nothing found to match the request.

If it's an API, a 200 with an empty JSON object or array in the body is legitimate as well, but a 204 is explicit.

reply
There is no reason you can't return that "no items matched your selection" with a 404 HTTP response code instead of a 200.
reply
Another reason not to return a 404 in that case is that chances are there will be monitoring tooling in place that treats a 404 as an "error" and surfaces it in your alerting, which is not ideal; it will just be noise.
reply
You can return whatever HTTP response code you want, but if you care about knowing whether your site is working, being able to look at the logs and tell "That user requested a page that doesn't exist" apart from "That user requested a page that exists but had no results" is quite useful. In coding terms it's the difference between a null and an empty array.
reply
In this case I don't think the status should depend on the number of results. Here are your results: [] is a valid response body when there are no results. Returning 404 if there are no results (GET /books?title=a for instance) is misleading; the caller may think that /books is a nonexistent route and may conclude that books are reachable via another URI. To me, the query string has no influence on the response status.

/books/1 could return 200 or 404 depending on the existence of book #1; here it makes sense because if /books/1 does not exist the API must say so explicitly. However, 404 belongs to the 4XX family, which means "client error". Is it an error to ask for a nonexistent book? If you enter a bookshop and ask for a book they don't have, you did not "make a mistake". It's not as if you asked for a chainsaw. But in an API, especially with hypermedia, you are not supposed to request a resource that does not exist (unless the API provides a link to an existing resource that was deleted before the caller tries to reach it).
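
A rough sketch of the distinction being drawn, with a made-up in-memory book list (the `Book` type, the sample titles and both helper names are assumptions for illustration). A search over the collection always succeeds and may be empty; a lookup of one specific entity can miss.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// Book is a hypothetical entity used only for this sketch.
type Book struct {
	ID    int    `json:"id"`
	Title string `json:"title"`
}

var books = []Book{{ID: 1, Title: "Dune"}, {ID: 2, Title: "Emma"}}

// searchBooks models GET /books?title=... The route exists, so the status is
// 200 even when nothing matches; the result is an empty, non-nil slice so it
// encodes as [] rather than null.
func searchBooks(title string) (int, []Book) {
	matches := []Book{}
	for _, b := range books {
		if strings.Contains(strings.ToLower(b.Title), strings.ToLower(title)) {
			matches = append(matches, b)
		}
	}
	return http.StatusOK, matches
}

// getBook models GET /books/{id}. A missing entity is reported as 404.
func getBook(id int) (int, *Book) {
	for i := range books {
		if books[i].ID == id {
			return http.StatusOK, &books[i]
		}
	}
	return http.StatusNotFound, nil
}

func main() {
	status, list := searchBooks("zzz")
	body, err := json.Marshal(list)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(status, string(body)) // the search case

	status, _ = getBook(99)
	fmt.Println(status) // the single-entity case
}
```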

reply
If you enter a bookshop and you ask for a book that does not exist then it's definitely your mistake.

If you ask for a book they don't have it's a different matter.

In any case, when you ask for a book in a library you are using their "search" endpoint. The equivalent to opening a books/1 url would be asking for a specific instance of a book by serial number or so. Then it's clear that you made a mistake if you do that for a nonexistent serial number...

reply
A response code of 204 seems more appropriate but the problem is you're not allowed to send further information, which would make that descriptive response... not descriptive enough.
reply
Code 204 is just code 200 with the "yes the body really is zero bytes this is not an error it's supposed to be like this" bit set.
reply
I think of it like this:

/users/ returns a 404 in an API means that this resource does not exist. As in, this is not a part of the API.

/users/123 returns a 404 means this user record does not exist.

Yes this means that a 404 is context dependent but in a way that makes it easier for a human to think of and reason about.

reply
Yes, and this is obvious if /users/ exists and returns a 400 if the ID is required. That way you can tell the difference between /users/ being there and expecting an ID, and it not being there.
reply
The point was that returning a 404 for unexpected query strings doesn’t just happen to be okay per the specs, but that there is significant historical precedent for doing so based on application design that was common in the past.
reply
Yea, empty response at a valid path. Isn’t 204 the code for it?

Lots of REST libraries that I’ve used treat any 4xx response as an error, so generating a 404 for an empty list would just create more headaches.

reply
Libraries that automatically throw errors for status codes in the 400 and 500 ranges are pretty obnoxious (looking at you, axios). It adds unnecessary overhead, complexity, and bad ergonomics by hijacking control flow from the app.

Responses with status codes in the 400 range are client errors, so the client shouldn't retry the same request. So a 404 is appropriate despite how annoying a library might be at handling it. Depending on which language/ecosystem you are using, there are likely more sane alternatives.

reply
Completely agree on the axios part - one implication of that is you can't statically type the error response shapes (since exceptions can't be typed), whereas with fetch you can have a discriminated union based on the status code (e.g. https://github.com/mnahkies/openapi-code-generator/blob/main...)

Although I do feel like I've seen too many instances of a 404 being used for an empty collection where it would make more sense to return `[]` and treat it as an expected (successful) state.

reply
Generally true although 429 is often used for rate limiting so a back off and retry is appropriate. 409, 412, 428 may also be retriable depending on the specific semantics of the given situation. 421 apparently shows up commonly in HTTP/2 connection reuse and is retriable. 423 and 425 too potentially.

It would have been nice if there was an actual grouping of retriable and not retriable codes, but in reality it’s a complete mess.

But at a minimum beware of 429. That’s not a permanent outage and is a frequent one you might get that needs a careful retry.
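
One way a client might encode that list, as a sketch only. The three-way `retryHint` type and the `classify4xx` name are invented here, and which bucket each code lands in is a judgment call that depends on the API in question.

```go
package main

import (
	"fmt"
	"net/http"
)

// retryHint is a hypothetical three-way answer to "should I retry this 4xx?".
type retryHint int

const (
	noRetry           retryHint = iota // fix the request instead of resending it
	retryAfterBackoff                  // wait, then send the same request again
	retryMaybe                         // depends on the semantics of the API
)

func (h retryHint) String() string {
	return [...]string{"no retry", "retry after backoff", "maybe retry"}[h]
}

// classify4xx groups the status codes mentioned above. Anything not listed is
// treated as a plain client error that should not be retried unchanged.
func classify4xx(status int) retryHint {
	switch status {
	case http.StatusTooManyRequests: // 429
		return retryAfterBackoff
	case http.StatusConflict, // 409
		http.StatusPreconditionFailed,   // 412
		http.StatusMisdirectedRequest,   // 421
		http.StatusLocked,               // 423
		http.StatusTooEarly,             // 425
		http.StatusPreconditionRequired: // 428
		return retryMaybe
	default:
		return noRetry
	}
}

func main() {
	for _, s := range []int{404, 409, 429} {
		fmt.Println(s, classify4xx(s))
	}
}
```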

reply
204 might be acceptable if you aren’t returning an entity body to describe what is missing, but do wish to indicate the request was successful.
reply
I think the author is comfortable creating headaches for people tacking query strings onto URLs
reply
> It should be immediately obvious that in that scheme 404 is indeed the correct answer to unknown query parameters

That's not obvious at all. If I receive json data that contains a property I'm not aware of, i don't reject the entire document for that reason. In the case of query strings, extra query parameters might be used by other parts of the stack besides yours, so rejecting the entire document because someone somewhere else is trying to pass information to itself is the wrong approach.

reply
> other parts of the stack

As a web developer, you’re like the guy standing with a clipboard outside a fancy club checking if people requesting entry are allowed or not. Basically, level 1 security.

If someone is not on the list, your job is to default to declining them access, not granting them access assuming level 2 security will handle them at a deeper layer.

It’s possible that the teams you work with expect fuzzy behaviour from the website but that’s a choice, not a practice.

reply
The first layer of any web security should never be checking someone against a list, unless this can be done in less than a few milliseconds. It should only be sanity checking for basic compliance. In the analogy, this first layer should be denying entry to obviously drunk people, zebras, and a stampede of protesters.
reply
>It’s possible that the teams you work with expect fuzzy behaviour from the website but that’s a choice, not a practice.

This is how the vast majority of websites work. The practical reason is obvious: when we model the behaviour our code depends on, we want to create the simplest possible model that allows our code to work as expected. Placing requirements on it that our code doesn't actually depend on is useless, unneeded, complexity.

> As a web developer, you’re like the guy standing with a clipboard outside a fancy club checking if people requesting entry are allowed or not. Basically, level 1 security.

there is no security benefit to filtering out unneeded url parameters.

reply
> there is no security benefit to filtering out unneeded url parameters.

there is - security in depth.

If a url parameter would've been a vulnerability because something lower down the stack misinterprets it (and the param wasn't necessary for your app in the first place), then you've just left a window open for the exploit.

If the set of url params are known ahead of time (which i claim should be true), then you could make adding unknown params an error.
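
As a sketch of what "unknown params are an error" could look like, assuming a hypothetical allowlist of `title` and `page`. The helper name is made up, and whether the caller then answers 400 or 404 is exactly what this thread is arguing about. Note that tracking parameters added by other parts of the stack would be rejected too.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
)

// allowed is a hypothetical allowlist for one endpoint.
var allowed = map[string]bool{"title": true, "page": true}

// unknownParams returns the sorted names of query parameters that are not in
// the allowlist. An empty result means the request passes the check.
func unknownParams(q url.Values, allowed map[string]bool) []string {
	var unknown []string
	for name := range q {
		if !allowed[name] {
			unknown = append(unknown, name)
		}
	}
	sort.Strings(unknown)
	return unknown
}

func main() {
	q, err := url.ParseQuery("title=dune&utm_source=newsletter")
	if err != nil {
		fmt.Println(err)
		return
	}
	if bad := unknownParams(q, allowed); len(bad) > 0 {
		fmt.Println("rejecting request, unknown parameters:", bad)
	}
}
```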

reply
> there is no security benefit to filtering out unneeded url parameters.

What about passing extra data to fill the server memory with either extra known junk or a script / executable to use with a zero day in an internal component or something.

To misuse the nightclub analogy: it’s like checking for bags not being larger than A4 and disallow knives and other weapons.

reply
At the risk of naming an Eldritch horror, IIRC it was Cold Fusion that first adopted something like an MVC-in-querystring routing system in the late 90s or early 00s, and that eventually spread when FCGI caught on and users of other languages got used to long-running middleware processes. It seemed hella elegant at the time.
reply
No, 400 is correct for a bad request, as unknown query parameters are a clear client error.
reply
All 4xx errors are client errors.

400 is the general “bad request” client error, indicating something is wrong with the request but not being specific about what.

404 is simply a more specific client error: it means the client asked for a resource that couldn’t be found.

reply
Oh no, looks like my old forum software urls.
reply
watch?v=oHg5SJYRHA0
reply
item?id=48076173
reply
Ooo.. burn.
reply
> in form-urlencoded form, people were not savages

Oh yeah? I remember a lot of semicolons from Perl and other CGI stuff where we would now use ampersands, back in the day, both in the path and in the query. (Sometimes the ? itself would be written ;.)

reply
Correct. In fact, the semicolon is part of the URI scheme standard, and the ampersand is just some ad-hoc thing that got adopted naturally without any standardization effort.
reply
Yeah, URLs really don’t have much in the way of semantics. Path is clearly intended for hierarchical data and query for non-hierarchical data, and there are strong customs, some commonly supported or even enforced by libraries, but no actual rules. Ultimately, it’s just a string that the server can decide what to do with.

The really funny thing about this is that, when I was worrying about possible side effects if I responded 404, I somehow completely forgot how much of the web’s history the path has been useless for. Paths have won. No one really starts new things with URLs like /item?id=… any more. Yay!

reply
Wikipedia web server treats anything after /wiki/ literally as the name of the article.

So en.wikipedia.org/wiki/// is the article about C++ style comments

reply
Oh, magnificent. Lovely high-profile example to add about empty path segments being meaningful.
reply
i wonder if it ought to be `/wiki/%2F%2F` instead...
reply
Wouldn't a generic 400 be better? It's not that the page wasn't found, but that you've sent something that was not an accepted request. "Fix your request and try again" is how I've read it, and that's how I use it in the APIs I provide. I prefer it over 406 since it's not my end that can't process it. If your query string is tacking on extra stuff to try to break things, or your request just wasn't crafted per the docs, then it's on you.
reply
406 would be wrong for me, as it is meant for when the client sends an Accept: header and the server cannot fulfil it. HTTP return codes get quite specific when you read the actual description and not just the name.
reply
Interestingly, quite a few places that should treat query strings transparently make a lot of assumptions about their structure. We ran into that when picking a new CDN, some providers didn't handle repeat parameters (?a=1&a=2) correctly.
reply
What do you mean by correctly?
reply
Incorrectly would be processing the query string and deduping keys. Correctly would be passing it through as-is, or at least only lightly processing it, like normalizing escaping or such.
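
To make the difference concrete, here is a small sketch. `dedupeKeys` is a made-up function that imitates the lossy behaviour described above by keeping only the first value per key; Go's standard `url.ParseQuery` keeps every value.

```go
package main

import (
	"fmt"
	"net/url"
)

// valuesFor parses a raw query string and returns every value for one key.
func valuesFor(rawQuery, key string) ([]string, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, err
	}
	return q[key], nil
}

// dedupeKeys imitates the lossy behaviour: only the first value of each key
// survives. This is the hypothetical "incorrect" processing, for contrast.
func dedupeKeys(q url.Values) url.Values {
	out := url.Values{}
	for k, vs := range q {
		if len(vs) > 0 {
			out.Set(k, vs[0])
		}
	}
	return out
}

func main() {
	vs, err := valuesFor("a=1&a=2", "a")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("parsed:", vs)
	fmt.Println("deduped:", dedupeKeys(url.Values{"a": vs})["a"])
}
```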
reply
Indeed I would expect pass through with no changes.

Though there are “smart” CDNs that will resize images etc. All bets are off for those.

reply
Standards are just commonly accepted behaviour that somebody chose to write down somewhere. There are a great number of commonly accepted behaviours that nobody's ever bothered to encode into a formal standard, but where failure to follow the accepted practice will result in widespread breakage. There are also a great many "standards" that you would be a fool to follow to the letter. In the OP case, the only thing that will break is people trying to visit their site, who will presumably simply press the back button on their browser and go about their day. They can decide for themselves if that is an acceptable casualty. But it isn't definitionally acceptable because no standard says it isn't (nor would it suddenly become unacceptable because a standard said it was...)
reply
The No-Vary-Search (proposal?)

https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/...

effectively lets you specify what parts of a query are relevant. So for example

url?a=b&c=d matches url?c=d&a=b in terms of caching
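
The effect on a cache can be sketched as a key normalization step. This is not the header's implementation, only an illustration of the idea; `cacheKey` and the example.com URLs are assumptions. It relies on `url.Values.Encode`, which emits parameters sorted by key.

```go
package main

import (
	"fmt"
	"net/url"
)

// cacheKey returns the URL with its query parameters in a canonical order, so
// that two URLs differing only in parameter order map to the same key.
func cacheKey(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	u.RawQuery = u.Query().Encode() // Encode sorts parameters by key
	return u.String(), nil
}

func main() {
	for _, raw := range []string{
		"https://example.com/url?a=b&c=d",
		"https://example.com/url?c=d&a=b",
	} {
		key, err := cacheKey(raw)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(raw, "->", key)
	}
}
```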

reply
> I was pretty geared up to have a contrarian opinion until I looked at the standards but they're actually pretty clear, a 404 could be a proper response to unexpected query string; query string is as much part of the URL API as the path is and I think pretty much everyone can acknowledge that just tacking random stuff onto the path would be ill advised and undefined behavior.

This feels like a "technically correct is the best kind of correct" situation. Like technically, yeah, web servers may respond 404 if they don't understand a query parameter, but in practice that is not how URLs are conceptualized normally.

reply
Something I discovered looking back at some old sites: "pages" defined by URL params don't always make it into the Wayback Machine.
reply
Wait until you realize that the difference between path and query string is entirely arbitrary and decided by the server. Query strings should never have existed. They are an implementation detail of CGI webservers that leaked all over everything and now smells really bad.
reply
I dunno, it seems like the fact that we arrived at a fairly standard structure for URL paths that works pretty well is not a bad outcome.

Seems a lot better than the other potential world we could have lived in, where paths were a black box and every web server/framework invented their own structure for them.

reply
My next website is going to have the path portion of the URL be a base64 encoded ASN.1 blob.
reply
So long as it starts with a slash, go ahead! See how long it takes for someone to figure it out.

It’s your website. Have fun with it! Do dumb things! :-)

reply
Make sure you use URL-safe base64 or the portions that look like a path can get mangled

MII//epi

Is converted to MII/epi
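
A short sketch of both halves of that warning, using Go's standard library: `path.Clean` stands in for whatever normalizer sits in front of the site (an assumption), and `toURLSafe` is a made-up helper that re-encodes with the URL-safe alphabet, which uses `-` and `_` in place of `+` and `/`.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"path"
)

// toURLSafe decodes standard base64 and re-encodes it with the URL-safe
// alphabet, so the result contains no '/' or '+' characters.
func toURLSafe(std string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(std)
	if err != nil {
		return "", err
	}
	return base64.URLEncoding.EncodeToString(raw), nil
}

func main() {
	// A path normalizer collapses the empty segment and corrupts the blob.
	fmt.Println(path.Clean("/MII//epi"))

	safe, err := toURLSafe("MII//epi")
	if err != nil {
		fmt.Println(err)
		return
	}
	// The URL-safe form has no slashes for the normalizer to touch.
	fmt.Println(path.Clean("/" + safe))
}
```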

reply
In my current project I use URIs to refer to absolutely any entity in a git(-ish) repo. Files, branches, revisions, diffs, anything. URI turns out to be a really good addressing scheme for everything. Surprise. But the most used and abused element is always the path. Query takes a lot of that mess away. Might have been unmanageable otherwise.

https://github.com/gritzko/beagle

reply
In fact, GitHub URIs are a good example of overusing paths: https://github.com/gritzko/beagle/blob/a7e17290a39250092055f...

  - user gritzko,
  - project beagle, 
  - view blob, 
  - commit a7e17290a39250092055fcda5ae7015868dabdb4, 
  - file path VERBS.md
... all concatenated indiscriminately.
reply
That’s not an indiscriminate hierarchy.

Grouping data by user is common and normal in computing: /home laid precedent decades ago.

Project directories are an extremely common grouping within a user’s work sets. Yeah, some of us just dump random files in $HOME, but this is still a sensible tier two path component.

The choice to make ‘view metadata-wrapped content in browser HTML output’ the default rather than ‘view raw file contents’ the default is legitimate for their usage. One could argue that using custom http headers would be preferable to a path element (to the exclusion of JavaScript being able to access them, iirc?) or that the path element blob should be moved into the domain component or should prefix rather than suffix the operands; all valid choices, but none implicitly better or worse here.

Object hash is obviously mandatory for git permalinks, and is perhaps the only mandatory component here. (But notably, that’s not the same as a commit hash.) However, such paths could arguably be interpreted as maximally user-hostile.

File path, interestingly enough, is completely disposable if one refers to a specific result object hash within a commit, but if the prior object hash was required to be a commit, then this is a valid unique identifier for the filesystem-tree contents of that commit. You could use the object hash instead of the full path within the commit hash, but that’s a pretty user-hostile way to go about this.

So, then, which part of the ordering and path selections do you consider indiscriminate, and why?

reply
actually, instead of the object hash, you could also use the commit-hash. then the filename would be mandatory, but the url would be more readable and usable: give me the file VERBS.md as it is at commit <hash>
reply
That's actually what it is here, a7e17290a39250092055fcda5ae7015868dabdb4 is a commit's oid: https://github.com/gritzko/beagle/commit/a7e17290a3925009205...
reply
But the path misses param names (or types?). E.g. who said the hex-encoded part is a commit hash? Maybe it's a tree hash, or just a weird ref.

Query strings are more verbose, as they force you to give each param a name.

reply
Which target audience of github needs extra verbosity in the commit hash, though? Once you know it you know it; if you don’t know git you aren’t the target audience; etc. Saying /user=foo is no better than ?user=foo if your audience can work it out without confusion from your unadorned paths. We have a great deal of history with filesystems showing that people are capable of keeping up with paths that lack key names if exposed to and familiar with them, and if the filesystem isn’t being constantly randomized.
reply
what would be a better way of doing that? i am not disagreeing, but i just can't think of any way to improve on this. put everything into the query part? i prefer to use the query only for optional arguments. in this example the blob argument is the only thing that doesn't fit in my opinion.
reply
Every object in git (commit, tree, revision of a single file) has a hash that is guaranteed unique within a repository (otherwise many more things than a web UI would break) and likely also globally. I can understand wanting to isolate repositories to prevent hash collisions from causing problems, but within a repo everything has a universally unique ID.

edit: for instance, that specific VERBS.md is represented by the blob 3b9a46854589abb305ea33360f6f6d8634649108.

reply
that's not what i meant. i was trying to suggest that the string "blob" does not fit. why is it there? why is it needed?

    https://github.com/gritzko/beagle/a7e17290a39250092055fcda5ae7015868dabdb4/VERBS.md
this should be sufficient to represent the file.

"blob" is like a descriptor of the value that follows. it would be like doing this:

    https://github.com/user/gritzko/project/beagle/blob/a7e17290a39250092055fcda5ae7015868dabdb4/file/VERBS.md
this actually irks me every time i see it in a github url
reply
> this should be sufficient to represent the file.

Except it's not, because the oid can be a short hash (https://github.com/gritzko/beagle/blob/a7e172/VERBS.md) and that means you're at risk of colliding with every other top-level entry in the repository, so you're restricting the naming of those toplevel entries, for no reason.

So namespacing git object lookups is perfectly sensible, and doing so with the type you're looking for (rather than e.g. `git` to indicate traversal of the git db) probably simplifies routing, and to the extent that it is any use makes the destination clearer for people reading the link.

reply
They are following the /key/value/key/value pattern, but the first two pairs in a GitHub URL are fixed to user and project, which lets them omit the key names. I could see them not being willing to hardcode the third pair to blob.

Back when GitHub URLs were kind of cool, github.com/user/gritzko/project/beagle would have been much less cool than just github.com/gritzko/beagle.

reply
> They are following the /key/value/key/value pattern

They are not. There's just a routing layer below the repository.

reply
Back in the day there was an attempt to introduce "matrix URIs" as a more structured alternative to query strings: https://www.w3.org/DesignIssues/MatrixURIs.html

Of course there's nothing to stop you using URIs like this (I think Angular does, or did at one point?) but I don't think the rules for relative matrix URIs were ever figured out and standardised, so browsers don't do anything useful with them.

reply
Not entirely arbitrary - forms that use the GET method instead of POST will append form values as query params.

For sites without Javascript, it's great for things like search boxes, tables with sorting/filtering, etc. instead of POST, since it preserves your query in the URL.

https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...

reply
It has always amazed me how much trouble the SPA folks are willing to go to in order to slowly rebuild just normal boring URLs with querystrings because users demand deep linking and back buttons and the like.

Or you could accept that you're probably going to need a round trip to the server and use a normal URL and it's fine.

For all but the absolute biggest websites in the world, anyhow. At Facebook or Google scale yeah it's needed.

reply
Nothing you said here is correct. Paths, query strings, and fragments are all well defined entities. https://datatracker.ietf.org/doc/html/rfc3986#section-3.3
reply
“It’s a string between ? and #” isn’t well defined. Or it is and it says very little.
reply
Query strings existed before CGI did, and the way they're defined to be filled in from web forms is quite useful; I wouldn't want to need Javascript to fit that into path format. There's nothing wrong about having things decided by the server; I don't get that part of your argument at all.
reply
Maybe dumb question: how does the server “decide” anything other than what file to serve? Today we have many choices but back in the day CGI was the first standard way to do it.

So yes query parameters existed before CGI but to use them you had to hack your server to do something with them (iirc NCSA web servers had some magic hacks for queries). CGI drove standardization.

reply
TCP has been around a long time. Listen, read, send, you're good to go. It's just software so you can make it do anything.

But you're asking about the relationship between popular primarily file serving servers like Apache and their relationship to high level code to create custom responses? Yeah, CGI was the first big standard there that I remember, though it was a bit before my time. But that's only one possible architecture.

These days, most web apps have the web server built in, and so the custom code you're writing works with the full request directly. There may be a lightweight web server in front (or multiple), like nginx, to manage connections, but they will largely just proxy the whole thing through.

reply

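    import (
        "fmt"
        "net/http"
        "time"
    )

    // specialHandler answers 404 on Tuesdays and responds normally otherwise.
    func specialHandler(w http.ResponseWriter, r *http.Request) {
        if time.Now().Weekday() == time.Tuesday {
            http.NotFound(w, r)
            return
        }

        fmt.Fprintln(w, "server made a decision")
    }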
Your server can make decisions however you program it to, you know? It's just software.

Forgive the phone-posting.

reply
and what server software is running this code in 1995?
reply
CL-HTTP or AOLserver
reply
sure looks like VB there, what’s the plugin? Didn’t see anything like that before.
reply
That's Go.
reply
Which runs on what computer in 1995?
reply
It's arbitrary to a degree like the difference between using an attribute or child element in XML, but it's not entirely arbitrary. If you want to include data in the URL that's not part of the hierarchy of the path, query strings are good for that.
reply
How do you figure?

Paths are hierarchical; query strings are name/value.

(Note I speak of common usage.)

You can create a different convention, but that one is pretty dang useful.

reply
WHATWG is for HTML; try the IETF HTTP RFCs.
reply