Differentials between URI parsers are a huge source of bugs. The amount of shenanigans you can pull inside a URI is bonkers, and trying to handle it yourself with some regex and string splitting is absolutely insane.

For example, `https://www.example.com:443@203569230:8080/` will send you to the IP address 12.34.56.78 (203569230 is the same address written as a single decimal integer) on port 8080, using basic authentication with `www.example.com` as the username and `443` as the password. If your code splits on `:` or checks that the URI starts with some specific string, that won't be good enough. Use a library that you trust.
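
To make that concrete, here is a minimal sketch using Python's standard `urllib.parse` and `ipaddress` modules. Note that `urlsplit` itself leaves the host as the string `203569230`; reading it as a 32-bit IPv4 address is what many clients do afterwards, and that step is reproduced here by hand as an assumption about the client.

```python
import ipaddress
from urllib.parse import urlsplit

url = "https://www.example.com:443@203569230:8080/"
parts = urlsplit(url)

# Everything before the last "@" in the authority is userinfo, not the host.
print(parts.username)  # www.example.com
print(parts.password)  # 443

# The real host and port come after the "@".
print(parts.hostname)  # 203569230
print(parts.port)      # 8080

# Assumption: the client treats an all-digit host as a 32-bit IPv4 address.
print(ipaddress.ip_address(int(parts.hostname)))  # 12.34.56.78

# A naive prefix check is fooled by the userinfo trick.
print(url.startswith("https://www.example.com"))  # True
```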

I don't believe Python's urllib has a function that takes what HTTP terms an "origin-form" (an absolute path with possibly a query attached to it with "?") and parses it apart.
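
A rough illustration of the gap: `urlsplit` parses general URI references, so an origin-form target that happens to start with `//` gets read as an authority. The `split_origin_form` helper below is hypothetical (not part of urllib) and only a sketch; it does no character validation at all.

```python
from urllib.parse import urlsplit


def split_origin_form(target):
    """Hypothetical helper: split an origin-form target into (path, query)."""
    if not target.startswith("/"):
        raise ValueError("not origin-form: must start with '/'")
    # origin-form is an absolute path, optionally followed by "?" and a query.
    path, sep, query = target.partition("?")
    return path, (query if sep else None)


print(split_origin_form("/a/b?x=1"))  # ('/a/b', 'x=1')

# urlsplit sees a network-path reference here, not a path.
target = "//example.com/x?y=1"
print(urlsplit(target).netloc)    # example.com
print(split_origin_form(target))  # ('//example.com/x', 'y=1')
```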

Still, RFC 9112, which defines HTTP/1.1, requires that, for the purposes of URI reconstruction, "if there is no Host header field or if its field value is empty or invalid, the target URI's authority component is empty."

Yep, none of them are suitable for this use case; you need to validate the Host header and reconstruct the URI before parsing it.
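
A minimal sketch of that order of operations, under loud assumptions: `parse_host_header` and `reconstruct_target_uri` are hypothetical names, the scheme is taken from how the connection was accepted rather than from the request, and the Host check is deliberately stricter than the RFC grammar (no IPv6 literals, no empty port). It rejects a missing or invalid Host outright instead of continuing with an empty authority.

```python
import string
from urllib.parse import urlsplit

_HOST_CHARS = set(string.ascii_letters + string.digits + "-.")


def parse_host_header(value):
    """Hypothetical strict check: name or IPv4 host plus optional port."""
    host, sep, port = value.partition(":")
    if not host or not set(host) <= _HOST_CHARS:
        raise ValueError("invalid Host header")
    if not sep:
        return host.lower(), None
    if not (port.isascii() and port.isdigit()) or int(port) > 65535:
        raise ValueError("invalid port in Host header")
    return host.lower(), int(port)


def reconstruct_target_uri(scheme, host_header, request_target):
    """Validate first, then build the absolute URI from the pieces."""
    if scheme not in ("http", "https"):
        raise ValueError("unexpected scheme")
    if not request_target.startswith("/"):
        raise ValueError("not origin-form")
    host, port = parse_host_header(host_header)
    authority = host if port is None else f"{host}:{port}"
    return f"{scheme}://{authority}{request_target}"


# Only now is the result handed to a general-purpose parser.
uri = reconstruct_target_uri("https", "Example.com:8080", "//evil.example/x?y=1")
print(uri)                     # https://example.com:8080//evil.example/x?y=1
print(urlsplit(uri).hostname)  # example.com
print(urlsplit(uri).path)      # //evil.example/x
```

Because the authority is already in place when `urlsplit` runs, the leading `//` in the request target stays part of the path instead of being read as a host.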