Go's docs put it like this: “Path names are UTF-8-encoded, unrooted, slash-separated sequences of path elements, like ‘x/y/z’.” If you operate on a path that isn't valid UTF-8, Go will do... something to make the string work with UTF-8 when it's passed back to the standard file methods, but it likely won't end up operating on the same file.

Rust instead has OsStr to represent strings like paths, with an explicit lossy or fallible conversion step to UTF-8.

Go's approach is fine for 99% of cases, and you're pretty screwed if your application lands in the remaining 1%. Go has made a lot of decisions like that, often to simplify the standard library for the use cases most people run into (like its awful, lossy, incomplete mapping between Unix and Windows when it comes to permissions, read-only flags, etc.).

reply
> Path names are UTF-8-encoded, unrooted, slash-separated sequences of path elements, like “x/y/z”

This is only for the "io/fs" package and its generic filesystem abstractions. The "os" package, which always operates on the real filesystem, doesn't actually specify how paths are encoded, nor does its associated helper package "path/filepath".
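A quick way to see that the rule lives in "io/fs" is to ask fs.ValidPath directly. This is only a small sketch; the sample names and the `isFSPath` helper are made up for illustration:

```go
package main

import (
	"fmt"
	"io/fs"
	"unicode/utf8"
)

// isFSPath is a hypothetical wrapper around fs.ValidPath, which reports
// whether a name is acceptable to io/fs-style Open calls.
func isFSPath(name string) bool {
	return fs.ValidPath(name)
}

func main() {
	names := []string{
		"x/y/z",       // the example from the io/fs docs
		"/x/y",        // rooted
		"x/../y",      // contains a ".." element
		"caf\xe9.txt", // 0xE9 on its own is not valid UTF-8
	}
	for _, name := range names {
		fmt.Printf("%q: valid UTF-8=%t, fs.ValidPath=%t\n",
			name, utf8.ValidString(name), isFSPath(name))
	}
}
```

Nothing in this check is applied by os.Open or path/filepath; it only matters for fs.FS implementations.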

In practice, non-UTF-8 paths were never an issue on Unix-like systems, where file paths are natively just byte sequences. You do need to be aware of this possibility to avoid mangling the paths yourself, though. The real problem was Windows, where paths are actually WTF-16, i.e. UTF-16 that may contain unpaired surrogates. Go has addressed this by accepting WTF-8 paths since Go 1.21: https://github.com/golang/go/issues/32334#issuecomment-15500...

reply
The `os` package, which is the main way everyone I've seen opens and reads files in Go, doesn't specify any restriction on its path syntax (except that it uses `string`, of course). I've tried using it on Linux with a file name that is invalid UTF-8, and it works without any issues.
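For anyone who wants to repeat that experiment, here is roughly what it looks like. Treat it as a sketch under assumptions: `roundTrip` is a hypothetical helper, the file name is arbitrary, and the outcome depends on the OS and file system (some file systems reject such names outright, so the error path is handled rather than assumed away):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// roundTrip creates a file called name in a fresh temporary directory,
// lists that directory, and returns the name the OS reports back.
func roundTrip(name string) (string, error) {
	dir, err := os.MkdirTemp("", "pathbytes")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	if err := os.WriteFile(filepath.Join(dir, name), []byte("hi"), 0o600); err != nil {
		return "", err
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", err
	}
	if len(entries) != 1 {
		return "", fmt.Errorf("expected 1 entry, got %d", len(entries))
	}
	return entries[0].Name(), nil
}

func main() {
	name := "caf\xe9.txt" // Latin-1 byte 0xE9, not valid UTF-8
	got, err := roundTrip(name)
	if err != nil {
		fmt.Println("the file system rejected the name:", err)
		return
	}
	fmt.Printf("wrote %q, read back %q, identical=%t\n", name, got, got == name)
}
```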

I for one hadn't even heard of the io/fs package that has the problems you mention, and I don't remember ever seeing it used in an example. I looked in a code base I help maintain, and the only uses I could find are function type definitions used by filepath.WalkDir and filepath.Walk. Those functions explicitly document that they don't pass `io/fs`-style paths to the callback; they don't even follow the io/fs path separator convention:

  // WalkDir calls fn with paths that use the separator character appropriate
  // for the operating system. This is unlike [io/fs.WalkDir], which always
  // uses slash separated paths.
  func WalkDir(root string, fn fs.WalkDirFunc) error {

Where fs.WalkDirFunc is defined like this:

  type WalkDirFunc func(path string, d DirEntry, err error) error
reply
> Go strings are just arrays of bytes,

https://go.dev/ref/spec#String_types: “A string value is a (possibly empty) sequence of bytes”

https://pkg.go.dev/strings@go1.26.2: “Package strings implements simple functions to manipulate UTF-8 encoded strings.”

So, yes, Go strings are just arrays of bytes in the language, but in the standard library, they’re supposed to be UTF-8 (the documentation isn’t immediately clear on how it handles non-UTF-8 strings).
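To make the distinction concrete, here is a small sketch (the `viaRunes` helper and the sample string are invented for the example). The language stores the bytes as-is, and it is only when something decodes the string as UTF-8, such as a conversion to []rune, that an invalid byte gets replaced with U+FFFD:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// viaRunes converts s to []rune and back. Decoding replaces each invalid
// byte with U+FFFD, so the result can differ from the input.
func viaRunes(s string) string {
	return string([]rune(s))
}

func main() {
	s := "a\xffb" // three bytes; 0xFF can never appear in valid UTF-8

	fmt.Println("bytes:", len(s))
	fmt.Println("valid UTF-8:", utf8.ValidString(s))

	// Copying, comparing, and slicing work on bytes and preserve them.
	fmt.Println("copy unchanged:", string([]byte(s)) == s)

	// Decoding as UTF-8 is where the original byte is lost.
	fmt.Printf("via []rune: %q (unchanged: %t)\n", viaRunes(s), viaRunes(s) == s)
}
```

So a path held in a string survives as long as it is only passed around; it is the code that treats it as text that has to be careful.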

I think this may be why the OP thinks the Go approach is “every path is a valid UTF-8 string”

reply