Yeah, I completely agree, but I think this model solves a different problem. AFAIK it's specifically there for the case where you only have one photo but still need a 3D Gaussian splat scene.
I haven't tried that specific case, but are you sure? It does get a lot of stuff right from context. I think it would probably depend on how much of the frame the poster took up.
More reference images from different angles are always going to give more accurate information in 3D. From a single 2D image there is a lot of ambiguity in the context: several different shapes in 3D can be represented in identical ways in 2D. Additional context like lighting, shadows, etc. helps. But more real signal from more images will always be better.
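
To make the ambiguity concrete, here is a minimal sketch assuming an idealized pinhole camera (no lens distortion, camera looking down +Z). The function name `project` and the parameters `focal` and `cam_x` are made up for illustration and are not from Sharp or any library. It shows two different 3D points landing on the same pixel in one view and separating once a second, shifted view is added.

```python
def project(point, focal=1.0, cam_x=0.0):
    """Pinhole projection of a 3D point for a camera at (cam_x, 0, 0)
    looking down +Z. Hypothetical helper, for illustration only."""
    x, y, z = point
    return (focal * (x - cam_x) / z, focal * y / z)

# Two different 3D points on the same ray through the first camera:
# the second is twice as far away and twice as far off-axis.
near_point = (1.0, 2.0, 4.0)
far_point = (2.0, 4.0, 8.0)

# From a single view they are indistinguishable.
same_in_one_view = project(near_point) == project(far_point)

# A second camera shifted sideways by 1 unit tells them apart.
differs_in_second_view = (
    project(near_point, cam_x=1.0) != project(far_point, cam_x=1.0)
)

print(same_in_one_view, differs_in_second_view)
```

A single-image model has to resolve this kind of tie from learned priors (typical object sizes, shading, and so on), whereas a second view resolves it geometrically.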
I'm not saying it wouldn't be - because that's obvious.
Agreed, I wasn't arguing, just trying to add additional information in case it isn't obvious to anyone.
Maybe, but what is wrong with wanting real depth instead of "made up depth"? One extra photo mostly solves that.
1. There are many use cases where only a single photo is available.

2. There are many models similar to Sharp that do accept multiple photos - but Sharp is trying to solve a specific problem. If you have multiple photos - don't use Sharp.
