Depth Perception

Vocabulary/concepts:
- Prey FOV: Anatomically, prey tend to have eyes on the sides of their heads, giving a much wider field of view for spotting predators, at the cost of less detailed vision and little depth perception from binocular overlap
- Predator FOV: Anatomically, predators tend to have eyes toward the front of the head with more overlapping vision, giving detailed images and increased depth perception for better hunting
- Binocular summation: If both eyes are looking at something, there is an increased chance that a target is seen by at least one eye. If each eye independently has a 75% chance of seeing the target, the chance that neither eye sees it is (1/4)*(1/4) = 1/16, so the chance that at least one eye sees it is 1 - 1/16 = 15/16
- Binocular disparity: The left and right retinas receive slightly different images because the eyes view the scene from slightly different angles; this difference aids depth perception. The amount of crossed/uncrossed disparity is an example of relative metric depth information
- Stereopsis: Depth perception from seeing with two eyes (unlike monocular vision); the visual system infers depth information from binocular disparity
- Fixation: The act of putting something in the center of the visual field, so that it falls on the center of the retina in both eyes; the fixated point has no binocular disparity
- Horopter: The “circle” through the two eyes and the fixation point; an object located on it projects to the same point of both retinas. Changes with change in fixation. Can be horizontal or vertical. Objects not on this circle project to different points of the retinas and provide information about depth
- Panum’s fusional area: The “width” around the horopter containing points that are close to but not exactly on it. Disparities here are small enough that the two retinal images still fuse into a single image
- Circle: Can be defined by any 3 non-collinear points. For the horopter, these 3 points are the two pupils and the point being fixated
- Uncrossed disparity: Surfaces farther than the horopter are displaced rightward in the right eye’s view and leftward in the left eye’s view (same side as the eye). These objects project closer to the nose on each retina and have a smaller visual angle. If the disparity is small, the object is close to the horopter (but still farther from us than the horopter)
- Crossed disparity: Surfaces closer than the horopter are displaced leftward in the right eye’s view and rightward in the left eye’s view (opposite side from the eye). These objects project closer to the ears on each retina and have a larger visual angle. If the disparity is small, the object is close to the horopter (but still closer to us than the horopter)
- Correspondence problem: How does the visual system determine which points in the left eye’s image correspond to which points in the right eye’s image, so that a single cohesive image is created?
- Shape-First Theory: A way to solve the correspondence problem: shapes are analyzed separately in the left and right eye’s images, and stereopsis comes afterward
- Stereopsis-First Theory: A way to solve the correspondence problem: information from the left and right eyes goes into stereopsis first, and then the shape is analyzed from the combined result. Demonstrated by random dot stereograms, where the square is not visible in the left or right image alone and appears only once the two are combined in depth
- Random dot stereogram: Neither the left nor the right image contains a specific shape; a shape appears only when the two are combined in depth, giving evidence for stereopsis-first
- Absolute depth information: How far away an object is from the organism, given as a measurement of distance independent of other objects. Always metric
- Relative depth information: Calculated by comparing distances between objects, for example a trash can is between me and the door. This applies to binocular disparity, since objects project to the eyes with either crossed or uncrossed disparity (relative terms) and these are then compared to one another. Can be metric or non-metric
- Metric depth information: Gives quantitative information about depth (A is 2x as far away as B). This applies to binocular disparity because of the differing angles at which information comes into each eye. Absolute depth information is always metric
- Non-metric depth information: Gives information about order (A is closer than B)
- Convergence: If the eyes rotate toward each other as an object moves (or as we move fixation between objects), then the object is moving closer. Determined by the amount of eye rotation
- Divergence: If the eyes rotate away from each other as an object moves (or as we move fixation between objects), then the object is moving farther away. Determined by the amount of eye rotation
- Ciliary muscles: Adjust the lens to focus (bent = close, flat = far). The amount that the lens bends is kept track of and used to determine the visual angle and how it is changing, to ultimately derive absolute (metric) depth information about an object. This isn’t useful for objects farther than 2-3 meters, however (the amount of change isn’t that large, and the eyes tend to just be parallel)
- Motion parallax: As an organism moves, an otherwise unchanging object moves across the visual field more if it is closer and more slowly if it is farther away. This is relative and metric (when multiple things are moving, the amounts they move across the field can be compared as ratios). Mathematically, it could be absolute, but this isn’t how the brain treats it
- Pictorial depth cues: Depth cues that work in a 2D setting, such as a photograph (or with monocular vision!), and also work in a 3D space. Include things like size, saturation, detail, “height”, and occlusion
  - Occlusion: Which objects are covering which other objects; objects that occlude others are likely closer in depth. Relative non-metric
  - Relative size: Bigger objects are typically closer than smaller objects. Relative metric (how much of the visual field an object takes up in terms of visual angle)
  - Familiar size: Distance information based on our knowledge of an object’s size as a reference. Absolute and metric (based on trig)
  - Relative height: Objects “higher up” in a 2D image plane appear to be farther away; if two objects are the same size in the image, the one nearer the top is perceived as larger than the one nearer the bottom. Relative and metric (height on the retina in comparison, reversed)
  - Linear perspective: Things that are parallel keep constant size proportions on the left and right sides, and they appear to converge to a point at the horizon. Relative and metric (based on relation to the vanishing point)
  - Aerial perspective: Blue light scatters more, so objects farther away appear bluer than objects that are closer, and far objects look smooth/foggy. Relative metric (comparison of how foggy and blue it is, based on activity of the blue cone)
- Vanishing point: Where two parallel lines appear to converge; assumed to be at the horizon according to linear perspective
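The "based on trig" note under familiar size can be made concrete. The sketch below assumes the object is viewed face-on and that its real size is known; the function names and the 1.8 m example height are hypothetical, chosen only for illustration.

```python
import math

def visual_angle_deg(size_m, distance_m):
    """Visual angle (degrees) subtended by an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

def distance_from_familiar_size(known_size_m, angle_deg):
    """Distance at which an object of known size subtends the given visual angle."""
    half_angle = math.radians(angle_deg) / 2
    return known_size_m / (2 * math.tan(half_angle))

# A person assumed to be 1.8 m tall, seen from 10 m and from 20 m:
near_angle = visual_angle_deg(1.8, 10.0)
far_angle = visual_angle_deg(1.8, 20.0)   # smaller angle for the farther person

# Knowing the real size lets the angle be turned back into an absolute distance.
recovered = distance_from_familiar_size(1.8, near_angle)
```

Because the real size is known, the result is an absolute metric distance rather than a comparison between objects.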
Two eyes allow for:
- Binocular summation
- Wider view of environment, more things seen, expands the visual field
- If one eye is damaged we can still see, seeing is one of our most important senses and eyes are fragile (kind of like insurance)
- Binocular disparity, aids in depth perception
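The binocular summation arithmetic from the vocabulary list can be checked with a few lines. This is a minimal sketch that assumes each eye detects the target independently; the function name is hypothetical.

```python
def p_at_least_one_eye(p_single, n_eyes=2):
    """Chance that at least one eye sees the target, assuming independent eyes."""
    p_none = (1 - p_single) ** n_eyes   # every eye misses
    return 1 - p_none

p_both_miss = (1 - 0.75) ** 2           # 1/16 = 0.0625
p_seen = p_at_least_one_eye(0.75)       # 15/16 = 0.9375
```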
The Marr & Poggio algorithm uses low and high spatial frequency filters: low (blurry images) to find general features, and high to ‘check work’ and details:
- Use blurred/low spatial frequency filters to find likely features even in noisy displays like random dot stereograms
- Look for matched features in both retinal images
- Then check higher spatial frequency filters to confirm the match
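The three steps above can be sketched on a toy one-dimensional "random dot" row. This is not the actual Marr & Poggio implementation: it is a simplified illustration that assumes a single uniform disparity, uses a circular box blur as a stand-in for a low spatial frequency filter, and uses hypothetical function names throughout.

```python
import random

def box_blur(row, radius):
    """Circular moving average: a crude low spatial frequency filter."""
    n = len(row)
    width = 2 * radius + 1
    return [sum(row[(i + k) % n] for k in range(-radius, radius + 1)) / width
            for i in range(n)]

def mismatch(left, right, shift):
    """Mean squared difference between right and left shifted by `shift`."""
    n = len(left)
    return sum((left[(i + shift) % n] - right[i]) ** 2 for i in range(n)) / n

def coarse_to_fine_disparity(left, right, max_shift, blur_radius=3, refine=1):
    # Step 1: blur both rows to keep only coarse features.
    coarse_left = box_blur(left, blur_radius)
    coarse_right = box_blur(right, blur_radius)
    # Step 2: look for the best match between the blurred rows.
    coarse = min(range(-max_shift, max_shift + 1),
                 key=lambda s: mismatch(coarse_left, coarse_right, s))
    # Step 3: confirm on the unblurred rows, only near the coarse estimate.
    return min(range(coarse - refine, coarse + refine + 1),
               key=lambda s: mismatch(left, right, s))

# Toy stereogram row: the right row is the left row shifted by a known amount.
rng = random.Random(0)
left_row = [rng.randint(0, 1) for _ in range(200)]
true_shift = 5
right_row = [left_row[(i + true_shift) % len(left_row)] for i in range(len(left_row))]

# The rows are exact shifted copies, so the estimate equals true_shift.
estimate = coarse_to_fine_disparity(left_row, right_row, max_shift=10)
```

The blurred pass narrows down where to look, so the detailed pass only has to compare a few candidate shifts instead of all of them.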
The farther away something is, the smaller its visual angle (taken from the eye to the object), and it will end up projecting closer to the nose for each eye. The larger the angle, the closer the object, and the closer to the ears it projects on the retina
Amount of crossed/uncrossed disparity corresponds with distance from the horopter
Relative metric depth information could be something like “Object A is 2x closer to me than object B” (like binocular disparity)
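Motion parallax is one cue from the vocabulary list that gives this kind of ratio. The sketch below assumes both objects are stationary, are viewed roughly sideways to the direction of travel, and are seen by the same moving observer, so that angular speed is approximately observer speed divided by distance. The function name and the example speeds are hypothetical.

```python
def relative_distance_from_parallax(angular_speed_a, angular_speed_b):
    """Return distance_a / distance_b from how fast A and B cross the visual field.

    Under the stated assumptions, angular speed is inversely proportional
    to distance, so the observer's own speed cancels out of the ratio.
    """
    return angular_speed_b / angular_speed_a

# A sweeps across the field at 4 deg/s, B at 2 deg/s (made-up values).
ratio = relative_distance_from_parallax(4.0, 2.0)   # 0.5: A is half as far away as B
```

The ratio says nothing about how far either object actually is, which is why this counts as relative rather than absolute information.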
As an example, let theta be the angle that the fixated object makes between the pupils of the two eyes, w the distance between the two eyes (with the nose in the middle), and d the straight-line distance from the nose to the object (on the midline, because it is being fixated). Then tan(theta/2) = w/(2d), so d = w/(2*tan(theta/2)). If the angle theta and the eye separation w are known, this allows absolute depth information to be derived about the object
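The relation tan(theta/2) = w/(2d) can be solved for d directly. The sketch below assumes the object is fixated straight ahead on the midline; the function names and the 0.065 m eye separation are hypothetical example values.

```python
import math

def vergence_angle_deg(w, d):
    """Angle theta (degrees) between the two lines of sight for a midline object."""
    return math.degrees(2 * math.atan(w / (2 * d)))

def fixation_distance(w, theta_deg):
    """Distance d to the fixated object, from tan(theta/2) = w / (2d)."""
    return w / (2 * math.tan(math.radians(theta_deg) / 2))

w = 0.065                                   # assumed eye separation in meters
theta_near = vergence_angle_deg(w, 0.5)     # object at 0.5 m
theta_far = vergence_angle_deg(w, 3.0)      # object at 3 m: much smaller angle

d_recovered = fixation_distance(w, theta_near)
```

The angle shrinks quickly with distance, which is consistent with the note above that this kind of cue stops being useful beyond a few meters.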