A camera doesn't exist in the 3D world as an object. It is only a point of view, and we fake that point of view by building a matrix.
Understanding gluLookAt
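The gluLookAt function takes nine scalar arguments, which group into three 3-component vectors:
- eye: the camera position in world space
- target: the point being viewed (named center in the GLU documentation)
- up: a vector indicating roughly which way is up for the camera; it must not be parallel to the viewing direction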
The goal is to build a matrix that transforms world coordinates into camera coordinates, in which the camera sits at the origin, looks along the negative z-axis, and has its up direction along positive y.
Step 1 — Creating Camera Basis Vectors
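Three mutually orthogonal unit vectors define the camera's coordinate system:
- Forward vector: points from the eye toward the target, forward = normalize(target - eye)
- Right vector: perpendicular to forward and the input up, right = normalize(forward × up)
- Up vector: perpendicular to forward and right, recomputed as up = right × forward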
Why recompute the up vector? The input up vector might not be perfectly perpendicular to the forward vector. Recomputing it maintains an orthonormal basis and prevents skewing or shearing. The forward vector stays fixed because it defines what the camera targets; the up vector is adjusted purely for orthogonality.
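The basis construction described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the GLU source: the helper names (sub, cross, normalize, camera_basis) are hypothetical, and vectors are assumed to be 3-element lists in a right-handed coordinate system.

```python
import math

def sub(a, b):
    """Component-wise a - b for 3-element vectors."""
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    """Right-handed cross product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    length = math.sqrt(dot(v, v))
    if length == 0.0:
        # Happens when eye == target, or when up is parallel to forward.
        raise ValueError("cannot normalize a zero-length vector")
    return [c / length for c in v]

def camera_basis(eye, target, up):
    """Return (right, true_up, forward) as unit vectors (hypothetical helper)."""
    forward = normalize(sub(target, eye))   # fixed: defines what the camera looks at
    right = normalize(cross(forward, up))   # perpendicular to forward and the input up
    true_up = cross(right, forward)         # recomputed, so it is exactly perpendicular
    return right, true_up, forward
```

Passing an up hint that is not perpendicular to the view direction, such as [0, 1, 1] for a camera looking along negative z, still yields a clean [0, 1, 0] up vector, which is the point of the recomputation step.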
Step 2 — Building the View Matrix
The view matrix accomplishes two transformations:
- Rotates the world to align camera basis vectors with world axes
- Translates the world so the camera sits at the origin
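The rotation matrix is built from the camera basis vectors as rows: right r, up u, and the negated forward vector -f. The forward row is negated because the camera must look along the negative z-axis:
| rx ry rz 0 |
| ux uy uz 0 |
| -fx -fy -fz 0 |
| 0 0 0 1 |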
The translation matrix moves the camera to the origin using negative eye coordinates:
| 1 0 0 -eye.x |
| 0 1 0 -eye.y |
| 0 0 1 -eye.z |
| 0 0 0 1 |
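As a rough sketch, the two matrices can be written as nested Python lists in row-major layout, to be multiplied with column vectors. The function names are hypothetical. Note one assumption made explicit here: the third row of the rotation uses the negated forward vector, since that is what sends the view direction to the negative z-axis as the stated goal requires.

```python
def rotation_matrix(right, up, forward):
    """Rows are right, up, and -forward (hypothetical helper).

    Negating forward maps the viewing direction onto the negative z-axis.
    """
    return [
        [right[0], right[1], right[2], 0.0],
        [up[0], up[1], up[2], 0.0],
        [-forward[0], -forward[1], -forward[2], 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def translation_matrix(eye):
    """Shift the world by -eye so the camera position lands at the origin."""
    return [
        [1.0, 0.0, 0.0, -eye[0]],
        [0.0, 1.0, 0.0, -eye[1]],
        [0.0, 0.0, 1.0, -eye[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform_point(m, p):
    """Apply a 4x4 matrix to a 3D point, treated as the column vector (x, y, z, 1)."""
    v = [p[0], p[1], p[2], 1.0]
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
```

For a camera whose forward vector is [1, 0, 0] (with right [0, 0, 1] and up [0, 1, 0]), applying the rotation to the point [1, 0, 0] should give a point on the negative z-axis, and applying the translation to the eye position should give the origin.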
Step 3 — Combining Everything
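With column vectors (the OpenGL convention), the final view matrix is View = R × T: the translation is applied to a point first, then the rotation. This combination: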
- Moves the camera to the origin
- Aligns the view direction with the negative z-axis
- Aligns the up direction with the positive y-axis
The result: every point in the world is now expressed relative to the camera. This is what makes the "camera" work — there is no actual camera object, just a transformation applied to everything else.
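Putting the steps together, a look-at function might look like the following. This is a self-contained sketch under stated assumptions rather than the actual GLU implementation: look_at and its helpers are hypothetical names, matrices are row-major nested lists applied to column vectors, and the third rotation row is the negated forward vector so the camera looks down the negative z-axis.

```python
import math

def sub(a, b):
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return [c / length for c in v]

def matmul(a, b):
    """4x4 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def look_at(eye, target, up):
    """Build View = R * T for a right-handed camera looking down -z."""
    f = normalize(sub(target, eye))   # forward
    r = normalize(cross(f, up))       # right
    u = cross(r, f)                   # recomputed up
    rot = [
        [r[0], r[1], r[2], 0.0],
        [u[0], u[1], u[2], 0.0],
        [-f[0], -f[1], -f[2], 0.0],   # negated forward: view direction becomes -z
        [0.0, 0.0, 0.0, 1.0],
    ]
    trans = [
        [1.0, 0.0, 0.0, -eye[0]],
        [0.0, 1.0, 0.0, -eye[1]],
        [0.0, 0.0, 1.0, -eye[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]
    return matmul(rot, trans)         # translation acts first, then rotation

def transform_point(m, p):
    """Apply a 4x4 matrix to a 3D point, treated as the column vector (x, y, z, 1)."""
    v = [p[0], p[1], p[2], 1.0]
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
```

Two quick sanity checks follow from the three bullet points: transforming the eye position should give the origin, and transforming the target should give a point on the negative z-axis whose distance from the origin equals the eye-to-target distance.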
The Transformation Pipeline
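Once the view matrix is built, each vertex goes through the following sequence:
- View matrix applied (world to camera space)
- Projection matrix applied (camera to clip space)
- Perspective divide by w (clip space to normalized device coordinates)
- Viewport transform (normalized device coordinates to screen)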
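This sequence is efficient because a single view matrix applies to every object in the scene, and because the view and projection stages are both 4×4 matrices, they can be multiplied together once per frame rather than applied separately to each vertex.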
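To make the pipeline concrete, here is a small sketch that carries one world-space point to pixel coordinates. Several things here are assumptions for illustration: the projection is the standard OpenGL-style perspective matrix (the same formula gluPerspective documents), the divide by w between clip space and the viewport transform is written out explicitly, the screen origin is taken to be the bottom-left corner, and EXAMPLE_VIEW is a hard-coded view matrix for a camera at (0, 0, 5) looking at the origin.

```python
import math

def perspective(fovy_deg, aspect, near, far):
    """OpenGL-style perspective projection (assumed here; not defined in the text)."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

def world_to_screen(point, view, proj, width, height):
    """Return (x, y) in pixels, or None if the point is not in front of the camera."""
    camera = mat_vec(view, [point[0], point[1], point[2], 1.0])  # world -> camera
    clip = mat_vec(proj, camera)                                  # camera -> clip
    if clip[3] <= 0.0:
        return None  # at or behind the camera plane; a real pipeline clips here
    ndc_x = clip[0] / clip[3]                                     # perspective divide
    ndc_y = clip[1] / clip[3]
    screen_x = (ndc_x + 1.0) * 0.5 * width                        # NDC -> screen
    screen_y = (ndc_y + 1.0) * 0.5 * height                       # origin at bottom-left
    return screen_x, screen_y

# View matrix for a camera at (0, 0, 5) looking at the origin with +y up.
# For this pose the rotation part is the identity, so only the translation remains.
EXAMPLE_VIEW = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -5.0],
    [0.0, 0.0, 0.0, 1.0],
]
```

With an 800 by 600 viewport, the world origin sits directly in front of this camera, so it should land at the center of the screen. Points to the right of the origin land at larger x values, and points behind the camera are rejected.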