Aug 2022

Batch Rendering 101

The conventional approach to rendering multiple objects involves submitting vertex buffers, setting uniforms, and issuing individual draw calls for each object. As the number of objects increases, so do the draw calls — and with them comes a performance bottleneck that underutilises the GPU.

Batch rendering solves this by consolidating vertex data into a single buffer and issuing one draw call instead of many.

The Core Idea

Pack all your geometry into a single vertex buffer. Submit once. The GPU processes everything in one pass, with far less CPU-GPU synchronization overhead.
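As a rough illustration of the packing step, the sketch below appends quads to one shared vertex array and one shared index array on the CPU. The `Vertex` layout, the `QuadBatch` type, and `pushQuad` are hypothetical names chosen for this example, not part of any library.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout; a real renderer would match its shader's attributes.
struct Vertex {
    float x, y;  // position
    float u, v;  // texture coordinates
};

// CPU-side staging for one batch: every quad lands in the same two arrays.
struct QuadBatch {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;

    // Append an axis-aligned quad whose lower-left corner is at (x, y).
    void pushQuad(float x, float y, float w, float h) {
        assert(w >= 0.0f && h >= 0.0f);
        const auto base = static_cast<std::uint32_t>(vertices.size());
        vertices.push_back({x,     y,     0.0f, 0.0f});
        vertices.push_back({x + w, y,     1.0f, 0.0f});
        vertices.push_back({x + w, y + h, 1.0f, 1.0f});
        vertices.push_back({x,     y + h, 0.0f, 1.0f});
        // Two triangles per quad, offset by this quad's first vertex.
        for (std::uint32_t i : {0u, 1u, 2u, 2u, 3u, 0u}) {
            indices.push_back(base + i);
        }
    }
};
```

Once every object has been pushed, the two arrays can be uploaded in one go and drawn with a single `glDrawElements(GL_TRIANGLES, ...)` call using `GL_UNSIGNED_INT` indices.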

Three practical challenges arise immediately:

  1. Passing per-object shader data without setting uniforms between draws
  2. Managing dynamic geometry that changes every frame
  3. Handling multiple textures for different meshes within one batch
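For the first challenge, one common option is to bake per-object data into the vertices themselves: apply each object's transform on the CPU and store its colour as a vertex attribute. The sketch below assumes that approach. `Object2D`, `Vertex`, and `appendObject` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical vertex layout: world-space position plus per-object colour.
struct Vertex {
    float x, y;
    float r, g, b, a;
};

// Hypothetical per-object state that would otherwise be set through uniforms.
struct Object2D {
    float posX, posY;
    float rotation;  // radians
    float scale;
    float color[4];
};

// Transform a unit quad on the CPU and append its four corners to the batch.
void appendObject(std::vector<Vertex>& out, const Object2D& obj) {
    assert(obj.scale > 0.0f);
    static const float corners[4][2] = {
        {-0.5f, -0.5f}, {0.5f, -0.5f}, {0.5f, 0.5f}, {-0.5f, 0.5f}};
    const float c = std::cos(obj.rotation);
    const float s = std::sin(obj.rotation);
    for (const auto& corner : corners) {
        const float lx = corner[0] * obj.scale;
        const float ly = corner[1] * obj.scale;
        out.push_back({obj.posX + lx * c - ly * s,  // rotate, then translate
                       obj.posY + lx * s + ly * c,
                       obj.color[0], obj.color[1], obj.color[2], obj.color[3]});
    }
}
```

The trade-off is more CPU work and larger vertices in exchange for not touching uniforms between objects.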

Handling Multiple Textures

Bind all needed textures to specific texture slots upfront, then store the slot index as a vertex attribute. The shader reads it per-fragment:

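// C++: bind each texture to its own texture unit
glActiveTexture(GL_TEXTURE0); glBindTexture(GL_TEXTURE_2D, tex0);
glActiveTexture(GL_TEXTURE1); glBindTexture(GL_TEXTURE_2D, tex1);

// Tell each element of the sampler array which texture unit to read from.
// glUniform* applies to the currently bound program, so bind it first.
glUseProgram(shader);
int samplers[2] = { 0, 1 };
glUniform1iv(glGetUniformLocation(shader, "u_Textures"), 2, samplers);

// GLSL fragment shader
uniform sampler2D u_Textures[2];
in vec2 v_TexCoord;
in float v_TexIndex;   // texture slot index, passed through from the vertex attribute
out vec4 color;

void main() {
    int index = int(v_TexIndex);
    // Caveat: the GLSL spec only allows non-constant sampler array indices
    // from version 4.00 on, and only if the index is dynamically uniform.
    // Many drivers accept this anyway.
    color = texture(u_Textures[index], v_TexCoord);
}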
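The slot index stored per vertex has to come from somewhere on the CPU side. A minimal sketch of that bookkeeping, assuming a fixed number of slots matching `u_Textures[2]`, might look like the following. `TextureSlots`, `slotFor`, and `kMaxSlots` are hypothetical names for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed to match the size of the u_Textures array in the shader.
constexpr std::size_t kMaxSlots = 2;

// Tracks which texture object is assigned to which slot in the current batch.
struct TextureSlots {
    std::array<std::uint32_t, kMaxSlots> bound{};
    std::size_t count = 0;

    // Returns the slot for `texture`, assigning a free one if needed.
    // Returns -1 when every slot is taken by other textures; the caller is
    // then expected to draw the current batch and call reset().
    int slotFor(std::uint32_t texture) {
        assert(texture != 0);  // glGenTextures never returns 0
        for (std::size_t i = 0; i < count; ++i) {
            if (bound[i] == texture) return static_cast<int>(i);
        }
        if (count == kMaxSlots) return -1;
        bound[count] = texture;
        return static_cast<int>(count++);
    }

    // Forget all assignments at the start of a new batch.
    void reset() { count = 0; }
};
```

The returned index is what would be written into each vertex's texture index attribute, and `bound[i]` is the texture to bind to unit `GL_TEXTURE0 + i` before drawing.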

Dynamic Geometry

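Use the right buffer usage hint: create the buffer with GL_DYNAMIC_DRAW (or GL_STREAM_DRAW) rather than GL_STATIC_DRAW, since its contents are rewritten every frame.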

Two methods to update buffer contents without reallocating:

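// Method 1: glBufferSubData copies `size` bytes into the buffer at `offset`
glBufferSubData(GL_ARRAY_BUFFER, offset, size, data);

// Method 2: map the buffer into client memory and write to it directly
void* ptr = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
if (ptr) {  // glMapBuffer returns NULL on failure
    memcpy(ptr, data, size);
    glUnmapBuffer(GL_ARRAY_BUFFER);
}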

Both methods assume the buffer was allocated once at startup (for example with glBufferData and a null data pointer) and then update it in place, avoiding the cost of reallocating the buffer each frame.
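On the CPU side, this pattern usually pairs with a fixed-capacity staging area that is refilled every frame. The sketch below shows one possible shape for it. `FrameBatch` and its members are hypothetical, and the OpenGL calls are left to the usage note so the block compiles without a GL context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal vertex; a real layout would carry more attributes.
struct Vertex {
    float x, y;
};

// Fixed-capacity staging area: allocated once, refilled every frame.
class FrameBatch {
public:
    explicit FrameBatch(std::size_t maxVertices) : storage_(maxVertices) {
        assert(maxVertices > 0);
    }

    // Start a new frame by discarding last frame's vertices (no reallocation).
    void begin() { used_ = 0; }

    // Returns false when the batch is full; the caller should flush first.
    bool push(const Vertex& v) {
        if (used_ == storage_.size()) return false;
        storage_[used_++] = v;
        return true;
    }

    const Vertex* data() const { return storage_.data(); }
    std::size_t usedBytes() const { return used_ * sizeof(Vertex); }
    std::size_t capacityBytes() const { return storage_.size() * sizeof(Vertex); }

private:
    std::vector<Vertex> storage_;
    std::size_t used_ = 0;
};
```

At startup, `capacityBytes()` would be the size passed to `glBufferData` with a null data pointer. Each frame, `data()` and `usedBytes()` would be passed to `glBufferSubData` at offset 0, so only the vertices written this frame are uploaded.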

The Payoff

A well-implemented batch renderer can reduce draw calls from thousands to single digits per frame. This is the approach used in 2D engines, UI renderers, and particle systems — anywhere you have many small objects that share geometry types and can be sorted by state.