Previously processing multiple images in a batch would trigger
segfaults so sending images together was disabled as a way to
mitigate this. The trigger was processing one image on the CPU
and one on the GPU.
This can no longer happen:
- The vision encoder is now on the GPU so both images would be
processed on the GPU.
- We require images to be fully contained in a batch and each
image including its special tokens is over half the batch size.
As a result, we will never get two images in the same batch.
Fixes#9731