Fix the number of threads used per subimage
The old calculation of the number of threads per subimage no longer made sense, possibly because of other changes made since it was implemented. The number of threads per subimage is now the total number of requested threads divided by the number of subimages that run in parallel, rounded up.
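
For illustration only, the new rule amounts to a ceiling division. The sketch below is not the code from this change; the function name and parameter names are hypothetical, and the input validation is an assumption.

```python
def threads_per_subimage(total_threads: int, parallel_subimages: int) -> int:
    """Return ceil(total_threads / parallel_subimages) using integer arithmetic.

    Hypothetical helper: both names are made up for this example.
    """
    if total_threads < 1 or parallel_subimages < 1:
        # Assumption: both counts are positive; the real code may handle this differently.
        raise ValueError("thread and subimage counts must be at least 1")
    # Adding (divisor - 1) before floor division rounds the quotient upward.
    return (total_threads + parallel_subimages - 1) // parallel_subimages


if __name__ == "__main__":
    for total, parallel in [(8, 4), (8, 3), (1, 4), (16, 1)]:
        print(total, parallel, threads_per_subimage(total, parallel))
```

Because the result is rounded up, the combined thread count across subimages can slightly exceed the requested total: with 8 requested threads and 3 parallel subimages, each subimage gets 3 threads, for 9 in total.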