Let's say I have a device with 2880 CUDA cores.
I want to run a Monte Carlo simulation where:
- 2000 threads are each running a sample
- 880 threads are generating random numbers
This is because:
- I only want 2000 samples, so the other 880 cores would otherwise sit idle
- I know that generating random numbers can be slow
Therefore, I want to create a pool of random numbers that the 880 generator threads replenish continuously and that the 2000 sample threads can draw from when required.
Is this possible? If so, please provide an example.