Youness Bennani February 2016

Reserved CPU Time in Google Dataflow

I have a question about the "Reserved CPU time" field in Google Dataflow: I don't understand why it varies so widely depending on the configuration of my run. I suspect I am not interpreting reserved CPU time for what it really is. As I understand it, it is the CPU time that was needed to complete the job I submitted, but based on the following evidence, it seems I may be mistaken. Is it instead the time allocated to your job, regardless of whether the job is actually using the resources? If so, how do I get the actual CPU time of my job?

First, I ran my job with a variable-sized pool of workers (max 24 workers):

Long run config

The corresponding stats are as follows:

Long run stats

Then, I ran my script using a fixed number of workers (10):

Short run config

And the stats changed to:

Short run stats

They went from 15 days to 7 hours? How is that possible?!

Thanks!

Answers


Tudor Marian February 2016

If you hover over the "?" next to "Reserved CPU time", a pop-up appears that reads: "The total time Dataflow was active on GCE instances, on a per-CPU basis." In other words, it is the time workers were allocated, not the CPU time actually used by the VMs. At this time Dataflow does not aggregate per-machine CPU usage stats; you may, however, be able to use the Cloud Monitoring API to extract those metrics yourself.
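The figures in the question are consistent with that reading: reserved CPU time is simply (vCPUs allocated) × (wall-clock time the workers were up), independent of how busy the CPUs were. A minimal sketch of the arithmetic, assuming 4-vCPU workers (the machine type is my assumption; it is not stated in the post):

```python
def reserved_cpu_hours(workers: int, vcpus_per_worker: int, wall_hours: float) -> float:
    """Per-CPU time the pool was allocated, regardless of actual CPU load."""
    return workers * vcpus_per_worker * wall_hours

# 24 workers x 4 vCPUs kept alive for 3.75 wall-clock hours:
long_run = reserved_cpu_hours(24, 4, 3.75)        # 360 CPU-hours = 15 CPU-days

# 10 workers x 4 vCPUs kept alive for ~10.5 wall-clock minutes:
short_run = reserved_cpu_hours(10, 4, 10.5 / 60)  # 7 CPU-hours

print(long_run, short_run)
```

So a pool that scales up to 24 workers and stays up for a few hours accumulates "days" of reserved CPU time even if the CPUs sit mostly idle. Note the wall-clock durations above are back-calculated from the 15-day and 7-hour figures in the question, not taken from the actual runs.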

Post Status

Asked: February 2016
Views: 3,539
Votes: 8
Answers: 1
