When I run a U-SQL script from the portal or Visual Studio, it goes through stages such as preparing, queued, running, and finalizing. What exactly happens behind the scenes in each of these stages? Will there be any difference in execution time when the job is run from Visual Studio or the portal in a dev environment versus a production environment? We need to clock the speeds and record the time the script would take in production. Ultimately, the goal is to run these scripts as Data Factory activities in production.
I assume there will be differences, since your dev environment probably runs with fewer resources (a lower degree of parallelism, both between jobs and inside a job) than your production environment. Otherwise there should be no difference.
Note that we are still working on performance, so if you run into particular issues, please let us know.
The phases roughly do the following (I am probably missing some parts):
preparing: Includes compilation, optimization, code generation, preparing the execution graph and required resources, and putting the job into the queue.
queued: The job sits in the queue until it reaches the top of the queue and resources are available to start it. This is affected by the maximum number of jobs that can run in parallel (a setting you can change by contacting support or us).
running: Actual job execution. This is affected by resources: the maximum degree of parallelism specified on the job, network bandwidth, and store access (throttling, bandwidth).
finalizing: Cleanup, stitching results into files, and "sealing" table files. This can be more expensive depending on where you write the data (ADL is faster than WASB, for example).
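For clocking a script, it can help to record when the job enters each phase and compute how long it spends there, since only part of the total is actual execution. The following is a minimal Python sketch of that bookkeeping. It assumes you have already collected the phase-entry timestamps yourself (for example, noted from the job view); the `phase_durations` helper and all timestamps are hypothetical and are not output from a real job or part of any Azure API.

```python
from datetime import datetime, timedelta


def phase_durations(transitions, finished_at):
    """Return {phase: timedelta} given [(phase, entered_at), ...] in time order."""
    durations = {}
    # Each phase ends when the next one begins; the last one ends at finished_at.
    end_times = [entered for _, entered in transitions[1:]] + [finished_at]
    for (phase, entered_at), left_at in zip(transitions, end_times):
        durations[phase] = durations.get(phase, timedelta()) + (left_at - entered_at)
    return durations


# Hypothetical timestamps for illustration only, not measurements from a real job.
transitions = [
    ("preparing", datetime(2016, 2, 1, 12, 0, 0)),
    ("queued", datetime(2016, 2, 1, 12, 0, 40)),
    ("running", datetime(2016, 2, 1, 12, 1, 10)),
    ("finalizing", datetime(2016, 2, 1, 12, 6, 10)),
]
finished_at = datetime(2016, 2, 1, 12, 6, 30)

durations = phase_durations(transitions, finished_at)
for phase, spent in durations.items():
    print(f"{phase}: {spent.total_seconds():.0f}s")
```

Comparing the per-phase numbers between dev and production, rather than only the end-to-end time, makes it easier to tell whether a difference comes from queue wait, parallelism during the running phase, or the output store during finalizing.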
Asked in February 2016 | Viewed 2,992 times | Voted 10 | Answered 1 time