Submit New Event

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Submit News Feature

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Contribute a Blog

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Sign up for Newsletter

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Dec 5, 2016

Dask Development Log


This work is supported by Continuum Analyticsthe XDATA Programand the Data Driven Discovery Initiative from the MooreFoundation

Dask has been active lately due to a combination of increased adoption andfunded feature development by private companies. This increased activityis great, however an unintended side effect is that I have spent less timewriting about development and engaging with the broader community. To addressthis I hope to write one blogpost a week about general development. These willnot be particularly polished, nor will they announce ready-to-use features forusers, however they should increase transparency and hopefully better engagethe developer community.

So themes of last week

  1. Embedded Bokeh servers for the Workers
  2. Smarter workers
  3. An overhauled scheduler that is slightly simpler overall (thanks to thesmarter workers) but with more clever work stealing
  4. Fastparquet

Embedded Bokeh Servers in Dask Workers

The distributed scheduler’s web diagnosticpage is one of Dask’smore flashy features. It shows the passage of every computation on the clusterin real time. These diagnostics are invaluable for understanding performanceboth for users and for core developers.

I intend to focus on worker performance soon, so I decided to attach a Bokehserver to every worker to serve web diagnostics about that worker. To makethis easier, I also learned how to embed Bokeh servers inside of otherTornado applications. This has reduced the effort to create new visuals andexpose real time information considerably and I can now create a full livevisualization in around 30 minutes. It is now faster for me to builda new diagnostic than to grep through logs. It’s pretty useful.

Here are some screenshots. Nothing too flashy, but this information is highlyvaluable to me as I measure bandwidths, delays of various parts of the code,how workers send data between each other, etc..

Dask Bokeh Worker system page

Dask Bokeh Worker system page

Dask Bokeh Worker system page

To be clear, these diagnostic pages aren’t polished in any way. There’s lotsmissing, it’s just what I could get done in a day. Still, everyone running aTornado application should have an embedded Bokeh server running. They’regreat for rapidly pushing out visually rich diagnostics.

Smarter Workers and a Simpler Scheduler

Previously the scheduler knew everything and the workers were fairlysimple-minded. Now we’ve moved some of the knowledge and responsibility overto the workers. Previously the scheduler would give just enough work to theworkers to keep them occupied. This allowed the scheduler to make betterdecisions about the state of the entire cluster. By delaying committing a taskto a worker until the last moment we made sure that we were making the rightdecision. However, this also means that the worker sometimes has idleresources, particularly network bandwidth, when it could be speculativelypreparing for future work.

Now we commit all ready-to-run tasks to a worker immediately and that workerhas the ability to pipeline those tasks as it sees fit. This is better locallybut slightly worse globally. To counter balance this we’re now being much moreaggressive about work stealing and, because the workers have more information,they can manage some of the administrative costs of works stealing themselves.Because this isn’t bound to run on just the scheduler we can use more expensivealgorithms than when when did everything on the scheduler.

There were a few motivations for this change:

  1. Dataframe performance was bound by keeping the worker hardware fullyoccupied, which we weren’t doing. I expect that these changes willeventually yield something like a 30% speedup.
  2. Users on traditional job scheduler machines (SGE, SLURM, TORQUE) and userswho like GPUS, both wanted the ability to tag tasks with specific resourceconstraints like “This consumes one GPU” or “This task requires a 5GB of RAMwhile running” and ensure that workers would respect those constraints whenrunning tasks. The old workers weren’t complex enough to reason about theseconstraints. With the new workers, adding this feature was trivial.
  3. By moving logic from the scheduler to the worker we’ve actually made themboth easier to reason about. This should lower barriers for contributorsto get into the core project.

Dataframe algorithms

Approximate nunique andmultiple-output-partition groupbys landed in master last week. These arosebecause some power-users had very large dataframes that weree running intoscalability limits. Thanks to Mike Graham for the approximate nuniquealgorithm. This has also pushed hashingchanges upstream to Pandas.

Fast Parquet

Martin Durant has been working on a Parquet reader/writer for Python usingNumba. It’s pretty slick. He’s been using it on internal Continuum projectsfor a little while and has seen both good performance and a very Pythonicexperience for what was previously a format that was pretty inaccessible.

He’s planning to write about this in the near future so I won’t steal histhunder. Here is a link to the