Submit New Event

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Submit News Feature

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Contribute a Blog

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Sign up for Newsletter

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Aug 30, 2017

Dask Release 0.15.2

By

This work is supported by Anaconda Inc.and the Data Driven Discovery Initiative from the MooreFoundation.

I’m pleased to announce the release of Dask version 0.15.2. This releasecontains stability enhancements and bug fixes. This blogpost outlinesnotable changes since the 0.15.0 release on June 11th.

You can conda install Dask:

conda install dask

or pip install from PyPI

pip install dask[complete] --upgrade

Conda packages are available both on the defaults and conda-forge channels.

Full changelogs are available here:

Some notable changes follow.

New dask-core and dask conda packages

On conda there are now three relevant Dask packages:

  1. dask-core: Package that includes only the core Dask package. This has nodependencies other than the standard library. This is primarily intendedfor down-stream libraries that depend on certain parts of Dask.
  2. distributed: Dask’s distributed scheduler, depends on Tornado,cloudpickle, and other libraries.
  3. dask: Metapackage that includes dask-core, distributed, and all relevantlibraries like NumPy, Pandas, Bokeh, etc.. This is intended for users toinstall

This organization is designed to both allow downstream libraries to onlydepend on the parts of Dask that they need while also making the defaultbehavior for users all-inclusive.

Downstream libraries may want to change conda dependencies from dask todask-core. They will then need to be careful to include the necessarylibraries (like numpy or cloudpickle) based on their user community.

Improved Deployment

Due to increased deployment on Docker or other systems with complex networkingrules dask-worker processes now include separate --contact-address and--listen-address keywords that can be used to specify addresses that theyadvertise and addresses on which they listen. This is especially helpful whenthe perspective of ones network can shift dramatically.

dask-worker scheduler-address:8786 \
--contact-address 192.168.0.100:9000 # contact me at 192.168.0.100:9000
--listen-address 172.142.0.100:9000 # I listen on this host

Additionally other services like the HTTP and Bokeh servers now respect thehosts provided by --listen-address or --host keywords and will not bevisible outside of the specified network.

Avoid memory, file descriptor, and process leaks

There were a few occasions where Dask would leak resources in complexsituations. Many of these have now been cleaned up. We’re grateful to allthose who were able to provide very detailed case studies that demonstratedthese issues and even more grateful to those who participated in resolvingthem.

There is undoubtedly more work to do here and we look forward to futurecollaboration.

Array and DataFrame APIs

As usual, Dask array and dataframe have a new set of functions that fill outtheir API relative to NumPy and Pandas.

See the full APIs for further reference:

Deprecations

Officially deprecated dask.distributed.Executor, users should use dask.distributed.Clientinstead. Previously this was set to an alias.

Removed Bag.concat, users should use Bag.flatten instead.

Removed magic tuple unpacking in Bag.map like bag.map(lambda x, y: x + y).Users should unpack manually instead.

Julia

Developers from the Invenia have been building Julia workers and clients thatoperate with the Dask.distributed scheduler. They have been helpful in raisingissues necessary to ensure cross-language support.

Acknowledgements

The following people contributed to the dask/dask repository since the 0.15.0release on June 11th

  • Bogdan
  • Elliott Sales de Andrade
  • Bruce Merry
  • Erik Welch
  • Fabian Keller
  • James Bourbeau
  • Jeff Reback
  • Jim Crist
  • John A Kirkham
  • Luke Canavan
  • Mark Dunne
  • Martin Durant
  • Matthew Rocklin
  • Olivier Grisel
  • Søren Fuglede Jørgensen
  • Stephan Hoyer
  • Tom Augspurger
  • Yu Feng

The following people contributed to the dask/distributed repository since the1.17.1 release on June 14th:

  • Antoine Pitrou
  • Dan Brown
  • Elliott Sales de Andrade
  • Eric Davies
  • Erik Welch
  • Evan Welch
  • John A Kirkham
  • Jim Crist
  • James Bourbeau
  • Jeremiah Lowin
  • Julius Neuffer
  • Martin Durant
  • Matthew Rocklin
  • Paul Anton Letnes
  • Peter Waller
  • Sohaib Iftikhar
  • Tom Augspurger

Additionally we’re happy to announce that John Kirkham(@jakirkham) has accepted commit rights to theDask organization and become a core contributor. John has been active throughthe Dask project, and particularly active in Dask.array.