This work is supported by Anaconda Inc. and the Data Driven Discovery Initiative from the Moore Foundation.
I’m pleased to announce the release of Dask version 0.15.2. This release contains stability enhancements and bug fixes. This blogpost outlines notable changes since the 0.15.0 release on June 11th.
You can conda install Dask:
conda install dask
or pip install from PyPI:
pip install dask[complete] --upgrade
Conda packages are available both on the defaults and conda-forge channels.
Full changelogs are available here:
Some notable changes follow.
On conda there are now three relevant Dask packages:
This organization is designed to let downstream libraries depend only on the parts of Dask that they need, while keeping the default behavior all-inclusive for users.
Downstream libraries may want to change their conda dependencies from dask to dask-core. They will then need to be careful to include the necessary libraries (like numpy or cloudpickle) based on their user community.
Due to increased deployment on Docker and other systems with complex networking rules, dask-worker processes now accept separate --contact-address and --listen-address keywords. These specify, respectively, the address that a worker advertises to others and the address on which it actually listens. This is especially helpful when one's view of the network can shift dramatically depending on where one stands.
# Advertise 192.168.0.100:9000 ("contact me here"),
# but bind locally to 172.142.0.100:9000 ("I listen on this host")
dask-worker scheduler-address:8786 \
    --contact-address 192.168.0.100:9000 \
    --listen-address 172.142.0.100:9000
Additionally, other services like the HTTP and Bokeh servers now respect the hosts provided by the --listen-address or --host keywords and will not be visible outside of the specified network.
There were a few occasions where Dask would leak resources in complex situations. Many of these have now been cleaned up. We’re grateful to all those who provided very detailed case studies demonstrating these issues, and even more grateful to those who participated in resolving them.
There is undoubtedly more work to do here, and we look forward to future collaboration.
As usual, Dask array and dataframe have a new set of functions that fill out their API relative to NumPy and Pandas.
See the full APIs for further reference:
Officially deprecated dask.distributed.Executor; users should use dask.distributed.Client instead. Previously Executor was simply an alias for Client.
Removed Bag.concat, users should use Bag.flatten instead.
Removed magic tuple unpacking in Bag.map, as in bag.map(lambda x, y: x + y). Users should unpack tuples manually instead.
Developers from Invenia have been building Julia workers and clients that operate with the Dask.distributed scheduler. They have been helpful in raising issues necessary to ensure cross-language support.
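The two Bag changes above can be sketched in plain Python. This is a minimal illustration, not Dask code: it assumes only that Bag.map calls the function once per element and that Bag.flatten joins nested sequences into one, and it uses the built-in map and itertools.chain as stand-ins so that it runs without Dask installed. The helper names add_pair and flatten are hypothetical.

```python
from itertools import chain

# Each element is one (x, y) tuple. Without magic unpacking, the mapped
# function receives the whole tuple as a single argument.
pairs = [(1, 2), (3, 4), (5, 6)]

def add_pair(pair):
    # Hypothetical helper: unpack the tuple manually instead of
    # relying on map(lambda x, y: ...) to do it implicitly.
    x, y = pair
    return x + y

# Stand-in for bag.map(add_pair): the built-in map likewise passes
# each element to the function as a single argument.
sums = list(map(add_pair, pairs))

def flatten(nested):
    # Stand-in for bag.flatten() (formerly bag.concat()): join a
    # sequence of sequences into one flat list.
    return list(chain.from_iterable(nested))

flat = flatten([[1, 2], [3], [4, 5]])

print(sums)  # [3, 7, 11]
print(flat)  # [1, 2, 3, 4, 5]
```

With Dask installed, the same add_pair function can be passed directly to a bag, for example db.from_sequence(pairs).map(add_pair) after import dask.bag as db.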
The following people contributed to the dask/dask repository since the 0.15.0 release on June 11th:
The following people contributed to the dask/distributed repository since the 1.17.1 release on June 14th:
Additionally, we’re happy to announce that John Kirkham (@jakirkham) has accepted commit rights to the Dask organization and become a core contributor. John has been active throughout the Dask project, and particularly active in Dask.array.