Submit New Event

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Submit News Feature

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Contribute a Blog

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Sign up for Newsletter

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Jan 23, 2019

Dask Release 1.1.0

By

I’m pleased to announce the release of Dask version 1.1.0. This is a majorrelease with bug fixes and new features. The last release was 1.0.0 on2018-11-29.This blogpost outlines notable changes since the last release.

You can conda install Dask:

conda install dask

or pip install from PyPI:

pip install dask[complete] --upgrade

Full changelogs are available here:

Notable Changes

A lot of work has happened over the last couple months, and we encourage peopleto look through the changelog to get a sense of the kinds of incrementalchanges that developers are working on.

There are also a few notable changes in this release that we’ll highlight here:

  • Support for the recent Numpy 1.16 and Pandas 0.24 releases
  • Support for Pandas Extension Arrays (see Tom Augspurger’s post on the topic)
  • High level graph in Dask dataframe and operator fusion in simple cases
  • Increased support for other libraries that look enough like Numpy and Pandasto work within Dask Array/Dataframe

Support for Numpy 1.16 and Pandas 0.24

Both Numpy and Pandas have been evolving quickly over the last few months.We’re excited about the changes to extensibility arriving in both libraries.The Dask array/dataframe submodules have been updated to work well with theserecent changes.

Pandas Extension Arrays

In particular Dask Dataframe supports Pandas Extension arrays,meaning that it’s easier to use third party Pandas packages like CyberPandas orFletcher in parallel with Dask Dataframe.

For more information see Tom Augspurger’s post

High Level Graphs in Dask Dataframe

For a while Dask array has had some high level graphs for “atop” operations(elementwise, broadcasting, transpose, tensordot, reductions), which allow forreduced overhead and task fusion on computations within this class.

y = da.exp(x + 1).T # These operations get fused to a single task

We’ve renamed atop to blockwise to be a bit more generic, and have alsostarted applying it to Dask Dataframe, which helps to reduce overheadsubstantially when doing computations with many simple operations.

This still needs to be improved to increase the class of cases where it works,but we’re already seeing nice speedups on previously unseen workloads.

The da.atop function has been deprecated in favor of da.blockwise. Thereis now also a dd.blockwise which shares a common code path.

Non-Pandas dataframe and Non-Numpy array types

We’re working to make Dask a bit more agnostic to the types of in-memory arrayand dataframe objects that it can manipulate. Rather than having Dask Array bea grid of Numpy arrays and Dask Dataframe be a sequence of Pandas dataframes,we’re relaxing that constraint to a grid of Numpy-like arrays and a sequenceof Pandas-like dataframes.

This is an ongoing effort that has targetted alternate backends likescipy.sparse, pydata/sparse, cupy, cudf and other systems.

There were some recent posts onarrays anddataframes that show proofs ofconcept for this with GPUs.

Acknowledgements

There have been several releases since the last time we had a release blogpost.The following people contributed to the dask/dask repository since the 0.19.0release on September 5th:

  • Anderson Banihirwe
  • Antonino Ingargiola
  • Armin Berres
  • Bart Broere
  • Carlos Valiente
  • Daniel Li
  • Daniel Saxton
  • David Hoese
  • Diane Trout
  • Damien Garaud
  • Elliott Sales de Andrade
  • Eric Wolak
  • Gábor Lipták
  • Guido Imperiale
  • Guillaume Eynard-Bontemps
  • Itamar Turner-Trauring
  • James Bourbeau
  • Jan Koch
  • Javad
  • Jendrik Jördening
  • Jim Crist
  • Jonathan Fraine
  • John Kirkham
  • Johnnie Gray
  • Julia Signell
  • Justin Dennison
  • M. Farrajota
  • Marco Neumann
  • Mark Harfouche
  • Markus Gonser
  • Martin Durant
  • Matthew Rocklin
  • Matthias Bussonnier
  • Mina Farid
  • Paul Vecchio
  • Prabakaran Kumaresshan
  • Rahul Vaidya
  • Stephan Hoyer
  • Stuart Berg
  • TakaakiFuruse
  • Takahiro Kojima
  • Tom Augspurger
  • Yu Feng
  • Zhenqing Li
  • @milesial
  • @samc0de
  • @slnguyen

The following people contributed to the dask/distributed repository since the 0.19.0release on September 5th:

  • Adam Klein
  • Brett Naul
  • Daniel Farrell
  • Diane Trout
  • Dirk Petersen
  • Eric Ma
  • Jim Crist
  • John Kirkham
  • Gaurav Sheni
  • Guillaume Eynard-Bontemps
  • Loïc Estève
  • Marius van Niekerk
  • Matthew Rocklin
  • Michael Wheeler
  • MikeG
  • NotSqrt
  • Peter Killick
  • Roy Wedge
  • Russ Bubley
  • Stephan Hoyer
  • @tjb900
  • Tom Rochette
  • @fjetter