our DASK ETL Journey

Sephi Berry
Language: English
video in English
The presentation was given on 2019.06.02 at PyCon Israel 2019.

Using Dask in an ETL pipeline has some gotchas. Although Dask has many similarities to pandas, there are some issues to watch for, and some best practices that can optimize the usage of Dask in general.

The presentation agenda:

  • Intro to Dask framework
  • Basic Client setup
  • Dask.dataframe
  • Data manipulation
  • Read/Write files
  • Advanced groupby
  • Debugging

There is a Jupyter notebook (see attachment) to supplement the talk. See also: Jupyter notebook of the presentation (163.4 KB)