GoDataFest 2019 Schedule
GoDataFest takes place from Monday, October 28 to Friday, November 1.
Monday, October 28 - Amazon Web Services
Join Amazon Web Services, Binx.io and GoDataDriven for an exciting day jam-packed with the latest and greatest AWS has to offer around data and machine learning.
09:00 Opening Notes
09:15 Machine Learning in the Real World - Guy Kfir
10:00 Revving up with Reinforcement Learning - Ricardo Sueiras (AWS) & Diederik Greveling (GoDataDriven) [Handson]
An introduction to AWS DeepRacer and how it will enable you to get started with Reinforcement Learning.
Includes a coffee break at 10:45.
11:45 Setting up a Data Hub on AWS - Martijn van Dongen
How to set up a high-volume and scalable data platform on AWS.
13:15 Machine Learning Industrialization - Zhe Sun (AWS Professional Services)
To overcome the challenges of productionizing models beyond a PoC, the AWS Professional Services team introduces a framework that makes deploying models faster and more manageable. It uses services such as AWS CodePipeline, AWS CodeCommit, AWS CodeBuild, AWS CloudFormation, Amazon SageMaker, AWS Step Functions, AWS Lambda, AWS Glue, and Amazon DynamoDB. We will bring this to life with one or two case studies (by AWS).
14:15 Elastic Kubernetes Service - Thijs Elferink [Handson]
Learn about Amazon Elastic Kubernetes Service (Amazon EKS), a service that makes it easy to deploy, manage, and scale containerized applications using Kubernetes on AWS.
15:15 Formula One Race Insights in Real-Time with Serverless Machine Learning - Luuk Figdor (AWS Professional Services)
F1 pushes the limits of both humans and technology. Long gone are the times when people could analyze race data without the use of technology; staying competitive now requires moving beyond past-event analysis into live insights and predictions. To satisfy this demand, F1 decided to employ cloud-native technologies on AWS. Machine learning models created in Amazon SageMaker and hosted on AWS Lambda allow F1 to pinpoint how a driver is performing, whether they are pushing the car over the limit, and how their battle against other drivers will end.
These insights are immediately shared to fans all over the world through television and digital platforms. In this talk we will dive deep into the serverless machine learning architecture used for the application.
Attendees will learn about common pitfalls in serverless machine learning applications and how to overcome them. Lastly, we will walk through various tips and tricks for deploying machine learning models in AWS Lambda that will allow you to rapidly develop and deploy your machine learning application in a truly serverless manner.
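The talk doesn't share F1's actual code, but the hosting pattern it describes — a model invoked from an AWS Lambda handler — can be sketched roughly as follows. The event fields and the toy scoring function are hypothetical illustrations, not F1's implementation:

```python
import json

def predict_overtake_probability(gap_seconds: float, tyre_age_laps: int) -> float:
    # Hypothetical stand-in for a model trained in Amazon SageMaker and
    # bundled with the Lambda deployment package. A real model would be
    # loaded once at module import so warm invocations reuse it.
    score = 1.0 / (1.0 + gap_seconds) * max(0.0, 1.0 - tyre_age_laps / 50.0)
    return round(min(1.0, score), 3)

def handler(event, context):
    """AWS Lambda entry point: parse the request body, score, return JSON."""
    body = json.loads(event["body"])
    prob = predict_overtake_probability(body["gap_seconds"], body["tyre_age_laps"])
    return {"statusCode": 200, "body": json.dumps({"overtake_probability": prob})}
```

Because the handler does nothing but score and serialize, it stays fast enough for the low-latency, per-request billing model that makes Lambda attractive here.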
16:00 Racing with the DeepRacer [Handson]
Do laps with the DeepRacer using pre-trained models or a model you trained yourself in the DeepRacer console.
19:00 AWS Meetup
In the evening an Amazon Web Services Meetup takes place; more info and registration at https://www.awsug.nl
Tuesday, October 29 - Microsoft Azure
09:00 Opening Notes by Rudy Doornewaard
09:15 Introduction to Microsoft Azure AI - Henk Boelman, senior cloud advocate, Microsoft
10:00 DevOps for AI - Marcel de Vries and Niels Zeilemaker
10:30 Coffee break
11:00 AI and IoT - Tony Krijnen, IoT Technology Strategist
11:45 Customer Story: Vattenfall - Rens Weijers, Manager Data & Strategy
Developing smart applications on Azure
13:30 Workshops - Marian Dragt, AI MVP [Handson]
- Custom Vision AI
- Visual Interface AMLS
Wednesday, October 30 - Databricks
09:00 Opening Notes by Susie Dobing
09:15 What's new at Databricks?
10:00 Quby - Making Homes Efficient and Comfortable using AI, IoT data and the full Databricks stack - Erni Durdevic
Quby is a leading company offering data-driven home services technology across European markets, known for creating the in-home display and smart thermostat Toon. In this talk, Erni will take you on a tour of how Quby leverages the full Databricks stack to quickly prototype, validate, scale, and launch data science products. We will explore the technical workflow of a data science project from end to end: starting from developing a notebook prototype and tracking machine learning model performance with MLflow, we move towards production-grade Databricks jobs with a CI/CD pipeline, debugging production code with Databricks Connect, and finally setting up a monitoring system for the jobs.
10:40 Coffee break
11:10 Wehkamp - Applied Machine Learning for Ranking Products in an Ecommerce Setting - Jerry Vos & Arnoud de Munnik
As a leading e-commerce company in fashion in the Netherlands, Wehkamp dedicates itself to providing a better shopping experience for its customers. Using Spark, the data science team develops various machine-learning projects for this purpose, based on large-scale product and customer data. In this talk, we will demonstrate how we use Spark to build the whole product-ranking pipeline, and discuss the challenges we faced along the way.
13:00 Data Pipelines with MLflow - Mike Mengarelli
14:00 Technical Breakout Sessions
ML: Parallelising Machine Learning: Spark and Deep Learning - Matt Thompson
In this session, we will discuss the different options for parallelising machine learning. We will first consider methods like Spark ML and pandas UDFs, and then talk through the steps involved in scaling deep learning models.
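To give a feel for the per-group pattern the session covers: below is a single-machine sketch in pandas, fitting one tiny model per group. This is the same shape of computation that Spark's grouped pandas UDFs distribute across a cluster; the column names and the linear-fit "model" are invented for the example:

```python
import numpy as np
import pandas as pd

# Toy dataset: two stores, each with its own linear sales trend.
df = pd.DataFrame({
    "store": ["a"] * 5 + ["b"] * 5,
    "week": list(range(5)) * 2,
    "sales": [10, 12, 14, 16, 18, 5, 6, 7, 8, 9],
})

def fit_trend(group: pd.DataFrame) -> pd.Series:
    # Fit a degree-1 polynomial per group; in Spark, this function body
    # could run unchanged inside a grouped-map pandas UDF.
    slope, intercept = np.polyfit(group["week"], group["sales"], deg=1)
    return pd.Series({"slope": slope, "intercept": intercept})

# One model per store: a DataFrame with slope/intercept per group.
trends = df.groupby("store").apply(fit_trend)
```

The appeal of the pandas-UDF approach is exactly this: the modelling code stays plain pandas/NumPy, while Spark handles shipping each group to a worker.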
DE: Data engineering with Delta Lake - Mike Mengarelli [Handson]
Data lakes still face reliability challenges when it comes to building production data pipelines at scale.
The open-source project Delta Lake is a storage layer that brings reliability to data lakes through features such as ACID transactions, scalable metadata handling, and unified streaming and batch data processing. It runs on top of your existing data lake, in stores such as Azure Data Lake Storage, AWS S3, or Hadoop HDFS, or on-premises, while remaining fully compatible with Apache Spark APIs.
In this session we’ll review the architecture of Delta Lake and walk through an end-to-end scenario to gain an in-depth understanding of Delta and its applications. For those who wish to follow along, bring your laptop and set up Community Edition here: https://databricks.com/try (you’ll just need an email address to sign up).
19:00 Data Council Meetup - with Tim Hunter
Koalas: pandas APIs on Apache Spark
In this talk, Tim will present Koalas, a new open source project that was announced at the Spark + AI Summit in April. Koalas is a Python package that implements the pandas API on top of Apache Spark, to make the pandas API scalable to big data. Using Koalas, data scientists can make the transition from a single machine to a distributed environment without needing to learn a new framework.
Tim will demonstrate Koalas' new functionality since its initial release, discuss its roadmap, and explain how he envisions Koalas becoming the standard API for large-scale data science.
Thursday, October 31 - Google
09:30 Opening Notes
09:45 Cloud Data Fusion: Data Integration at Google Cloud - Rokesh Jankie, Customer Engineer, Google Cloud
10:45 Coffee break
11:15 Customer Stories:
- Mollie - From cloudy to the cloud: Mollie's data transformation
- WEBB traders - Increasing profitability with back testing in the cloud
- Nico Lab - How StrokeView, an AI-powered clinical decision support system, offers a complete assessment of relevant imaging biomarkers for faster and more accurate treatment decisions in stroke, where every second counts.
13:30 Democratizing Machine Learning with BigQuery ML - Abishay Rao
A demo-driven session with BigQuery ML, showing how BQML democratizes ML and can be incredibly useful in real-world scenarios.
14:30 Coffee break
14:45 The Future of BI is a Data Platform - Sebastien Fabri
An introduction to Looker, part of Google Cloud, by Sebastien Fabri. Looker delivers insights to user workflows, allowing organizations to extract value from their data.
15:45 The What and Why of Serverless – a talk by O'Reilly author Wietse Venema about serverless on GCP.
16:45 Qwiklabs and networking [Handson]
18:30 Google Developers Group Cloud Meetup - Practical ML on GCP
Friday, November 1 - Open-Source
The final day of GoDataFest is all about the latest open-source technology.
09:30 Taking Machine Learning Models into Production - Julian de Ruiter [Handson]
In this interactive session, we will look at the technical challenges associated with getting data science products into production. The session is aimed at people in a technical role, such as data scientists, data engineers, or machine learning engineers.
In detail, we will look at the steps needed to move a simple data science model from a Jupyter notebook into a more production-ready Python package. We will also explore how to expose this model through a simple web API and how to deploy that API in a containerized environment using Docker.
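The session will use its own materials, but the web-API step can be sketched with nothing beyond the standard library (a real setup would more likely use a framework such as Flask inside the Docker image; the endpoint and toy model below are purely illustrative):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(x: float) -> float:
    """Toy stand-in for a model exported from a notebook."""
    return 2.0 * x + 1.0

class ModelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Illustrative endpoint: POST {"x": 3} returns {"y": 7.0}.
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"y": predict(payload["x"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

# To serve: HTTPServer(("", 8000), ModelHandler).serve_forever()
```

Packaging this script plus the model artifact into a Docker image is then what makes the deployment reproducible.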
11:45 Implementing JPEG in Python - Cor Zuurmond
Hundreds of millions of pictures are taken every day and stored on mobile devices, computers, and servers. How are these pictures stored efficiently? To answer this question, Cor will explain how the JPEG algorithm compresses images with minimal information loss.
In this talk, Cor will explain the different compression steps and, more importantly, why these steps are needed. After this talk, you will have an understanding of the JPEG algorithm that will be useful whenever you work with images!
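The central step in the algorithm the talk covers — transforming 8×8 pixel blocks into frequency coefficients and quantising away what the eye barely notices — can be sketched in a few lines of NumPy. A real JPEG codec additionally does colour-space conversion, zig-zag ordering, and entropy coding, and the uniform quantisation step below is a made-up simplification of JPEG's quantisation tables:

```python
import numpy as np

N = 8  # JPEG operates on 8x8 pixel blocks

# Orthonormal DCT-II basis matrix: row k is the k-th cosine basis vector.
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0, :] /= np.sqrt(2.0)

def compress_block(block: np.ndarray, q: float = 16.0) -> np.ndarray:
    """2D DCT followed by uniform quantisation (the lossy step)."""
    coeffs = C @ block @ C.T     # to the frequency domain
    return np.round(coeffs / q)  # small coefficients collapse to zero

def decompress_block(quantised: np.ndarray, q: float = 16.0) -> np.ndarray:
    """Dequantise and invert the DCT (C is orthogonal, so C.T = C^-1)."""
    return C.T @ (quantised * q) @ C

# A smooth gradient block compresses to very few nonzero coefficients,
# which is why JPEG works so well on natural images.
block = np.tile(np.linspace(0, 255, N), (N, 1))
restored = decompress_block(compress_block(block))
```

The zero-heavy coefficient grid is what the later entropy-coding stage exploits to shrink the file.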
13:30 Exact: Bookkeeping gets a lot more exciting when data science joins the field - Marichelle Gietema & Levon Goceryan
In this talk, we will show you how we managed to unify almost 400,000 unique bookkeeping administrations into a fully anonymized, real-time dashboard of the Dutch small and medium-sized enterprise economy. The dashboard shows key performance indicators and allows entrepreneurs to benchmark themselves against their sector averages.
Bookkeeping might not sound like the most vibrant field to work in as a data scientist. However, things get a lot more interesting when you realize that as a company we serve 20% of the small and medium-sized enterprises in the Netherlands. Collectively these data form a data lake that is updated in real time and is rich in economic information.
14:15 Schiphol Takeoff: Your Runway to the Cloud
Developing and deploying data products often involves several steps, especially when using cloud infrastructure. Tying this all together can take a lot of time, and can also result in many different ways of achieving the same goal across an organisation. At Schiphol, we've built a recently open-sourced library, dubbed Takeoff, that we use to deploy data products to the Schiphol Data Hub (SDH) on Microsoft Azure, although Takeoff itself is cloud-agnostic.
Takeoff allows us to abstract away the details of how to interact with the various components and systems needed to put a full-fledged data product live: deploying Spark jobs on Databricks, publishing custom packages to PyPI, setting up Azure Event Hubs consumers, deploying APIs on Kubernetes, and much more. Takeoff itself is written in Python, and we'll give you a glimpse into how we built it.
15:00 Screening of the Dataiku Data Science Pioneers Documentary
With humor and humanity, Data Science Pioneers is a documentary about passionate data scientists driving us towards a technological revolution.
16:00 GoDataFest Party - Drinks, fun and all around joy