Data preparation is typically the most time-consuming phase of a data science project, and no standard procedure covers every issue that can arise. In this seminar, you will learn how to make data preparation more efficient so that you can gain insights from your data faster.

From a process point of view, the CRoss-Industry Standard Process for Data Mining (CRISP-DM) describes six major phases for any data analysis project. After gaining Business Understanding, we need to identify the required data and understand its meaning (Data Understanding). This requires domain knowledge as well as data engineering and data analysis skills. Data Understanding is thus the starting point for Data Ingestion and Data Preparation.

Goal of the Seminar

In this hands-on seminar, you will learn how to prepare the data for your data analysis projects. Drawing on concrete examples and our experience from many projects, we will show common pitfalls in Data Preparation and possible solutions. You will learn how to implement data preparation steps in Jupyter Notebooks.

You will understand why the Data Preparation phase in CRISP-DM is necessary, get to know the methods and tools needed to assess data quality, and learn how to mitigate commonly occurring issues.

Content of the Seminar

The seminar covers the following topics:
  • Data Preparation
  • Data Quality Assessment and Mitigation Strategies
  • Hands-on with Jupyter Notebooks
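
To give a flavor of the data quality topic, here is a minimal sketch of an assessment step followed by a mitigation step, of the kind that could be run in a notebook cell. It is an illustration under stated assumptions, not the seminar's actual material: the sample records, the function names `assess_quality` and `mitigate`, and the choice of median imputation are all hypothetical. It uses only the Python standard library to stay self-contained; in practice a library such as pandas would be a typical choice.

```python
from statistics import median

# Hypothetical sample records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "city": "Berlin"},
    {"id": 2, "age": None, "city": "Munich"},
    {"id": 2, "age": None, "city": "Munich"},  # exact duplicate of the row above
    {"id": 3, "age": 51, "city": None},
    {"id": 4, "age": 29, "city": "Hamburg"},
]


def row_key(row):
    """Build a hashable fingerprint of a row for duplicate detection."""
    return tuple(sorted(row.items()))


def assess_quality(rows):
    """Count missing values per column and exact duplicate rows."""
    missing = {}
    seen = set()
    duplicates = 0
    for row in rows:
        for column, value in row.items():
            missing[column] = missing.get(column, 0) + (value is None)
        key = row_key(row)
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"missing": missing, "duplicates": duplicates}


def mitigate(rows):
    """Drop exact duplicates, then fill missing ages with the median age.

    Median imputation is only one possible strategy and is assumed here
    purely for illustration.
    """
    unique, seen = [], set()
    for row in rows:
        key = row_key(row)
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the input stays unchanged
    known_ages = [r["age"] for r in unique if r["age"] is not None]
    fill_value = median(known_ages)
    for r in unique:
        if r["age"] is None:
            r["age"] = fill_value
    return unique


before = assess_quality(records)
cleaned = mitigate(records)
after = assess_quality(cleaned)
print("before:", before)
print("after: ", after)
```

Running the assessment before and after mitigation makes the effect of each step visible. The missing `city` value is deliberately left untouched, since whether to impute, drop, or flag a gap depends on the domain.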