Professional Development

Data Science Training

The purpose of this training sequence is to provide a solid working background in data science coupled with practical, hands-on experience. Each session consists of three classes, each four hours long. The complete sequence should ideally be held over a three-week period, one session per week. Each session builds upon previous sessions, and attendees are expected to have a working knowledge of the content of previous sessions. The goal of this training sequence is to quickly bring attendees up to speed on effectively using the tools and techniques of data science in their current positions.

Each session focuses on specific aspects of data science. The first session starts by introducing data science notebooks, setting up and executing simple analysis pipelines, and refreshing attendees on the practical use of statistical methods. In the second session, we focus on data extraction and feature engineering using various data sets before introducing and using several supervised learning techniques, including classification, regression, and time series analysis. The third session covers additional supervised techniques and then follows up with an introduction to unsupervised techniques for finding unknown patterns within unexplored data sets. The third session concludes with a brief introduction to applying the techniques discussed to extremely large data sets. The fourth session introduces neural networks, enabling attendees to build and train recognition systems using Deep Learning.

Ideally, the three four-hour classes of each session would be held in a single week. These classes should occur on Mondays, Wednesdays, and Fridays to give attendees time to do necessary “homework” between classes. Each four-hour class combines short introductory lectures, which provide a solid understanding of the fundamentals, with hands-on, in-class exercises that drive home how to use the techniques introduced. Upon completion of all of the three-class sessions, attendees should understand how to approach new data analysis challenges and have the confidence to use the standard data science and machine learning tools and frameworks necessary to build and apply advanced machine learning pipelines.

Introduction to Data Science – Training Schedule

Session Number

Day

Topic

Agenda

I

1

Introduction & Data Science Notebooks

  • Overview of modeling, analysis techniques, and machine learning pipelines.
  • Language and library comparison overview – Python vs. R
  • Getting started with data exploration and visualization during the modeling process using Jupyter notebooks
  • Linear Regression Pipeline example using Notebooks

2

Probability & Statistics

  • Introduction/Refresher to Probability
  • Statistical Measures
  • Practical examples using R - Probability distributions
  • Simulations and Hypothesis Testing

3

Fundamentals of Machine Learning

  • Introduction to Machine Learning
  • Overview of types of learning: Supervised, Unsupervised, Reinforcement, Transfer
  • Types of data
  • Example Applications of ML
  • How to evaluate ML models
  • Sample exercises using example data sets

II

1

Data Extraction & Analysis

  • What makes a good dataset
  • Data pre-processing and transformation
  • Feature engineering and dimensionality reduction (PCA)
  • Essential concepts of Linear Algebra

2

Supervised Learning Techniques

  • Classification techniques: decision trees and ensemble techniques
  • Regression examples
  • Evaluation and comparison of models
  • Generalization without overfitting: regularization

3

Supervised Learning Techniques

  • Additional classification techniques
  • Additional regression techniques
  • Notebook examples using NAVSEA data sets

III

1

Supervised Techniques & Time Series Analysis

  • Additional supervised techniques
  • Characterizing the impact of feature engineering on model performance
  • Time series data analysis: ARIMA and smoothing

2

Unsupervised Learning Techniques

  • Techniques to find and characterize unknown patterns within data sets
  • Clustering, k-means, k-nearest neighbors, hill climbing
  • Sample exercises using real data sets

3

Data Science at Scale

  • Practical considerations for analyzing data and for building and evaluating models using large data sets
  • Intro to Scalable Machine Learning Frameworks

IV

1

Advanced Learning Techniques

  • Developing and using Deep Neural Networks
  • Different types of Neural Networks
  • Introduction to TensorFlow

 

2

Deep Learning Techniques

  • Building Neural Networks using Keras/TensorFlow
  • Build, Train, and Evaluate Examples using Keras/TensorFlow

3

Deep Learning Techniques

  • Transform and evaluate neural networks to model NAVSEA data sets
  • Compare alternative techniques and the impact on recognition accuracy
 
For additional information on scheduling and other details, contact Dr. Matthew Tolentino (metolent@uw.edu).