Course Outline:
Day 1
- Data Science
- Data Science Process Overview
- Structured and Unstructured Data
- Data Acquisition and Transformation
- Data Analysis and Machine Learning
- Machine Learning Concepts
Day 2
- Big Data overview
- A brief history of Big Data
- History and background of Big Data and Hadoop
- 5 Vs of Big Data
- Secret Sauce of Big Data Hadoop
- Big Data Distributions in Industry
- End-to-End Big Data Lifecycle overview
- Demos and Labs
- Big Data Ecosystem before Spark
- Big Data Ecosystem before Apache Spark
- Storage options – HDFS and NoSQL
- Processing options – MapReduce, Hive, etc.
- Administrative tools – ZooKeeper, Oozie, etc.
- Ingestion tools – Sqoop, Flume
- Demos and Labs
Day 3
- Getting Started with Apache Spark
- Introduction to Spark RDD
- Spark RDD Transformation and Actions
- Spark Lifecycle
- Spark Caching
- Set Up an Account on Apache Spark Databricks Cloud
- Databricks Notebooks overview
- Lab – Spark RDD Transformation & Actions
- Lab – Spark RDD Advanced Transformation & Actions
- Demos and Labs
- Apache Spark SQL, DataFrames, Datasets
- Introduction to Spark SQL
- SQL, DataFrames and Datasets Spark Library
- Compare the various APIs – RDD, DataFrames and Datasets
- Demos and Labs
- Machine Learning using Apache Spark
- Introduction to Machine Learning and Data Science
- Machine Learning Spark Library
- Spark Machine Learning examples
- Demos and Labs
- Streaming using Apache Spark
- Need for real-time processing
- Streaming Spark Library
- Spark Streaming examples
- Demos and Labs
Training material provided:
Yes (Digital format)