Apache Spark (http://spark.apache.org) is currently the fastest-growing project in the Big Data ecosystem. It makes processing large data sets faster and easier than existing solutions do. This workshop will jump-start your work with Spark and help you transition from analyst or developer to Big Data engineer.
Introduction to Big Data
Definition
What is Big Data?
History of Big Data
Big Data problems
Apache Spark
Introduction
History
Spark vs Hadoop
Resilient Distributed Datasets (RDDs)
Architecture
Operation variants
Administration
Spark Core
Introduction
Java vs Scala vs Python
Connecting to cluster
Dataset distribution
RDD operations
Shared variables
Execution and testing
Spark SQL
Introduction
Spark SQL vs Hive
Basic operation
Data and schema
Queries
Hive integration
Execution and testing