Hands-on Data Engineering in Palantir Foundry - Jyotin Padhi
Duration: 4 hours
Batch Type: Weekend
Languages: English, Telugu
Class Type: Online
Course Fee:
Course Content
1️⃣ Introduction to Big Data & PySpark
What is Big Data?
Hadoop ecosystem overview
Spark vs Hadoop MapReduce
Installation & environment setup
Introduction to PySpark architecture
2️⃣ PySpark Core Concepts
RDDs (Resilient Distributed Datasets)
Transformations & actions
Lazy evaluation
RDD persistence & optimization
3️⃣ PySpark DataFrames & SQL
DataFrame creation & operations
Schema definition
Importing CSV, JSON, Parquet, ORC
Spark SQL basics
SQL queries on large datasets
Window functions
4️⃣ Data Processing & ETL with PySpark
Data cleaning
Handling nulls & duplicates
Joins & aggregations
User-defined functions (UDFs)
File formats & partitioning
ETL pipelines with PySpark
5️⃣ Big Data Analytics with PySpark
Exploratory data analysis
Distributed computing principles
Performance optimization techniques
Caching & checkpointing
Cluster management basics
6️⃣ PySpark MLlib (Basics)
Basic ML algorithms with PySpark
Feature engineering in Spark
Pipelines & model evaluation
7️⃣ Real-Time & Batch Processing (Optional Module)
Introduction to Spark Streaming
Structured Streaming concepts
Batch processing workflows
8️⃣ Hands-on Projects
ETL pipeline for large datasets
Analytics dashboard-ready dataset creation
Big Data business case implementation
Skills
Big Data Analytics, PySpark
Tutor

Jyotin Padhi is a dedicated data engineering and analytics tutor with 3 years of hands-on experience in PySpark and Big Data Analytics.
3 Years Experience
Rayagada
