Data Engineer
Duration: 3 months
Batch Type: Weekend and Weekdays
Languages: English, Hindi
Class Type: Online
Course Fee: Call for fee
Course Content
Advanced SQL for Data Engineering
• Joins (inner, outer, left, right)
• Group By, Aggregations, Subqueries
• Window Functions (ROW_NUMBER, RANK, LEAD/LAG, NTILE, etc.)
• Recursive CTEs
• Pivoting & Unpivoting Data
• Set Operations (UNION, INTERSECT, EXCEPT)
• Indexing & Partitioning
• Query Plans & Optimization (EXPLAIN)
• Materialized Views
• Data Deduplication Techniques
• Handling NULLs, Outliers & Bad Data
• SQL for ETL (merge, upsert, deletes)
Advanced Python for Data Engineering
• Data Structures (list, tuple, dict, set)
• List comprehensions, Generators, Iterators
• Functions, Lambdas, Decorators
• Pandas (data wrangling)
• PySpark basics (RDDs, DataFrames introduction)
• SQLAlchemy (DB connections)
• openpyxl / pyodbc (Excel & SQL Server interaction)
• File handling (CSV, JSON, Parquet, Avro, ORC)
• Logging, Exception Handling
• Multi-threading & multi-processing
• APIs (REST, Requests, JSON handling)
• Unit Testing with pytest
• Data Validation with Great Expectations
Azure Data Engineering with Databricks & PySpark
• Azure Storage (Blob, ADLS Gen2)
• Azure SQL Database & Synapse Analytics
• Azure Data Factory vs Databricks
• Networking & IAM (RBAC, service principals)
• Databricks Workspace, Clusters, Notebooks
• DBFS & Mounting ADLS
• RDDs vs DataFrames vs Datasets
• Transformations & Actions
• Spark SQL
• UDFs in PySpark
• Reading/Writing (CSV, Parquet, Avro, Delta)
• Partitioning, Bucketing & Optimizations
• Delta Lake Concepts (ACID transactions, Time Travel, Merge)
• Joins, Aggregations at scale
• Catalyst Optimizer
• Tungsten Execution
• Caching, Broadcasting, Skew Handling
• Connecting Databricks with Azure Data Lake (ADLS)
• Databricks with Synapse / SQL Database
• Databricks Jobs & Pipelines
Orchestration with Apache Airflow
• DAGs (Directed Acyclic Graphs)
• Tasks & Operators (PythonOperator, BashOperator, etc.)
• Scheduling & Backfilling
• Triggering PySpark jobs from Airflow
• Using Airflow with Databricks
• XComs (data sharing between tasks)
• Airflow Variables & Connections
• Error Handling & Retries
• Monitoring & Logging
Streaming Data with Apache Kafka
• Topics, Producers, Consumers
• Partitions & Offsets
• Brokers & Clusters
• Kafka Producers (Python & PySpark)
• Kafka Consumers (Python & PySpark)
• Schema Registry & Avro/JSON/Protobuf
• Structured Streaming with PySpark
• Windowed Aggregations
• Handling late data & watermarks
Skills
AI and Data Analytics, Azure Data Engineering, Advanced Python for Data Engineering, Advanced SQL for Data Engineering, Data Engineer
Institute

TaskHiveSea is a skilled Machine Learning and Artificial Intelligence trainer with 5 years of experience. He specializes in Machine Learni...
6 Years Experience
Badarpur near metro



