Data Engineer

Duration: 3 months

Batch Type: Weekend and Weekdays

Languages: English, Hindi

Class Type: Online

Course Fee: Call for fee

Course Content

Advanced SQL for Data Engineering

- Joins (inner, outer, left, right)

- Group By, Aggregations, Subqueries

- Window Functions (ROW_NUMBER, RANK, LEAD/LAG, NTILE, etc.)

- Recursive CTEs

- Pivoting & Unpivoting Data

- Set Operations (UNION, INTERSECT, EXCEPT)

- Indexing & Partitioning

- Query Plans & Optimization (EXPLAIN)

- Materialized Views

- Data Deduplication Techniques

- Handling NULLs, Outliers & Bad Data

- SQL for ETL (merge, upsert, deletes)
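
To give a flavour of the window-function and deduplication topics listed above, here is a minimal sketch. It runs through Python's built-in sqlite3 module only so that it is self-contained (window functions need SQLite 3.25 or later); the `customers` table, its columns, and the `latest_rows` helper are hypothetical and not taken from the course material.

```python
import sqlite3

def latest_rows(conn):
    """Return one row per email, keeping the most recently updated record."""
    query = """
        SELECT id, email, updated_at
        FROM (
            SELECT id, email, updated_at,
                   ROW_NUMBER() OVER (
                       PARTITION BY email          -- one group per duplicate key
                       ORDER BY updated_at DESC    -- newest record gets rn = 1
                   ) AS rn
            FROM customers
        )
        WHERE rn = 1
        ORDER BY email
    """
    return conn.execute(query).fetchall()

# Hypothetical sample data: two records share the same email.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "2024-01-01"),
        (2, "a@example.com", "2024-03-01"),
        (3, "b@example.com", "2024-02-01"),
    ],
)
deduped = latest_rows(conn)
```

The same ROW_NUMBER pattern generally carries over to other SQL engines, though syntax details may differ.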

 

Advanced Python for Data Engineering

- Data Structures (list, tuple, dict, set)

- List comprehensions, Generators, Iterators

- Functions, Lambdas, Decorators

- Pandas (data wrangling)

- PySpark basics (RDDs, DataFrames introduction)

- SQLAlchemy (DB connections)

- openpyxl / pyodbc (Excel & SQL Server interaction)

- File handling (CSV, JSON, Parquet, Avro, ORC)

- Logging, Exception Handling

- Multi-threading & multi-processing

- APIs (REST, Requests, JSON handling)

- Unit Testing with pytest

- Data Validation with Great Expectations
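
As a small illustration of the generator and file-handling topics above, the sketch below reads CSV rows in fixed-size batches so that a large file never has to sit in memory all at once. The `read_in_batches` function and the sample data are hypothetical, and only the standard library is used.

```python
import csv
import io
from itertools import islice

def read_in_batches(lines, batch_size):
    """Yield lists of parsed CSV rows (as dicts), at most batch_size rows each."""
    reader = csv.DictReader(lines)
    while True:
        # islice pulls the next batch_size rows lazily from the reader.
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch

# An in-memory stand-in for a real file opened with open(path, newline="").
sample = io.StringIO("id,amount\n1,10\n2,20\n3,30\n")
batches = list(read_in_batches(sample, batch_size=2))
```

Each batch can then be validated or loaded on its own, which is one common way to keep memory use flat in an ETL step.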

 

Azure Data Engineering with Databricks & PySpark

- Azure Storage (Blob, ADLS Gen2)

- Azure SQL Database & Synapse Analytics

- Azure Data Factory vs Databricks

- Networking & IAM (RBAC, service principals)

- Databricks Workspace, Clusters, Notebooks

- DBFS & Mounting ADLS

- RDDs vs DataFrames vs Datasets

- Transformations & Actions

- Spark SQL

- UDFs in PySpark

- Reading/Writing (CSV, Parquet, Avro, Delta)

- Partitioning, Bucketing & Optimizations

- Delta Lake Concepts (ACID transactions, Time Travel, Merge)

- Joins, Aggregations at scale

- Catalyst Optimizer

- Tungsten Execution

- Caching, Broadcasting, Skew Handling

- Connecting Databricks with Azure Data Lake (ADLS)

- Databricks with Synapse / SQL Database

- Databricks Jobs & Pipelines

 

Orchestration with Apache Airflow

- DAGs (Directed Acyclic Graphs)

- Tasks & Operators (PythonOperator, BashOperator, etc.)

- Scheduling & Backfilling

- Triggering PySpark jobs from Airflow

- Using Airflow with Databricks

- XComs (data sharing between tasks)

- Airflow Variables & Connections

- Error Handling & Retries

- Monitoring & Logging

 

Streaming Data with Apache Kafka

- Topics, Producers, Consumers

- Partitions & Offsets

- Brokers & Clusters

- Kafka Producers (Python & PySpark)

- Kafka Consumers (Python & PySpark)

- Schema Registry & Avro/JSON/Protobuf

- Structured Streaming with PySpark

- Windowed Aggregations

- Handling late data & watermarks
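
To make the windowed-aggregation and watermark topics above more concrete, here is a deliberately simplified, pure-Python sketch. It is not the PySpark Structured Streaming API and does not claim to match Spark's exact rules; the `windowed_counts` function, its parameters, and the integer timestamps are all assumptions made for illustration.

```python
from collections import defaultdict

def windowed_counts(events, window_size, allowed_lateness):
    """Count events per tumbling window, dropping events that arrive too late.

    events: iterable of (event_time, key) tuples in arrival order.
    The watermark is the largest event time seen so far minus allowed_lateness.
    An event whose window ended at or before the watermark is discarded.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = None
    for event_time, key in events:
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        watermark = max_event_time - allowed_lateness
        # Tumbling window: fixed size, non-overlapping, aligned to multiples of window_size.
        window_start = (event_time // window_size) * window_size
        window_end = window_start + window_size
        if window_end <= watermark:
            dropped.append((event_time, key))
            continue
        counts[(window_start, key)] += 1
    return dict(counts), dropped

# Hypothetical stream: the events at time 3 and time 8 arrive out of order.
events = [(1, "click"), (4, "click"), (12, "click"),
          (3, "click"), (27, "click"), (8, "click")]
counts, dropped = windowed_counts(events, window_size=10, allowed_lateness=5)
```

In this run the event at time 3 is late but still inside the allowed lateness, so it is counted, while the event at time 8 arrives after the watermark has passed its window and is dropped.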

Skills

AI and Data Analytics, Azure Data Engineering, Advanced Python for Data Engineering, Advanced SQL for Data Engineering, Data Engineer

Institute

TaskHiveSea

TaskHiveSea is a skilled Machine Learning and Artificial Intelligence trainer with 5 years of experience. He specializes in Machine Learni...

0.0 Average Ratings

0 Reviews

6 Years Experience

Badarpur near metro
