Enrollment closed

Data Engineering Course

Schedule:

Duration of lecture:
3 hours
Duration of training (Mon, Wed, Fri):
3.5 months
Duration of paid internship (5 days/week, 8 hours/day):
2.5 months

Course description

The Data Engineering course is designed for those interested in a career in data. This course will teach the fundamentals of data engineering related to building data pipelines using Python, SQL, NoSQL, and other modern technologies.

During the course, you will get acquainted with the Data Engineering ecosystem, data integration pipelines, and data repositories. You will study relational and non-relational databases, learn what data warehouses are, and learn to store and process large amounts of information.

As part of the course, you will:

  • Understand what data engineering is and what a modern data ecosystem looks like.
  • Put into practice the basics of Python programming and various ways of processing data using basic constructs, collections, Pandas, etc.
  • Learn databases, including database design, creating and using tables, and working with constraints.

Projects

Throughout the course, students work on projects that cover the material studied, which lets them consolidate the knowledge gained through practice. The final stage of the Data Engineering course is the development of a final project. You will bring together all the knowledge gained during the course and apply it to real cases while working with a real team. Every case involves solving a real problem using the knowledge and experience you have acquired during the training. In addition, the final project allows your mentors and colleagues to validate your knowledge.

Evaluation

During the course, students complete homework and projects. Based on these, experts assess each student's practical skills and knowledge, as well as the level of individual work. The results of the final project are evaluated by mentors, who conduct a comprehensive assessment of approaches and artifacts.

EXPERTS

Maksym Voitko
Data Engineer Team Lead
Dmytro Kulyk
DWH Architect
Vladyslav Suprunov
Python Team Lead
Serhii Dimchenko
Lead Data Analyst
Oleksandr Tonkonozhenko
Head of Engineering

Course program

Git

  • Basic Git workflow
  • Best practices
  • Code review methodology

Python

  • Python intro 
  • Data-oriented techniques 
  • Pandas, NumPy 
  • OOP
  • Asynchronous programming

Linux intro

Docker

  • Docker common workflows. Docker registries 
  • Building images with Dockerfile 
  • Local SDLC with Docker Compose
  • Using Docker in CI/CD

Data Engineering

  • Data Sources. Formats. Models. Storage Engines and Processing 
  • SQL, NoSQL
  • Columnar databases
  • MapReduce, Kafka 
  • Data transformation techniques
  • Data warehousing

Airflow

  • Airflow basic concepts 
  • ETL with Airflow Workshop

CI/CD with GitLab

  • CI/CD challenges
  • Software Development Lifecycle 
  • CI/CD Pipelines with GitLab CI 
  • .gitlab-ci.yml directives
  • Runners infrastructure

Terraform

  • Deploying Infrastructure with Terraform 
  • Terraform Provisioners 
  • Modules & Workspaces 
  • Remote State Management

Kubernetes

  • Orchestration challenges
  • Kubernetes platform architecture 
  • K8s abstractions and objects

Data Analysis

  • Statistics 101
  • Visualization types. Tools overview: Tableau, Superset, Excel, Python/R, Grafana

Data-driven/product mindset

  • MVP, hypotheses, prioritization
  • Metrics
  • A/B tests