Turkish University Department Data (2019-2024)

Python, Pandas, CSV, ETL, Data Cleaning, Normalization

Project Information

  • Category: Dataset
  • Date: 2025-09

Details and Documentation

Comprehensive Documentation

The detailed usage guide, API reference, and development notes for this project are available in the README file on GitHub.

About the Project

A cleaned, standardized dataset collected from YOK Atlas and OSYM.

Goal (Short)

This dataset consolidates university department data for 2019-2024 into a single collection and cleans it so that naming and spelling variants resolve to one consistent structure. That makes research, application development, and quick table-based analysis easier. For 2025, only program lists are available so far; score and other statistical fields are intentionally left empty. The dataset is published on both GitHub and Kaggle, and the Kaggle page also includes notebooks that walk through a detailed analysis.
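To make "unifying naming and spelling differences" concrete, here is a minimal sketch of one way such cleaning could work. It is not taken from the project's scripts: the helper name normalize_name and the sample strings are hypothetical, and the actual rules used for this dataset may differ. It collapses whitespace and uppercases with Turkish casing in mind, since Python's default upper() turns "i" into "I" rather than "İ".

```python
import re
import unicodedata

# Hypothetical helper for illustration only; the project's real rules may differ.
# Map the Turkish lowercase letters explicitly, because str.upper() alone
# would turn "i" into "I" instead of "İ".
_TR_UPPER = str.maketrans({"i": "İ", "ı": "I"})


def normalize_name(raw: str) -> str:
    """Return a canonical uppercase form of a Turkish name."""
    text = unicodedata.normalize("NFC", raw)   # compose any combining marks
    text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
    return text.translate(_TR_UPPER).upper()


# Three spellings of the same department (made-up sample input).
variants = [
    "Bilgisayar  Mühendisliği",
    "bilgisayar mühendisliği ",
    "BİLGİSAYAR MÜHENDİSLİĞİ",
]
canonical = {normalize_name(v) for v in variants}
print(canonical)
```

All three variants collapse to a single canonical string, which is the property a lookup table of department names would rely on.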

Summary Stats

  • 128,352 rows / 32,505 programs (program_code)
  • 235 universities / 733 department names
  • The data is used in production on sinavizcisi.com

Kaggle Performance

The dataset received strong feedback on Kaggle and was used as a reference dataset in other projects, which was very meaningful for me. In total, it reached 3,500+ views and 600+ downloads.

Technical Details

Data Model (Brief)

  • Normalized core: departments_normalized.csv, department_stats.csv
  • Lookup/bridge: department_names, faculty_names, score_types, universities_normalized, department_tags, etc.
  • Fast EDA: data/all_in_one_denormalized.csv
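The split between core tables and lookup tables means most questions need a join. The sketch below shows the general pattern with tiny in-memory stand-ins. Only program_code, year, and total_quota appear elsewhere on this page; the key columns university_id and department_name_id, and the sample values, are assumptions about the schema, so check the actual CSV headers before reusing it.

```python
import pandas as pd

# Stand-in for departments_normalized.csv (key column names are assumed).
departments = pd.DataFrame({
    "program_code": [101, 102],
    "university_id": [1, 2],
    "department_name_id": [10, 10],
})

# Stand-in for department_stats.csv: one row per program and year.
stats = pd.DataFrame({
    "program_code": [101, 101, 102],
    "year": [2023, 2024, 2024],
    "total_quota": [60, 70, 40],
})

# Stand-in for the universities_normalized lookup table.
universities = pd.DataFrame({
    "university_id": [1, 2],
    "university_name": ["A UNIVERSITESI", "B UNIVERSITESI"],
})

# validate= makes pandas raise if a key is unexpectedly duplicated
# on the lookup side, which would silently multiply rows otherwise.
wide = (
    stats
    .merge(departments, on="program_code", how="left", validate="many_to_one")
    .merge(universities, on="university_id", how="left", validate="many_to_one")
)
print(wide[["year", "university_name", "total_quota"]])
```

Using left joins from the stats table keeps every yearly row even when a lookup entry is missing, which makes gaps visible as NaN instead of dropping data.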

ETL Steps

  • remove_2025_from_departments.py → filter out 2025
  • process_raw_data.py → normalized tables
  • build_all_in_one_denormalized.py → single table
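As a rough illustration of the first step in the list above, filtering out a year can be as small as a boolean mask. This is a sketch under assumptions, not the contents of remove_2025_from_departments.py: the function name drop_year, the year column, and the sample rows are all hypothetical.

```python
import pandas as pd


def drop_year(df: pd.DataFrame, year: int, column: str = "year") -> pd.DataFrame:
    """Return a copy of df without the rows for the given year (hypothetical helper)."""
    return df.loc[df[column] != year].reset_index(drop=True)


# Made-up sample: 2025 rows carry no statistics yet.
raw = pd.DataFrame({
    "program_code": [101, 101, 102],
    "year": [2024, 2025, 2025],
    "total_quota": [70, None, None],
})

kept = drop_year(raw, 2025)
print(kept)
```

The function returns a new frame and leaves the input untouched, so the raw data stays available for later steps that might still need the 2025 program list.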

Example Usage

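import pandas as pd

# Quick filtering with the denormalized file
eda = pd.read_csv('data/all_in_one_denormalized.csv')

# Build a boolean mask: 2024 rows for Computer Engineering programs
# at foundation ('vakif') universities in Istanbul
mask = (
    (eda['year'] == 2024) &
    (eda['city'] == 'ISTANBUL') &
    (eda['university_type'] == 'vakif') &
    (eda['department_name'] == 'Computer Engineering')
)

# Show only the columns of interest for the matching rows
columns = ['university_name', 'scholarship_type', 'total_quota', 'total_enrolled']
print(eda.loc[mask, columns])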
