Create Turkish Voter Profile Database With Web Scraping

Published: May 1, 2024

Some of you may know that Turkey held local elections on 31 March 2024. The results caused repercussions around the world, as the AKP, in power for 22 years, suffered its first election defeat. The surprise winner was the CHP, which now governs cities that make up 60% of the country's population and 80% of its economic output.

So how did this happen?

Background

If you watch Turkish TV, pundits blame this defeat on the country's retired population, who are dissatisfied for economic reasons.

Another common perception in Turkey is that the country's youth support CHP policies (at levels of 70%!), while the AKP lacks support among the young. So it is said that the CHP's support is only going to increase going forward.

Pundits are known to make false predictions and give incorrect information. As a data scientist, it is my pleasure and duty to check whether these claims are true using data.

Data

To do such an analysis, we need the age profile of voters and how they vote. People usually conduct costly surveys for this purpose. However, there is a nice publicly available dataset that we can use.

Turkish Statistical Institute

The Turkish Statistical Institute (TUIK) publishes election results as well as voter age/sex/education profiles at the district level. The 81 provinces of Turkey are divided into 973 districts.

For our purpose, we only need to look at the age data: we can represent youth as voters below the age of about 30 and retired people as those above 65, as the retirement age in Turkey is 65.

We can get this data from TUIK's election database. Unfortunately, the latest available data is the 2018 database. A lot has changed since 2018, but this data can give us a glimpse of macro voter trends.
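As a minimal sketch of that bucketing, the helper below maps an age-group label to one of the two groups of interest. The label format is an assumption: it presumes each label starts with the lower bound of the group (for example "18-19" or "75+") and that no group straddles 30 or 65. The function name `age_bucket` is hypothetical.

```python
import re


def age_bucket(label: str) -> str:
    """Map an age-group label to 'youth', 'retired' or 'other'."""
    # Assumption: the first number in the label is the lower bound of the group.
    match = re.search(r"\d+", label)
    if match is None:
        raise ValueError(f"no age found in label: {label!r}")
    lower_bound = int(match.group())
    if lower_bound < 30:
        return "youth"
    if lower_bound >= 65:
        return "retired"
    return "other"


# Hypothetical labels; the real ones come from the TUIK report.
for label in ["18-19", "25-29", "30-34", "65-69", "75+"]:
    print(label, "->", age_bucket(label))
```

If the real labels turn out to look different, only the regular expression should need to change.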

Elections

The issue with this database is that we need to create an individual report for each district. A district-level report looks like the one below:

Election Results

Considering that there are 973 districts, this is both boring and tiring, and this is where scraping comes into the picture.

Scraping the data

Please take a look at this GitHub repo for all the code needed to scrape the data, collate it into a dataframe, and the resulting CSV file.

Overall, the idea is to use Selenium to loop through all districts, switching between cities and districts and clicking on "Download the report". Once the report is downloaded, we move to the next district. When all districts of a city are exhausted, we move to the next city. This way, we loop through the 81 cities and 973 districts of Turkey and download a report for each district.

from selenium import webdriver
import chromedriver_autoinstaller
from selenium.webdriver.common.by import By
import time
chromedriver_autoinstaller.install()

###### Start webdriver
driver = webdriver.Chrome()
time.sleep(1)

###### Go to the main site
driver.get("https://biruni.tuik.gov.tr/secimdagitimapp/secimsecmen.zul")
time.sleep(1)

###### Select radio buttons for the desired areas
for span in driver.find_elements(By.CSS_SELECTOR, ".grid-od span"):
    if span.text == "2018":
        span.find_element(By.CSS_SELECTOR, "input").click()
        time.sleep(1)


for span in driver.find_elements(By.CSS_SELECTOR, ".grid-od span"):
    if span.text == "Yurt içi seçmen":
        span.find_element(By.CSS_SELECTOR, "input").click()
        time.sleep(1)

for span in driver.find_elements(By.CSS_SELECTOR, ".grid-od span"):
    if span.text == "Yaş grubu":
        span.find_element(By.CSS_SELECTOR, "input").click()
        time.sleep(1)

for span in driver.find_elements(By.CSS_SELECTOR, ".grid .gc span"):
    if span.text == "İBBS-Düzey4 (İlçe)":
        span.find_element(By.CSS_SELECTOR, "input").click()
        time.sleep(1)

###### Identify the cities in the city list (there should be 81)
tables = driver.find_elements(By.CSS_SELECTOR, ".listbox-btable")

for table in tables:
    if "Adana" in table.text:
        cities = table.find_elements(By.CSS_SELECTOR, "td")
        break

city_names = [city.text for city in cities]

###### Save data per district
for city in cities:
    city.click()
    time.sleep(2)

    tables = driver.find_elements(By.CSS_SELECTOR, ".listbox-btable")

    for table in tables:
        if "Tüm İlçeler" in table.text:
            districts = table.find_elements(By.CSS_SELECTOR, "td")
            break

    district_names = [district.text for district in districts]
    districts = [district for district in districts if "Tüm İlçeler" not in district.text]
    
    for district in districts:
        district.click()
        time.sleep(1)
    
        # Select Excel as the output format
        driver.find_element(By.CSS_SELECTOR, 'span[title="EXCEL"]').find_element(By.CSS_SELECTOR, 'input').click()
        time.sleep(1)
        
        # Click "Raporu Oluştur" to generate and download the report
        driver.find_element(By.CSS_SELECTOR, 'input[value="Raporu Oluştur"]').click()
        
        ## Log, Sleep
        print(city.text, district.text)
        time.sleep(1)
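
The script above relies on fixed `time.sleep` calls and does not check that a report has actually arrived. One possible way to make the "once the report is downloaded" step explicit is sketched below using only the standard library. The helper name `wait_for_new_file` and its parameters are hypothetical, and it assumes Chrome's usual behaviour of writing unfinished downloads with a `.crdownload` suffix.

```python
import time
from pathlib import Path


def wait_for_new_file(folder, known, timeout=30.0, poll=0.5):
    """Return the first finished file in `folder` whose name is not in `known`."""
    folder = Path(folder)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for path in folder.iterdir():
            # Skip files seen before and downloads that are still in progress.
            if path.name in known or path.suffix == ".crdownload":
                continue
            if path.is_file():
                return path
        time.sleep(poll)
    raise TimeoutError(f"no new file appeared in {folder} within {timeout} s")
```

In the district loop, one would record the file names already present in the download folder before clicking, call `wait_for_new_file(download_dir, known)` right after the click, and add the returned name to `known`. Here `download_dir` stands for wherever your browser saves files.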

Aggregating the data

Once the data is collected, we will have 973 HTML files (saved with an .xls extension) waiting to be processed. Using the code below, we clean up and aggregate the data into one big CSV file:

import os
import warnings
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

warnings.simplefilter(action='ignore')

folder = "<Address/with/downloaded/files/from/TUIK>"

# The downloaded ".xls" reports are actually HTML documents
paths = [f for f in os.listdir(folder) if f.endswith("xls")]
dfs = []

for path in paths:
    # iso-8859-9 (Latin-5) keeps the Turkish characters intact
    with open(f"{folder}/{path}", "r", encoding="iso-8859-9") as f:
        text = f.read()

    soup = BeautifulSoup(text, "html")
    # Recent pandas versions deprecate passing a raw HTML string, so wrap it in StringIO
    tables = pd.read_html(StringIO(str(soup.find_all("table")[1])))
    city = tables[0].iloc[0, 0]
    df = tables[2]
    df = df.drop(1, axis=1)
    district = df.iloc[0, 0]
    df.columns = ["age", "male", "female", "all"]
    df["city"] = city
    df["district"] = district
    df = df[1:].copy()  # drop the first row, which holds the district name

    # Remove the "." thousands separators before converting to numbers
    for col in df.columns:
        df[col] = df[col].str.replace(".", "", regex=False)
    df = df[df["age"] != "Toplam"].copy()  # drop the total ("Toplam") row
    for col in ["male", "female", "all"]:
        df[col] = pd.to_numeric(df[col])

    dfs.append(df)

df = pd.concat(dfs, axis=0, ignore_index=True)
df.to_csv("data.csv", index=False)
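
Before using the aggregated file, it may be worth running a quick sanity check on it. The sketch below uses only the standard library and the column names written above (`age`, `male`, `female`, `all`, `city`, `district`). It counts the distinct districts and flags rows where male plus female does not equal the total. The helper name `check_rows` is hypothetical, the sample rows are made up for illustration, and the code assumes the counts were written as whole numbers.

```python
import csv
import io


def check_rows(rows):
    """Return (number of distinct districts, rows where male + female != all)."""
    districts = set()
    mismatches = []
    for row in rows:
        districts.add((row["city"], row["district"]))
        if int(row["male"]) + int(row["female"]) != int(row["all"]):
            mismatches.append(row)
    return len(districts), mismatches


# Made-up sample in the same column layout as data.csv; the numbers are not real.
sample = io.StringIO(
    "age,male,female,all,city,district\n"
    "18-19,120,110,230,Adana,Seyhan\n"
    "20-24,300,310,610,Adana,Seyhan\n"
    "18-19,80,75,150,Adana,Ceyhan\n"
)
n_districts, bad = check_rows(csv.DictReader(sample))
print(n_districts, "districts,", len(bad), "inconsistent rows")
```

To run the same check on the real output, open `data.csv` with `newline=""` and pass `csv.DictReader(f)` to `check_rows`. The district count should match the number of reports downloaded.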

Result

The resulting file looks like the one below, with voter age data at the district level for the 2018 general elections in Turkey:

Data Scraping Results

Conclusion

With this data, we can now run models like regression and see how the voter age profile influences the voting results. Each district will be treated as a data point, and with 973 data points, we should be able to get statistically significant results. But this is for another article.

I hope many people use this data to develop useful models for Turkish elections.

Check out the GitHub repo for the code and the data.

Happy hacking!
