Kalvium Data Analyst task
This repository contains a data analysis of the recently concluded Lok Sabha election results. The analysis includes scraping the election results data from the Election Commission of India's website, processing the data, and generating insights.
- Introduction
- Data Scraping
- Installation
- Usage
- Insights
- Insight 1: Vote Share by Party
- Insight 2: Seat Change Compared to Previous Election
- Insight 3: Voter Turnout
- Insight 4: Winning Margins
- Insight 5: Performance in Key States
- Insight 6: Gender Representation
- Insight 7: First-time Winners
- Insight 8: Incumbent Re-elections
- Insight 9: Closest Contests
- Insight 10: Historical Comparison
- Sources
- Methods
- Key Findings
- Visualization
This project aims to provide a comprehensive analysis of the 2024 Lok Sabha election results. The data is scraped from the Election Commission of India's official website and various insights are derived from the processed data.
The data is scraped from the Election Commission of India's results page using BeautifulSoup. If the data is loaded dynamically, Selenium is used to ensure all data is captured correctly.
-
Clone the repository:
git clone https://github.com/yourusername/lok-sabha-election-analysis.git cd lok-sabha-election-analysis
-
Install the required packages:
pip install -r requirements.txt
-
If using Selenium, ensure you have the appropriate WebDriver for your browser:
To scrape the election results data and save it to a CSV file, run:
python scrape_election_results.py
### scrape_election_results.py
Here's the complete Python script for scraping the election results and saving them to a CSV file:
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
def scrape_with_bs4():
url = "https://results.eci.gov.in/pcresults.htm" # Update with the correct URL
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
table = soup.find("table", {"id": "datatable"})
if table is None:
print("Table not found. Here is the HTML:")
print(soup.prettify())
else:
data = []
headers = [th.text.strip() for th in table.find_all("th")]
data.append(headers)
for row in table.find_all("tr")[1:]:
cols = row.find_all("td")
cols = [ele.text.strip() for ele in cols]
data.append(cols)
df = pd.DataFrame(data[1:], columns=data[0])
df.to_csv("election_results.csv", index=False)
print("Data saved to election_results.csv")
def scrape_with_selenium():
driver_path = 'path/to/chromedriver' # Change to the path where your chromedriver is located
driver = webdriver.Chrome(driver_path)
url = "https://results.eci.gov.in/pcresults.htm" # Update with the correct URL
driver.get(url)
time.sleep(10) # Adjust the sleep time as needed
table = driver.find_element(By.ID, "datatable")
data = []
headers = [th.text.strip() for th in table.find_elements(By.TAG_NAME, "th")]
data.append(headers)
rows = table.find_elements(By.TAG_NAME, "tr")[1:] # Skip header row
for row in rows:
cols = row.find_elements(By.TAG_NAME, "td")
cols = [ele.text.strip() for ele in cols]
data.append(cols)
df = pd.DataFrame(data[1:], columns=data[0])
df.to_csv("election_results.csv", index=False)
driver.quit()
print("Data saved to election_results.csv")
# Uncomment one of the following lines to use the respective scraping method
# scrape_with_bs4()
# scrape_with_selenium()