Predicting 2018 Pakistani Election using a Novel Rigged Model

Election results as predicted by this model before elections vs original results.

Introduction

This repository contains the code for the election prediction model used to predict Pakistan's 2018 general election. The model won first prize in a nation-wide data science competition. A paper based on this model has been accepted in the special issue of Springer's journal on Big Data and Politics. Find more about this model here.

Dependencies

scipy 0.18.1
matplotlib 2.0.0
pandas 0.19.2
tqdm 4.28.1
numpy 1.11.3
bayesian_optimization 0.6.0
thesis 0.0.01

How to run the code

cd GE2018
pip install -r requirements.txt
python main.py

Reference

Please cite this.

@article{awais2019leveraging,
  title={Leveraging big data for politics: predicting general election of Pakistan using a novel rigged model},
  author={Awais, Muhammad and Hassan, Saeed-Ul and Ahmed, Ali},
  journal={Journal of Ambient Intelligence and Humanized Computing},
  pages={1--9},
  year={2019},
  publisher={Springer}
}

License

This repository is licensed under the terms of the GNU AGPLv3 license.

NaN in party list causes an IndexError in predict_partyHistory()

There is a NaN in the party list, which causes an IndexError when calculating the probability based on party history. It seems that when iterating over the list of parties for the prediction based on each party's history in previous polls, there is a NaN at some point in the party list:

C:\ProgramData\Anaconda3\envs\homework3\python.exe C:/Users/antoi/Documents/Programming/GE2018/main.py
Starting..
...
  0%|          | 0/270 [00:00<?, ?it/s]
...
 72%|███████▏  | 194/270 [00:41<00:16,  4.73it/s]
Traceback (most recent call last):
  File "C:/Users/antoi/Documents/Programming/GE2018/main.py", line 47, in <module>
    data1 = compare_methods("L2-EX")
  File "C:\Users\antoi\Documents\Programming\GE2018\comparison.py", line 118, in compare_methods
    party_wise_result, seat_wise_result = final_model(paras[:12])
  File "C:\Users\antoi\Documents\Programming\GE2018\model.py", line 55, in final_model
    candidate_prob += para6*np.array(predict_partyHistory(current_constituency_data))
  File "C:\Users\antoi\Documents\Programming\GE2018\predict.py", line 124, in predict_partyHistory
    votes = party_prob[0]
IndexError: list index out of range

Process finished with exit code 1

Indeed, unlike all the other iterations, at this one the list of parties contains a NaN.

list_parties:  ['National Party', 'MMA', 'Allah-o-Akbar Tehreek', 'IND', 'PML-N', nan, 'IND', 'APML', 'PPPP', 'TLP', 'PTI']
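To locate which candidate slot carries the missing party, `pd.isna` can be applied to the list. This is only a minimal sketch: the list is copied from the output above, and the variable names `missing_mask` and `missing_positions` are illustrative, not part of the repository.

```python
import numpy as np
import pandas as pd

# Party list copied from the output above; the nan entry is a float, not a string.
list_parties = ['National Party', 'MMA', 'Allah-o-Akbar Tehreek', 'IND', 'PML-N',
                np.nan, 'IND', 'APML', 'PPPP', 'TLP', 'PTI']

# pd.isna works element-wise on a mixed str/float object Series.
missing_mask = pd.isna(pd.Series(list_parties, dtype=object))

# Positions of the candidates whose party is missing.
missing_positions = missing_mask[missing_mask].index.tolist()
print(missing_positions)
```

Here the missing party sits at position 5 of 11, so one candidate in this constituency has no party recorded.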

And the related party_prob is empty when you try to look it up for the NaN party:

    for party in list_parties:
        party_prob = df_probability[df_probability["Party"] == party]["Probability"].tolist()
        # if party is in gallup survey or it has zero rating
        is_in_history = (df_probability[df_probability["Party"].isin([party])].index).tolist()
        # if party is in gallup (not not empty list is false)
        if( not not is_in_history ):
            votes = party_prob[0]

Indeed, the results are:

party:  nan
df_probability[df_probability["Party"] == party]: 
 Empty DataFrame
Columns: [Party, Probability, Unnamed: 2, Unnamed: 3, Unnamed: 4, Unnamed: 5, Unnamed: 6]
Index: []
party_prob:  []
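One plausible explanation for why the guard does not protect `party_prob[0]`: comparing with `==` never matches NaN (NaN is not equal to itself), whereas `isin` in pandas does treat NaN as matching NaN. The sketch below assumes, without confirmation from the repository, that `df_probability` contains at least one row whose Party cell is NaN (the `Unnamed` columns hint at blank cells in the source file). The small DataFrame is a hypothetical stand-in, not the real survey data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_probability; the NaN row is an assumption.
df_probability = pd.DataFrame({
    "Party": pd.Series(["PTI", "PML-N", np.nan], dtype=object),
    "Probability": [0.4, 0.3, 0.0],
})
party = np.nan

# Equality comparison: NaN == NaN is False, so nothing is selected.
party_prob = df_probability[df_probability["Party"] == party]["Probability"].tolist()

# isin: pandas matches NaN against NaN, so the NaN row is selected.
is_in_history = df_probability[df_probability["Party"].isin([party])].index.tolist()

print(party_prob)
print(is_in_history)
```

Under that assumption, `is_in_history` is non-empty while `party_prob` is empty, so the `if` branch runs and `party_prob[0]` raises the IndexError.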

My attempt

If I try to filter the parties to get rid of these NaN values, I get a ValueError instead. I tried:

    # find probability of winning for each candidate from gallup survey
    candidate_prob = []
    list_parties = [x for x in list_parties if str(x) != 'nan']
    for party in list_parties:

But got:

C:\ProgramData\Anaconda3\envs\homework3\python.exe C:/Users/antoi/Documents/Programming/GE2018/main.py
Starting..
....
  0%|          | 1/270 [00:00<00:47,  5.72it/s]C:\Users\antoi\Documents\Programming\GE2018\predict.py:135: RuntimeWarning: divide by zero encountered in double_scalars
  prob_extra = 0.5*float(remaining_prob/remaining_candidates)
C:\Users\antoi\Documents\Programming\GE2018\predict.py:174: RuntimeWarning: divide by zero encountered in double_scalars
  prob_extra = 0.5*float(remaining_prob/remaining_candidates)
  7%|▋         | 19/270 [00:03<00:46,  5.45it/s]C:\Users\antoi\Documents\Programming\GE2018\predict.py:57: RuntimeWarning: divide by zero encountered in double_scalars
  prob_extra = 0.5*float(remaining_prob/remaining_candidates)
 72%|███████▏  | 194/270 [00:37<00:14,  5.12it/s]
Traceback (most recent call last):
  File "C:/Users/antoi/Documents/Programming/GE2018/main.py", line 47, in <module>
    data1 = compare_methods("L2-EX")
  File "C:\Users\antoi\Documents\Programming\GE2018\comparison.py", line 118, in compare_methods
    party_wise_result, seat_wise_result = final_model(paras[:12])
  File "C:\Users\antoi\Documents\Programming\GE2018\model.py", line 55, in final_model
    candidate_prob += para6*np.array(predict_partyHistory(current_constituency_data))
ValueError: operands could not be broadcast together with shapes (11,) (10,) (11,)
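The shapes in the ValueError suggest that filtering shortened the result to 10 entries while `candidate_prob` in `final_model` still holds 11, one per candidate. A possible workaround is to keep one output per candidate and give missing parties a placeholder value. The helper `lookup_party_probabilities` and its `default=0.0` are hypothetical, the sample data is made up, and the sketch leaves out the rest of what `predict_partyHistory()` does (such as the `prob_extra` redistribution).

```python
import numpy as np
import pandas as pd

def lookup_party_probabilities(list_parties, df_probability, default=0.0):
    """Return one probability per candidate, keeping the input length."""
    # Drop rows without a party name, then keep the first row per party.
    known = df_probability.dropna(subset=["Party"]).drop_duplicates("Party")
    prob_by_party = dict(zip(known["Party"], known["Probability"]))
    # Missing (NaN) or unknown parties get the placeholder instead of being removed.
    return [default if pd.isna(p) else prob_by_party.get(p, default)
            for p in list_parties]

# Made-up data for illustration only.
df_probability = pd.DataFrame({
    "Party": ["PTI", "PML-N", np.nan],
    "Probability": [0.4, 0.3, 0.0],
})
list_parties = ["PML-N", np.nan, "IND", "PTI"]

probs = lookup_party_probabilities(list_parties, df_probability)

# Same length as the candidate list, so the in-place addition broadcasts.
candidate_prob = np.zeros(len(list_parties))
candidate_prob += 0.5 * np.array(probs)
```

Whether 0.0 is the right value for a candidate without a party is a modelling decision for the author; the point is only that the output length must match the number of candidates.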
