chriskelly / lifefinances



Scripts for validating retirement plans using Monte Carlo analysis.

License: GNU Affero General Public License v3.0

Python 27.72% Jupyter Notebook 71.87% HTML 0.31% Dockerfile 0.02% Makefile 0.08%

financial-independence retirement-planning genetic-algorithm investing monte-carlo simulation

lifefinances's People

Forkers

tdubourg vishnuvelayuthan

lifefinances's Issues

Lower debug level for genetic.py that prints only local and final maxes

Code in Pension Cashout option

In the old version, I hard coded in the cashout amount, but I don't actually remember how I got it in the first place. Need to read the pension documentation.

Create more accurate inflation

Current inflation is symmetric Gaussian since it was based on the code that made stock returns. Actual inflation follows a more positively skewed distribution as shown in the image below. Need to change code to make more representative inflation.
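One possible way to get a positively skewed draw is a rescaled lognormal. The sketch below is only an illustration: the function name, the mean, and the `sigma` shape value are assumptions, not values from the repo.

```python
import numpy as np

def skewed_inflation(mean, sigma, size, rng):
    """Draw positively skewed rates from a lognormal rescaled to the requested mean.

    `sigma` is the shape of the underlying normal; larger values give a
    longer right tail. Both arguments are illustrative assumptions.
    """
    # A lognormal with underlying mu=0 has mean exp(sigma**2 / 2).
    raw = rng.lognormal(mean=0.0, sigma=sigma, size=size)
    return raw * (mean / np.exp(sigma**2 / 2))

rng = np.random.default_rng(0)
inflation = skewed_inflation(mean=0.03, sigma=0.5, size=100_000, rng=rng)

# Sample skewness: third standardized moment, positive for a right tail.
centered = inflation - inflation.mean()
skewness = (centered**3).mean() / inflation.std() ** 3
```

A plain lognormal never goes negative, so this version cannot produce deflation; a shift term would be needed if that matters.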

Rename everything from time to date?

Ex: time_ls -> date_ls, FI Quarter -> FI Date

Seems like clearer language

Save/print summary from simulator.py run

average withdrawal rate @ FI date and median final value

Create constants.py

A single place for all constants that might be referred to by multiple scripts like return characteristics and gov_params.

Am I wasting time generating so many sets of returns?

Since I'm targeting similar annualized returns, would I get equally accurate information by just shuffling a single set of generated returns 5000 different ways rather than generating 5000 sets of returns? Would this be faster?
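A quick sketch of the shuffling idea, using placeholder return statistics rather than the repo's constants. It also shows one consequence worth weighing: every shuffle compounds to exactly the same total growth, so only the order of returns varies between runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# One hypothetical set of quarterly returns (40 years); the mean and
# standard deviation are placeholders, not the repo's constants.
base = rng.normal(loc=0.02, scale=0.08, size=160)

# Each row is an independent shuffle of the same base set.
shuffled = rng.permuted(np.tile(base, (5000, 1)), axis=1)

# The compounded growth is identical for every row because a product
# does not depend on the order of its factors.
growth = np.prod(1 + shuffled, axis=1)
```

Whether that is acceptable depends on how much of the simulator's variation is supposed to come from the annualized return itself rather than from the sequence.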

ReturnGenerator uses constants rather than parameters

returnGenerator.py is using constants for generating returns, but should be taking in params from the simulator.py so return parameters can be adjusted

Clean up params.json of unused parameters

Optimization Algorithm

Goal

Looking to create an algorithm for optimizing adjustable parameters. This would help find high-performing parameter combinations faster than manually trying out combinations.

Requirements

Existing Monte Carlo simulator would provide the judgement result of the parameter combination.
Should be able to define unique range for each parameter and interval qty/spacing

Considerations

Some parameters won't cause any changes depending on the state of a different parameter (useless genes). For example, if the 'Life Cycle' allocation mode is not chosen, then altering 'Equity Target' parameter would cause no changes.
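One way to avoid re-running combinations that differ only in useless genes is to blank out inactive genes before comparing parameter sets. This is a sketch with hypothetical parameter names and a hypothetical dependency map, not the repo's parameter format.

```python
def canonical(params, dependencies):
    """Return a hashable form of `params` with inactive genes blanked out.

    `dependencies` maps a gene to the (controlling gene, required value)
    that makes it matter. All names here are hypothetical.
    """
    cleaned = dict(params)
    for gene, (controller, required) in dependencies.items():
        if params.get(controller) != required:
            cleaned[gene] = None  # gene has no effect in this state
    return tuple(sorted(cleaned.items()))

deps = {"equity_target": ("allocation_mode", "Life Cycle")}
a = canonical({"allocation_mode": "Flat", "equity_target": 1000}, deps)
b = canonical({"allocation_mode": "Flat", "equity_target": 1500}, deps)
c = canonical({"allocation_mode": "Life Cycle", "equity_target": 1500}, deps)
```

Here `a` and `b` compare equal because 'Equity Target' is inactive in both, while `c` stays distinct.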

Options

Genetic Algorithm

A genetic algorithm treats the parameters as genes that can be mutated in a hyper-volume of possibilities. Different algorithms implement different approaches with evolutionary traits such as cross-breeding, parenting, and mutation.

DIY

Create my own algorithm specialized for this project

Full control of evolutionary traits
May not need to modify parameter management

PyGAD

Open source algorithm with many options for evolution

Documentation is not clear on how 'genes' need to be fed to the library
Unclear if I'll be able to clearly provide max, min, and intervals of each parameter
Unlikely to be able to prevent running useless gene combinations.

Package Life Finances for distribution

Follow tutorial here for making a test distribution

Inflation correlated to other returns

There's a partial correlation between inflation and RE & bond returns that would help success rates. Would be more accurate, but less conservative, to incorporate it

To be fair, we should build in all correlations, including those between Equities/Bonds, Equities/RE, etc. It may also be worth researching correlation in good times vs bad to see if it's consistent.
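A minimal sketch of drawing all series jointly from one correlation matrix. Every mean, volatility, and correlation below is a placeholder chosen for illustration, and a multivariate normal is itself an assumption (it would not capture correlations that change between good and bad times).

```python
import numpy as np

# Placeholder quarterly means, volatilities, and correlations for
# (equities, bonds, real estate, inflation); none come from the repo.
means = np.array([0.020, 0.008, 0.012, 0.007])
vols = np.array([0.080, 0.030, 0.050, 0.010])
corr = np.array([
    [1.0, 0.2, 0.4, 0.0],
    [0.2, 1.0, 0.2, 0.3],
    [0.4, 0.2, 1.0, 0.3],
    [0.0, 0.3, 0.3, 1.0],
])

# Covariance = D @ corr @ D, where D is the diagonal matrix of volatilities.
cov = np.outer(vols, vols) * corr

rng = np.random.default_rng(0)
draws = rng.multivariate_normal(means, cov, size=50_000)
sample_corr = np.corrcoef(draws, rowvar=False)
```

Each row of `draws` is one quarter with all four values generated together, so the sample correlations should land close to `corr`.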

RE Ratio always seems to start with 0.6...

More accurate returns and annualized limits

Do the same work you did for inflation to have real annualized limits for stock/bond returns

Should pension/SS be linked to inflation?

More accurate, but less conservative. Should check and see how well SS has tracked inflation in the past. Perhaps a cap on the increase since SS might not track really high inflation years

What's the repeatability of simulator results for different quantities of monte carlo runs?

Seems to be around +/-2% at 5000 runs, would be valuable to see how that accuracy changes for smaller and larger numbers of runs
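As a rough guide, if each Monte Carlo run is treated as an independent pass/fail trial, the spread of the measured success rate follows the binomial standard error and shrinks with the square root of the run count. This sketch assumes that independence and ignores any other source of variation in the simulator.

```python
import math

def success_rate_std_error(success_rate, runs):
    """Standard error of a success fraction measured over `runs` independent trials."""
    return math.sqrt(success_rate * (1 - success_rate) / runs)

for runs in (250, 1000, 5000, 20000):
    se = success_rate_std_error(0.95, runs)
    print(f"{runs:>6} runs: +/-{1.96 * se:.2%} (approx. 95% interval)")
```

Under this model, quadrupling the number of runs halves the spread, which gives a baseline to compare measured repeatability against.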

Update Social Security Rules for spousal income

https://www.ssa.gov/benefits/retirement/planner/applying7.html

A little convoluted, but seems to say the spouse gets the greater of 50% of their partner's benefit or 100% of their own benefit.
Need to dig into the pension rules
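The reading above could be encoded as below. This reflects only the interpretation in this issue, not the full SSA rules (claiming-age adjustments, for example, are ignored), and the function name is hypothetical.

```python
def spousal_benefit(own_benefit, partner_benefit):
    """Greater of the spouse's own benefit or half of the partner's benefit."""
    return max(own_benefit, 0.5 * partner_benefit)
```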

Update Diagnostics.ipynb and chart param_success.json data

Dynamic SS/Pension age

A method that decides whether to withdraw SS/Pension depending on portfolio size

Genetic: auto data reset

perhaps by checking the latest modification date on simulator.py and params.json

Feature: Auto advance FI date in genetic

Rather than working with a fixed FI date, start with the date from params, then at each successful param combination (>95% success), reduce the FI date by 0.25 and start again.

perhaps also seed each updated date with the previous date's successful parameters
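The stepping loop might look something like this sketch. `success_rate_for` is a hypothetical callable standing in for a full genetic pass at a given FI date, and `min_date` is an assumed lower bound added so the loop always terminates.

```python
def earliest_fi_date(start_date, min_date, success_rate_for, target=0.95, step=0.25):
    """Step the FI date earlier until the target success rate is no longer met.

    Returns the earliest date that still met the target, or None if even
    `start_date` failed. `success_rate_for` is a hypothetical stand-in
    for a full optimization pass at one FI date.
    """
    best = None
    date = start_date
    while date >= min_date:
        if success_rate_for(date) < target:
            break
        best = date
        date -= step
    return best

# Toy stand-in: success improves the later the FI date.
def toy(date):
    return min(1.0, 0.805 + 0.04 * (date - 2030))

result = earliest_fi_date(2035.0, 2030.0, toy)
```

With the toy function, the loop stops at the first quarter whose success rate drops under the target and reports the quarter before it.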

Speed up simulation

Running 5000x takes ~15sec (not including generating returns). Improving this would allow far more combinations to be tried.

Randomized Life Expectancy?

Should we randomize life expectancy within its actual variation range? Seems risky to always assume the same life span.

Concerns:
How do you deal with spouses?
Would have to move the time_ls inside the monte carlo loop

When to use fresh returns

Currently using fresh returns for each set of children. Could consider only refreshing returns for each generation (when parent_is_best_qty == 0), trading off variation for speed.

Only run some of the Simulation.py for genetic.Algorithm

Right now, we're running the entire Simulation.main() for every child creation. We should be able to save a little time by skipping the static lists (Time and Job Income). Would just need to separate that part into a different function and deal with retrieving all variables needed.

Need to be able to quit INVESTIGATE_MODE

Had issues trying to modify the global INVESTIGATE_MODE variable to False if the user wants to exit investigation mode.

social security back estimate

Complete method _back_estimate() so that if no historic earnings given to socialSecurity.py, it'll work backward from user's age to estimate previous earnings

MVC code should be in folders

Should have folders to organize MVC code. Just not sure how to import files in different folders. Need to test

Each user needs their own ss_earnings

Solutions:

Do what was done in issue #42 and .gitignore the file with try/except statements to manage first time implementation
Put SS_earnings.csv into params.json

Update ss_earnings.csv

Deal with fractional income years in Social Security

Improve _add_to_earnings_record() in socialSecurity.py to avoid adding fractional earnings for the first year

Order of operations (average and pow)

Diagnostics notebook shows there's a difference between averaging yearly results converted from quarterly results versus averaging quarterly results and then converting to yearly.

This shows very high average results for the worst_failure tested in this instance (12.7% average returns) and may be an issue elsewhere in the simulator. Need to investigate further!
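The gap can be reproduced with a small sketch, assuming the quarterly-to-yearly conversion is a fourth power applied to (1 + return). That conversion, and the placeholder return statistics, are assumptions about what the notebook does.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder quarterly returns; not the repo's return characteristics.
quarterly = rng.normal(loc=0.02, scale=0.08, size=4000)

# Convert every quarter to a yearly rate first, then average.
pow_then_avg = ((1 + quarterly) ** 4 - 1).mean()

# Average the quarterly returns first, then convert to a yearly rate.
avg_then_pow = (1 + quarterly.mean()) ** 4 - 1
```

Because raising to the fourth power is convex, converting first and averaging second always gives a result at least as large as averaging first, and the gap grows with the volatility of the quarterly returns.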

Live updating chart for genetic algorithm

I'd like to be able to bring up a bar chart (or histogram) for every mutable parameter and update it each generation to show which option the best child used. This would give a sense of which option for each parameter works best across most combinations.

Increasing gene creep per generation

Consider increasing the number of gene shifts possible with each additional generation of 'No Better Children'

Ability to add as many kids as you want

Major blocker here is the View. There's no ability in the view to add to a list

User specific parameters

First step towards enabling others to use the package. Possible implementation would be creating a sub-folder in the data folder that's not synced to Git. This folder would have a params.json that's specific to the user. The original params.json would be used as the default and copied when the user first runs.

Need a way to track allocation

Allocation needs to be tracked in a list like the other variables so we can look at specific instances in Diagnostics. Might need a general method to do this for any variable that's created in an isolated loop.

Check params.json matches template

Make an automatic check whether data/params.json has different keys than data/default_params/params.json
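A minimal sketch of such a check. The helper names are hypothetical, and it only compares top-level keys; nested sections would need a recursive version.

```python
import json
from pathlib import Path

def load_json(path):
    """Read a JSON file into a dict."""
    return json.loads(Path(path).read_text())

def key_differences(user, template):
    """Return (missing, extra) top-level keys of `user` relative to `template`."""
    missing = sorted(template.keys() - user.keys())
    extra = sorted(user.keys() - template.keys())
    return missing, extra

# Intended use, with the paths named in this issue:
# missing, extra = key_differences(
#     load_json("data/params.json"),
#     load_json("data/default_params/params.json"),
# )
```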

Genetic: save final best parameters

Could save/overwrite each best parameter combo

Switch from JSON to YAML for parameters?

https://www.geeksforgeeks.org/what-is-the-difference-between-yaml-and-json/

The ability to add comments and sequences would be valuable

Stop showing numbers in thousands?

Should I stop putting all currency numbers in thousands? It was originally done to improve readability in Google Sheets. It also is more readable in the Tk GUI.

Changing it to regular numbers would make the CSV file easier to read, though, since decimal places aren't limited to 3. I may also be able to change everything over to ints instead since I don't care about pennies. Rates would still be floats, but all dollar values could be rounded to ints.

Consider keeping a list of previously tried children

Would this reduce time spent trying combinations? Would the list search eventually get too long? Since we're generally running at low resolution, is there value in giving a given parameter combination another chance?

Would need a way to reset the list whenever parameter changes are made

Retire with zero option (auto annuities purchase)

A method to dynamically purchase annuities as portfolio reaches higher values

Adjust step mutation prev_used_params to prevent infinite loop problem

I suspect you could have a situation where you are stuck in a local max trying to beat the TARGET_SUCCESS_RATE and run out of options (2^8 is only 256 combinations). In that case, you would get stuck in a loop making more parameter sets that have all been used.

Possible solutions:

Skip the child if it is a repeat. This would cause you to hit the ITER_LIMIT and finally do a random mutation. Risk is you'd have slightly fewer children tested around each local max
Set a limit on len(prev_used_params) that triggers a reset
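The first option could be sketched as a bounded search that gives up instead of looping. All names and the parameter layout here are hypothetical, not the actual genetic.py structures.

```python
import random

def next_unused_child(parent, options, prev_used, iter_limit, rng):
    """Try up to `iter_limit` single-gene mutations; return None if all were seen.

    Returning None instead of looping lets the caller fall back to a
    random mutation or reset `prev_used`. Names are hypothetical.
    """
    for _ in range(iter_limit):
        child = dict(parent)
        gene = rng.choice(sorted(options))
        child[gene] = rng.choice(options[gene])
        key = tuple(sorted(child.items()))
        if key not in prev_used:
            prev_used.add(key)
            return child
    return None

options = {"a": [0, 1], "b": [0, 1]}
rng = random.Random(0)
seen = set()
children = [next_unused_child({"a": 0, "b": 0}, options, seen, 50, rng) for _ in range(5)]
```

Only three distinct children are reachable from this parent by a single-gene change, so at least two of the five calls return `None` rather than hanging.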

Fix the Can't Stand The Pressure Issue

The genetic algorithm sometimes stagnates on parameter sets that score fairly high success rates at low Monte Carlo counts but can't hit the threshold when the Monte Carlo count is increased from its regular 250 to 5000.

At the lower counts, random noise can push a parameter set above the 95% target, but it falls just below when tested at higher counts. That parameter set is then used as a parent, and one of its children will likely also luck its way above 95% and start the cycle over again.
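To get a feel for how often noise alone clears the bar, the sketch below computes the chance that a parameter set with a true success rate just under the target still shows 95% or better. It assumes independent pass/fail runs, and the 94% true rate is an arbitrary example.

```python
import math

def prob_passes(true_rate, runs, target=0.95):
    """Probability that `runs` independent trials show at least `target` successes."""
    need = math.ceil(target * runs - 1e-9)  # smallest passing success count
    total = 0.0
    for k in range(need, runs + 1):
        # Binomial pmf evaluated in log space to avoid overflow at large `runs`.
        log_pmf = (
            math.lgamma(runs + 1) - math.lgamma(k + 1) - math.lgamma(runs - k + 1)
            + k * math.log(true_rate) + (runs - k) * math.log(1 - true_rate)
        )
        total += math.exp(log_pmf)
    return total

low_count = prob_passes(0.94, 250)
high_count = prob_passes(0.94, 5000)
```

Under these assumptions the false-pass probability is far larger at 250 runs than at 5000, which is consistent with the cycle described above.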