kstaats / karoo_gp Goto Github PK

A Genetic Programming platform for Python with TensorFlow for wicked-fast CPU and GPU support.

License: Other

Python 100.00%

karoo_gp's Introduction

Karoo GP

Karoo GP is an evolutionary algorithm, a genetic programming application suite written in Python which supports both symbolic regression and classification data analysis. It has been used in radio astronomy, gravitational wave detector characterisation and synthetic supernovae detection, and a variety of other use cases in a diversity of fields.

You need only prepare your dataset according to the User Guide. No programming required. Karoo is multicore and GPU enabled by means of the powerful library TensorFlow. Karoo has three text cases built-in: Iris dataset, Kepler's law of planetary motion, and a maths problem you can modify to various degrees of challenge.

Karoo is launched from the command line with an intuitive user interface or with arguments for full automation from bash or another Python script. The output of each run is automatically archived and includes the configuraiton, a summary, and the full suite of GP trees saved as .csv files for your review and edit such that you can hand-build the starting block for your next run.

Be certain to read the User Guide for a starter's guide to Genetic Programming and examples of all you can do with this unique body of code.

For an interesting read on scalar vs vector and CPU vs GPU performance with Karoo GP: https://arxiv.org/abs/1708.03157 or to learn how Karoo applied to supernova detection at LIGO: https://arxiv.org/abs/2002.04591

Learn more at kstaats.github.io/karoo_gp/ ...

karoo_gp's People

Contributors

Stargazers

Watchers

karoo_gp's Issues

How to change tensor name

Dear Kai,

I have the following tensor:

Tensor("Greater:0", shape=(1984,), dtype=bool, device=/device:GPU:0)

if I use tf.cast to change dtype=tf.float32, I get:

Tensor("Cast:0", shape=(1984,), dtype=float32, device=/device:GPU:0)

I need a tensor that has the same name after casting that looks like this:

Tensor("Greater:0", shape=(1984,), dtype=float32, device=/device:GPU:0)

In other words, change dtype while maintaining the **original name. **

Karoo 3.0 meta-issue

We're planning a major release, Karoo 3.0, that will be pip-installable and include all the recent features - the new API, sklearn-compatibility, logical operators - and some additional items. The purpose of this issue is to determine which items to include, and track our progress toward the release.

Below is my first take:

Karoo 3.0 (include)

#92
#83
#73 Inline-execution and tensorflow 2, per my latest comments
Documentation with sphinx for readthedocs.io, including the content from the Karoo_GP_User_Guide.pdf updated to be current

Karoo 3.N (later)

Add click for color in Interactive-mode menu.
Remove ast from the Tree.load process.
Basic parsimony (T/F, choose shorter if equal fitness, add to BaseGP.compare_fitness
Finalize sklearn: pass sklearn.utils.estimator_checks.check_estimator(BaseGP).
Add support for log and trigonometric funcs.
Add simplifying support for operators not supported by Sympy.
Tree.display support highlighting specific nodes
Replace the current calls to log/pause/error with a 'hooks' system.
Don't store X and y in the model directly. (needs hooks)
Make save/load population api more generic, move the population_a.csv references to a dict/enum in karoo-gp.py. (needs hooks)
Limit 'load' functionality in Interactive mode to certain stages
Make it possible to save/reload and continue training from same point.
Create more GP classes: BinaryClassifierGP, EnsembleClassifierGP, BooleanGP, etc.

run karoo_gp code on google colab

I have used a your project to run genetic programming on google colab, when i run karoo_gp file on google colab i get this error==>
ValueError: could not convert string to float: ' "shell_port": 37649'

How can i solve this problem?

Use joblib to use multithreading/multiprocessing computing

It will speed up the runs.
https://joblib.readthedocs.io/en/latest/

IF THEN ELSE

Hello,

I am keen of having conditional statements work. In tensor flow:

tf.cond( pred, true_fn=None, false_fn=None, name=None)

e.g. tf.cond(x < y, tf.add(x, z), tf.square(y))

However I cannot simplify this.

Any ideas for how we can make this work?

**IN ALL CASES: I added MAX & MIN (use min,2 max,2 without operator before) and attached the file. Karoo works more efficiently with max and min.

The file is attached, but note that I used the modified base by https://github.com/rll2021/karoo_gp/commits?author=rll2021.**

=================

Karoo GP Base Class

Define the methods and global variables used by Karoo GP

by Kai Staats, MSc with TensorFlow support provided by Iurii Milovanov; see LICENSE.md

pip install package preparation by Antonio Spadaro and Ezio Melotti

version 2.4 for Python 3.8

'''
A NOTE TO THE NEWBIE, EXPERT, AND BRAVE
Even if you are highly experienced in Genetic Programming, it is recommended that you review the 'Karoo User Guide' before running
this application. While your computer will not burst into flames nor will the sun collapse into a black hole if you do not, you will
likely find more enjoyment of this particular flavour of GP with a little understanding of its intent and design.
'''

import sys
import os
import csv
import time
import math

import numpy as np
import sklearn.metrics as skm
#import sklearn.cross_validation as skcv # Python 2.7
import sklearn.model_selection as skcv

from sympy import sympify
from datetime import datetime
from collections import OrderedDict

from . import pause as menu

np.random.seed(1000) # for reproducibility

TensorFlow Imports and Definitions

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"

import tensorflow as tf

import tensorflow.compat.v1 as tf; tf.disable_v2_behavior() # from https://www.tensorflow.org/guide/migrate on 20210125
import ast
import operator as op

operators = {ast.Add: tf.add, # e.g., a + b
ast.Sub: tf.subtract, # e.g., a - b
ast.Mult: tf.multiply, # e.g., a * b
ast.Div: tf.divide, # e.g., a / b
ast.Pow: tf.pow, # e.g., a ** 2
ast.USub: tf.negative, # e.g., -a
ast.And: tf.logical_and, # e.g., a and b
ast.Or: tf.logical_or, # e.g., a or b
ast.Not: tf.logical_not, # e.g., not a
ast.Eq: tf.equal, # e.g., a == b
ast.NotEq: tf.not_equal, # e.g., a != b
ast.Lt: tf.less, # e.g., a < b
ast.LtE: tf.less_equal, # e.g., a <= b
ast.Gt: tf.greater, # e.g., a > b
ast.GtE: tf.greater_equal, # e.g., a >= 1
'abs': tf.abs, # e.g., abs(a)
'sign': tf.sign, # e.g., sign(a)
'square': tf.square, # e.g., square(a)
'sqrt': tf.sqrt, # e.g., sqrt(a)
'pow': tf.pow, # e.g., pow(a, b)
'log': tf.log, # e.g., log(a)
'log1p': tf.log1p, # e.g., log1p(a)
'cos': tf.cos, # e.g., cos(a)
'sin': tf.sin, # e.g., sin(a)
'tan': tf.tan, # e.g., tan(a)
'acos': tf.acos, # e.g., acos(a)
'asin': tf.asin, # e.g., asin(a)
'atan': tf.atan, # e.g., atan(a)
'exp': tf.exp, # e.g. exp(a)
'expm1': tf.expm1, # e.g. expm1(a)
'min': tf.math.maximum, # e.g., min(a,b)
'max': tf.math.minimum, # e.g., max(a,b)
}

np.set_printoptions(linewidth = 320) # set the terminal to print 320 characters before line-wrapping in order to view Trees

class Base_GP(object):

'''
This Base_BP class contains all methods for Karoo GP. Method names are differentiated from global variable names 
(defined below) by the prefix 'fx_' followed by an object and action, as in fx_display_tree(), with a few 
expections, such as fx_fitness_gene_pool().

The method categories (denoted by +++ banners +++) are as follows:
	fx_karoo_					Methods to Run Karoo GP
	fx_data_					Methods to Load and Archive Data
	fx_init_					Methods to Construct the 1st Generation
	fx_eval_					Methods to Evaluate a Tree
	fx_fitness_					Methods to Train and Test a Tree for Fitness
	fx_nextgen_					Methods to Construct the next Generation
	fx_evolve_					Methods to Evolve a Population
	fx_display_					Methods to Visualize a Tree
	
Error checks are quickly located by searching for 'ERROR!'
'''
	
def __init__(self):
			
	'''
	### Global variables used for data management ###
	self.data_train				store train data for processing in TF
	self.data_test				store test data for processing in TF
	self.tf_device				set TF computation backend device (CPU or GPU)
	self.tf_device_log			employed for TensorFlow debugging

	self.data_train_cols		number of cols in the TRAINING data - see fx_data_load()
	self.data_train_rows		number of rows in the TRAINING data - see fx_data_load()
	self.data_test_cols			number of cols in the TEST data - see fx_data_load()
	self.data_test_rows			number of rows in the TEST data - see fx_data_load()

	self.functions				user defined functions (operators) from the associated files/[functions].csv
	self.terminals				user defined variables (operands) from the top row of the associated [data].csv
	self.coeff					user defined coefficients (NOT YET IN USE)
	self.fitness_type			fitness type
	self.datetime				date-time stamp of when the unique directory is created
	self.path					full path to the unique directory created with each run
	self.dataset				local path and dataset filename
	
	### Global variables used for evolutionary management ###
	self.population_a			the root generation from which Trees are chosen for mutation and reproduction
	self.population_b			the generation constructed from gp.population_a (recyled)
	self.gene_pool				once-per-generation assessment of trees that meet min and max boundary conditions
	self.gen_id					simple n + 1 increment
	self.fitness_type			set in fx_data_load() as either a minimising or maximising function
	self.tree					axis-1, 13 element Numpy array that defines each Tree, stored in 'gp.population'
	self.pop_*					13 variables that define each Tree - see fx_init_tree_initialise()
	'''
	
	self.algo_raw = [] # the raw expression generated by Sympy per Tree -- CONSIDER MAKING THIS VARIABLE LOCAL
	self.algo_sym = [] # the expression generated by Sympy per Tree -- CONSIDER MAKING THIS VARIABLE LOCAL
	self.fittest_dict = {} # all Trees which share the best fitness score
	self.gene_pool = [] # store all Tree IDs for use by Tournament
	self.class_labels = 0 # the number of true class labels (data_y)
	
	return
	

#+++++++++++++++++++++++++++++++++++++++++++++
#   Methods to Run Karoo GP                  |
#+++++++++++++++++++++++++++++++++++++++++++++

def fx_karoo_gp(self, kernel, tree_type, tree_depth_base, tree_depth_max, tree_depth_min, tree_pop_max, gen_max, tourn_size, filename, evolve_repro, evolve_point, evolve_branch, evolve_cross, display, precision, swim, mode):

	'''
	This method enables the engagement of the entire Karoo GP application. Instead of returning the user to the pause 
	menu, this script terminates at the command-line, providing support for bash and chron job execution.
	
	Calld by: user script karoo_gp.py
	
	Arguments required: (see below)
	'''
	
	### PART 1 - set global variables to those local values passed from the user script ###
	self.kernel = kernel # fitness function
	# tree_type is passed between methods to construct specific trees
	# tree_depth_base is passed between methods to construct specific trees
	self.tree_depth_max = tree_depth_max # maximum Tree depth for the entire run; limits bloat
	self.tree_depth_min = tree_depth_min # minimum number of nodes
	self.tree_pop_max = tree_pop_max # maximum number of Trees per generation
	self.gen_max = gen_max # maximum number of generations
	self.tourn_size = tourn_size # number of Trees selected for each tournament
	# filename is passed between methods to work with specific populations
	self.evolve_repro = evolve_repro # quantity of a population generated through Reproduction
	self.evolve_point = evolve_point # quantity of a population generated through Point Mutation
	self.evolve_branch = evolve_branch # quantity of a population generated through Branch Mutation
	self.evolve_cross = evolve_cross # quantity of a population generated through Crossover
	self.display = display # display mode is set to (s)ilent # level of on-screen feedback
	self.precision = precision # the number of floating points for the round function in 'fx_fitness_eval'
	self.swim = swim # pass along the gene_pool restriction methodology
	# mode is engaged at the end of the run, below
	
	### PART 2 - construct first generation of Trees ###
	self.fx_data_load(filename)
	self.gen_id = 1 # set initial generation ID
	self.population_a = ['Karoo GP by Kai Staats, Generation ' + str(self.gen_id)] # initialise population_a to host the first generation
	self.population_b = ['placeholder'] # initialise population_b to satisfy fx_karoo_pause()
	self.fx_init_construct(tree_type, tree_depth_base) # construct the first population of Trees
	
	if self.kernel == 'p': # terminate here for Play mode
		self.fx_display_tree(self.tree) # print the current Tree
		self.fx_data_tree_write(self.population_a, 'a') # save this one Tree to disk
		sys.exit()

	elif self.gen_max == 1: # terminate here if constructing just one generation
		self.fx_data_tree_write(self.population_a, 'a') # save this single population to disk
		print ('\n We have constructed a single, stochastic population of', self.tree_pop_max,'Trees, and saved to disk')
		sys.exit()
		
	else: print ('\n We have constructed the first, stochastic population of', self.tree_pop_max,'Trees')
	
	### PART 3 - evaluate first generation of Trees ###
	print ('\n Evaluate the first generation of Trees ...')
	self.fx_fitness_gym(self.population_a) # generate expression, evaluate fitness, compare fitness
	self.fx_data_tree_write(self.population_a, 'a') # save the first generation of Trees to disk
	
	### PART 4 - evolve multiple generations of Trees ###
	menu = 1
	while menu != 0: # this allows the user to add generations mid-run and not get buried in nested iterations
		for self.gen_id in range(self.gen_id + 1, self.gen_max + 1): # evolve additional generations of Trees
			
			print ('\n Evolve a population of Trees for Generation', self.gen_id, '...')
			self.population_b = ['Karoo GP by Kai Staats - Evolving Generation'] # initialise population_b to host the next generation
			self.fx_fitness_gene_pool() # generate the viable gene pool (compares against gp.tree_depth_min)
			self.fx_nextgen_reproduce() # method 1 - Reproduction
			self.fx_nextgen_point_mutate() # method 2 - Point Mutation
			self.fx_nextgen_branch_mutate() # method 3 - Branch Mutation
			self.fx_nextgen_crossover() # method 4 - Crossover
			self.fx_eval_generation() # evaluate all Trees in a single generation
			self.population_a = self.fx_evolve_pop_copy(self.population_b, ['Karoo GP by Kai Staats - Generation ' + str(self.gen_id)])
			
		if mode == 's': menu = 0 # (s)erver mode - termination with completiont of prescribed run
		else: # (d)esktop mode - user is given an option to quit, review, and/or modify parameters; 'add' generations continues the run
			print ('\n\t\033[32m Enter \033[1m?\033[0;0m\033[32m to review your options or \033[1mq\033[0;0m\033[32muit\033[0;0m')
			menu = self.fx_karoo_pause()
			
	self.fx_karoo_terminate() # archive populations and return to karoo_gp.py for a clean exit
	
	return
	

def fx_karoo_pause_refer(self):

	'''
	Enables (g)eneration, (i)nteractive, and (d)e(b)ug display modes to offer the (pause) menu at each prompt.
	
	See fx_karoo_pause() for an explanation of the value being passed.
	
	Called by: the functions called by PART 4 of fx_karoo_gp()
	
	Arguments required: none
	'''
	
	menu = 1
	while menu == 1: menu = self.fx_karoo_pause()
	
	return
	

def fx_karoo_pause(self):

	'''
	Pause the program execution and engage the user, providing a number of options. 
	
	Called by: fx_karoo_pause_refer
	
	Arguments required: [0,1,2] where (0) refers to an end-of-run; (1) refers to any use of the (pause) menu from 
	within the run, and anticipates ENTER as an escape from the menu to continue the run; and (2) refers to an 
	'ERROR!' for which the user may want to archive data before terminating. At this point in time, (2) is 
	associated with each error but does not provide any special options).
	'''
	
	### PART 1 - reset and pack values to send to menu.pause ###
	menu_dict = {'input_a':'', 
		'input_b':0, 
		'display':self.display, 
		'tree_depth_max':self.tree_depth_max, 
		'tree_depth_min':self.tree_depth_min, 
		'tree_pop_max':self.tree_pop_max, 
		'gen_id':self.gen_id, 
		'gen_max':self.gen_max, 
		'tourn_size':self.tourn_size, 
		'evolve_repro':self.evolve_repro, 
		'evolve_point':self.evolve_point, 
		'evolve_branch':self.evolve_branch, 
		'evolve_cross':self.evolve_cross, 
		'fittest_dict':self.fittest_dict, 
		'pop_a_len':len(self.population_a), 
		'pop_b_len':len(self.population_b), 
		'path':self.path}
		
	menu_dict = menu.pause(menu_dict) # call the external function menu.pause
	
	### PART 2 - unpack values returned from menu.pause ###
	input_a = menu_dict['input_a']
	input_b = menu_dict['input_b']
	self.display = menu_dict['display']
	self.tree_depth_min = menu_dict['tree_depth_min']
	self.gen_max = menu_dict['gen_max']
	self.tourn_size = menu_dict['tourn_size']
	self.evolve_repro = menu_dict['evolve_repro']
	self.evolve_point = menu_dict['evolve_point']
	self.evolve_branch = menu_dict['evolve_branch']
	self.evolve_cross = menu_dict['evolve_cross']
	
	### PART 3 - execute the user queries returned from menu.pause ###
	if input_a == 'esc': return 2 # breaks out of the fx_karoo_gp() or fx_karoo_pause_refer() loop
	
	elif input_a == 'eval': # evaluate a Tree against the TEST data
		self.fx_eval_poly(self.population_b[input_b]) # generate the raw and sympified expression for the given Tree using SymPy
		#print ('\n\t\033[36mTree', input_b, 'yields (raw):', self.algo_raw, '\033[0;0m') # print the raw expression
		print ('\n\t\033[36mTree', input_b, 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m') # print the sympified expression
		
		result = self.fx_fitness_eval(str(self.algo_sym), self.data_test, get_pred_labels = True) # might change to algo_raw evaluation			
		
		if self.kernel == 'c': self.fx_fitness_test_classify(result) # TF tested 2017 02/02
		elif self.kernel == 'r': self.fx_fitness_test_regress(result)
		elif self.kernel == 'm': self.fx_fitness_test_match(result)
		# elif self.kernel == '[other]': # use others as a template
		
	elif input_a == 'print_a': # print a Tree from population_a
		self.fx_display_tree(self.population_a[input_b])
		
	elif input_a == 'print_b': # print a Tree from population_b
		self.fx_display_tree(self.population_b[input_b])
		
	elif input_a == 'pop_a': # list all Trees in population_a
		print ('')
		for tree_id in range(1, len(self.population_a)):
			self.fx_eval_poly(self.population_a[tree_id]) # extract the expression
			print ('\t\033[36m Tree', self.population_a[tree_id][0][1], 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m')
			
	elif input_a == 'pop_b': # list all Trees in population_b
		print ('')
		for tree_id in range(1, len(self.population_b)):
			self.fx_eval_poly(self.population_b[tree_id]) # extract the expression
			print ('\t\033[36m Tree', self.population_b[tree_id][0][1], 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m')
			
	elif input_a == 'load': # load population_s to replace population_a
		self.fx_data_recover(self.filename['s']) # NEED TO replace 's' with a user defined filename
		
	elif input_a == 'write': # write the evolving population_b to disk
		self.fx_data_tree_write(self.population_b, 'b')
		print ('\n\t All current members of the evolving population_b saved to karoo_gp/runs/[date-time]/population_b.csv')
		
	elif input_a == 'add': # check for added generations, then exit fx_karoo_pause and continue the run
		self.gen_max = self.gen_max + input_b # if input_b > 0: self.gen_max = self.gen_max + input_b - REMOVED 2019 06/05
		
	elif input_a == 'quit': self.fx_karoo_terminate() # archive populations and exit
	
	return 1
	

def fx_karoo_terminate(self):
	'''
	Terminates the evolutionary run (if yet in progress), saves parameters and data to disk, and cleanly returns
	the user to karoo_gp.py and the command line.
	
	Called by: fx_karoo_gp() and fx_karoo_pause_refer()
	
	Arguments required: none
	'''
	
	self.fx_data_params_write()
	target = open(self.filename['f'], 'w'); target.close() # initialize the .csv file for the final population
	self.fx_data_tree_write(self.population_b, 'f') # save the final generation of Trees to disk
	print ('\n\t\033[32m Your Trees and runtime parameters are archived in karoo_gp/runs/[date-time]/\033[0;0m')
	
	print ('\n\033[3m "It is not the strongest of the species that survive, nor the most intelligent,\033[0;0m')
	print ('\033[3m  but the one most responsive to change."\033[0;0m --Charles Darwin\n')
	print ('\033[3m Congrats!\033[0;0m Your Karoo GP run is complete.\n')
	sys.exit()
	
	return
	

#+++++++++++++++++++++++++++++++++++++++++++++
#   Methods to Load and Archive Data         |
#+++++++++++++++++++++++++++++++++++++++++++++

def fx_data_load(self, filename):

	'''
	The data and function .csv files are loaded according to the fitness function kernel selected by the user. An
	alternative dataset may be loaded at launch, by appending a command line argument. The data is then split into 
	both TRAINING and TEST segments in order to validate the success of the GP training run. Datasets less than
	10 rows will not be split, rather copied in full to both TRAINING and TEST as it is assumed you are conducting
	a system validation run, as with the built-in MATCH kernel and associated dataset.
	
	Called by: fx_karoo_gp
	
	Arguments required: filename (of the dataset)
	'''
	
	### PART 1 - load the associated data set, operators, operands, fitness type, and coefficients ###
	# full_path = os.path.realpath(__file__); karoo_dir = os.path.dirname(full_path) # for user Marco Cavaglia
	karoo_dir = os.path.dirname(os.path.realpath(__file__))
	
	data_dict = {'c':karoo_dir + '/files/data_CLASSIFY.csv', 'r':karoo_dir + '/files/data_REGRESS.csv', 'm':karoo_dir + '/files/data_MATCH.csv', 'p':karoo_dir + '/files/data_PLAY.csv'}
	
	if len(sys.argv) == 1: # load data from the default karoo_gp/files/ directory
		data_x = np.loadtxt(data_dict[self.kernel], skiprows = 1, delimiter = ',', dtype = float); data_x = data_x[:,0:-1] # load all but the right-most column
		data_y = np.loadtxt(data_dict[self.kernel], skiprows = 1, usecols = (-1,), delimiter = ',', dtype = float) # load only right-most column (class labels)
		header = open(data_dict[self.kernel],'r') # open file to be read (below)
		self.dataset = data_dict[self.kernel] # copy the name only
		
	elif len(sys.argv) == 2: # load an external data file
		data_x = np.loadtxt(sys.argv[1], skiprows = 1, delimiter = ',', dtype = float); data_x = data_x[:,0:-1] # load all but the right-most column
		data_y = np.loadtxt(sys.argv[1], skiprows = 1, usecols = (-1,), delimiter = ',', dtype = float) # load only right-most column (class labels)
		header = open(sys.argv[1],'r') # open file to be read (below)
		self.dataset = sys.argv[1] # copy the name only
		
	elif len(sys.argv) > 2: # receive filename and additional arguments from karoo_gp.py via argparse
		data_x = np.loadtxt(filename, skiprows = 1, delimiter = ',', dtype = float); data_x = data_x[:,0:-1] # load all but the right-most column
		data_y = np.loadtxt(filename, skiprows = 1, usecols = (-1,), delimiter = ',', dtype = float) # load only right-most column (class labels)
		header = open(filename,'r') # open file to be read (below)
		self.dataset = filename # copy the name only
		
	fitt_dict = {'c':'max', 'r':'min', 'm':'max', 'p':''}
	self.fitness_type = fitt_dict[self.kernel] # load fitness type
	
	func_dict = {'c':karoo_dir + '/files/operators_CLASSIFY.csv', 'r':karoo_dir + '/files/operators_REGRESS.csv', 'm':karoo_dir + '/files/operators_MATCH.csv', 'p':karoo_dir + '/files/operators_PLAY.csv'}
	self.functions = np.loadtxt(func_dict[self.kernel], delimiter=',', skiprows=1, dtype = str) # load the user defined functions (operators)
	self.terminals = header.readline().split(','); self.terminals[-1] = self.terminals[-1].replace('\n','') # load the user defined terminals (operands)
	self.class_labels = len(np.unique(data_y)) # load the user defined true labels for classification or solutions for regression
	#self.coeff = np.loadtxt(karoo_dir + '/files/coefficients.csv', delimiter=',', skiprows=1, dtype = str) # load the user defined coefficients - NOT USED YET
	
	### PART 2 - from the dataset, extract TRAINING and TEST data ###
	if len(data_x) < 11: # for small datasets we will not split them into TRAINING and TEST components
		data_train = np.c_[data_x, data_y]
		data_test = np.c_[data_x, data_y]
		
	else: # if larger than 10, we run the data through the SciKit Learn's 'random split' function
		x_train, x_test, y_train, y_test = skcv.train_test_split(data_x, data_y, test_size = 0.2) # 80/20 TRAIN/TEST split
		data_x, data_y = [], [] # clear from memory
		
		data_train = np.c_[x_train, y_train] # recombine each row of data with its associated class label (right column)
		x_train, y_train = [], [] # clear from memory
		
		data_test = np.c_[x_test, y_test] # recombine each row of data with its associated class label (right column)
		x_test, y_test = [], [] # clear from memory
		
	self.data_train_cols = len(data_train[0,:]) # qty count
	self.data_train_rows = len(data_train[:,0]) # qty count
	self.data_test_cols = len(data_test[0,:]) # qty count
	self.data_test_rows = len(data_test[:,0]) # qty count
	
	### PART 3 - load TRAINING and TEST data for TensorFlow processing - tested 2017 02/02
	self.data_train = data_train # Store train data for processing in TF
	self.data_test = data_test # Store test data for processing in TF
	self.tf_device = "/gpu:0" # Set TF computation backend device (CPU or GPU); gpu:n = 1st, 2nd, or ... GPU device
	self.tf_device_log = False # TF device usage logging (for debugging)
	
	### PART 4 - create a unique directory and initialise all .csv files ###
	self.datetime = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
	self.path = os.path.join(os.getcwd(), 'runs', filename.split('.')[0] + '_' + self.datetime + '/') # generate a unique directory name
	if not os.path.isdir(self.path): os.makedirs(self.path) # make a unique directory
	
	self.filename = {} # a dictionary to hold .csv filenames
	
	self.filename.update( {'a':self.path + 'population_a.csv'} )
	target = open(self.filename['a'], 'w'); target.close() # initialise a .csv file for population 'a' (foundation)
	
	self.filename.update( {'b':self.path + 'population_b.csv'} )
	target = open(self.filename['b'], 'w'); target.close() # initialise a .csv file for population 'b' (evolving)
	
	self.filename.update( {'f':self.path + 'population_f.csv'} )
	target = open(self.filename['f'], 'w'); target.close() # initialise a .csv file for the final population (test)
	
	self.filename.update( {'s':self.path + 'population_s.csv'} )
	target = open(self.filename['s'], 'w'); target.close() # initialise a .csv file to manually load (seed)
	
	return
	

def fx_data_recover(self, population):

	'''
	This method is used to load a saved population of Trees, as invoked through the (pause) menu where population_r 
	replaces population_a in the karoo_gp/runs/[date-time]/ directory.
	
	Called by: fx_karoo_pause
	
	Arguments required: population (filename['s'])
	'''
	
	with open(population, 'rb') as csv_file:
		target = csv.reader(csv_file, delimiter=',')
		n = 0 # track row count
		
		for row in target:
			print ('row', row)
			
			n = n + 1
			if n == 1: pass # skip first empty row
			
			elif n == 2:
				self.population_a = [row] # write header to population_a
				
			else:
				if row == []:
					self.tree = np.array([[]]) # initialise Tree array
					
				else:
					if self.tree.shape[1] == 0:
						self.tree = np.append(self.tree, [row], axis = 1) # append first row to Tree
						
					else:
						self.tree = np.append(self.tree, [row], axis = 0) # append subsequent rows to Tree
						
				if self.tree.shape[0] == 13:
					self.population_a.append(self.tree) # append complete Tree to population list
					
	print ('\n', self.population_a)
	
	return
	
	
def fx_data_tree_clean(self, tree):

	'''
	This method aesthetically cleans the Tree array, removing redundant data.
	
	Called by: fx_data_tree_append, fx_evolve_branch_copy
	
	Arguments required: tree
	'''
	
	tree[0][2:] = '' # A little clean-up to make things look pretty :)
	tree[1][2:] = '' # Ignore the man behind the curtain!
	tree[2][2:] = '' # Yes, I am a bit OCD ... but you *know* you appreciate clean arrays.
	
	return tree
	

def fx_data_tree_append(self, tree):

	'''
	Append Tree array to the foundation Population.
	
	Called by: fx_init_construct
	
	Arguments required: tree
	'''
	
	self.fx_data_tree_clean(tree) # clean 'tree' prior to storing
	self.population_a.append(tree) # append 'tree' to population list
	
	return
	

def fx_data_tree_write(self, population, key):

	'''
	Save population_* to disk.
	
	Called by: fx_karoo_gp, fx_eval_generation
	
	Arguments required: population, key
	'''
	
	with open(self.filename[key], 'a') as csv_file:
		target = csv.writer(csv_file, delimiter=',')
		if self.gen_id != 1: target.writerows(['']) # empty row before each generation
		target.writerows([['Karoo GP by Kai Staats', 'Generation:', str(self.gen_id)]])
		
		for tree in range(1, len(population)):
			target.writerows(['']) # empty row before each Tree
			for row in range(0, 13): # increment through each row in the array Tree
				target.writerows([population[tree][row]])
				
	return
	

def fx_data_params_write(self): # tested 2017 02/13; argument 'app' removed to simplify termination 2019 06/08

	'''
	Save run-time configuration parameters to disk.
	
	Called by: fx_karoo_gp, fx_karoo_pause
	
	Arguments required: app
	'''
	
	file = open(self.path + 'log_config.txt', 'w')
	file.write('Karoo GP')
	file.write('\n launched: ' + str(self.datetime))
	file.write('\n dataset: ' + str(self.dataset))
	file.write('\n')
	file.write('\n kernel: ' + str(self.kernel))
	file.write('\n precision: ' + str(self.precision))
	file.write('\n')
	# file.write('tree type: ' + tree_type)
	# file.write('tree depth base: ' + str(tree_depth_base))
	file.write('\n tree depth max: ' + str(self.tree_depth_max))
	file.write('\n min node count: ' + str(self.tree_depth_min))
	file.write('\n')
	file.write('\n genetic operator Reproduction: ' + str(self.evolve_repro))
	file.write('\n genetic operator Point Mutation: ' + str(self.evolve_point))
	file.write('\n genetic operator Branch Mutation: ' + str(self.evolve_branch))
	file.write('\n genetic operator Crossover: ' + str(self.evolve_cross))
	file.write('\n')
	file.write('\n tournament size: ' + str(self.tourn_size))
	file.write('\n population: ' + str(self.tree_pop_max))
	file.write('\n number of generations: ' + str(self.gen_id))		
	file.write('\n\n')
	file.close()
	
	
	file = open(self.path + 'log_test.txt', 'w')
	file.write('Karoo GP')
	file.write('\n launched: ' + str(self.datetime))
	file.write('\n dataset: ' + str(self.dataset))
	file.write('\n')
	
	if len(self.fittest_dict) > 0:
	
		fitness_best = 0
		fittest_tree = 0
					
		# revised method, re-evaluating all Trees from stored fitness score
		for tree_id in range(1, len(self.population_b)):
		
			fitness = float(self.population_b[tree_id][12][1])
			
			if self.kernel == 'c': # display best fit Trees for the CLASSIFY kernel
				if fitness >= fitness_best: # find the Tree with Maximum fitness score
					fitness_best = fitness; fittest_tree = tree_id # set best fitness Tree
					
			elif self.kernel == 'r': # display best fit Trees for the REGRESSION kernel
				if fitness_best == 0: fitness_best = fitness # set the baseline first time through
				if fitness <= fitness_best: # find the Tree with Minimum fitness score
					fitness_best = fitness; fittest_tree = tree_id # set best fitness Tree
					
			elif self.kernel == 'm': # display best fit Trees for the MATCH kernel
				if fitness == self.data_train_rows: # find the Tree with a perfect match for all data rows
					fitness_best = fitness; fittest_tree = tree_id # set best fitness Tree
					
			# elif self.kernel == '[other]': # use others as a template
					
			# print ('fitness_best:', fitness_best, 'fittest_tree:', fittest_tree)
			
		
		# test the most fit Tree and write to the .txt log
		self.fx_eval_poly(self.population_b[int(fittest_tree)]) # generate the raw and sympified expression for the given Tree using SymPy
		expr = str(self.algo_sym) # get simplified expression and process it by TF - tested 2017 02/02
		result = self.fx_fitness_eval(expr, self.data_test, get_pred_labels = True)
		
		file.write('\n\n Tree ' + str(fittest_tree) + ' is the most fit, with expression:')
		file.write('\n\n ' + str(self.algo_sym))
		
		if self.kernel == 'c':
			file.write('\n\n Classification fitness score: {}'.format(result['fitness']))
			file.write('\n\n Precision-Recall report:\n {}'.format(skm.classification_report(result['solution'], result['pred_labels'][0])))
			file.write('\n Confusion matrix:\n {}'.format(skm.confusion_matrix(result['solution'], result['pred_labels'][0])))
			
		elif self.kernel == 'r':
			MSE, fitness = skm.mean_squared_error(result['result'], result['solution']), result['fitness']
			file.write('\n\n Regression fitness score: {}'.format(fitness))
			file.write('\n Mean Squared Error: {}'.format(MSE))
			
		elif self.kernel == 'm':
			file.write('\n\n Matching fitness score: {}'.format(result['fitness']))
			
		# elif self.kernel == '[other]': # use others as a template
	
	else: file.write('\n\n There were no evolved solutions generated in this run... your species has gone extinct!')
	
	file.write('\n\n')
	file.close()
	
	return
			

#+++++++++++++++++++++++++++++++++++++++++++++
#   Methods to Construct the 1st Generation  |
#+++++++++++++++++++++++++++++++++++++++++++++

def fx_init_construct(self, tree_type, tree_depth_base):
	
	'''
	This method constructs the initial population of Tree type 'tree_type' and of the size tree_depth_base. The Tree
	can be Full, Grow, or "Ramped Half/Half" as defined by John Koza.
	
	Called by: fx_karoo_gp
	
	Arguments required: tree_type, tree_depth_base
	'''
	
	if self.display == 'i':
		print ('\n\t\033[32m Press \033[36m\033[1m?\033[0;0m\033[32m at any \033[36m\033[1m(pause)\033[0;0m\033[32m, or \033[36m\033[1mENTER\033[0;0m \033[32mto continue the run\033[0;0m'); self.fx_karoo_pause_refer()
		
	if tree_type == 'r': # Ramped 50/50
		
		TREE_ID = 1
		for n in range(1, int((self.tree_pop_max / 2) / tree_depth_base) + 1): # split the population into equal parts
			for depth in range(1, tree_depth_base + 1): # build 2 Trees at each depth
				self.fx_init_tree_build(TREE_ID, 'f', depth) # build a Full Tree
				self.fx_data_tree_append(self.tree) # append Tree to the list 'gp.population_a'
				TREE_ID = TREE_ID + 1
				
				self.fx_init_tree_build(TREE_ID, 'g', depth) # build a Grow Tree
				self.fx_data_tree_append(self.tree) # append Tree to the list 'gp.population_a'
				TREE_ID = TREE_ID + 1
					
		if TREE_ID < self.tree_pop_max: # eg: split 100 by 2*3 and it will produce only 96 Trees ...
			for n in range(self.tree_pop_max - TREE_ID + 1): # ... so we complete the run
				self.fx_init_tree_build(TREE_ID, 'g', tree_depth_base)
				self.fx_data_tree_append(self.tree)
				TREE_ID = TREE_ID + 1
				
		else: pass
								
	else: # Full or Grow
		for TREE_ID in range(1, self.tree_pop_max + 1):
			self.fx_init_tree_build(TREE_ID, tree_type, tree_depth_base) # build the 1st generation of Trees
			self.fx_data_tree_append(self.tree)
			
	return
	

def fx_init_tree_build(self, TREE_ID, tree_type, tree_depth_base):

	'''
	This method combines 4 sub-methods into a single method for ease of deployment. It is designed to executed 
	within a loop such that an entire population is built. However, it may also be run from the command line, 
	passing a single TREE_ID to the method.
	
	'tree_type' is either (f)ull or (g)row. Note, however, that when the user selects 'ramped 50/50' at launch, 
	it is still (f) or (g) which are passed to this method.
	
	Called by: fx_init_construct, fx_evolve_crossover, fx_evolve_grow_mutate
	
	Arguments required: TREE_ID, tree_type, tree_depth_base
	'''
	
	self.fx_init_tree_initialise(TREE_ID, tree_type, tree_depth_base) # initialise a new Tree
	self.fx_init_root_build() # build the Root node
	self.fx_init_function_build() # build the Function nodes
	self.fx_init_terminal_build() # build the Terminal nodes
	
	return # each Tree is written to 'gp.tree'
	

def fx_init_tree_initialise(self, TREE_ID, tree_type, tree_depth_base):

	'''
	Assign 13 global variables to the array 'tree'.
	
	Build the array 'tree' with 13 rows and initally, just 1 column of labels. This array will grow horizontally as 
	each new node is appended. The values of this array are stored as string characters, numbers forced to integers at
	the point of execution.
	
	Use of the debug (db) interface mode enables the user to watch the genetic operations as they work on the Trees.
	
	Called by: fx_init_tree_build
	
	Arguments required: TREE_ID, tree_type, tree_depth_base
	'''
	
	self.pop_TREE_ID = TREE_ID 			# pos 0: a unique identifier for each tree
	self.pop_tree_type = tree_type	# pos 1: a global constant based upon the initial user setting
	self.pop_tree_depth_base = tree_depth_base	# pos 2: a global variable which conveys 'tree_depth_base' as unique to each new Tree
	self.pop_NODE_ID = 1 						# pos 3: unique identifier for each node; this is the INDEX KEY to this array
	self.pop_node_depth = 0 				# pos 4: depth of each node when committed to the array
	self.pop_node_type = '' 				# pos 5: root, function, or terminal
	self.pop_node_label = '' 				# pos 6: operator [+, -, *, ...] or terminal [a, b, c, ...]
	self.pop_node_parent = '' 			# pos 7: parent node
	self.pop_node_arity = '' 				# pos 8: number of nodes attached to each non-terminal node
	self.pop_node_c1 = '' 					# pos 9: child node 1
	self.pop_node_c2 = '' 					# pos 10: child node 2
	self.pop_node_c3 = '' 					# pos 11: child node 3 (assumed max of 3 with boolean operator 'if')
	self.pop_fitness = ''						# pos 12: fitness score following Tree evaluation
	
	self.tree = np.array([ ['TREE_ID'],['tree_type'],['tree_depth_base'],['NODE_ID'],['node_depth'],['node_type'],['node_label'],['node_parent'],['node_arity'],['node_c1'],['node_c2'],['node_c3'],['fitness'] ])
	
	return
	

### Root Node ###

def fx_init_root_build(self):

	'''
	Build the Root node for the initial population.
	
	Called by: fx_init_tree_build
	
	Arguments required: none
	'''
			
	self.fx_init_function_select() # select the operator for root
	
	if self.pop_node_arity == 1: # 1 child
		self.pop_node_c1 = 2
		self.pop_node_c2 = ''
		self.pop_node_c3 = ''
		
	elif self.pop_node_arity == 2: # 2 children
		self.pop_node_c1 = 2
		self.pop_node_c2 = 3
		self.pop_node_c3 = ''
		
	elif self.pop_node_arity == 3: # 3 children
		self.pop_node_c1 = 2
		self.pop_node_c2 = 3
		self.pop_node_c3 = 4

	else: print ('\n\t\033[31m ERROR! In fx_init_root_build: pop_node_arity =', self.pop_node_arity, '\033[0;0m'); self.fx_karoo_pause() # consider special instructions for this (pause) - 2019 06/08

	self.pop_node_type = 'root'
		
	self.fx_init_node_commit()

	return
	

### Function Nodes ###

def fx_init_function_build(self):

	'''
	Build the Function nodes for the intial population.
	
	Called by: fx_init_tree_build
	
	Arguments required: none
	'''
	
	for i in range(1, self.pop_tree_depth_base): # increment depth, from 1 through 'tree_depth_base' - 1
	
		self.pop_node_depth = i # increment 'node_depth'
		
		parent_arity_sum = 0
		prior_sibling_arity = 0 # reset for 'c_buffer' in 'children_link'
		prior_siblings = 0 # reset for 'c_buffer' in 'children_link'
		
		for j in range(1, len(self.tree[3])): # increment through all nodes (exclude 0) in array 'tree'
		
			if int(self.tree[4][j]) == self.pop_node_depth - 1: # find parent nodes which reside at the prior depth
				parent_arity_sum = parent_arity_sum + int(self.tree[8][j]) # sum arities of all parent nodes at the prior depth
				
				# (do *not* merge these 2 "j" loops or it gets all kinds of messed up)
									
		for j in range(1, len(self.tree[3])): # increment through all nodes (exclude 0) in array 'tree'
		
			if int(self.tree[4][j]) == self.pop_node_depth - 1: # find parent nodes which reside at the prior depth

				for k in range(1, int(self.tree[8][j]) + 1): # increment through each degree of arity for each parent node
					self.pop_node_parent = int(self.tree[3][j]) # set the parent 'NODE_ID' ...
					prior_sibling_arity = self.fx_init_function_gen(parent_arity_sum, prior_sibling_arity, prior_siblings) # ... generate a Function ndoe
					prior_siblings = prior_siblings + 1 # sum sibling nodes (current depth) who will spawn their own children (cousins? :)
											
	return
	

def fx_init_function_gen(self, parent_arity_sum, prior_sibling_arity, prior_siblings):

	'''
	Generate a single Function node for the initial population.
	
	Called by fx_init_function_build
	
	Arguments required: parent_arity_sum, prior_sibling_arity, prior_siblings
	'''
	
	if self.pop_tree_type == 'f': # user defined as (f)ull
		self.fx_init_function_select() # retrieve a function
		self.fx_init_child_link(parent_arity_sum, prior_sibling_arity, prior_siblings) # establish links to children
		
	elif self.pop_tree_type == 'g': # user defined as (g)row
		rnd = np.random.randint(2)
		
		if rnd == 0: # randomly selected as Function
			self.fx_init_function_select() # retrieve a function
			self.fx_init_child_link(parent_arity_sum, prior_sibling_arity, prior_siblings) # establish links to children
			
		elif rnd == 1: # randomly selected as Terminal
			self.fx_init_terminal_select() # retrieve a terminal
			self.pop_node_c1 = ''
			self.pop_node_c2 = ''
			self.pop_node_c3 = ''
			
	self.fx_init_node_commit() # commit new node to array
	prior_sibling_arity = prior_sibling_arity + self.pop_node_arity # sum the arity of prior siblings
	
	return prior_sibling_arity
	

def fx_init_function_select(self):

	'''
	Define a single Function (operator extracted from the associated functions.csv) for the initial population.
	
	Called by: fx_init_function_gen, fx_init_root_build
	
	Arguments required: none
	'''
	
	self.pop_node_type = 'func'
	rnd = np.random.randint(0, len(self.functions[:,0])) # call the previously loaded .csv which contains all operators
	self.pop_node_label = self.functions[rnd][0]
	self.pop_node_arity = int(self.functions[rnd][1])
	
	return
	

### Terminal Nodes ###

def fx_init_terminal_build(self):

	'''
	Build the Terminal nodes for the intial population.
	
	Called by: fx_init_tree_build
	
	Arguments required: none
	'''
		
	self.pop_node_depth = self.pop_tree_depth_base # set the final node_depth (same as 'gp.pop_node_depth' + 1)
	
	for j in range(1, len(self.tree[3]) ): # increment through all nodes (exclude 0) in array 'tree'
	
		if int(self.tree[4][j]) == self.pop_node_depth - 1: # find parent nodes which reside at the prior depth
		
			for k in range(1,(int(self.tree[8][j]) + 1)): # increment through each degree of arity for each parent node
				self.pop_node_parent = int(self.tree[3][j]) # set the parent 'NODE_ID'  ...
				self.fx_init_terminal_gen() # ... generate a Terminal node
				
	return
	

def fx_init_terminal_gen(self):

	'''
	Generate a single Terminal node for the initial population.
	
	Called by: fx_init_terminal_build
	
	Arguments required: none
	'''
	
	self.fx_init_terminal_select() # retrieve a terminal
	self.pop_node_c1 = ''
	self.pop_node_c2 = ''
	self.pop_node_c3 = ''

	self.fx_init_node_commit() # commit new node to array

	return
	

def fx_init_terminal_select(self):

	'''
	Define a single Terminal (variable extracted from the top row of the associated TRAINING data)
	
	Called by: fx_init_terminal_gen, fx_init_function_gen
	
	Arguments required: none
	'''
			
	self.pop_node_type = 'term'
	rnd = np.random.randint(0, len(self.terminals) - 1) # call the previously loaded .csv which contains all terminals
	self.pop_node_label = self.terminals[rnd]
	self.pop_node_arity = 0
	
	return
	

### The Lovely Children ###
		
def fx_init_child_link(self, parent_arity_sum, prior_sibling_arity, prior_siblings):

	'''
	Link each parent node to its children in the intial population.
	
	Called by: fx_init_function_gen
	
	Arguments required: parent_arity_sum, prior_sibling_arity, prior_siblings
	'''
	
	c_buffer = 0
	
	for n in range(1, len(self.tree[3]) ): # increment through all nodes (exclude 0) in array 'tree'
	
		if int(self.tree[4][n]) == self.pop_node_depth - 1: # find all nodes that reside at the prior (parent) 'node_depth'
		
			c_buffer = self.pop_NODE_ID + (parent_arity_sum + prior_sibling_arity - prior_siblings) # One algo to rule the world!
			
			if self.pop_node_arity == 0: # terminal in a Grow Tree
				self.pop_node_c1 = ''
				self.pop_node_c2 = ''
				self.pop_node_c3 = ''
				
			elif self.pop_node_arity == 1: # 1 child
				self.pop_node_c1 = c_buffer
				self.pop_node_c2 = ''
				self.pop_node_c3 = ''
				
			elif self.pop_node_arity == 2: # 2 children
				self.pop_node_c1 = c_buffer
				self.pop_node_c2 = c_buffer + 1
				self.pop_node_c3 = ''
				
			elif self.pop_node_arity == 3: # 3 children
				self.pop_node_c1 = c_buffer
				self.pop_node_c2 = c_buffer + 1
				self.pop_node_c3 = c_buffer + 2
				
			else: print ('\n\t\033[31m ERROR! In fx_init_child_link: pop_node_arity =', self.pop_node_arity, '\033[0;0m'); self.fx_karoo_pause() # consider special instructions for this (pause) - 2019 06/08
			
	return
	

def fx_init_node_commit(self):

	'''
	Commit the values of a new node (root, function, or terminal) to the array 'tree'.
	
	Called by: fx_init_root_build, fx_init_function_gen, fx_init_terminal_gen
	
	Arguments required: none
	'''
	
	self.tree = np.append(self.tree, [ [self.pop_TREE_ID],[self.pop_tree_type],[self.pop_tree_depth_base],[self.pop_NODE_ID],[self.pop_node_depth],[self.pop_node_type],[self.pop_node_label],[self.pop_node_parent],[self.pop_node_arity],[self.pop_node_c1],[self.pop_node_c2],[self.pop_node_c3],[self.pop_fitness] ], 1)
	
	self.pop_NODE_ID = self.pop_NODE_ID + 1
	
	return
	

#+++++++++++++++++++++++++++++++++++++++++++++
#   Methods to Evaluate a Tree               |
#+++++++++++++++++++++++++++++++++++++++++++++		

def fx_eval_poly(self, tree):

	'''
	Evaluate a Tree and generate its multivariate expression (both raw and Sympified).
			
	We need to extract the variables from the expression. However, these variables are no longer correlated
	to the original variables listed across the top of each column of data.csv. Therefore, we must re-assign 
	the respective values for each subsequent row in the data .csv, for each Tree's unique expression.
	
	Called by: fx_karoo_pause, fx_data_params_write, fx_eval_label, fx_fitness_gym, fx_fitness_gene_pool, fx_display_tree
	
	Arguments required: tree
	'''
	
	self.algo_raw = self.fx_eval_label(tree, 1) # pass the root 'node_id', then flatten the Tree to a string
	self.algo_sym = sympify(self.algo_raw) # convert string to a functional expression (the coolest line in Karoo! :)
	
	return
			

def fx_eval_label(self, tree, node_id):

	'''
	Evaluate all or part of a Tree (starting at node_id) and return a raw mutivariate expression ('algo_raw').
	
	This method is called once per Tree, but may be called at any time to prepare an expression for any full or 
	partial (branch) Tree contained in 'population'. Pass the starting node for recursion via the local variable 
	'node_id' where the local variable 'tree' is a copy of the Tree you desire to evaluate.
	
	Called by: fx_eval_poly, fx_eval_label (recursively)
	
	Arguments required: tree, node_id
	'''
	
	# if tree[6, node_id] == 'not': tree[6, node_id] = ', not' # temp until this can be fixed at data_load
	
	node_id = int(node_id)
	
	if tree[8, node_id] == '0': # arity of 0 for the pattern '[term]'
		return tree[6, node_id] # 'node_label' (function or terminal)
		
	else:
		if tree[8, node_id] == '1': # arity of 1 for the explicit pattern 'not [term]'
			return tree[6, node_id] + '(' + self.fx_eval_label(tree, tree[9, node_id]) + ')'	 # rll 20210201
			
		elif tree[8, node_id] == '2': # arity of 2 for the pattern '[func] [term] [func]'
			if tree[6, node_id] == 'min':		
				return tree[6, node_id]  + '(' + self.fx_eval_label(tree, tree[9, node_id])+ ',' + self.fx_eval_label(tree, tree[10, node_id]) + ')'
			if tree[6, node_id] == 'max':	
				return tree[6, node_id]  + '(' + self.fx_eval_label(tree, tree[9, node_id])+ ',' + self.fx_eval_label(tree, tree[10, node_id]) + ')'
			else:
				return '(' + self.fx_eval_label(tree, tree[9, node_id]) + ')' + tree[6, node_id] + '(' + self.fx_eval_label(tree, tree[10, node_id]) +')'		# rll 20210201
			
		elif tree[8, node_id] == '3': # arity of 3 for the explicit pattern 'if [term] then [term] else [term]'
			# This fails in sympify. rll 20210206
			return tree[6, node_id] + self.fx_eval_label(tree, tree[9, node_id]) + ' then ' + self.fx_eval_label(tree, tree[10, node_id]) + ' else ' + self.fx_eval_label(tree, tree[11, node_id])
					

def fx_eval_id(self, tree, node_id):

	'''
	Evaluate all or part of a Tree and return a list of all 'NODE_ID's.

	This method generates a list of all 'NODE_ID's from the given Node and below. It is used primarily to generate 
	'branch' for the multi-generational mutation of Trees.

	Pass the starting node for recursion via the local variable 'node_id' where the local variable 'tree' is a copy 
	of the Tree you desire to evaluate.
	
	Called by: fx_eval_id (recursively), fx_evolve_branch_select
	
	Arguments required: tree, node_id	
	'''
	
	node_id = int(node_id)
	
	if tree[8, node_id] == '0': # arity of 0 for the pattern '[NODE_ID]'
		return tree[3, node_id] # 'NODE_ID'
		
	else:
		if tree[8, node_id] == '1': # arity of 1 for the pattern '[NODE_ID], [NODE_ID]'
			return tree[3, node_id] + ', ' + self.fx_eval_id(tree, tree[9, node_id])
			
		elif tree[8, node_id] == '2': # arity of 2 for the pattern '[NODE_ID], [NODE_ID], [NODE_ID]'
			return tree[3, node_id] + ', ' + self.fx_eval_id(tree, tree[9, node_id]) + ', ' + self.fx_eval_id(tree, tree[10, node_id])
			
		elif tree[8, node_id] == '3': # arity of 3 for the pattern '[NODE_ID], [NODE_ID], [NODE_ID], [NODE_ID]'
			return tree[3, node_id] + ', ' + self.fx_eval_id(tree, tree[9, node_id]) + ', ' + self.fx_eval_id(tree, tree[10, node_id]) + ', ' + self.fx_eval_id(tree, tree[11, node_id])
			

def fx_eval_generation(self):

	'''
	This method invokes the evaluation of an entire generation of Trees. It automatically evaluates population_b 
	before invoking the copy of _b to _a.
	
	Called by: fx_karoo_gp
	
	Arguments required: none
	'''
	
	if self.display != 's':
		if self.display == 'i': print ('')
		print ('\n Evaluate all Trees in Generation', self.gen_id)
		if self.display == 'i': self.fx_karoo_pause_refer() # 2019 06/07
		
	for tree_id in range(1, len(self.population_b)): # renumber all Trees in given population - merged fx_evolve_tree_renum 2018 04/12
		self.population_b[tree_id][0][1] = tree_id
		
	self.fx_fitness_gym(self.population_b) # run fx_eval(), fx_fitness(), fx_fitness_store(), and fitness record
	self.fx_data_tree_write(self.population_b, 'a') # archive current population as foundation for next generation
	
	if self.display != 's':
		print ('\n Copy gp.population_b to gp.population_a\n')
		
	return
	

#+++++++++++++++++++++++++++++++++++++++++++++
#   Methods to Train and Test a Tree         |
#+++++++++++++++++++++++++++++++++++++++++++++

def fx_fitness_gym(self, population):

	'''		
	Part 1 evaluates each expression against the data, line for line. This is the most time consuming and
	computationally expensive part of genetic programming. When GPUs are available, the performance can increase
	by many orders of magnitude for datasets measured in millions of data.
	
	Part 2 evaluates every Tree in each generation to determine which have the best, overall fitness score. This 
	could be the highest or lowest depending upon if the fitness function is maximising (higher is better) or 
	minimising (lower is better). The total fitness score is then saved with each Tree in the external .csv file.
	
	Part 3 compares the fitness of each Tree to the prior best fit in order to track those that improve with each
	comparison. For matching functions, all the Trees will have the same fitness score, but they may present more 
	than one solution. For minimisation and maximisation functions, the final Tree should present the best overall 
	fitness for that generation. It is important to note that Part 3 does *not* in any way influence the Tournament 
	Selection which is a stand-alone process.
	
	Called by: fx_karoo_gp, fx_eval_generations
	
	Arguments required: population
	'''
	
	fitness_best = 0
	self.fittest_dict = {}
	time_sum = 0
	
	for tree_id in range(1, len(population)):
	
		### PART 1 - GENERATE MULTIVARIATE EXPRESSION FOR EACH TREE ###
		self.fx_eval_poly(population[tree_id]) # extract the expression
		if self.display not in ('s'): print ('\t\033[36mTree', population[tree_id][0][1], 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m')
		
		### PART 2 - EVALUATE FITNESS FOR EACH TREE AGAINST TRAINING DATA ###
		fitness = 0
		
		expr = str(self.algo_sym) # get sympified expression and process it with TF - tested 2017 02/02
		# Sympy occasionally returns 'zoo', the complex infinity. This will raise a KeyError exception.	# rll 20210202
		try:
			result = self.fx_fitness_eval(expr, self.data_train)
			fitness = result['fitness'] # extract fitness score
		except KeyError:
			if self.kernel == 'c':
				fitness = float('-Infinity')
			elif self.kernel == 'r':
				fitness = float('Infinity')
			elif self.kernel == 'm':
				fitness == ('Infinity')		# We assume infinity does not occur in the training data.
		
		if self.display == 'i':
			print ('\t \033[36m with fitness sum:\033[1m', fitness, '\033[0;0m\n')
			
		self.fx_fitness_store(population[tree_id], fitness) # store Fitness with each Tree
		
		
		### PART 3 - COMPARE FITNESS OF ALL TREES IN CURRENT GENERATION ###
		if self.kernel == 'c': # display best fit Trees for the CLASSIFY kernel
			if fitness >= fitness_best: # find the Tree with Maximum fitness score
				fitness_best = fitness # set best fitness score
				self.fittest_dict.update({tree_id:self.algo_sym}) # add to dictionary if fitness >= prior
				
		elif self.kernel == 'r': # display best fit Trees for the REGRESSION kernel
			if fitness_best == 0: fitness_best = fitness # set the baseline first time through
			if fitness <= fitness_best: # find the Tree with Minimum fitness score
				fitness_best = fitness # set best fitness score
				self.fittest_dict.update({tree_id:self.algo_sym}) # add to dictionary if fitness <= prior
				
		elif self.kernel == 'm': # display best fit Trees for the MATCH kernel
			if fitness == self.data_train_rows: # find the Tree with a perfect match for all data rows
				fitness_best = fitness # set best fitness score
				self.fittest_dict.update({tree_id:self.algo_sym}) # add to dictionary if all rows match
				
		# elif self.kernel == '[other]': # use others as a template
		
	print ('\n\033[36m ', len(list(self.fittest_dict.keys())), 'trees\033[1m', np.sort(list(self.fittest_dict.keys())), '\033[0;0m\033[36moffer the highest fitness scores.\033[0;0m')
	if self.display == 'g': self.fx_karoo_pause_refer() # 2019 06/07
	
	return
	

def fx_fitness_eval(self, expr, data, get_pred_labels = False):

	'''		
	Computes tree expression using TensorFlow (TF) returning results and fitness scores.
	
	This method orchestrates most of the TF routines by parsing input string 'expression' and converting it into a TF 
	operation graph which is then processed in an isolated TF session to compute the results and corresponding fitness 
	values.
	
		'self.tf_device' - controls which device will be used for computations (CPU or GPU).
		'self.tf_device_log' - controls device placement logging (debug only).
	Args:
		'expr' - a string containing math expression to be computed on the data. Variable names should match corresponding 
		terminal names in 'self.terminals'.
		
		'data' - an 'n by m' matrix of the data points containing n observations and m features per observation. 
		Variable order should match corresponding order of terminals in 'self.terminals'.
		'get_pred_labels' - a boolean flag which controls whether the predicted labels should be extracted from the 
		evolved results. This applies only to the CLASSIFY kernel and defaults to 'False'.
		
	Returns:
		A dict mapping keys to the following outputs:
			'result' - an array of the results of applying given expression to the data
			'pred_labels' - an array of the predicted labels extracted from the results; defined only for CLASSIFY kernel, else None
			'solution' - an array of the solution values extracted from the data (variable 's' in the dataset)
			'pairwise_fitness' - an array of the element-wise results of applying corresponding fitness kernel function
			'fitness' - aggregated scalar fitness score
			
	Called by: fx_karoo_pause, fx_data_params_write, fx_fitness_gym
	
	Arguments required: expr, data
	'''
	
	# Initialize TensorFlow session
	tf.reset_default_graph() # Reset TF internal state and cache (after previous processing)
	config = tf.ConfigProto(log_device_placement=self.tf_device_log, allow_soft_placement=True)
	config.gpu_options.allow_growth = True
	
	with tf.Session(config=config) as sess:
		with sess.graph.device(self.tf_device):
		
			# 1 - Load data into TF vectors
			tensors = {}
			for i in range(len(self.terminals)):
				var = self.terminals[i]
				tensors[var] = tf.constant(data[:, i], dtype=tf.float32) # converts data into vectors
				
			# 2- Transform string expression into TF operation graph
			result = self.fx_fitness_expr_parse(expr, tensors)
			pred_labels = tf.no_op() # a placeholder, applies only to CLASSIFY kernel
			solution = tensors['s'] # solution value is assumed to be stored in 's' terminal
			
			# 3- Add fitness computation into TF graph
			if self.kernel == 'c': # CLASSIFY kernel
			
				'''
				Creates element-wise fitness computation TensorFlow (TF) sub-graph for CLASSIFY kernel.
				
				This method uses the 'sympified' (SymPy) expression ('algo_sym') created in fx_eval_poly() and the data set 
				loaded at run-time to evaluate the fitness of the selected kernel.
				
				This multiclass classifer compares each row of a given Tree to the known solution, comparing predicted labels 
				generated by Karoo GP against the true classs labels. This method is able to work with any number of class 
				labels, from 2 to n. The left-most bin includes -inf. The right-most bin includes +inf. Those inbetween are 
				by default confined to the spacing of 1.0 each, as defined by:
				
					(solution - 1) < result <= solution
				
				The skew adjusts the boundaries of the bins such that they fall on both the negative and positive sides of the 
				origin. At the time of this writing, an odd number of class labels will generate an extra bin on the positive 
				side of origin as it has not yet been determined the effect of enabling the middle bin to include both a 
				negative and positive result.
				'''
				
				# was breaking with upgrade from Tensorflow 1.1 to 1.3; fixed by Iurii by replacing [] with () as of 20171026
				# if get_pred_labels: pred_labels = tf.map_fn(self.fx_fitness_labels_map, result, dtype = [tf.int32, tf.string], swap_memory = True)
				# dtype deprecated here with upgrade to TensorFlow 2.4; fixed with fn_output_signature, # rll 20210202
				# if get_pred_labels: pred_labels = tf.map_fn(self.fx_fitness_labels_map, result, dtype = (tf.int32, tf.string), swap_memory = True)
				if get_pred_labels: pred_labels = tf.map_fn(self.fx_fitness_labels_map, result, fn_output_signature = (tf.int32, tf.string), swap_memory = True)
				
				skew = (self.class_labels / 2) - 1
				
				rule11 = tf.equal(solution, 0)
				rule12 = tf.less_equal(result, 0 - skew)
				rule13 = tf.logical_and(rule11, rule12)
				
				rule21 = tf.equal(solution, self.class_labels - 1)
				rule22 = tf.greater(result, solution - 1 - skew)
				rule23 = tf.logical_and(rule21, rule22)
				
				rule31 = tf.less(solution - 1 - skew, result)
				rule32 = tf.less_equal(result, solution - skew)
				rule33 = tf.logical_and(rule31, rule32)
				
				pairwise_fitness = tf.cast(tf.logical_or(tf.logical_or(rule13, rule23), rule33), tf.int32)
				
				
			elif self.kernel == 'r': # REGRESSION kernel
			
				'''
				A very, very basic REGRESSION kernel which is not designed to perform well in the real world. It requires
				that you raise the minimum node count to keep it from converging on the value of '1'. Consider writing or 
				integrating a more sophisticated kernel.
				'''
				
				#pairwise_fitness = tf.abs(solution - result)		# Inconsistent with skm.mean_squared_error elsewhere # rll 20210203
				pairwise_fitness = tf.squared_difference(solution, result)
				
				
			elif self.kernel == 'm': # MATCH kernel
			
				'''
				This is used for demonstration purposes only.
				'''

				# pairwise_fitness = tf.cast(tf.equal(solution, result), tf.int32) # breaks due to floating points
				RTOL, ATOL = 1e-05, 1e-08 # fixes above issue by checking if a float value lies within a range of values
				pairwise_fitness = tf.cast(tf.less_equal(tf.abs(solution - result), ATOL + RTOL * tf.abs(result)), tf.int32)
				
			# elif self.kernel == '[other]': # use others as a template
				
			else: raise Exception('Kernel type is wrong or missing. You entered {}'.format(self.kernel))
			
			fitness = tf.reduce_sum(pairwise_fitness)
			
			# Process TF graph and collect the results
			result, pred_labels, solution, fitness, pairwise_fitness = sess.run([result, pred_labels, solution, fitness, pairwise_fitness])
			
	return {'result': result, 'pred_labels': pred_labels, 'solution': solution, 'fitness': float(fitness), 'pairwise_fitness': pairwise_fitness}
	

def fx_fitness_expr_parse(self, expr, tensors):

	'''		
	Extract expression tree from the string algo_sym and transform into TensorFlow (TF) graph.
	
	Called by: fx_fitness_eval
	
	Arguments required: expr, tensors
	'''
	
	tree = ast.parse(expr, mode='eval').body
	
	return self.fx_fitness_node_parse(tree, tensors)


def fx_fitness_chain_bool(self, values, operation, tensors):

	'''
	Chains a sequence of boolean operations (e.g. 'a and b and c') into a single TensorFlow (TF) sub graph.
	
	Called by: fx_fitness_node_parse
	Arguments required: values, operation, tensors
	'''

	x = tf.cast(self.fx_fitness_node_parse(values[0], tensors), tf.bool)
	if len(values) > 1:
		return operation(x, self.fx_fitness_chain_bool(values[1:], operation, tensors))
	else:
		return x
		

def fx_fitness_chain_compare(self, comparators, ops, tensors):

	'''
	Chains a sequence of comparison operations (e.g. 'a > b < c') into a single TensorFlow (TF) sub graph.
	
	Called by: fx_fitness_node_parse
	Arguments required: comparators, ops, tensors
	'''

	x = self.fx_fitness_node_parse(comparators[0], tensors)
	y = self.fx_fitness_node_parse(comparators[1], tensors)
	if len(comparators) > 2:
		return tf.logical_and(operators[type(ops[0])](x, y), self.fx_fitness_chain_compare(comparators[1:], ops[1:], tensors))
	else:
		return operators[type(ops[0])](x, y)
	

def fx_fitness_node_parse(self, node, tensors):

	'''		
	Recursively transforms parsed expression tree into TensorFlow (TF) graph.
	
	Called by: fx_fitness_expr_parse, fx_fitness_chain_bool, fx_fitness_chain_compare
	
	Arguments required: node, tensors
	'''
	
	if isinstance(node, ast.Name): # <tensor_name>
		return tensors[node.id]
	
	elif isinstance(node, ast.Num): # <number>
		#shape = tensors[tensors.keys()[0]].get_shape() # Python 2.7
		shape = tensors[list(tensors.keys())[0]].get_shape()
		return tf.constant(node.n, shape=shape, dtype=tf.float32)
		
	elif isinstance(node, ast.BinOp): # <left> <operator> <right>, e.g., x + y
		return operators[type(node.op)](self.fx_fitness_node_parse(node.left, tensors), self.fx_fitness_node_parse(node.right, tensors))
		
	elif isinstance(node, ast.UnaryOp): # <operator> <operand> e.g., -1
		return operators[type(node.op)](self.fx_fitness_node_parse(node.operand, tensors))

	elif isinstance(node, ast.Call):  # <function>(<arguments>) e.g., sin(x)
		return operators[node.func.id](*[self.fx_fitness_node_parse(arg, tensors) for arg in node.args])

	elif isinstance(node, ast.BoolOp):  # <left> <bool_operator> <right> e.g. x or y
		return self.fx_fitness_chain_bool(node.values, operators[type(node.op)], tensors)

	elif isinstance(node, ast.Compare):  # <left> <compare> <right> e.g., a > z
		return self.fx_fitness_chain_compare([node.left] + node.comparators, node.ops, tensors)
		
	else: raise TypeError(node)
	

def fx_fitness_labels_map(self, result):

	'''
	For the CLASSIFY kernel, creates a TensorFlow (TF) sub-graph defined as a sequence of boolean conditions based upon
	the quantity of true class labels provided in the data .csv. Outputs an array of tuples containing the predicted 
	labels based upon the result and corresponding boolean condition triggered.
	
	For comparison, the original (pre-TensorFlow) cod follows:
	
		skew = (self.class_labels / 2) - 1 # '-1' keeps a binary classification splitting over the origin
		if solution == 0 and result <= 0 - skew; fitness = 1: # check for first class (the left-most bin)
		elif solution == self.class_labels - 1 and result > solution - 1 - skew; fitness = 1: # check for last class (the right-most bin)
		elif solution - 1 - skew < result <= solution - skew; fitness = 1: # check for class bins between first and last
		else: fitness = 0 # no class match
		
	Called by: fx_fitness_eval
	
	Arguments required: result
	'''
	
	skew = (self.class_labels / 2) - 1
	label_rules = {self.class_labels - 1: (tf.constant(self.class_labels - 1), tf.constant(' > {}'.format(self.class_labels - 2 - skew)))}
	
	for class_label in range(self.class_labels - 2, 0, -1):
		cond = (class_label - 1 - skew < result) & (result <= class_label - skew)
		label_rules[class_label] = tf.cond(cond, lambda: (tf.constant(class_label), tf.constant(' <= {}'.format(class_label - skew))), lambda: label_rules[class_label + 1])
		
	pred_label = tf.cond(result <= 0 - skew, lambda: (tf.constant(0), tf.constant(' <= {}'.format(0 - skew))), lambda: label_rules[1])
		
	return pred_label
	

def fx_fitness_store(self, tree, fitness):

	'''
	Records the fitness and length of the raw algorithm (multivariate expression) to the Numpy array. Parsimony can 
	be used to apply pressure to the evolutionary process to select from a set of trees with the same fitness function 
	the one(s) with the simplest (shortest) multivariate expression.
	
	Called by: fx_fitness_gym
	
	Arguments required: tree, fitness
	'''

	fitness = float(fitness)
	fitness = round(fitness, self.precision)
	
	tree[12][1] = fitness # store the fitness with each tree
	tree[12][2] = len(str(self.algo_raw)) # store the length of the raw algo for parsimony
	# if len(tree[3]) > 4: # if the Tree array is wide enough -- SEE SCRATCHPAD
	
	return
	

def fx_fitness_tournament(self, tourn_size):

	'''
	Multiple contenders ('tourn_size') are randomly selected and then compared for their respective fitness, as 
	determined in fx_fitness_gym(). The tournament is engaged to select a single Tree for each invocation of the
	genetic operators: reproduction, mutation (point, branch), and crossover (sexual reproduction).
	
	The original Tournament Selection drew directly from the foundation generation (gp.generation_a). However, 
	with the introduction of a minimum number of nodes as defined by the user ('gp.tree_depth_min'), 
	'gp.gene_pool' limits the Trees to those which meet all criteria.
	
	Stronger boundary parameters (a reduced gap between the min and max number of nodes) may invoke more compact 
	solutions, but also runs the risk of elitism, even total population die-off where a healthy population once existed.
	
	Called by: fx_nextgen_reproduce, fx_nextgen_point_mutate, fx_nextgen_branch_mutate, fx_nextgen_crossover
	
	Arguments required: tourn_size
	'''
	
	tourn_test = 0
	# short_test = 0 # an incomplete parsimony test (seeking shortest solution)
	
	if self.display == 'i': print ('\n\tEnter the tournament ...')
	
	for n in range(tourn_size):
		# tree_id = np.random.randint(1, self.tree_pop_max + 1) # former method of selection from the unfiltered population
		rnd = np.random.randint(len(self.gene_pool)) # select one Tree at random from the gene pool
		tree_id = int(self.gene_pool[rnd])
		
		fitness = float(self.population_a[tree_id][12][1]) # extract the fitness from the array
		fitness = round(fitness, self.precision) # force 'result' and 'solution' to the same number of floating points
		
		if self.fitness_type == 'max': # if the fitness function is Maximising
		
			# first time through, 'tourn_test' will be initialised below
			
			# Try to avoid NaN blowing up the logic. # rll 20210202
			if math.isnan(fitness):
				#print ('*** NaN ***')
				fitness = float('-inf')
			
			if fitness > tourn_test: # if the current Tree's 'fitness' is greater than the priors'
				if self.display == 'i': print ('\t\033[36m Tree', tree_id, 'has fitness', fitness, '>', tourn_test, 'and leads\033[0;0m')
				tourn_lead = tree_id # set 'TREE_ID' for the new leader
				tourn_test = fitness # set 'fitness' of the new leader
				# short_test = int(self.population_a[tree_id][12][2]) # set len(algo_raw) of new leader
				
			elif fitness == tourn_test: # if the current Tree's 'fitness' is equal to the priors'
				if self.display == 'i': print ('\t\033[36m Tree', tree_id, 'has fitness', fitness, '=', tourn_test, 'and leads\033[0;0m')
				tourn_lead = tree_id # in case there is no variance in this tournament
				# tourn_test remains unchanged
				
				# NEED TO add option for parsimony
				# if int(self.population_a[tree_id][12][2]) < short_test:
					# short_test = int(self.population_a[tree_id][12][2]) # set len(algo_raw) of new leader
					# print ('\t\033[36m with improved parsimony score of:\033[1m', short_test, '\033[0;0m')
					
			elif fitness < tourn_test: # if the current Tree's 'fitness' is less than the priors'
				if self.display == 'i': print ('\t\033[36m Tree', tree_id, 'has fitness', fitness, '<', tourn_test, 'and is ignored\033[0;0m')
				# tourn_lead remains unchanged
				# tourn_test remains unchanged
				
			else: print ('\n\t\033[31m ERROR! In fx_fitness_tournament: fitness =', fitness, 'and tourn_test =', tourn_test, '\033[0;0m'); self.fx_karoo_pause() # consider special instructions for this (pause) - 2019 06/08
			
		
		elif self.fitness_type == 'min': # if the fitness function is Minimising
		
			# Try to avoid NaN blowing up the logic. # rll 20210202
			if math.isnan(fitness):
				#print ('*** NaN ***')
				fitness = float('inf')
			
			if tourn_test == 0: # first time through, 'tourn_test' is given a baseline value
				tourn_test = fitness
				
			if fitness < tourn_test: # if the current Tree's 'fitness' is less than the priors'
				if self.display == 'i': print ('\t\033[36m Tree', tree_id, 'has fitness', fitness, '<', tourn_test, 'and leads\033[0;0m')
				tourn_lead = tree_id # set 'TREE_ID' for the new leader
				tourn_test = fitness # set 'fitness' of the new leader
				
			elif fitness == tourn_test: # if the current Tree's 'fitness' is equal to the priors'
				if self.display == 'i': print ('\t\033[36m Tree', tree_id, 'has fitness', fitness, '=', tourn_test, 'and leads\033[0;0m')
				tourn_lead = tree_id # in case there is no variance in this tournament
				# tourn_test remains unchanged
				
			elif fitness > tourn_test: # if the current Tree's 'fitness' is greater than the priors'
				if self.display == 'i': print ('\t\033[36m Tree', tree_id, 'has fitness', fitness, '>', tourn_test, 'and is ignored\033[0;0m')
				# tourn_lead remains unchanged
				# tourn_test remains unchanged
				
			else: print ('\n\t\033[31m ERROR! In fx_fitness_tournament: fitness =', fitness, 'and tourn_test =', tourn_test, '\033[0;0m'); self.fx_karoo_pause() # consider special instructions for this (pause) - 2019 06/08
				
	
	tourn_winner = np.copy(self.population_a[tourn_lead]) # copy full Tree so as to not inadvertantly modify the original tree
	
	if self.display == 'i': print ('\n\t\033[36mThe winner of the tournament is Tree:\033[1m', tourn_winner[0][1], '\033[0;0m')
	
	return tourn_winner
	

def fx_fitness_gene_pool(self):

	'''
	The gene pool was introduced as means by which advanced users could define additional constraints on the evolved
	functions, in an effort to guide the evolutionary process. The first constraint introduced is the 'mininum number
	of nodes' parameter (gp.tree_depth_min). This defines the minimum number of nodes (in the context of Karoo, this 
	refers to both functions (operators) and terminals (operands)).
	
	When the minimum node count is human guided, it can keep the solution from defaulting to a local minimum, as with
	't/t' in the Kepler problem, by forcing a more complex solution. If you find that when engaging the Regression 
	kernel you are met with a solution which is too simple (eg: linear instead of non-linear), try increasing the 
	minimum number of nodes (with the launch of Karoo, or mid-stream by way of the pause menu).
	
	With additional or alternative constraints, you may customize how the next generation is selected.
	
	At this time, the gene pool does *not* limit the number of times any given Tree may be selected for reproduction or
	mutation nor does it take into account parsimony (seeking the simplest multivariate expression).
	
	This method is automatically invoked with every Tournament Selection - fx_fitness_tournament().
	
	Called by: fx_karoo_gp
	
	Arguments required: none
	'''
	
	self.gene_pool = []
	if self.display == 'i': print ('\n Prepare a viable gene pool ...'); self.fx_karoo_pause_refer() # 2019 06/07
	
	for tree_id in range(1, len(self.population_a)):
	
		self.fx_eval_poly(self.population_a[tree_id]) # extract the expression
		
		if self.swim == 'p': # each tree must have the min number of nodes defined by the user
			if len(self.population_a[tree_id][3])-1 >= self.tree_depth_min and self.algo_sym != 1: # check if Tree meets the requirements
				if self.display == 'i': print ('\t\033[36m Tree', tree_id, 'has >=', self.tree_depth_min, 'nodes and is added to the gene pool\033[0;0m')
				self.gene_pool.append(self.population_a[tree_id][0][1])
		
		elif self.swim == 'f': # each tree must contain at least one instance of each feature
			if len(np.intersect1d([self.population_a[tree_id][6]],[self.terminals])) == len(self.terminals)-1: # check if Tree contains at least one instance of each feature - 2018 04/14 APS, Ohio
				if self.display == 'i': print ('\t\033[36m Tree', tree_id, 'includes at least one of each feature and is added to the gene pool\033[0;0m')
				self.gene_pool.append(self.population_a[tree_id][0][1])
				
		# elif self.swim == '[other]' # use others as a template
				
	if len(self.gene_pool) > 0 and self.display == 'i': print ('\n\t The total population of the gene pool is', len(self.gene_pool)); self.fx_karoo_pause_refer() # 2019 06/07
	
	elif len(self.gene_pool) <= 0: # the evolutionary constraints were too tight, killing off the entire population
		# self.gen_id = self.gen_id - 1 # revert the increment of the 'gen_id'
		# self.gen_max = self.gen_id # catch the unused "cont" values in the fx_karoo_pause() method
		print ("\n\t\033[31m\033[3m 'They're dead Jim. They're all dead!'\033[0;0m There are no Trees in the gene pool. You should archive your population and (q)uit."); self.fx_karoo_pause_refer() # 2019 06/07
		
	return
	

def fx_fitness_test_classify(self, result):

	'''
	Print the Precision-Recall and Confusion Matrix for a CLASSIFICATION run against the test data.
	From scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html
		Precision (P) = true_pos / true_pos + false_pos
		Recall (R) = true_pos / true_pos + false_neg
		harmonic mean of Precision and Recall (F1) = 2(P x R) / (P + R)
		
	From scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
		y_pred = result, the predicted labels generated by Karoo GP
		y_true = solution, the true labels associated with the data
	Called by: fx_karoo_pause
	
	Arguments required: result
	'''
	
	for i in range(len(result['result'])):
		print ('\t\033[36m Data row {} predicts class:\033[1m {} ({} True)\033[0;0m\033[36m as {:.2f}{}\033[0;0m'.format(i, int(result['pred_labels'][0][i]), int(result['solution'][i]), result['result'][i], result['pred_labels'][1][i]))
		
	print ('\n Fitness score: {}'.format(result['fitness']))
	print ('\n Precision-Recall report:\n', skm.classification_report(result['solution'], result['pred_labels'][0]))
	print (' Confusion matrix:\n', skm.confusion_matrix(result['solution'], result['pred_labels'][0]))
	
	return
	
	
def fx_fitness_test_regress(self, result):

	'''
	Print the Fitness score and Mean Squared Error for a REGRESSION run against the test data.
	
	Called by: fx_karoo_pause
	
	Arguments required: result
	'''
	
	for i in range(len(result['result'])):
		print ('\t\033[36m Data row {} predicts value:\033[1m {:.2f} ({:.2f} True)\033[0;0m'.format(i, result['result'][i], result['solution'][i]))
		
	MSE, fitness = skm.mean_squared_error(result['result'], result['solution']), result['fitness']
	print ('\n\t Regression fitness score: {}'.format(fitness))
	print ('\t Mean Squared Error: {}'.format(MSE))
	
	return
	
	
def fx_fitness_test_match(self, result):

	'''
	Print the accuracy for a MATCH kernel run against the test data.
	
	Called by: fx_karoo_pause
	
	Arguments required: result
	'''
	
	for i in range(len(result['result'])):
		print ('\t\033[36m Data row {} predicts match:\033[1m {:.2f} ({:.2f} True)\033[0;0m'.format(i, result['result'][i], result['solution'][i]))
		
	print ('\n\tMatching fitness score: {}'.format(result['fitness']))
	
	return
	

# def fx_fitness_test_[other](self, result): # use others as a template
	

#+++++++++++++++++++++++++++++++++++++++++++++
#   Methods to Construct the next Generation |
#+++++++++++++++++++++++++++++++++++++++++++++

def fx_nextgen_reproduce(self):

	'''
	Through tournament selection, a single Tree from the prior generation is copied without mutation to the next 
	generation. This is analogous to a member of the prior generation directly entering the gene pool of the 
	subsequent (younger) generation.
	
	Called by: fx_karoo_gp
	
	Arguments required: none
	'''
	
	if self.display != 's':
		if self.display == 'i': print ('')
		print ('  Perform', self.evolve_repro, 'Reproductions ...')
		if self.display == 'i': self.fx_karoo_pause_refer() # 2019 06/07
		
	for n in range(self.evolve_repro): # quantity of Trees to be copied without mutation
		tourn_winner = self.fx_fitness_tournament(self.tourn_size) # perform tournament selection for each reproduction
		tourn_winner = self.fx_evolve_fitness_wipe(tourn_winner) # wipe fitness data
		self.population_b.append(tourn_winner) # append array to next generation population of Trees
		
	return
	

def fx_nextgen_point_mutate(self):

	'''
	Through tournament selection, a copy of a Tree from the prior generation mutates before being added to the 
	next generation. In this method, a single point is selected for mutation while maintaining function nodes as 
	functions (operators) and terminal nodes as terminals (variables). The size and shape of the Tree will remain 
	identical.
	
	Called by: fx_karoo_gp
	
	Arguments required: none
	'''
	
	if self.display != 's':
		if self.display == 'i': print ('')
		print ('  Perform', self.evolve_point, 'Point Mutations ...')
		if self.display == 'i': self.fx_karoo_pause_refer() # 2019 06/07
		
	for n in range(self.evolve_point): # quantity of Trees to be generated through mutation
		tourn_winner = self.fx_fitness_tournament(self.tourn_size) # perform tournament selection for each mutation
		tourn_winner, node = self.fx_evolve_point_mutate(tourn_winner) # perform point mutation; return single point for record keeping
		self.population_b.append(tourn_winner) # append array to next generation population of Trees
					
	return
	

def fx_nextgen_branch_mutate(self):

	'''
	Through tournament selection, a copy of a Tree from the prior generation mutates before being added to the 
	next generation. Unlike Point Mutation, in this method an entire branch is selected. If the evolutionary run is 
	designated as Full, the height of the Tree will remain identical, while the shape may change due to the various 
	arities of operators. 	If the evolutionary run is designated as Grow or Ramped Half/Half, the size and shape of the 
	Tree may grow smaller or larger, but it may not exceed tree_depth_max as defined by the user.
	
	Called by: fx_karoo_gp
	
	Arguments required: none
	'''
	
	if self.display != 's':
		if self.display == 'i': print ('')
		print ('  Perform', self.evolve_branch, 'Branch Mutations ...')
		if self.display == 'i': self.fx_karoo_pause_refer() # 2019 06/07
		
	for n in range(self.evolve_branch): # quantity of Trees to be generated through mutation
		tourn_winner = self.fx_fitness_tournament(self.tourn_size) # perform tournament selection for each mutation
		branch = self.fx_evolve_branch_select(tourn_winner) # select point of mutation and all nodes beneath
		
		# TEST & DEBUG: comment the top or bottom to force all Full or all Grow methods
		
		if tourn_winner[1][1] == 'f': # perform Full method mutation on 'tourn_winner'
			tourn_winner = self.fx_evolve_full_mutate(tourn_winner, branch)
			
		elif tourn_winner[1][1] == 'g': # perform Grow method mutation on 'tourn_winner'
			tourn_winner = self.fx_evolve_grow_mutate(tourn_winner, branch)
			
		self.population_b.append(tourn_winner) # append array to next generation population of Trees
		
	return
	

def fx_nextgen_crossover(self):

	'''
	Through tournament selection, two trees are selected as parents to produce two offspring. Within each parent 
	Tree a branch is selected. Parent A is copied, with its selected branch deleted. Parent B's branch is then 
	copied to the former location of Parent A's branch and inserted (grafted). The size and shape of the child 
	Tree may be smaller or larger than either of the parents, but may not exceed 'tree_depth_max' as defined 
	by the user.
	
	This process combines genetic code from two parent Trees, both of which were chosen by the tournament process 
	as having a higher fitness than the average population. Therefore, there is a chance their offspring will 
	provide an improvement in total fitness. In most GP applications, Crossover is the most commonly applied 
	evolutionary operator (~70-80%).
	
	For those who like to watch, select 'db' (debug mode) at the launch of Karoo GP or at any (pause).
	
	Called by: fx_karoo_gp
	
	Arguments required: none
	'''
	
	if self.display != 's':
		if self.display == 'i': print ('')
		print ('  Perform', self.evolve_cross, 'Crossovers ...')
		if self.display == 'i': self.fx_karoo_pause_refer() # 2019 06/07
		
	#for n in range(self.evolve_cross / 2): # Python 2.7
	for n in range(self.evolve_cross // 2): # quantity of Trees to be generated through Crossover, accounting for 2 children each
		parent_a = self.fx_fitness_tournament(self.tourn_size) # perform tournament selection for 'parent_a'
		branch_a = self.fx_evolve_branch_select(parent_a) # select branch within 'parent_a', to copy to 'parent_b' and receive a branch from 'parent_b'
		
		parent_b = self.fx_fitness_tournament(self.tourn_size) # perform tournament selection for 'parent_b'
		branch_b = self.fx_evolve_branch_select(parent_b) # select branch within 'parent_b', to copy to 'parent_a' and receive a branch from 'parent_a'
		
		parent_c = np.copy(parent_a); branch_c = np.copy(branch_a) # else the Crossover mods affect the parent Trees, due to how Python manages '='
		parent_d = np.copy(parent_b); branch_d = np.copy(branch_b) # else the Crossover mods affect the parent Trees, due to how Python manages '='
		
		offspring_1 = self.fx_evolve_crossover(parent_a, branch_a, parent_b, branch_b) # perform Crossover
		self.population_b.append(offspring_1) # append the 1st child to next generation of Trees
		
		offspring_2 = self.fx_evolve_crossover(parent_d, branch_d, parent_c, branch_c) # perform Crossover
		self.population_b.append(offspring_2) # append the 2nd child to next generation of Trees
		
	return
	
	
#+++++++++++++++++++++++++++++++++++++++++++++
#   Methods to Evolve a Population           |
#+++++++++++++++++++++++++++++++++++++++++++++

def fx_evolve_point_mutate(self, tree):

	'''
	Mutate a single point in any Tree (Grow or Full).
	
	Called by: fx_nextgen_point_mutate
	Arguments required: tree
	'''
	
	node = np.random.randint(1, len(tree[3])) # randomly select a point in the Tree (including root)
	if self.display == 'i': print ('\t\033[36m with', tree[5][node], 'node\033[1m', tree[3][node], '\033[0;0m\033[36mchosen for mutation\n\033[0;0m')
	elif self.display == 'db': print ('\n\n\033[33m *** Point Mutation *** \033[0;0m\n\n\033[36m This is the unaltered tourn_winner:\033[0;0m\n', tree)
	
	if tree[5][node] == 'root':
		rnd = np.random.randint(0, len(self.functions[:,0])) # call the previously loaded .csv which contains all operators
		while (int(self.functions[rnd][1]) != int(tree[8][node])) :
			rnd = np.random.randint(0, len(self.functions[:,0])) # keep looking for suitable arity
			
		tree[6][node] = self.functions[rnd][0] # replace function (operator)
		
	elif tree[5][node] == 'func':
		rnd = np.random.randint(0, len(self.functions[:,0])) # call the previously loaded .csv which contains all operators
		while (int(self.functions[rnd][1]) != int(tree[8][node])) :
			rnd = np.random.randint(0, len(self.functions[:,0])) # keep looking for suitable arity
			
		tree[6][node] = self.functions[rnd][0] # replace function (operator)
		
	elif tree[5][node] == 'term':
		rnd = np.random.randint(0, len(self.terminals) - 1) # call the previously loaded .csv which contains all terminals
		tree[6][node] = self.terminals[rnd] # replace terminal (variable)
	
	else: print ('\n\t\033[31m ERROR! In fx_evolve_point_mutate, node_type =', tree[5][node], '\033[0;0m'); self.fx_karoo_pause() # consider special instructions for this (pause) - 2019 06/08
	
	tree = self.fx_evolve_fitness_wipe(tree) # wipe fitness data
	
	if self.display == 'db': print ('\n\033[36m This is tourn_winner after node\033[1m', node, '\033[0;0m\033[36mmutation and updates:\033[0;0m\n', tree); self.fx_karoo_pause_refer() # 2019 06/07
	
	return tree, node # 'node' is returned only to be assigned to the 'tourn_trees' record keeping
	

def fx_evolve_full_mutate(self, tree, branch):

	'''
	Mutate a branch of a Full method Tree.
	
	The full mutate method receives a branch of identical height as the sub-tree being replaced. Therefore, a simple
	call to fx_evolve_branch_insert suffices.		rll 20210203
	
	Called by: fx_nextgen_branch_mutate
	
	Arguments required: tree, branch
	'''
	
	if self.display == 'db': print ('\n\n\033[33m *** Full Mutation: function to function *** \033[0;0m\n\n\033[36m This is the unaltered tourn_winner:\033[0;0m\n', tree)
	
	tree = self.fx_evolve_branch_insert(tree, branch) # insert new 'branch' at point of mutation 'branch_top' in tourn_winner 'tree'
	# because we already know the maximum depth of this branch, there is no need to prune after insertion
			
	tree = self.fx_evolve_fitness_wipe(tree) # wipe fitness data

	if self.display == 'db': print ('\n\033[36m This is tourn_winner after nodes\033[1m', branch, '\033[0;0m\033[36mwere mutated and updated:\033[0;0m\n', tree); self.fx_karoo_pause_refer() # 2019 06/07
	
	return tree
	

def fx_evolve_grow_mutate(self, tree, branch):

	'''
	Mutate a branch of a Grow method Tree.
	
	A branch is selected within a given tree.
	
	If the point of mutation ('branch_top') resides at 'tree_depth_max', we do not need to grow a new tree. As the 
	methods for building trees always assume root (node 0) to be a function, we need only mutate this terminal node
	to another terminal node, and this branch mutate method is complete.
	If the top of that branch is a terminal which does not reside at 'tree_depth_max', then it may either remain a 
	terminal (in which case a new value is randomly assigned) or it may mutate into a function. If it becomes a 
	function, a new branch (mini-tree) is generated to be appended to that node's current location. The same is true 
	for function-to-function mutation. Either way, the new branch will be only as deep as allowed by the distance 
	from it's branch_top to the bottom of the tree.
	
	If however a function mutates into a terminal, the entire branch beneath the function is deleted from the array
	and the entire array is updated, to fix parent/child links, associated arities, and node IDs.
	
	Called by: fx_nextgen_branch_mutate
	
	Arguments required: tree, branch		
	'''
	
	branch_top = int(branch[0]) # replaces 2 instances, below; tested 2016 07/09
	branch_depth = self.tree_depth_max - int(tree[4][branch_top]) # 'tree_depth_max' - depth at 'branch_top' to set max potential size of new branch - 2016 07/10
	
	if branch_depth < 0: # this has never occured ... yet
		print ('\n\t\033[31m ERROR! In fx_evolve_grow_mutate: branch_depth', branch_depth, '< 0'); self.fx_karoo_pause() # consider special instructions for this (pause) - 2019 06/08
		
	elif branch_depth == 0: # the point of mutation ('branch_top') chosen resides at the maximum allowable depth, so mutate term to term
	
		if self.display == 'i': print ('\t\033[36m max depth branch node\033[1m', tree[3][branch_top], '\033[0;0m\033[36mmutates from \033[1mterm\033[0;0m \033[36mto \033[1mterm\033[0;0m\n')
		if self.display == 'db': print ('\n\n\033[33m *** Grow Mutation: terminal to terminal *** \033[0;0m\n\n\033[36m This is the unaltered tourn_winner:\033[0;0m\n', tree)
		
		rnd = np.random.randint(0, len(self.terminals) - 1) # call the previously loaded .csv which contains all terminals
		tree[6][branch_top] = self.terminals[rnd] # replace terminal (variable)

		if self.display == 'db': print ('\n\033[36m This is tourn_winner after terminal\033[1m', branch_top, '\033[0;0m\033[36mmutation, branch deletion, and updates:\033[0;0m\n', tree); self.fx_karoo_pause_refer() # 2019 06/07
		
	else: # the point of mutation ('branch_top') chosen is at least one depth from the maximum allowed
	
		# type_mod = '[func or term]' # TEST & DEBUG: force to 'func' or 'term' and comment the next 3 lines
		rnd = np.random.randint(2)
		if rnd == 0: type_mod = 'func' # randomly selected as Function
		elif rnd == 1: type_mod = 'term' # randomly selected as Terminal
		
		if type_mod == 'term': # mutate 'branch_top' to a terminal and delete all nodes beneath (no subsequent nodes are added to this branch)
			
			if self.display == 'i': print ('\t\033[36m branch node\033[1m', tree[3][branch_top], '\033[0;0m\033[36mmutates from\033[1m', tree[5][branch_top], '\033[0;0m\033[36mto\033[1m term \n\033[0;0m')
			if self.display == 'db': print ('\n\n\033[33m *** Grow Mutation: branch_top to terminal *** \033[0;0m\n\n\033[36m This is the unaltered tourn_winner:\033[0;0m\n', tree)
			
			rnd = np.random.randint(0, len(self.terminals) - 1) # call the previously loaded .csv which contains all terminals
			tree[5][branch_top] = 'term' # replace type ('func' to 'term' or 'term' to 'term')
			tree[6][branch_top] = self.terminals[rnd] # replace label
			
			tree = np.delete(tree, branch[1:], axis = 1) # delete all nodes beneath point of mutation ('branch_top')
			tree = self.fx_evolve_node_arity_fix(tree) # fix all node arities
			tree = self.fx_evolve_child_link_fix(tree) # fix all child links
			tree = self.fx_evolve_node_renum(tree) # renumber all 'NODE_ID's
			
			if self.display == 'db': print ('\n\033[36m This is tourn_winner after terminal\033[1m', branch_top, '\033[0;0m\033[36mmutation, branch deletion, and updates:\033[0;0m\n', tree); self.fx_karoo_pause_refer() # 2019 06/07
			
		
		if type_mod == 'func': # mutate 'branch_top' to a function (a new 'gp.tree' will be copied, node by node, into 'tourn_winner')
			
			if self.display == 'i': print ('\t\033[36m branch node\033[1m', tree[3][branch_top], '\033[0;0m\033[36mmutates from\033[1m', tree[5][branch_top], '\033[0;0m\033[36mto\033[1m func \n\033[0;0m')
			if self.display == 'db': print ('\n\n\033[33m *** Grow Mutation: branch_top to function *** \033[0;0m\n\n\033[36m This is the unaltered tourn_winner:\033[0;0m\n', tree)
			
			self.fx_init_tree_build('mutant', self.pop_tree_type, branch_depth) # build new Tree ('gp.tree') with a maximum depth which matches 'branch'
			
			if self.display == 'db': print ('\n\033[36m This is the new Tree to be inserted at node\033[1m', branch_top, '\033[0;0m\033[36min tourn_winner:\033[0;0m\n', self.tree); self.fx_karoo_pause_refer() # 2019 06/07
			
			tree = self.fx_evolve_branch_insert(tree, branch) # insert new 'branch' at point of mutation 'branch_top' in tourn_winner 'tree'
			# because we already know the maximum depth to which this branch can grow, there is no need to prune after insertion
			
	tree = self.fx_evolve_fitness_wipe(tree) # wipe fitness data
	
	return tree
	

def fx_evolve_crossover(self, parent, branch_x, offspring, branch_y):

	'''
	Refer to the method fx_nextgen_crossover() for a full description of the genetic operator Crossover.
	
	This method is called twice to produce 2 offspring per pair of parent Trees. Note that in the method 
	'karoo_fx_crossover' the parent/branch relationships are swapped from the first run to the second, such that 
	this method receives swapped components to produce the alternative offspring. Therefore 'parent_b' is first 
	passed to 'offspring' which will receive 'branch_a'. With the second run, 'parent_a' is passed to 'offspring' which 
	will receive 'branch_b'.
	
	Called by: fx_nextgen_crossover
	
	Arguments required: parent, branch_x, offspring, branch_y (parents_a / _b, branch_a / _b from fx_nextgen_crossover()
	'''
	
	crossover = int(branch_x[0]) # pointer to the top of the 1st parent branch passed from fx_nextgen_crossover()
	branch_top = int(branch_y[0]) # pointer to the top of the 2nd parent branch passed from fx_nextgen_crossover()
	
	if self.display == 'db': print ('\n\n\033[33m *** Crossover *** \033[0;0m')
	
	if len(branch_x) == 1: # if the branch from the parent contains only one node (terminal)
	
		if self.display == 'i': print ('\t\033[36m  terminal crossover from \033[1mparent', parent[0][1], '\033[0;0m\033[36mto \033[1moffspring', offspring[0][1], '\033[0;0m\033[36mat node\033[1m', branch_top, '\033[0;0m')
		
		if self.display == 'db':
			print ('\n\033[36m In a copy of one parent:\033[0;0m\n', offspring)
			print ('\n\033[36m ... we remove nodes\033[1m', branch_y, '\033[0;0m\033[36mand replace node\033[1m', branch_top, '\033[0;0m\033[36mwith a terminal from branch_x\033[0;0m'); self.fx_karoo_pause_refer() # 2019 06/07
			
		offspring[5][branch_top] = 'term' # replace type
		offspring[6][branch_top] = parent[6][crossover] # replace label with that of a particular node in 'branch_x'
		offspring[8][branch_top] = 0 # set terminal arity
		
		offspring = np.delete(offspring, branch_y[1:], axis = 1) # delete all nodes beneath point of mutation ('branch_top')
		offspring = self.fx_evolve_child_link_fix(offspring) # fix all child links
		offspring = self.fx_evolve_node_renum(offspring) # renumber all 'NODE_ID's
		
		if self.display == 'db': print ('\n\033[36m This is the resulting offspring:\033[0;0m\n', offspring); self.fx_karoo_pause_refer() # 2019 06/07
		
	
	else: # we are working with a branch from 'parent' >= depth 1 (min 3 nodes)
	
		if self.display == 'i': print ('\t\033[36m  branch crossover from \033[1mparent', parent[0][1], '\033[0;0m\033[36mto \033[1moffspring', offspring[0][1], '\033[0;0m\033[36mat node\033[1m', branch_top, '\033[0;0m')
		
		# self.fx_init_tree_build('test', 'f', 2) # TEST & DEBUG: disable the next 'self.tree ...' line
		self.tree = self.fx_evolve_branch_copy(parent, branch_x) # generate stand-alone 'gp.tree' with properties of 'branch_x'
		
		if self.display == 'db':
			print ('\n\033[36m From one parent:\033[0;0m\n', parent)
			print ('\n\033[36m ... we copy branch_x\033[1m', branch_x, '\033[0;0m\033[36mas a new, sub-tree:\033[0;0m\n', self.tree); self.fx_karoo_pause_refer() # 2019 06/07
			
		if self.display == 'db':
			print ('\n\033[36m ... and insert it into a copy of the second parent in place of the selected branch\033[1m', branch_y,':\033[0;0m\n', offspring); self.fx_karoo_pause_refer() # 2019 06/07
			
		offspring = self.fx_evolve_branch_insert(offspring, branch_y) # insert new 'branch_y' at point of mutation 'branch_top' in tourn_winner 'offspring'
		offspring = self.fx_evolve_tree_prune(offspring, self.tree_depth_max) # prune to the max Tree depth + adjustment - tested 2016 07/10
		
	offspring = self.fx_evolve_fitness_wipe(offspring) # wipe fitness data
	
	return offspring
	

def fx_evolve_branch_select(self, tree):

	'''
	Select all nodes in the 'tourn_winner' Tree at and below the randomly selected starting point.
	
	While Grow mutation uses this method to select a region of the 'tourn_winner' to delete, Crossover uses this 
	method to select a region of the 'tourn_winner' which is then converted to a stand-alone tree. As such, it is 
	imperative that the nodes be in the correct order, else all kinds of bad things happen.
	
	Called by: fx_nextgen_branch, fx_nextgen_crossover
	
	Arguments required: tree
	'''
	
	branch = np.array([]) # the array is necessary in order to len(branch) when 'branch' has only one element
	branch_top = np.random.randint(2, len(tree[3])) # randomly select a non-root node
	branch_eval = self.fx_eval_id(tree, branch_top) # generate tuple of 'branch_top' and subseqent nodes
	branch_symp = sympify(branch_eval) # convert string into something useful
	branch = np.append(branch, branch_symp) # append list to array
	
	branch = np.sort(branch) # sort nodes in branch for Crossover.
	
	if self.display == 'i': print ('\t \033[36mwith nodes\033[1m', branch, '\033[0;0m\033[36mchosen for mutation\033[0;0m')
	
	# return branch per Antonio's fix 20210125
	return branch.astype(int)
	

def fx_evolve_branch_insert(self, tree, branch):

	'''
	This method enables the insertion of Tree in place of a branch. It works with 3 inputs: local 'tree' is being 
	modified; local 'branch' is a section of 'tree' which will be removed; and the global 'gp.tree' (recycling this 
	variable from initial population generation) is the new Tree to be insertd into 'tree', replacing 'branch'. 
	
	The end result is a Tree with a mutated branch. Pretty cool, huh?
	
	Called by: fx_evolve_grow_mutate, fx_evolve_grow_crossover
	
	Arguments required: tree, branch
	'''
	
	# *_branch_top_copy merged with *_body_copy 2018 04/12
	
	### PART 1 - insert branch_top from 'gp.tree' into 'tree' ###
	
	branch_top = int(branch[0])
	
	tree[5][branch_top] = 'func' # update type ('func' to 'term' or 'term' to 'term'); this modifies gp.tree[5][1] from 'root' to 'func'
	tree[6][branch_top] = self.tree[6][1] # copy node_label from new tree
	tree[8][branch_top] = self.tree[8][1] # copy node_arity from new tree
	
	tree = np.delete(tree, branch[1:], axis = 1) # delete all nodes beneath point of mutation ('branch_top')
	
	c_buffer = self.fx_evolve_c_buffer(tree, branch_top) # generate c_buffer for point of mutation ('branch_top')
	tree = self.fx_evolve_child_insert(tree, branch_top, c_buffer) # insert a single new node ('branch_top')
	tree = self.fx_evolve_node_renum(tree) # renumber all 'NODE_ID's

	if self.display == 'db':
		print ('\n\t ... inserted node 1 of', len(self.tree[3])-1)
		print ('\n\033[36m This is the Tree after a new node is inserted:\033[0;0m\n', tree); self.fx_karoo_pause_refer() # 2019 06/07
	
	
	### PART 2 - insert branch_body from 'gp.tree' into 'tree' ###
	
	node_count = 2 # set node count for 'gp.tree' to 2 as the new root has already replaced 'branch_top' (above)
	
	while node_count < len(self.tree[3]): # increment through all nodes in the new Tree ('gp.tree'), starting with node 2
	
		for j in range(1, len(tree[3])): # increment through all nodes in tourn_winner ('tree')
		
			if self.display == 'db': print ('\tScanning tourn_winner node_id:', j)
			
			if tree[5][j] == '':					
				tree[5][j] = self.tree[5][node_count] # copy 'node_type' from branch to tree
				tree[6][j] = self.tree[6][node_count] # copy 'node_label' from branch to tree
				tree[8][j] = self.tree[8][node_count] # copy 'node_arity' from branch to tree
				
				if tree[5][j] == 'term':
					tree = self.fx_evolve_child_link_fix(tree) # fix all child links
					tree = self.fx_evolve_node_renum(tree) # renumber all 'NODE_ID's
					
				if tree[5][j] == 'func':
					c_buffer = self.fx_evolve_c_buffer(tree, j) # generate 'c_buffer' for point of mutation ('branch_top')
					tree = self.fx_evolve_child_insert(tree, j, c_buffer) # insert new nodes
					tree = self.fx_evolve_child_link_fix(tree) # fix all child links
					tree = self.fx_evolve_node_renum(tree) # renumber all 'NODE_ID's
					
				if self.display == 'db':
					print ('\n\t ... inserted node', node_count, 'of', len(self.tree[3])-1)
					print ('\n\033[36m This is the Tree after a new node is inserted:\033[0;0m\n', tree); self.fx_karoo_pause_refer() # 2019 06/07
					
				node_count = node_count + 1 # exit loop when 'node_count' reaches the number of columns in the array 'gp.tree'
						
	return tree
	

def fx_evolve_branch_copy(self, tree, branch):

	'''
	This method prepares a stand-alone Tree as a copy of the given branch.
	
	Called by: fx_evolve_crossover
	
	Arguments required: tree, branch
	'''
	
	new_tree = np.array([ ['TREE_ID'],['tree_type'],['tree_depth_base'],['NODE_ID'],['node_depth'],['node_type'],['node_label'],['node_parent'],['node_arity'],['node_c1'],['node_c2'],['node_c3'],['fitness'] ])
	
	# tested 2015 06/08
	for n in range(len(branch)):
	
		node = branch[n]
		branch_top = int(branch[0])
		
		TREE_ID = 'copy'
		tree_type = tree[1][1]
		tree_depth_base = int(tree[4][branch[-1]]) - int(tree[4][branch_top]) # subtract depth of 'branch_top' from the last in 'branch'
		NODE_ID = tree[3][node]
		node_depth = int(tree[4][node]) - int(tree[4][branch_top]) # subtract the depth of 'branch_top' from the current node depth
		node_type = tree[5][node]
		node_label = tree[6][node]
		node_parent = '' # updated by fx_evolve_parent_link_fix(), below
		node_arity = tree[8][node]
		node_c1 = '' # updated by fx_evolve_child_link_fix(), below
		node_c2 = ''
		node_c3 = ''
		fitness = ''
		
		new_tree = np.append(new_tree, [ [TREE_ID],[tree_type],[tree_depth_base],[NODE_ID],[node_depth],[node_type],[node_label],[node_parent],[node_arity],[node_c1],[node_c2],[node_c3],[fitness] ], 1)
		
	new_tree = self.fx_evolve_node_renum(new_tree)
	new_tree = self.fx_evolve_child_link_fix(new_tree)
	new_tree = self.fx_evolve_parent_link_fix(new_tree)
	new_tree = self.fx_data_tree_clean(new_tree)
	
	return new_tree
	

def fx_evolve_c_buffer(self, tree, node):

	'''
	This method serves the very important function of determining the links from parent to child for any given 
	node. The single, simple formula [parent_arity_sum + prior_sibling_arity - prior_siblings] perfectly determines 
	the correct position of the child node, already in place or to be inserted, no matter the depth nor complexity 
	of the tree.
	
	This method is currently called from the evolution methods, but will soon (I hope) be called from the first 
	generation Tree generation methods (above) such that the same method may be used repeatedly.
	Called by: fx_evolve_child_link_fix, fx_evolve_banch_top_copy, fx_evolve_branch_body_copy
	
	Arguments required: tree, node
	'''
	
	parent_arity_sum = 0
	prior_sibling_arity = 0
	prior_siblings = 0
	
	for n in range(1, len(tree[3])): # increment through all nodes (exclude 0) in array 'tree'
	
		if int(tree[4][n]) == int(tree[4][node])-1: # find parent nodes at the prior depth
			if tree[8][n] != '': parent_arity_sum = parent_arity_sum + int(tree[8][n]) # sum arities of all parent nodes at the prior depth
			
		if int(tree[4][n]) == int(tree[4][node]) and int(tree[3][n]) < int(tree[3][node]): # find prior siblings at the current depth
			if tree[8][n] != '': prior_sibling_arity = prior_sibling_arity + int(tree[8][n]) # sum prior sibling arity
			prior_siblings = prior_siblings + 1 # sum quantity of prior siblings
			
	c_buffer = node + (parent_arity_sum + prior_sibling_arity - prior_siblings) # One algo to rule the world!
	
	return c_buffer
	

def fx_evolve_child_link(self, tree, node, c_buffer):

	'''
	Link each parent node to its children.
	Called by: fx_evolve_child_link_fix
	
	Arguments required: tree, node, c_buffer
	'''
	
	if int(tree[3][node]) == 1: c_buffer = c_buffer + 1 # if root (node 1) is passed through this method
	
	if tree[8][node] != '':
	
		if int(tree[8][node]) == 0: # if arity = 0
			tree[9][node] = ''
			tree[10][node] = ''
			tree[11][node] = ''
			
		elif int(tree[8][node]) == 1: # if arity = 1
			tree[9][node] = c_buffer
			tree[10][node] = ''
			tree[11][node] = ''
			
		elif int(tree[8][node]) == 2: # if arity = 2
			tree[9][node] = c_buffer
			tree[10][node] = c_buffer + 1
			tree[11][node] = ''
			
		elif int(tree[8][node]) == 3: # if arity = 3
			tree[9][node] = c_buffer
			tree[10][node] = c_buffer + 1
			tree[11][node] = c_buffer + 2
			
		else: print ('\n\t\033[31m ERROR! In fx_evolve_child_link: node', node, 'has arity', tree[8][node]); self.fx_karoo_pause() # consider special instructions for this (pause) - 2019 06/08
			
	return tree
	

def fx_evolve_child_link_fix(self, tree):

	'''
	In a given Tree, fix 'node_c1', 'node_c2', 'node_c3' for all nodes.
	
	This is required anytime the size of the array 'gp.tree' has been modified, as with both Grow and Full mutation.
	
	Called by: fx_evolve_grow_mutate, fx_evolve_crossover, fx_evolve_branch_body_copy, fx_evolve_branch_copy
	
	Arguments required: tree
	'''
	
	# tested 2015 06/04
	for node in range(1, len(tree[3])):
	
		c_buffer = self.fx_evolve_c_buffer(tree, node) # generate c_buffer for each node
		tree = self.fx_evolve_child_link(tree, node, c_buffer) # update child links for each node
		
	return tree
	

def fx_evolve_child_insert(self, tree, node, c_buffer):

	'''
	Insert child node into the copy of a parent Tree.
	
	Called by: fx_evolve_branch_insert
			
	Arguments required: tree, node, c_buffer
	'''
	
	if int(tree[8][node]) == 0: # if arity = 0
		print ('\n\t\033[31m ERROR! In fx_evolve_child_insert: node', node, 'has arity 0\033[0;0m'); self.fx_karoo_pause() # consider special instructions for this (pause) - 2019 06/08
	
	elif int(tree[8][node]) == 1: # if arity = 1
		tree = np.insert(tree, c_buffer, '', axis=1) # insert node for 'node_c1'
		tree[3][c_buffer] = c_buffer # node ID
		tree[4][c_buffer] = int(tree[4][node]) + 1 # node_depth
		tree[7][c_buffer] = int(tree[3][node]) # parent ID
		
	elif int(tree[8][node]) == 2: # if arity = 2
		tree = np.insert(tree, c_buffer, '', axis=1) # insert node for 'node_c1'
		tree[3][c_buffer] = c_buffer # node ID
		tree[4][c_buffer] = int(tree[4][node]) + 1 # node_depth
		tree[7][c_buffer] = int(tree[3][node]) # parent ID
		
		tree = np.insert(tree, c_buffer + 1, '', axis=1) # insert node for 'node_c2'
		tree[3][c_buffer + 1] = c_buffer + 1 # node ID
		tree[4][c_buffer + 1] = int(tree[4][node]) + 1 # node_depth
		tree[7][c_buffer + 1] = int(tree[3][node]) # parent ID
		
	elif int(tree[8][node]) == 3: # if arity = 3
		tree = np.insert(tree, c_buffer, '', axis=1) # insert node for 'node_c1'
		tree[3][c_buffer] = c_buffer # node ID
		tree[4][c_buffer] = int(tree[4][node]) + 1 # node_depth
		tree[7][c_buffer] = int(tree[3][node]) # parent ID
		
		tree = np.insert(tree, c_buffer + 1, '', axis=1) # insert node for 'node_c2'
		tree[3][c_buffer + 1] = c_buffer + 1 # node ID
		tree[4][c_buffer + 1] = int(tree[4][node]) + 1 # node_depth
		tree[7][c_buffer + 1] = int(tree[3][node]) # parent ID
		
		tree = np.insert(tree, c_buffer + 2, '', axis=1) # insert node for 'node_c3'
		tree[3][c_buffer + 2] = c_buffer + 2 # node ID
		tree[4][c_buffer + 2] = int(tree[4][node]) + 1 # node_depth
		tree[7][c_buffer + 2] = int(tree[3][node]) # parent ID
		
	else: print ('\n\t\033[31m ERROR! In fx_evolve_child_insert: node', node, 'arity > 3\033[0;0m'); self.fx_karoo_pause() # consider special instructions for this (pause) - 2019 06/08
	
	return tree
	

def fx_evolve_parent_link_fix(self, tree):

	'''
	In a given Tree, fix 'parent_id' for all nodes.
	
	This is automatically handled in all mutations except with Crossover due to the need to copy branches 'a' and 
	'b' to their own trees before inserting them into copies of	the parents.
	
	Technically speaking, the 'node_parent' value is not used by any methods. The parent ID can be completely out 
	of whack and the expression will work perfectly. This is maintained for the sole purpose of granting the user 
	a friendly, makes-sense interface which can be read in both directions.
	
	Called by: fx_evolve_branch_copy
	
	Arguments required: tree
	'''
	
	### THIS METHOD MAY NOT BE REQUIRED AS SORTING 'branch' SEEMS TO HAVE FIXED 'parent_id' ###
	
	# tested 2015 06/05
	for node in range(1, len(tree[3])):
	
		if tree[9][node] != '':
			child = int(tree[9][node])
			tree[7][child] = node
			
		if tree[10][node] != '':
			child = int(tree[10][node])
			tree[7][child] = node
			
		if tree[11][node] != '':
			child = int(tree[11][node])
			tree[7][child] = node
			
	return tree
	

def fx_evolve_node_arity_fix(self, tree):

	'''
	In a given Tree, fix 'node_arity' for all nodes labeled 'term' but with arity 2.
	
	This is required after a function has been replaced by a terminal, as may occur with both Grow mutation and 
	Crossover.
	Called by: fx_evolve_grow_mutate, fx_evolve_tree_prune
	
	Arguments required: tree
	'''
	
	# tested 2015 05/31
	for n in range(1, len(tree[3])): # increment through all nodes (exclude 0) in array 'tree'
	
		if tree[5][n] == 'term': # check for discrepency
			tree[8][n] = '0' # set arity to 0
			tree[9][n] = '' # wipe 'node_c1'
			tree[10][n] = '' # wipe 'node_c2'
			tree[11][n] = '' # wipe 'node_c3'
			
	return tree
	
	
def fx_evolve_node_renum(self, tree):

	'''
	Renumber all 'NODE_ID' in a given tree.
	
	This is required after a new generation is evolved as the NODE_ID numbers are carried forward from the previous 
	generation but are no longer in order.
	
	Called by: fx_evolve_grow_mutate, fx_evolve_crossover, fx_evolve_branch_insert, fx_evolve_branch_copy
	
	Arguments required: tree
	'''
	
	for n in range(1, len(tree[3])):
	
		tree[3][n] = n  # renumber all Trees in given population
		
	return tree
	

def fx_evolve_fitness_wipe(self, tree):

	'''
	Remove all fitness data from a given tree.
	
	This is required after a new generation is evolved as the fitness of the same Tree prior to its mutation will 
	no longer apply.
	
	Called by: fx_nextgen_reproduce, fx_nextgen_point_mutate, fx_nextgen_full_mutate, fx_nextgen_grow_mutate, fx_nextgen_crossover
	
	Arguments required: tree
	'''
	
	tree[12][1:] = '' # wipe fitness data
	
	return tree
	

def fx_evolve_tree_prune(self, tree, depth):

	'''
	This method reduces the depth of a Tree. Used with Crossover, the input value 'branch' can be a partial Tree 
	(branch) or a full tree, and it will operate correctly. The input value 'depth' becomes the new maximum depth,
	where depth is defined as the local maximum + the user defined adjustment.
	Called by: fx_evolve_crossover
	
	Arguments required: tree, depth
	'''
	
	nodes = []
	
	# tested 2015 06/08
	for n in range(1, len(tree[3])):
	
		if int(tree[4][n]) == depth and tree[5][n] == 'func':
			rnd = np.random.randint(0, len(self.terminals) - 1) # call the previously loaded .csv which contains all terminals
			tree[5][n] = 'term' # mutate type 'func' to 'term'
			tree[6][n] = self.terminals[rnd] # replace label
			
		elif int(tree[4][n]) > depth: # record nodes deeper than the maximum allowed Tree depth
			nodes.append(n)
			
		else: pass # as int(tree[4][n]) < depth and will remain untouched
		
	tree = np.delete(tree, nodes, axis = 1) # delete nodes deeper than the maximum allowed Tree depth
	tree = self.fx_evolve_node_arity_fix(tree) # fix all node arities
	
	return tree
	

def fx_evolve_pop_copy(self, pop_a, title):

	'''
	Copy one population to another.
	
	Simply copying a list of arrays generates a pointer to the original list. Therefore we must append each array 
	to a new, empty array and then build a list of those new arrays.
	
	Called by: fx_karoo_gp
	
	Arguments required: pop_a, title
	'''
	
	pop_b = [title] # an empty list stores a copy of the prior generation

	for tree in range(1, len(pop_a)): # increment through each Tree in the current population

		tree_copy = np.copy(pop_a[tree]) # copy each array in the current population
		pop_b.append(tree_copy) # add each copied Tree to the new population list
		
	return pop_b
	

#+++++++++++++++++++++++++++++++++++++++++++++
#   Methods to Visualize a Tree              |
#+++++++++++++++++++++++++++++++++++++++++++++		
		
def fx_display_tree(self, tree):

	'''
	Display all or part of a Tree on-screen.
	
	This method displays all sequential node_ids from 'start' node through bottom, within the given tree.
	
	Called by: fx_karoo_gp, fx_karoo_pause
	
	Arguments required: tree
	'''
	
	ind = ''
	print ('\n\033[1m\033[36m Tree ID', int(tree[0][1]), '\033[0;0m')
	
	for depth in range(0, self.tree_depth_max + 1): # increment through all possible Tree depths - tested 2016 07/09
		print ('\n', ind,'\033[36m Tree Depth:', depth, 'of', tree[2][1], '\033[0;0m')
		
		for node in range(1, len(tree[3])): # increment through all nodes (redundant, I know)
			if int(tree[4][node]) == depth:
				print ('')
				print (ind,'\033[1m\033[36m NODE:', tree[3][node], '\033[0;0m')
				print (ind,'  type:', tree[5][node])
				print (ind,'  label:', tree[6][node], '\tparent node:', tree[7][node])
				print (ind,'  arity:', tree[8][node], '\tchild node(s):', tree[9][node], tree[10][node], tree[11][node])
				
		ind = ind + '\t'
		
	print ('')
	self.fx_eval_poly(tree) # generate the raw and sympified expression for the entire Tree
	print ('\t\033[36mTree', tree[0][1], 'yields (raw):', self.algo_raw, '\033[0;0m')
	print ('\t\033[36mTree', tree[0][1], 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m')
	
	return
	

def fx_display_branch(self, tree, start):

	'''
	Display a Tree branch on-screen.
	
	This method displays all sequential node_ids from 'start' node through bottom, within the given branch.
	
	Called by: This method is not used by Karoo GP at this time.
	
	Arguments required: tree, start
	'''
	
	branch = np.array([]) # the array is necessary in order to len(branch) when 'branch' has only one element
	branch_eval = self.fx_eval_id(tree, start) # generate tuple of given 'branch'
	branch_symp = sympify(branch_eval) # convert string from tuple to list
	branch = np.append(branch, branch_symp) # append list to array
	ind = ''
	
	# for depth in range(int(tree[4][start]), int(tree[2][1]) + self.tree_depth_max + 1): # increment through all Tree depths - tested 2016 07/09
	for depth in range(int(tree[4][start]), self.tree_depth_max + 1): # increment through all Tree depths - tested 2016 07/09
		print ('\n', ind,'\033[36m Tree Depth:', depth, 'of', tree[2][1], '\033[0;0m')
		
		for n in range(0, len(branch)): # increment through all nodes listed in the branch
			node = branch[n]
			
			if int(tree[4][node]) == depth:
				print ('')
				print (ind,'\033[1m\033[36m NODE:', node, '\033[0;0m')
				print (ind,'  type:', tree[5][node])
				print (ind,'  label:', tree[6][node], '\tparent node:', tree[7][node])
				print (ind,'  arity:', tree[8][node], '\tchild node(s):', tree[9][node], tree[10][node], tree[11][node])
				
		ind = ind + '\t'
				
	print ('')
	self.fx_eval_poly(tree) # generate the raw and sympified expression for the entire Tree
	print ('\t\033[36mTree', tree[0][1], 'yields (raw):', self.algo_raw, '\033[0;0m')
	print ('\t\033[36mTree', tree[0][1], 'yields (sym):\033[1m', self.algo_sym, '\033[0;0m')
	
	return

====================

Add a machine-parsable `log_test.json`

Currently the results of the runs are saved into a log_test.txt file. If you want to access these values programmatically, you have to manually parse the file.

In addition to the human-parsable log_test.txt, we could add a machine-parsable log_test.json. This will be useful for both users that want to access the result, and for us during testing.

To match closely log_test.txt, the file could include the following fields:

launched: datetime (str)
dataset: path (str)
outcome: SUCCESS/FAILURE (or EXTINCTION) (str)
results: only in case of SUCCESS (object)
    fittest_tree: tree number (int)
    expression: sympified expression (str)
    fitness_score: c/r/m fitness score (float)
    # only for classification kernel
    precision_recall_report: (object)
    confusion_matrix: (array of arrays)
    # only for regression kernel
    mean_squared_error: (float)

We can also add more fields, including:

the initial arguments (kernel_type, tree_type, population, etc.)
the version of Karoo and possibly a log_test.json format version number
the path of the cwd and possibly the operators file
start_time/end_time/total_time
the raw_expression
an ERROR outcome and an error_msg field

We could also make the creation of this report optional, and triggered with a command line argument (e.g. something like --output-format json/txt/json,txt). This can be added later though -- for now it's easier to just produce both reports.

tf.map_fn in population.py problem

Problem with dtype in tf.map_fn (file population.py), dtype replaced with fn_output_signature

working format:

pred_labels = tf.map_fn(fx_fitness_labels_map,
result, fn_output_signature=(tf.int32, tf.string),
swap_memory=True)

Fitness value truncated to 22 characters

I have chased down a bug whereby the fitness values are silently truncated when stored (in string form) into tree[12][1]. This occurs in fx_fitness_store, viz.:

	def fx_fitness_store(self, tree, fitness):
	
		'''
		Records the fitness and length of the raw algorithm (multivariate expression) to the Numpy array. Parsimony can 
		be used to apply pressure to the evolutionary process to select from a set of trees with the same fitness function 
		the one(s) with the simplest (shortest) multivariate expression.
		
		Called by: fx_fitness_gym
		
		Arguments required: tree, fitness
		'''
	
		#fitness = float(fitness)	# The function is called with a float.
		fitness = round(fitness, self.precision)
		
		#tree[12][1] = fitness # store the fitness with each tree		## rll20210215 
		tree[12][1] = "{:.14e}".format(fitness)					## It seems that the string cannot exceed ~22 characters, else it gets truncated. Why?
		tree[12][2] = len(str(self.algo_raw)) # store the length of the raw algo for parsimony
		# if len(tree[3]) > 4: # if the Tree array is wide enough -- SEE SCRATCHPAD
		
		## debug rll20210215
		if abs(fitness - float(tree[12][1]))/fitness >= 1.0e-14 and not math.isnan(fitness) :
		  print ('Error case found. Tree number: ', tree[0][1])
		  print ('Fitness that was stored: ', fitness)
		  print ('Finess retrieved from tree: ', tree[12][1])
		  
		
		return

I applied a fix in the code snip above, but it is in the way of a kludge, since I do not understand where the string length for the numpy array is being set. I believe that happens in fx_init_tree_initialize, but I see no strings there with length of 22.
Does anyone have an idea about this?

function Inquiry

Good Day,

In the update of Karoo in 2017, it mentions the inclusion of >,<,<=,>=. However, the latest karoo GP does not seem to accept these functions.

Please excuse me if this question is a repeat, but I am finding it hard to run my project without such functions, will they be included?

Logic & Comparison

I understand how logic operators and comparison operators are used in the latest Karoo update, however, I was wondering if the a functionality to implement the statement below could be incorporated:

(a if (c>d) or (c>e) else f)

Currently, this is not possible, but this functionality is very important. Boolean algebra order of operations is NOT then AND then OR.

'Full' trees are not actually full

(This issue affects the current branch but is not recent. It also affects 4154711, which is from March 20)

Karoo supports 3 tree generation methods:

Full: Trees are symmetrical out to the maximum depth. For default max depth (3) this makes a total of 15 nodes.
Grow: Trees add nodes one-by-one, flipping a coin at each one to see whether to terminate or continue (grow) up to the maximum depth. So the number of nodes is always between 3 and 15.
Ramped 50/50: Model is initialized with 50% full trees and 50% grow trees.

In the full method, all of the trees are 15 nodes long as expected. However, in the ramped method (the default) with a population 100 (the default), <20 are 15 nodes (out of expected 50), and the rest shorter - I believe this is a bug.

To see for yourself: Run the default simulation, look at the logs in runs/[date]/population_a.csv, look at the size of the first generation. This issue affects the current branch, but it also affects 4154711 which is from March 20 (as far back as I checked)

My latest update has a different api for generating trees, so the logic on making populations is new, so 50/100 are actually 15 nodes long. For this reason, the outputs are different, and the test doesn't pass.

Install process - Other

Thanks for your hard work.

AS TF, Scikit, packages are evolving a lot, it would be useful to document:
For each release of KAROO_GP:
Dependency list (TF 1.0.1, .....) of Python packages.
Regression test in a folder test_regression:
input_data
test_01.py

We can check install easily as well as the results.

We may want to integrate into other python data processing/code.
is there a way to import karoo as a package ?
import karoo as ka
For some problems, we may want to limit the operation list
(ex: cyclical data, we want to remove sin and keep only keep cos(a*t+b) )
Is there a way to add a config file than karoo can read (so we don't have to change the core source).
What about custom primitive ?
It may help for complex modelling and reducing tree depth.
Operator Primitive: exp( -a *x**2)

Ex: Multi-period data (5 <> period) and we split into medium period to simplify the problem.

evaluate a single Tree against the test data - type error

Hello
I got this package working with tensorflow and have started testing out the various features.
So far I really love the library, but I found an error trying to test a particular tree from the desktop ui.
Not sure what the issue is, but raising it here.

Here's the details.

I installed for linux, on centos.
I am using tensorflow-gpu. Version 1.5.0
I installed cudnn-9.1-linux-x64-v7.tgz

check my version of tensorflow:
[karoo_gp]$ python -c 'import tensorflow as tf; print(tf.version)'
1.5.0

[karoo_gp]$ python karoo_gp_main.py files/data_CLASSIFY.csv

I select classification, and run with defaults until model completes
after the model is done, I select "t" and try to test a model
the following error is raised:

Karoo GP has an ellapsed time of 42.032015

"It is not the strongest of the species that survive, nor the most intelligent,
but the one most responsive to change." --Charles Darwin

Congrats! Your multi-generational Karoo GP run is complete.

Type ? to review your options or q to quit.

 (pause) ?

Select one of the following options:
 i 	 Interactive display mode
 m 	 Minimal display mode
 g 	 Generation display mode
 s 	 Silent display mode
 db 	 De-Bug display mode

 ts 	 adjust the tournament size
 min 	 adjust the minimum number of nodes
 bal 	 adjust the balance of genetic operators

 l 	 list Trees with leading fitness scores
 t 	 evaluate a single Tree against the test data

 p 	 print a single Tree to screen
 id 	 display the current generation ID
 pop 	 list all Trees in current population
 dir 	 display current working directory

 cont 	 continue evolution, starting with the current population
 load 	 load population_s (seed) to replace population_a (current)
 w 	 write the evolving population_b to disk
 q 	 quit Karoo GP without saving population_b

 Remember to archive your final population before your next run!

 (pause) t

 Select a Tree in population_b to test: 8

Traceback (most recent call last):
File "karoo_gp_main.py", line 255, in
gp.fx_karoo_eol()
File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 884, in fx_karoo_eol
self.fx_karoo_pause(1)
File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 577, in fx_karoo_pause
if eol == 1: self.fx_karoo_pause(1) # return to pause menu as the GP run is complete
File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 723, in fx_karoo_pause
result = self.fx_fitness_eval(expr, self.data_test, get_labels=True)
File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1463, in fx_fitness_eval
if get_labels: labels = tf.map_fn(self.fx_fitness_labels_map, result, dtype=[tf.int32, tf.string], swap_memory=True)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/functional_ops.py", line 409, in map_fn
swap_memory=swap_memory)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2934, in while_loop
result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2720, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2662, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/functional_ops.py", line 400, in compute
nest.assert_same_structure(dtype or elems, packed_fn_values)
File "/usr/lib/python2.7/site-packages/tensorflow/python/util/nest.py", line 197, in assert_same_structure
_recursive_assert_same_structure(nest1, nest2, check_types)
File "/usr/lib/python2.7/site-packages/tensorflow/python/util/nest.py", line 156, in _recursive_assert_same_structure
% (type_nest1, type_nest2))
TypeError: The two structures don't have the same sequence type. First structure has type <type 'list'>, while second structure has type <type 'tuple'>.

On unary operators

Kai and I have had an off-line discussion along the following lines. I want to publish my thinking before submitting code changes.

The main branch insists on binarizing unary operators. We do this by prefixing cos(x), say, by +,-,*,/. If we want balance among the four arithmetic primitives, then all four variations must appear in the operators_XXX.csv file.

This seems to me unnecessary. Two changes are required in fx_eval_label():
1. Treat unary operators as left prefixes. Emit op+'('+arg+')'. Walk this node pre-order.
2. Parenthesize the arguments of (infix) binary operators. Emit '('+arg1+')+op+'('+arg2+')'. Walk this node in-order.

There are knock-on effects elsewhere in the code. These arise from the assumption that all operators are binary. I have found most (all?) of them. Fixing them permits unary minus also to work as expected.

The ternary case (if/then/else) eludes me. I know of no equivalent syntax that sympy.sympify will accept. I propose replacing this code with an easily interpretable error message.

user guide not searchable with mobile device

the fact this user guide is exclusively hosted on github makes it impossible to open the file in the mobile browser and thus it is impossible to search without downloading it onto your device and opening it in an app.

Would it be possible to host this pdf somewhere and add a link to the pdf in the readme?

generation of expression that can't be evaluated

I have discovered a small issue while running a classification task using the desktop tool.
After having run large jobs for 15hours plus without issue, I decided to expand on the operators I'm using. After having done so, I'm getting an odd error. For some reason, the mutation function created a symbolic function operator called "zoo" which then failed to evaluate, throwing an error.

I managed to run it and get the same error again, and to capture the output. I think all the details to review the bug are below, but I am happy to provide the training file too if needed. A dump of the error is below, then I suggest some things I spotted digging into the code:

 9 trees [ 1  2  8 10 16 28 39 52 97] offer the highest fitness scores.

	 (pause) l

	 The leading Trees and their associated expressions are:
	  1 : T2*T21 + T26*T9
	  2 : T2*T21 + T26*T9
	  8 : log(T8) + 0.5
	  10 : Ez26 + log(Ez16)
	  16 : log(Ez11) + log(Ez20) + sign(Ez26) + 0.5
	  28 : log(Ez11) + log(Ez20) + sign(Ez7) + sign(T18) + 0.5
	  39 : log(Ez11) + log(Ez20) + cos(Ez26)*sign(Ez35)*sign(T7) + sign(T23) + 0.5
	  52 : Ez31 + log(Ez20) + sin(Ez11)
	  97 : sqrt(Ez31) + log(Ez20) + sin(Ez11) + 0.1

	 (pause) 

 Copy gp.population_b to gp.population_a


 Evolve a population of Trees for Generation 4 ...
  Perform 10 Reproductions ...
  Perform 0 Point Mutations ...
  Perform 20 Full or Grow Mutations ...
  Perform 70 Crossovers ...

 Evaluate all Trees in Generation 4
	Tree 1 yields (sym): sqrt(Ez31) + log(Ez20) + sin(Ez11) + 0.1 
	Tree 2 yields (sym): Ez5 + log(Ez25) 
	Tree 3 yields (sym): log(Ez11) + 2*log(Ez20) + sign(Ez11) + sign(Ez7) + 0.5 
	Tree 4 yields (sym): log(Ez20) + 0.5 
	Tree 5 yields (sym): log(Ez11) + log(Ez20) + cos(Ez26)*sign(Ez35)*sign(T7) + sign(T23) + 0.5 
	Tree 6 yields (sym): T9*sign(T26) + log(Ez11) + log(Ez20) + sign(T23) + 0.5 
	Tree 7 yields (sym): log(Ez20) + log(T9) + sign(Ez7) + sign(T23) + 0.5 
	Tree 8 yields (sym): Ez4 + sign(Ez7) + sign(T23) + sin(4)/Ez8 
	Tree 9 yields (sym): Ez31 + log(Ez20) + sin(Ez11) 
	Tree 10 yields (sym): log(Ez20) + log(T9) + sign(Ez7) + sign(T23) + 0.5 
	Tree 11 yields (sym): -Ez17 + 2*log(Ez20) + sign(Ez11) + sign(Ez28) + sign(Ez7) + 0.5 + 10.0*log(T26)/sqrt(Ez10) 
	Tree 12 yields (sym): log(Ez21) + sin(Ez23) + 0.5 
	Tree 13 yields (sym): Ez5 + log(Ez11) + log(Ez20) + sin(T33) + sign(Ez7) + sign(T23) 
	Tree 14 yields (sym): log(Ez11) + log(T10) + cos(Ez26)*sign(Ez35)*sign(T7) + sign(T23) + 0.5 
	Tree 15 yields (sym): Ez5*sign(Ez22) + T16 + log(Ez4)/sign(Ez13) 
	Tree 16 yields (sym): log(Ez11) + log(Ez20) + sign(T14) + 0.5 
	Tree 17 yields (sym): log(Ez11) + log(Ez20) + sign(Ez51) + 0.5 
	Tree 18 yields (sym): Ez24 + log(Ez16) 
	Tree 19 yields (sym): T19 + abs(T7)*sign(Ez7) + log(Ez20) + sign(T23) 
	Tree 20 yields (sym): log(Ez11) + log(Ez20) + cos(Ez26)*sign(T35)*sign(T7) + sign(T23) + 0.5 
	Tree 21 yields (sym): Ez54*log(Ez48)*tan(T30)/sqrt(Ez9) + log(Ez20) + sign(T5) + 0.5 
	Tree 22 yields (sym): abs(Ez5) + zoo*log(Ez12) + log(Ez4) + 0.5 
                                      ^^^^^^^^^^
Traceback (most recent call last):
  File "karoo_gp_main.py", line 251, in <module>
    gp.fx_eval_generation() # evaluate all Trees in a single generation
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1291, in fx_eval_generation
    self.fx_fitness_gym(self.population_b) # run 'fx_eval', 'fx_fitness', 'fx_fitness_store', and fitness record
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1339, in fx_fitness_gym
    result = self.fx_fitness_eval(expr, self.data_train)
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1424, in fx_fitness_eval
    result = self.fx_fitness_expr_parse(expr, tensors)
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1517, in fx_fitness_expr_parse
    return self.fx_fitness_node_parse(tree, tensors)
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1567, in fx_fitness_node_parse
    return operators[type(node.op)](self.fx_fitness_node_parse(node.left, tensors), self.fx_fitness_node_parse(node.right, tensors))
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1567, in fx_fitness_node_parse
    return operators[type(node.op)](self.fx_fitness_node_parse(node.left, tensors), self.fx_fitness_node_parse(node.right, tensors))
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1567, in fx_fitness_node_parse
    return operators[type(node.op)](self.fx_fitness_node_parse(node.left, tensors), self.fx_fitness_node_parse(node.right, tensors))
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1567, in fx_fitness_node_parse
    return operators[type(node.op)](self.fx_fitness_node_parse(node.left, tensors), self.fx_fitness_node_parse(node.right, tensors))
  File "/home/andrew/gep/karoo_gp/karoo_gp_base_class.py", line 1560, in fx_fitness_node_parse
    return tensors[node.id]
KeyError: 'zoo'
[andrew@srv02 karoo_gp]$ head -1  files/Karoo_dp_30_train_all_stocks_coef.csv 
T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,T18,T19,T20,T21,T22,T23,T24,T25,T26,T27,T28,T29,T30,T31,T32,T33,T34,T35,Ez1,Ez2,Ez3,Ez4,Ez5,Ez6,Ez7,Ez8,Ez9,Ez10,Ez11,Ez12,Ez13,Ez14,Ez15,Ez16,Ez17,Ez18,Ez19,Ez20,Ez21,Ez22,Ez23,Ez24,Ez25,Ez26,Ez27,Ez28,Ez29,Ez30,Ez31,Ez32,Ez33,Ez34,Ez35,Ez36,Ez37,Ez38,Ez39,Ez40,Ez41,Ez42,Ez43,Ez44,Ez45,Ez46,Ez47,Ez48,Ez49,Ez50,Ez51,Ez52,Ez53,Ez54,0.1,0.2,0.3,0.4,0.5,1,2,3,4,5,s
[andrew@srv02 karoo_gp]$ ls files/
coefficients.csv   data_MATCH.csv  data_REGRESS.csv  Karoo_dp_30_train_all_stocks_coef.csv  operators_MATCH.csv  operators_REGRESS.csv
data_CLASSIFY.csv  data_PLAY.csv   Iris_dataset      operators_CLASSIFY.csv                 operators_PLAY.csv   templates
[andrew@srv02 karoo_gp]$ cat files/operators_CLASSIFY.csv
operator, arity
+,2
-,2
*,2
/,2
+,2
-,2
*,2
/,2
-,2
*,2
/,2
+,2
-,2
*,2
/,2
+ abs,2
- abs,2
* abs,2
/ abs,2
+ log,2
- log,2
* log,2
/ log,2
+ sign,2
- sign,2
* sign,2
/ sign,2
+ sqrt,2
- sqrt,2
* sqrt,2
/ sqrt,2
+ sin,2
- sin,2
* sin,2
/ sin,2
+ tan,2
- tan,2
* tan,2
/ tan,2
+ cos,2
- cos,2
* cos,2
/ cos,2
[andrew@srv02 karoo_gp]$ python karoo_gp_main.py files/Karoo_dp_30_train_a^C
[andrew@srv02 karoo_gp]$ cat karoo_gp_main.py | grep zoo
[andrew@srv02 karoo_gp]$ #

What I think may be going on, is that the translation of the raw tree of operations into a simpler symbolic representation (which happens in sympy from what I can gather), is evaluating a randomly generated expression as one representing a "complex infinity," which it names as the symbol "zoo".

There are some notes in the sympy docs about zoo here:
http://docs.sympy.org/latest/modules/core.html

scroll down till you see:

class sympy.core.numbers.ComplexInfinity[source]
Complex infinity.

In complex analysis the symbol ∞̃ ∞~, called “complex infinity”, represents a quantity with infinite magnitude, but undetermined complex phase.

ComplexInfinity is a singleton, and can be accessed by S.ComplexInfinity, or can be imported as zoo.

See also Infinity

Perhaps simply killing off mutations that are non-viable prior to evaluation could be a quick fix. It would mean filtering out symbolic trees that have strings found in a quarantine list, then running fitness evaluations on the filtered list only. Open to other suggestions too.

Andrew

Fix `-h/--help` in `karoo-gp.py`

Currently running karoo-gp.py starts the interactive mode without printing any help. This is because of this line:

karoo_gp/karoo-gp.py

Line 72 in 76613da

 if len(sys.argv) < 3: # either no command line argument, or only a filename is provided 

This should be refactored to use argparse from the start, and not just in the else to parse the cmd-line args.

Get Sklearn baseclass working with Regression

How to introduce "if" condition to Karoo? (+Approach)

How can the 'if-condition' be introduced to Karoo without problems with sympy, ast or tensorflow?

I am currently developing a special gp-framework based on Karoo which shall include if-then-else. As it is already partially included, maybe you can help me out.

Problem

Closure
I assume a conversion between numbers and bool is prefered. (positive values (>) are True, True is integer 1), etc. .
(Is this better than introducing 'bool'-operators?)

Tensorflow:
If-then-else needs to become a tensor.

Sympify does not support if (a) then (b) else (c). (and ITE() is bullshit)
Solutions:

Get rid of sympify.
Yeah, obviously not.
Using Piecewise expression: Piecewise((expr, cond),(expr, cond), ...), converted to if-then-else: Piecewise((b, a),(c, True))
-> Is imho too far away from the original if-then-else statement
-> Tensorflow seemingly offers no 'Piecewise'-similar tensor
Passing through custom expressions like ifte(a, b, c). These get sympify'ed as far as possible.
-> Ast+Tensorflow run into problems with partially sympify'ed expressions like
ifte(True, 1, 2) (True is a nameConstant) or
ifte(variable_1*2, b, c) (Tensor not of type boolean).
The expression can not be further reduced by sympify.
Write own sympify class (see below). Unfortunately this does not work, sympify is called on a string where the class is not known.

from sympy import Function

class ifte(Function):
    nargs = 3

    @classmethod
    def eval(cls, a, b, c):
        return b if a else c

My current solution

If is modeled as ifte(a, b, c) (sympify function passthrough).
ast-node functions are checked for ifte, the conditional-tensor is forced to be boolean
Karoo operators: 'ifte': tf.compat.v2.where,

This does not make me completely happy though.
(Also: Sympify's / 0-caused zoo is happening. Is string-replaced with '1000000'.)

What is the best solution?

What do you think would be the best approach to include if-then-else?
Are my assumptions correct?
Also, I am not sure how to handle closure (e. g. replacing sin(x) with ifte(x)).

Tensorflow meta-issue

As part of the engine-api PR, there is an option to choose NumpyEngine or TensorflowEngine. This has led to some discussion about what to do wrt tensorflow. There were several ongoing conversations, so I combine them here.

Outstanding issues:

We're using tensorflow 1, which is outdated and partially deprecated, so we should update to tensorflow 2.
tf is imported lazily by the TensorflowEngine because (at least for me) it takes about 2s extra to load. So if you use the NumpyEngine instead, you can save those 2 seconds by avoiding importing tf at all. The way I've done it is by copying a LazyLoader class from tensorflow themselves. This seems inelegant, and also may be sensitive to licensing. Should look for a better solution, like maybe importing tf in TensorflowEngine.__init__().
@asksak has asked a few questions about tensorflow:
- #39 Utilizing AMD GPU. Sounds like you got it to work, anything we should update?
- #66 Pointed out the limitations of tensorflow v1; noted.
- #72 Found a problem with tf.map_fn(); this is removed in engine-api and shouldn't be an issue going forward.

Finally, it's not obvious to me that tensorflow will ever be faster than numpy for what we're doing. It seems that tensorflow is fast when:

Working with matrices (2d), while we work with arrays (1d)
Doing multiplication or dot-products specifically, while we do many different operations

Anyway we should continue to support it, but monitor the performance and make sure users are getting the optimal performance.

Problem with new karoo

Hey;
Kstaats; After installing tensorflow and running the new karoo and finding out that the new karoo does not support exp and the other functions and deciding to go back to old karoo as I need them more than the speed only to find out that I cannot run old karoo no longer and simply getting the following the error:
` We have constructed a population of 100 Trees for Generation 1

Evaluate the first generation of Trees ...
Traceback (most recent call last):
File "karoo_gp_main.py", line 208, in
gp.fx_fitness_gym(gp.population_a) # 1) extract polynomial from each Tree; 2) evaluate fitness, store; 3) display
File "/home/abdu/Downloads/karoo/karoo_gp_base_class.py", line 1414, in fx_fitness_gym
self.fx_eval_poly(population[tree_id]) # extract the expression
File "/home/abdu/Downloads/karoo/karoo_gp_base_class.py", line 1185, in fx_eval_poly
self.algo_raw = self.fx_eval_label(tree, 1) # pass the root 'node_id', then flatten the Tree to a string
File "/home/abdu/Downloads/karoo/karoo_gp_base_class.py", line 1246, in fx_eval_label
return self.fx_eval_label(tree, tree[9, node_id]) + tree[6, node_id] + self.fx_eval_label(tree, tree[10, node_id])
File "/home/abdu/Downloads/karoo/karoo_gp_base_class.py", line 1236, in fx_eval_label
if tree[6, node_id] == 'not': tree[6, node_id] = ', not' # temp until this can be fixed at data_load
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
`
The problem did not appear before installing tensorflow. Even worse; after uninstalling tensorflow, I still get the same error. I hope you have some recommendation of how I might fix the problem.
Best;

Switch tree navigation from breadth-first to depth-first

Karoo Classic uses a breadth-first-search ordering (BFS) when navigating trees. The new Node API makes heavy use of recursion, which is by nature depth-first (DFS). We've implemented workarounds for breadth-first to guarantee continuity of test results through all the recent changes.

We've decided to switch to DFS by default, and remove support for BFS because:

We'd prefer to choose just one to support, so we don't have to write/update everything twice.
BFS is theoretically faster and light, requires less code, and we use a lot of recursion already.

NOTE: We've discussed that Python handles recursion a bit differently, so there are some implicit limits. However, we don't expect to reach those limits (works fine at tree_depth_base=15), and if we do, they can be increased manually.

What's the impact on Trees?

'Grow' trees are generated stochastically. Starting from the root node, at every 'child' position, flip a coin to decide whether to add a function (with additional children) or a terminal (no children). So they result in irregularly-shaped trees.

We'd expect that a population of grow trees generated with BFS and DFS will have different 'shapes'. Here's a diagram showing that DFS trees have a higher probability of being deep and wide (i.e. have lots of children).

I did an experiment where I generated 1000-each trees using DFS and BFS, and compare the frequency of depths and widths. Checks out.

So, expect a PR with this soon.

/proc/cpuinfo is not defined on Macs

$ python karoo_gp_main.py
Traceback (most recent call last):
  File "karoo_gp_main.py", line 35, in <module>
    import karoo_gp_base_class; gp = karoo_gp_base_class.Base_GP()
  File "/Users/bill/programming/karoo_gp/karoo_gp_base_class.py", line 116, in __init__
    self.core_count = pp.get_number_of_cores() # pprocess
  File "/usr/local/lib/python2.7/site-packages/pprocess.py", line 939, in _get_number_of_cores
    f = open("/proc/cpuinfo")
IOError: [Errno 2] No such file or directory: '/proc/cpuinfo'

Probably this should have exception handling, and return a default value if an error occurs.

Conditionals and Logical Operators

As part of the ongoing redesign, I'd like to optimize if and if else operations. This is also discussed in #27 and there is some good guidance there.

A logical tree should be able to express something like "If volume is above 100, price squared over 200, else short / 300." In a poorly-rendered tree structure that looks like:

               IF
     >         /          /
 vol  100   **  200  short  300
        price 2

As I see it, there ought to be three classes of nodes: operators, terminals and booleans(new). In addition to arity, different types of nodes now specify that their children be a certain class.

Class	Type	Symbol	Arity	Child Class
operator	arithmetic	`+, -, , /, *`	2	op/tm
operator	logic	`IF, IFELSE`	2, 3	bool, op/tm
boolean	comparison	`==, !=, >, <, >=, <=`	2	op/tm
boolean	logic	`AND, OR, NOT`	2, 1	bool
terminal	inputs	`a, b, c`	0	-
terminal	constants	`1, 2, 3`	0	-

Redo Tensorflow Comparison properly (CPU vs GPU)

Compare to original paper: https://arxiv.org/pdf/1708.03157.pdf

Methodically changing code to be suitable for Python 3

Hello,
I am working with Python 3.5
I have methodically made changes to karoo_gp_base_class.py, karoo_gp_main.py and karoo_gp_server.py to be compatible with Python 3.
I have now reached an error that I am unsure how to fix: (After accepting all default options for a run of karoo_gp_main I get)

Traceback (most recent call last):
File "karoo_gp_main.py", line 210, in
gp.fx_fitness_gym(gp.population_a) # generate expression, evaluate fitness, compare fitness
File "C:\Users\shiri\Documents\karoo_gp-master\karoo_gp_base_class.py", line 1351, in fx_fitness_gym
print ('\n\033[36m ', len(self.fittest_dict.keys()), 'trees\033[1m', np.sort(self.fittest_dict.keys()), '\033[0,0m\033[36moffer the highest fitness scores.\033[0,0m')
File "C:\Anaconda3\envs\py35\lib\site-packages\numpy\core\fromnumeric.py", line 822, in sort
a.sort(axis=axis, kind=kind, order=order)
numpy.core._internal.AxisError: axis -1 is out of bounds for array of dimension 0

Please advise how to fix this issue

How can I define new Functions (Operators)?

There are some project related to symbolic regression in github.
such as: gplearn/glyph/SimpleGP...
and they are slow compared to karoo_gp which can support multicore CPU/GPU.

But How can I define new Functions (Operators) in karoo_gp?
I find that your operators_list.txt just include some basic Functions, but when I want to do something about time Series, I want to use some functions such as:
1.delay(x, d) = value of x d days ago
2.correlation(x, y, d) = time-serial correlation of x and y for the past d days
3.delta(x, d) = today’s value of x minus the value of x d days ago
4.ts_min(x, d) = time-series min over the past d days
Can you give me some advice to define new functions in karoo_gp ?
Thank you

Square Root Problem

Hello,

As I was working on Karoo, and when testing I observed this:

SQUARE ROOT causes the runs to become very slow, and the more iterations are run the slower Karoo became until I got memory error. I googled the issue and it seems its a sympy issue with sqrt.

I used 50 as population and (as I modified the code) I used 2000 as iterations. the program ran very fast and completed without sqrt. However with sqrt, it was slow, then became unresponsive then gave a memory error.

FYI,

Aymen

Sporadic `ERROR! In fx_evolve_grow_mutate: branch_depth -1 < 0`

The following command sometimes fails with ERROR! In fx_evolve_grow_mutate: branch_depth -1 < 0:

python3 karoo-gp.py -ker r -typ g -bas 5 -pop 20 -fil karoo_gp/files/data_REGRESS.csv

To reproduce, you can run it in a loop until it fails using:

while true; do python3 karoo-gp.py -ker r -typ g -bas 5 -pop 20 -fil karoo_gp/files/data_REGRESS.csv; done

After the error it stops in the pause menu, which shouldn't happen for this type of runs.

Need to provide ERCs

Success in most symbolic regression and classification problems depends on the use of Ephemeral Random Constants as part of the solution library. As far as I understand it, in this version one has to explicitly provide specific constants, as part of the training data set.

As things stand, even with the availability of user-provided specific constants, any search that requires numerical optimization of coefficients is spending most of its effort on constructing numerical arguments, rather than actually fitting qualitative curves or decision planes.

A simple example:

One of the pedagogic examples I always use is the birthday polynomial. Given your own birthday in year month-number day-number format, and then attempt symbolic regression on a dataset of

y = year + month-number * x + day-number * x * x

Without ERCs, symbolic regression on this simple polynomial spends most of its effort trying to construct large numbers from units. The interplay between this "sub-task" of actually constructing large constants from small ones, and the more important task of explaining the data (which is usually the intent of the user) can lead to trouble, since from the standpoint of the GP system itself there is no differentiation between these tasks.

Error

Hello,

I got a consistent error with one of my setups. The data is error free and works with other operator setups but not this one:

+,2
-,2

=,2
/,2
*,2
or,2

`Evaluate all Trees in Generation 4
ValueError: Error from parse_expr with transformed code: "(((Symbol ('A17x' ))>=(Symbol ('A6x' )))>=((Symbol ('A3x' ))>=(Symbol ('A22x' ))))"

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/aymansalsaket/Desktop/new/karoo-gp.py", line 593, in
gp.fit(X, y)
File "/Users/aymansalsaket/Desktop/new/karoo_gp/base_class.py", line 802, in fit
super().fit(X, y, *args, **kwargs)
File "/Users/aymansalsaket/Desktop/new/karoo_gp/base_class.py", line 458, in fit
self.population.evaluate(X_train, y_train, self.X_train_hash)
File "/Users/aymansalsaket/Desktop/new/karoo_gp/population.py", line 93, in evaluate
predictions = self.model.batch_predict(X, self.trees, X_hash)
File "/Users/aymansalsaket/Desktop/new/karoo_gp/base_class.py", line 492, in batch_predict
y = self.engine.predict(trees, X, X_hash)
File "/Users/aymansalsaket/Desktop/new/karoo_gp/engine.py", line 155, in predict
expr = tree.expression
File "/Users/aymansalsaket/Desktop/new/karoo_gp/tree.py", line 78, in expression
return self.root.parse(simplified=True)
File "/Users/aymansalsaket/Desktop/new/karoo_gp/node.py", line 317, in parse
return (f'({self.children[0].parse(simplified)}{ws}{self.label}'
File "/Users/aymansalsaket/Desktop/new/karoo_gp/node.py", line 318, in parse
f'{ws}{self.children[1].parse(simplified)})')
File "/Users/aymansalsaket/Desktop/new/karoo_gp/node.py", line 318, in parse
f'{ws}{self.children[1].parse(simplified)})')
File "/Users/aymansalsaket/Desktop/new/karoo_gp/node.py", line 318, in parse
f'{ws}{self.children[1].parse(simplified)})')
File "/Users/aymansalsaket/Desktop/new/karoo_gp/node.py", line 322, in parse
f'{ws}else{ws}{self.children[2].parse(simplified)})')
File "/Users/aymansalsaket/Desktop/new/karoo_gp/node.py", line 321, in parse
f'{ws}if{ws}{self.children[1].parse(simplified)}'
File "/Users/aymansalsaket/Desktop/new/karoo_gp/node.py", line 289, in parse
result = str(sympify(raw_expr))
File "/opt/anaconda3/envs/Master-TF/lib/python3.9/site-packages/sympy/core/sympify.py", line 495, in sympify
expr = parse_expr(a, local_dict=locals, transformations=transformations, evaluate=evaluate)
File "/opt/anaconda3/envs/Master-TF/lib/python3.9/site-packages/sympy/parsing/sympy_parser.py", line 1105, in parse_expr
raise e from ValueError(f"Error from parse_expr with transformed code: {code!r}")
File "/opt/anaconda3/envs/Master-TF/lib/python3.9/site-packages/sympy/parsing/sympy_parser.py", line 1096, in parse_expr
rv = eval_expr(code, local_dict, global_dict)
File "/opt/anaconda3/envs/Master-TF/lib/python3.9/site-packages/sympy/parsing/sympy_parser.py", line 915, in eval_expr
expr = eval(
File "", line 1, in
TypeError: '>=' not supported between instances of 'GreaterThan' and 'GreaterThan'
`

Tree Error!

Hello all,

please review the tree below, which is a result of running the current version, and also occurred before the last commits:

What is this section of the tree?

sl
pw
sl

In fact I'm finding it hard to read this LISP tree, the whole tree does not make sense?

I hope I am wrong, but if not, this is the main part that needs current attention.

Error

Dear Kai,

I ran Karoo GP as usual:

sudo nice -1 python karoo-gp.py

and the following error stopped Pause from loading at run-end:

Can't use "t" in the pause menu twice in a row

If I run in interactive mode (I've also seen it at the end of a "minimal" mode run), then I can use the pause menu at most once per generation:

  2 trees [ 1 38] offer the highest fitness scores.

         (pause) t
...
 Confusion matrix:
[[3072  984]
 [2997  947]]

         (pause) t
         Select from the options given. Try again ...

         (pause) ?
         Select from the options given. Try again ...

         (pause) p
         Select from the options given. Try again ...

         (pause) 30
         Select from the options given. Try again ...

I'm guessing it's some kind of parsing bug where the state of some buffer is left inconsistent after the first "t" operation. All I can do now is hit enter and let the next generation run, after which I'll be able to use "t" on (at most) one of the winners from that generation.

Functions that produce nan/inf values

Some of the operators we support will produce unusable values (nan or inf) in the course of normal use:

Operator	X == 0	X > 1e3	X < 0	X > 1
`/`	nan*
`**`		inf
`sqrt`	nan*		nan
`log`	-inf		nan
`log1p`			nan
`arcsin`				nan
`arccos`				nan

*We currently use helper functions for division and square root which ignore 0s.

What to do?

Here are 3 ideas:

Deal with them case-by-case.
- / and sqrt seem ok for now.
- log1p is a built-in function that extends log by ignoring 0s. We could add a helper which does sign(x) * log1p(abs(x)).
- arccos and arcsin are maybe rare enough, we could add a check in karoo.fit() when using them that -1 < X < 1, else raise a ValueError.
- That leaves **. X > 1e3 happens frequently with small numbers too when combined with other operators, e.g. 2 ** (1 / .001). Replacing with 0 is the simplest option, but it's a big nonlinearity (as X increases, outputs get exponentially larger and then drop to 0).
Accept a kwarg with a replacement value (e.g. 0) in the case that a nan and/or inf is produced. Basically like we do in the *'s above, for everything.
If and when a tree produces a nan or inf, just remove it from the gene pool and don't bother scoring it. This is basically the method used by swim, i.e. eliminate trees with less than the minimum number of nodes.

I lean toward 3.

tensorflow version used is outdated

I get the following error when trying to run the package, it seems that the version of tensorflow used in the package is an older version:

File "modules\karoo_gp_base_class.py", line 59, in
'log': tf.log, # e.g., log(a)
AttributeError: module 'tensorflow' has no attribute 'log'

Fittest tree issue

In GP, the fittest tree in a generation is, in one scenario, carried forward to the next generation.

With this in mind, the fittest tree in any generation cannot be lower than the inherited tree.

In my tests, I noticed that the fittest tree value drops, which does not conform to the "survival of the fittest" as described by Koza.

I wonder what you think?

Mutation and Crossover

Hello Everyone,

I came across these formulae with regards to efficient mutation and cross over:

Perhaps dynamic mutation and crossover are important to maintain a positive progression of the optimization.

Best,

Aymen

Use ncurses for display

ncurses will be much cleaner than current situation with print and codes.
Actually, ncurses is exactly the tool for such a job.
The only problem's that I am unsure if it will work under Windows.

TypeError: '<' not supported between instances of 'GreaterThan' and 'GreaterThan'

I get the following error: (note that I use anaconda navigator to create an environment with the required versions of the dependent modules) and the following command line and functions:

operator, arity
+,2
-,2
/,2
*,2
<,2
>=,2
or,2
and,2

(macos-TF2) aymansalsaket@AYMANs-MacBook-Pro latest % python -V
Python 3.8.8
(macos-TF2) aymansalsaket@AYMANs-MacBook-Pro latest % sudo nice -1 python karoo-gp.pypython -V 

Select (c)lassification, (r)egression, (m)atching, or (p)lay (default m): c
Select (f)ull, (g)row, or (r)amped 50/50 method (default r): r
Enter depth of the initial population of Trees (default 3): **7**
Enter maximum Tree depth (default 7): **7**
Enter minimum number of nodes for any given Tree (default 3; max 255): **5**
Enter number of Trees in each population (default 100): 75
Enter max number of generations (default 10): 100

In the settings above only 5,5,3 work till the end. Here, I used 7,7,5 and it crashed as shown in the trace:

ERROR TRACE:

Evaluate all Trees in Generation 4
ValueError: Error from parse_expr with transformed code:
   "(((Symbol ('A1x' ))>=(Symbol ('A21x' )))<((Symbol ('A11x' ))>=(Symbol ('A2x' ))))"

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "karoo-gp.py", line 593, in <module>
    gp.fit(X, y)
  File "/Users/aymansalsaket/Desktop/latest/karoo_gp/base_class.py", line 804, in fit
    super().fit(X, y, *args, **kwargs)
  File "/Users/aymansalsaket/Desktop/latest/karoo_gp/base_class.py", line 460, in fit
    self.population.evaluate(X_train, y_train, self.X_train_hash)
  File "/Users/aymansalsaket/Desktop/latest/karoo_gp/population.py", line 93, in evaluate
    predictions = self.model.batch_predict(X, self.trees, X_hash)
  File "/Users/aymansalsaket/Desktop/latest/karoo_gp/base_class.py", line 494, in batch_predict
    y = self.engine.predict(trees, X, X_hash)
  File "/Users/aymansalsaket/Desktop/latest/karoo_gp/engine.py", line 155, in predict
    expr = tree.expression
  File "/Users/aymansalsaket/Desktop/latest/karoo_gp/tree.py", line 76, in expression
    return self.root.parse(simplified=True)
  File "/Users/aymansalsaket/Desktop/latest/karoo_gp/node.py", line 203, in parse
    return (f'({self.children[0].parse(simplified)}'
  File "/Users/aymansalsaket/Desktop/latest/karoo_gp/node.py", line 203, in parse
    return (f'({self.children[0].parse(simplified)}'
  File "/Users/aymansalsaket/Desktop/latest/karoo_gp/node.py", line 200, in parse
    return (f'({self.children[0].parse(simplified)}{ws}{self.label}'
  File "/Users/aymansalsaket/Desktop/latest/karoo_gp/node.py", line 200, in parse
    return (f'({self.children[0].parse(simplified)}{ws}{self.label}'
  File "/Users/aymansalsaket/Desktop/latest/karoo_gp/node.py", line 200, in parse
    return (f'({self.children[0].parse(simplified)}{ws}{self.label}'
  File "/Users/aymansalsaket/Desktop/latest/karoo_gp/node.py", line 172, in parse
    result = str(sympify(raw_expr))
  File "/opt/anaconda3/envs/macos-TF2/lib/python3.8/site-packages/sympy/core/sympify.py", line 495, in sympify
    expr = parse_expr(a, local_dict=locals, transformations=transformations, evaluate=evaluate)
  File "/opt/anaconda3/envs/macos-TF2/lib/python3.8/site-packages/sympy/parsing/sympy_parser.py", line 1105, in parse_expr
    raise e from ValueError(f"Error from parse_expr with transformed code: {code!r}")
  File "/opt/anaconda3/envs/macos-TF2/lib/python3.8/site-packages/sympy/parsing/sympy_parser.py", line 1096, in parse_expr
    rv = eval_expr(code, local_dict, global_dict)
  File "/opt/anaconda3/envs/macos-TF2/lib/python3.8/site-packages/sympy/parsing/sympy_parser.py", line 915, in eval_expr
    expr = eval(
  File "<string>", line 1, in <module>
TypeError: '<' not supported between instances of 'GreaterThan' and 'GreaterThan'

gpu issue

hello, I am using karoo in my work, buy I could not use karoo with my gpu. I set self.tf_device='/gpu:0' in base_class.py, and still no gpu use. I am sure my cuda enviroment is right cause other programs are all running well with gpu. So where should I set for gpu?

Latest KarooGP Issues

Hello Kai,

I tested the latest version of KarooGP. It seems it runs for arithmetic operators only at the moment, however, slick and fast as never before. Brilliant progress. Math and Logic are not supported as far as my test shows.

Now, as I have already converted the original KarooGP to work with all operators, I suggest the following:

1. For logical operators I used this:

ifgreaterthan(a,b)

This is mapped to tf.math.greater, otherwise order of operations does not work in sympify as it does sporadic evaluations even when EVALUATE is set to FALSE.

2. For arithmetic and math operators I used one of two:

addK(a,b) etc... where addK is a string that tensorflow is set to map it to tf.math.add
or
a+b

both work fine, but the first works flawlessly. Something to think about.

3. For Logical evaluations:

Since the result of a full logical tree will be 1 or 0, the fastest way was to cast the final result of the evaluation to float, so I did not have to modify anything else in the code.

In fx_fitness_node_parse #Do this

elif isinstance(node, ast.Call):
#print("()")
return tf.cast(operators[node.func.id](*[self.fx_fitness_node_parse(arg, tensors) for arg in node.args]), tf.float32)

NOTE: My version of KarooGP works flawlessly with all operators including logic, however, long equations with brackets take forever to crossover. Therefore, the current development process is clearly overcoming the biggest problem.

I hope the team get some good insight into some possible options.

Best,

Aymen

KarooGP on Mac with AMD GPU

It is possible to use the discrete AMD GPU on both MacBook Pro and iMAc by using tensorflow-macos and tensorflow-metal.

The current KarooGP implementation must be configured to run on the AMD GPU for this to work, especially with the NEW MAC STUDIO with prolific CPU and GPU power out soon.

Sympy & Numpy Issue

Hello,

After doing some testing, I am nearly certain that Sympy & Numpy don't play well together, especially when tree depth and min nodes are higher than 5.

There are many references on stack overflow regarding this issue.

pow(a,b) not supported as such

The code assumes that binary operators are infix. No prefix binary operator is handled correctly, viz., 'atan2(a,b)', 'pow(a,b)'. While atan2 is noted as "NOT supported at this time" in file operators_list.txt, we do not so denote pow.

I propose to add 'pow(a,b)' to the "NOT supported" list, with a parenthetical note that the infix form (a**b) is supported.

Additionally, 'pow' appears twice in the set 'operators' in base_class.py, first as "ast.Pow: tf.pow, # e.g., a ** 2", and later as "'pow': tf.pow, # e.g., pow(a, b)".

I propose striking the latter.

Tensorflow v2 behaviour

I notice using tf v1 compatibility. Perhaps it's better to decorate @tf.function in fx_fitness_eval and get rid of graph initialisation and the session. It is way more efficient to let tensorflow autograph and functions are way less expensive than sessions.

Best

Aymen

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.

kstaats / karoo_gp Goto Github PK

karoo_gp's Introduction

Karoo GP

karoo_gp's People

Contributors

Stargazers

Watchers

Forkers

karoo_gp's Issues

Karoo 3.0 (include)

Karoo 3.N (later)

Karoo GP Base Class

Define the methods and global variables used by Karoo GP

by Kai Staats, MSc with TensorFlow support provided by Iurii Milovanov; see LICENSE.md

pip install package preparation by Antonio Spadaro and Ezio Melotti

version 2.4 for Python 3.8

TensorFlow Imports and Definitions

import tensorflow as tf

What's the impact on Trees?

A simple example:

What to do?

Recommend Projects

Recommend Topics

Recommend Org