S&P 500 Stock Clustering

This notebook demonstrates a clustering of the stocks in the S&P 500 index, based on a select set of financial figures.

The index tracks 500 companies, but includes 505 common stocks, because 5 companies have two share classes in the index (Alphabet, Discovery, Fox, News Corp and Under Armour).

# libraries for making requests and parsing HTML
import requests
from bs4 import BeautifulSoup

# plotting
import matplotlib.pyplot as plt
import seaborn as sns

# sklearn for kmeans and model metrics
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

# pandas, for data wrangling
import pandas as pd

Data Acquisition

For the data I wanted access to, the existing APIs for financial data did not work out. Instead, I decided to manually scrape the data, using Wikipedia and Yahoo Finance:

scrape the list of S&P 500 tickers from Wikipedia
scrape the financial figures for each stock ticker from Yahoo Finance

# URL to get S&P tickers from
TICKER_URL = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'

# multi-level CSS selector, to select every cell (td) of the ticker table in the HTML response
TABLE_IDENTIFIER = '#constituents tbody tr td'

# yahoo finance URL we can use to scrape data for each company
YAHOO_URL = 'http://finance.yahoo.com/quote/'

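# HTML attributes for the summary values on a Yahoo Finance quote page

# CSS class shared by the value cells
YAHOO_TABLE_CLASS = 'Ta(end) Fw(600) Lh(14px)'
# data-test attribute values for: open price, EPS (TTM), dividend/yield
# (the P/E ratio, 'PE_RATIO-value', is left out to match the four DataFrame columns used later)
YAHOO_IDS = ['OPEN-value', 'EPS_RATIO-value', 'DIVIDEND_AND_YIELD-value']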

# get HTML content from wikipedia S&P 500 page
res = BeautifulSoup(requests.get(TICKER_URL).text, 'html.parser')

# get the cells of the stock ticker table, selecting on TABLE_IDENTIFIER
table_data = res.select(TABLE_IDENTIFIER)

# each table row has 9 cells, and the ticker symbol is the first cell of each row
tickers = [table_data[i].text for i in range(0, len(table_data), 9)]

# iterate through the S&P 500 company tickers, and collect data from Yahoo Finance
def get_yahoo_ticker_data(tickers):
    ticker_data = []
    print(len(tickers))
    for i, ticker in enumerate(tickers):
        print(i)
        # the scraped cell text carries a trailing newline, so strip it
        symbol = ticker.strip()
        try:
            # make GET request for the specified ticker
            req_url = YAHOO_URL + symbol + '?p=' + symbol
            ticker_i_res = requests.get(req_url)
            ticker_i_parser = BeautifulSoup(ticker_i_res.text, 'html.parser')

            # look up each figure by its CSS class and data-test attribute
            ticker_i_values = [ticker_i_parser.find(attrs={'class': YAHOO_TABLE_CLASS, 'data-test': id_}).text for id_ in YAHOO_IDS]
            ticker_data.append([symbol] + ticker_i_values)
        except Exception:
            # a failed request or a missing element (find returns None) skips this ticker
            print("error for " + symbol)
            continue
    return ticker_data

Saving the data

The process of scraping all of the necessary data was rather cumbersome, so it made sense to save the data to a file for future experiments.

# convert the Yahoo Finance data to a DataFrame

# columns:
# open => opening share price
# eps => EPS (TTM), earnings per share for trailing 12 months
# div => Dividend/Yield, dividend per share and (dividend per share / price per share)
# the P/E ratio (share price / earnings per share) was considered, but is not kept
try:
    df = pd.read_csv('data.csv')
except FileNotFoundError:
    # no cached file yet: scrape the summary figures for every ticker and cache them
    yahoo_data = get_yahoo_ticker_data(tickers)
    df = pd.DataFrame(yahoo_data, columns=['ticker', 'open', 'eps', 'div'])
    df.to_csv(path_or_buf='data.csv')
df.head()

	Unnamed: 0	ticker	open	eps	div
0	0	MMM	169.78	8.43	5.76 (3.39%)
1	1	ABT	87.08	1.84	1.44 (1.65%)
2	2	ABBV	90.05	2.18	4.72 (5.24%)
3	3	ABMD	179.85	4.79	N/A (N/A)
4	4	ACN	203.60	7.36	3.72 (1.83%)

df['div'] = df['div'].replace({'N/A (N/A)': 0})
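As a quick sanity check on what the scraped figures mean, the ratios can be recomputed by hand. The sketch below uses the MMM row from the table above; the helper names `dividend_yield_pct` and `pe_ratio` are made up for illustration, and using the open price as the share price is an assumption (Yahoo may compute its yield against a different price, so small differences are expected).

```python
# Figures for MMM, copied from the df.head() output above.
open_price = 169.78   # opening share price
eps_ttm = 8.43        # earnings per share, trailing 12 months
dividend = 5.76       # annual dividend per share


def dividend_yield_pct(dividend_per_share, share_price):
    """Dividend yield as a percentage of the share price."""
    return 100.0 * dividend_per_share / share_price


def pe_ratio(share_price, earnings_per_share):
    """Price-to-earnings ratio; undefined (None) when earnings are zero."""
    if earnings_per_share == 0:
        return None
    return share_price / earnings_per_share


mmm_yield = dividend_yield_pct(dividend, open_price)
mmm_pe = pe_ratio(open_price, eps_ttm)
print(round(mmm_yield, 2), round(mmm_pe, 2))
```

For this row the recomputed yield rounds to the 3.39% that was scraped, which suggests the percentage in the div column is simply the dividend amount divided by a share price close to the open.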

Preprocessing

Some data preprocessing is required before proceeding with experimentation:

separating percentage dividend yield and dividend amount into two separate features
reformatting some features into representations that can be converted to numerical types
casting features of the DataFrame to numerical types

# drop NaN values
df = df.dropna()

# filter out missing values recorded as the string 'N/A' rather than NaN (disabled)
#df = df[df['eps'] != 'N/A']
df['eps'] = df['eps'].astype(float)


# preprocess open values
df['open'] = df['open'].astype(str)
df['open'] = df['open'].apply(lambda x: x.replace(',', '')).astype(float)

# split dividend into amount and percentage
df['div'] = df['div'].astype(str)
df['div_pct'] = df['div'].apply(lambda x: x.split(' ')[1] if len(x.split(' ')) > 1 else '(0%)')
df['div_pct'] = df['div_pct'].apply(lambda x: x[1:-2]).astype(float)
df['div_amt'] = df['div'].apply(lambda x: x.split(' ')[0]).astype(float)
df = df.drop(['div'], axis=1)
df.isnull().sum()

Unnamed: 0    0
ticker        0
open          0
eps           0
div_pct       0
div_amt       0
dtype: int64
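The split-and-slice approach above depends on the exact layout of the dividend string. As an alternative, here is a minimal sketch that parses the same strings with a regular expression; `DIV_PATTERN` and `parse_dividend` are hypothetical names, and treating every unparseable value as a zero dividend is an assumption that mirrors the replacement of 'N/A (N/A)' with 0 earlier on.

```python
import re

# Matches strings such as "5.76 (3.39%)": an amount, then a percentage in parentheses.
DIV_PATTERN = re.compile(r'^\s*([\d.,]+)\s*\(([\d.,]+)%\)\s*$')


def parse_dividend(raw):
    """Return (amount, percent) as floats; anything unparseable becomes (0.0, 0.0)."""
    match = DIV_PATTERN.match(str(raw))
    if match is None:
        # covers 'N/A (N/A)' and the 0 placeholder substituted for it
        return 0.0, 0.0
    amount, percent = (float(group.replace(',', '')) for group in match.groups())
    return amount, percent


samples = ['5.76 (3.39%)', '1.44 (1.65%)', 'N/A (N/A)', 0]
parsed = [parse_dividend(sample) for sample in samples]
print(parsed)
```

With pandas, a function like this could be applied to the div column and the resulting pairs unpacked into div_amt and div_pct.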

# relevant data for now, will be using these columns for k-means clustering
two_dim_cluster_data = df[['ticker', 'eps', 'div_pct']]
four_dim_cluster_data = df[['ticker', 'eps', 'open', 'div_pct', 'div_amt']]

sns.scatterplot(x='eps', y='div_pct', data=two_dim_cluster_data)

[Figure: scatter plot of eps (x-axis) against div_pct (y-axis)]

Clustering the data: The K-Means algorithm

Now that data acquisition and preprocessing are complete, the next steps are to cluster the stock data, analyze how the clustering performs for different numbers of centroids, and then generate a final clustering based on some performance metrics.

The K-means algorithm operates as follows:

1. A number of "centroids" are randomly initialized (the number of centroids is a hyperparameter of the model). Each centroid
   has the same dimension as the feature set, and can be imagined as a vector in some n-dimensional space.
2. Every sample in the data set is then compared to each of the centroids, to see how far
   it is away from the centroid. Since the samples and centroids are vectors, the distance
   between a sample v and a centroid u is the Euclidean norm of the difference between the two vectors,
   ((u1-v1)^2 + (u2-v2)^2 + ....)^(1/2). Each sample is then "clustered" with the centroid it is closest to.
3. After each sample has been clustered with a specific centroid, each centroid is repositioned, such that it
   is the average of all of the samples that have been clustered with it.
4. The sample association and centroid repositioning steps are then repeated for some number of iterations, or until the assignments stop changing.
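The four steps above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the scikit-learn implementation used in this notebook: `euclidean` and `k_means` are names made up here, centroids are initialized by picking random samples, and the extras that `KMeans` adds by default (k-means++ initialization, several restarts via `n_init`, a convergence tolerance) are left out.

```python
import math
import random


def euclidean(u, v):
    """Euclidean norm of the difference between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def k_means(samples, k, iterations=20, seed=0):
    """Plain K-means with a fixed number of iterations; returns (centroids, labels)."""
    rng = random.Random(seed)
    # step 1: initialize the centroids by picking k distinct samples at random
    centroids = [list(s) for s in rng.sample(samples, k)]
    labels = [0] * len(samples)
    for _ in range(iterations):
        # step 2: assign every sample to its nearest centroid
        labels = [min(range(k), key=lambda c: euclidean(s, centroids[c])) for s in samples]
        # step 3: move each centroid to the mean of its assigned samples
        for c in range(k):
            members = [s for s, label in zip(samples, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    # step 4 is the loop itself: assignment and repositioning repeat
    return centroids, labels


# toy data: two well-separated groups in two dimensions
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, labels = k_means(points, k=2)
print(labels)
```

On this toy data the first three points end up sharing one label and the last three the other; which group gets label 0 depends on the random initialization.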

# iterate over a variety of amounts of cluster centroids for clustering our stock data
# looking for an "elbow" in the sum of squared error plot, for different amounts of centroids
def k_means_func(data, max_centroids=25):
    # standardize the numerical features (every column after the ticker)
    transform_data = StandardScaler().fit_transform(data.iloc[:,1:])
    
    sum_square_err = {}
    sil_score = {}
    for num_centroids in range(2,max_centroids):
        model = KMeans(n_clusters=num_centroids, random_state=2, n_init=10)
        model.fit(transform_data)
        sum_square_err[num_centroids] = model.inertia_
        sil_score[num_centroids] = silhouette_score(transform_data, model.labels_, random_state=2)
    
    plt.figure(figsize=(16,6))
    ax1 = plt.subplot(211)
    plt.plot(list(sum_square_err.keys()), list(sum_square_err.values()))
    ax1.title.set_text("k-means sum squared error")
    plt.xlabel("num. centroids")
    plt.ylabel("sum squared error")
    plt.xticks([i for i in range(2, max_centroids)])
    
    ax2 = plt.subplot(212)
    plt.plot(list(sil_score.keys()), list(sil_score.values()))
    ax2.title.set_text("k-means silhouette score")
    plt.xlabel("num. centroids")
    plt.ylabel("score")
    plt.xticks([i for i in range(2, max_centroids)])
    plt.yticks([i / 10 for i in range(10)])

Measuring the performance of K-Means clustering

The K-means algorithm cannot be evaluated in the same way as supervised learning algorithms. There is no prediction error, since the data we are given is unlabeled. Instead, we measure the performance of K-means by how effectively the chosen number of centroids clusters the data. Notably, one of the common metrics for K-means is the sum of squared errors (SSE) between each sample and the centroid it is clustered with, where the squared error is just the squared Euclidean norm of the difference between the sample and the centroid.
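To make the sum of squared errors concrete, here is a small hand-checkable sketch. `sum_squared_error` is a name chosen for this example; in scikit-learn the same quantity is exposed on a fitted model as `inertia_`, which is what the plotting function in this notebook records.

```python
def sum_squared_error(samples, labels, centroids):
    """Sum over all samples of the squared distance to the assigned centroid."""
    total = 0.0
    for sample, label in zip(samples, labels):
        centroid = centroids[label]
        total += sum((s - c) ** 2 for s, c in zip(sample, centroid))
    return total


samples = [(1.0, 1.0), (1.0, 3.0), (6.0, 2.0), (8.0, 2.0)]

# two clusters, each centroid being the mean of its members
two_clusters = sum_squared_error(samples, [0, 0, 1, 1], [(1.0, 2.0), (7.0, 2.0)])

# a single cluster, with the centroid at the mean of all four samples
one_cluster = sum_squared_error(samples, [0, 0, 0, 0], [(4.0, 2.0)])

print(one_cluster, two_clusters)  # 40.0 4.0
```

Adding a centroid can only keep the error the same or lower it, which is why the error is read for an "elbow" rather than for its minimum.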

In addition to the squared sum of errors, K-means is often measured using the silhouette score. This metric is the mean of the silhouette coefficient for every sample. The silhouette coefficient can be defined as follows:

for a sample S, we define A(S) as the mean distance between S and every other element in S's assigned cluster
we define B(S) as the mean distance between S and every point in the closest cluster to S, other than S's assigned cluster
we define SC(S), the silhouette coefficient, as B(S) minus A(S), divided by the larger of A(S) and B(S)
therefore, SC(S) ranges from -1 to 1, where SC(S) = 1 means the mean distance from S to every point in S's cluster is 0, SC(S) = 0 means that the mean distance from S to every point in its cluster is the same as the mean distance from S to every point in the nearest other cluster, and a negative SC(S) means that S is on average closer to the nearest other cluster than to its own
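The silhouette coefficient can be computed directly from these definitions, using the standard form (B(S) - A(S)) / max(A(S), B(S)). The sketch below is a plain-Python illustration rather than the `silhouette_score` implementation; `distance` and `silhouette_coefficient` are hypothetical names, and scoring a single-member cluster as 0 is a convention (the same one scikit-learn follows).

```python
import math


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def silhouette_coefficient(index, samples, labels):
    """Silhouette coefficient of samples[index], given a cluster label per sample."""
    own = labels[index]
    same = [j for j, label in enumerate(labels) if label == own and j != index]
    if not same:
        return 0.0  # convention for a cluster with a single member
    # A(S): mean distance to the other members of the sample's own cluster
    a = sum(distance(samples[index], samples[j]) for j in same) / len(same)
    # B(S): smallest mean distance to the members of any other cluster
    other_means = []
    for other in set(labels) - {own}:
        members = [j for j, label in enumerate(labels) if label == other]
        other_means.append(sum(distance(samples[index], samples[j]) for j in members) / len(members))
    b = min(other_means)
    return (b - a) / max(a, b)


samples = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# a sensible labelling: left pair together, right pair together
good = [silhouette_coefficient(i, samples, [0, 0, 1, 1]) for i in range(4)]
# a poor labelling: each cluster mixes a left point with a right point
poor = [silhouette_coefficient(i, samples, [0, 1, 0, 1]) for i in range(4)]

silhouette_good = sum(good) / len(good)
silhouette_poor = sum(poor) / len(poor)
print(round(silhouette_good, 3), round(silhouette_poor, 3))
```

The sensible labelling scores well above zero, while the poor labelling scores below zero, since every point is then closer on average to the other cluster than to its own.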

Below, we plot these metrics for our application of K-means to the stock data. We can see the following:

The silhouette score drops rather quickly once n grows beyond 3-4, which suggests that a small number of clusters most likely results in a few disparate clusters (with a single cluster comprising much of the data)
The silhouette score stabilizes after it drops to about 0.4, while the SSE continues to drop rapidly until n is about 10
The silhouette score bumps up slightly for a few values of n (n = 11, n = 15, n = 20); these are likely good values for n, since the silhouette score is stable but slightly up, while the SSE continues to go down

k_means_func(two_dim_cluster_data)

k_means_func(four_dim_cluster_data)

Finalizing our clusterings

Given that we have identified a few values for our centroid hyperparameter that seem fruitful, the next step is to fit and cluster the data for these values. Our results will not be predictions of an output variable, as is the case in supervised learning, but rather groupings of our stock tickers.

def classify_four_dim_stocks(data, cluster_configs):
    # work on a copy, so that adding label columns never writes into a slice of another DataFrame
    data = data.copy()
    # standardize the numerical features (every column after the ticker)
    transform_data = StandardScaler().fit_transform(data.iloc[:,1:])
    # fit a K-means model for each of the specified cluster hyperparameter values
    for config in cluster_configs.keys():
        model = KMeans(n_clusters=cluster_configs[config], random_state=5, n_init=10)
        model.fit(transform_data)
        data[config] = model.labels_
    return data

cluster_config_one = {
    'cluster_five': 5,
    'cluster_ten': 10,
    'cluster_fourteen': 14,
    'cluster_twenty': 20
}
four_dim_cluster_data = classify_four_dim_stocks(four_dim_cluster_data[['ticker', 'eps', 'open', 'div_pct', 'div_amt']], cluster_config_one)

four_dim_cluster_data

	ticker	eps	open	div_pct	div_amt	cluster_five	cluster_ten	cluster_fourteen	cluster_twenty
0	MMM	8.43	169.78	3.39	5.76	0	4	11	19
1	ABT	1.84	87.08	1.65	1.44	2	5	1	2
2	ABBV	2.18	90.05	5.24	4.72	0	4	13	1
3	ABMD	4.79	179.85	0.00	0.00	2	8	5	16
4	ACN	7.36	203.60	1.83	3.72	0	4	3	13
5	ATVI	2.11	58.34	0.63	0.37	2	8	5	7
6	ADBE	6.00	322.10	0.00	0.00	2	2	12	16
7	AMD	0.19	42.79	0.00	0.00	2	8	5	7
8	AAP	6.17	158.13	0.16	0.24	2	8	5	16
9	AES	0.76	18.88	3.03	0.57	3	0	9	10
10	AMG	-3.35	86.68	1.50	1.28	2	5	1	2
11	AFL	4.05	53.33	2.03	1.08	2	5	1	2
12	A	3.37	83.75	0.85	0.72	2	5	1	2
13	APD	7.94	235.09	1.98	4.64	0	4	11	13
14	AKAM	2.74	84.44	0.00	0.00	2	8	5	7
15	ALK	4.92	70.41	2.03	1.40	2	5	1	2
16	ALB	5.38	68.90	2.22	1.47	2	5	9	10
17	ARE	1.09	155.29	2.63	4.12	0	4	3	13
18	ALXN	6.52	109.43	0.00	0.00	2	8	5	7
19	ALGN	5.21	269.48	0.00	0.00	2	8	5	16
20	ALLE	4.79	123.53	0.87	1.08	2	5	1	2
21	AGN	-27.98	190.50	1.56	2.96	2	5	1	5
22	ADS	8.81	109.78	2.31	2.52	0	7	3	5
23	LNT	2.24	54.08	2.64	1.42	3	0	9	10
24	ALL	7.32	110.18	1.82	2.00	2	7	7	5
25	GOOGL	46.60	1357.00	0.00	0.00	1	1	8	4
26	GOOG	46.60	1356.60	0.00	0.00	1	1	8	4
27	MO	0.93	50.90	6.61	3.36	3	6	0	9
28	AMZN	22.57	1795.02	0.00	0.00	1	1	8	18
29	AMCR	0.31	10.75	4.26	0.46	3	0	10	0
...	...	...	...	...	...	...	...	...	...
474	V	5.32	185.52	0.65	1.20	2	5	7	15
475	VNO	15.73	65.26	4.06	2.64	3	0	13	17
476	VMC	4.50	143.12	0.87	1.24	2	5	1	2
477	WRB	3.61	69.64	0.63	0.44	2	8	5	7
478	WAB	1.46	74.31	0.64	0.48	2	8	5	7
479	WMT	5.00	121.51	1.75	2.12	2	7	7	5
480	WBA	4.31	57.23	3.21	1.83	3	0	9	10
481	DIS	6.64	147.77	1.19	1.76	2	7	7	15
482	WM	4.09	113.02	1.81	2.05	2	7	7	5
483	WAT	8.13	231.00	0.00	0.00	2	8	5	16
484	WEC	3.45	91.33	2.78	2.53	3	0	9	5
485	WCG	12.44	317.40	0.00	0.00	2	2	12	16
486	WFC	4.65	54.46	3.75	2.04	3	0	9	17
487	WELL	2.80	76.97	4.50	3.48	3	4	13	1
488	WDC	-5.26	57.17	3.49	2.00	3	0	9	17
489	WU	2.60	26.87	2.98	0.80	3	0	9	10
490	WRK	3.33	41.87	4.43	1.86	3	0	10	17
491	WY	-0.21	29.74	4.59	1.36	3	0	10	0
492	WHR	16.58	147.25	3.27	4.80	0	4	11	13
493	WMB	0.12	23.04	6.63	1.52	3	6	0	9
494	WLTW	6.74	201.02	1.29	2.60	2	7	7	15
495	WYNN	6.16	138.00	3.00	4.00	0	4	3	1
496	XEL	2.50	63.57	2.55	1.62	3	0	9	10
497	XRX	2.84	36.93	2.69	1.00	3	0	9	10
498	XLNX	3.71	96.26	1.54	1.48	2	5	1	2
499	XYL	2.80	78.10	1.23	0.96	2	5	1	2
500	YUM	3.62	99.48	1.69	1.68	2	5	7	5
501	ZBH	-0.44	149.90	0.64	0.96	2	5	1	2
502	ZION	4.27	51.60	2.64	1.36	3	0	9	10
503	ZTS	3.02	127.15	0.63	0.80	2	8	5	2

497 rows × 9 columns

def output_cluster_tickers(original_data, cluster_data, cluster, show_tickers=None):
    # an empty or missing show_tickers means "show every cluster"
    show_tickers = show_tickers or []
    # labels run from 0 to max inclusive, so the range extends one past the max
    for i in range(0, max(cluster_data[cluster]) + 1):
        if i in show_tickers or len(show_tickers) == 0:
            # list of tickers for the current cluster
            ticker_list = list(cluster_data[cluster_data[cluster] == i]['ticker'])
            print("cluster " + str(i) + ":")
            print("includes " + str(len(ticker_list)) + " stocks")
            print(ticker_list)
            # original data for tickers that are part of cluster, more useful than
            # the transformed data
            curr_data = original_data[original_data['ticker'].isin(ticker_list)]
            print(curr_data[['open', 'div_pct', 'div_amt', 'eps']].mean())
            print()

output_cluster_tickers(df, four_dim_cluster_data, 'cluster_twenty')

cluster 0:
includes 24 stocks
['AMCR', 'APA', 'T', 'CAH', 'CNP', 'COTY', 'F', 'BEN', 'GPS', 'HRB', 'HBI', 'HST', 'HBAN', 'IPG', 'KIM', 'KMI', 'KHC', 'NWL', 'NLSN', 'PBCT', 'PPL', 'SLB', 'TPR', 'WY']
open       23.628750
div_pct     4.732500
div_amt     1.106667
eps        -0.760000
dtype: float64

cluster 1:
includes 27 stocks
['ABBV', 'BXP', 'CVX', 'CCI', 'DRI', 'DLR', 'D', 'DTE', 'DUK', 'ETR', 'EXR', 'XOM', 'FRT', 'SJM', 'KMB', 'LYB', 'MAA', 'OKE', 'PM', 'PSX', 'PNW', 'PRU', 'SLG', 'UPS', 'VLO', 'WELL', 'WYNN']
open       105.968148
div_pct      3.819630
div_amt      3.918148
eps          4.584444
dtype: float64

cluster 2:
includes 82 stocks
['ABT', 'AMG', 'AFL', 'A', 'ALK', 'ALLE', 'AAL', 'APH', 'AOS', 'AMAT', 'APTV', 'BLL', 'BAC', 'BAX', 'BWA', 'CBOE', 'CERN', 'SCHW', 'CHD', 'XEC', 'CTXS', 'CTSH', 'CMCSA', 'CTVA', 'CSX', 'DHI', 'DVN', 'FANG', 'DD', 'ETFC', 'EBAY', 'EOG', 'EFX', 'EXPE', 'EXPD', 'FIS', 'FRC', 'FLIR', 'FLS', 'FMC', 'FBHS', 'FOXA', 'FOX', 'FCX', 'GL', 'HIG', 'HES', 'HRL', 'ICE', 'JBHT', 'LW', 'LDOS', 'MRO', 'MAS', 'MCK', 'MGM', 'MCHP', 'MOS', 'NEM', 'NWSA', 'NWS', 'NKE', 'NBL', 'ORCL', 'PCAR', 'PNR', 'PRGO', 'PHM', 'RJF', 'RHI', 'ROL', 'ROST', 'SEE', 'LUV', 'TJX', 'TSCO', 'VRSK', 'VMC', 'XLNX', 'XYL', 'ZBH', 'ZTS']
open       71.278902
div_pct     1.359146
div_amt     0.892927
eps         2.791463
dtype: float64

cluster 3:
includes 1 stocks
['NVR']
open       3820.00
div_pct       0.00
div_amt       0.00
eps         215.31
dtype: float64

cluster 4:
includes 3 stocks
['GOOGL', 'GOOG', 'AZO']
open       1311.956667
div_pct       0.000000
div_amt       0.000000
eps          52.210000
dtype: float64

cluster 5:
includes 66 stocks
['AGN', 'ADS', 'ALL', 'AEE', 'AEP', 'AWK', 'ABC', 'ADI', 'AJG', 'AIZ', 'ATO', 'AVY', 'BBY', 'BR', 'CHRW', 'CE', 'CB', 'CINF', 'C', 'STZ', 'CVS', 'DFS', 'DOV', 'ETN', 'EMR', 'EQR', 'ES', 'FDX', 'GRMN', 'GPC', 'HAS', 'HSY', 'IR', 'IFF', 'LLY', 'LOW', 'MMC', 'MDT', 'MRK', 'MSI', 'NDAQ', 'NTRS', 'PAYX', 'PPG', 'PG', 'PLD', 'QCOM', 'DGX', 'RL', 'RSG', 'SWKS', 'SWK', 'SBUX', 'STT', 'SYY', 'TROW', 'TGT', 'TEL', 'TIF', 'TSN', 'UTX', 'VFC', 'WMT', 'WM', 'WEC', 'YUM']
open       110.851970
div_pct      2.137273
div_amt      2.301212
eps          4.087879
dtype: float64

cluster 6:
includes 3 stocks
['IBM', 'PSA', 'SPG']
open       161.196667
div_pct      4.833333
div_amt      7.626667
eps          8.170000
dtype: float64

cluster 7:
includes 64 stocks
['ATVI', 'AMD', 'AKAM', 'ALXN', 'AME', 'ARNC', 'ADSK', 'BSX', 'CDNS', 'CPRI', 'KMX', 'CBRE', 'CNC', 'CXO', 'CPRT', 'DHR', 'DVA', 'XRAY', 'DISCA', 'DISCK', 'DISH', 'DLTR', 'FISV', 'FTNT', 'FTV', 'IT', 'GE', 'GPN', 'HSIC', 'HLT', 'HOLX', 'INFO', 'INCY', 'IPGP', 'IQV', 'JEC', 'KEYS', 'LEN', 'LKQ', 'L', 'MU', 'MNST', 'MYL', 'NOV', 'NCLH', 'NRG', 'PYPL', 'PKI', 'PGR', 'PVH', 'QRVO', 'PWR', 'CRM', 'SNPS', 'TMUS', 'TTWO', 'TXT', 'TRIP', 'TWTR', 'UAA', 'UA', 'VAR', 'WRB', 'WAB']
open       78.296094
div_pct     0.152500
div_amt     0.110156
eps         2.325156
dtype: float64

cluster 8:
includes 4 stocks
['BIIB', 'MHK', 'REGN', 'SIVB']
open       264.5475
div_pct      0.0000
div_amt      0.0000
eps         29.5425
dtype: float64

cluster 9:
includes 10 stocks
['MO', 'CTL', 'HP', 'IVZ', 'IRM', 'LB', 'MAC', 'M', 'OXY', 'WMB']
open       27.793
div_pct     7.789
div_amt     2.130
eps         0.224
dtype: float64

cluster 10:
includes 56 stocks
['AES', 'ALB', 'LNT', 'AIG', 'AIV', 'ADM', 'BK', 'BMY', 'COG', 'CPB', 'CF', 'CSCO', 'CFG', 'CMS', 'KO', 'CL', 'CAG', 'COP', 'GLW', 'DAL', 'DRE', 'DXC', 'EVRG', 'EXC', 'FAST', 'FITB', 'FE', 'HAL', 'HPE', 'HFC', 'HPQ', 'INTC', 'JCI', 'JNPR', 'KEY', 'KR', 'LEG', 'LNC', 'MXIM', 'MDLZ', 'MS', 'NTAP', 'NI', 'NUE', 'PEG', 'RF', 'SYF', 'FTI', 'UDR', 'USB', 'UNM', 'WBA', 'WU', 'XEL', 'XRX', 'ZION']
open       44.340000
div_pct     2.839643
div_amt     1.240000
eps         2.648036
dtype: float64

cluster 11:
includes 4 stocks
['BLK', 'AVGO', 'EQIX', 'LMT']
open       443.6425
div_pct      2.7200
div_amt     11.4100
eps         14.8175
dtype: float64

cluster 12:
includes 1 stocks
['BKNG']
open       2008.67
div_pct       0.00
div_amt       0.00
eps          97.36
dtype: float64

cluster 13:
includes 35 stocks
['ACN', 'APD', 'ARE', 'AMT', 'AMP', 'ADP', 'CAT', 'CLX', 'DE', 'HON', 'HII', 'ITW', 'JNJ', 'JPM', 'KLAC', 'LRCX', 'LIN', 'MTB', 'MCD', 'NEE', 'NSC', 'PKG', 'PH', 'PEP', 'PNC', 'RTN', 'ROK', 'RCL', 'SRE', 'SNA', 'TXN', 'TRV', 'UNP', 'UNH', 'WHR']
open       181.172857
div_pct      2.270571
div_amt      3.953714
eps          9.180000
dtype: float64

cluster 14:
includes 5 stocks
['CMG', 'ISRG', 'MTD', 'ORLY', 'TDG']
open       642.804
div_pct      0.000
div_amt      0.000
eps         14.996
dtype: float64

cluster 15:
includes 38 stocks
['AXP', 'ANTM', 'AON', 'AAPL', 'BDX', 'COF', 'CDW', 'CTAS', 'CME', 'COST', 'DG', 'ECL', 'EL', 'HCA', 'HUM', 'IEX', 'INTU', 'JKHY', 'KSU', 'LHX', 'MKTX', 'MAR', 'MLM', 'MA', 'MKC', 'MSFT', 'MCO', 'MSCI', 'PXD', 'RMD', 'ROP', 'SPGI', 'SBAC', 'SYK', 'TFX', 'V', 'DIS', 'WLTW']
open       219.314474
div_pct      1.013947
div_amt      2.072105
eps          7.210789
dtype: float64

cluster 16:
includes 30 stocks
['ABMD', 'ADBE', 'AAP', 'ALGN', 'ANSS', 'ANET', 'CHTR', 'CI', 'COO', 'EW', 'EA', 'FFIV', 'FB', 'FLT', 'IDXX', 'ILMN', 'LH', 'NFLX', 'NVDA', 'ODFL', 'NOW', 'TMO', 'ULTA', 'UAL', 'URI', 'UHS', 'VRSN', 'VRTX', 'WAT', 'WCG']
open       235.018000
div_pct      0.055667
div_amt      0.107333
eps          7.425000
dtype: float64

cluster 17:
includes 31 stocks
['CCL', 'CMA', 'ED', 'DOW', 'EMN', 'EIX', 'GIS', 'GM', 'GILD', 'HOG', 'IP', 'K', 'KSS', 'LVS', 'MPC', 'MET', 'TAP', 'JWN', 'OMC', 'PFE', 'PFG', 'O', 'REG', 'STX', 'SO', 'VTR', 'VZ', 'VNO', 'WFC', 'WDC', 'WRK']
open       58.530645
div_pct     4.003871
div_amt     2.306452
eps         3.771935
dtype: float64

cluster 18:
includes 1 stocks
['AMZN']
open       1795.02
div_pct       0.00
div_amt       0.00
eps          22.57
dtype: float64

Changing our approach: The Wealthy Investor technique

I don't have too much expertise with stock trading, but have been listening to a podcast lately called Trading Stocks Made Easy by Tyrone Jackson (a great podcast that I'd recommend to anyone trying to learn more). He heavily advocates for stocks which pay out a dividend: a portion of the company's profits that isn't reinvested into the company, but rather goes to the shareholders. Additionally, he advocates for stocks that have shown consistent quarterly earnings growth. Of the two, only dividend yield is part of the data that has been collected, so I decided to cluster the subset of stocks which do pay out a dividend.

# get stocks which pay dividend
div_yielding_data = four_dim_cluster_data[four_dim_cluster_data['div_amt'] > 0].drop(columns=list(cluster_config_one.keys()))

k_means_func(data=div_yielding_data)

# apply model for n = {14, 19, 23}
cluster_config_two = {
    'cluster_fourteen': 14,
    'cluster_nineteen': 19,
    'cluster_twenty_three': 23
}

div_yielding_data = classify_four_dim_stocks(div_yielding_data, cluster_config_two)

output_cluster_tickers(original_data=df, cluster_data=div_yielding_data, cluster='cluster_twenty_three')

cluster 0:
includes 24 stocks
['ACN', 'APD', 'AMT', 'ADP', 'CB', 'CME', 'STZ', 'DE', 'HSY', 'HON', 'ITW', 'KLAC', 'LHX', 'LIN', 'MKC', 'MCD', 'MSI', 'NEE', 'ROK', 'SWK', 'SYK', 'UNP', 'UTX', 'WLTW']
open       187.022500
div_pct      1.841250
div_amt      3.428750
eps          6.843333
dtype: float64

cluster 1:
includes 56 stocks
['AFL', 'ALK', 'ALB', 'LNT', 'AEE', 'AIG', 'AIV', 'BK', 'BMY', 'CHRW', 'CF', 'CSCO', 'C', 'CMS', 'KO', 'CL', 'COP', 'CVS', 'DAL', 'EMN', 'EMR', 'EQR', 'EVRG', 'ES', 'HIG', 'HAS', 'HFC', 'INTC', 'JCI', 'K', 'LEG', 'LNC', 'MPC', 'MXIM', 'MRK', 'MET', 'MDLZ', 'MS', 'NTAP', 'NUE', 'OMC', 'PAYX', 'PLD', 'PEG', 'QCOM', 'RHI', 'STT', 'SYF', 'SYY', 'USB', 'WBA', 'WEC', 'WFC', 'XEL', 'XRX', 'ZION']
open       64.544643
div_pct     2.723750
div_amt     1.752500
eps         3.903393
dtype: float64

cluster 2:
includes 10 stocks
['MMM', 'AMGN', 'AVB', 'BA', 'ESS', 'RE', 'HD', 'IBM', 'PSA', 'SPG']
open       222.526
div_pct      3.329
div_amt      6.878
eps          8.663
dtype: float64

cluster 3:
includes 9 stocks
['APA', 'COTY', 'DXC', 'KHC', 'NWL', 'NLSN', 'SLB', 'FTI', 'WDC']
open       28.590000
div_pct     4.234444
div_amt     1.165556
eps        -4.783333
dtype: float64

cluster 4:
includes 61 stocks
['ATVI', 'A', 'AAL', 'AOS', 'AMAT', 'ARNC', 'BLL', 'BAC', 'BAX', 'BWA', 'CERN', 'SCHW', 'CHD', 'XEC', 'CTSH', 'CMCSA', 'CTVA', 'CSX', 'DHI', 'XRAY', 'DVN', 'DD', 'ETFC', 'EBAY', 'EXPD', 'FLIR', 'FLS', 'FBHS', 'FOXA', 'FOX', 'FCX', 'GE', 'HES', 'HRL', 'KR', 'LW', 'LEN', 'L', 'MRO', 'MAS', 'MGM', 'MOS', 'NEM', 'NWSA', 'NWS', 'NBL', 'NRG', 'ORCL', 'PNR', 'PRGO', 'PGR', 'PHM', 'PWR', 'ROL', 'SEE', 'LUV', 'TXT', 'TJX', 'WRB', 'WAB', 'XYL']
open       48.194426
div_pct     1.265738
div_amt     0.588852
eps         2.369180
dtype: float64

cluster 5:
includes 15 stocks
['AAPL', 'BDX', 'CTAS', 'COO', 'COST', 'INTU', 'MKTX', 'MLM', 'MA', 'MCO', 'MSCI', 'ROP', 'SPGI', 'TFX', 'TMO']
open       295.877333
div_pct      0.719333
div_amt      2.038667
eps          8.054667
dtype: float64

cluster 6:
includes 11 stocks
['MO', 'CTL', 'F', 'GPS', 'HP', 'IVZ', 'IRM', 'KIM', 'LB', 'OXY', 'WMB']
open       25.681818
div_pct     6.781818
div_amt     1.770909
eps         0.169091
dtype: float64

cluster 7:
includes 27 stocks
['ARE', 'AEP', 'BXP', 'CVX', 'CLX', 'ED', 'CCI', 'DRI', 'DLR', 'DTE', 'DUK', 'ETN', 'ETR', 'EXR', 'FRT', 'GPC', 'IFF', 'SJM', 'JNJ', 'KMB', 'MAA', 'PNW', 'PG', 'TXN', 'UPS', 'VLO', 'WYNN']
open       118.426296
div_pct      3.174815
div_amt      3.709259
eps          4.370370
dtype: float64

cluster 8:
includes 2 stocks
['CAH', 'NOV']
open       37.445
div_pct     2.210
div_amt     1.060
eps       -14.465
dtype: float64

cluster 9:
includes 1 stocks
['AVGO']
open       324.40
div_pct      4.01
div_amt     13.00
eps          6.43
dtype: float64

cluster 10:
includes 2 stocks
['BLK', 'LMT']
open       445.285
div_pct      2.555
div_amt     11.400
eps         23.465
dtype: float64

cluster 11:
includes 25 stocks
['AAP', 'ALLE', 'AXP', 'AON', 'COF', 'CDW', 'CI', 'CXO', 'FANG', 'DG', 'ECL', 'EL', 'FRC', 'FTV', 'GL', 'HCA', 'IEX', 'KSU', 'NVDA', 'ODFL', 'PVH', 'UHS', 'V', 'VMC', 'DIS']
open       147.0420
div_pct      0.7716
div_amt      1.1076
eps          6.7476
dtype: float64

cluster 12:
includes 28 stocks
['ABBV', 'T', 'CCL', 'D', 'DOW', 'EIX', 'XOM', 'GIS', 'GM', 'GILD', 'IP', 'KSS', 'LVS', 'TAP', 'OKE', 'PM', 'PPL', 'PFG', 'O', 'REG', 'STX', 'SLG', 'SO', 'TPR', 'VTR', 'VZ', 'WELL', 'WRK']
open       60.232500
div_pct     4.505714
div_amt     2.699643
eps         2.902500
dtype: float64

cluster 13:
includes 1 stocks
['EQIX']
open       559.60
div_pct      1.76
div_amt      9.84
eps          5.91
dtype: float64

cluster 14:
includes 10 stocks
['AMP', 'CAT', 'CMI', 'MTB', 'NSC', 'PH', 'PNC', 'RTN', 'SNA', 'WHR']
open       175.944
div_pct      2.468
div_amt      4.241
eps         12.811
dtype: float64

cluster 15:
includes 1 stocks
['SHW']
open       579.73
div_pct      0.78
div_amt      4.52
eps         14.86
dtype: float64

cluster 16:
includes 36 stocks
['AES', 'AMCR', 'ADM', 'COG', 'CPB', 'CNP', 'CFG', 'CAG', 'GLW', 'DRE', 'EXC', 'FAST', 'FITB', 'FE', 'BEN', 'HRB', 'HAL', 'HBI', 'HOG', 'HPE', 'HST', 'HPQ', 'HBAN', 'IPG', 'JNPR', 'KEY', 'KMI', 'NI', 'JWN', 'PBCT', 'PFE', 'RF', 'UDR', 'UNM', 'WU', 'WY']
open       28.308056
div_pct     3.518889
div_amt     0.972778
eps         1.740278
dtype: float64

cluster 17:
includes 1 stocks
['AGN']
open       190.50
div_pct      1.56
div_amt      2.96
eps        -27.98
dtype: float64

cluster 18:
includes 25 stocks
['ABT', 'AMG', 'AME', 'APH', 'APTV', 'DHR', 'EFX', 'EXPE', 'FDX', 'FIS', 'GPN', 'HLT', 'ICE', 'JKHY', 'JBHT', 'MCK', 'MCHP', 'NKE', 'PKI', 'RMD', 'ROST', 'SBAC', 'VRSK', 'ZBH', 'ZTS']
open       126.8680
div_pct      0.9396
div_amt      1.1628
eps          1.9892
dtype: float64

cluster 19:
includes 8 stocks
['ANTM', 'GS', 'GWW', 'HUM', 'HII', 'LRCX', 'NOC', 'UNH']
open       299.73875
div_pct      1.47500
div_amt      4.31000
eps         16.78125
dtype: float64

cluster 20:
includes 46 stocks
['ALL', 'AWK', 'ABC', 'ADI', 'AJG', 'AIZ', 'ATO', 'AVY', 'BBY', 'BR', 'CBOE', 'CE', 'CINF', 'CTXS', 'DFS', 'DOV', 'EOG', 'FMC', 'GRMN', 'IR', 'LDOS', 'LOW', 'MAR', 'MMC', 'MDT', 'MSFT', 'NDAQ', 'PCAR', 'PXD', 'PPG', 'DGX', 'RL', 'RJF', 'RSG', 'SWKS', 'SBUX', 'TGT', 'TEL', 'TIF', 'TSCO', 'TSN', 'VFC', 'WMT', 'WM', 'XLNX', 'YUM']
open       110.026087
div_pct      1.757174
div_amt      1.916739
eps          4.684783
dtype: float64

cluster 21:
includes 2 stocks
['MAC', 'M']
open       21.180
div_pct    10.490
div_amt     2.255
eps         1.835
dtype: float64

# drop the other label columns and the non-numeric ticker column before averaging per cluster
other_keys = [key for key in cluster_config_two.keys() if key != 'cluster_twenty_three']
div_yielding_agg = div_yielding_data.drop(columns=other_keys + ['ticker']).groupby('cluster_twenty_three').mean()

div_yielding_agg

	eps	open	div_pct	div_amt
cluster_twenty_three
0	6.843333	187.022500	1.841250	3.428750
1	3.903393	64.544643	2.723750	1.752500
2	8.663000	222.526000	3.329000	6.878000
3	-4.783333	28.590000	4.234444	1.165556
4	2.369180	48.194426	1.265738	0.588852
5	8.054667	295.877333	0.719333	2.038667
6	0.169091	25.681818	6.781818	1.770909
7	4.370370	118.426296	3.174815	3.709259
8	-14.465000	37.445000	2.210000	1.060000
9	6.430000	324.400000	4.010000	13.000000
10	23.465000	445.285000	2.555000	11.400000
11	6.747600	147.042000	0.771600	1.107600
12	2.902500	60.232500	4.505714	2.699643
13	5.910000	559.600000	1.760000	9.840000
14	12.811000	175.944000	2.468000	4.241000
15	14.860000	579.730000	0.780000	4.520000
16	1.740278	28.308056	3.518889	0.972778
17	-27.980000	190.500000	1.560000	2.960000
18	1.989200	126.868000	0.939600	1.162800
19	16.781250	299.738750	1.475000	4.310000
20	4.684783	110.026087	1.757174	1.916739
21	1.835000	21.180000	10.490000	2.255000
22	9.195333	114.239333	2.994667	3.286000

Plotting the results

Finally! We have some simple visualizations of the aggregated data for our clustered dividend-yielding S&P 500 stocks. Based on these plots, I'm going to take a closer look at a few of the clusters:

cluster 10/19: these clusters have the highest average earnings per share of all clusters
cluster 9/10/13: these clusters have the highest average dividend amounts per share of any cluster
cluster 6/21: these clusters have by far the highest percentage dividend of any cluster

Although open value was included in the feature set (with the intention of clustering stocks based on similar cost per share), the open value for an arbitrary day does not seem like a good feature for singling out a cluster to consider more carefully.

plt.figure(figsize=(12,12))
ax1 = plt.subplot(221)
ax1.title.set_text('average EPS per cluster')
sns.barplot(x=div_yielding_agg.index, y=div_yielding_agg.eps)
ax2 = plt.subplot(222)
ax2.title.set_text('average dividend amount per cluster')
sns.barplot(x=div_yielding_agg.index, y=div_yielding_agg.div_amt)
ax3 = plt.subplot(223)
ax3.title.set_text('average dividend percentage per cluster')
sns.barplot(x=div_yielding_agg.index, y=div_yielding_agg.div_pct)
ax4 = plt.subplot(224)
ax4.title.set_text('average open value per cluster')
sns.barplot(x=div_yielding_agg.index, y=div_yielding_agg.open)

[Figure: 2x2 grid of bar plots per cluster: average EPS, average dividend amount, average dividend percentage, average open value]

Results

Although these results are far from finished, and I will need to comb through financial figures and track these stocks for more than just one day, it is clear that clustering with the K-means algorithm has allowed me to narrow my initial search for potentially lucrative S&P 500 stocks. This was a fun and quick 1-day venture that allowed me to get more familiar with relevant financial figures for stock trading, scraping stock data, and applying machine learning techniques to an interesting data set.

# we can use the output_cluster_tickers function, passing an optional parameter which specifies
# which clusters to show the tickers for.
output_cluster_tickers(original_data=df, cluster_data=div_yielding_data, cluster='cluster_twenty_three', show_tickers=[6, 9, 10, 13, 19, 21])

cluster 6:
includes 11 stocks
['MO', 'CTL', 'F', 'GPS', 'HP', 'IVZ', 'IRM', 'KIM', 'LB', 'OXY', 'WMB']
open       25.681818
div_pct     6.781818
div_amt     1.770909
eps         0.169091
dtype: float64

cluster 9:
includes 1 stocks
['AVGO']
open       324.40
div_pct      4.01
div_amt     13.00
eps          6.43
dtype: float64

cluster 10:
includes 2 stocks
['BLK', 'LMT']
open       445.285
div_pct      2.555
div_amt     11.400
eps         23.465
dtype: float64

cluster 13:
includes 1 stocks
['EQIX']
open       559.60
div_pct      1.76
div_amt      9.84
eps          5.91
dtype: float64

cluster 19:
includes 8 stocks
['ANTM', 'GS', 'GWW', 'HUM', 'HII', 'LRCX', 'NOC', 'UNH']
open       299.73875
div_pct      1.47500
div_amt      4.31000
eps         16.78125
dtype: float64

cluster 21:
includes 2 stocks
['MAC', 'M']
open       21.180
div_pct    10.490
div_amt     2.255
eps         1.835
dtype: float64

# calling output_cluster_tickers without the optional show_tickers parameter
# shows the tickers for every cluster.
output_cluster_tickers(original_data=df, cluster_data=div_yielding_data, cluster='cluster_nineteen')

cluster 0:
includes 17 stocks
['AAPL', 'BDX', 'CI', 'CTAS', 'COO', 'COST', 'INTU', 'MKTX', 'MLM', 'MA', 'MCO', 'MSCI', 'ROP', 'SPGI', 'SYK', 'TFX', 'TMO']
open       284.572941
div_pct      0.702353
div_amt      1.936471
eps          8.352353
dtype: float64

cluster 1:
includes 58 stocks
['ALK', 'ALB', 'LNT', 'AEE', 'AEP', 'AIV', 'ADM', 'BK', 'BMY', 'CHRW', 'CSCO', 'C', 'CFG', 'CMS', 'KO', 'CL', 'COP', 'CVS', 'DAL', 'EMN', 'EMR', 'EQR', 'EVRG', 'ES', 'EXC', 'FITB', 'FE', 'GIS', 'HIG', 'HAS', 'HFC', 'INTC', 'JCI', 'K', 'LEG', 'LNC', 'MPC', 'MXIM', 'MRK', 'MET', 'MS', 'NTAP', 'NUE', 'OMC', 'PAYX', 'PFG', 'PLD', 'PEG', 'QCOM', 'STT', 'SYF', 'SYY', 'USB', 'WBA', 'WEC', 'WFC', 'XEL', 'ZION']
open       64.173103
div_pct     2.854655
div_amt     1.809828
eps         3.905172
dtype: float64

cluster 2:
includes 7 stocks
['AMP', 'CMI', 'GS', 'MTB', 'SNA', 'VNO', 'WHR']
open       162.411429
div_pct      2.827143
div_amt      4.325714
eps         15.898571
dtype: float64

cluster 3:
includes 32 stocks
['ARE', 'ADS', 'BXP', 'CVX', 'CLX', 'CMA', 'ED', 'DRI', 'DTE', 'ETN', 'ETR', 'FRT', 'GPC', 'HSY', 'SJM', 'JNJ', 'KMB', 'LLY', 'LYB', 'MAA', 'NTRS', 'PKG', 'PEP', 'PSX', 'PRU', 'RCL', 'TROW', 'TXN', 'TRV', 'UPS', 'VLO', 'WYNN']
open       119.817812
div_pct      3.031875
div_amt      3.562812
eps          6.331563
dtype: float64

cluster 4:
includes 8 stocks
['APA', 'CAH', 'CTL', 'COTY', 'KHC', 'NLSN', 'SLB', 'WDC']
open       30.72875
div_pct     4.91375
div_amt     1.39125
eps        -6.71250
dtype: float64

cluster 5:
includes 4 stocks
['BA', 'AVGO', 'EQIX', 'ESS']
open       377.2375
div_pct      2.7275
div_amt      9.7150
eps          6.3675
dtype: float64

cluster 6:
includes 56 stocks
['ALL', 'AXP', 'AWK', 'ABC', 'ADI', 'AJG', 'AIZ', 'ATO', 'AVY', 'BBY', 'BR', 'COF', 'CBOE', 'CE', 'CINF', 'CTXS', 'STZ', 'DFS', 'DOV', 'EXPE', 'FDX', 'FMC', 'GRMN', 'IR', 'IFF', 'LDOS', 'LOW', 'MAR', 'MMC', 'MKC', 'MDT', 'MCHP', 'MSFT', 'MSI', 'NDAQ', 'PCAR', 'PPG', 'PG', 'DGX', 'RL', 'RJF', 'RSG', 'SWKS', 'SWK', 'SBUX', 'TGT', 'TEL', 'TIF', 'TSCO', 'TSN', 'UTX', 'VFC', 'WMT', 'WM', 'XLNX', 'YUM']
open       116.192143
div_pct      1.759286
div_amt      2.030893
eps          4.663929
dtype: float64

cluster 7:
includes 50 stocks
['ATVI', 'A', 'AAL', 'AME', 'APH', 'AMAT', 'APTV', 'ARNC', 'BLL', 'BAX', 'BWA', 'CERN', 'SCHW', 'CHD', 'XEC', 'CTSH', 'CXO', 'CSX', 'DHI', 'XRAY', 'DVN', 'FANG', 'ETFC', 'EOG', 'EXPD', 'FLIR', 'FTV', 'FBHS', 'FOXA', 'FOX', 'GE', 'HLT', 'ICE', 'LW', 'LEN', 'L', 'MAS', 'NEM', 'NKE', 'NRG', 'PKI', 'PGR', 'PHM', 'PWR', 'LUV', 'TXT', 'TJX', 'WRB', 'WAB', 'XYL']
open       63.3910
div_pct     0.9736
div_amt     0.6050
eps         3.2894
dtype: float64

cluster 8:
includes 1 stocks
['AGN']
open       190.50
div_pct      1.56
div_amt      2.96
eps        -27.98
dtype: float64

cluster 9:
includes 2 stocks
['BLK', 'LMT']
open       445.285
div_pct      2.555
div_amt     11.400
eps         23.465
dtype: float64

cluster 10:
includes 31 stocks
['AAP', 'ALLE', 'AON', 'CDW', 'DHR', 'DG', 'ECL', 'EL', 'FIS', 'FRC', 'GL', 'GPN', 'HCA', 'IEX', 'JKHY', 'JBHT', 'KSU', 'NVDA', 'ODFL', 'PXD', 'PVH', 'RMD', 'ROST', 'SBAC', 'UHS', 'VRSK', 'V', 'VMC', 'DIS', 'ZBH', 'ZTS']
open       155.143548
div_pct      0.778065
div_amt      1.189677
eps          4.845484
dtype: float64

cluster 11:
includes 10 stocks
['MO', 'F', 'HP', 'IVZ', 'IRM', 'LB', 'MAC', 'M', 'OXY', 'WMB']
open       27.418
div_pct     7.692
div_amt     2.090
eps         1.003
dtype: float64

cluster 12:
includes 41 stocks
['ABT', 'AES', 'AFL', 'AIG', 'AOS', 'BAC', 'COG', 'CPB', 'CF', 'CMCSA', 'CAG', 'GLW', 'CTVA', 'DRE', 'DD', 'EBAY', 'FAST', 'FLS', 'FCX', 'HAL', 'HES', 'HPE', 'HRL', 'JNPR', 'KR', 'MRO', 'MGM', 'MDLZ', 'MOS', 'NWSA', 'NWS', 'NI', 'ORCL', 'PNR', 'PRGO', 'RHI', 'ROL', 'SEE', 'UDR', 'WU', 'XRX']
open       37.565122
div_pct     2.142927
div_amt     0.790732
eps         1.632439
dtype: float64

cluster 13:
includes 10 stocks
['AMGN', 'ANTM', 'RE', 'GWW', 'HUM', 'HII', 'LRCX', 'NOC', 'SHW', 'UNH']
open       326.680
div_pct      1.528
div_amt      4.660
eps         15.004
dtype: float64

cluster 14:
includes 27 stocks
['MMM', 'ACN', 'APD', 'AMT', 'ADP', 'AVB', 'CAT', 'CB', 'CME', 'DE', 'HD', 'HON', 'ITW', 'JPM', 'KLAC', 'LHX', 'LIN', 'MCD', 'NEE', 'NSC', 'PH', 'PNC', 'RTN', 'ROK', 'SRE', 'UNP', 'WLTW']
open       189.472593
div_pct      2.135185
div_amt      3.988148
eps          8.268148
dtype: float64

cluster 15:
includes 23 stocks
['ABBV', 'CCI', 'DLR', 'D', 'DOW', 'DUK', 'EIX', 'EXR', 'XOM', 'GILD', 'KSS', 'LVS', 'OKE', 'PM', 'PNW', 'O', 'REG', 'STX', 'SLG', 'SO', 'VTR', 'VZ', 'WELL']
open       77.526087
div_pct     4.353913
div_amt     3.316087
eps         2.796087
dtype: float64

cluster 16:
includes 3 stocks
['IBM', 'PSA', 'SPG']
open       161.196667
div_pct      4.833333
div_amt      7.626667
eps          8.170000
dtype: float64

cluster 17:
includes 29 stocks
['AMCR', 'T', 'CCL', 'CNP', 'BEN', 'GPS', 'GM', 'HRB', 'HBI', 'HOG', 'HST', 'HPQ', 'HBAN', 'IP', 'IPG', 'KEY', 'KIM', 'KMI', 'TAP', 'NWL', 'JWN', 'PBCT', 'PFE', 'PPL', 'RF', 'TPR', 'UNM', 'WRK', 'WY']
open       27.975517
div_pct     4.359310
div_amt     1.215172
eps         2.048621
dtype: float64
