This notebook demonstrates a clustering of the stocks in the S&P 500 index, based on a select set of financial figures.
The index consists of 500 companies but includes 505 common stocks, because five companies have two share classes in the index: Alphabet (GOOGL/GOOG), Discovery (DISCA/DISCK), Fox (FOXA/FOX), News Corp (NWSA/NWS) and Under Armour (UAA/UA).
# libraries for making requests and parsing HTML
import requests
from bs4 import BeautifulSoup
# plotting
import matplotlib.pyplot as plt
import seaborn as sns
# sklearn for kmeans and model metrics
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
# pandas, for data wrangling
import pandas as pd
For the data I wanted, the existing financial data APIs did not work out. Instead, I decided to scrape the data manually from Wikipedia and Yahoo Finance, in two steps:
- scrape the list of S&P 500 tickers from Wikipedia
- scrape the financial figures for each stock ticker from Yahoo Finance
# URL to get S&P tickers from
TICKER_URL = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
# multi-level identifier, to select each row of ticker table in HTML response
TABLE_IDENTIFIER = '#constituents tbody tr td'
# yahoo finance URL we can use to scrape data for each company
YAHOO_URL = 'http://finance.yahoo.com/quote/'
# HTML classes for various elements on yahoo finance page
YAHOO_TABLE_CLASS = 'Ta(end) Fw(600) Lh(14px)'
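# data-test attribute values of the Yahoo Finance quote-table cells to scrape:
# open price, EPS (TTM) and Div/Yield (the P/E ratio is left out, since the DataFrame built below has no 'pe' column)
YAHOO_IDS = ['OPEN-value', 'EPS_RATIO-value', 'DIVIDEND_AND_YIELD-value']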
# get HTML content from wikipedia S&P 500 page
res = BeautifulSoup(requests.get(TICKER_URL).text, 'html.parser')
# get every cell of the ticker table, selecting on TABLE_IDENTIFIER
table_data = res.select(TABLE_IDENTIFIER)
# iterate over each row of table (9 elements of information), and extract the individual tickers
tickers = [table_data[i].text for i in range(0, len(table_data), 9)]
# iterate through the S&P 500 company tickers, and collect data from Yahoo Finance
def get_yahoo_ticker_data(tickers):
    ticker_data = []
    print(len(tickers))
    for i, ticker in enumerate(tickers):
        print(i)  # progress indicator
        # strip the trailing newline left over from the Wikipedia table cell
        symbol = ticker.strip()
        try:
            # make GET request for the quote page of this ticker
            req_url = YAHOO_URL + symbol + '?p=' + symbol
            ticker_i_res = requests.get(req_url)
            ticker_i_parser = BeautifulSoup(ticker_i_res.text, 'html.parser')
            ticker_i_data = [symbol]
            # append the text of each quote-table cell listed in YAHOO_IDS
            for id_ in YAHOO_IDS:
                cell = ticker_i_parser.find(attrs={'class': YAHOO_TABLE_CLASS, 'data-test': id_})
                ticker_i_data.append(cell.text)
            ticker_data.append(ticker_i_data)
        except Exception:
            # e.g. a network error, or a missing cell (find() returns None)
            print("error for " + symbol)
    return ticker_data
The process of scraping all of the necessary data was rather cumbersome, so it made sense to save the data to a file for future experiments.
# load the cached Yahoo Finance data if it exists, otherwise scrape it and cache it in data.csv
# columns:
# open => opening price on the day of the scrape
# eps => EPS (TTM), earnings per share for the trailing 12 months
# div => Dividend/Yield, dividend per share followed by the yield (dividend per share / price per share) in parentheses
try:
    df = pd.read_csv('data.csv')
except FileNotFoundError:
    # scrape a one-day snapshot of the figures for every ticker
    yahoo_data = get_yahoo_ticker_data(tickers)
    df = pd.DataFrame(yahoo_data, columns=['ticker', 'open', 'eps', 'div'])
    # the index is saved too, which is why an 'Unnamed: 0' column appears when the file is read back
    df.to_csv(path_or_buf='data.csv')
df.head()
| | Unnamed: 0 | ticker | open | eps | div |
|---|---|---|---|---|---|
0 | 0 | MMM | 169.78 | 8.43 | 5.76 (3.39%) |
1 | 1 | ABT | 87.08 | 1.84 | 1.44 (1.65%) |
2 | 2 | ABBV | 90.05 | 2.18 | 4.72 (5.24%) |
3 | 3 | ABMD | 179.85 | 4.79 | N/A (N/A) |
4 | 4 | ACN | 203.60 | 7.36 | 3.72 (1.83%) |
df['div'] = df['div'].replace({'N/A (N/A)': 0})
Some data preprocessing is required before experimenting:
- separating the dividend amount and the percentage dividend yield into two separate features
- reformatting some features (e.g. removing the thousands separators from prices) so that they can be converted to numerical types
- casting the features of the DataFrame to numerical types
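The dividend split performed in the code that follows (via `str.split` and slicing) could alternatively be done with a single regular expression. This is a minimal sketch, assuming Yahoo's `amount (percent%)` format as seen in the `div` column above; `parse_dividend` and `DIVIDEND_PATTERN` are hypothetical names, not part of the notebook.

```python
import re

# "amount (percent%)", e.g. "5.76 (3.39%)"; format assumed from the scraped `div` column
DIVIDEND_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*\((\d+(?:\.\d+)?)%\)\s*$")


def parse_dividend(text):
    """Split a Yahoo Finance 'Div/Yield' string into (amount, percent) floats.

    'N/A (N/A)' (no dividend) becomes (0.0, 0.0), mirroring the notebook's replace();
    any other unexpected format raises ValueError instead of silently becoming 0.
    """
    if text.strip() == "N/A (N/A)":
        return 0.0, 0.0
    match = DIVIDEND_PATTERN.match(text)
    if match is None:
        raise ValueError("unexpected dividend format: %r" % text)
    return float(match.group(1)), float(match.group(2))


print(parse_dividend("5.76 (3.39%)"))  # (5.76, 3.39)
print(parse_dividend("N/A (N/A)"))     # (0.0, 0.0)
```

Applied to the raw, non-missing `div` strings (before they are replaced with 0), e.g. `df['div'].dropna().apply(parse_dividend)`, it returns one `(amount, percent)` tuple per row. Raising on unexpected formats is a deliberate choice: a scraping glitch then surfaces as an error rather than as a fake zero dividend.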
# drop rows with NaN values
df = df.dropna()
# missing values scraped as the string 'N/A' are not caught by dropna(); uncomment if any remain
#df = df[df['eps'] != 'N/A']
df['eps'] = df['eps'].astype(float)
# preprocess open values
df['open'] = df['open'].astype(str)
df['open'] = df['open'].apply(lambda x: x.replace(',', '')).astype(float)
# split dividend into amount and percentage
df['div'] = df['div'].astype(str)
df['div_pct'] = df['div'].apply(lambda x: x.split(' ')[1] if len(x.split(' ')) > 1 else '(0%)')
df['div_pct'] = df['div_pct'].apply(lambda x: x[1:-2]).astype(float)
df['div_amt'] = df['div'].apply(lambda x: x.split(' ')[0]).astype(float)
df = df.drop(['div'], axis=1)
df.isnull().sum()
Unnamed: 0 0
ticker 0
open 0
eps 0
div_pct 0
div_amt 0
dtype: int64
# relevant data for now, will be using these columns for k-means clustering
two_dim_cluster_data = df[['ticker', 'eps', 'div_pct']]
four_dim_cluster_data = df[['ticker', 'eps', 'open', 'div_pct', 'div_amt']]
sns.scatterplot(x='eps', y='div_pct', data=two_dim_cluster_data)
Now that data acquisition and preprocessing are complete, the next step is to cluster the stock data, analyze the performance of the clustering as a function of the number of centroids, and then generate a final clustering based on those performance metrics.
The K-means algorithm operates as follows:
1. A number of "centroids" are randomly initialized (the number of centroids, k, is the model's hyperparameter). Each centroid has the same dimension as the feature set and can be imagined as a vector in an n-dimensional space.
2. Every sample in the data set is then compared to each of the centroids, to see how far away from it the sample is. Since the samples and centroids are vectors, the distance between a sample v and a centroid u is the Euclidean norm of the difference between the two vectors, $\lVert u - v \rVert = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \dots + (u_n - v_n)^2}$. Each sample is then "clustered" with the centroid it is closest to.
3. After each sample has been clustered with a specific centroid, each centroid is repositioned so that it is the average of all of the samples that have been clustered with it.
4. The sample assignment and centroid repositioning steps are then repeated until the assignments stop changing or a maximum number of iterations is reached.

Note that scikit-learn's `KMeans`, used below, does not initialize the centroids uniformly at random by default: it uses the k-means++ scheme, which spreads the initial centroids apart, and `n_init=10` reruns the whole procedure with 10 different initializations and keeps the run with the lowest sum of squared errors.
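The four steps above can be sketched in a few lines of NumPy (this is Lloyd's algorithm). This is an illustration, not the scikit-learn implementation used in this notebook: `kmeans_lloyd` is a hypothetical helper, and it assumes plain random initialization (k distinct samples picked at random) rather than k-means++.

```python
import numpy as np


def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's-algorithm sketch; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # step 1: initialize the centroids as k distinct samples picked at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # step 2: Euclidean distance from every sample to every centroid, shape (n_samples, k),
        # then assign each sample to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # step 4 (stopping rule): unchanged assignments mean the centroids would not move
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # step 3: move each centroid to the mean of the samples assigned to it
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return labels, centroids


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centroids = kmeans_lloyd(X, k=2)
print(labels)  # the first three and the last three samples end up in different clusters
```

A single random start like this can get stuck in a poor local minimum on less tidy data, which is why the notebook relies on scikit-learn's k-means++ seeding and several restarts (`n_init`).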
# iterate over a variety of amounts of cluster centroids for clustering our stock data,
# looking for an "elbow" in the sum of squared error plot for different amounts of centroids
def k_means_func(data, max_centroids=25):
    # standardize every numerical feature (all columns after 'ticker') to zero mean and unit variance
    transform_data = StandardScaler().fit_transform(data.iloc[:, 1:])
    sum_square_err = {}
    sil_score = {}
    for num_centroids in range(2, max_centroids):
        model = KMeans(n_clusters=num_centroids, random_state=2, n_init=10)
        model.fit(transform_data)
        # inertia_: sum of squared distances from each sample to its closest centroid
        sum_square_err[num_centroids] = model.inertia_
        sil_score[num_centroids] = silhouette_score(transform_data, model.labels_, random_state=2)
    plt.figure(figsize=(16, 6))
    # top panel: SSE for each number of centroids
    ax1 = plt.subplot(211)
    plt.plot(list(sum_square_err.keys()), list(sum_square_err.values()))
    ax1.title.set_text("k-means sum squared error")
    plt.xlabel("num. centroids")
    plt.ylabel("sum squared error")
    plt.xticks([i for i in range(2, max_centroids)])
    # bottom panel: silhouette score for each number of centroids
    ax2 = plt.subplot(212)
    plt.plot(list(sil_score.keys()), list(sil_score.values()))
    ax2.title.set_text("k-means silhouette score")
    plt.xlabel("num. centroids")
    plt.ylabel("score")
    plt.xticks([i for i in range(2, max_centroids)])
    plt.yticks([i / 10 for i in range(10)])
The performance of K-means cannot be measured in the same way as that of supervised learning algorithms. Since the data is unlabeled, there is no prediction error; instead, we measure how effectively the chosen number of centroids clusters the data. Notably, one of the most common metrics for K-means is the sum of squared errors (SSE) between each sample and the centroid it is clustered with, where each squared error is the squared Euclidean distance between the sample and its centroid (scikit-learn reports this value as the fitted model's `inertia_`).
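The sum of squared errors described above is short to compute directly. A minimal NumPy sketch, where `sum_squared_error` is a hypothetical helper: for a fitted scikit-learn model, passing `model.labels_` and `model.cluster_centers_` should reproduce `model.inertia_` up to floating-point error (assuming unweighted samples).

```python
import numpy as np


def sum_squared_error(X, labels, centroids):
    """Sum over samples of the squared Euclidean distance to the assigned centroid."""
    diffs = X - centroids[labels]  # each sample minus the centroid of its own cluster
    return float((diffs ** 2).sum())


X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])
print(sum_squared_error(X, labels, centroids))  # 4.0 (each sample is at distance 1 from its centroid)
```

Because adding centroids can only bring samples closer to their nearest centroid, the best achievable SSE keeps falling as k grows, which is why the curve is read for an "elbow" rather than a minimum.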
In addition to the sum of squared errors (SSE), K-means is often evaluated using the silhouette score. This metric is the mean of the silhouette coefficient over every sample. The silhouette coefficient can be defined as follows:
- for a sample S, we define A(S) as the mean distance between S and every other sample in S's assigned cluster
- we define B(S) as the mean distance between S and every sample in the nearest other cluster, i.e. the cluster (other than S's own) whose samples have the smallest mean distance to S
- we define SC(S), the silhouette coefficient, as $SC(S) = \frac{B(S) - A(S)}{\max(A(S), B(S))}$
- therefore, SC(S) ranges from -1 to 1: SC(S) = 1 means the mean distance from S to the other points in its cluster is 0, SC(S) = 0 means that this mean distance is the same as the mean distance from S to every point in the nearest other cluster, and a negative SC(S) means S is, on average, closer to the nearest other cluster than to its own
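To make the definition concrete, here is a minimal NumPy sketch of the per-sample silhouette coefficient; averaging its output gives the silhouette score. `silhouette_coefficients` is a hypothetical helper, and it follows the common convention (also used by scikit-learn's `silhouette_samples`) that a sample alone in its cluster gets SC(S) = 0. It builds the full pairwise distance matrix, which is fine for roughly 500 stocks.

```python
import numpy as np


def silhouette_coefficients(X, labels):
    """Per-sample silhouette coefficient SC(S) = (B(S) - A(S)) / max(A(S), B(S)).

    Assumes at least two distinct labels. Samples in single-member clusters get 0.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # pairwise Euclidean distances between all samples, shape (n_samples, n_samples)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False  # A(S) uses the *other* members of S's cluster
        if not own.any():
            continue  # single-member cluster: SC(S) is defined as 0
        a = dist[i, own].mean()
        # B(S): the smallest mean distance from S to the members of any other cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores


X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
labels = np.array([0, 0, 1, 1])
print(silhouette_coefficients(X, labels).mean())  # silhouette score of this clustering
```

Here every sample has A(S) = 1 and B(S) = (5 + sqrt(26)) / 2, so all four coefficients are equal and close to 0.8.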
Below, we plot these metrics for our application of K-means to the stock data. We can see the following:
- The silhouette score drops rather quickly once n grows beyond 3-4. This implies that a small number of clusters most likely produces a few disparate clusters (with a single cluster comprising much of the data).
- The silhouette score stabilizes after it drops to roughly 0.4, while the SSE continues to drop rapidly until around n = 10.
- The silhouette score bumps up slightly for a few values of n (n = 11, n = 15, n = 20). These are likely good values for n, since the silhouette score is stable but slightly up, while the SSE continues to go down.
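The heuristic in the last bullet, looking for values of n whose silhouette score rises above both neighbours, can be made explicit. A small sketch with a hypothetical `silhouette_bumps` helper; it assumes `k_means_func` is changed to return its `sil_score` dictionary (as written, it only plots), and the example scores are made up for illustration, not results from this notebook.

```python
def silhouette_bumps(sil_score):
    """Return the numbers of centroids whose silhouette score beats both neighbouring values.

    `sil_score` maps number of centroids -> silhouette score, like the dict built inside
    k_means_func. The smallest and largest k are never returned (they have one neighbour).
    """
    ks = sorted(sil_score)
    return [
        k
        for prev_k, k, next_k in zip(ks, ks[1:], ks[2:])
        if sil_score[k] > sil_score[prev_k] and sil_score[k] > sil_score[next_k]
    ]


# made-up scores for illustration only
example = {2: 0.60, 3: 0.50, 4: 0.42, 5: 0.45, 6: 0.41, 7: 0.40, 8: 0.43, 9: 0.39}
print(silhouette_bumps(example))  # [5, 8]
```

The returned values are only candidates; they should still be checked against the SSE curve, as done in the bullets above.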
k_means_func(two_dim_cluster_data)
k_means_func(four_dim_cluster_data)
Having identified a few promising values for the number of centroids, the next step is to fit the model and cluster the data for each of these values. The results are not predictions of an output variable, as is the case in supervised learning, but rather groupings of our stock tickers.
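def classify_four_dim_stocks(data, cluster_configs):
    # work on a copy, so the label columns are not written into a slice of another DataFrame
    # (this is what triggered a SettingWithCopyWarning)
    data = data.copy()
    transform_data = StandardScaler().fit_transform(data.iloc[:, 1:])
    # fit a K-means model for each of the specified cluster hyperparameter values,
    # storing each model's cluster labels in a column named after the config key
    for config, num_clusters in cluster_configs.items():
        model = KMeans(n_clusters=num_clusters, random_state=5, n_init=10)
        model.fit(transform_data)
        data[config] = model.labels_
    return data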
cluster_config_one = {
'cluster_five': 5,
'cluster_ten': 10,
'cluster_fourteen': 14,
'cluster_twenty': 20
}
four_dim_cluster_data = classify_four_dim_stocks(four_dim_cluster_data[['ticker', 'eps', 'open', 'div_pct', 'div_amt']], cluster_config_one)
four_dim_cluster_data
| | ticker | eps | open | div_pct | div_amt | cluster_five | cluster_ten | cluster_fourteen | cluster_twenty |
|---|---|---|---|---|---|---|---|---|---|
0 | MMM | 8.43 | 169.78 | 3.39 | 5.76 | 0 | 4 | 11 | 19 |
1 | ABT | 1.84 | 87.08 | 1.65 | 1.44 | 2 | 5 | 1 | 2 |
2 | ABBV | 2.18 | 90.05 | 5.24 | 4.72 | 0 | 4 | 13 | 1 |
3 | ABMD | 4.79 | 179.85 | 0.00 | 0.00 | 2 | 8 | 5 | 16 |
4 | ACN | 7.36 | 203.60 | 1.83 | 3.72 | 0 | 4 | 3 | 13 |
5 | ATVI | 2.11 | 58.34 | 0.63 | 0.37 | 2 | 8 | 5 | 7 |
6 | ADBE | 6.00 | 322.10 | 0.00 | 0.00 | 2 | 2 | 12 | 16 |
7 | AMD | 0.19 | 42.79 | 0.00 | 0.00 | 2 | 8 | 5 | 7 |
8 | AAP | 6.17 | 158.13 | 0.16 | 0.24 | 2 | 8 | 5 | 16 |
9 | AES | 0.76 | 18.88 | 3.03 | 0.57 | 3 | 0 | 9 | 10 |
10 | AMG | -3.35 | 86.68 | 1.50 | 1.28 | 2 | 5 | 1 | 2 |
11 | AFL | 4.05 | 53.33 | 2.03 | 1.08 | 2 | 5 | 1 | 2 |
12 | A | 3.37 | 83.75 | 0.85 | 0.72 | 2 | 5 | 1 | 2 |
13 | APD | 7.94 | 235.09 | 1.98 | 4.64 | 0 | 4 | 11 | 13 |
14 | AKAM | 2.74 | 84.44 | 0.00 | 0.00 | 2 | 8 | 5 | 7 |
15 | ALK | 4.92 | 70.41 | 2.03 | 1.40 | 2 | 5 | 1 | 2 |
16 | ALB | 5.38 | 68.90 | 2.22 | 1.47 | 2 | 5 | 9 | 10 |
17 | ARE | 1.09 | 155.29 | 2.63 | 4.12 | 0 | 4 | 3 | 13 |
18 | ALXN | 6.52 | 109.43 | 0.00 | 0.00 | 2 | 8 | 5 | 7 |
19 | ALGN | 5.21 | 269.48 | 0.00 | 0.00 | 2 | 8 | 5 | 16 |
20 | ALLE | 4.79 | 123.53 | 0.87 | 1.08 | 2 | 5 | 1 | 2 |
21 | AGN | -27.98 | 190.50 | 1.56 | 2.96 | 2 | 5 | 1 | 5 |
22 | ADS | 8.81 | 109.78 | 2.31 | 2.52 | 0 | 7 | 3 | 5 |
23 | LNT | 2.24 | 54.08 | 2.64 | 1.42 | 3 | 0 | 9 | 10 |
24 | ALL | 7.32 | 110.18 | 1.82 | 2.00 | 2 | 7 | 7 | 5 |
25 | GOOGL | 46.60 | 1357.00 | 0.00 | 0.00 | 1 | 1 | 8 | 4 |
26 | GOOG | 46.60 | 1356.60 | 0.00 | 0.00 | 1 | 1 | 8 | 4 |
27 | MO | 0.93 | 50.90 | 6.61 | 3.36 | 3 | 6 | 0 | 9 |
28 | AMZN | 22.57 | 1795.02 | 0.00 | 0.00 | 1 | 1 | 8 | 18 |
29 | AMCR | 0.31 | 10.75 | 4.26 | 0.46 | 3 | 0 | 10 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
474 | V | 5.32 | 185.52 | 0.65 | 1.20 | 2 | 5 | 7 | 15 |
475 | VNO | 15.73 | 65.26 | 4.06 | 2.64 | 3 | 0 | 13 | 17 |
476 | VMC | 4.50 | 143.12 | 0.87 | 1.24 | 2 | 5 | 1 | 2 |
477 | WRB | 3.61 | 69.64 | 0.63 | 0.44 | 2 | 8 | 5 | 7 |
478 | WAB | 1.46 | 74.31 | 0.64 | 0.48 | 2 | 8 | 5 | 7 |
479 | WMT | 5.00 | 121.51 | 1.75 | 2.12 | 2 | 7 | 7 | 5 |
480 | WBA | 4.31 | 57.23 | 3.21 | 1.83 | 3 | 0 | 9 | 10 |
481 | DIS | 6.64 | 147.77 | 1.19 | 1.76 | 2 | 7 | 7 | 15 |
482 | WM | 4.09 | 113.02 | 1.81 | 2.05 | 2 | 7 | 7 | 5 |
483 | WAT | 8.13 | 231.00 | 0.00 | 0.00 | 2 | 8 | 5 | 16 |
484 | WEC | 3.45 | 91.33 | 2.78 | 2.53 | 3 | 0 | 9 | 5 |
485 | WCG | 12.44 | 317.40 | 0.00 | 0.00 | 2 | 2 | 12 | 16 |
486 | WFC | 4.65 | 54.46 | 3.75 | 2.04 | 3 | 0 | 9 | 17 |
487 | WELL | 2.80 | 76.97 | 4.50 | 3.48 | 3 | 4 | 13 | 1 |
488 | WDC | -5.26 | 57.17 | 3.49 | 2.00 | 3 | 0 | 9 | 17 |
489 | WU | 2.60 | 26.87 | 2.98 | 0.80 | 3 | 0 | 9 | 10 |
490 | WRK | 3.33 | 41.87 | 4.43 | 1.86 | 3 | 0 | 10 | 17 |
491 | WY | -0.21 | 29.74 | 4.59 | 1.36 | 3 | 0 | 10 | 0 |
492 | WHR | 16.58 | 147.25 | 3.27 | 4.80 | 0 | 4 | 11 | 13 |
493 | WMB | 0.12 | 23.04 | 6.63 | 1.52 | 3 | 6 | 0 | 9 |
494 | WLTW | 6.74 | 201.02 | 1.29 | 2.60 | 2 | 7 | 7 | 15 |
495 | WYNN | 6.16 | 138.00 | 3.00 | 4.00 | 0 | 4 | 3 | 1 |
496 | XEL | 2.50 | 63.57 | 2.55 | 1.62 | 3 | 0 | 9 | 10 |
497 | XRX | 2.84 | 36.93 | 2.69 | 1.00 | 3 | 0 | 9 | 10 |
498 | XLNX | 3.71 | 96.26 | 1.54 | 1.48 | 2 | 5 | 1 | 2 |
499 | XYL | 2.80 | 78.10 | 1.23 | 0.96 | 2 | 5 | 1 | 2 |
500 | YUM | 3.62 | 99.48 | 1.69 | 1.68 | 2 | 5 | 7 | 5 |
501 | ZBH | -0.44 | 149.90 | 0.64 | 0.96 | 2 | 5 | 1 | 2 |
502 | ZION | 4.27 | 51.60 | 2.64 | 1.36 | 3 | 0 | 9 | 10 |
503 | ZTS | 3.02 | 127.15 | 0.63 | 0.80 | 2 | 8 | 5 | 2 |
497 rows × 9 columns
def output_cluster_tickers(original_data, cluster_data, cluster, show_tickers=None):
    # show_tickers: optional list of cluster ids to print; every cluster is printed when omitted or empty
    # cluster labels run from 0 to max inclusive, hence the + 1
    for i in range(0, cluster_data[cluster].max() + 1):
        if not show_tickers or i in show_tickers:
            # list of tickers for the current cluster
            ticker_list = list(cluster_data[cluster_data[cluster] == i]['ticker'])
            print("cluster " + str(i) + ":")
            print("includes " + str(len(ticker_list)) + " stocks")
            print(ticker_list)
            # original (unstandardized) data for tickers that are part of the cluster,
            # which is more useful than the transformed data
            curr_data = original_data[original_data['ticker'].isin(ticker_list)]
            print(curr_data[['open', 'div_pct', 'div_amt', 'eps']].mean())
            print()
output_cluster_tickers(df, four_dim_cluster_data, 'cluster_twenty')
cluster 0:
includes 24 stocks
['AMCR', 'APA', 'T', 'CAH', 'CNP', 'COTY', 'F', 'BEN', 'GPS', 'HRB', 'HBI', 'HST', 'HBAN', 'IPG', 'KIM', 'KMI', 'KHC', 'NWL', 'NLSN', 'PBCT', 'PPL', 'SLB', 'TPR', 'WY']
open 23.628750
div_pct 4.732500
div_amt 1.106667
eps -0.760000
dtype: float64
cluster 1:
includes 27 stocks
['ABBV', 'BXP', 'CVX', 'CCI', 'DRI', 'DLR', 'D', 'DTE', 'DUK', 'ETR', 'EXR', 'XOM', 'FRT', 'SJM', 'KMB', 'LYB', 'MAA', 'OKE', 'PM', 'PSX', 'PNW', 'PRU', 'SLG', 'UPS', 'VLO', 'WELL', 'WYNN']
open 105.968148
div_pct 3.819630
div_amt 3.918148
eps 4.584444
dtype: float64
cluster 2:
includes 82 stocks
['ABT', 'AMG', 'AFL', 'A', 'ALK', 'ALLE', 'AAL', 'APH', 'AOS', 'AMAT', 'APTV', 'BLL', 'BAC', 'BAX', 'BWA', 'CBOE', 'CERN', 'SCHW', 'CHD', 'XEC', 'CTXS', 'CTSH', 'CMCSA', 'CTVA', 'CSX', 'DHI', 'DVN', 'FANG', 'DD', 'ETFC', 'EBAY', 'EOG', 'EFX', 'EXPE', 'EXPD', 'FIS', 'FRC', 'FLIR', 'FLS', 'FMC', 'FBHS', 'FOXA', 'FOX', 'FCX', 'GL', 'HIG', 'HES', 'HRL', 'ICE', 'JBHT', 'LW', 'LDOS', 'MRO', 'MAS', 'MCK', 'MGM', 'MCHP', 'MOS', 'NEM', 'NWSA', 'NWS', 'NKE', 'NBL', 'ORCL', 'PCAR', 'PNR', 'PRGO', 'PHM', 'RJF', 'RHI', 'ROL', 'ROST', 'SEE', 'LUV', 'TJX', 'TSCO', 'VRSK', 'VMC', 'XLNX', 'XYL', 'ZBH', 'ZTS']
open 71.278902
div_pct 1.359146
div_amt 0.892927
eps 2.791463
dtype: float64
cluster 3:
includes 1 stocks
['NVR']
open 3820.00
div_pct 0.00
div_amt 0.00
eps 215.31
dtype: float64
cluster 4:
includes 3 stocks
['GOOGL', 'GOOG', 'AZO']
open 1311.956667
div_pct 0.000000
div_amt 0.000000
eps 52.210000
dtype: float64
cluster 5:
includes 66 stocks
['AGN', 'ADS', 'ALL', 'AEE', 'AEP', 'AWK', 'ABC', 'ADI', 'AJG', 'AIZ', 'ATO', 'AVY', 'BBY', 'BR', 'CHRW', 'CE', 'CB', 'CINF', 'C', 'STZ', 'CVS', 'DFS', 'DOV', 'ETN', 'EMR', 'EQR', 'ES', 'FDX', 'GRMN', 'GPC', 'HAS', 'HSY', 'IR', 'IFF', 'LLY', 'LOW', 'MMC', 'MDT', 'MRK', 'MSI', 'NDAQ', 'NTRS', 'PAYX', 'PPG', 'PG', 'PLD', 'QCOM', 'DGX', 'RL', 'RSG', 'SWKS', 'SWK', 'SBUX', 'STT', 'SYY', 'TROW', 'TGT', 'TEL', 'TIF', 'TSN', 'UTX', 'VFC', 'WMT', 'WM', 'WEC', 'YUM']
open 110.851970
div_pct 2.137273
div_amt 2.301212
eps 4.087879
dtype: float64
cluster 6:
includes 3 stocks
['IBM', 'PSA', 'SPG']
open 161.196667
div_pct 4.833333
div_amt 7.626667
eps 8.170000
dtype: float64
cluster 7:
includes 64 stocks
['ATVI', 'AMD', 'AKAM', 'ALXN', 'AME', 'ARNC', 'ADSK', 'BSX', 'CDNS', 'CPRI', 'KMX', 'CBRE', 'CNC', 'CXO', 'CPRT', 'DHR', 'DVA', 'XRAY', 'DISCA', 'DISCK', 'DISH', 'DLTR', 'FISV', 'FTNT', 'FTV', 'IT', 'GE', 'GPN', 'HSIC', 'HLT', 'HOLX', 'INFO', 'INCY', 'IPGP', 'IQV', 'JEC', 'KEYS', 'LEN', 'LKQ', 'L', 'MU', 'MNST', 'MYL', 'NOV', 'NCLH', 'NRG', 'PYPL', 'PKI', 'PGR', 'PVH', 'QRVO', 'PWR', 'CRM', 'SNPS', 'TMUS', 'TTWO', 'TXT', 'TRIP', 'TWTR', 'UAA', 'UA', 'VAR', 'WRB', 'WAB']
open 78.296094
div_pct 0.152500
div_amt 0.110156
eps 2.325156
dtype: float64
cluster 8:
includes 4 stocks
['BIIB', 'MHK', 'REGN', 'SIVB']
open 264.5475
div_pct 0.0000
div_amt 0.0000
eps 29.5425
dtype: float64
cluster 9:
includes 10 stocks
['MO', 'CTL', 'HP', 'IVZ', 'IRM', 'LB', 'MAC', 'M', 'OXY', 'WMB']
open 27.793
div_pct 7.789
div_amt 2.130
eps 0.224
dtype: float64
cluster 10:
includes 56 stocks
['AES', 'ALB', 'LNT', 'AIG', 'AIV', 'ADM', 'BK', 'BMY', 'COG', 'CPB', 'CF', 'CSCO', 'CFG', 'CMS', 'KO', 'CL', 'CAG', 'COP', 'GLW', 'DAL', 'DRE', 'DXC', 'EVRG', 'EXC', 'FAST', 'FITB', 'FE', 'HAL', 'HPE', 'HFC', 'HPQ', 'INTC', 'JCI', 'JNPR', 'KEY', 'KR', 'LEG', 'LNC', 'MXIM', 'MDLZ', 'MS', 'NTAP', 'NI', 'NUE', 'PEG', 'RF', 'SYF', 'FTI', 'UDR', 'USB', 'UNM', 'WBA', 'WU', 'XEL', 'XRX', 'ZION']
open 44.340000
div_pct 2.839643
div_amt 1.240000
eps 2.648036
dtype: float64
cluster 11:
includes 4 stocks
['BLK', 'AVGO', 'EQIX', 'LMT']
open 443.6425
div_pct 2.7200
div_amt 11.4100
eps 14.8175
dtype: float64
cluster 12:
includes 1 stocks
['BKNG']
open 2008.67
div_pct 0.00
div_amt 0.00
eps 97.36
dtype: float64
cluster 13:
includes 35 stocks
['ACN', 'APD', 'ARE', 'AMT', 'AMP', 'ADP', 'CAT', 'CLX', 'DE', 'HON', 'HII', 'ITW', 'JNJ', 'JPM', 'KLAC', 'LRCX', 'LIN', 'MTB', 'MCD', 'NEE', 'NSC', 'PKG', 'PH', 'PEP', 'PNC', 'RTN', 'ROK', 'RCL', 'SRE', 'SNA', 'TXN', 'TRV', 'UNP', 'UNH', 'WHR']
open 181.172857
div_pct 2.270571
div_amt 3.953714
eps 9.180000
dtype: float64
cluster 14:
includes 5 stocks
['CMG', 'ISRG', 'MTD', 'ORLY', 'TDG']
open 642.804
div_pct 0.000
div_amt 0.000
eps 14.996
dtype: float64
cluster 15:
includes 38 stocks
['AXP', 'ANTM', 'AON', 'AAPL', 'BDX', 'COF', 'CDW', 'CTAS', 'CME', 'COST', 'DG', 'ECL', 'EL', 'HCA', 'HUM', 'IEX', 'INTU', 'JKHY', 'KSU', 'LHX', 'MKTX', 'MAR', 'MLM', 'MA', 'MKC', 'MSFT', 'MCO', 'MSCI', 'PXD', 'RMD', 'ROP', 'SPGI', 'SBAC', 'SYK', 'TFX', 'V', 'DIS', 'WLTW']
open 219.314474
div_pct 1.013947
div_amt 2.072105
eps 7.210789
dtype: float64
cluster 16:
includes 30 stocks
['ABMD', 'ADBE', 'AAP', 'ALGN', 'ANSS', 'ANET', 'CHTR', 'CI', 'COO', 'EW', 'EA', 'FFIV', 'FB', 'FLT', 'IDXX', 'ILMN', 'LH', 'NFLX', 'NVDA', 'ODFL', 'NOW', 'TMO', 'ULTA', 'UAL', 'URI', 'UHS', 'VRSN', 'VRTX', 'WAT', 'WCG']
open 235.018000
div_pct 0.055667
div_amt 0.107333
eps 7.425000
dtype: float64
cluster 17:
includes 31 stocks
['CCL', 'CMA', 'ED', 'DOW', 'EMN', 'EIX', 'GIS', 'GM', 'GILD', 'HOG', 'IP', 'K', 'KSS', 'LVS', 'MPC', 'MET', 'TAP', 'JWN', 'OMC', 'PFE', 'PFG', 'O', 'REG', 'STX', 'SO', 'VTR', 'VZ', 'VNO', 'WFC', 'WDC', 'WRK']
open 58.530645
div_pct 4.003871
div_amt 2.306452
eps 3.771935
dtype: float64
cluster 18:
includes 1 stocks
['AMZN']
open 1795.02
div_pct 0.00
div_amt 0.00
eps 22.57
dtype: float64
I don't have much expertise in stock trading, but I have been listening to a podcast lately called Trading Stocks Made Easy by Tyrone Jackson (a great podcast that I'd recommend to anyone trying to learn more). He strongly advocates for stocks that pay a dividend, a portion of the company's profits that is paid out to the shareholders rather than reinvested in the company. Additionally, he advocates for stocks that have shown consistent quarterly earnings growth. Of the two criteria, only the dividend is part of the data collected here, so I decided to cluster the subset of stocks that do pay a dividend.
# get stocks which pay dividend
div_yielding_data = four_dim_cluster_data[four_dim_cluster_data['div_amt'] > 0].drop(columns=cluster_config_one.keys(), axis=1)
k_means_func(data=div_yielding_data)
# apply the model for n = {14, 19, 23}
cluster_config_two = {
'cluster_fourteen': 14,
'cluster_nineteen': 19,
'cluster_twenty_three': 23
}
div_yielding_data = classify_four_dim_stocks(div_yielding_data, cluster_config_two)
output_cluster_tickers(original_data=df, cluster_data=div_yielding_data, cluster='cluster_twenty_three')
cluster 0:
includes 24 stocks
['ACN', 'APD', 'AMT', 'ADP', 'CB', 'CME', 'STZ', 'DE', 'HSY', 'HON', 'ITW', 'KLAC', 'LHX', 'LIN', 'MKC', 'MCD', 'MSI', 'NEE', 'ROK', 'SWK', 'SYK', 'UNP', 'UTX', 'WLTW']
open 187.022500
div_pct 1.841250
div_amt 3.428750
eps 6.843333
dtype: float64
cluster 1:
includes 56 stocks
['AFL', 'ALK', 'ALB', 'LNT', 'AEE', 'AIG', 'AIV', 'BK', 'BMY', 'CHRW', 'CF', 'CSCO', 'C', 'CMS', 'KO', 'CL', 'COP', 'CVS', 'DAL', 'EMN', 'EMR', 'EQR', 'EVRG', 'ES', 'HIG', 'HAS', 'HFC', 'INTC', 'JCI', 'K', 'LEG', 'LNC', 'MPC', 'MXIM', 'MRK', 'MET', 'MDLZ', 'MS', 'NTAP', 'NUE', 'OMC', 'PAYX', 'PLD', 'PEG', 'QCOM', 'RHI', 'STT', 'SYF', 'SYY', 'USB', 'WBA', 'WEC', 'WFC', 'XEL', 'XRX', 'ZION']
open 64.544643
div_pct 2.723750
div_amt 1.752500
eps 3.903393
dtype: float64
cluster 2:
includes 10 stocks
['MMM', 'AMGN', 'AVB', 'BA', 'ESS', 'RE', 'HD', 'IBM', 'PSA', 'SPG']
open 222.526
div_pct 3.329
div_amt 6.878
eps 8.663
dtype: float64
cluster 3:
includes 9 stocks
['APA', 'COTY', 'DXC', 'KHC', 'NWL', 'NLSN', 'SLB', 'FTI', 'WDC']
open 28.590000
div_pct 4.234444
div_amt 1.165556
eps -4.783333
dtype: float64
cluster 4:
includes 61 stocks
['ATVI', 'A', 'AAL', 'AOS', 'AMAT', 'ARNC', 'BLL', 'BAC', 'BAX', 'BWA', 'CERN', 'SCHW', 'CHD', 'XEC', 'CTSH', 'CMCSA', 'CTVA', 'CSX', 'DHI', 'XRAY', 'DVN', 'DD', 'ETFC', 'EBAY', 'EXPD', 'FLIR', 'FLS', 'FBHS', 'FOXA', 'FOX', 'FCX', 'GE', 'HES', 'HRL', 'KR', 'LW', 'LEN', 'L', 'MRO', 'MAS', 'MGM', 'MOS', 'NEM', 'NWSA', 'NWS', 'NBL', 'NRG', 'ORCL', 'PNR', 'PRGO', 'PGR', 'PHM', 'PWR', 'ROL', 'SEE', 'LUV', 'TXT', 'TJX', 'WRB', 'WAB', 'XYL']
open 48.194426
div_pct 1.265738
div_amt 0.588852
eps 2.369180
dtype: float64
cluster 5:
includes 15 stocks
['AAPL', 'BDX', 'CTAS', 'COO', 'COST', 'INTU', 'MKTX', 'MLM', 'MA', 'MCO', 'MSCI', 'ROP', 'SPGI', 'TFX', 'TMO']
open 295.877333
div_pct 0.719333
div_amt 2.038667
eps 8.054667
dtype: float64
cluster 6:
includes 11 stocks
['MO', 'CTL', 'F', 'GPS', 'HP', 'IVZ', 'IRM', 'KIM', 'LB', 'OXY', 'WMB']
open 25.681818
div_pct 6.781818
div_amt 1.770909
eps 0.169091
dtype: float64
cluster 7:
includes 27 stocks
['ARE', 'AEP', 'BXP', 'CVX', 'CLX', 'ED', 'CCI', 'DRI', 'DLR', 'DTE', 'DUK', 'ETN', 'ETR', 'EXR', 'FRT', 'GPC', 'IFF', 'SJM', 'JNJ', 'KMB', 'MAA', 'PNW', 'PG', 'TXN', 'UPS', 'VLO', 'WYNN']
open 118.426296
div_pct 3.174815
div_amt 3.709259
eps 4.370370
dtype: float64
cluster 8:
includes 2 stocks
['CAH', 'NOV']
open 37.445
div_pct 2.210
div_amt 1.060
eps -14.465
dtype: float64
cluster 9:
includes 1 stocks
['AVGO']
open 324.40
div_pct 4.01
div_amt 13.00
eps 6.43
dtype: float64
cluster 10:
includes 2 stocks
['BLK', 'LMT']
open 445.285
div_pct 2.555
div_amt 11.400
eps 23.465
dtype: float64
cluster 11:
includes 25 stocks
['AAP', 'ALLE', 'AXP', 'AON', 'COF', 'CDW', 'CI', 'CXO', 'FANG', 'DG', 'ECL', 'EL', 'FRC', 'FTV', 'GL', 'HCA', 'IEX', 'KSU', 'NVDA', 'ODFL', 'PVH', 'UHS', 'V', 'VMC', 'DIS']
open 147.0420
div_pct 0.7716
div_amt 1.1076
eps 6.7476
dtype: float64
cluster 12:
includes 28 stocks
['ABBV', 'T', 'CCL', 'D', 'DOW', 'EIX', 'XOM', 'GIS', 'GM', 'GILD', 'IP', 'KSS', 'LVS', 'TAP', 'OKE', 'PM', 'PPL', 'PFG', 'O', 'REG', 'STX', 'SLG', 'SO', 'TPR', 'VTR', 'VZ', 'WELL', 'WRK']
open 60.232500
div_pct 4.505714
div_amt 2.699643
eps 2.902500
dtype: float64
cluster 13:
includes 1 stocks
['EQIX']
open 559.60
div_pct 1.76
div_amt 9.84
eps 5.91
dtype: float64
cluster 14:
includes 10 stocks
['AMP', 'CAT', 'CMI', 'MTB', 'NSC', 'PH', 'PNC', 'RTN', 'SNA', 'WHR']
open 175.944
div_pct 2.468
div_amt 4.241
eps 12.811
dtype: float64
cluster 15:
includes 1 stocks
['SHW']
open 579.73
div_pct 0.78
div_amt 4.52
eps 14.86
dtype: float64
cluster 16:
includes 36 stocks
['AES', 'AMCR', 'ADM', 'COG', 'CPB', 'CNP', 'CFG', 'CAG', 'GLW', 'DRE', 'EXC', 'FAST', 'FITB', 'FE', 'BEN', 'HRB', 'HAL', 'HBI', 'HOG', 'HPE', 'HST', 'HPQ', 'HBAN', 'IPG', 'JNPR', 'KEY', 'KMI', 'NI', 'JWN', 'PBCT', 'PFE', 'RF', 'UDR', 'UNM', 'WU', 'WY']
open 28.308056
div_pct 3.518889
div_amt 0.972778
eps 1.740278
dtype: float64
cluster 17:
includes 1 stocks
['AGN']
open 190.50
div_pct 1.56
div_amt 2.96
eps -27.98
dtype: float64
cluster 18:
includes 25 stocks
['ABT', 'AMG', 'AME', 'APH', 'APTV', 'DHR', 'EFX', 'EXPE', 'FDX', 'FIS', 'GPN', 'HLT', 'ICE', 'JKHY', 'JBHT', 'MCK', 'MCHP', 'NKE', 'PKI', 'RMD', 'ROST', 'SBAC', 'VRSK', 'ZBH', 'ZTS']
open 126.8680
div_pct 0.9396
div_amt 1.1628
eps 1.9892
dtype: float64
cluster 19:
includes 8 stocks
['ANTM', 'GS', 'GWW', 'HUM', 'HII', 'LRCX', 'NOC', 'UNH']
open 299.73875
div_pct 1.47500
div_amt 4.31000
eps 16.78125
dtype: float64
cluster 20:
includes 46 stocks
['ALL', 'AWK', 'ABC', 'ADI', 'AJG', 'AIZ', 'ATO', 'AVY', 'BBY', 'BR', 'CBOE', 'CE', 'CINF', 'CTXS', 'DFS', 'DOV', 'EOG', 'FMC', 'GRMN', 'IR', 'LDOS', 'LOW', 'MAR', 'MMC', 'MDT', 'MSFT', 'NDAQ', 'PCAR', 'PXD', 'PPG', 'DGX', 'RL', 'RJF', 'RSG', 'SWKS', 'SBUX', 'TGT', 'TEL', 'TIF', 'TSCO', 'TSN', 'VFC', 'WMT', 'WM', 'XLNX', 'YUM']
open 110.026087
div_pct 1.757174
div_amt 1.916739
eps 4.684783
dtype: float64
cluster 21:
includes 2 stocks
['MAC', 'M']
open 21.180
div_pct 10.490
div_amt 2.255
eps 1.835
dtype: float64
other_keys = [key for key in cluster_config_two.keys() if key != 'cluster_twenty_three']
div_yielding_agg = div_yielding_data.drop(columns=other_keys, axis=1).groupby('cluster_twenty_three').mean()
div_yielding_agg
| cluster_twenty_three | eps | open | div_pct | div_amt |
|---|---|---|---|---|
0 | 6.843333 | 187.022500 | 1.841250 | 3.428750 |
1 | 3.903393 | 64.544643 | 2.723750 | 1.752500 |
2 | 8.663000 | 222.526000 | 3.329000 | 6.878000 |
3 | -4.783333 | 28.590000 | 4.234444 | 1.165556 |
4 | 2.369180 | 48.194426 | 1.265738 | 0.588852 |
5 | 8.054667 | 295.877333 | 0.719333 | 2.038667 |
6 | 0.169091 | 25.681818 | 6.781818 | 1.770909 |
7 | 4.370370 | 118.426296 | 3.174815 | 3.709259 |
8 | -14.465000 | 37.445000 | 2.210000 | 1.060000 |
9 | 6.430000 | 324.400000 | 4.010000 | 13.000000 |
10 | 23.465000 | 445.285000 | 2.555000 | 11.400000 |
11 | 6.747600 | 147.042000 | 0.771600 | 1.107600 |
12 | 2.902500 | 60.232500 | 4.505714 | 2.699643 |
13 | 5.910000 | 559.600000 | 1.760000 | 9.840000 |
14 | 12.811000 | 175.944000 | 2.468000 | 4.241000 |
15 | 14.860000 | 579.730000 | 0.780000 | 4.520000 |
16 | 1.740278 | 28.308056 | 3.518889 | 0.972778 |
17 | -27.980000 | 190.500000 | 1.560000 | 2.960000 |
18 | 1.989200 | 126.868000 | 0.939600 | 1.162800 |
19 | 16.781250 | 299.738750 | 1.475000 | 4.310000 |
20 | 4.684783 | 110.026087 | 1.757174 | 1.916739 |
21 | 1.835000 | 21.180000 | 10.490000 | 2.255000 |
22 | 9.195333 | 114.239333 | 2.994667 | 3.286000 |
Finally, we have a simple visualization of the aggregated data for our clustered, dividend-yielding S&P 500 stocks. Based on these plots, I'm going to take a closer look at a few of the clusters:
- clusters 10/19: these clusters have the highest average earnings per share of all clusters
- clusters 9/10/13: these clusters have the highest average dividend amounts per share of any cluster
- clusters 6/21: these clusters have by far the highest average percentage dividend of any cluster
Although the opening price was included in the feature set (with the intention of clustering stocks with a similar cost per share), the opening price on an arbitrary day does not seem like a good feature for singling out a cluster to examine more carefully.
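The clusters singled out in the list above can also be picked programmatically. Inside the notebook, `div_yielding_agg['eps'].nlargest(2).index` would return them straight from the DataFrame; the standalone sketch below instead hard-codes the per-cluster means from the `cluster_twenty_three` table above, so it runs without the scraped data. `CLUSTER_MEANS`, `COLUMNS` and `top_clusters` are hypothetical names.

```python
# Per-cluster means (eps, div_amt, div_pct) copied from the aggregated table above
COLUMNS = ("eps", "div_amt", "div_pct")
CLUSTER_MEANS = {
    0: (6.843333, 3.428750, 1.841250),
    1: (3.903393, 1.752500, 2.723750),
    2: (8.663000, 6.878000, 3.329000),
    3: (-4.783333, 1.165556, 4.234444),
    4: (2.369180, 0.588852, 1.265738),
    5: (8.054667, 2.038667, 0.719333),
    6: (0.169091, 1.770909, 6.781818),
    7: (4.370370, 3.709259, 3.174815),
    8: (-14.465000, 1.060000, 2.210000),
    9: (6.430000, 13.000000, 4.010000),
    10: (23.465000, 11.400000, 2.555000),
    11: (6.747600, 1.107600, 0.771600),
    12: (2.902500, 2.699643, 4.505714),
    13: (5.910000, 9.840000, 1.760000),
    14: (12.811000, 4.241000, 2.468000),
    15: (14.860000, 4.520000, 0.780000),
    16: (1.740278, 0.972778, 3.518889),
    17: (-27.980000, 2.960000, 1.560000),
    18: (1.989200, 1.162800, 0.939600),
    19: (16.781250, 4.310000, 1.475000),
    20: (4.684783, 1.916739, 1.757174),
    21: (1.835000, 2.255000, 10.490000),
    22: (9.195333, 3.286000, 2.994667),
}


def top_clusters(means, metric, n):
    """Return the ids of the n clusters with the largest mean value of `metric`."""
    col = COLUMNS.index(metric)
    return sorted(means, key=lambda cluster: means[cluster][col], reverse=True)[:n]


print(top_clusters(CLUSTER_MEANS, "eps", 2))      # [10, 19]
print(top_clusters(CLUSTER_MEANS, "div_amt", 3))  # [9, 10, 13]
print(top_clusters(CLUSTER_MEANS, "div_pct", 2))  # [21, 6]
```

Keep in mind that these are cluster means: a cluster with one or two members (such as 9, 10, 13 or 21) can top a column because of a single stock.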
plt.figure(figsize=(12,12))
ax1 = plt.subplot(221)
ax1.title.set_text('average EPS per cluster')
sns.barplot(x=div_yielding_agg.index, y=div_yielding_agg.eps)
ax2 = plt.subplot(222)
ax2.title.set_text('average dividend amount per cluster')
sns.barplot(x=div_yielding_agg.index, y=div_yielding_agg.div_amt)
ax3 = plt.subplot(223)
ax3.title.set_text('average dividend percentage per cluster')
sns.barplot(x=div_yielding_agg.index, y=div_yielding_agg.div_pct)
ax4 = plt.subplot(224)
ax4.title.set_text('average open value per cluster')
sns.barplot(x=div_yielding_agg.index, y=div_yielding_agg.open)
These results are far from finished: I will need to comb through the financial figures and track these stocks for more than just one day. Still, clustering with the K-means algorithm has allowed me to narrow my initial search for potentially lucrative S&P 500 stocks. This was a fun and quick one-day venture that allowed me to get more familiar with the financial figures relevant to stock trading, with scraping stock data, and with applying machine learning techniques to an interesting data set.
# we can use the output_cluster_tickers function, passing the optional show_tickers parameter, which specifies
# which clusters to show the tickers for.
output_cluster_tickers(original_data=df, cluster_data=div_yielding_data, cluster='cluster_twenty_three', show_tickers=[6, 9, 10, 13, 19, 21])
cluster 6:
includes 11 stocks
['MO', 'CTL', 'F', 'GPS', 'HP', 'IVZ', 'IRM', 'KIM', 'LB', 'OXY', 'WMB']
open 25.681818
div_pct 6.781818
div_amt 1.770909
eps 0.169091
dtype: float64
cluster 9:
includes 1 stocks
['AVGO']
open 324.40
div_pct 4.01
div_amt 13.00
eps 6.43
dtype: float64
cluster 10:
includes 2 stocks
['BLK', 'LMT']
open 445.285
div_pct 2.555
div_amt 11.400
eps 23.465
dtype: float64
cluster 13:
includes 1 stocks
['EQIX']
open 559.60
div_pct 1.76
div_amt 9.84
eps 5.91
dtype: float64
cluster 19:
includes 8 stocks
['ANTM', 'GS', 'GWW', 'HUM', 'HII', 'LRCX', 'NOC', 'UNH']
open 299.73875
div_pct 1.47500
div_amt 4.31000
eps 16.78125
dtype: float64
cluster 21:
includes 2 stocks
['MAC', 'M']
open 21.180
div_pct 10.490
div_amt 2.255
eps 1.835
dtype: float64
# with show_tickers omitted, output_cluster_tickers prints every cluster, here for the 19-centroid model
output_cluster_tickers(original_data=df, cluster_data=div_yielding_data, cluster='cluster_nineteen')
cluster 0:
includes 17 stocks
['AAPL', 'BDX', 'CI', 'CTAS', 'COO', 'COST', 'INTU', 'MKTX', 'MLM', 'MA', 'MCO', 'MSCI', 'ROP', 'SPGI', 'SYK', 'TFX', 'TMO']
open 284.572941
div_pct 0.702353
div_amt 1.936471
eps 8.352353
dtype: float64
cluster 1:
includes 58 stocks
['ALK', 'ALB', 'LNT', 'AEE', 'AEP', 'AIV', 'ADM', 'BK', 'BMY', 'CHRW', 'CSCO', 'C', 'CFG', 'CMS', 'KO', 'CL', 'COP', 'CVS', 'DAL', 'EMN', 'EMR', 'EQR', 'EVRG', 'ES', 'EXC', 'FITB', 'FE', 'GIS', 'HIG', 'HAS', 'HFC', 'INTC', 'JCI', 'K', 'LEG', 'LNC', 'MPC', 'MXIM', 'MRK', 'MET', 'MS', 'NTAP', 'NUE', 'OMC', 'PAYX', 'PFG', 'PLD', 'PEG', 'QCOM', 'STT', 'SYF', 'SYY', 'USB', 'WBA', 'WEC', 'WFC', 'XEL', 'ZION']
open 64.173103
div_pct 2.854655
div_amt 1.809828
eps 3.905172
dtype: float64
cluster 2:
includes 7 stocks
['AMP', 'CMI', 'GS', 'MTB', 'SNA', 'VNO', 'WHR']
open 162.411429
div_pct 2.827143
div_amt 4.325714
eps 15.898571
dtype: float64
cluster 3:
includes 32 stocks
['ARE', 'ADS', 'BXP', 'CVX', 'CLX', 'CMA', 'ED', 'DRI', 'DTE', 'ETN', 'ETR', 'FRT', 'GPC', 'HSY', 'SJM', 'JNJ', 'KMB', 'LLY', 'LYB', 'MAA', 'NTRS', 'PKG', 'PEP', 'PSX', 'PRU', 'RCL', 'TROW', 'TXN', 'TRV', 'UPS', 'VLO', 'WYNN']
open 119.817812
div_pct 3.031875
div_amt 3.562812
eps 6.331563
dtype: float64
cluster 4:
includes 8 stocks
['APA', 'CAH', 'CTL', 'COTY', 'KHC', 'NLSN', 'SLB', 'WDC']
open 30.72875
div_pct 4.91375
div_amt 1.39125
eps -6.71250
dtype: float64
cluster 5:
includes 4 stocks
['BA', 'AVGO', 'EQIX', 'ESS']
open 377.2375
div_pct 2.7275
div_amt 9.7150
eps 6.3675
dtype: float64
cluster 6:
includes 56 stocks
['ALL', 'AXP', 'AWK', 'ABC', 'ADI', 'AJG', 'AIZ', 'ATO', 'AVY', 'BBY', 'BR', 'COF', 'CBOE', 'CE', 'CINF', 'CTXS', 'STZ', 'DFS', 'DOV', 'EXPE', 'FDX', 'FMC', 'GRMN', 'IR', 'IFF', 'LDOS', 'LOW', 'MAR', 'MMC', 'MKC', 'MDT', 'MCHP', 'MSFT', 'MSI', 'NDAQ', 'PCAR', 'PPG', 'PG', 'DGX', 'RL', 'RJF', 'RSG', 'SWKS', 'SWK', 'SBUX', 'TGT', 'TEL', 'TIF', 'TSCO', 'TSN', 'UTX', 'VFC', 'WMT', 'WM', 'XLNX', 'YUM']
open 116.192143
div_pct 1.759286
div_amt 2.030893
eps 4.663929
dtype: float64
cluster 7:
includes 50 stocks
['ATVI', 'A', 'AAL', 'AME', 'APH', 'AMAT', 'APTV', 'ARNC', 'BLL', 'BAX', 'BWA', 'CERN', 'SCHW', 'CHD', 'XEC', 'CTSH', 'CXO', 'CSX', 'DHI', 'XRAY', 'DVN', 'FANG', 'ETFC', 'EOG', 'EXPD', 'FLIR', 'FTV', 'FBHS', 'FOXA', 'FOX', 'GE', 'HLT', 'ICE', 'LW', 'LEN', 'L', 'MAS', 'NEM', 'NKE', 'NRG', 'PKI', 'PGR', 'PHM', 'PWR', 'LUV', 'TXT', 'TJX', 'WRB', 'WAB', 'XYL']
open 63.3910
div_pct 0.9736
div_amt 0.6050
eps 3.2894
dtype: float64
cluster 8:
includes 1 stocks
['AGN']
open 190.50
div_pct 1.56
div_amt 2.96
eps -27.98
dtype: float64
cluster 9:
includes 2 stocks
['BLK', 'LMT']
open 445.285
div_pct 2.555
div_amt 11.400
eps 23.465
dtype: float64
cluster 10:
includes 31 stocks
['AAP', 'ALLE', 'AON', 'CDW', 'DHR', 'DG', 'ECL', 'EL', 'FIS', 'FRC', 'GL', 'GPN', 'HCA', 'IEX', 'JKHY', 'JBHT', 'KSU', 'NVDA', 'ODFL', 'PXD', 'PVH', 'RMD', 'ROST', 'SBAC', 'UHS', 'VRSK', 'V', 'VMC', 'DIS', 'ZBH', 'ZTS']
open 155.143548
div_pct 0.778065
div_amt 1.189677
eps 4.845484
dtype: float64
cluster 11:
includes 10 stocks
['MO', 'F', 'HP', 'IVZ', 'IRM', 'LB', 'MAC', 'M', 'OXY', 'WMB']
open 27.418
div_pct 7.692
div_amt 2.090
eps 1.003
dtype: float64
cluster 12:
includes 41 stocks
['ABT', 'AES', 'AFL', 'AIG', 'AOS', 'BAC', 'COG', 'CPB', 'CF', 'CMCSA', 'CAG', 'GLW', 'CTVA', 'DRE', 'DD', 'EBAY', 'FAST', 'FLS', 'FCX', 'HAL', 'HES', 'HPE', 'HRL', 'JNPR', 'KR', 'MRO', 'MGM', 'MDLZ', 'MOS', 'NWSA', 'NWS', 'NI', 'ORCL', 'PNR', 'PRGO', 'RHI', 'ROL', 'SEE', 'UDR', 'WU', 'XRX']
open 37.565122
div_pct 2.142927
div_amt 0.790732
eps 1.632439
dtype: float64
cluster 13:
includes 10 stocks
['AMGN', 'ANTM', 'RE', 'GWW', 'HUM', 'HII', 'LRCX', 'NOC', 'SHW', 'UNH']
open 326.680
div_pct 1.528
div_amt 4.660
eps 15.004
dtype: float64
cluster 14:
includes 27 stocks
['MMM', 'ACN', 'APD', 'AMT', 'ADP', 'AVB', 'CAT', 'CB', 'CME', 'DE', 'HD', 'HON', 'ITW', 'JPM', 'KLAC', 'LHX', 'LIN', 'MCD', 'NEE', 'NSC', 'PH', 'PNC', 'RTN', 'ROK', 'SRE', 'UNP', 'WLTW']
open 189.472593
div_pct 2.135185
div_amt 3.988148
eps 8.268148
dtype: float64
cluster 15:
includes 23 stocks
['ABBV', 'CCI', 'DLR', 'D', 'DOW', 'DUK', 'EIX', 'EXR', 'XOM', 'GILD', 'KSS', 'LVS', 'OKE', 'PM', 'PNW', 'O', 'REG', 'STX', 'SLG', 'SO', 'VTR', 'VZ', 'WELL']
open 77.526087
div_pct 4.353913
div_amt 3.316087
eps 2.796087
dtype: float64
cluster 16:
includes 3 stocks
['IBM', 'PSA', 'SPG']
open 161.196667
div_pct 4.833333
div_amt 7.626667
eps 8.170000
dtype: float64
cluster 17:
includes 29 stocks
['AMCR', 'T', 'CCL', 'CNP', 'BEN', 'GPS', 'GM', 'HRB', 'HBI', 'HOG', 'HST', 'HPQ', 'HBAN', 'IP', 'IPG', 'KEY', 'KIM', 'KMI', 'TAP', 'NWL', 'JWN', 'PBCT', 'PFE', 'PPL', 'RF', 'TPR', 'UNM', 'WRK', 'WY']
open 27.975517
div_pct 4.359310
div_amt 1.215172
eps 2.048621
dtype: float64