stock-data-cluster-analysis's Introduction

S&P 500 Stock Clustering

This notebook demonstrates a clustering of the stocks in the S&P 500 index, based on a select set of financial figures.

The index consists of 500 companies but includes 505 common stocks, because 5 companies have two share classes in the index (Alphabet, Discovery, Fox, News Corp and Under Armour).

# libraries for making requests and parsing HTML
import requests
from bs4 import BeautifulSoup

# plotting
import matplotlib.pyplot as plt
import seaborn as sns

# sklearn for kmeans and model metrics
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

# pandas, for data wrangling
import pandas as pd

Data Acquisition

For the data I wanted access to, the existing APIs for financial data did not work out. Instead, I decided to scrape the data manually, using Wikipedia and Yahoo Finance:

  1. scrape the list of S&P 500 tickers from Wikipedia
  2. scrape the financial figures for each stock ticker from Yahoo Finance
# URL to get S&P tickers from
TICKER_URL = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'

# multi-level identifier, to select each row of ticker table in HTML response
TABLE_IDENTIFIER = '#constituents tbody tr td'

# yahoo finance URL we can use to scrape data for each company
YAHOO_URL = 'http://finance.yahoo.com/quote/'

# HTML class shared by the value cells in the Yahoo Finance summary table
YAHOO_TABLE_CLASS = 'Ta(end) Fw(600) Lh(14px)'

# data-test attribute values for the fields we scrape:
# open price, EPS (TTM) and dividend/yield
# (the P/E ratio, 'PE_RATIO-value', is left out to match the DataFrame columns used later)
YAHOO_IDS = ['OPEN-value', 'EPS_RATIO-value', 'DIVIDEND_AND_YIELD-value']
# get HTML content from the Wikipedia S&P 500 page
res = BeautifulSoup(requests.get(TICKER_URL).text, 'html.parser')
# get every cell of the ticker table, selecting on TABLE_IDENTIFIER
table_data = res.select(TABLE_IDENTIFIER)
# each table row holds 9 cells and the ticker is the first one, so take every 9th cell
tickers = [table_data[i].text for i in range(0, len(table_data), 9)]
# iterate through the S&P 500 company tickers, and collect data from Yahoo Finance
def get_yahoo_ticker_data(tickers):
    ticker_data = []
    print(len(tickers))
    for i, ticker in enumerate(tickers):
        print(i)
        # drop the trailing character (the cell text is assumed to end with a newline)
        symbol = ticker[:-1]
        try:
            # make GET request for the specified ticker
            req_url = YAHOO_URL + symbol + '?p=' + symbol
            ticker_i_res = requests.get(req_url)
            ticker_i_parser = BeautifulSoup(ticker_i_res.text, 'html.parser')

            # one row per ticker: the symbol followed by the value of each field in YAHOO_IDS
            ticker_i_data = [symbol]
            for id_ in YAHOO_IDS:
                cell = ticker_i_parser.find(attrs={'class': YAHOO_TABLE_CLASS, 'data-test': id_})
                ticker_i_data.append(cell.text)
            ticker_data.append(ticker_i_data)
        except Exception:
            # skip tickers whose page could not be fetched or parsed
            print("error for " + symbol)
            continue
    return ticker_data

Saving the data

The process of scraping all of the necessary data was rather cumbersome, so it made sense to save the data to a file for future experiments.

# convert yahoo finance data to dataframe

# will include:
# open => opening share price
# EPS (TTM) => earnings per share for trailing 12 months
# Dividend/Yield => dividend per share, and dividend per share / price per share
try:
    df = pd.read_csv('data.csv')
except FileNotFoundError:
    # no cached file yet: scrape the figures for every ticker and save them
    yahoo_data = get_yahoo_ticker_data(tickers)
    df = pd.DataFrame(yahoo_data, columns=['ticker', 'open', 'eps', 'div'])
    df.to_csv(path_or_buf='data.csv')
df.head()
   Unnamed: 0  ticker    open   eps  div
0           0  MMM     169.78  8.43  5.76 (3.39%)
1           1  ABT      87.08  1.84  1.44 (1.65%)
2           2  ABBV     90.05  2.18  4.72 (5.24%)
3           3  ABMD    179.85  4.79  N/A (N/A)
4           4  ACN     203.60  7.36  3.72 (1.83%)
df['div'] = df['div'].replace({'N/A (N/A)': 0})

Preprocessing

Some data preprocessing is required before proceeding with experimentation:

  1. separating percentage dividend yield and dividend yield amount into two separate features
  2. reformatting some features into representations that can be converted to numerical types
  3. casting features of the DataFrame to numerical types
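As a rough illustration of steps 1 and 2, the sketch below splits one scraped dividend string of the form shown in the table above into an amount and a percentage. The helper name `split_dividend` is hypothetical and is not part of the notebook, which does the same work with pandas `apply` calls.

```python
def split_dividend(raw):
    """Split a scraped dividend string into (amount, percentage yield)."""
    if raw.startswith('N/A'):
        # treat "no dividend" as zero, mirroring the replacement used in the notebook
        return 0.0, 0.0
    amount_text, pct_text = raw.split(' ')
    amount = float(amount_text.replace(',', ''))
    # '(3.39%)' -> '3.39'
    pct = float(pct_text.strip('()%'))
    return amount, pct


print(split_dividend('5.76 (3.39%)'))  # (5.76, 3.39)
print(split_dividend('N/A (N/A)'))     # (0.0, 0.0)
```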
# drop NaN values
df = df.dropna()

# remove missing values recorded as the string 'N/A' rather than as NaN
#df = df[df['eps'] != 'N/A']
df['eps'] = df['eps'].astype(float)


# preprocess open values
df['open'] = df['open'].astype(str)
df['open'] = df['open'].apply(lambda x: x.replace(',', '')).astype(float)

# split dividend into amount and percentage
df['div'] = df['div'].astype(str)
df['div_pct'] = df['div'].apply(lambda x: x.split(' ')[1] if len(x.split(' ')) > 1 else '(0%)')
df['div_pct'] = df['div_pct'].apply(lambda x: x[1:-2]).astype(float)
df['div_amt'] = df['div'].apply(lambda x: x.split(' ')[0]).astype(float)
df = df.drop(['div'], axis=1)
df.isnull().sum()
Unnamed: 0    0
ticker        0
open          0
eps           0
div_pct       0
div_amt       0
dtype: int64
# relevant data for now, will be using these columns for k-means clustering
two_dim_cluster_data = df[['ticker', 'eps', 'div_pct']]
four_dim_cluster_data = df[['ticker', 'eps', 'open', 'div_pct', 'div_amt']]
sns.scatterplot(x='eps', y='div_pct', data=two_dim_cluster_data)
[Figure: scatter plot of div_pct against eps for the S&P 500 stocks]

Clustering the data: The K-Means algorithm

Now that data acquisition and preprocessing are complete, the next step is to cluster our stock data, analyze how the clustering performs for different numbers of centroids, and then generate a final clustering based on some performance metrics.

The K-means algorithm operates as follows:

1. a number of "centroids" are randomly initialized (the number is a hyperparameter of the model). Each centroid
   matches the dimension of the feature set, and can be imagined as a vector in some n-dimensional space
2. every sample in the data set is then compared to each of the centroids, to see how far 
   away it is from each centroid. Since the samples and centroids are vectors, the distance 
   between a sample v and a centroid u is the Euclidean norm of the difference between the two vectors, 
   ((u1-v1)^2 + (u2-v2)^2 + ....)^(1/2). Each sample is then "clustered" with the centroid it is closest to.
3. After each sample has been clustered with a specific centroid, each centroid is repositioned, such that it
   is the average of all of the samples that have been clustered with it.
4. The sample association and centroid repositioning steps are then repeated for some number of iterations,
   or until the centroids stop moving
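The four steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the notebook itself relies on scikit-learn's KMeans, the names `nearest_centroid` and `lloyd_kmeans` are hypothetical, the centroids are initialized by picking random samples (one of several possible choices), and the toy points are made up rather than taken from the stock data.

```python
import numpy as np


def nearest_centroid(samples, centroids):
    # step 2: Euclidean distance from every sample to every centroid,
    # then the index of the closest centroid for each sample
    distances = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def lloyd_kmeans(samples, num_centroids, max_iterations=100, seed=0):
    rng = np.random.default_rng(seed)
    # step 1: initialize the centroids as distinct randomly chosen samples
    start = rng.choice(len(samples), size=num_centroids, replace=False)
    centroids = samples[start].astype(float)
    for _ in range(max_iterations):
        labels = nearest_centroid(samples, centroids)
        # step 3: move each centroid to the mean of the samples assigned to it
        new_centroids = centroids.copy()
        for k in range(num_centroids):
            members = samples[labels == k]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[k] = members.mean(axis=0)
        # step 4: repeat until the centroids stop moving
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = nearest_centroid(samples, centroids)
    # sum of squared errors between each sample and its assigned centroid
    sse = float(((samples - centroids[labels]) ** 2).sum())
    return labels, centroids, sse


# two well-separated toy groups (made-up numbers)
toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids, sse = lloyd_kmeans(toy, num_centroids=2)
print(labels, sse)
```

The first three points end up sharing one label and the last three the other. Which group is called 0 and which is called 1 depends on the random initialization.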
# iterate over a variety of amounts of cluster centroids for clustering our stock data
# looking for an "elbow" in the sum of squared error plot, for different amounts of centroids
def k_means_func(data, max_centroids=25):
    # standardize the numerical features (every column after 'ticker')
    transform_data = StandardScaler().fit_transform(data.iloc[:,1:])
    
    sum_square_err = {}
    sil_score = {}
    for num_centroids in range(2,max_centroids):
        model = KMeans(n_clusters=num_centroids, random_state=2, n_init=10)
        model.fit(transform_data)
        sum_square_err[num_centroids] = model.inertia_
        sil_score[num_centroids] = silhouette_score(transform_data, model.labels_, random_state=2)
    
    plt.figure(figsize=(16,6))
    ax1 = plt.subplot(211)
    plt.plot(list(sum_square_err.keys()), list(sum_square_err.values()))
    ax1.title.set_text("k-means sum squared error")
    plt.xlabel("num. centroids")
    plt.ylabel("sum squared error")
    plt.xticks([i for i in range(2, max_centroids)])
    
    ax2 = plt.subplot(212)
    plt.plot(list(sil_score.keys()), list(sil_score.values()))
    ax2.title.set_text("k-means silhouette score")
    plt.xlabel("num. centroids")
    plt.ylabel("score")
    plt.xticks([i for i in range(2, max_centroids)])
    plt.yticks([i / 10 for i in range(10)])

Measuring the performance of K-Means clustering

The K-means algorithm cannot be evaluated in the same way as supervised learning algorithms. There is no prediction error, since the data we are given is unlabeled. Instead, we measure the performance of the K-means algorithm by how effectively the chosen number of centroids clusters the data. Notably, one of the common metrics for K-means is the sum of squared errors (SSE) between each sample and the centroid it is clustered with, where the squared error is just the squared Euclidean norm of the difference between the sample and the centroid.

In addition to the sum of squared errors, K-means is often measured using the silhouette score. This metric is the mean of the silhouette coefficient over every sample. The silhouette coefficient can be defined as follows:

  • for a sample S, we define A(S) as the mean distance between S and every other element in S's assigned cluster
  • we define B(S) as the mean distance between S and every point in the closest cluster to S, other than S's assigned cluster
  • we define SC(S), the silhouette coefficient, as B(S) minus A(S), divided by the larger of A(S) and B(S)
  • therefore, SC(S) ranges from -1 to 1, where SC(S) = 1 means the mean distance from S to every other point in S's cluster is 0, SC(S) = 0 means that the mean distance from S to the points in its own cluster is the same as the mean distance from S to the points in the nearest other cluster, and a negative SC(S) means S is on average closer to the nearest other cluster than to its own
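To make the definition concrete, here is a small pure-Python sketch that computes the silhouette coefficient with the standard formulation (B(S) - A(S)) / max(A(S), B(S)). The function name `silhouette_coefficient` and the toy points are assumptions for illustration; the notebook itself uses scikit-learn's silhouette_score.

```python
from math import dist


def silhouette_coefficient(index, points, labels):
    """Silhouette coefficient of points[index], given one cluster label per point."""
    own = labels[index]
    # A(S): mean distance to the other members of the sample's own cluster
    same = [dist(points[index], p) for i, p in enumerate(points)
            if labels[i] == own and i != index]
    if not same:
        return 0.0  # common convention for a cluster holding a single sample
    a = sum(same) / len(same)
    # B(S): smallest mean distance to the members of any other cluster
    other_means = []
    for other in set(labels) - {own}:
        members = [dist(points[index], p) for i, p in enumerate(points)
                   if labels[i] == other]
        other_means.append(sum(members) / len(members))
    b = min(other_means)
    return (b - a) / max(a, b)


# toy 1-D example (made-up numbers): two tight clusters far apart
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good_labels = [0, 0, 1, 1]
scores = [silhouette_coefficient(i, points, good_labels) for i in range(len(points))]
silhouette = sum(scores) / len(scores)
print(silhouette)  # close to 1: the clusters are tight and well separated

# a poor labeling puts the first point nearer to the other cluster
print(silhouette_coefficient(0, points, [0, 1, 0, 1]))  # -0.4
```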

Below, we plot these metrics for our application of K-means to the stock data. We can see the following:

  1. The silhouette score drops rather quickly once n grows beyond 3-4. This implies that a small number of clusters most likely results in a few disparate clusters (with a single cluster comprising much of the data)
  2. The silhouette score stabilizes after it drops to 0.4, while the SSE continues to drop rapidly until around n = 10
  3. The silhouette score bumps up slightly for a few values of n (n = 11, n = 15, n = 20). These are likely good values for n, since the silhouette score is stable but slightly up, while the SSE continues to go down
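One way to spot such bumps programmatically, rather than by eye, is to look for local peaks in the silhouette scores. The helper `local_peaks` below is a hypothetical sketch, and the scores fed to it are illustrative numbers only, not the values plotted in this notebook.

```python
def local_peaks(scores):
    """Return the keys whose score is higher than the scores of both neighbouring keys."""
    keys = sorted(scores)
    return [k for prev, k, nxt in zip(keys, keys[1:], keys[2:])
            if scores[k] > scores[prev] and scores[k] > scores[nxt]]


# illustrative silhouette scores keyed by number of centroids (made-up values)
example_scores = {9: 0.40, 10: 0.39, 11: 0.42, 12: 0.40, 13: 0.41, 14: 0.40}
print(local_peaks(example_scores))  # [11, 13]
```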
k_means_func(two_dim_cluster_data)

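[Figure: two panels, "k-means sum squared error" and "k-means silhouette score", each plotted against num. centroids, for the two-feature data]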

k_means_func(four_dim_cluster_data)    

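[Figure: two panels, "k-means sum squared error" and "k-means silhouette score", each plotted against num. centroids, for the four-feature data]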

Finalizing our clusterings

Now that we have identified a few values for our centroid hyperparameter that seem fruitful, the next step is to fit the model and cluster the data for these values. Our results will not be predictions of an output variable, as is the case in supervised learning, but rather groupings of our stock tickers.

def classify_four_dim_stocks(data, cluster_configs):
    # work on a copy, so that adding label columns does not write into a slice of another DataFrame
    data = data.copy()
    transform_data = StandardScaler().fit_transform(data.iloc[:,1:])
    # fit a K-means model for each of the specified cluster hyperparameter values
    for config in cluster_configs.keys():
        model = KMeans(n_clusters=cluster_configs[config], random_state=5, n_init=10)
        model.fit(transform_data)
        data[config] = model.labels_
    return data
cluster_config_one = {
    'cluster_five': 5,
    'cluster_ten': 10,
    'cluster_fourteen': 14,
    'cluster_twenty': 20
}
four_dim_cluster_data = classify_four_dim_stocks(four_dim_cluster_data[['ticker', 'eps', 'open', 'div_pct', 'div_amt']], cluster_config_one)
four_dim_cluster_data
ticker eps open div_pct div_amt cluster_five cluster_ten cluster_fourteen cluster_twenty
0 MMM 8.43 169.78 3.39 5.76 0 4 11 19
1 ABT 1.84 87.08 1.65 1.44 2 5 1 2
2 ABBV 2.18 90.05 5.24 4.72 0 4 13 1
3 ABMD 4.79 179.85 0.00 0.00 2 8 5 16
4 ACN 7.36 203.60 1.83 3.72 0 4 3 13
5 ATVI 2.11 58.34 0.63 0.37 2 8 5 7
6 ADBE 6.00 322.10 0.00 0.00 2 2 12 16
7 AMD 0.19 42.79 0.00 0.00 2 8 5 7
8 AAP 6.17 158.13 0.16 0.24 2 8 5 16
9 AES 0.76 18.88 3.03 0.57 3 0 9 10
10 AMG -3.35 86.68 1.50 1.28 2 5 1 2
11 AFL 4.05 53.33 2.03 1.08 2 5 1 2
12 A 3.37 83.75 0.85 0.72 2 5 1 2
13 APD 7.94 235.09 1.98 4.64 0 4 11 13
14 AKAM 2.74 84.44 0.00 0.00 2 8 5 7
15 ALK 4.92 70.41 2.03 1.40 2 5 1 2
16 ALB 5.38 68.90 2.22 1.47 2 5 9 10
17 ARE 1.09 155.29 2.63 4.12 0 4 3 13
18 ALXN 6.52 109.43 0.00 0.00 2 8 5 7
19 ALGN 5.21 269.48 0.00 0.00 2 8 5 16
20 ALLE 4.79 123.53 0.87 1.08 2 5 1 2
21 AGN -27.98 190.50 1.56 2.96 2 5 1 5
22 ADS 8.81 109.78 2.31 2.52 0 7 3 5
23 LNT 2.24 54.08 2.64 1.42 3 0 9 10
24 ALL 7.32 110.18 1.82 2.00 2 7 7 5
25 GOOGL 46.60 1357.00 0.00 0.00 1 1 8 4
26 GOOG 46.60 1356.60 0.00 0.00 1 1 8 4
27 MO 0.93 50.90 6.61 3.36 3 6 0 9
28 AMZN 22.57 1795.02 0.00 0.00 1 1 8 18
29 AMCR 0.31 10.75 4.26 0.46 3 0 10 0
... ... ... ... ... ... ... ... ... ...
474 V 5.32 185.52 0.65 1.20 2 5 7 15
475 VNO 15.73 65.26 4.06 2.64 3 0 13 17
476 VMC 4.50 143.12 0.87 1.24 2 5 1 2
477 WRB 3.61 69.64 0.63 0.44 2 8 5 7
478 WAB 1.46 74.31 0.64 0.48 2 8 5 7
479 WMT 5.00 121.51 1.75 2.12 2 7 7 5
480 WBA 4.31 57.23 3.21 1.83 3 0 9 10
481 DIS 6.64 147.77 1.19 1.76 2 7 7 15
482 WM 4.09 113.02 1.81 2.05 2 7 7 5
483 WAT 8.13 231.00 0.00 0.00 2 8 5 16
484 WEC 3.45 91.33 2.78 2.53 3 0 9 5
485 WCG 12.44 317.40 0.00 0.00 2 2 12 16
486 WFC 4.65 54.46 3.75 2.04 3 0 9 17
487 WELL 2.80 76.97 4.50 3.48 3 4 13 1
488 WDC -5.26 57.17 3.49 2.00 3 0 9 17
489 WU 2.60 26.87 2.98 0.80 3 0 9 10
490 WRK 3.33 41.87 4.43 1.86 3 0 10 17
491 WY -0.21 29.74 4.59 1.36 3 0 10 0
492 WHR 16.58 147.25 3.27 4.80 0 4 11 13
493 WMB 0.12 23.04 6.63 1.52 3 6 0 9
494 WLTW 6.74 201.02 1.29 2.60 2 7 7 15
495 WYNN 6.16 138.00 3.00 4.00 0 4 3 1
496 XEL 2.50 63.57 2.55 1.62 3 0 9 10
497 XRX 2.84 36.93 2.69 1.00 3 0 9 10
498 XLNX 3.71 96.26 1.54 1.48 2 5 1 2
499 XYL 2.80 78.10 1.23 0.96 2 5 1 2
500 YUM 3.62 99.48 1.69 1.68 2 5 7 5
501 ZBH -0.44 149.90 0.64 0.96 2 5 1 2
502 ZION 4.27 51.60 2.64 1.36 3 0 9 10
503 ZTS 3.02 127.15 0.63 0.80 2 8 5 2

497 rows × 9 columns

def output_cluster_tickers(original_data, cluster_data, cluster, show_tickers=None):
    # show_tickers: optional list of cluster labels to print; every cluster is printed when omitted
    show_tickers = show_tickers or []
    # labels run from 0 to the maximum label inclusive, so add 1 to include the last cluster
    for i in range(0, max(cluster_data[cluster]) + 1):
        if(i in show_tickers or len(show_tickers) == 0):
            # list of tickers for the current cluster
            ticker_list = list(cluster_data[cluster_data[cluster] == i]['ticker'])
            print("cluster " + str(i) + ":")
            print("includes " + str(len(ticker_list)) + " stocks")
            print(ticker_list)
            # original data for tickers that are part of cluster, more useful than
            # the transformed data
            curr_data = original_data[original_data['ticker'].isin(ticker_list)]
            print(curr_data[['open', 'div_pct', 'div_amt', 'eps']].mean())
            print()
output_cluster_tickers(df, four_dim_cluster_data, 'cluster_twenty')
cluster 0:
includes 24 stocks
['AMCR', 'APA', 'T', 'CAH', 'CNP', 'COTY', 'F', 'BEN', 'GPS', 'HRB', 'HBI', 'HST', 'HBAN', 'IPG', 'KIM', 'KMI', 'KHC', 'NWL', 'NLSN', 'PBCT', 'PPL', 'SLB', 'TPR', 'WY']
open       23.628750
div_pct     4.732500
div_amt     1.106667
eps        -0.760000
dtype: float64

cluster 1:
includes 27 stocks
['ABBV', 'BXP', 'CVX', 'CCI', 'DRI', 'DLR', 'D', 'DTE', 'DUK', 'ETR', 'EXR', 'XOM', 'FRT', 'SJM', 'KMB', 'LYB', 'MAA', 'OKE', 'PM', 'PSX', 'PNW', 'PRU', 'SLG', 'UPS', 'VLO', 'WELL', 'WYNN']
open       105.968148
div_pct      3.819630
div_amt      3.918148
eps          4.584444
dtype: float64

cluster 2:
includes 82 stocks
['ABT', 'AMG', 'AFL', 'A', 'ALK', 'ALLE', 'AAL', 'APH', 'AOS', 'AMAT', 'APTV', 'BLL', 'BAC', 'BAX', 'BWA', 'CBOE', 'CERN', 'SCHW', 'CHD', 'XEC', 'CTXS', 'CTSH', 'CMCSA', 'CTVA', 'CSX', 'DHI', 'DVN', 'FANG', 'DD', 'ETFC', 'EBAY', 'EOG', 'EFX', 'EXPE', 'EXPD', 'FIS', 'FRC', 'FLIR', 'FLS', 'FMC', 'FBHS', 'FOXA', 'FOX', 'FCX', 'GL', 'HIG', 'HES', 'HRL', 'ICE', 'JBHT', 'LW', 'LDOS', 'MRO', 'MAS', 'MCK', 'MGM', 'MCHP', 'MOS', 'NEM', 'NWSA', 'NWS', 'NKE', 'NBL', 'ORCL', 'PCAR', 'PNR', 'PRGO', 'PHM', 'RJF', 'RHI', 'ROL', 'ROST', 'SEE', 'LUV', 'TJX', 'TSCO', 'VRSK', 'VMC', 'XLNX', 'XYL', 'ZBH', 'ZTS']
open       71.278902
div_pct     1.359146
div_amt     0.892927
eps         2.791463
dtype: float64

cluster 3:
includes 1 stocks
['NVR']
open       3820.00
div_pct       0.00
div_amt       0.00
eps         215.31
dtype: float64

cluster 4:
includes 3 stocks
['GOOGL', 'GOOG', 'AZO']
open       1311.956667
div_pct       0.000000
div_amt       0.000000
eps          52.210000
dtype: float64

cluster 5:
includes 66 stocks
['AGN', 'ADS', 'ALL', 'AEE', 'AEP', 'AWK', 'ABC', 'ADI', 'AJG', 'AIZ', 'ATO', 'AVY', 'BBY', 'BR', 'CHRW', 'CE', 'CB', 'CINF', 'C', 'STZ', 'CVS', 'DFS', 'DOV', 'ETN', 'EMR', 'EQR', 'ES', 'FDX', 'GRMN', 'GPC', 'HAS', 'HSY', 'IR', 'IFF', 'LLY', 'LOW', 'MMC', 'MDT', 'MRK', 'MSI', 'NDAQ', 'NTRS', 'PAYX', 'PPG', 'PG', 'PLD', 'QCOM', 'DGX', 'RL', 'RSG', 'SWKS', 'SWK', 'SBUX', 'STT', 'SYY', 'TROW', 'TGT', 'TEL', 'TIF', 'TSN', 'UTX', 'VFC', 'WMT', 'WM', 'WEC', 'YUM']
open       110.851970
div_pct      2.137273
div_amt      2.301212
eps          4.087879
dtype: float64

cluster 6:
includes 3 stocks
['IBM', 'PSA', 'SPG']
open       161.196667
div_pct      4.833333
div_amt      7.626667
eps          8.170000
dtype: float64

cluster 7:
includes 64 stocks
['ATVI', 'AMD', 'AKAM', 'ALXN', 'AME', 'ARNC', 'ADSK', 'BSX', 'CDNS', 'CPRI', 'KMX', 'CBRE', 'CNC', 'CXO', 'CPRT', 'DHR', 'DVA', 'XRAY', 'DISCA', 'DISCK', 'DISH', 'DLTR', 'FISV', 'FTNT', 'FTV', 'IT', 'GE', 'GPN', 'HSIC', 'HLT', 'HOLX', 'INFO', 'INCY', 'IPGP', 'IQV', 'JEC', 'KEYS', 'LEN', 'LKQ', 'L', 'MU', 'MNST', 'MYL', 'NOV', 'NCLH', 'NRG', 'PYPL', 'PKI', 'PGR', 'PVH', 'QRVO', 'PWR', 'CRM', 'SNPS', 'TMUS', 'TTWO', 'TXT', 'TRIP', 'TWTR', 'UAA', 'UA', 'VAR', 'WRB', 'WAB']
open       78.296094
div_pct     0.152500
div_amt     0.110156
eps         2.325156
dtype: float64

cluster 8:
includes 4 stocks
['BIIB', 'MHK', 'REGN', 'SIVB']
open       264.5475
div_pct      0.0000
div_amt      0.0000
eps         29.5425
dtype: float64

cluster 9:
includes 10 stocks
['MO', 'CTL', 'HP', 'IVZ', 'IRM', 'LB', 'MAC', 'M', 'OXY', 'WMB']
open       27.793
div_pct     7.789
div_amt     2.130
eps         0.224
dtype: float64

cluster 10:
includes 56 stocks
['AES', 'ALB', 'LNT', 'AIG', 'AIV', 'ADM', 'BK', 'BMY', 'COG', 'CPB', 'CF', 'CSCO', 'CFG', 'CMS', 'KO', 'CL', 'CAG', 'COP', 'GLW', 'DAL', 'DRE', 'DXC', 'EVRG', 'EXC', 'FAST', 'FITB', 'FE', 'HAL', 'HPE', 'HFC', 'HPQ', 'INTC', 'JCI', 'JNPR', 'KEY', 'KR', 'LEG', 'LNC', 'MXIM', 'MDLZ', 'MS', 'NTAP', 'NI', 'NUE', 'PEG', 'RF', 'SYF', 'FTI', 'UDR', 'USB', 'UNM', 'WBA', 'WU', 'XEL', 'XRX', 'ZION']
open       44.340000
div_pct     2.839643
div_amt     1.240000
eps         2.648036
dtype: float64

cluster 11:
includes 4 stocks
['BLK', 'AVGO', 'EQIX', 'LMT']
open       443.6425
div_pct      2.7200
div_amt     11.4100
eps         14.8175
dtype: float64

cluster 12:
includes 1 stocks
['BKNG']
open       2008.67
div_pct       0.00
div_amt       0.00
eps          97.36
dtype: float64

cluster 13:
includes 35 stocks
['ACN', 'APD', 'ARE', 'AMT', 'AMP', 'ADP', 'CAT', 'CLX', 'DE', 'HON', 'HII', 'ITW', 'JNJ', 'JPM', 'KLAC', 'LRCX', 'LIN', 'MTB', 'MCD', 'NEE', 'NSC', 'PKG', 'PH', 'PEP', 'PNC', 'RTN', 'ROK', 'RCL', 'SRE', 'SNA', 'TXN', 'TRV', 'UNP', 'UNH', 'WHR']
open       181.172857
div_pct      2.270571
div_amt      3.953714
eps          9.180000
dtype: float64

cluster 14:
includes 5 stocks
['CMG', 'ISRG', 'MTD', 'ORLY', 'TDG']
open       642.804
div_pct      0.000
div_amt      0.000
eps         14.996
dtype: float64

cluster 15:
includes 38 stocks
['AXP', 'ANTM', 'AON', 'AAPL', 'BDX', 'COF', 'CDW', 'CTAS', 'CME', 'COST', 'DG', 'ECL', 'EL', 'HCA', 'HUM', 'IEX', 'INTU', 'JKHY', 'KSU', 'LHX', 'MKTX', 'MAR', 'MLM', 'MA', 'MKC', 'MSFT', 'MCO', 'MSCI', 'PXD', 'RMD', 'ROP', 'SPGI', 'SBAC', 'SYK', 'TFX', 'V', 'DIS', 'WLTW']
open       219.314474
div_pct      1.013947
div_amt      2.072105
eps          7.210789
dtype: float64

cluster 16:
includes 30 stocks
['ABMD', 'ADBE', 'AAP', 'ALGN', 'ANSS', 'ANET', 'CHTR', 'CI', 'COO', 'EW', 'EA', 'FFIV', 'FB', 'FLT', 'IDXX', 'ILMN', 'LH', 'NFLX', 'NVDA', 'ODFL', 'NOW', 'TMO', 'ULTA', 'UAL', 'URI', 'UHS', 'VRSN', 'VRTX', 'WAT', 'WCG']
open       235.018000
div_pct      0.055667
div_amt      0.107333
eps          7.425000
dtype: float64

cluster 17:
includes 31 stocks
['CCL', 'CMA', 'ED', 'DOW', 'EMN', 'EIX', 'GIS', 'GM', 'GILD', 'HOG', 'IP', 'K', 'KSS', 'LVS', 'MPC', 'MET', 'TAP', 'JWN', 'OMC', 'PFE', 'PFG', 'O', 'REG', 'STX', 'SO', 'VTR', 'VZ', 'VNO', 'WFC', 'WDC', 'WRK']
open       58.530645
div_pct     4.003871
div_amt     2.306452
eps         3.771935
dtype: float64

cluster 18:
includes 1 stocks
['AMZN']
open       1795.02
div_pct       0.00
div_amt       0.00
eps          22.57
dtype: float64

Changing our approach: The Wealthy Investor technique

I don't have much expertise with stock trading, but I have been listening to a podcast lately called Trading Stocks Made Easy by Tyrone Jackson (a great podcast that I'd recommend to anyone trying to learn more). He heavily advocates for stocks which pay out a dividend: a portion of their profits that isn't reinvested into the company, but rather goes to the shareholders. Additionally, he advocates for stocks that have shown consistent quarterly earnings growth. Of the two, only dividend yield is part of the data that has been collected, so I decided to cluster the subset of stocks which do pay out a dividend.

# get stocks which pay dividend
div_yielding_data = four_dim_cluster_data[four_dim_cluster_data['div_amt'] > 0].drop(columns=list(cluster_config_one))
k_means_func(data=div_yielding_data)

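[Figure: two panels, "k-means sum squared error" and "k-means silhouette score", each plotted against num. centroids, for the dividend-paying stocks]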

# apply model for n = {14, 19, 23}
cluster_config_two = {
    'cluster_fourteen': 14,
    'cluster_nineteen': 19,
    'cluster_twenty_three': 23
}

div_yielding_data = classify_four_dim_stocks(div_yielding_data, cluster_config_two)
output_cluster_tickers(original_data=df, cluster_data=div_yielding_data, cluster='cluster_twenty_three')
cluster 0:
includes 24 stocks
['ACN', 'APD', 'AMT', 'ADP', 'CB', 'CME', 'STZ', 'DE', 'HSY', 'HON', 'ITW', 'KLAC', 'LHX', 'LIN', 'MKC', 'MCD', 'MSI', 'NEE', 'ROK', 'SWK', 'SYK', 'UNP', 'UTX', 'WLTW']
open       187.022500
div_pct      1.841250
div_amt      3.428750
eps          6.843333
dtype: float64

cluster 1:
includes 56 stocks
['AFL', 'ALK', 'ALB', 'LNT', 'AEE', 'AIG', 'AIV', 'BK', 'BMY', 'CHRW', 'CF', 'CSCO', 'C', 'CMS', 'KO', 'CL', 'COP', 'CVS', 'DAL', 'EMN', 'EMR', 'EQR', 'EVRG', 'ES', 'HIG', 'HAS', 'HFC', 'INTC', 'JCI', 'K', 'LEG', 'LNC', 'MPC', 'MXIM', 'MRK', 'MET', 'MDLZ', 'MS', 'NTAP', 'NUE', 'OMC', 'PAYX', 'PLD', 'PEG', 'QCOM', 'RHI', 'STT', 'SYF', 'SYY', 'USB', 'WBA', 'WEC', 'WFC', 'XEL', 'XRX', 'ZION']
open       64.544643
div_pct     2.723750
div_amt     1.752500
eps         3.903393
dtype: float64

cluster 2:
includes 10 stocks
['MMM', 'AMGN', 'AVB', 'BA', 'ESS', 'RE', 'HD', 'IBM', 'PSA', 'SPG']
open       222.526
div_pct      3.329
div_amt      6.878
eps          8.663
dtype: float64

cluster 3:
includes 9 stocks
['APA', 'COTY', 'DXC', 'KHC', 'NWL', 'NLSN', 'SLB', 'FTI', 'WDC']
open       28.590000
div_pct     4.234444
div_amt     1.165556
eps        -4.783333
dtype: float64

cluster 4:
includes 61 stocks
['ATVI', 'A', 'AAL', 'AOS', 'AMAT', 'ARNC', 'BLL', 'BAC', 'BAX', 'BWA', 'CERN', 'SCHW', 'CHD', 'XEC', 'CTSH', 'CMCSA', 'CTVA', 'CSX', 'DHI', 'XRAY', 'DVN', 'DD', 'ETFC', 'EBAY', 'EXPD', 'FLIR', 'FLS', 'FBHS', 'FOXA', 'FOX', 'FCX', 'GE', 'HES', 'HRL', 'KR', 'LW', 'LEN', 'L', 'MRO', 'MAS', 'MGM', 'MOS', 'NEM', 'NWSA', 'NWS', 'NBL', 'NRG', 'ORCL', 'PNR', 'PRGO', 'PGR', 'PHM', 'PWR', 'ROL', 'SEE', 'LUV', 'TXT', 'TJX', 'WRB', 'WAB', 'XYL']
open       48.194426
div_pct     1.265738
div_amt     0.588852
eps         2.369180
dtype: float64

cluster 5:
includes 15 stocks
['AAPL', 'BDX', 'CTAS', 'COO', 'COST', 'INTU', 'MKTX', 'MLM', 'MA', 'MCO', 'MSCI', 'ROP', 'SPGI', 'TFX', 'TMO']
open       295.877333
div_pct      0.719333
div_amt      2.038667
eps          8.054667
dtype: float64

cluster 6:
includes 11 stocks
['MO', 'CTL', 'F', 'GPS', 'HP', 'IVZ', 'IRM', 'KIM', 'LB', 'OXY', 'WMB']
open       25.681818
div_pct     6.781818
div_amt     1.770909
eps         0.169091
dtype: float64

cluster 7:
includes 27 stocks
['ARE', 'AEP', 'BXP', 'CVX', 'CLX', 'ED', 'CCI', 'DRI', 'DLR', 'DTE', 'DUK', 'ETN', 'ETR', 'EXR', 'FRT', 'GPC', 'IFF', 'SJM', 'JNJ', 'KMB', 'MAA', 'PNW', 'PG', 'TXN', 'UPS', 'VLO', 'WYNN']
open       118.426296
div_pct      3.174815
div_amt      3.709259
eps          4.370370
dtype: float64

cluster 8:
includes 2 stocks
['CAH', 'NOV']
open       37.445
div_pct     2.210
div_amt     1.060
eps       -14.465
dtype: float64

cluster 9:
includes 1 stocks
['AVGO']
open       324.40
div_pct      4.01
div_amt     13.00
eps          6.43
dtype: float64

cluster 10:
includes 2 stocks
['BLK', 'LMT']
open       445.285
div_pct      2.555
div_amt     11.400
eps         23.465
dtype: float64

cluster 11:
includes 25 stocks
['AAP', 'ALLE', 'AXP', 'AON', 'COF', 'CDW', 'CI', 'CXO', 'FANG', 'DG', 'ECL', 'EL', 'FRC', 'FTV', 'GL', 'HCA', 'IEX', 'KSU', 'NVDA', 'ODFL', 'PVH', 'UHS', 'V', 'VMC', 'DIS']
open       147.0420
div_pct      0.7716
div_amt      1.1076
eps          6.7476
dtype: float64

cluster 12:
includes 28 stocks
['ABBV', 'T', 'CCL', 'D', 'DOW', 'EIX', 'XOM', 'GIS', 'GM', 'GILD', 'IP', 'KSS', 'LVS', 'TAP', 'OKE', 'PM', 'PPL', 'PFG', 'O', 'REG', 'STX', 'SLG', 'SO', 'TPR', 'VTR', 'VZ', 'WELL', 'WRK']
open       60.232500
div_pct     4.505714
div_amt     2.699643
eps         2.902500
dtype: float64

cluster 13:
includes 1 stocks
['EQIX']
open       559.60
div_pct      1.76
div_amt      9.84
eps          5.91
dtype: float64

cluster 14:
includes 10 stocks
['AMP', 'CAT', 'CMI', 'MTB', 'NSC', 'PH', 'PNC', 'RTN', 'SNA', 'WHR']
open       175.944
div_pct      2.468
div_amt      4.241
eps         12.811
dtype: float64

cluster 15:
includes 1 stocks
['SHW']
open       579.73
div_pct      0.78
div_amt      4.52
eps         14.86
dtype: float64

cluster 16:
includes 36 stocks
['AES', 'AMCR', 'ADM', 'COG', 'CPB', 'CNP', 'CFG', 'CAG', 'GLW', 'DRE', 'EXC', 'FAST', 'FITB', 'FE', 'BEN', 'HRB', 'HAL', 'HBI', 'HOG', 'HPE', 'HST', 'HPQ', 'HBAN', 'IPG', 'JNPR', 'KEY', 'KMI', 'NI', 'JWN', 'PBCT', 'PFE', 'RF', 'UDR', 'UNM', 'WU', 'WY']
open       28.308056
div_pct     3.518889
div_amt     0.972778
eps         1.740278
dtype: float64

cluster 17:
includes 1 stocks
['AGN']
open       190.50
div_pct      1.56
div_amt      2.96
eps        -27.98
dtype: float64

cluster 18:
includes 25 stocks
['ABT', 'AMG', 'AME', 'APH', 'APTV', 'DHR', 'EFX', 'EXPE', 'FDX', 'FIS', 'GPN', 'HLT', 'ICE', 'JKHY', 'JBHT', 'MCK', 'MCHP', 'NKE', 'PKI', 'RMD', 'ROST', 'SBAC', 'VRSK', 'ZBH', 'ZTS']
open       126.8680
div_pct      0.9396
div_amt      1.1628
eps          1.9892
dtype: float64

cluster 19:
includes 8 stocks
['ANTM', 'GS', 'GWW', 'HUM', 'HII', 'LRCX', 'NOC', 'UNH']
open       299.73875
div_pct      1.47500
div_amt      4.31000
eps         16.78125
dtype: float64

cluster 20:
includes 46 stocks
['ALL', 'AWK', 'ABC', 'ADI', 'AJG', 'AIZ', 'ATO', 'AVY', 'BBY', 'BR', 'CBOE', 'CE', 'CINF', 'CTXS', 'DFS', 'DOV', 'EOG', 'FMC', 'GRMN', 'IR', 'LDOS', 'LOW', 'MAR', 'MMC', 'MDT', 'MSFT', 'NDAQ', 'PCAR', 'PXD', 'PPG', 'DGX', 'RL', 'RJF', 'RSG', 'SWKS', 'SBUX', 'TGT', 'TEL', 'TIF', 'TSCO', 'TSN', 'VFC', 'WMT', 'WM', 'XLNX', 'YUM']
open       110.026087
div_pct      1.757174
div_amt      1.916739
eps          4.684783
dtype: float64

cluster 21:
includes 2 stocks
['MAC', 'M']
open       21.180
div_pct    10.490
div_amt     2.255
eps         1.835
dtype: float64
other_keys = [key for key in cluster_config_two.keys() if key != 'cluster_twenty_three']
# average each numeric feature per cluster (the ticker column is not numeric)
div_yielding_agg = div_yielding_data.drop(columns=other_keys).groupby('cluster_twenty_three').mean(numeric_only=True)
div_yielding_agg
cluster_twenty_three         eps        open    div_pct    div_amt
0                       6.843333  187.022500   1.841250   3.428750
1                       3.903393   64.544643   2.723750   1.752500
2                       8.663000  222.526000   3.329000   6.878000
3                      -4.783333   28.590000   4.234444   1.165556
4                       2.369180   48.194426   1.265738   0.588852
5                       8.054667  295.877333   0.719333   2.038667
6                       0.169091   25.681818   6.781818   1.770909
7                       4.370370  118.426296   3.174815   3.709259
8                     -14.465000   37.445000   2.210000   1.060000
9                       6.430000  324.400000   4.010000  13.000000
10                     23.465000  445.285000   2.555000  11.400000
11                      6.747600  147.042000   0.771600   1.107600
12                      2.902500   60.232500   4.505714   2.699643
13                      5.910000  559.600000   1.760000   9.840000
14                     12.811000  175.944000   2.468000   4.241000
15                     14.860000  579.730000   0.780000   4.520000
16                      1.740278   28.308056   3.518889   0.972778
17                    -27.980000  190.500000   1.560000   2.960000
18                      1.989200  126.868000   0.939600   1.162800
19                     16.781250  299.738750   1.475000   4.310000
20                      4.684783  110.026087   1.757174   1.916739
21                      1.835000   21.180000  10.490000   2.255000
22                      9.195333  114.239333   2.994667   3.286000

Plotting the results

Finally! We have a simple visualization of the aggregated data for our clustered dividend-yielding S&P 500 stocks. Based on these plots, I'm going to take a closer look at a few of the clusters:

  1. cluster 10/19: these clusters have the highest average earnings per share of all clusters
  2. cluster 9/10/13: these clusters have the highest average dividend amounts per share of any cluster
  3. cluster 6/21: these clusters have by far the highest percentage dividend of any cluster

Although open value was included in the feature set (with the intention of clustering stocks based on similar cost per share), open value for an arbitrary day does not seem like a good feature for singling out a specific cluster to consider more carefully.
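The same ranking can also be read off the aggregated table directly instead of from the bar charts. The sketch below is one possible way to do it with pandas `nlargest`. It uses a handful of rows copied from the aggregated table above, and the names `agg`, `how_many` and `top_clusters` are assumptions made for this example.

```python
import pandas as pd

# a few rows copied from the aggregated table; the remaining clusters are
# left out to keep the sketch short
agg = pd.DataFrame(
    {
        'eps':     [6.843333, 0.169091, 6.43, 23.465, 5.91, 14.86, 16.78125, 1.835],
        'div_pct': [1.84125, 6.781818, 4.01, 2.555, 1.76, 0.78, 1.475, 10.49],
        'div_amt': [3.42875, 1.770909, 13.0, 11.4, 9.84, 4.52, 4.31, 2.255],
    },
    index=pd.Index([0, 6, 9, 10, 13, 15, 19, 21], name='cluster'),
)

# how many top clusters to keep per feature (a choice made for this sketch)
how_many = {'eps': 2, 'div_amt': 3, 'div_pct': 2}
top_clusters = {
    feature: agg[feature].nlargest(n).index.tolist()
    for feature, n in how_many.items()
}
print(top_clusters)
# {'eps': [10, 19], 'div_amt': [9, 10, 13], 'div_pct': [21, 6]}
```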

plt.figure(figsize=(12,12))
ax1 = plt.subplot(221)
ax1.title.set_text('average EPS per cluster')
sns.barplot(x=div_yielding_agg.index, y=div_yielding_agg.eps)
ax2 = plt.subplot(222)
ax2.title.set_text('average dividend amount per cluster')
sns.barplot(x=div_yielding_agg.index, y=div_yielding_agg.div_amt)
ax3 = plt.subplot(223)
ax3.title.set_text('average dividend percentage per cluster')
sns.barplot(x=div_yielding_agg.index, y=div_yielding_agg.div_pct)
ax4 = plt.subplot(224)
ax4.title.set_text('average open value per cluster')
sns.barplot(x=div_yielding_agg.index, y=div_yielding_agg.open)
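[Figure: four bar charts by cluster: "average EPS per cluster", "average dividend amount per cluster", "average dividend percentage per cluster", "average open value per cluster"]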

Results

Although these results are far from final, and I will need to comb through financial figures and track these stocks for more than just one day, clustering with the K-means algorithm has allowed me to narrow my initial search for potentially lucrative S&P 500 stocks. This was a fun and quick 1-day venture that allowed me to get more familiar with relevant financial figures for stock trading, scraping stock data, and applying machine learning techniques to an interesting data set.

# we can use the output_cluster_tickers function, passing an optional parameter which specifies
# which clusters to show the tickers for.
output_cluster_tickers(original_data=df, cluster_data=div_yielding_data, cluster='cluster_twenty_three', show_tickers=[6, 9, 10, 13, 19, 21])
cluster 6:
includes 11 stocks
['MO', 'CTL', 'F', 'GPS', 'HP', 'IVZ', 'IRM', 'KIM', 'LB', 'OXY', 'WMB']
open       25.681818
div_pct     6.781818
div_amt     1.770909
eps         0.169091
dtype: float64

cluster 9:
includes 1 stocks
['AVGO']
open       324.40
div_pct      4.01
div_amt     13.00
eps          6.43
dtype: float64

cluster 10:
includes 2 stocks
['BLK', 'LMT']
open       445.285
div_pct      2.555
div_amt     11.400
eps         23.465
dtype: float64

cluster 13:
includes 1 stocks
['EQIX']
open       559.60
div_pct      1.76
div_amt      9.84
eps          5.91
dtype: float64

cluster 19:
includes 8 stocks
['ANTM', 'GS', 'GWW', 'HUM', 'HII', 'LRCX', 'NOC', 'UNH']
open       299.73875
div_pct      1.47500
div_amt      4.31000
eps         16.78125
dtype: float64

cluster 21:
includes 2 stocks
['MAC', 'M']
open       21.180
div_pct    10.490
div_amt     2.255
eps         1.835
dtype: float64
# we can also call output_cluster_tickers without the optional show_tickers
# parameter, in which case the tickers are shown for all clusters.
output_cluster_tickers(original_data=df, cluster_data=div_yielding_data, cluster='cluster_nineteen')
cluster 0:
includes 17 stocks
['AAPL', 'BDX', 'CI', 'CTAS', 'COO', 'COST', 'INTU', 'MKTX', 'MLM', 'MA', 'MCO', 'MSCI', 'ROP', 'SPGI', 'SYK', 'TFX', 'TMO']
open       284.572941
div_pct      0.702353
div_amt      1.936471
eps          8.352353
dtype: float64

cluster 1:
includes 58 stocks
['ALK', 'ALB', 'LNT', 'AEE', 'AEP', 'AIV', 'ADM', 'BK', 'BMY', 'CHRW', 'CSCO', 'C', 'CFG', 'CMS', 'KO', 'CL', 'COP', 'CVS', 'DAL', 'EMN', 'EMR', 'EQR', 'EVRG', 'ES', 'EXC', 'FITB', 'FE', 'GIS', 'HIG', 'HAS', 'HFC', 'INTC', 'JCI', 'K', 'LEG', 'LNC', 'MPC', 'MXIM', 'MRK', 'MET', 'MS', 'NTAP', 'NUE', 'OMC', 'PAYX', 'PFG', 'PLD', 'PEG', 'QCOM', 'STT', 'SYF', 'SYY', 'USB', 'WBA', 'WEC', 'WFC', 'XEL', 'ZION']
open       64.173103
div_pct     2.854655
div_amt     1.809828
eps         3.905172
dtype: float64

cluster 2:
includes 7 stocks
['AMP', 'CMI', 'GS', 'MTB', 'SNA', 'VNO', 'WHR']
open       162.411429
div_pct      2.827143
div_amt      4.325714
eps         15.898571
dtype: float64

cluster 3:
includes 32 stocks
['ARE', 'ADS', 'BXP', 'CVX', 'CLX', 'CMA', 'ED', 'DRI', 'DTE', 'ETN', 'ETR', 'FRT', 'GPC', 'HSY', 'SJM', 'JNJ', 'KMB', 'LLY', 'LYB', 'MAA', 'NTRS', 'PKG', 'PEP', 'PSX', 'PRU', 'RCL', 'TROW', 'TXN', 'TRV', 'UPS', 'VLO', 'WYNN']
open       119.817812
div_pct      3.031875
div_amt      3.562812
eps          6.331563
dtype: float64

cluster 4:
includes 8 stocks
['APA', 'CAH', 'CTL', 'COTY', 'KHC', 'NLSN', 'SLB', 'WDC']
open       30.72875
div_pct     4.91375
div_amt     1.39125
eps        -6.71250
dtype: float64

cluster 5:
includes 4 stocks
['BA', 'AVGO', 'EQIX', 'ESS']
open       377.2375
div_pct      2.7275
div_amt      9.7150
eps          6.3675
dtype: float64

cluster 6:
includes 56 stocks
['ALL', 'AXP', 'AWK', 'ABC', 'ADI', 'AJG', 'AIZ', 'ATO', 'AVY', 'BBY', 'BR', 'COF', 'CBOE', 'CE', 'CINF', 'CTXS', 'STZ', 'DFS', 'DOV', 'EXPE', 'FDX', 'FMC', 'GRMN', 'IR', 'IFF', 'LDOS', 'LOW', 'MAR', 'MMC', 'MKC', 'MDT', 'MCHP', 'MSFT', 'MSI', 'NDAQ', 'PCAR', 'PPG', 'PG', 'DGX', 'RL', 'RJF', 'RSG', 'SWKS', 'SWK', 'SBUX', 'TGT', 'TEL', 'TIF', 'TSCO', 'TSN', 'UTX', 'VFC', 'WMT', 'WM', 'XLNX', 'YUM']
open       116.192143
div_pct      1.759286
div_amt      2.030893
eps          4.663929
dtype: float64

cluster 7:
includes 50 stocks
['ATVI', 'A', 'AAL', 'AME', 'APH', 'AMAT', 'APTV', 'ARNC', 'BLL', 'BAX', 'BWA', 'CERN', 'SCHW', 'CHD', 'XEC', 'CTSH', 'CXO', 'CSX', 'DHI', 'XRAY', 'DVN', 'FANG', 'ETFC', 'EOG', 'EXPD', 'FLIR', 'FTV', 'FBHS', 'FOXA', 'FOX', 'GE', 'HLT', 'ICE', 'LW', 'LEN', 'L', 'MAS', 'NEM', 'NKE', 'NRG', 'PKI', 'PGR', 'PHM', 'PWR', 'LUV', 'TXT', 'TJX', 'WRB', 'WAB', 'XYL']
open       63.3910
div_pct     0.9736
div_amt     0.6050
eps         3.2894
dtype: float64

cluster 8:
includes 1 stocks
['AGN']
open       190.50
div_pct      1.56
div_amt      2.96
eps        -27.98
dtype: float64

cluster 9:
includes 2 stocks
['BLK', 'LMT']
open       445.285
div_pct      2.555
div_amt     11.400
eps         23.465
dtype: float64

cluster 10:
includes 31 stocks
['AAP', 'ALLE', 'AON', 'CDW', 'DHR', 'DG', 'ECL', 'EL', 'FIS', 'FRC', 'GL', 'GPN', 'HCA', 'IEX', 'JKHY', 'JBHT', 'KSU', 'NVDA', 'ODFL', 'PXD', 'PVH', 'RMD', 'ROST', 'SBAC', 'UHS', 'VRSK', 'V', 'VMC', 'DIS', 'ZBH', 'ZTS']
open       155.143548
div_pct      0.778065
div_amt      1.189677
eps          4.845484
dtype: float64

cluster 11:
includes 10 stocks
['MO', 'F', 'HP', 'IVZ', 'IRM', 'LB', 'MAC', 'M', 'OXY', 'WMB']
open       27.418
div_pct     7.692
div_amt     2.090
eps         1.003
dtype: float64

cluster 12:
includes 41 stocks
['ABT', 'AES', 'AFL', 'AIG', 'AOS', 'BAC', 'COG', 'CPB', 'CF', 'CMCSA', 'CAG', 'GLW', 'CTVA', 'DRE', 'DD', 'EBAY', 'FAST', 'FLS', 'FCX', 'HAL', 'HES', 'HPE', 'HRL', 'JNPR', 'KR', 'MRO', 'MGM', 'MDLZ', 'MOS', 'NWSA', 'NWS', 'NI', 'ORCL', 'PNR', 'PRGO', 'RHI', 'ROL', 'SEE', 'UDR', 'WU', 'XRX']
open       37.565122
div_pct     2.142927
div_amt     0.790732
eps         1.632439
dtype: float64

cluster 13:
includes 10 stocks
['AMGN', 'ANTM', 'RE', 'GWW', 'HUM', 'HII', 'LRCX', 'NOC', 'SHW', 'UNH']
open       326.680
div_pct      1.528
div_amt      4.660
eps         15.004
dtype: float64

cluster 14:
includes 27 stocks
['MMM', 'ACN', 'APD', 'AMT', 'ADP', 'AVB', 'CAT', 'CB', 'CME', 'DE', 'HD', 'HON', 'ITW', 'JPM', 'KLAC', 'LHX', 'LIN', 'MCD', 'NEE', 'NSC', 'PH', 'PNC', 'RTN', 'ROK', 'SRE', 'UNP', 'WLTW']
open       189.472593
div_pct      2.135185
div_amt      3.988148
eps          8.268148
dtype: float64

cluster 15:
includes 23 stocks
['ABBV', 'CCI', 'DLR', 'D', 'DOW', 'DUK', 'EIX', 'EXR', 'XOM', 'GILD', 'KSS', 'LVS', 'OKE', 'PM', 'PNW', 'O', 'REG', 'STX', 'SLG', 'SO', 'VTR', 'VZ', 'WELL']
open       77.526087
div_pct     4.353913
div_amt     3.316087
eps         2.796087
dtype: float64

cluster 16:
includes 3 stocks
['IBM', 'PSA', 'SPG']
open       161.196667
div_pct      4.833333
div_amt      7.626667
eps          8.170000
dtype: float64

cluster 17:
includes 29 stocks
['AMCR', 'T', 'CCL', 'CNP', 'BEN', 'GPS', 'GM', 'HRB', 'HBI', 'HOG', 'HST', 'HPQ', 'HBAN', 'IP', 'IPG', 'KEY', 'KIM', 'KMI', 'TAP', 'NWL', 'JWN', 'PBCT', 'PFE', 'PPL', 'RF', 'TPR', 'UNM', 'WRK', 'WY']
open       27.975517
div_pct     4.359310
div_amt     1.215172
eps         2.048621
dtype: float64
