
ticc's Introduction

TICC

TICC is a Python solver for efficiently segmenting and clustering a multivariate time series. It takes as input a T-by-n data matrix, a regularization parameter lambda, a smoothness parameter beta, a window size w, and a number of clusters k. TICC breaks the T timestamps into segments, each of which belongs to one of the k clusters; the total number of segments is governed by the smoothness parameter beta. It does so by running an EM-style algorithm in which TICC alternately assigns points to clusters using dynamic programming and updates the cluster parameters by solving a Toeplitz inverse covariance estimation problem.

For details about the method and implementation see the paper [1].

Download & Setup

Download the source code by running the following in a terminal:

git clone https://github.com/davidhallac/TICC.git

Using TICC

The TICC constructor takes the following parameters:

  • window_size: the size of the sliding window
  • number_of_clusters: the number of underlying clusters, k
  • lambda_parameter: the sparsity regularization of the Markov Random Field (MRF) for each cluster, i.e. the sparsity of each cluster's inverse covariance matrix
  • beta: the switching penalty used in the TICC algorithm; the same beta parameter described in the paper
  • maxIters: the maximum number of iterations of the TICC algorithm. Default value is 100.
  • threshold: the convergence threshold
  • write_out_file: Boolean flag indicating whether the computed inverse covariances for each cluster should be saved
  • prefix_string: location of the folder to which the outputs are saved

The TICC.fit(input_file) function runs the TICC algorithm on a given dataset to learn the model parameters.

  • input_file: location of the T-by-n data matrix

It returns an array of cluster assignments (one label from 0 to k-1 per time point) together with a dictionary whose keys are the cluster ids and whose values are the cluster MRFs.
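Internally, TICC stacks each length-w window of the T-by-n matrix into a single observation of dimension n·w before clustering. A minimal numpy sketch of that windowing (the variable names here are illustrative, not TICC's internals):

```python
import numpy as np

# Toy T-by-n data matrix: T = 6 timestamps, n = 3 sensors.
T, n, w = 6, 3, 2
X = np.arange(T * n, dtype=float).reshape(T, n)

# Stack each sliding window of w consecutive rows into one
# observation of dimension n * w.
stacked = np.hstack([X[i:T - w + 1 + i] for i in range(w)])

print(stacked.shape)  # (T - w + 1, n * w) = (5, 6)
```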

Example Usage

See example.py.

References

[1] D. Hallac, S. Vare, S. Boyd, and J. Leskovec. Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 215–223.

ticc's People

Contributors

davidhallac, dstuck, heusdens97, jessekolb, mohataher, rasmusfonseca, sagarvare, scoutsaachi

ticc's Issues

ValueError: operands could not be broadcast together with shapes (16,) (32,)

There is a bug in the predict function when window_size w > 1:

beginning the smoothening ALGORITHM
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-f7f6321b34d3> in <module>
----> 1 ticc.predict_clusters(X2)

~\Desktop\Appnomic\paytm_corebanking_may19_jul19_raw_csvs\raw_csvs\TICC_solver.py in predict_clusters(self, test_data)
    394                                                          self.trained_model['cluster_mean_stacked_info'],
    395                                                          test_data,
--> 396                                                          self.trained_model['time_series_col_size'])
    397 
    398         # Update cluster points - using NEW smoothening

~\Desktop\Appnomic\paytm_corebanking_may19_jul19_raw_csvs\raw_csvs\TICC_solver.py in smoothen_clusters(self, cluster_mean_info, computed_covariance, cluster_mean_stacked_info, complete_D_train, n)
    280                     cluster_mean = cluster_mean_info[self.number_of_clusters, cluster]
    281                     cluster_mean_stacked = cluster_mean_stacked_info[self.number_of_clusters, cluster]
--> 282                     x = complete_D_train[point, :] - cluster_mean_stacked[0:(self.num_blocks - 1) * n]
    283                     inv_cov_matrix = inv_cov_dict[cluster]
    284                     log_det_cov = log_det_dict[cluster]

ValueError: operands could not be broadcast together with shapes (16,) (32,)
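A minimal numpy reproduction of the shape mismatch (the dimensions are chosen to match the traceback above; this only illustrates the broadcasting failure, not TICC's internals):

```python
import numpy as np

# With n = 16 features and window_size w = 2, the stacked cluster mean
# has n * w = 32 entries, but the test point was left unstacked at n = 16.
point = np.zeros(16)
cluster_mean_stacked = np.zeros(32)

try:
    x = point - cluster_mean_stacked   # shapes (16,) and (32,) cannot broadcast
    broadcast_failed = False
except ValueError as e:
    broadcast_failed = True
    print(e)  # operands could not be broadcast together with shapes (16,) (32,)
```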

Hide the output of impute function

Hi,

How can I hide the output of the impute function on the console? E.g. below is a very long list that I want to suppress, but I did not find a way to do so:

'cpu__agg_linear_trend__f_agg_"mean"__chunk_len_50__attr_"intercept"'
'cpu__agg_linear_trend__f_agg_"mean"__chunk_len_50__attr_"rvalue"'
[... hundreds of similar feature names elided ...]
'cpu__fft_coefficient__coeff_99__attr_"imag"'
'cpu__fft_coefficient__coeff_99__attr_"real"'] did not have any finite values. Filling with zeros.

thanks
shuja
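One generic way to silence a noisy function's console output is to redirect stdout while it runs. A sketch using only the standard library; `noisy_impute` below is a hypothetical stand-in for the actual impute call:

```python
import io
from contextlib import redirect_stdout

def noisy_impute():
    # Placeholder for the real impute call that floods the console.
    print("... hundreds of feature names ...")
    return 42

buf = io.StringIO()
with redirect_stdout(buf):
    result = noisy_impute()  # printed output goes into buf, not the console

print(result)  # 42
```

Note that if the library writes to stderr or uses the logging module instead of print, `redirect_stderr` or adjusting the logger level would be needed instead.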

Example stops at iteration zero

Hi,
Could you please help me with this?
When I try to run the example, it always behaves the same way: it prints ITERATION ### 0 and does not proceed. I would be beyond grateful if you could help me figure out what is going wrong.

generate synthetic data

Hi,

I want to ask how you generate synthetic data based on previous samples.
In your paper, you mention that "The data is then drawn one sample at a time, conditioned on the values of the previous w − 1 samples."
I know you generate w samples (an n * w array) using "numpy.random.multinomial", but how do you generate samples based on the previous w − 1 samples?
That is, how do I generate 1 new sample given w − 1 samples?

Thank you for your time.
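Drawing one new sample conditioned on the previous w − 1 samples can be done with the standard conditional-Gaussian formulas: partition the stacked mean and covariance into the conditioning block and the new block, then sample from the conditional distribution. A generic sketch under that assumption, not the repository's exact generator; `mu` and `Sigma` stand in for the stacked window mean and covariance:

```python
import numpy as np

def sample_conditional(mu, Sigma, x_prev, rng):
    """Draw x_new ~ p(x_new | x_prev) for a joint Gaussian over
    the stacked vector [x_prev, x_new]."""
    m = x_prev.shape[0]                 # dimension of the conditioning block
    mu1, mu2 = mu[:m], mu[m:]
    S11 = Sigma[:m, :m]
    S21 = Sigma[m:, :m]
    S22 = Sigma[m:, m:]
    K = S21 @ np.linalg.inv(S11)        # regression coefficients
    mu_cond = mu2 + K @ (x_prev - mu1)  # conditional mean
    S_cond = S22 - K @ Sigma[:m, m:]    # conditional covariance (Schur complement)
    return rng.multivariate_normal(mu_cond, S_cond)

rng = np.random.default_rng(0)
mu = np.zeros(4)                        # e.g. n = 2, w = 2 -> stacked dim 4
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)         # a valid covariance matrix
x_prev = np.array([0.5, -0.2])          # the previous sample(s), w - 1 = 1 here
x_new = sample_conditional(mu, Sigma, x_prev, rng)
print(x_new.shape)  # (2,)
```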

Using TICC for online clustering

Hi David,

Can we use TICC for online clustering of time series? For instance, we want to identify the state a car is in during driving, given a set of states learned with TICC in batch mode.

Thanks,

`TestStringMethods.test_multiExample` fails when run alone

I have had a look at your updated code. Interestingly enough, when the two test cases in TestStringMethods are run together, they both pass.

However, when TestStringMethods.test_multiExample is run on its own, it fails. I'm not sure why exactly. Do you have an explanation for this?

I ran it on a Mac Pro with Python 3.6.4 (miniconda). Which machine and Python version did you use?

When the test case passes, this is a snippet of the output:

`# PASSED`
lam_sparse 0.11
switch_penalty 600
num_cluster 5
num stacked 5
completed getting the data



ITERATION ### 0
OPTIMIZATION for Cluster # 0 DONE!!!
OPTIMIZATION for Cluster # 1 DONE!!!
OPTIMIZATION for Cluster # 2 DONE!!!
OPTIMIZATION for Cluster # 3 DONE!!!
OPTIMIZATION for Cluster # 4 DONE!!!
length of the cluster  0 ------> 2851.       ` # <---- these numbers are different than the failed one.`
length of the cluster  1 ------> 2698
length of the cluster  2 ------> 2094
length of the cluster  3 ------> 7522
length of the cluster  4 ------> 4438
UPDATED THE OLD COVARIANCE
beginning the smoothening ALGORITHM
length of cluster # 0 --------> 2131
length of cluster # 1 --------> 1886
length of cluster # 2 --------> 3072
length of cluster # 3 --------> 7915
length of cluster # 4 --------> 4599
Done writing the figure

And this is a snippet when it fails:

 `# FAILED`
lam_sparse 0.11
switch_penalty 600
num_cluster 5
num stacked 5
completed getting the data



ITERATION ### 0
OPTIMIZATION for Cluster # 0 DONE!!!
OPTIMIZATION for Cluster # 1 DONE!!!
OPTIMIZATION for Cluster # 2 DONE!!!
OPTIMIZATION for Cluster # 3 DONE!!!
OPTIMIZATION for Cluster # 4 DONE!!!
length of the cluster  0 ------> 7851.   `# <-----  really higher than when it passed`
length of the cluster  1 ------> 2246
length of the cluster  2 ------> 2261
length of the cluster  3 ------> 2613
length of the cluster  4 ------> 4632
UPDATED THE OLD COVARIANCE
beginning the smoothening ALGORITHM
length of cluster # 0 --------> 7957
length of cluster # 1 --------> 1617
length of cluster # 2 --------> 3102
length of cluster # 3 --------> 3280
length of cluster # 4 --------> 3647
Done writing the figure

The failed test case has successfully converged, but still fails:

CONVERGED!!! BREAKING EARLY!!!



TRAINING F1 score: -1 -1 -1


Failure
Traceback (most recent call last):
  File "/Users/motaher/miniconda2/envs/pyliger/lib/python3.6/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/Users/motaher/miniconda2/envs/pyliger/lib/python3.6/unittest/case.py", line 605, in run
    testMethod()
  File "/Users/motaher/dev/PycharmProjects/ORIGINAL_TICC/UnitTest.py", line 33, in test_multiExample
    self.assertEqual(sum(val), 0)
  File "/Users/motaher/miniconda2/envs/pyliger/lib/python3.6/unittest/case.py", line 829, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/Users/motaher/miniconda2/envs/pyliger/lib/python3.6/unittest/case.py", line 822, in _baseAssertEqual
    raise self.failureException(msg)
AssertionError: 45059.0 != 0


Ran 1 test in 53.887s

FAILED (failures=1)

Could that be because the states in the stored file and the states produced by the algorithm use different values?

where is the paper?

Please give the name of the paper or a URL, thanks!

BTW: on Windows (64-bit), after running generate_synthetic_data.py and then TICC.py, I find this in the output:

RuntimeError: 
            Attempt to start a new process before the current process
            has finished its bootstrapping phase.

            This probably means that you are on Windows and you have
            forgotten to use the proper idiom in the main module:

                if __name__ == '__main__':
                    freeze_support()
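On Windows, multiprocessing spawns fresh interpreter processes that re-import the main script, so the TICC call must sit under the `__main__` guard. A generic sketch of the idiom the error message asks for; the solver call in the comment is illustrative:

```python
import multiprocessing

def main():
    # Construct and run the solver here, e.g.:
    # ticc = TICC(window_size=1, number_of_clusters=8, ...)
    # ticc.fit(input_file="data.csv")
    return "ran under __main__ guard"

if __name__ == '__main__':
    multiprocessing.freeze_support()  # no-op except in frozen Windows executables
    print(main())
```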

Missing in car.py -> "saves a .csv file with list of the assignments for each of the timestamps to the 'k' clusters "

Hello,
Thank you for providing the code!
I am using the same type of signals as used in the paper with the car.py file. The code runs without any error, generating a few text files. As mentioned in the description, car.py should save a .csv file with the cluster assignments for each timestamp. However, I could not find such a file. Am I missing something in the code, or is that snippet missing?

data.tsv problem in car.py

Hello!
I'm very interested in your paper and want to study the TICC algorithm, but the data.tsv file used by car.py is missing. Would it be convenient to share the data?
Thank you very much!

different BIC with same parameters

I ran the example with the same parameters 3 times and got a different BIC each time. Initially I wanted to try different numbers of clusters and pick the one with the minimum BIC, which is when I discovered that every run with the same parameters and dataset yields a different BIC. Could you please explain why the BIC varies while the parameters and data stay the same?
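If the variation comes from random initialization (TICC seeds its clusters with a Gaussian mixture fit, which is randomized), fixing NumPy's global seed before each run may make the result reproducible. A sketch of the seeding idea only; whether the solver has other sources of randomness is an open question:

```python
import numpy as np

def randomized_score(seed):
    np.random.seed(seed)            # legacy global seed that older solvers rely on
    return np.random.rand(5).sum()  # stand-in for a run whose result is random

# Same seed -> identical result; different seeds generally differ.
print(randomized_score(102) == randomized_score(102))  # True
```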

Create a training video

Hi,

I have a request. Would it be possible to create a YouTube training video that explains the complete concept with an example, shows how to interpret the results, and compares TICC with other clustering algorithms?

For now I am going through your article, but a training video would be very helpful.

Thanks for your help,

data.tsv

Hello, thanks for your paper and code, but is there a data.tsv file for the car case?

Eigenvalues did not converge

Hi David, thanks for the great work. When I run this script with my data, I always run into the following problem:

Traceback (most recent call last):
File "example.py", line 6, in
(cluster_assignment, cluster_MRFs) = TICC.solve(window_size = 2,number_of_clusters = 4, lambda_parameter = 11e-2, beta = 600, maxIters = 100, threshold = 2e-5, write_out_file = False, input_file = fname, prefix_string = "output_folder/", num_proc=1)
File "/Users/apple/github/TICC/TICC_solver.py", line 138, in solve
val = optRes[cluster].get()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
numpy.linalg.linalg.LinAlgError: Eigenvalues did not converge

And before this problem, there are also the following warnings:
RuntimeWarning: Degrees of freedom <= 0 for slice
S = np.cov(np.transpose(D_train) )
/Users/apple/Library/Python/2.7/lib/python/site-packages/numpy/lib/function_base.py:3093: RuntimeWarning: divide by zero encountered in double_scalars
c *= 1. / np.float64(fact)
/Users/apple/Library/Python/2.7/lib/python/site-packages/numpy/lib/function_base.py:3093: RuntimeWarning: invalid value encountered in multiply
c *= 1. / np.float64(fact)

Is there a problem with my data, or something else? I hope you can help me, thanks!

Some of the values in "optRes" are nan.

Hi,
Could you please help me with this?

I have a problem running your code on my dataset.
In iteration 0, the function "train_clusters" returns a value "optRes" that seems to contain some NaNs.
So when the function "optimize_clusters" receives "optRes" from "train_clusters", it raises an error.

I printed out the value of "val" each time. Here is the output and the error information:
##################################################
ITERATION ### 0
/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:497: RuntimeWarning: Degrees of freedom <= 0 for slice
/usr/local/lib/python3.5/dist-packages/numpy/lib/function_base.py:3093: RuntimeWarning: divide by zero encountered in double_scalars
c *= 1. / np.float64(fact)
/usr/local/lib/python3.5/dist-packages/numpy/lib/function_base.py:3093: RuntimeWarning: invalid value encountered in multiply
c *= 1. / np.float64(fact)
val: [0.2301 0.1450 -0.0000 ..., 0.9513 -0.0000 0.3904]
OPTIMIZATION for Cluster # 0 DONE!!!
val: [0.7449 0.0000 -0.0000 ..., 1.0000 -0.0000 0.5509]
OPTIMIZATION for Cluster # 1 DONE!!!
val: [0.9931 0.0000 0.0000 ..., 0.7676 -0.0000 0.5487]
OPTIMIZATION for Cluster # 2 DONE!!!
val: [0.4735 0.0000 0.0000 ..., 0.7480 0.0000 0.3804]
OPTIMIZATION for Cluster # 3 DONE!!!
val: [0.1999 0.1088 0.0000 ..., 0.8651 0.0000 0.5592]
OPTIMIZATION for Cluster # 4 DONE!!!
val: [0.0202 0.0314 0.0143 ..., 0.7245 0.0000 0.1511]
OPTIMIZATION for Cluster # 5 DONE!!!
val: [nan nan nan ..., nan nan nan]
OPTIMIZATION for Cluster # 6 DONE!!!
/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:567: RuntimeWarning: invalid value encountered in less
/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:567: RuntimeWarning: invalid value encountered in greater

LinAlgErrorTraceback (most recent call last)
in fit(self, input_file)
274 #print("opt_res: " + str(opt_res))
275 self.optimize_clusters(computed_covariance, len_train_clusters, log_det_values, opt_res,
--> 276 train_cluster_inverse)
277
278 # update old computed covariance

in optimize_clusters(self, computed_covariance, len_train_clusters, log_det_values, optRes, train_cluster_inverse)
467 X2 = S_est
468 #print(S_est)
--> 469 u, _ = np.linalg.eig(S_est)
470 cov_out = np.linalg.inv(X2)
471

/usr/local/lib/python3.5/dist-packages/numpy/linalg/linalg.py in eig(a)
1126 _assertRankAtLeast2(a)
1127 _assertNdSquareness(a)
-> 1128 _assertFinite(a)
1129 t, result_t = _commonType(a)
1130

/usr/local/lib/python3.5/dist-packages/numpy/linalg/linalg.py in _assertFinite(*arrays)
215 for a in arrays:
216 if not (isfinite(a).all()):
--> 217 raise LinAlgError("Array must not contain infs or NaNs")
218
219 def _isEmpty2d(arr):

LinAlgError: Array must not contain infs or NaNs
##################################################

I assume something goes wrong when the program calls the ADMM solver.
Do you know what the problem is?
Thank you for your time.

Btw, I did normalize my dataset and set lambda to 0.1, 1, ...
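NaNs in the estimated covariances often trace back to non-finite input values or constant columns, which break covariance estimation (the "Degrees of freedom <= 0" warning above also points at a degenerate slice). A quick pre-fit sanity check, generic numpy rather than part of TICC:

```python
import numpy as np

def check_matrix(X):
    """Return a list of data problems that commonly break covariance estimation."""
    problems = []
    if not np.isfinite(X).all():
        problems.append("non-finite values (NaN/inf)")
    if (X.std(axis=0) == 0).any():
        problems.append("constant columns (zero variance)")
    return problems

X = np.column_stack([np.ones(10), np.arange(10.0)])  # first column is constant
print(check_matrix(X))  # ['constant columns (zero variance)']
```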

"NameError: global name 'GenRndGnm' is not defined" when running generate_synthetic_data.py

I am working in Anaconda with a Python 2.7 environment. I installed snap with "pip install snap". But when I run "python generate_synthetic_data.py", I get the error "NameError: global name 'GenRndGnm' is not defined". I also tried installing snap by downloading it from its homepage and running "python setup.py install", and got the same error. Do you have any suggestions?

AttributeError: 'module' object has no attribute 'GaussianMixture'

Hi,
When I try to run the example code, I get the following error. Here is the output.

shuja@shujamughal:~/Desktop/data/bsc_data/year_3/TICC$ python example.py
('lam_sparse', 0.11)
('switch_penalty', 600)
('num_cluster', 8)
('num stacked', 1)
completed getting the data
Traceback (most recent call last):
File "example.py", line 8, in <module>
(cluster_assignment, cluster_MRFs) = ticc.fit(input_file=fname)
File "/home/shuja/Desktop/data/bsc_data/year_3/TICC/TICC_solver.py", line 75, in fit
gmm = mixture.GaussianMixture(n_components=self.number_of_clusters, covariance_type="full")
AttributeError: 'module' object has no attribute 'GaussianMixture'

Unexpected behavior for the BIC score

The BIC score:

  • always decreases with decreasing number_of_clusters
  • always decreases with increasing beta
  • always decreases with decreasing window_size
  • always decreases with increasing lambda_parameter

Is this an expected behavior for any dataset?
If so, how do we find the optimal set of hyperparameters?

Code appears not to match paper

The following line of code does not appear to match eq. 6 in your paper (http://stanford.edu/~hallac/TICC.pdf):

X_var = ( 1/(2*float(eta)) )*q*( numpy.diag(d + numpy.sqrt(numpy.square(d) + (4*eta)*numpy.ones(d.shape))) )*q.T

In your paper, the term inside the square root is D^2 + 4 * rho * I, whereas in the code it is numpy.square(d) + (4*eta)*numpy.ones(d.shape), and since eta is 1/rho in the code:

eta = 1/self.rho

these appear to be different.

I checked the paper which your paper cites (https://arxiv.org/pdf/1111.0324.pdf), and eq. 3.9 in that paper matches eq. 6 in your paper, leading me to believe that it may be the code which is incorrect.

As the code is currently called, rho is set to 1, so this does not affect the results for now, but it would affect the results if rho were changed when calling ADMMSolver:

TICC/TICC_solver.py

Lines 334 to 335 in 5788a14

rho = 1
solver = ADMMSolver(lamb, self.window_size, size_blocks, 1, S)
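To make the discrepancy concrete, here is a small numerical sketch of the two square-root terms quoted above (the function and variable names are illustrative, not from the repository):

```python
import numpy as np

d = np.array([0.5, 1.0, 2.0])   # example eigenvalues of D

def sqrt_term(d, rho, use_eta):
    """Term under the square root: eq. 6 of the paper uses 4*rho, while the
    code uses 4*eta with eta = 1/rho."""
    factor = 4.0 / rho if use_eta else 4.0 * rho
    return np.square(d) + factor * np.ones(d.shape)

# The two expressions coincide only when rho == 1 (the value hard-coded in
# TICC_solver.py), which is why the discrepancy is currently invisible.
assert np.allclose(sqrt_term(d, 1.0, True), sqrt_term(d, 1.0, False))
assert not np.allclose(sqrt_term(d, 2.0, True), sqrt_term(d, 2.0, False))
```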

About the training data

I saw that your training data is built as:
idx_k = training_indices[i+k]
complete_D_train[i][k*n:(k+1)*n] = Data[idx_k][0:n]
so the data is not continuous? For example, if training_indices is ...10, 11, 13, 14, 16..., there would be one complete_D_train row 10-11-13-14-16. According to my understanding of the article, it should be 10-11-12-13-14-15, 11-...-16, 13-...-18, etc. Is there a problem with my understanding?
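The consecutive stacking described in the paper can be sketched as follows (an illustration of the intended behavior, not the repository's exact code):

```python
import numpy as np

def stack_windows(data, w):
    """Build the T-w+1 stacked observations TICC clusters: row t is
    [x_t, x_{t+1}, ..., x_{t+w-1}] flattened, i.e. w *consecutive* samples."""
    T, n = data.shape
    return np.hstack([data[k:T - w + 1 + k] for k in range(w)])

data = np.arange(20).reshape(10, 2)     # toy series with T=10, n=2
stacked = stack_windows(data, w=3)
assert stacked.shape == (8, 6)          # T - w + 1 rows, n * w columns
# the first stacked row concatenates samples 0, 1 and 2
assert stacked[0].tolist() == [0, 1, 2, 3, 4, 5]
```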

computeF1_macro has a problem

In the function computeF1_macro:
TP = permuted_confusion_matrix[cluster,cluster]
when TP = 0.0 and FP = 0.0, the F1 score cannot be computed (division by zero).

when train_confusion_matrix_EM =
[[0.0000 179.0000 0.0000]
[0.0000 182.0000 0.0000]
[0.0000 180.0000 0.0000]]
this line cause problem
f1_EM_tr = computeF1_macro(train_confusion_matrix_EM,matching_EM,num_clusters)
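A guarded F1 computation avoids the degenerate case. This is an illustrative fix, not the repository's implementation:

```python
def safe_f1(tp, fp, fn):
    """F1 score with the TP = FP = 0 (or TP = FN = 0) degenerate cases
    mapped to 0 instead of raising a ZeroDivisionError or producing NaN."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

assert safe_f1(0.0, 0.0, 179.0) == 0.0      # the failing case above
assert abs(safe_f1(8, 2, 2) - 0.8) < 1e-12  # ordinary case still correct
```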

BTW:
It seems that something goes wrong during or after the smoothing algorithm.

console output :

completed solving the optimization problem for the cluster
printing the cluster len
length of the cluster 0 ------> 169
length of the cluster 1 ------> 186
length of the cluster 2 ------> 186
beginning with the DP - smoothening ALGORITHM
completed smoothening algorithm
printing the length of points in each cluster
length of cluster # 0 --------> 0
length of cluster # 1 --------> 541
length of cluster # 2 --------> 0
...
The binary accuracy at the end of iteration is: 0.42 0.5 0.42

Run Steps:
python generate_synthetic_data.py
python TICC.py
not changed any parameters

Thanks

ModuleNotFoundError: No module named 'src'

Hi,

I am getting a new error 'ModuleNotFoundError: No module named 'src''. Please let me know how I can resolve this.

Traceback (most recent call last):
File "", line 1, in <module>
File "C:\Users\127791\AppData\Local\Continuum\anaconda3\TICC\TICC_solver.py", line 11, in
from src.TICC_helper import *
ModuleNotFoundError: No module named 'src'

Visualization Function

Could you please explain how the visualization function works? I would like to visualize a custom time series of length 1440, but I cannot understand the parameters: Matrix_v, stock_number, time_steps, y_ticks. Please elaborate.

Best,
Vignesh

Assign 'weights' to multivariate time series components

Is it possible to assign 'weights' to the components of a multivariate time series so that patterns (or changes in patterns) in one component are given more weight than the others?
E.g., between heart rate and velocity, heart rate contributes more towards determining the state of the system.

Prediction Method Input

It seems that the prediction method takes an input array of size [m, n*w], where m is the number of time steps (rows in the file), n is the number of features (columns in the file), and w is the window size. The documentation seems to suggest that the input should be of size [m, n] during testing. Could you clarify the intended behavior?

Thanks!

Gaussian inverse covariance

Hi,
I have a question about the Gaussian inverse covariance matrix theta. I understand that you use this matrix to represent the MRF. In my opinion, this matrix should be the adjacency matrix of the MRF, but when I calculate the degree matrix of theta, I find something wrong. Take a look at this example:
https://imgur.com/a/iGR4Unt
The degree of sensor 3 in the first layer should be 2, not 3. We can solve this problem by transposing theta, so my conclusion is that the adjacency matrix for the MRF should be the transposed theta, which is also a Toeplitz matrix (we should put A0 ~ Aw-1 in the first row, not the first column). Thank you for your time.
Btw, I'm curious which answer you get when you run TICC: the original theta or the transposed theta? (Both are Toeplitz matrices.)
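Since a Gaussian MRF's precision matrix is symmetric in theory, theta and its transpose should define the same edge set; any discrepancy in the degrees is a numerical artifact of the estimate. A small sketch (illustrative, not TICC code) that symmetrizes before reading off the adjacency:

```python
import numpy as np

def mrf_adjacency(theta, tol=1e-8):
    """Read an undirected adjacency matrix off an estimated precision matrix.
    Symmetrizing first makes theta and theta.T give identical edges."""
    theta_sym = 0.5 * (theta + theta.T)          # enforce symmetry
    adj = (np.abs(theta_sym) > tol).astype(int)  # edge = nonzero entry
    np.fill_diagonal(adj, 0)                     # no self-loops
    return adj

theta = np.array([[2.0, -0.3, 0.0],
                  [-0.3, 2.0, 0.5],
                  [0.0, 0.5, 2.0]])
adj = mrf_adjacency(theta)
assert (adj == adj.T).all()                      # adjacency is symmetric
assert adj.sum(axis=0).tolist() == [1, 2, 1]     # node degrees
```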

What does cluster_MRFs represent?

What does each element of the matrix corresponding to each cluster in cluster_MRFs represent? Is it equivalent to the inverse covariance matrix? Also, there are almost no zero elements in the matrix. Should we approximate the near-zero elements as zero, or should we consider it a fully connected network?
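Per the paper, each matrix in cluster_MRFs is the cluster's estimated inverse covariance (precision) matrix. To read it as a sparse network, one common practice is to threshold small magnitudes; the cutoff below is an illustrative choice, not a value prescribed by TICC:

```python
import numpy as np

def sparsify(theta, cutoff=1e-3):
    """Zero out near-zero entries of an estimated precision matrix so its
    support can be read as a sparse graph. `cutoff` is an illustrative
    choice; a larger lambda_parameter produces more exact zeros directly."""
    out = theta.copy()
    out[np.abs(out) < cutoff] = 0.0
    return out

theta = np.array([[1.2, 3e-4],
                  [3e-4, 0.9]])
thresholded = sparsify(theta)
assert thresholded[0, 1] == 0.0    # tiny off-diagonal treated as "no edge"
assert thresholded[0, 0] == 1.2    # large entries untouched
```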

1d case

Hello!

Would you be so kind as to clarify whether your library can work with 1D data?

example.py iteration problem

Hello,
Could you please help me with this.

Here are some details about my setup: Anaconda3, Python 2.7.13, sklearn 0.19,
Win10 (64-bit), 8 GB RAM.

First of all, I ran your example.py, but the following error occurred and the Python process hung, so I suspected the problem was the amount of data; however, when I reduced the data to 402 lines, the problem still existed.

UserWarning:
This call to matplotlib.use() has no effect because the backend has already
been chosen; matplotlib.use() must be called before pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.
warnings.warn(_use_error_msg)
lam_sparse 0.11
switch_penalty 600
num_cluster 8
num stacked 1
completed getting the data
ITERATION ### 0

During this period, my computer's memory usage was about 66% and CPU usage about 26%. Next I started debugging and found that the problem may be in the loop at line 139 of TICC_solver.py, but I cannot solve it.

for cluster in xrange(num_clusters):
    if optRes[cluster] == None:
        continue
    val = optRes[cluster].get()
    print "OPTIMIZATION for Cluster #", cluster, "DONE!!!"

Then I started using @mohataher's improved code, and it worked (402 rows of data). However, at iteration 42, the following error occurred.

ITERATION ### 42
OPTIMIZATION for Cluster # 0 DONE!!!
D:/Workplace/tool_wear_pred/TICC/TICC_solver.py:337: RuntimeWarning: Degrees of freedom <= 0 for slice
S = np.cov(np.transpose(D_train))
D:\Anaconda\envs\TICC\lib\site-packages\numpy\lib\function_base.py:3088: RuntimeWarning: divide by zero encountered in double_scalars
c *= 1. / np.float64(fact)
D:\Anaconda\envs\TICC\lib\site-packages\numpy\lib\function_base.py:3088: RuntimeWarning: invalid value encountered in multiply
c *= 1. / np.float64(fact)

Traceback (most recent call last):

File "", line 1, in <module>
runfile('D:/Workplace/tool_wear_pred/TICC/TICC_solver.py', wdir='D:/Workplace/tool_wear_pred/TICC')
File "D:\Anaconda\envs\TICC\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile
execfile(filename, namespace)
File "D:\Anaconda\envs\TICC\lib\site-packages\spyder\utils\site\sitecustomize.py", line 86, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "D:/Workplace/tool_wear_pred/TICC/TICC_solver.py", line 383, in
(cluster_assignment, cluster_MRFs) = ticc.fit(input_file=fname)
File "D:/Workplace/tool_wear_pred/TICC/TICC_solver.py", line 125, in fit
train_cluster_inverse)
File "D:/Workplace/tool_wear_pred/TICC/TICC_solver.py", line 301, in optimize_clusters
val = optRes[cluster].get()
File "D:\Anaconda\envs\TICC\lib\multiprocessing\pool.py", line 567, in get
raise self._value

LinAlgError: Eigenvalues did not converge

So I hope you can give me some help or advice so that it can be solved. I keenly appreciate your time!

Implement `predict()` method?

Hello, great work on your paper.

Results of the algorithm look promising. Unfortunately, it seems I have to call the solve() method with the entire training dataset every time I want to cluster new data. That's not efficient.

Now I have a few questions:

  1. Is it possible to implement predict method on an already trained model?
  2. Could we add load_model and save_model methods?
  3. Will you be following scikit-learn interfaces in the future (__init__, fit, predict, etc.)?

I could definitely contribute but after 2 hours of attempting to refactor the code, it seems like there are plenty of dependent variables and super long methods. How can I help?

Why are there edges between the blue layer and the orange layer in figure 1?

In figure 1 of the TICC paper, you say each cluster is characterized by a correlation network, or MRF, defined over a short window of size w, so the layer at time t might affect the layer at time t+1; that explains the edges between the blue and purple layers, or between the purple and orange layers. But given how an MRF is defined, why are there also edges between the blue and orange layers in cluster A? I don't understand this. I hope you can help me. Thanks.

AttributeError: 'module' object has no attribute 'semidefinite'

File "", line 1, in <module>
File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_umd.py", line 194, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/Users/guoxilong/PycharmProjects/TICC/TICC-master/paper code/TICC.py", line 457, in <module>
theta = semidefinite(probSize,name='theta')
AttributeError: 'module' object has no attribute 'semidefinite'

How can I fix this? I'm looking forward to your reply.

No module named 'TICC_solver'

Hi

I am getting the error below while running the following: from TICC_solver import TICC

ModuleNotFoundError: No module named 'TICC_solver'

I am able to import TICC successfully, but I am having a problem with TICC_solver. Could you please help me?

I have installed TICC using "git clone git@github.com:davidhallac/TICC.git"

Thanks,

Warnings and a convergence problem

Hi David,

I applied your algorithm to my data set and found two issues:

  1. I sometimes obtain the warning like this:
    /usr/lib/python2.7/dist-packages/numpy/lib/function_base.py:2476: RuntimeWarning: Degrees of freedom <= 0 for slice
    warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
Is this a serious warning? Should I stop, or just ignore it?

  2. Oscillation between solutions
    ('\n\n\nITERATION ###', 15)
    ('OPTIMIZATION for Cluster #', 0, 'DONE!!!')
    ('OPTIMIZATION for Cluster #', 2, 'DONE!!!')
    ('OPTIMIZATION for Cluster #', 3, 'DONE!!!')
    ('length of the cluster ', 0, '------>', 739)
    ('length of the cluster ', 1, '------>', 0)
    ('length of the cluster ', 2, '------>', 20)
    ('length of the cluster ', 3, '------>', 20)
    ('length of the cluster ', 4, '------>', 0)
    UPDATED THE OLD COVARIANCE
    beginning the smoothening ALGORITHM
    ('cluster that is zero is:', 1, 'selected cluster instead is:', 0)
    ('cluster that is zero is:', 4, 'selected cluster instead is:', 3)
    ('length of cluster #', 0, '-------->', 739)
    ('length of cluster #', 1, '-------->', 20)
    ('length of cluster #', 2, '-------->', 0)
    ('length of cluster #', 3, '-------->', 0)
    ('length of cluster #', 4, '-------->', 20)
    Done writing the figure

    ('\n\n\nITERATION ###', 16)
    ('OPTIMIZATION for Cluster #', 0, 'DONE!!!')
    ('OPTIMIZATION for Cluster #', 1, 'DONE!!!')
    ('OPTIMIZATION for Cluster #', 4, 'DONE!!!')
    ('length of the cluster ', 0, '------>', 739)
    ('length of the cluster ', 1, '------>', 20)
    ('length of the cluster ', 2, '------>', 0)
    ('length of the cluster ', 3, '------>', 0)
    ('length of the cluster ', 4, '------>', 20)
    UPDATED THE OLD COVARIANCE
    beginning the smoothening ALGORITHM
    ('cluster that is zero is:', 2, 'selected cluster instead is:', 0)
    ('cluster that is zero is:', 3, 'selected cluster instead is:', 4)
    ('length of cluster #', 0, '-------->', 739)
    ('length of cluster #', 1, '-------->', 0)
    ('length of cluster #', 2, '-------->', 20)
    ('length of cluster #', 3, '-------->', 20)
    ('length of cluster #', 4, '-------->', 0)
    Done writing the figure

    The runs continuously oscillate between two outcomes (739, 0, 20, 20, 0) and (739, 20, 0, 0, 20).
Should I stop the run and choose one of them? What is happening here, and how can I avoid it?

Thanks,

"lle" in TICC.py

I am trying to understand the TICC code for the KDD’17 paper and have a question as following:
In TICC.py:
Line 539:
lle = np.dot( x.reshape([1,(num_blocks-1)*n]), np.dot(inv_cov_matrix, x.reshape([n*(num_blocks-1),1])) ) + log_det_cov

May I know whether "lle" refers to equation (2) in the paper? If yes, is a minus sign "-" necessary before the first np.dot(), given equation (2) in the paper?
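The sign question can be settled numerically: for a zero-mean Gaussian, the quantity x^T Theta x + log det Sigma equals -2 times the log-likelihood of equation (2), up to an additive constant, so *minimizing* it maximizes the likelihood and no explicit minus sign is needed. A quick check with illustrative values:

```python
import numpy as np

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # example covariance
Theta = np.linalg.inv(Sigma)                 # precision (inverse covariance)
x = np.array([0.3, -1.2])                    # example zero-mean observation

# Log-likelihood of x under N(0, Sigma), up to the constant -n/2 * log(2*pi):
ll = -0.5 * x @ Theta @ x + 0.5 * np.log(np.linalg.det(Theta))
# Quantity computed in the code: x^T Theta x + log det Sigma
lle = x @ Theta @ x + np.log(np.linalg.det(Sigma))

# They agree up to a factor of -2 (log det Sigma = -log det Theta), so
# minimizing lle is equivalent to maximizing the log-likelihood.
assert abs(lle - (-2.0) * ll) < 1e-10
```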

fewer data points

I noticed that in example.py the clustered points are fewer than the input data points by the window size (i.e., output = input - window size). Could you please help me identify which points are excluded, and why? I keenly appreciate your time.

Finding Optimal Parameters?

Hi,

Could you describe the process by which you find optimal values of Lambda, Beta, Window Size, Number of Clusters etc. to pass into TICC for an arbitrary dataset? Currently, we are trying to brute force/grid search over a large set of parameter values and calculate BIC values for each set, then look for the minimum BIC value, but this doesn't seem optimal. How were you able to calculate your optimal parameters for your dataset and how would you recommend finding optimal parameters for any arbitrary dataset?

Best,
Thushan
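For what it's worth, a brute-force search like the one described above can be kept compact. `fit_and_score` below is a hypothetical placeholder for code that runs TICC with the given settings and returns a BIC value; it is an assumption of this sketch, not a TICC API:

```python
import itertools

def grid_search(fit_and_score, grid):
    """Exhaustive search over a hyperparameter grid; returns the
    (best_bic, best_settings) pair with the smallest BIC."""
    best = None
    for values in itertools.product(*grid.values()):
        settings = dict(zip(grid.keys(), values))
        bic = fit_and_score(**settings)
        if best is None or bic < best[0]:
            best = (bic, settings)
    return best

grid = {"number_of_clusters": [3, 5, 8],
        "lambda_parameter": [0.01, 0.1],
        "beta": [100, 400]}

# Toy stand-in score so this sketch runs on its own; in practice this
# would fit TICC with `settings` and return the resulting BIC.
toy_score = lambda **s: s["number_of_clusters"] + s["lambda_parameter"] + s["beta"]
bic, settings = grid_search(toy_score, grid)
```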

Betweenness Score

Hi David
I appreciate your paper; it is very promising. I already applied it to my data set and got good results. But to identify clusters and make the analysis more reproducible, I need to calculate the betweenness score and check whether my clusters have a "signature". I would be really grateful if you could provide example code for calculating the betweenness score, as I am very new to Python.

Interpretation of the output "train_cluster_inverse" (MRF)

Hi!

I'm currently studying in Aachen, Germany, and came across your code in a project. However, I have an issue interpreting the outputs of the TICC_solver.fit function. I want to study the correlations between the different input variables of a time-series data set, and I'm having trouble understanding what values to expect in the output MRFs, so I don't know how to analyze the results.
For example, I don't know how to deal with negative values returned in the MRF matrix. Do they suggest a negative proportionality, or should all the values of the MRF be normalized, e.g. to values between 0 and 1?
I read your paper on the TICC algorithm and tried to work my way through the code to find an answer, but haven't had any luck so far.

I hope you can help me with my problems.

Greetings from Germany

p.s. please excuse my bad English
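One standard way to read the entries of an estimated inverse covariance (precision) matrix, which may help here: the partial correlation between variables i and j given all the others is -theta_ij / sqrt(theta_ii * theta_jj). A negative off-diagonal entry of theta therefore corresponds to a *positive* partial correlation, and no rescaling to [0, 1] is needed. An illustrative sketch:

```python
import numpy as np

def partial_correlations(theta):
    """Convert a precision matrix into a partial-correlation matrix:
    pc_ij = -theta_ij / sqrt(theta_ii * theta_jj), with unit diagonal."""
    d = np.sqrt(np.diag(theta))
    pc = -theta / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc

theta = np.array([[2.0, -0.8],
                  [-0.8, 2.0]])
pc = partial_correlations(theta)
assert pc[0, 1] > 0                     # negative precision entry -> positive partial corr.
assert abs(pc[0, 1] - 0.4) < 1e-12      # -(-0.8) / sqrt(2 * 2) = 0.4
```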

Eigenvalues did not converge

Hello!
I am trying to test my own data with your example.py, but I got this error:
numpy.linalg.linalg.LinAlgError: Eigenvalues did not converge
Do you know what's wrong with this?
Thank you very much

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

I was following the readme.md and when I was executing this
(cluster_assignment, cluster_MRFs) = TICC_solver.solve(window_size = 10,number_of_clusters = 5, lambda_parameter = 11e-2, beta = 400, maxIters = 70, threshold = 2e-5, write_out_file = True, input_file = "data.csv", prefix_string = "output_folder/")
it throws this error:
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

I am using anaconda python 2.7 with cvxpy and snap just installed today.

Returning output instead of writing to disk

It would be useful if TICC could be used without generating extra files and directories. The following lines can be removed without any apparent change in functionality:

TICC/TICC_solver.py

Lines 62 to 68 in c8296df

str_NULL = prefix_string + "lam_sparse=" + str(lam_sparse) + "maxClusters=" + str(num_clusters+1) + "/"
if not os.path.exists(os.path.dirname(str_NULL)):
    try:
        os.makedirs(os.path.dirname(str_NULL))
    except OSError as exc:  # Guard against race condition of path already existing
        if exc.errno != errno.EEXIST:
            raise

Results are also written into results text files, which is not always what a user of this code wants (e.g. the second argument to RunTicc). Maybe just supporting output_filename=None would be useful.
