ngroebner / specufex_processing
Pre- and post-processing utilities for the specufex workflow
Leaving a note here per our discussion: also make a copy of the YAML config file in the output directory for the HDF5 files, for future reference.
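A minimal sketch of the config-archiving step, assuming the scripts know both the path of the YAML file they were run with and the HDF5 output directory. The function name `archive_config` and the file/directory names in the demo (`config.yml`, `H5files`) are hypothetical, not taken from the repo.

```python
import shutil
import tempfile
from pathlib import Path


def archive_config(config_path, output_dir):
    """Copy the YAML config used for a run into the HDF5 output directory (hypothetical helper)."""
    config_path = Path(config_path)
    output_dir = Path(output_dir)
    # Create the output directory if the HDF5 step has not made it yet
    output_dir.mkdir(parents=True, exist_ok=True)
    destination = output_dir / config_path.name
    # copy2 also preserves the file's modification time
    shutil.copy2(config_path, destination)
    return destination


if __name__ == "__main__":
    # Demo in a throwaway directory
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "config.yml"
        cfg.write_text("dataset: example\n")
        copied = archive_config(cfg, Path(tmp) / "H5files")
        print(copied.read_text())
```

Copying the file verbatim (rather than re-dumping the parsed YAML) keeps any comments in the config, which is probably what you want for a record of the run.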
Make a script in MCW to calculate the energy of the waveforms and save it to the waveforms HDF5 file.
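A minimal sketch of the energy calculation, assuming the waveforms are stored as a 2-D array with one waveform per row and that "energy" means the discrete integral of the squared amplitude, `dt * sum(x**2)`. Removing each trace's mean first is an assumption, not something specified in the issue; `waveform_energy` and `dt` are hypothetical names.

```python
import numpy as np


def waveform_energy(waveforms, dt=1.0):
    """Energy of each waveform, approximated as dt * sum(x**2).

    waveforms: array of shape (n_waveforms, n_samples)
    dt: sample interval in seconds (dt=1.0 gives a plain sum of squares)
    """
    waveforms = np.asarray(waveforms, dtype=float)
    # Assumption: remove each trace's mean so a DC offset does not inflate the energy
    demeaned = waveforms - waveforms.mean(axis=1, keepdims=True)
    return dt * np.sum(demeaned ** 2, axis=1)
```

The resulting 1-D array (one value per waveform) could then be written into the waveforms HDF5 file, for example with h5py's `create_dataset(name, data=energies)`, using whatever dataset name the project settles on.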
Use this issue to discuss and collect the different ways the processing package is or could be used. For example, you could explain how you would like to use it for your current experiments, or list use cases that may be helpful for others.
Full error message is attached
Traceback (most recent call last):
  File "../3_runSpecUFEx.py", line 120, in <module>
    nmf.fit(X[sample], verbose=1)
  File "/Users/theresasawi/opt/anaconda3/envs/seismo2/lib/python3.7/site-packages/specufex/nmf.py", line 38, in fit
    self.h01 = 1/self.num_pat
TypeError: unsupported operand type(s) for /: 'int' and 'tuple'
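The traceback says `self.num_pat` is a tuple when `nmf.fit` runs. One plausible cause, not confirmed here, is that the number of patterns reaches the NMF object wrapped in a tuple, e.g. from a trailing comma in `config.py` (`num_pat = 75,`). The sketch below reproduces the error and shows a hypothetical `as_positive_int` guard that could be applied to the config value before it is handed to specufex.

```python
def as_positive_int(value, name="num_pat"):
    """Coerce a config value to a positive int, unwrapping a 1-element tuple/list (hypothetical helper)."""
    if isinstance(value, (tuple, list)):
        if len(value) != 1:
            raise ValueError(f"{name} should be a single integer, got {value!r}")
        value = value[0]
    value = int(value)
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


num_pat = 75,  # a trailing comma silently makes this the tuple (75,)
try:
    1 / num_pat
except TypeError as err:
    print(err)  # unsupported operand type(s) for /: 'int' and 'tuple'

print(1 / as_positive_int(num_pat))
```

Failing loudly on a multi-element tuple, instead of silently taking the first entry, keeps a genuinely malformed config from running with the wrong number of patterns.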
Let the user specify a setting in config.py for resampling the waveforms in the 1_makeWaveformDataset script. Add the resampling factor to the HDF5 attributes.
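A minimal sketch of resampling by a configurable factor, assuming a single `resample_factor` setting (a hypothetical name) where values above 1 upsample and values below 1 downsample. It uses plain linear interpolation with `np.interp`, which is simple but applies no anti-alias filter. For downsampling, `scipy.signal.resample_poly` (polyphase filtering) would be the safer choice.

```python
import numpy as np


def resample_linear(waveform, fs_in, resample_factor):
    """Resample a 1-D waveform by linear interpolation.

    resample_factor > 1 upsamples, < 1 downsamples (e.g. 0.5 halves the rate).
    No anti-alias filter is applied, so low-pass the trace first when downsampling.
    Returns (resampled waveform, new sampling rate).
    """
    waveform = np.asarray(waveform, dtype=float)
    n_in = waveform.size
    fs_out = fs_in * resample_factor
    t_in = np.arange(n_in) / fs_in
    # Keep the output inside the original time span so np.interp never extrapolates
    n_out = int(np.floor((n_in - 1) * resample_factor)) + 1
    t_out = np.arange(n_out) / fs_out
    return np.interp(t_out, t_in, waveform), fs_out
```

The factor (and the resulting sampling rate) could then be recorded on the output file, e.g. `h5file.attrs["resample_factor"] = resample_factor` with h5py, so later steps do not have to guess how the data were resampled.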
Change in --
def get_representative_waveforms(cluster_name,col_name,norm_waveforms,time_vals,n_clust,df_merged_sub,
    for count in range(start_clust,n_clust+1):
        plt.figure(10)
        tmp = df_merged_sub.loc[df_merged_sub[cluster_name]==count,['Index_Exp',col_name]]
        #print(tmp.values,count)
        num_ev_ac = np.min([num_events,tmp.index.shape[0]])
        x_tmp2, y_tmp2 = np.meshgrid(tmp.Index_Exp.values,tmp.Index_Exp.values)
        dist_matrix_subset = dist_matrix[x_tmp2,y_tmp2]
        dist_matrix_subset_indx = np.argmin(np.mean(dist_matrix_subset,axis=1))
        #pdb.set_trace()
        print(f'Using {dist_matrix_subset_indx} as the most representative waveform')
        tmp2 = np.random.choice(tmp.Index_Exp.values,num_ev_ac,replace=False) # Don't need this; instead find the 40 indices with the smallest mean distance. If those are passed to tmp2, the rest should be fine.
in calculate_energy.py
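The comment on the last line suggests replacing the random draw with the events that have the smallest mean distance to the rest of their cluster. A minimal sketch, assuming `dist_matrix` is a square, symmetric pairwise-distance matrix indexed by `Index_Exp`; `most_central_events` is a hypothetical name. It uses `np.ix_` to pull out the cluster's sub-block, which for a symmetric matrix gives the same row means as the `np.meshgrid` indexing above.

```python
import numpy as np


def most_central_events(event_ids, dist_matrix, num_events):
    """Return up to num_events IDs with the smallest mean distance to the rest of the cluster.

    event_ids: 1-D array of row/column indices into dist_matrix for one cluster
    dist_matrix: square, symmetric pairwise-distance matrix over all events
    """
    event_ids = np.asarray(event_ids)
    # subset[i, j] = dist_matrix[event_ids[i], event_ids[j]]
    subset = dist_matrix[np.ix_(event_ids, event_ids)]
    mean_dist = subset.mean(axis=1)
    n = min(num_events, event_ids.size)
    # Ascending sort: the first n positions are the most "central" events;
    # a stable sort breaks ties by the original order of event_ids
    order = np.argsort(mean_dist, kind="stable")[:n]
    return event_ids[order]
```

Inside the loop, `tmp2 = most_central_events(tmp.Index_Exp.values, dist_matrix, num_events)` would then take the place of the `np.random.choice` line, and `tmp2[0]` corresponds to the single most representative event.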
The requirements needed to install and run the code with a plain pip install are currently missing.
Add an option in the config file and in the preprocessing scripts to set a seed for the random number generator, to allow deterministic recalculation of the model parameters.
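A minimal sketch of a configurable seed, assuming the config is loaded into a dict and uses a hypothetical `random_seed` key. It builds a NumPy `Generator` and passes it around explicitly, so the seed only affects the draws that use it. The existing scripts call the legacy `np.random.choice`, for which `np.random.seed(seed)` would be the equivalent switch. Whether specufex's own internal initialisation becomes deterministic depends on how it draws its random numbers, which is not covered here.

```python
import numpy as np


def make_rng(config):
    """Build a NumPy Generator from an optional 'random_seed' config entry (hypothetical key).

    A missing or None seed gives fresh OS entropy, i.e. non-deterministic runs.
    """
    seed = config.get("random_seed")
    return np.random.default_rng(seed)


def draw_batch(rng, n_spectrograms, batch_size):
    """Indices for one NMF mini-batch, drawn without replacement."""
    return rng.choice(n_spectrograms, batch_size, replace=False)
```

With the same seed, two runs draw identical batch indices, so the fitted model parameters can be recalculated exactly (up to any other sources of nondeterminism).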
Create a base class for an interface between fingerprint outputs (in the HDF5 file), clustering algorithms, and cluster assignment outputs. It should look something like:
class Cluster:
    def read(self, filename): ...
    def cluster(self, n_clusters): ...
    def write(self, filename): ...
Break the NMF and HMM steps into two separate files so that a new HMM configuration (e.g., number of states) can be calculated without rerunning the NMF step. This also allows multiple versions of the NMF to be calculated, but that is probably less important.
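A minimal sketch of the split, assuming the NMF output can be saved as an array on disk and reloaded by a separate HMM step. `fit_nmf` and `fit_hmm` are hypothetical callables standing in for the actual specufex calls; the point is only that the NMF result is computed once and every HMM configuration reads it back. Note that this simple cache does not check whether the NMF parameters changed, so the cache file would need a new name (or deleting) when they do.

```python
import os
import tempfile

import numpy as np


def load_or_compute_nmf(spectrograms, cache_path, fit_nmf):
    """Return NMF output, running fit_nmf only if no cached result exists.

    cache_path should end in '.npy' (np.save appends it otherwise).
    fit_nmf is a hypothetical callable standing in for the NMF step.
    """
    if os.path.exists(cache_path):
        return np.load(cache_path)
    activations = fit_nmf(spectrograms)
    np.save(cache_path, activations)
    return activations


def run_hmm_stage(cache_path, num_states, fit_hmm):
    """Load the cached NMF output and fit an HMM (fit_hmm is also hypothetical)."""
    activations = np.load(cache_path)
    return fit_hmm(activations, num_states)


if __name__ == "__main__":
    calls = []

    def fake_nmf(x):
        calls.append(1)
        return x * 2.0

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "nmf_activations.npy")
        data = np.ones((3, 4))
        load_or_compute_nmf(data, path, fake_nmf)
        # Several HMM configurations reuse the same NMF result
        for n_states in (10, 15):
            run_hmm_stage(path, n_states, lambda a, k: a.shape[1] + k)
        load_or_compute_nmf(data, path, fake_nmf)  # served from the cache
        print(len(calls))  # 1
```

In the real scripts the intermediate result would more naturally live in the existing HDF5 file next to the fingerprints than in a separate `.npy` file.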
Replace the original catalog with an updated one (with energy/entropy appended) after running energy_ .py, so that energy is included in the clustering catalogs.
(Nate, you may have already done this.)
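A minimal sketch of the in-place catalog update using only the standard library, assuming the catalog is a CSV file with one row per event and an ID column. The column name `event_ID` and the helper name are hypothetical. Writing to a temporary file in the same directory and then calling `os.replace` swaps the new catalog in atomically, so an interrupted run cannot leave a half-written catalog behind.

```python
import csv
import os
import tempfile


def append_columns_to_catalog(catalog_path, new_values, key="event_ID"):
    """Add columns to a CSV catalog in place, matching rows on `key`.

    new_values maps event ID (as a string, as read from the CSV) -> {column: value}.
    Rows without an entry get empty cells. 'event_ID' is a hypothetical default.
    """
    with open(catalog_path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fieldnames = list(reader.fieldnames)
    # Append any new column names, keeping the existing column order
    for extra in new_values.values():
        for col in extra:
            if col not in fieldnames:
                fieldnames.append(col)
    for row in rows:
        row.update(new_values.get(row[key], {}))
    # Write to a temporary file in the same directory, then swap it in atomically
    directory = os.path.dirname(os.path.abspath(catalog_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".csv")
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    os.replace(tmp_path, catalog_path)
```

Because the original catalog is overwritten, it may be worth keeping a copy of it before the first update.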
Should the random choices in 3_runSpecUFEx (lines 119 and 130) use replace=False?
e.g.,
sample = np.random.choice(X.shape[0], specparams["nmf_batchsz"],replace=False)
"replace : boolean, optional. Whether the sample is with or without replacement. Default is True, meaning that a value of a can be selected multiple times."
https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html
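A short illustration of the difference, written with a NumPy `Generator` (its `choice` also defaults to `replace=True`, like the legacy `np.random.choice` quoted above). With replacement, a batch can contain the same spectrogram more than once. Without replacement every index is unique, but the batch size must then not exceed the number of spectrograms, otherwise NumPy raises a `ValueError`. The sizes below are arbitrary example values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_spectrograms, batch_size = 100, 40

with_repl = rng.choice(n_spectrograms, batch_size)  # replace=True is the default
without_repl = rng.choice(n_spectrograms, batch_size, replace=False)

print(len(np.unique(with_repl)))     # may be < 40: some spectrograms drawn twice
print(len(np.unique(without_repl)))  # always 40

# replace=False requires batch_size <= n_spectrograms
try:
    rng.choice(5, 10, replace=False)
except ValueError as err:
    print("ValueError:", err)
```

So switching to `replace=False` would likely also need a guard such as `min(specparams["nmf_batchsz"], X.shape[0])` for small datasets.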