iesl / grinch
Scalable Hierarchical Clustering with Tree Grafting
License: Apache License 2.0
Here is a simple example: [1, 2, 10, 11, 12, 54]. Your Python code generates a completely wrong clustering tree for it.
I figured out that things go wrong starting with the insertion of 10: from that point on the tree becomes misleading. It breaks up the pair (1, 2), which should stay together!
In fact, to get a valid nearest neighbor you need to search up the whole tree without breaking tighter sibling pairs.
The introductory example, which uses high-dimensional random numbers, fools both you and others who try to validate the algorithm.
Without a correction, I do not think this work can be taken seriously!
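For reference, the tree one would expect for this input can be checked with a naive batch clustering. The sketch below is not Grinch and not code from this repository: it is a plain O(n^3) single-linkage agglomerative clustering on 1-D points, and both the function name `single_linkage` and the choice of single linkage are assumptions made for illustration. Grinch itself may use a different linkage (the example run reports `centroid = l2` and `csim = dot`), so this only shows the kind of tree the comment above has in mind.

```python
from itertools import combinations


def single_linkage(points):
    """Naive single-linkage agglomerative clustering on 1-D points.

    Hypothetical helper for illustration only (not part of grinch).
    Returns the merge tree as nested tuples.
    """
    # Each cluster is (tree, members); a leaf's tree is the point itself.
    clusters = [(p, [p]) for p in points]
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest minimum pairwise distance.
        # Ties are broken in favor of the first pair found.
        for i, j in combinations(range(len(clusters)), 2):
            d = min(abs(a - b) for a in clusters[i][1] for b in clusters[j][1])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        # Delete j first (j > i) so that index i stays valid.
        del clusters[j]
        clusters[i] = merged
    return clusters[0][0]


tree = single_linkage([1, 2, 10, 11, 12, 54])
print(tree)  # (((1, 2), ((10, 11), 12)), 54)
```

In this tree 1 and 2 remain siblings, 10, 11 and 12 form their own subtree, and 54 joins last. The 10-11 and 11-12 distances are tied, so ((10, 11), 12) versus (10, (11, 12)) depends only on tie-breaking.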
Hi there,
The Python version of the code is currently broken. There seems to be an assertion that fails. See the full error below.
$ python src/python/grinch/run_grinch_example.py
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose 'Don't visualize my results'
wandb: Offline run mode, not syncing to the cloud.
wandb: W&B syncing is set to `offline` in this directory. Run `wandb online` to enable cloud syncing.
/Users/istefan/workspace/grinch/src/python/grinch/run_grinch_example.py:9: DeprecationWarning: This function is deprecated. Please call randint(0, 10 + 1) instead
point_labels = np.random.random_integers(0,10,100)
INFO:absl:[Grinch] points (100, 5)
INFO:absl:Using centroid = l2
INFO:absl:Using csim = dot
grinch_build_dendrogram: 1%| | 1/100 [00:00<00:00, 597.73it/s]
Traceback (most recent call last):
File "/Users/istefan/workspace/grinch/src/python/grinch/run_grinch_example.py", line 12, in <module>
grinch.build_dendrogram()
File "/Users/istefan/workspace/grinch/src/python/grinch/grinch_alg.py", line 507, in build_dendrogram
self.insert(i)
File "/Users/istefan/workspace/grinch/src/python/grinch/grinch_alg.py", line 520, in insert
parent = self.node_from_nodes(sib, i)
File "/Users/istefan/workspace/grinch/src/python/grinch/grinch_alg.py", line 487, in node_from_nodes
assert self.next_node_id >= self.max_num_points
AssertionError
wandb: Waiting for W&B process to finish, PID 51394
wandb: Program failed with code 1.
wandb: Find user logs for this run at: /Users/istefan/workspace/grinch/wandb/offline-run-20210107_185142-15tc99ou/logs/debug.log
wandb: Find internal logs for this run at: /Users/istefan/workspace/grinch/wandb/offline-run-20210107_185142-15tc99ou/logs/debug-internal.log
wandb: Run summary:
wandb: time/search_time 0
wandb: time/rotate_time 0
wandb: time/graft_time 0
wandb: time/update_time 0
wandb: time/graft_score_only_time 0
wandb: time/graft_search_time 0
wandb: time/graft_get_comparison_scores 0
wandb: time/find_dependent_update_nodes_time 0
wandb: time/mark_for_lazy_update_time 0
wandb: time/time_getting_descendants 0
wandb: time/lca_time 0
wandb: time/centroid_time 0
wandb: rotate/total_performed 0
wandb: rotate/total_considered 0
wandb: rotate/inst_performed 0
wandb: rotate/inst_considered 0
wandb: graft/total_performed 0
wandb: graft/total_considered 0
wandb: graft/total_allowable 0
wandb: graft/percent_allowable 0.0
wandb: graft/inst_performed 0
wandb: point_counter 0
wandb: _step 0
wandb: _runtime 4
wandb: _timestamp 1610045506
wandb: Run history:
wandb: time/search_time ▁
wandb: time/rotate_time ▁
wandb: time/graft_time ▁
wandb: time/update_time ▁
wandb: time/graft_score_only_time ▁
wandb: time/graft_search_time ▁
wandb: time/graft_get_comparison_scores ▁
wandb: time/find_dependent_update_nodes_time ▁
wandb: time/mark_for_lazy_update_time ▁
wandb: time/time_getting_descendants ▁
wandb: time/lca_time ▁
wandb: time/centroid_time ▁
wandb: rotate/total_performed ▁
wandb: rotate/total_considered ▁
wandb: rotate/inst_performed ▁
wandb: rotate/inst_considered ▁
wandb: graft/total_performed ▁
wandb: graft/total_considered ▁
wandb: graft/total_allowable ▁
wandb: graft/percent_allowable ▁
wandb: graft/inst_performed ▁
wandb: point_counter ▁
wandb: _step ▁
wandb: _runtime ▁
wandb: _timestamp ▁
wandb:
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /Users/istefan/workspace/grinch/wandb/offline-run-20210107_185142-15tc99ou
In case this matters for debugging, these are the packages I have installed in my environment:
$ pip list
Package Version
--------------- ---------
absl-py 0.11.0
certifi 2020.12.5
chardet 4.0.0
click 7.1.2
configparser 5.0.1
docker-pycreds 0.4.0
gitdb 4.0.5
GitPython 3.1.12
idna 2.10
numpy 1.19.5
pip 20.3.1
promise 2.3
protobuf 3.14.0
psutil 5.8.0
python-dateutil 2.8.1
PyYAML 5.3.1
requests 2.25.1
scipy 1.6.0
sentry-sdk 0.19.5
setuptools 51.0.0
shortuuid 1.0.1
six 1.15.0
smmap 3.0.4
subprocess32 3.5.4
tqdm 4.55.1
urllib3 1.26.2
wandb 0.10.12
watchdog 1.0.2
wheel 0.36.1
Please let me know if you have suggestions on how to fix this.
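Separately from the assertion failure, the DeprecationWarning in the log can be silenced by following its own advice. A minimal sketch, assuming the only goal is to draw the same kind of labels as `np.random.random_integers(0,10,100)`: `random_integers` includes the upper bound, while `np.random.randint` excludes it, hence the `10 + 1`. This does not address the `AssertionError` itself.

```python
import numpy as np

# random_integers(0, 10, 100) drew from the closed interval [0, 10].
# randint uses a half-open interval [low, high), so high must be 10 + 1.
point_labels = np.random.randint(0, 10 + 1, 100)

print(point_labels.shape)
```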