
scikit-learn's Introduction


scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license.

The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. See the About us page for a list of core contributors.

It is currently maintained by a team of volunteers.

Website: https://scikit-learn.org

Installation

Dependencies

scikit-learn requires:

  • Python (>= 3.9)
  • NumPy (>= 1.19.5)
  • SciPy (>= 1.6.0)
  • joblib (>= 1.2.0)
  • threadpoolctl (>= 3.1.0)

Scikit-learn 0.20 was the last version to support Python 2.7 and Python 3.4. scikit-learn 1.0 and later require Python 3.7 or newer. scikit-learn 1.1 and later require Python 3.8 or newer.

Scikit-learn plotting capabilities (i.e., functions starting with plot_ and classes ending with Display) require Matplotlib (>= 3.3.4), as does running the examples. A few examples require scikit-image >= 0.17.2, a few require pandas >= 1.1.5, and some require seaborn >= 0.9.0 and plotly >= 5.14.0.

User installation

If you already have a working installation of NumPy and SciPy, the easiest way to install scikit-learn is using pip:

pip install -U scikit-learn

or conda:

conda install -c conda-forge scikit-learn

The documentation includes more detailed installation instructions.

Changelog

See the changelog for a history of notable changes to scikit-learn.

Development

We welcome new contributors of all experience levels. The scikit-learn community goals are to be helpful, welcoming, and effective. The Development Guide has detailed information about contributing code, documentation, tests, and more. We've included some basic information in this README.

Important links

Source code

You can check the latest sources with the command:

git clone https://github.com/scikit-learn/scikit-learn.git

Contributing

To learn more about making a contribution to scikit-learn, please see our Contributing guide.

Testing

After installation, you can launch the test suite from outside the source directory (you will need to have pytest >= 7.1.2 installed):

pytest sklearn

See the web page https://scikit-learn.org/dev/developers/contributing.html#testing-and-improving-test-coverage for more information.

Random number generation can be controlled during testing by setting the SKLEARN_SEED environment variable.
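
For example, to fix the seed for a test run (assuming a POSIX shell):

SKLEARN_SEED=42 pytest sklearn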

Submitting a Pull Request

Before opening a Pull Request, have a look at the full Contributing page to make sure your code complies with our guidelines: https://scikit-learn.org/stable/developers/index.html

Project History

The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. See the About us page for a list of core contributors.

The project is currently maintained by a team of volunteers.

Note: scikit-learn was previously referred to as scikits.learn.

Help and Support

Documentation

Communication

Citation

If you use scikit-learn in a scientific publication, we would appreciate citations: https://scikit-learn.org/stable/about.html#citing-scikit-learn

scikit-learn's People

Contributors

adam2392, adrinjalali, agramfort, ahojnnes, amueller, arjoly, glemaitre, glouppe, jakevdp, jeremiedbb, jjerphan, jnothman, larsmans, lesteve, lorentzenchr, lucyleeow, mblondel, mechcoder, nellev, nicolashug, ogrisel, pprett, qinhanmin2014, raghavrv, robertlayton, rth, scikit-learn-bot, thomasjpfan, tomdlt, vene


scikit-learn's Issues

Analyzing the performance of different clustering algorithms with increasing dimensions

Aim: To test how the performance of different clustering algorithms on different datasets changes as noise of increasing dimension is added.

To be done: A Jupyter notebook documenting the effect of adding noise of different dimensions to a dataset. Several types of synthetic datasets are generated, Gaussian noise of varying dimension is added to each, and the performance of each clustering algorithm is measured after the noise is added. This is repeated for noise with different variances.

Expected output: Plots comparing the effect of varying noise dimensions on different clustering algorithms for each dataset. In this set of subplots, the variance of the added noise changes along the columns and the dataset changes along the rows.
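
For illustration, a minimal sketch of this kind of experiment, assuming K-Means as the clustering algorithm and the adjusted Rand index as the performance measure (both are illustrative choices, not prescribed above):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
X, y = make_blobs(n_samples=500, centers=3, random_state=0)

for n_noise_dims in [0, 10, 100, 1000]:
    for noise_std in [0.5, 1.0, 2.0]:
        # append Gaussian noise dimensions to the signal dimensions
        noise = rng.normal(scale=noise_std, size=(X.shape[0], n_noise_dims))
        X_noisy = np.hstack([X, noise])
        labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_noisy)
        print(n_noise_dims, noise_std, adjusted_rand_score(y, labels))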

Link to the code: https://nbviewer.jupyter.org/github/sree0917/team-forbidden-forest/blob/master/Sree/Clustering%20comparison%20%281%29.ipynb

Create an internal dictionary for BaseDecisionTree

Problem

Right now, we have to override and copy a lot of custom code to make fit and partial_fit work in subclasses of BaseDecisionTree inside sktree.

Possible solution

We should track the kwarg parameters needed to instantiate the:

  • criterion
  • splitter
  • tree

These should be then easily accessible in subclasses. E.g.

class BaseDecisionTree:
    _criterion_kwargs = ['n_samples', ...]
    _splitter_kwargs = ['criterion', 'max_features', ...]
    ...

Then ideally a subclass just has to add these additional kwargs to the __init__ structure and then override the corresponding _criterion/splitter/tree_kwargs.
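
For illustration, a pure-Python sketch of this pattern; the subclass name and helper below (ObliqueDecisionTree, _collect_kwargs) are hypothetical:

class BaseDecisionTree:
    # kwargs needed to instantiate the criterion and splitter;
    # subclasses extend these tuples rather than copying fit() code
    _criterion_kwargs = ("n_samples",)
    _splitter_kwargs = ("criterion", "max_features")

    def _collect_kwargs(self, names, params):
        # pick out only the parameters a given component needs
        return {name: params[name] for name in names}


class ObliqueDecisionTree(BaseDecisionTree):
    # a subclass only adds its extra kwargs on top of the base ones
    _splitter_kwargs = BaseDecisionTree._splitter_kwargs + ("feature_combinations",)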

cc: @PSSF23 from our discussion

Easy way to build a wheel out of Scikit-Learn-Tree?

Describe the workflow you want to enable

First and foremost, we appreciate your NeuroData work on scikit-learn trees; it is brilliant. We made some tree modifications, and the fork compiles on Linux and macOS without much trouble. We built a wheel on macOS (locally) and reused it in our dependencies instead of recompiling the whole scikit-learn fork, which works and speeds up the installation of our project, which depends on an updated Scikit-Learn-Tree. However, in our Docker container, doing the same produces a problem with the generated wheel after installation.

Is there a way to build the wheels we want (macOS and Linux) so that our end users can use both (open source) instead of having to recompile the whole thing?

Cheers

Describe your proposed solution

None

Describe alternatives you've considered, if relevant

Did not find any yet

Additional context

No response

Checking the performance of classifiers in a high-dimensional noise setting

Description

The sklearn example on comparing different classifiers' accuracies does not cover multiple settings for testing various scenarios. There is no concrete example showing when some of these algorithms win and when they lose. One scenario to consider: given a dataset of relatively low dimension, how does the accuracy of the classifiers change as noise dimensions are added?

Noise dimensions are features appended to the dataset that bear no relevance to the original signal dimensions.

Goal
To check the performance of Random Forest, Support Vector Machine, and K-Nearest Neighbours as three different classifiers under the addition of Gaussian noise across three different variance values.

Proposed changes in the form of a PR
I am proposing a new tutorial in the form of a Jupyter notebook containing all the code, from data generation to the computation of accuracies across noise dimensions.
The final figure will contain a plot of the original datasets, adapted from https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html,
and 9 plots of "Accuracy vs. Number of Noise Dimensions" for the 3 datasets and 3 variances of Gaussian noise. The plots will contain the testing accuracies across 50 trials of the experiment.
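
For illustration, a condensed sketch of the proposed experiment, assuming the three classifiers named above and a single dataset and variance for brevity:

import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)

for n_noise_dims in [0, 10, 100]:
    # append irrelevant Gaussian noise dimensions to the signal
    noise = rng.normal(scale=1.0, size=(X.shape[0], n_noise_dims))
    X_noisy = np.hstack([X, noise])
    X_train, X_test, y_train, y_test = train_test_split(X_noisy, y, random_state=0)
    for clf in [RandomForestClassifier(random_state=0), SVC(), KNeighborsClassifier()]:
        score = clf.fit(X_train, y_train).score(X_test, y_test)
        print(n_noise_dims, type(clf).__name__, score)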

Here is a link to the code:
https://github.com/NeuroDataDesign/team-forbidden-forest/blob/master/Sahana/FINAL_PR_classifiers.ipynb

[ENH] Adding binning capabilities to decision trees

Describe the workflow you want to enable

I am part of the @neurodata team. Feature binning has delivered large efficiency gains with little loss of performance in gradient-boosted trees. This feature should not be limited to gradient-boosted trees; it should be available in all decision trees [1].

By adding binning to decision trees, we would enable massive speedups for decision trees that operate on high-dimensional data (in both features and sample sizes). This would be an additional tradeoff that users can make. The intuition behind binning for decision trees is exactly that of gradient-boosted trees.

Describe your proposed solution

We propose introducing binning to the decision tree classifier and regressor.

An initial PR is proposed here: #24 (review).
However, it seems that many of the files were copied, and it is not 100% clear whether that is needed. Perhaps we can explore how to consolidate the _binning.py/pyx files with the current versions under ensemble/_hist_gradient_boosting/*.

Changes to the Cython codebase

TBD

Changes to the Python API

The following two parameters would be added to DecisionTreeClassifier and DecisionTreeRegressor:

hist_binning=False,
max_bins=255

where the default number of bins follows that of histogram gradient boosting.
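
Under this proposal, usage might look like the following; note that hist_binning and max_bins are the proposed parameters from above and do not exist in scikit-learn's API today:

from sklearn.tree import DecisionTreeClassifier

# hypothetical, proposed parameters -- not part of the current sklearn API
clf = DecisionTreeClassifier(hist_binning=True, max_bins=255)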

Additional context

These changes can also trivially be applied to Oblique Trees.

References:
[1] https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree

Segmentation faults during `partial_fit` for `check_fit_score_takes_y` unit-test in scikit-learn

Describe the bug

During the unit tests of RandomForestClassifier and ExtraTreesClassifier, there appears to be a segmentation fault arising from the partial_fit tree builder's build function.

I suspect this has something to do with either:

i) pickling and then accessing something that isn't restored properly, or
ii) accessing memory that isn't allocated.

It would be great if this could be reproduced, but I have failed to do so locally. This suggests to me it could be an edge case, which makes it even more important to fix.

Steps/Code to Reproduce

TBD.

Expected Results

There should be no segmentation fault.

Actual Results

.....................................................Fatal Python error: Segmentation fault

Thread 0x00007000107d4000 (most recent call first):
  File "/usr/local/miniconda/envs/testvenv/lib/python3.11/threading.py", line 320 in wait
  File "/usr/local/miniconda/envs/testvenv/lib/python3.11/site-packages/joblib/externals/loky/backend/queues.py", line 113 in _feed
  File "/usr/local/miniconda/envs/testvenv/lib/python3.11/threading.py", line 975 in run
  File "/usr/local/miniconda/envs/testvenv/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
  File "/usr/local/miniconda/envs/testvenv/lib/python3.11/threading.py", line 995 in _bootstrap

Thread 0x000070000f7d1000 (most recent call first):
  File "/usr/local/miniconda/envs/testvenv/lib/python3.11/selectors.py", line 415 in select
  File "/usr/local/miniconda/envs/testvenv/lib/python3.11/multiprocessing/connection.py", line 930 in wait
  File "/usr/local/miniconda/envs/testvenv/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 611 in wait_result_broken_or_wakeup
  File "/usr/local/miniconda/envs/testvenv/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 557 in run
  File "/usr/local/miniconda/envs/testvenv/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
  File "/usr/local/miniconda/envs/testvenv/lib/python3.11/threading.py", line 995 in _bootstrap

Versions

sklearn submodulev3 branch

Build wheels for the fork of scikit-learn here

We want a CI pipeline (and/or local pipeline) that can build wheels for:

  • Windows
  • Linux
  • Mac Intel (x64)
  • Mac M1 (Arm)

and we can attach those wheels to a specific release we make (these are called "nightly wheels" in scipy/pytorch/tensorflow/etc.). Then we can pip install directly from those wheels. Use https://cibuildwheel.readthedocs.io/en/stable/.

Then create a stable release with all of these wheels pip-installable.

Then have sktree rely on these wheels for v0.1

I added more notes on the call; feel free to add them here for documentation @jshinm.
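
For reference, a local invocation might look like this (the exact configuration would live in pyproject.toml or the CI workflow; this is a sketch, not the final setup):

pip install cibuildwheel
cibuildwheel --platform linux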

Spectral Embedding with Asymmetric Matrices / Directed Graphs

Describe the workflow you want to enable

Currently, sklearn.manifold.SpectralEmbedding is restricted to symmetric affinity matrices; if an asymmetric matrix is passed, it is converted through sklearn.utils.validation.check_symmetric into a symmetric matrix. In doing so, however, one loses the underlying asymmetries and the potential directional clusters present in the adjacency matrix of the directed-graph input.

Describe your proposed solution

The algorithms I propose adding use singular value decomposition, as opposed to eigendecomposition, and a modified Laplacian to perform spectral embedding on directed graphs/asymmetric matrices. Specifically, I would like to propose adding adjacency and Laplacian spectral embedding (ASE and LSE, respectively). My thought would be to add a new class, sklearn.manifold.DirectedSpectralEmbedding, which users may call directly, or which is dispatched to when an asymmetric matrix is passed to sklearn.manifold.SpectralEmbedding. As with SpectralEmbedding, users would choose between ASE and LSE through an affinity parameter.

Additional context

These algorithms have been implemented in GraSPy (available as ASE and LSE), taking a graph represented as a dense or sparse matrix as input and returning the appropriate embedding.
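
For reference, a minimal sketch of the SVD-based adjacency spectral embedding (ASE) step, assuming scipy's truncated SVD; DirectedSpectralEmbedding above is a proposed, not an existing, class:

import numpy as np
from scipy.sparse.linalg import svds

def adjacency_spectral_embedding(A, n_components=2):
    """Embed a (possibly asymmetric) adjacency matrix via truncated SVD."""
    U, s, Vt = svds(A.astype(float), k=n_components)
    # scale the singular vectors by the square roots of the singular values;
    # for directed graphs the "out" and "in" embeddings differ
    X_out = U * np.sqrt(s)
    X_in = Vt.T * np.sqrt(s)
    return X_out, X_in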

Analyzing effect of dimensionality reduction on accuracy of different classifiers on different types of datasets

This will be a document showing the effect of dimensionality reduction on the accuracy of different classifiers.
The document will contain simulations on high-dimensional datasets of different shapes.
Each dataset is synthesized from sklearn's established datasets. Each dataset has 1000 dimensions, of which only 2 carry signal and the rest are noise dimensions.

Questions to answer:

  1. How dimensionality reduction helps classification for different classifiers.
  2. How classifiers perform with different numbers of reduced dimensions retained from the main high-dimensional dataset.

Pipeline to be followed (a sketch is given after this list):

  1. Define a dataset using sklearn's established synthetic datasets with high dimensions.
  2. Perform classification on the data and measure accuracy to quantify the process.
  3. Perform the dimensionality reduction technique, keeping varying numbers of reduced dimensions.
  4. Check the classification performance again after reducing the dimensions at each iteration.
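
For illustration, a minimal sketch of this pipeline, assuming PCA as the dimensionality reduction technique and a random forest as one of the classifiers (both illustrative choices):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# 1000 total dimensions, only 2 informative, the rest noise
X, y = make_classification(
    n_samples=500, n_features=1000, n_informative=2,
    n_redundant=0, random_state=0,
)

baseline = cross_val_score(RandomForestClassifier(random_state=0), X, y).mean()
print("no reduction:", baseline)

for n_components in [2, 10, 50]:
    X_red = PCA(n_components=n_components, random_state=0).fit_transform(X)
    score = cross_val_score(RandomForestClassifier(random_state=0), X_red, y).mean()
    print(n_components, score)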

The output of the PR would be a figure showing the different datasets, comparing the accuracies of different classifiers with and without dimensionality reduction, and a plot showing how accuracy varies over the reduced dimensions.

Experiments to follow:
https://github.com/NeuroDataDesign/team-forbidden-forest/blob/master/Parimal%20Joshi/Final_pr_2.ipynb

MAINT Refactor `partial_fit` Cython code to be more maintainable

Some segfaults arose in scikit-tree during the implementation of this PR: neurodata/treeple#249.
Some actionable items came to mind to improve what we have:

Documentation:

cpdef initialize_node_queue(
    self,
    Tree tree,
    object X,
    const float64_t[:, ::1] y,
    const float64_t[:] sample_weight=None,
    const unsigned char[::1] missing_values_in_feature_mask=None,
):
    """Initialize a list of roots"""
    X, y, sample_weight = self._check_input(X, y, sample_weight)

    # organize samples by decision paths
    paths = tree.decision_path(X)
    cdef intp_t PARENT
    cdef intp_t CHILD
    cdef intp_t i
    false_roots = {}
    X_copy = {}
    y_copy = {}
    for i in range(X.shape[0]):
        # collect depths from the node paths
        depth_i = paths[i].indices.shape[0] - 1
        PARENT = depth_i - 1
        CHILD = depth_i

        # find leaf node's & their parent node's IDs
        if PARENT < 0:
            parent_i = 0
        else:
            parent_i = paths[i].indices[PARENT]
        child_i = paths[i].indices[CHILD]
        left = 0
        if tree.children_left[parent_i] == child_i:
            left = 1  # leaf node is left child

        # organize samples by the leaf they fall into (false root);
        # leaf nodes are marked by parent node and
        # their relative position (left or right child)
        if (parent_i, left) in false_roots:
            false_roots[(parent_i, left)][0] += 1
            X_copy[(parent_i, left)].append(X[i])
            y_copy[(parent_i, left)].append(y[i])
        else:
            false_roots[(parent_i, left)] = [1, depth_i]
            X_copy[(parent_i, left)] = [X[i]]
            y_copy[(parent_i, left)] = [y[i]]

    X_list = []
    y_list = []

    # reorder the samples according to parent node IDs
    for key, value in reversed(sorted(X_copy.items())):
        X_list = X_list + value
        y_list = y_list + y_copy[key]
    cdef object X_new = np.array(X_list)
    cdef cnp.ndarray y_new = np.array(y_list)

    # initialize the splitter using sorted samples
    cdef Splitter splitter = self.splitter
    splitter.init(X_new, y_new, sample_weight, missing_values_in_feature_mask)

    # convert the dict to a numpy array and store it
    self.initial_roots = np.array(list(false_roots.items()))
This should ideally be rewritten, or at least commented more thoroughly. Right now it is hard to parse what comprises initial_roots. Since this is part of the Cython codebase, it is a critical piece: segfaults are time-consuming and difficult to chase down.

Next, we probably want to include a clear description for developers of the differences here:

if initial_roots is None:
    # Recursive partition (without actual recursion)
    splitter.init(X, y, sample_weight, missing_values_in_feature_mask)

    if tree.max_depth <= 10:
        init_capacity = <intp_t> (2 ** (tree.max_depth + 1)) - 1
    else:
        init_capacity = 2047

    tree._resize(init_capacity)
    first = 1
else:
    # convert the numpy array back to a dict
    false_roots = {}
    for key_value_pair in initial_roots:
        false_roots[tuple(key_value_pair[0])] = key_value_pair[1]

    # reset the root array
    self.initial_roots = None

Features, or documentation

Part of sklearn handles monotonic constraints and n_constant_features tracking. At first glance, it is not clear whether these are actually tracked across calls. That is, are the monotonic constraints and n_constant_features after a fit followed by a partial_fit on two subsets of the data different from what they would be after a single fit on the entire dataset? If they differ, what does that imply?

In an ideal world, the state is the same.
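
For illustration, a sketch of the consistency check this suggests, assuming the fork's DecisionTreeClassifier exposes partial_fit (upstream scikit-learn's does not):

import numpy as np
from numpy.testing import assert_array_equal

def check_partial_fit_equivalence(make_tree, X, y):
    """Compare one fit on all data against fit + partial_fit on two halves."""
    n = X.shape[0] // 2
    full = make_tree().fit(X, y)
    incremental = make_tree().fit(X[:n], y[:n])
    incremental.partial_fit(X[n:], y[n:])
    # in an ideal world the learned structures agree
    assert_array_equal(full.tree_.feature, incremental.tree_.feature)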
