Giter Club home page Giter Club logo

fastpair's Introduction

cjqf

Carson Farmer (he/him)

These days I do research and development with @textileio and @tablelandnetwork. I used to be a professor at hunter.cuny.edu and colorado.edu. I enjoy working on advanced algorithms and peer to peer protocols. I'm interested in conflict free replicated data types, linked data, composability, network resilience, maps, the distributed web, open source software development, and other stuff. You can email me at [email protected] or reach out to @carsonfarmer pretty much everywhere else.

Currently open to advisory roles for non-profits and smaller startups. Active research in data structures for distributed systems, homomorphic hashing schemes, and data-centric distributed systems.

fastpair's People

Contributors

carsonfarmer avatar cjqf avatar jgaboardi avatar pre-commit-ci[bot] avatar rakesh4g avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

fastpair's Issues

min_points error

This error/bug happens when there's less pairs than the min_points parameters and we have custom dist function.

(cp_diff, cp_pair) = fp.closest_pair()
      File "/nix/store/65djwzlc0r0w59i5snby2dmfkbpjqm1h-python3.6-fastpair-2016-07-05/lib/python3.6/site-packages/fastpair/base.py", line 178, in closest_pair
        return self.closest_pair_brute_force()
      File "/nix/store/65djwzlc0r0w59i5snby2dmfkbpjqm1h-python3.6-fastpair-2016-07-05/lib/python3.6/site-packages/fastpair/base.py", line 190, in closest_pair_brute_force
        return _closest_pair_brute_force(self.points)
      File "/nix/store/65djwzlc0r0w59i5snby2dmfkbpjqm1h-python3.6-fastpair-2016-07-05/lib/python3.6/site-packages/fastpair/base.py", line 282, in _closest_pair_brute_force
        return min((dst(p1, p2), (p1, p2)) for p1, p2 in combinations(pts, r=2))
      File "/nix/store/65djwzlc0r0w59i5snby2dmfkbpjqm1h-python3.6-fastpair-2016-07-05/lib/python3.6/site-packages/fastpair/base.py", line 282, in <genexpr>
        return min((dst(p1, p2), (p1, p2)) for p1, p2 in combinations(pts, r=2))
      File "/nix/store/qw3cm7qzzvi68cl9ffz06l2xia52v07s-python3.6-scipy-1.0.0/lib/python3.6/site-packages/scipy/spatial/distance.py", line 553, in euclidean
        return minkowski(u, v, p=2, w=w)
      File "/nix/store/qw3cm7qzzvi68cl9ffz06l2xia52v07s-python3.6-scipy-1.0.0/lib/python3.6/site-packages/scipy/spatial/distance.py", line 470, in minkowski
        u_v = u - v
    TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('S648') dtype('S648') dtype('S648')

add `CODECOV_TOKEN` secret

@carsonfarmer I think you may have to "turn on" codecov for the report from CI, (see failed report upload here). You should be able to access via [https://app.codecov.io/gh/carsonfarmer/fastpair/new] and follow the instructions for adding a CODECOV_TOKEN repository secret. This is low on the list of criticalities, so no rush at all (but also does only take 5 seconds).

xref #20

[INFRA] tag `fastpair==0.1.0` causing on issue on install

While working on (3.) in #20, I am running into the problem of there being no associated dynamic version with the fastpair==0.1.0 tag, which is causing the pip install -e . to fail.

The two options (IMHO) are a conservative approach [1,2] or a more aggressive approach [2] that does it all at once.

  1. conservative approach (2 PRs):
    1. explicitly declaring version = "v0.1.0" in pyproject.toml in a first PR addressing (3.)
    2. (perhaps create a v0.1.1 tag and associated release, but this might be overkill)
    3. proceed to step 2
  2. one stroke (1 PR):
    1. delete the fastpair==0.1.0 tag (remote & local)
    2. declare dynamic = ["version"] in pyproject.toml

@carsonfarmer Let me know if (A) you have any questions/concerns here; and (B) which approach you'd like me to take.

adding & subtracting multiple points after calling ``build()``

Currently with FastPair.__add__() only a single point can be added at time after calling build():

def __add__(self, p):
    self.points.append(p)
    if self.initialized:
        self._find_neighbor(p)
    elif len(self) >= self.min_points:
        self.build()
    return self

With some modification, a collection of points could be added to the data structure:

def __add__(self, p):
    if isinstance(p, list):
        for _p in p:
            self._add_1_point(_p)
    else:
        self._add_1_point(p)
    return self

def _add_1_point(self, p):
    self.points.append(p)
    if self.initialized:
        self._find_neighbor(p)
    elif len(self) >= self.min_points:
        self.build()
    return self

Seems like the same could go for FastPair.__sub__()

Thoughts?

cc @gegen07

start work on improving testing

  • start supplemental testing schema
    • rename (and keep for now) test_fastpair.py --> test_fastpair_pre_010.py
    • initialize test_fastpair.py for breaking out updated and fleshed out tests
  • review current testing helper
    • add type hinting
    • DRY for robustness
  • add more as we think of it

interest in modernizing?

@carsonfarmer Do you have any interest in seeing fastpairs modernized (e.g., Python 312 compatibility, pyproject.toml, GHA for testing, structured formatting & linting, etc.)? If so (and as time permits), I can start making specific tickets and chipping away at it.

xref: #12

Wrong result when removed a point from the nearest pairs iteratively

Instead of merging, removing single points results in the wrong solution. The test image looks like this:
1

When I merge with this method:

fp = FastPair()
fp.build(points)

for i in range(int(len(fp) * 0.8)): # remove 80 percent of points
    dist, (a, b) = fp.closest_pair()

    c = (a[0]/2+b[0]/2, a[1]/2+b[1]/2)
    fp -= b
    fp -= a
    fp += c
    
for p in fp:
    plt.plot(p[0], p[1], "b.")

I get this result:
2

But if I remove a point from the closest pairs iteratively like this:

fp = FastPair()
fp.build(points)

for i in range(int(len(fp) * 0.8)):  # remove 80 percent of points
    dist, (a, b) = fp.closest_pair()
    fp -= a

for p in fp:
    plt.plot(p[0], p[1], "b.")

I get this output:
3

Do you have any ideas why this is happening? Input image and ipynb can be found here:
fastpair_test.zip

strange test pass & fail for `test_update_point_less_points()`

Within test_update_point_less_points(), I am not sure this final assert is testing what it's supposed to and I am concerned about it currently passing.

  • pytest passes in remote CI
  • pytest passes locally in serial run (1 core)
  • pytest fails locally in xdist run (multicore)
  • ❌ the code chunk broken out and run in jupyterlab

From what I understand, the final assert statement should be testing number of point currently in fp, and that should not be 1 because we are only adding points to fp and then updating a single point, never removing any points, where:

So I think the final line here should pass with assert len(fp) == 9, but it somehow passes with assert len(fp) == 1 in the ✅ above...


def test_update_point_less_points(self, point_set):
    ps = point_set
    fp = FastPair()
    for p in ps[:9]:
        fp += p
    assert fp.initialized is False
    old = ps[0]  # Just grab the first point...
    new = rand_tuple(len(ps[0]))
    fp._update_point(old, new)
    assert len(fp) == 1

cc @gegen07

Edit 1

I have seen failure in local serial testing, but it is not consistent.

FAILED fastpair/tests/test_fastpair.py::TestFastPairUpdatePointLessPoints::test_basic - assert 9 == 1

Use python set instead of python list for maintaining points

I noticed that point removal is O(n) as it uses list remove function. Instead one can switch to using sets, which would make it close to O(1) for removal. All other operations don't really change by switching to sets as far as I can see.

pytest failures (with 4.2.1)

Latest pytest fails on this repository.

Complains:

Fixture "PointSet" called directly. Fixtures are not meant to be called directly,
but are created automatically when test functions request them as parameters.
See https://docs.pytest.org/en/latest/fixture.html for more information about fixtures, and
https://docs.pytest.org/en/latest/deprecations.html#calling-fixtures-directly about how to update your code.

Fixture code should be updated.

boil down `basic_usage.ipynb` further - hard code points

On another note I am thinking that boiling the basic_usage demo even further by having hard-coded points would be an even better intro. Probably move the the random_points function to another demo when that happens. Leave as it for this PR.

  • boil down basic_usage.ipynb further - hard code points
  • move random_points() to an advanced notebook (high-dimensional points)

xref - #40 (comment)

`initialized` via `build()`?

The conga line structure must have min_points to be initialized from the FastPair class. However, when an instance is created via FastPair.build() there is no assumption any min_points.

Should we perhaps throw a warning if min_points are not yet met when calling .build()?

Also, here in test_iter().

move `Features` to notebook?

@carsonfarmer How do you think about having the Features section of the README moved from there into a notebook? If that sounds like a good idea, would we want that directory to be named:

  • notebooks/
  • examples/
  • demos/
  • something else?

xref #20, #35

ideas for more notebooks

ideas for more notebooks – edit for any idea further idea

  • same point set with varied distance metrics
  • same point set with varied min_points
  • direct time comparison with closest_pair vs. closest_pair_brute_force
  • demo with empirical data, probably from geodatasets

cc @gegen07

remove `dist` from `__all__`?

Does dist (scipy.spatial.distance) actually need to be listed in all? I think we can remove, but wanted to see if I'm missing something.

`fastpair` for points in disjoint sets?

I curious about the possibility of utilizing the fastpair data structure for querying nearest neighbors in disjoint sets à la cross $k$-nearest, for example. Would that even be possible with fastpair?

explicit vs. shorthand imports

@carsonfarmer Do you have any strong feelings for keeping shorthand imports within fastpair? I will defer to your preference if you'd like to keep them. My general preference is explicit importing of names where possible (I think the improved readability vastly makes up for the several saved characters of shorthand imports.

cc @gegen07

need `return` statements in `_find_neighbor()` & `_update_point()`

  • Came across this while working on #55
  • Do does there actually need to be return statements in _find_neighbor()1 and _update_point()2? The values are being set to the data structure itself, and there appears to be nothing being return in the current functional codebase.
  • The unknown here is the currently commented out merge_closest()3 method that may in fact need to return a value
  • xref:

cc @gegen07

Footnotes

  1. https://github.com/carsonfarmer/fastpair/blob/d2b38aad4597ec5d2572a9b5d2abbd60905e92fe/fastpair/base.py#L226

  2. https://github.com/carsonfarmer/fastpair/blob/d2b38aad4597ec5d2572a9b5d2abbd60905e92fe/fastpair/base.py#L259

  3. https://github.com/carsonfarmer/fastpair/blob/d2b38aad4597ec5d2572a9b5d2abbd60905e92fe/fastpair/base.py#L265

beef up docstrings

beef up docstrings

tasks:

  • review language for clarity, etc.
  • conform to numpydoc standards
  • clean up & flesh out Examples
  • start performing doctests during CI

xref:

update `README` by section

update README by section

  • Overview
  • Installation
  • Testing
  • Features -- move to a notebook?
  • License -- anything here?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.