Comments (7)
Hey!
If the issue is: make things faster, then that'd typically accepted to be worked on.
If you have particular code examples that make things faster, preferably in a backwards compatible way, then a PR is I think the best way to discuss them?
Closing, likely open to a PR though!
from dice-coefficient.
This may not be backwards compatible, based on what you said in another issue. There, you basically said that the official interface is:
interface Sliceable {
slice(i: number, j: number): any;
}
function diceCoefficient(a: Sliceable, b: Sliceable): number;
If that's the case then yeah, this breaks compatibility, because the new interface is basically:
interface Indexable {
[key: number]: any;
}
function diceCoefficient(a: Indexable , b: Indexable): number;
In which case, I guess, if you don't want to merge this change, then I can publish it as a new lib.
from dice-coefficient.
Ah, I see. Just looked at this in your readme:
Instead of strings you can also pass lists of bigrams.
This can improve performance when processing the same strings repeatedly.diceCoefficient(['ab', 'bc'], ['xy', 'yz']) // => 0 diceCoefficient(['ab', 'bc'], ['ab', 'bc']) // => 1 diceCoefficient(['ab', 'bc'], ['AB', 'BC']) // => 1
So the ability to take a string[] instead of a string was just a speed optimization, right?
Well I couldn't use that one because I'd have to cache them in memory and the sheer number of bigrams made Node run out of memory. That's why I came up with this trick instead.
So if arrays are only there for a speed optimization, and this actually has basically the same performance, and you don't mind breaking compatibility with a new major version bump, then yeah I'd recommend merging my changes.
But separately from that, there might be copyright issues, since my modifications to your library are, I think, copyrighted by my client.
from dice-coefficient.
there might be copyright issues
This is quite important. It’s something that you should decide on with the people that pay you. If they do not see modifications as open source, then this discussion is tainting this project with copyrighted material.
based on what you said in another issue.
That’s a different project.
Well I couldn't use that one because I'd have to cache them in memory and the sheer number of bigrams made Node run out of memory
It’s hard for me to evaluate this statement because you have tried some code out and I have no clue what that code was.
and this actually has basically the same performance, and you don't mind breaking compatibility with a new major version bump, then yeah I'd recommend merging my changes.
When they both have the same performance, why not stick with the existing code?
from dice-coefficient.
This is quite important. It’s something that you should decide on with the people that pay you. If they do not see modifications as open source, then this discussion is tainting this project with copyrighted material.
I'm pretty sure the way software copyright law works is that the code within any PR, whether by an individual or company, is owned by whoever sent the PR, even after you merge it in. This is why big projects have you sign an agreement giving them copyright over your code when you submit PRs. Otherwise, they have a hard time enacting certain decisions. But since I didn't submit a PR, there's just an idea which is in itself not copyrighted. I got the greenlight to submit a PR to you from my client (but see the above for why I didn't), so if you just implement the same idea, you have the copyright over it.
The idea is simple: don't use bigrams, but just compare two string characters individually using simple indexing to get them.
Well I couldn't use that one because I'd have to cache them in memory and the sheer number of bigrams made Node run out of memory
It’s hard for me to evaluate this statement because you have tried some code out and I have no clue what that code was.
and this actually has basically the same performance, and you don't mind breaking compatibility with a new major version bump, then yeah I'd recommend merging my changes.
When they both have the same performance, why not stick with the existing code?
Using pre-cached bigrams didn't work for me because there were so many that they overwhelmed Node's memory, and the code I linked you to has a performance increase instead.
So I should clarify that I'm not sure that my index-based modification of your library is as fast as just pre-caching bigrams, since I wasn't able to complete my test using pre-cached bigrams. But I did cut the runtime in half by ~50% when running on I think millions (billions?) of strings. It was a non-trivial test, and the results are repeatable, so I'm pretty confident in it.
Let me know if you're not going to use the index-based idea, and I'll create a new lib for it.
from dice-coefficient.
Oops my old link broke. Here's an updated one:
from dice-coefficient.
My client says we're happy to sign a CAA and send a PR.
from dice-coefficient.
Related Issues (5)
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dice-coefficient.