Comments (34)
from rgf.
Sounds good to me. Thanks.
from rgf.
There is quite a workload. Please feel free to assign me any task.
from rgf.
Your examples use submodules for the core components too: LGB uses the compute module for the GPU version and XGB uses cub, dmlc-core, nccl, rabit.
Those are not libraries intended only for LGB or XGB; they are more general-purpose libraries.
And they didn't commit to nccl or the other libraries. Basically, who will maintain the code in the future matters more than who developed it in the past. Besides, Tong is in the RGF team.
Imagine the situation: someone wants to develop, let's say, a Java wrapper. What should they do?
XGB holds the CLI, Python, and R packages together. We can hold all language wrappers in the same place.
Of course, we need to rename this project to RGF (not -python) or another appropriate name. And we may also have to change the directory structure.
from rgf.
XGB holds the CLI, Python, and R packages together. We can hold all language wrappers in the same place.
Of course, we need to rename this project to RGF (not -python) or another appropriate name. And we may also have to change the directory structure.
Yeah, you're right! These are the two main reasons why I'm against keeping all the code in one repo. Renaming the repo will break all links from old articles, notes, etc. GitHub won't redirect from the old URL the way it does when ownership is transferred. I'm sure that reorganizing the repo will take a lot of effort and won't be an easy process. But without the reorganization the main emphasis stays on the Python wrapper, which is unacceptable if there are no separate repos for the C++ code. Also, we need Rie's and Tong's approval for doing this.
Could you please provide exact reasons, except the absence of CI tests at this moment, why you think that separate repos are bad idea?
P.S. At this moment the most likely variant is that we'll need to fork the fast_rgf repo, because Tong said, "I think the code officially belongs to Baidu, so it will be better to do a link instead of transfer. I'd try to look into this later on."
from rgf.
Renaming repo will break all links from old articles/notes/etc
We can resolve it as https://github.com/tensorflow/skflow
Also we need Rie's and Tong's approvals for doing this.
Basically, we can redistribute RGF under its license in whatever form. And actually we are already doing it.
Could you please provide exact reasons, except the absence of CI tests at this moment, why you think that separate repos are bad idea?
I think our issue is purely a matter of the time cost, nothing else.
If so, first of all, we should list the costs and compare.
And we had better distinguish the initial cost from the running cost.
At the moment, I estimated that the time cost is less if we integrate to one repository.
from rgf.
We can resolve it as https://github.com/tensorflow/skflow
OK, agree.
Basically, we can redistribute RGF under its license in whatever form. And actually we are already doing it.
We should create conditions in which development is comfortable for all RGF team members, shouldn't we? I mean, a person who alters the C++ code shouldn't have to care about anything in the Python code.
Also, don't forget that we actually have two separate projects: RGF and FastRGF. Maybe someone wants to use only one of them. So we should provide easy access to each project for such users.
Please be more concrete when talking about the time cost. I don't completely understand you.
Anyway, I think I have some thoughts about the united repo's structure... I'll be back with them soon.
from rgf.
Sorry for late response.
Recently I have been concentrating on studying the RGF C++ code.
Basically, if we integrate the two projects, we change directory names, create a new top-level readme, and move the current readme to python-package. I guess this is all, isn't it?
from rgf.
... bring the two folders rgf and fast_rgf to the top of the repo, correct the readmes in them, rework the python-package installation according to the new structure. Those are a few things off the top of my head for now.
from rgf.
Hi @fukatani! How are you doing? I see you're in the process of bringing RGF to LightGBM, cool!
Sorry for being quiet so long; I was very busy the last two months. Do you have some time to continue the discussion about the future repo structure?
from rgf.
@fukatani Assuming from your thumbs up in the neighboring issue that you don't mind, let me show you my plan.
Plan:
- sync rgf and rgf_python repos (done in #178)
- delete rgf repo
- rename the repo from rgf_python to rgf
- create rgf_python repo and place a redirecting stub there (optional, GitHub seems to be able to do this for us; done automatically)
- fix Appveyor, which is still built at the old URL
- create a PR in the baidu repo with notes about the new address of the live FastRGF repo (done in baidu/fast_rgf#14)
- change repo structure (done in #191, #192, #193)
- rewrite ISSUE_TEMPLATE (done in #191)
- update all docs according to the new repo structure (done in #191, #192, #193, #196)
- update CI tests according to the new repo structure (done in #191, #193)
- update setup.py according to the new repo structure (done in #192, #193)
- update Kaggle kernel (not connected with the issue, but let it be here as a reminder; done in Kaggle/docker-python#386)
- new release (I think 3.2.0 / 3.3.0 is OK because there were no releases for a long time, but on the other hand there are not enough changes for a major version bump) (done in 194806a)
from rgf.
@fukatani What do you think about the plan? Do we need to add something?
from rgf.
Why is syncing the rgf and rgf_python repos needed?
Wouldn't simply deleting rgf be enough?
And before starting this plan, should we publish a new release?
from rgf.
Why is syncing the rgf and rgf_python repos needed?
For example, rgf is under the MIT license now, while here it's still the GNU one.
And before starting this plan, should we publish a new release?
I support your idea! Especially because of #172. Could you please do it? Just wait until I upload the latest binary files for linux32 (I'll ping you in an hour).
from rgf.
What about the following structure (XGBoost/LightGBM-like)?
rgf/
├── RGF/ (formerly rgf_python/include/rgf)
│   ├── README.md
│   ├── LICENSE
│   ├── src/
│   └── ...
├── FastRGF/ (formerly rgf_python/include/fast_rgf or baidu repo)
│   ├── README.md
│   ├── LICENSE
│   ├── src/
│   └── ...
├── python_package/ (formerly rgf_python repo)
│   ├── Readme.rst
│   ├── LICENSE
│   ├── setup.py
│   ├── rgf/
│   ├── examples/
│   ├── tests/
│   └── ...
├── R_package/
│   └── ...
├── other_package/
│   └── ...
├── .github
├── .gitignore
├── README.md
├── .appveyor.yml
├── .travis.yml
└── ...
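A minimal sketch of how the proposed layout could be scaffolded for experimentation, assuming only the names shown in the tree above. The `scaffold` helper and the `LAYOUT` table are hypothetical and are not part of the repository; the elided "..." entries are deliberately not filled in.

```python
from pathlib import Path
from typing import List
import tempfile

# Directory names and files are taken from the proposed tree; entries ending
# in "/" are directories. Elided "..." entries are intentionally omitted.
LAYOUT = {
    "RGF": ["README.md", "LICENSE", "src/"],
    "FastRGF": ["README.md", "LICENSE", "src/"],
    "python_package": ["Readme.rst", "LICENSE", "setup.py",
                       "rgf/", "examples/", "tests/"],
    "R_package": [],
    "other_package": [],
}
TOP_LEVEL_FILES = ["README.md", ".gitignore", ".appveyor.yml", ".travis.yml"]


def scaffold(root: Path) -> List[str]:
    """Create empty placeholders for the layout; return the relative paths created."""
    created = []
    for package, entries in LAYOUT.items():
        (root / package).mkdir(parents=True, exist_ok=True)
        created.append(f"{package}/")
        for entry in entries:
            target = root / package / entry
            if entry.endswith("/"):
                target.mkdir(exist_ok=True)
            else:
                target.touch()
            created.append(f"{package}/{entry}")
    (root / ".github").mkdir(exist_ok=True)
    created.append(".github/")
    for name in TOP_LEVEL_FILES:
        (root / name).touch()
        created.append(name)
    return created


if __name__ == "__main__":
    # Build the skeleton in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold(Path(tmp)):
            print(path)
```

Running it against a temporary directory makes it easy to try out relative paths (for example from python_package/setup.py to the C++ sources) before touching the real repo.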
from rgf.
I uploaded the newest RGF to PyPI just now.
I agree with your plan and directory structure. Good!
from rgf.
I uploaded the newest RGF to PyPI just now.
Thanks!
I agree with your plan and directory structure. Good!
Nice to hear! So now let's bring here other teammates.
from rgf.
Hello @TongZhang-ML and @riejohnson !
@fukatani and I want to consolidate all RGF projects under one license (already done) and one roof. It'll help users find the different implementations and language-specific packages more easily.
The plan is here and the future GitHub repo structure is here.
The most complicated step in the plan concerns the fast_rgf repo owned by Baidu. We've already discussed this with @TongZhang-ML via email, but without any practical result. I think that providing a link with text like "The active development of FastRGF is maintained now here [link]" in Baidu's repo will not infringe any rights.
We need your approval.
from rgf.
@TongZhang-ML Yes. Somewhere around here in the Baidu's repo:
from rgf.
@TongZhang-ML @riejohnson Thank you!
BTW, is there any place where editable source file of the rgf 1.2 User Guide could have been saved?
from rgf.
BTW, is there any place where editable source file of the rgf 1.2 User Guide could have been saved?
It's a tex file with private comments etc. (meaning not meant for going public). Does this mean you'd like to change it?
from rgf.
@riejohnson It's a pity. I want to replace the pdf file with an easily editable one in the repo. For instance, right now I'd like to change the version and paths, and add information about the new feature_importance and dump_model functionality introduced in #161. Generally, it's better to have a file on GitHub that everyone can edit, I suppose...
from rgf.
I want to replace the pdf file with an easily editable one in the repo.
If you'd like to rewrite the documentation in a suitable format, I can give you the original tex file (privately), assuming that having it is better than starting from scratch. But if that's too much work, how about making new documentation only for the new functions?
from rgf.
Speaking of enhancements, I wondered if the C++ source files (of rgf 1.2) you people @StrikerRUS @fukatani have changed for adding new functions should be marked as modified (by whom and when), at the beginning of the files, shouldn't they? At least, that's what the GNU GPL says though I know it's been switched to MIT.
from rgf.
Yeah, you're right about the changes tracking!
@fukatani Could you please mark all files of RGF you've modified (I can help you detect them, if you wish), as @riejohnson said?
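One possible way to detect the modified files is to parse the git history. The sketch below assumes the log text was produced with `git log --name-only --date=short --format="@%an|%ad"`; the leading "@" marker and the `files_by_author` helper are conventions invented for this example, and the sample names and paths are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def files_by_author(git_log_output: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each file path to the (author, date) pairs of the commits touching it.

    Expects the text printed by:
        git log --name-only --date=short --format="@%an|%ad"
    Limitation of this sketch: a file whose name starts with "@" would be
    mistaken for a commit header.
    """
    touched = defaultdict(list)
    author_date = None
    for line in git_log_output.splitlines():
        line = line.strip()
        if not line:
            continue  # blank separators between header and file list
        if line.startswith("@"):
            author, _, date = line[1:].partition("|")
            author_date = (author, date)
        elif author_date is not None:
            touched[line].append(author_date)
    return dict(touched)


if __name__ == "__main__":
    # Made-up sample in the expected format (newest commit first, as git prints it).
    sample = "@dev_b|2018-03-02\n\nsrc/a.cpp\n\n@dev_a|2018-01-29\n\nsrc/a.cpp\nsrc/b.h\n"
    for path, commits in files_by_author(sample).items():
        print(path, commits)
```

Restricting the log to the C++ sources (by passing a path after `--` on the git command line) would keep the list focused on the files in question.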
from rgf.
@riejohnson If you don't mind, I can rewrite the whole pdf file in md or, maybe better, rst format (both can be rendered by GitHub), because we're not in a hurry. Of course, having the tex file would make it easier and faster.
from rgf.
I admit our understanding of GPL3 was insufficient and the past state was undesirable.
But is it really meaningful to start marking files now?
What is important is to comply with MIT.
And don't forget, for a while, rgf_python is under GPL3, so this is not limited to C++ code.
from rgf.
Regarding the C++ files, what I'm suggesting is to add something like:
Change history
01/29/2018 (Fukatani): "feature importance" was added.
03/02/2018 (Fukatani): "dump file" was added.
... because people who added values to source code should get credit (and take responsibility) for it. For example, if you look at AzTETmain.cpp, it looks as if I wrote it entirely in 2011-2012, and that's not true -- you added new functions recently. It seems to me that there are several files that definitely deserve "change history" (e.g., where you added a class variable "gain"). I'm not sure about stylistic changes, but why not if it's not too much trouble.
As for GPL, here is what is said at https://www.gnu.org/licenses/gpl-3.0.en.html .
"For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions."
And I think this is a good practice regardless of the type of license.
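A minimal sketch of how such a "Change history" block could be added to a C++ file automatically. The `add_change_history` helper is hypothetical, and the choice to place the block right after the leading `/* ... */` comment is an assumption of this example, not something the thread prescribes.

```python
from typing import List, Tuple


def add_change_history(source: str, entries: List[Tuple[str, str, str]]) -> str:
    """Insert a "Change history" comment after the file's leading /* ... */ block.

    `entries` are (date, author, note) tuples. If the file does not start
    with a block comment, the history is placed at the very top instead.
    """
    lines = [" * Change history"]
    lines += [f" *   {date} ({author}): {note}" for date, author, note in entries]
    block = "/*\n" + "\n".join(lines) + "\n */\n"

    if source.lstrip().startswith("/*"):
        start = source.find("/*")
        end = source.find("*/", start + 2)
        if end != -1:
            cut = end + 2
            # Keep the newline that terminates the original header comment.
            if source[cut:cut + 1] == "\n":
                cut += 1
            return source[:cut] + block + source[cut:]
    return block + source


if __name__ == "__main__":
    # Stand-in header; the entry text is the example given in the comment above.
    cpp = "/*\n * Copyright (C) 2011, 2012 Rie Johnson\n */\n#include <cstdio>\n"
    print(add_change_history(
        cpp, [("01/29/2018", "Fukatani", '"feature importance" was added.')]))
```

The existing copyright comment is left untouched, so the history travels with the file even when it is downloaded outside of git.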
from rgf.
My opinion is that changes before 03/08/2018 (the moment when GPLv3 -> MIT) should be documented as Rie suggests.
And don't forget, for a while, rgf_python is under GPL3, so this is not limited to C++ code.
As for rgf_python, which was under the GPLv3 license too, we can omit tracking changes by mutual agreement (only @fukatani and I committed), because there was no information about the license inside the source files at all.
from rgf.
because people who added values to source code should get credit (and take responsibility) for it
I agree with this principle.
Since we do version management with git, I think it is pointless to manually maintain similar information in the source code.
Like LightGBM and many other OSS projects, wouldn't it be better to remove the copyright notices from the source code to make maintenance easier?
from rgf.
You should not remove copyright notices that are already in the source code unless you are the sole copyright holder. As a reminder, I'm a copyright holder of the older C++ rgf code.
Actually, not everyone lives in the GitHub world, and once people download files from GitHub, they are just a bunch of files separated from the GitHub meta data like who committed what and when.
Having said that, if you don't want to add to the derivatives of my C++ code "Change history" for your enhancements, I'm fine with it. It's not that important.
And of course, what to do with other code whose copyright belongs to you is up to you.
from rgf.
Thanks @riejohnson
I understood your intention. If that is the case, I will do my best for C++ change history.
from rgf.
I understood your intention. If that is the case, I will do my best for C++ change history.
Ah okay. In that case, I think you can just focus on the files where your creativity was added for new functions, ignoring stylistic changes (e.g., including a loop variable into the for statement or removing unused variables, etc.). Then, it wouldn't be too much trouble, and I suppose you're going to change some of those source files for issue#183 anyway. When you do that, can you also change the copyright notice in those files? For example:
old: * Copyright (C) 2011, 2012 Rie Johnson
new: * Copyright (C) 2011, 2012 Rie Johnson, 2018 RGF team
I tentatively used "RGF team", and that should be something you people decide; it should be consistent with what you say in the "COPYING" file (please also see my latest comments on COPYING in my review of the new README).
I totally agree with you @fukatani that this copyright thing is cumbersome, but since the notice is already in the source code, I think this is the easiest way to be consistent. Thank you for doing this.
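The old/new copyright lines above could be applied mechanically. This is a minimal sketch with a hypothetical `extend_copyright` helper; "2018 RGF team" is only the tentative wording from the comment and should match whatever the COPYING file ends up saying.

```python
def extend_copyright(source: str, addition: str = "2018 RGF team") -> str:
    """Append `addition` to every "Copyright (C)" line that lacks it.

    Line endings are preserved, and running the function twice does not
    append the text a second time.
    """
    updated = []
    for line in source.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        if "Copyright (C)" in body and addition not in body:
            body = f"{body.rstrip()}, {addition}"
        updated.append(body + ending)
    return "".join(updated)


if __name__ == "__main__":
    old = " * Copyright (C) 2011, 2012 Rie Johnson\n"
    print(extend_copyright(old), end="")
```

It should be applied only to the files that actually received new functionality, as suggested above, rather than to the whole source tree.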
from rgf.