Comments (34)

TongZhang-ML commented on May 22, 2024

from rgf.

riejohnson commented on May 22, 2024

Sounds good to me. Thanks.

fukatani commented on May 22, 2024

There is quite a workload. Please feel free to assign me any task.

fukatani commented on May 22, 2024

> Your examples use submodules for the core components too: LGB uses compute module for GPU version and XGB uses cub, dmlc-core, nccl, rabit.

Those are not libraries intended only for LGB or XGB; they are more general-purpose libraries.
And they didn't commit to nccl or the other libraries. Basically, who will maintain the code in the future matters more than who developed it in the past. Besides, Tong is in the RGF-team.

> Imagine the situation: someone wants to develop, let's say, a Java wrapper. What should he do?

XGB holds the CLI, Python, and R in one place. We can hold all language wrappers in the same place.
Of course, we need to rename this project to RGF (not -python) or another appropriate name. And we may also have to change the directory structure.

StrikerRUS commented on May 22, 2024

> XGB holds the CLI, Python, and R in one place. We can hold all language wrappers in the same place.
> Of course, we need to rename this project to RGF (not -python) or another appropriate name. And we may also have to change the directory structure.

Yeah, you're right! These are the two main reasons why I'm against keeping all code in one repo. Renaming the repo will break all links from old articles/notes/etc. GitHub won't redirect from the old URL like it does when ownership is transferred 🙁. I'm sure that reorganizing the repo will take much effort and won't be an easy process. But without the reorganization, the main accent stays on the Python wrapper, which is unacceptable if there are no separate repos for the C++ code. Also, we need Rie's and Tong's approval for doing this.

Could you please provide the exact reasons, apart from the absence of CI tests at this moment, why you think that separate repos are a bad idea?

P.S. At this moment the most likely variant is that we'll need to fork the fast_rgf repo, because Tong said, "I think the code officially belongs to Baidu, so it will be better to do a link instead of transfer. I'd try to look into this later on."

fukatani commented on May 22, 2024

> Renaming the repo will break all links from old articles/notes/etc.

We can resolve it like https://github.com/tensorflow/skflow.

> Also, we need Rie's and Tong's approval for doing this.

Basically, we can redistribute RGF under its license in whatever form. And we are actually doing it already.

> Could you please provide the exact reasons, apart from the absence of CI tests at this moment, why you think that separate repos are a bad idea?

I think our issue is purely a matter of time cost, nothing else.
If so, first of all, we should list the costs and compare them.
And we had better distinguish the initial cost from the running cost.
At the moment, I estimate that the time cost is lower if we integrate into one repository.

StrikerRUS commented on May 22, 2024

> We can resolve it like https://github.com/tensorflow/skflow.

OK, agree.

> Basically, we can redistribute RGF under its license in whatever form. And we are actually doing it already.

We should create conditions in which development is comfortable for all RGF-team members, shouldn't we? I mean, a person who alters the C++ code shouldn't have to care about anything in the Python code.

Also, don't forget that we actually have two separate projects: RGF and FastRGF. Maybe someone wants to use only one of them. So, we should give such users easy access to each project.

Please be more concrete when talking about time cost. I don't completely understand you.

Anyway, I think I have some thoughts about the united repo's structure... I'll be back with them soon.

fukatani commented on May 22, 2024

Sorry for the late response.
Recently, I have been concentrating on studying the RGF C++ code.

Basically, if we integrate the two projects, we change the directory names, create a new top-level readme, and move the current readme to python-package. I guess this is all, isn't it?

StrikerRUS commented on May 22, 2024

... bring the two folders rgf and fast_rgf to the top of the repo, correct the readmes in them, rework the python-package installation according to the new structure. Those are a few things off the top of my head for now.

StrikerRUS commented on May 22, 2024

Hi @fukatani! How are you doing? I see you're in the process of bringing RGF to LightGBM - cool! 👍

Sorry for being quiet for so long - I was very busy the last two months. Do you have some time to continue the discussion about the future repo structure?

StrikerRUS commented on May 22, 2024

@fukatani Assuming from your thumbs up in the neighboring issue that you don't mind, let me show you my plan 😄

Plan:

  • sync rgf and rgf_python repos (done in #178)
  • delete rgf repo
  • rename the repo from rgf_python to rgf
  • create rgf_python repo and place a redirecting stub there (optional, GitHub seems to be able to do this for us): done automatically
  • fix Appveyor, which is still built at the old URL
  • create a PR in the baidu repo with a note about the new address of the live FastRGF repo (done in baidu/fast_rgf#14)
  • change repo structure (done in #191, #192, #193)
  • rewrite ISSUE_TEMPLATE (done in #191)
  • update all docs according to the new repo structure (done in #191, #192, #193, #196)
  • update CI tests according to the new repo structure (done in #191, #193)
  • update setup.py according to the new repo structure (done in #192, #193)
  • update Kaggle kernel (not connected with the issue, but let it be here as a reminder 😄, done in Kaggle/docker-python#386)
  • new release (I think 3.3.0, rather than 3.2.0, is OK because there have been no releases for a long time, but on the other hand there are not enough changes for a major version bump) (done in 194806a)

StrikerRUS commented on May 22, 2024

@fukatani What do you think about the plan? Do we need to add something?

fukatani commented on May 22, 2024

Why is syncing the rgf and rgf_python repos needed?
Isn't simply deleting rgf good enough?

And before starting this plan, should we publish a new release?

StrikerRUS commented on May 22, 2024

> Why is syncing the rgf and rgf_python repos needed?

For example, rgf is under the MIT license now, while here it's still the GNU one.

> And before starting this plan, should we publish a new release?

I support your idea! Especially due to #172. Could you please do it? Just wait till I upload the latest binary files for linux32 (I'll ping you in an hour).

StrikerRUS commented on May 22, 2024

What about the following structure (XGBoost/LightGBM-like)?

rgf/
├── RGF/ (formerly rgf_python/include/rgf)
│   ├── README.md
│   ├── LICENSE
│   ├── src/
│   └── ...
├── FastRGF/ (formerly rgf_python/include/fast_rgf or baidu repo)
│   ├── README.md
│   ├── LICENSE
│   ├── src/
│   └── ...
├── python_package/ (formerly rgf_python repo)
│   ├── Readme.rst
│   ├── LICENSE
│   ├── setup.py
│   ├── rgf/
│   ├── examples/
│   ├── tests/
│   └── ...
├── R_package/
│   └── ...
├── other_package/
│   └── ...
├── .github
├── .gitignore
├── README.md
├── .appveyor.yml
├── .travis.yml
└── ...
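
To make the proposed layout easier to try out locally, here is a minimal Python sketch that scaffolds it as empty files and directories. It is only an illustration, not the script used for the actual restructuring: the `LAYOUT` list and the `scaffold` helper are hypothetical names, the `...` placeholders from the tree are skipped, and `.github` is assumed to be a directory.

```python
from pathlib import Path
import tempfile

# Entries named explicitly in the proposed tree; a trailing "/" marks a directory.
# Assumption: ".github" is a directory, and the "..." placeholders are skipped.
LAYOUT = [
    "RGF/README.md",
    "RGF/LICENSE",
    "RGF/src/",
    "FastRGF/README.md",
    "FastRGF/LICENSE",
    "FastRGF/src/",
    "python_package/Readme.rst",
    "python_package/LICENSE",
    "python_package/setup.py",
    "python_package/rgf/",
    "python_package/examples/",
    "python_package/tests/",
    "R_package/",
    "other_package/",
    ".github/",
    ".gitignore",
    "README.md",
    ".appveyor.yml",
    ".travis.yml",
]


def scaffold(root, layout=LAYOUT):
    """Create empty files and directories under `root`; return the created paths."""
    root = Path(root)
    created = []
    for entry in layout:
        path = root / entry
        if entry.endswith("/"):
            path.mkdir(parents=True, exist_ok=True)
        else:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch(exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # Build the skeleton in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        for p in scaffold(Path(tmp) / "rgf"):
            print(p.relative_to(tmp))
```

Running it against a temporary directory keeps the experiment away from a real checkout.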

fukatani commented on May 22, 2024

I uploaded the newest RGF to PyPI just now.

I agree with your plan and directory structure. Good!

StrikerRUS commented on May 22, 2024

> I uploaded the newest RGF to PyPI just now.

Thanks!

> I agree with your plan and directory structure. Good!

Nice to hear! So now let's bring our other teammates here.

StrikerRUS commented on May 22, 2024

Hello @TongZhang-ML and @riejohnson!

@fukatani and I want to consolidate all RGF projects under one license (already done) and one roof. It'll help users find the different implementations and language-specific packages more easily.

The plan is here and the future GitHub repo structure is here.

The most complicated step in the plan concerns the fast_rgf repo owned by Baidu. We've already discussed this with @TongZhang-ML via email but without any practical result. I think that providing a link with text like "The active development of FastRGF is maintained now here [link]" in Baidu's repo will not hurt any rights.

We need your approval.

TongZhang-ML commented on May 22, 2024

StrikerRUS commented on May 22, 2024

@TongZhang-ML Yes. Somewhere around here in Baidu's repo:

[image: fastrgf readme]

StrikerRUS commented on May 22, 2024

@TongZhang-ML @riejohnson Thank you!

BTW, is there any place where the editable source file of the rgf 1.2 User Guide could have been saved?

riejohnson commented on May 22, 2024

> BTW, is there any place where the editable source file of the rgf 1.2 User Guide could have been saved?

It's a tex file with private comments etc. (meaning not meant for going public). Does this mean you'd like to change it?

StrikerRUS commented on May 22, 2024

@riejohnson It's a pity. I want to replace the pdf file in the repo with an easily editable one. For instance, right now I'd like to change the version and paths, and add information about the new feature_importance and dump_model functionality introduced in #161. Generally, it's better to have a file on GitHub that everyone can edit, I suppose...

riejohnson commented on May 22, 2024

> I want to replace the pdf file in the repo with an easily editable one.

If you'd like to rewrite the documentation in a suitable format, I can give you the original tex file (privately), assuming that having it is better than starting from scratch. But if that's too much work, how about making new documentation only for the new functions?

riejohnson commented on May 22, 2024

Speaking of enhancements, shouldn't the C++ source files (of rgf 1.2) that you people @StrikerRUS @fukatani have changed to add new functions be marked as modified (by whom and when) at the beginning of the files? At least, that's what the GNU GPL says, though I know it's been switched to MIT.

StrikerRUS commented on May 22, 2024

Yeah, you're right about tracking the changes!

@fukatani Could you please mark all the RGF files you've modified (I can help you detect them, if you wish), as @riejohnson said?

StrikerRUS commented on May 22, 2024

@riejohnson If you don't mind, I can rewrite the whole pdf file in md or, maybe better, rst format (both can be rendered by GitHub), because we're not in a hurry. Of course, having the tex file will make it easier and faster 😄.

fukatani commented on May 22, 2024

I admit our understanding of GPL3 was insufficient and the past state was undesirable.

But is it really meaningful to start marking now?
What is important is to comply with MIT.

And don't forget, for a while, rgf_python was under GPL3, so this is not limited to C++ code.

riejohnson commented on May 22, 2024

Regarding the C++ files, what I'm suggesting is to add something like:

Change history
01/29/2018 (Fukatani): "feature importance" was added.
03/02/2018 (Fukatani): "dump file" was added.

... because people who added values to source code should get credit (and take responsibility) for it. For example, if you look at AzTETmain.cpp, it looks as if I wrote it entirely in 2011-2012, and that's not true -- you added new functions recently. It seems to me that there are several files that definitely deserve "change history" (e.g., where you added a class variable "gain"). I'm not sure about stylistic changes, but why not if it's not too much trouble.

As for GPL, here is what is said at https://www.gnu.org/licenses/gpl-3.0.en.html .
"For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions."

And I think this is a good practice regardless of the type of license.
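
As an illustration of the "Change history" idea above, a header in that format could be generated rather than typed by hand. The following is a minimal Python sketch, not something the RGF sources are known to use: `ChangeEntry` and `render_change_history` are hypothetical names, and wrapping the entries in a C-style block comment is an assumption about where the header would live.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ChangeEntry:
    """One line of the change history: when, by whom, and what changed."""
    when: date
    author: str
    note: str


def render_change_history(entries):
    """Return a C-style comment block listing the entries, oldest first."""
    lines = ["/*", " * Change history"]
    for entry in sorted(entries, key=lambda e: e.when):
        # MM/DD/YYYY matches the date style used in the example above.
        lines.append(f" * {entry.when:%m/%d/%Y} ({entry.author}): {entry.note}")
    lines.append(" */")
    return "\n".join(lines)


if __name__ == "__main__":
    history = [
        ChangeEntry(date(2018, 3, 2), "Fukatani", '"dump file" was added.'),
        ChangeEntry(date(2018, 1, 29), "Fukatani", '"feature importance" was added.'),
    ]
    print(render_change_history(history))
```

The entries are sorted by date so the block reads chronologically regardless of the order in which they are supplied.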

StrikerRUS commented on May 22, 2024

My opinion is that changes made before 03/08/2018 (the moment of the GPLv3 -> MIT switch) should be documented as Rie suggests.

> And don't forget, for a while, rgf_python was under GPL3, so this is not limited to C++ code.

As for rgf_python, which was also under the GPLv3 license at that time, we can omit tracking changes by mutual agreement (only @fukatani and I committed), because there was no information about the license inside the source files at all.

fukatani commented on May 22, 2024

> because people who added values to source code should get credit (and take responsibility) for it

I agree with this principle.

Since we do version management with git, I think it is meaningless to manually maintain similar information in the source code.
Like LightGBM and many other OSS projects, wouldn't it be better to remove the copyright notation from the source code to make maintenance easier?

riejohnson commented on May 22, 2024

You should not remove copyright notices that are already in the source code unless you are the sole copyright holder. As a reminder, I'm a copyright holder of the older C++ rgf code.

Actually, not everyone lives in the GitHub world, and once people download files from GitHub, they are just a bunch of files separated from the GitHub meta data like who committed what and when.

Having said that, if you don't want to add to the derivatives of my C++ code "Change history" for your enhancements, I'm fine with it. It's not that important.

And of course, what to do with other code whose copyright belongs to you is up to you.

fukatani commented on May 22, 2024

Thanks @riejohnson
I understood your intention. If that is the case, I will do my best for C++ change history.

riejohnson commented on May 22, 2024

> I understood your intention. If that is the case, I will do my best for C++ change history.

Ah okay. In that case, I think you can just focus on the files where your creativity was added for new functions, ignoring stylistic changes (e.g., including a loop variable into the for statement or removing unused variables, etc.). Then, it wouldn't be too much trouble, and I suppose you're going to change some of those source files for issue#183 anyway. When you do that, can you also change the copyright notice in those files? For example:

old: * Copyright (C) 2011, 2012 Rie Johnson
new: * Copyright (C) 2011, 2012 Rie Johnson, 2018 RGF team

I tentatively used "RGF team", and that should be something you people decide; it should be consistent with what you say in the "COPYING" file (please also see my latest comments on COPYING in my review of the new README).

I totally agree with you @fukatani that this copyright thing is cumbersome, but since the notice is already in the source code, I think this is the easiest way to be consistent. Thank you for doing this.
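
The old/new copyright notice change above can be applied mechanically. Below is a minimal Python sketch under stated assumptions: `extend_copyright` is a hypothetical helper, it only touches comment lines that begin with "Copyright (C)" after any leading spaces and asterisks, and "RGF team" is the tentative holder name from the comment, not a settled one.

```python
def extend_copyright(source, year, holder):
    """Append ", <year> <holder>" to each copyright notice line in `source`.

    Assumption: a notice is a line that starts with "Copyright (C)" once
    leading spaces, tabs, and "*" comment markers are stripped.
    """
    suffix = f", {year} {holder}"
    out = []
    for line in source.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        eol = line[len(body):]  # preserve the original line ending
        is_notice = body.lstrip(" *\t").startswith("Copyright (C)")
        # Skip lines that already carry the suffix, so reruns change nothing.
        if is_notice and not body.rstrip().endswith(suffix):
            body = body.rstrip() + suffix
        out.append(body + eol)
    return "".join(out)


if __name__ == "__main__":
    old = (
        "/* * * * *\n"
        " * Copyright (C) 2011, 2012 Rie Johnson\n"
        " * * * * */\n"
        "int main() { return 0; }\n"
    )
    print(extend_copyright(old, 2018, "RGF team"))
```

Because already-updated lines are skipped, running the helper twice over the same file leaves it unchanged.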
