Comments (13)

quinnj commented on June 16, 2024 3

I agree that it's a shame that after planning/studying things out, you've chosen to not collaborate w/ the rest of the data ecosystem and do something different entirely. I've voiced it elsewhere and I'll voice it again that it's the users who lose the most when multiple, almost identical packages exist that essentially provide the same functionality/purpose. I understand there are fundamental challenges in open-source w/ communication, collaboration, coordination, etc. but I've always loved the Julia community spirit of working together to improve things and work towards common goals. Obviously, there's always room for alternative approaches and experimentation.

In this case specifically, I agree w/ @nalimilan that the potential for confusion will be great among users. Why not create a new package named DataValueTables.jl? That seems ideally precise in that it's a table-like package based on DataValues.

from datatables.jl.

nalimilan commented on June 16, 2024 2

Yes, that could probably live in DataValues.jl, though it's obviously up to you. As for the future of DataTables, I think we'll have to discuss that with all involved parties.

quinnj commented on June 16, 2024 2

Here are a few of my thoughts/questions/concerns:

  • There is a lot of code in DataFrames/DataTables; I think if the aim was just to have a DataValue-compatible table type, then that is a very small amount of code that could just live in DataValues.jl and be exported, no need to rebrand this package.
  • The idea of trying to maintain the full functionality in DataFrames/DataTables and not have users completely confused if they try to switch between them sounds well beyond the amount of effort possible in open source (certainly for a single maintainer).
  • Just since we switched active development from DataTables back to DataFrames, there has been an incredible amount of API cleanup and performance work w/ joins, etc. None of that has been backported to DataTables.

It just makes me very uncomfortable that this package would be repurposed into something that would confuse users and probably not be able to keep up with the latest and greatest in DataFrames. I think a much more tenable approach would be to take the simple type definitions and basic indexing methods and put them in DataValues.jl, to help reinforce the idea that it's an extension of DataValues and its approach to missing values as opposed to a full-fledged DataFrames-like package.

davidanthoff commented on June 16, 2024 1

We want to do things beyond just storing DataValueVector in a DataFrame. The things we plan to do require DataValue specific code in lots of places where I think it wouldn't make any sense to have it for a table that is primarily Missing based, like DataFrame.

But the DataValue aspect is only one of many things we want to do differently. I think in such a case it is better to just start a new package, rather than me storming into DataFrames.jl and trying to change lots of things in a pretty fundamental way. In particular, I think that would be really harmful to DataFrames.jl: it is a great package, and just because we want to do things differently in queryverseland does not imply that it would be a good idea to even consider them for DataFrames.jl as well.

I think at this point it probably is not super helpful to discuss whether Queryverse should have a table type or not. We thought about it, planned it out pretty conclusively, and from my point of view we are committed to having one. I think it probably makes more sense to keep this discussion on the question of whether the DataTables.jl package name can be recycled or not.

davidanthoff commented on June 16, 2024

@nalimilan I also have a branch on CategoricalArrays that uses DataValue. I almost feel the best way to handle that would be to just move that stuff into DataValues.jl for the time being. The only other option would be to create another package, DataValueCategoricalArrays, but given that all of this might go away after Julia 1.0, that seems not so good? Not sure, though.

davidanthoff commented on June 16, 2024

> As for the future of DataTables, I think we'll have to discuss that with all involved parties.

Yep, totally agreed. Who should weigh in on that? Maybe we can ping them on this issue?

nalimilan commented on June 16, 2024

I'd say anybody interested enough in that area to have subscribed to the #data Slack channel?

davidanthoff commented on June 16, 2024

Bump.

Ideally (from my point of view) we would move this repo over to https://github.com/queryverse and I would take over maintenance of it.

Could we just move ahead with that, given that no one objected for almost a year? I believe I did mention this back then on the data slack channel. Or should I mention it there again and then we'll wait a week to see whether anyone has an opinion about it? I almost suspect that no one cares anymore, given the current state of this package ;)

davidanthoff commented on June 16, 2024

I'd like to revisit this. We plan to make a DataTable type a central piece for the Queryverse story going forward. We will use the name DataTable for the type, so the question now really is only in what package that type will exist.

I don't want to put it into DataValues.jl, for several reasons: 1) the DataValue aspect is not going to be the central point for this going forward. Yes, it will be based on DataValue, but that is only a small part of the plan. 2) There are other table types like IndexedTables.jl and JuliaDB.jl that use DataValue, and it seems quite wrong to sneak a full table type into their dependency graph. The more general point here is that a large number of packages depends on DataValues.jl, and most of them have absolutely no need for a table type, and so I want to keep a package that is low in the stack (like DataValues.jl) lean and not overload it with stuff that is only useful for some users.

In terms of user confusion, I don't see how putting a table type named DataTable into a package Someothername.jl is better than just putting it into DataTables.jl. The general pattern throughout the ecosystem seems to be that a) each table type has its own package, and b) the package name is the plural of the type name of the table type. I'm suggesting that we follow the same pattern here.

Re @quinnj's concerns about maintenance and feature scope: we are three core devs on Queryverse here in Berkeley now, and I expect that this will ramp up significantly going forward. I would also suggest that the track record of Queryverse be taken into account here: it has been around for a long time, it is expansive, and I think it is fair to say that it is pretty well maintained. We plan to handle the DataTable type in the same way, and so I expect it to be as well maintained as the rest of Queryverse.jl.

I think at the core, though, the question here is really: what is the value of keeping a good name like DataTables.jl dormant? Why not reuse it for something useful?

ararslan commented on June 16, 2024

Why can't you just use DataFrames?

davidanthoff commented on June 16, 2024

We want to do a fair bit of DataValue specific stuff, and my understanding is that DataFrames.jl doesn't want to take that dependency, nor adopt that approach.

ararslan commented on June 16, 2024

That used to be the case a long time ago, but DataFrames can now hold arbitrary columns. So it could be Vector{DataValue} or whatever.

nalimilan commented on June 16, 2024

I'd be interested in hearing what you would like to do differently and why. Let's keep that for another discussion, but I think it's worth having it as we already have too many table types given our limited workforce IMHO (I think developers of other packages would also agree).

But anyway. I'd really not like to see a new package reusing the DataTables name, based on yet another approach, different from what DataTables experimented with at the time it was forked. It's been painful enough to fight the confusion due to the coexistence of DataFrames and DataTables (we still hear complaints about that on forums). Having to explain that there was an "old DataTables" and now a new one, with different concepts and maintained by different people, doesn't sound great for clarity. Couldn't you find a more explicit name which reflects the strengths of the new package (like TypedTables or IndexedTables do)?
