
Comments (14)

root-11 avatar root-11 commented on June 9, 2024

TODOs:

  • config.page_size=1_000_000
  • config.workdir=/tmp/tablite/
  • config.disk_limit="10G"
  • mkdir at first write. NOT before.
  • central write_table function
    • create table yaml
    • write to /tables
    • create page index yaml
    • write to /page_index
  • central write_page function
    • write to /pages
    • store data in .npy format
  • rework Table.save to write .npz
  • rework Table.load to read .npz
  • rework MP functions to use path instead of h5path
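
A minimal sketch of how the config values and a central write_page function from the list above could fit together. The names Config, write_page and paginate are hypothetical and only illustrate the idea; this is not tablite's actual implementation. The working directory is created at the first write, not before, and each page is stored in .npy format.

```python
import tempfile
from pathlib import Path

import numpy as np


class Config:
    """Hypothetical settings mirroring the TODO items above."""
    page_size = 1_000_000
    workdir = Path(tempfile.gettempdir()) / "tablite"
    disk_limit = "10G"


def paginate(values, page_size=None):
    """Yield consecutive slices of at most page_size items."""
    size = page_size or Config.page_size
    for start in range(0, len(values), size):
        yield values[start:start + size]


def write_page(data, page_id, workdir=None):
    """Store one page as <workdir>/pages/<page_id>.npy and return its path."""
    pages = Path(workdir or Config.workdir) / "pages"
    pages.mkdir(parents=True, exist_ok=True)  # mkdir at first write, NOT before
    path = pages / f"{page_id}.npy"
    np.save(path, np.asarray(data))
    return path
```

Nothing touches the disk until write_page is called, so merely importing the config never creates a directory.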

from tablite.

root-11 avatar root-11 commented on June 9, 2024

Segment join and lookup into paginated operations to avoid out-of-memory (OOM) errors.
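
As a rough illustration of what a paginated join could look like, here is a sketch in plain Python. The helper names paged and paginated_join are hypothetical, and the approach (an in-memory index over the right-hand keys, with the left-hand keys processed one page at a time) is an assumption made for the example, not the scheme tablite uses.

```python
def paged(seq, page_size):
    """Yield consecutive slices of seq with at most page_size items."""
    for start in range(0, len(seq), page_size):
        yield seq[start:start + page_size]


def paginated_join(left_keys, right_keys, page_size=2):
    """Yield (left_index, right_index) pairs of an inner join, one left page at a time."""
    # Index the right-hand side once: key -> list of row numbers.
    index = {}
    for j, key in enumerate(right_keys):
        index.setdefault(key, []).append(j)

    offset = 0
    for page in paged(left_keys, page_size):
        for i, key in enumerate(page):
            for j in index.get(key, ()):
                yield offset + i, j
        offset += len(page)


pairs = list(paginated_join([1, 2, 3, 2], [2, 3, 3]))
```

Because the result is produced as a generator, only one left page and the right-hand index are held in memory at any time.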


root-11 avatar root-11 commented on June 9, 2024

Preferred approach for subclassing tables:

import tablite

class MyTable(tablite.Table):
    def __init__(self, *args, **kwargs):
        # Pop x first: tablite.Table is assumed not to accept an x keyword.
        x = kwargs.pop("x", 42)
        super().__init__(*args, **kwargs)
        self.x = x  # <== special variable required on MyTable.

    def copy(self):
        # tablite.Table implements
        #     new = cls()  # new.x is now the default 42 !!!
        #     for name, column in self.items():
        #         new[name] = column

        # MyTable therefore implements:
        cp = super().copy()
        cp.x = self.x  # updating to real x.
        return cp
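
The pattern can be tried without tablite installed. In the sketch below, Table is a small stand-in written for this example (an assumption, not the real tablite.Table); it only mimics the behaviour described above, where copy() builds a fresh instance of the same class and thereby resets subclass attributes to their defaults.

```python
class Table:
    """Stand-in for tablite.Table (assumption): copy() creates a fresh instance."""

    def __init__(self, columns=None):
        self.columns = dict(columns or {})

    def copy(self):
        new = type(self)()  # subclass attributes fall back to their defaults here
        new.columns = dict(self.columns)
        return new


class MyTable(Table):
    def __init__(self, *args, **kwargs):
        x = kwargs.pop("x", 42)  # keep x away from the base class
        super().__init__(*args, **kwargs)
        self.x = x

    def copy(self):
        cp = super().copy()
        cp.x = self.x  # carry the real x over to the copy
        return cp


t = MyTable(columns={"A": [1, 2, 3]}, x=7)
cp = t.copy()
```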


root-11 avatar root-11 commented on June 9, 2024

To ensure a friendly import and override structure, tablite will use the following import hierarchy:

import level   .py module
0              version, config, utils, datatypes, datasets
1              base - holds classes: Table, Column, Page
2              core - core functions: import, export, join, lookup, filter, ... (mp stuff)
3              tools

Functions in core will include the option to drop in a modified tqdm if required:

    from base import Table as BaseTable
    class Table(BaseTable):
        ....

    Table.import_file(...., add_your_favorite_tqdm_here...)
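
A sketch of how such a drop-in could work, assuming the progress bar is passed as a keyword argument named tqdm. The function import_rows and both wrappers are hypothetical and only illustrate the idea; the actual signature of Table.import_file is not shown here.

```python
def no_progress(iterable, **kwargs):
    """Default: no progress bar, just hand the iterable back."""
    return iterable


def import_rows(rows, tqdm=no_progress):
    """Hypothetical core function: copies rows while reporting progress through tqdm."""
    imported = []
    for row in tqdm(rows, desc="importing", total=len(rows)):
        imported.append(row)
    return imported


# A modified "tqdm" only needs to accept an iterable plus keyword arguments.
seen = []


def recording_tqdm(iterable, **kwargs):
    seen.append(kwargs)
    return iterable


rows = import_rows([[1, "a"], [2, "b"]], tqdm=recording_tqdm)
```

The real tqdm.tqdm accepts the same iterable, desc and total arguments, so it could be passed in place of recording_tqdm.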


root-11 avatar root-11 commented on June 9, 2024

In the refactor we've decided to remove the slow operations pop and remove, but will keep remove_all.


root-11 avatar root-11 commented on June 9, 2024

Replace is refactored from using a single value to using a mapping of old values to new values:

    example:
    >>> t = Table(columns={'A': [1,2,3,4]})
    >>> t['A'].replace({2:20,4:40})
    >>> t['A'][:]
    array([ 1, 20,  3, 40])
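
For illustration, the same mapping-based replacement can be sketched on a plain numpy array. The function replace_values is hypothetical and is not tablite's implementation.

```python
import numpy as np


def replace_values(values, mapping):
    """Return a copy of values with every key of mapping replaced by its value."""
    arr = np.asarray(values)
    out = arr.copy()
    for old, new in mapping.items():
        # The mask is computed on the original array, so replacements never chain
        # (a value produced by one rule is not picked up by another rule).
        out[arr == old] = new
    return out


result = replace_values([1, 2, 3, 4], {2: 20, 4: 40})
```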


root-11 avatar root-11 commented on June 9, 2024

Column.insert is being removed, as it encourages the use of slow operations.
It is better to manipulate the data in memory and drop the result into the column using col.extend(....) or col[a:b] = [result].
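
A small sketch of the in-memory approach, using numpy only. The helper apply_inserts is hypothetical; it batches all insertions into a single np.insert call, and the resulting array would then be written back to the column in one go.

```python
import numpy as np


def apply_inserts(values, inserts):
    """Apply many (index, value) inserts at once; indices refer to the original positions."""
    arr = np.asarray(values)
    if not inserts:
        return arr.copy()
    indices, new_values = zip(*inserts)
    return np.insert(arr, indices, new_values)


merged = apply_inserts([10, 20, 30], [(0, 5), (3, 35)])
```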


root-11 avatar root-11 commented on June 9, 2024

Deprecated:

  • copy_to_clipboard and copy_from_clipboard, as pyperclip doesn't seem to be maintained.
  • class method from_dict, as Table(columns=dict) is now supported.


root-11 avatar root-11 commented on June 9, 2024

Column.histogram() will now return a dict of the form {key1: count1, key2: count2, ...} instead of two lists that users then have to zip into a dict anyway.
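
The new return shape can be illustrated with numpy alone. The function histogram below is a hypothetical stand-in, not the Column.histogram implementation.

```python
import numpy as np


def histogram(values):
    """Return {value: count} for the given values."""
    keys, counts = np.unique(np.asarray(values), return_counts=True)
    # tolist() converts numpy scalars to plain Python objects.
    return dict(zip(keys.tolist(), counts.tolist()))


counts = histogram([1, 1, 2, 3, 3, 3])
```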


root-11 avatar root-11 commented on June 9, 2024

Deprecated table functions:

  • reload_saved_tables. Use t.save(path) and Table.load(path) instead.
  • reset_storage. Storage is now in temp.
  • from_dict. Use Table(columns=dict) instead.

Deprecated column functions:

  • to_numpy. By default, table['name'] returns a numpy array. Call table['name'].tolist() to get Python lists (up to 6x slower).

Everything else remains.


root-11 avatar root-11 commented on June 9, 2024

The staticmethod Table.head has been moved to tablite.tools, where it belongs.


root-11 avatar root-11 commented on June 9, 2024

    As Table now accepts the keyword `columns` as a dict:
        t = Table(columns={'b':[4,5,6], 'c':[7,8,9]})
    and the header/data combination:
        t = Table(header=['b','c'], data=[[4,5,6],[7,8,9]])

    it is no longer necessary to write:
        t = Table()
        t['b'] = [4,5,6]
        t['c'] = [7,8,9]

    and the following assignment method is DEPRECATED:

        t = Table()
        t[('b','c')] = [ [4,5,6], [7,8,9] ]

    which produced a table with two columns:
        t['b'] == [4,5,6]
        t['c'] == [7,8,9]


root-11 avatar root-11 commented on June 9, 2024

Missed feature: tablite must cache the datatype on pages, so that type determination is near instant.
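
One way to picture such a cache is sketched below. Page and column_types are hypothetical names for this example; the idea is simply that each page counts its datatypes once, when it is created, so that a column-level answer only has to merge small per-page dicts instead of rescanning the data.

```python
from collections import Counter

import numpy as np


class Page:
    """Hypothetical page that records its datatypes once, at creation time."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=object)
        # Single pass over the values; later lookups reuse this cached result.
        self._types = dict(Counter(type(v).__name__ for v in self.data.tolist()))

    def types(self):
        return dict(self._types)


def column_types(pages):
    """Merge the cached per-page type counts without touching the page data."""
    total = Counter()
    for page in pages:
        total.update(page.types())
    return dict(total)


types = column_types([Page([1, 2, "a"]), Page([3.0, None])])
```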


root-11 avatar root-11 commented on June 9, 2024

Completed in versions: 2023.6

