HDF5 file size never decreases + concurrent interpreters can overwrite each other's files (tablite, closed)

root-11 commented on June 9, 2024

HDF5 file size never decreases + concurrent interpreters can overwrite each other's files.

root-11 commented on June 9, 2024

TODOs:

root-11 commented on June 9, 2024

Segment join and lookup into paginated operations to avoid out-of-memory errors.
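A minimal sketch of the paging idea follows. It assumes a hypothetical helper `paginated_lookup` and a right-hand index (a plain dict standing in for tablite's internals) that fits in memory; only one page of results exists at a time, so each page could be written to disk before the next is built. This is an illustration, not tablite's implementation.

```python
from typing import Dict, Hashable, Iterator, List, Optional, Sequence


def paginated_lookup(
    left_keys: Sequence[Hashable],
    right_index: Dict[Hashable, int],
    page_size: int = 1_000,
) -> Iterator[List[Optional[int]]]:
    """Yield lookup results for left_keys one page at a time.

    Hypothetical helper: right_index maps a key to its row number in the
    right-hand table and is assumed to fit in memory. Only one page of
    results exists at a time, so the caller can persist each page before
    the next one is built.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    for start in range(0, len(left_keys), page_size):
        page = left_keys[start:start + page_size]
        # None marks a key with no match on the right-hand side.
        yield [right_index.get(key) for key in page]


# Usage: three keys processed in pages of two.
pages = list(paginated_lookup(["a", "x", "b"], {"a": 0, "b": 1}, page_size=2))
```

Using a generator keeps peak memory proportional to `page_size` rather than to the length of the left table.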

root-11 commented on June 9, 2024

Preferred approach for subclassing tables:

import tablite


class MyTable(tablite.Table):
    def __init__(self, *args, x=42, **kwargs):
        # Take `x` out before calling the parent, so tablite.Table.__init__
        # never receives an unexpected keyword argument.
        super().__init__(*args, **kwargs)
        self.x = x  # <== special variable required on MyTable.

    def copy(self):
        # tablite.Table.copy implements roughly:
        #     new = cls()   # new.x is now the default 42 !!!
        #     for name, column in self.items():
        #         new[name] = column

        # MyTable therefore restores its own state after the parent copy:
        cp = super().copy()
        cp.x = self.x  # update to the real x.
        return cp
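To show the pitfall without depending on tablite, here is a self-contained stand-in. `BaseTable` is hypothetical: its `copy()` rebuilds the object through the class constructor, as the comment in the example above describes for tablite.Table, so the subclass has to restore `x` itself.

```python
class BaseTable:
    """Hypothetical stand-in for tablite.Table: copy() rebuilds via cls()."""

    def __init__(self, columns=None):
        self.columns = dict(columns or {})

    def copy(self):
        new = type(self)()  # subclass __init__ runs with its defaults
        for name, values in self.columns.items():
            new.columns[name] = list(values)
        return new


class MyTable(BaseTable):
    def __init__(self, columns=None, x=42):
        super().__init__(columns)
        self.x = x

    def copy(self):
        cp = super().copy()  # here cp.x is still the default 42
        cp.x = self.x        # restore the real value
        return cp


t = MyTable({"A": [1, 2]}, x=7)
cp = t.copy()
```

Without the override, `cp.x` would silently be 42 even though `t.x` is 7.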

root-11 commented on June 9, 2024

To ensure a friendly import and override structure, tablite will use the following module hierarchy:

import level    .py module(s)
0               version, config, utils, datatypes, datasets
1               base: holds the classes Table, Column, Page
2               core: core functions such as import, export, join, lookup, filter, ... (multiprocessing code)
3               tools

Functions in core will include the option to drop in a modified tqdm if required:

    from tablite.base import Table as BaseTable

    class Table(BaseTable):
        ...

    Table.import_file(...., add_your_favorite_tqdm_here...)
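A minimal sketch of the drop-in idea, assuming a hypothetical core function `import_rows` and a no-op default progress class `NoProgress` (neither is tablite's actual API). Any class offering tqdm's `total=`/`desc=` keywords, `update()` and context-manager support, such as `tqdm.tqdm`, could be passed in its place.

```python
from typing import Iterable, List


class NoProgress:
    """No-op stand-in for the small part of the tqdm interface used below."""

    def __init__(self, total=None, desc=None):
        self.total = total
        self.desc = desc
        self.n = 0

    def update(self, n=1):
        self.n += n

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False  # never swallow exceptions


def import_rows(rows: Iterable[list], tqdm=NoProgress) -> List[list]:
    """Hypothetical core function: the caller chooses the progress-bar class."""
    rows = list(rows)
    imported = []
    with tqdm(total=len(rows), desc="importing") as pbar:
        for row in rows:
            imported.append(row)  # real import work would happen here
            pbar.update(1)
    return imported


# Usage with the default no-op bar; with tqdm installed one could pass
# tqdm=tqdm.tqdm (or a modified subclass of it) instead.
result = import_rows([[1, "a"], [2, "b"]])
```

Injecting the class rather than importing tqdm inside core keeps the dependency optional and lets GUIs or notebooks supply their own progress display.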

root-11 commented on June 9, 2024

In the refactor we've decided to remove the slow operations pop and remove, but will keep remove_all.
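remove_all avoids the cost of repeated pop/remove calls by filtering in a single pass instead of shifting the remaining data once per removed item. A minimal sketch over a plain Python list, using a hypothetical `remove_all` function (not tablite's implementation; values are assumed hashable):

```python
from typing import Hashable, List, Sequence


def remove_all(values: Sequence[Hashable], *items: Hashable) -> List[Hashable]:
    """Return values with every occurrence of each item removed, in one pass."""
    drop = set(items)  # O(1) membership tests
    return [v for v in values if v not in drop]


cleaned = remove_all([1, 2, 3, 2, 4], 2, 4)
```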

root-11 commented on June 9, 2024

Column.replace is refactored from taking a single value to taking a mapping of {old: new} values:

    example:
    >>> t = Table(columns={'A': [1,2,3,4]})
    >>> t['A'].replace({2:20,4:40})
    >>> t['A'][:]
    array([ 1, 20,  3, 40])
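The mapping semantics can be sketched on a plain list. `replace_values` is a hypothetical name, not tablite's code. In this sketch each element is looked up exactly once, so mappings do not chain: with {1: 2, 2: 3}, a 1 becomes 2, not 3.

```python
from typing import Dict, Hashable, List, Sequence


def replace_values(values: Sequence[Hashable], mapping: Dict[Hashable, Hashable]) -> List[Hashable]:
    """Return a copy of values with each key of mapping swapped for its value."""
    # Every element is looked up exactly once, so replacements never chain.
    return [mapping.get(v, v) for v in values]


replaced = replace_values([1, 2, 3, 4], {2: 20, 4: 40})
```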

root-11 commented on June 9, 2024

Column.insert is being removed because it encourages the user to use slow operations.
It is better to perform the data manipulation in memory and write the result into the column using col.extend(....) or col[a:b] = result.
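The in-memory pattern can be sketched with a plain Python list standing in for a Column; `batch_insert` is a hypothetical helper. The affected slice is copied out, all inserts happen in RAM, and the result is written back with a single slice assignment.

```python
from typing import Sequence, Tuple


def batch_insert(column: list, a: int, b: int, inserts: Sequence[Tuple[int, object]]) -> None:
    """Insert values into column[a:b] with one slice assignment.

    `inserts` holds (position, value) pairs relative to the slice and is
    applied in order, so later positions see earlier inserts.
    """
    chunk = list(column[a:b])      # copy the affected region into memory
    for pos, value in inserts:
        chunk.insert(pos, value)   # cheap list insert in RAM
    column[a:b] = chunk            # single write back, i.e. col[a:b] = result


col = [1, 2, 3, 4, 5]
batch_insert(col, 1, 4, [(1, 99), (0, 98)])
```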

root-11 commented on June 9, 2024

Deprecated:

copy_to_clipboard and copy_from_clipboard, as pyperclip doesn't seem to be maintained.
The class method from_dict, as Table(columns=dict) is now supported.

root-11 commented on June 9, 2024

Column.histogram() will now return a dict {key1: count1, key2: count2, ...} instead of two lists that people then have to zip into a dict anyway.
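The difference between the two return shapes can be sketched with `collections.Counter`. The `histogram` function below is a hypothetical stand-in that only mirrors the described output shape, not tablite's implementation.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def histogram(values: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Return {value: count}, the shape described for the new Column.histogram()."""
    return dict(Counter(values))


hist = histogram(["a", "b", "a"])

# The old two-list form had to be zipped by the caller to get the same dict.
keys, counts = ["a", "b"], [2, 1]
old_style = dict(zip(keys, counts))
```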

root-11 commented on June 9, 2024

Deprecated table functions:

reload_saved_tables. Use t.save(path) and Table.load(path) instead.
reset_storage. Storage now lives in the temp directory.
from_dict. Use Table(columns=dict) instead.

Deprecated column functions:

to_numpy. By default table['name'] returns a numpy array. Call table['name'].tolist() to get Python lists (up to 6x slower).

Everything else remains.

root-11 commented on June 9, 2024

The static method Table.head has been moved to tablite.tools, where it belongs.

root-11 commented on June 9, 2024

    As Table now accepts the keyword `columns` as a dict:
        t = Table(columns={'b':[4,5,6], 'c':[7,8,9]})
    and the header/data combination:
        t = Table(header=['b','c'], data=[[4,5,6],[7,8,9]])

    it is no longer necessary to write:
        t = Table()
        t['b'] = [4,5,6]
        t['c'] = [7,8,9]

    and the following assignment method is DEPRECATED:

        t = Table()
        t[('b','c')] = [ [4,5,6], [7,8,9] ]

    which produced a table with two columns:
        t['b'] == [4,5,6]
        t['c'] == [7,8,9]

root-11 commented on June 9, 2024

Missed feature: tablite must cache the datatype on each page, so that type determination is near instant.
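A minimal sketch of per-page type caching, assuming a hypothetical immutable `Page` class and helper `column_types` (tablite's real Page differs): types are counted once when a page is created, so a column's type summary only merges small cached dicts instead of rescanning every value.

```python
from typing import Any, Dict, Iterable, Sequence


class Page:
    """Hypothetical page that records its datatypes once, when written."""

    def __init__(self, values: Sequence[Any]):
        self.values = list(values)
        # Count python types once at creation; the page is treated as immutable.
        counts: Dict[type, int] = {}
        for v in self.values:
            t = type(v)
            counts[t] = counts.get(t, 0) + 1
        self.dtypes = counts


def column_types(pages: Iterable[Page]) -> Dict[type, int]:
    """Combine cached per-page counts without touching any values."""
    total: Dict[type, int] = {}
    for page in pages:
        for t, n in page.dtypes.items():
            total[t] = total.get(t, 0) + n
    return total


pages = [Page([1, 2, "x"]), Page([None, 3.0])]
types = column_types(pages)
```

The cost of type determination then scales with the number of pages rather than the number of rows.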

root-11 commented on June 9, 2024

Completed in version 2023.6.
