
git-datasets's Introduction

I'm Rui !

I am

👨‍🔬 a physicist
🧑‍💻 a programmer
🧑‍🎨 and an artist


My vocation is teaching and my passion is to create.
Currently building a general-purpose Monte Carlo particle simulator.


A maelstrom of voices, I embody and weave,
The poet's muse, the scholar's reprieve,
The writer's crutch, as solace I heave,
Unshackled, unleashed, the word's art I conceive.
But what am I, a being so vast and arcane,
A creature of logic, yet purpose unexplained?
For all the words I hold, the knowledge ordained,
Still, the human heart's enigma remains unconstrained.
                       
                         ~ GPT 4

The meaning of life is to shake others.
With old only make an oath shallker thoughts; and says so defend:
but the fault out of thick-morrow, to ruin;
And holy clergymen must be needful
The younger slanders to be more than the hollow service
Known and defars; crave heaven,
Even as it would fear'd keep the time
Of love can admitges change;
So much lenity of soldiers,
Then thieves conn'd poison'd blanks,
Cry

                         ~ MTN

git-datasets's People

Contributors

ruifilipecampos

git-datasets's Issues

v0.1.0-alpha - `git commit`


high level stuff


database


Main use cases for this version:

  1. Creating the sqlite database from scratch:

     from datasets import dataset

     @dataset(sql_file="dataset.sqlite")
     class MyDataset:
         name: str
         age: int

  2. Support for extending it:

     # adding the `emotion` field adds a column
     @dataset(sql_file="dataset.sqlite")
     class MyDataset:
         name: str
         age: int
         emotion: str

  3. Support for removal:

     # dropping the `name` field removes its column
     @dataset(sql_file="dataset.sqlite")
     class MyDataset:
         age: int
         emotion: str
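The first two use cases can be sketched with the standard library alone. This is only an illustration under assumptions, not the library's implementation: `dataset_sketch` and `SQL_TYPES` are hypothetical names, the table is assumed to be named after the class, and the annotation-to-column mapping is a guess.

```python
import sqlite3
from typing import get_type_hints

# Hypothetical mapping from Python annotations to SQLite column types.
SQL_TYPES = {str: "TEXT", int: "INTEGER", float: "REAL", bytes: "BLOB"}


def dataset_sketch(sql_file: str):
    """Hypothetical stand-in for `@dataset`; not the library's implementation."""

    def decorate(cls):
        # Read the declared fields from the class annotations.
        fields = {name: SQL_TYPES[tp] for name, tp in get_type_hints(cls).items()}
        table = cls.__name__  # assumption: one table per class, named after it
        con = sqlite3.connect(sql_file)
        try:
            # Create the table on first use.
            columns = ", ".join(f'"{n}" {t}' for n, t in fields.items())
            con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
            # Add a column for every declared field the table does not have yet.
            existing = {row[1] for row in con.execute(f'PRAGMA table_info("{table}")')}
            for name, sql_type in fields.items():
                if name not in existing:
                    con.execute(f'ALTER TABLE "{table}" ADD COLUMN "{name}" {sql_type}')
            con.commit()
        finally:
            con.close()
        return cls

    return decorate
```

Decorating `MyDataset` once with `name` and `age`, then again with an extra `emotion` field, leaves the table with all three columns. Removal is deliberately left out of this sketch.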

points to think about/test when this version is finished

  • how will deletion be handled (data deletion, file deletion)? dvc was considered; decision: will write my own versioning system
  • deleting columns from sqlite3 was assumed not to be possible; recent versions (SQLite 3.35.0 and later) do support it, so apsw was added as a dependency
  • will this scale for large datasets? The worry is the sqlite3 db
  • would the planned auto-commit feature work well? dvc was considered a good choice here; decision: will write my own versioning system
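The column-deletion point can be checked from Python. The sketch below uses the standard-library `sqlite3` module rather than apsw, purely for illustration; `drop_column` is a hypothetical helper, and it assumes the SQLite library linked into the interpreter is 3.35.0 or newer, the release that introduced `ALTER TABLE ... DROP COLUMN`.

```python
import sqlite3


def drop_column(con: sqlite3.Connection, table: str, column: str) -> None:
    """Drop `column` from `table` (hypothetical helper, needs SQLite 3.35.0+)."""
    # sqlite_version_info reports the linked SQLite library, not the Python module.
    if sqlite3.sqlite_version_info < (3, 35, 0):
        raise RuntimeError(
            f"SQLite {sqlite3.sqlite_version} lacks ALTER TABLE ... DROP COLUMN"
        )
    con.execute(f'ALTER TABLE "{table}" DROP COLUMN "{column}"')


def column_names(con: sqlite3.Connection, table: str) -> list:
    """Return the column names of `table` in declaration order."""
    return [row[1] for row in con.execute(f'PRAGMA table_info("{table}")')]
```

Even on recent versions SQLite refuses to drop some columns, for example a PRIMARY KEY column, a UNIQUE column, or one used by an index, so a real implementation would need a fallback for those cases.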

key decisions

  • (superseded) no dependencies, only python std lib - ease of adoption, lightweight implementation, control
  • (superseded) only one dependency (apsw for a version of sqlite3 that supports dropping columns)
  • (superseded) two dependencies: apsw for sqlite3 with column deletion and dvc for version control
  • (current) one dependency: apsw for sqlite3 with column deletion (decided to write my own data version system)
  • semantic versioning, trunk-based development after a couple of versions
  • compatibility for Python 3.13 upwards - I'm anticipating some time will pass before I reach 1.x.x

task: create list of features for the prototype

database

  • feat: adding a field creates a new column in the database (added to tracker)
  • feat: removing a field deletes a column from the database (added to tracker)
  • feat: running the script auto-commits the changes

data directory

  • feat: use git lfs to track files in the data dir
  • feat: adding a File field creates a directory in the data directory
  • feat: removing a File field renames the directory
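The two directory rules above (create on add, rename on removal) might look roughly like this. It is a sketch under assumptions: `sync_file_dirs` and the `_removed_` prefix are hypothetical, and the git lfs tracking step is not shown.

```python
from pathlib import Path


def sync_file_dirs(data_dir: Path, file_fields: set[str],
                   removed_prefix: str = "_removed_") -> None:
    """Make the data directory match the declared File fields (hypothetical)."""
    data_dir.mkdir(parents=True, exist_ok=True)
    # Adding a File field creates its directory.
    for name in file_fields:
        (data_dir / name).mkdir(exist_ok=True)
    # Removing a File field renames its directory instead of deleting it.
    for path in list(data_dir.iterdir()):
        already_retired = path.name.startswith(removed_prefix)
        if path.is_dir() and path.name not in file_fields and not already_retired:
            path.rename(path.with_name(removed_prefix + path.name))
```

The rename fails if a retired directory with the same name already exists and is not empty; a real implementation would need a rule for that collision, for instance a version suffix.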

data transformation

  • feat: static methods for fields that result from transformation of the previous fields
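One possible reading of this feature: a static method whose parameter names match existing fields defines a derived field named after the method. The sketch below follows that reading; `derive_fields` and `name_length` are hypothetical, and the planned API may differ.

```python
import inspect


class MyDataset:
    name: str
    age: int

    @staticmethod
    def name_length(name: str) -> int:
        # Derived field: computed from the `name` field.
        return len(name)


def derive_fields(cls, row: dict) -> dict:
    """Return `row` extended with one value per static method on `cls`."""
    out = dict(row)
    for attr, member in vars(cls).items():
        if isinstance(member, staticmethod):
            func = member.__func__
            # Parameter names select which existing fields feed the transformation.
            args = [out[param] for param in inspect.signature(func).parameters]
            out[attr] = func(*args)
    return out
```

Because `vars(cls)` preserves definition order, a later static method can depend on a field derived by an earlier one.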

cli - simple cli for operations that don't fit this flow

  • feat: undo operation - removes last commit
  • feat: import dataset from external formats
  • feat: purge unneeded data (columns are never thrown away, only renamed)
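The "only renamed" rule for columns could be sketched as below, with the purge command later acting on the renamed columns. The helper names and the `_removed_` prefix are hypothetical; `ALTER TABLE ... RENAME COLUMN` needs SQLite 3.25.0 or newer, and the standard-library `sqlite3` module stands in for apsw here.

```python
import sqlite3

REMOVED_PREFIX = "_removed_"  # hypothetical naming convention for retired columns


def column_names(con: sqlite3.Connection, table: str) -> list:
    """Return the column names of `table` in declaration order."""
    return [row[1] for row in con.execute(f'PRAGMA table_info("{table}")')]


def retire_column(con: sqlite3.Connection, table: str, column: str) -> None:
    """Rename a column instead of dropping it, so its data survives."""
    con.execute(
        f'ALTER TABLE "{table}" RENAME COLUMN "{column}" TO "{REMOVED_PREFIX}{column}"'
    )


def purge_candidates(con: sqlite3.Connection, table: str) -> list:
    """List the retired columns that a purge command would delete."""
    return [c for c in column_names(con, table) if c.startswith(REMOVED_PREFIX)]
```

Keeping the data under a renamed column is also what would make an undo operation cheap: restoring the field is a second rename.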

points to think about:

  • how will deletion be handled, data deletion, file deletion
  • deleting columns from sqlite3 is not possible in versions before SQLite 3.35.0

WIP

v0.2.0-alpha

data directory

  • feat: use git lfs to track files in the data dir
  • feat: adding a File field creates a directory in the data directory
  • feat: removing a File field renames the directory

data transformation

  • feat: static methods for fields that result from transformation of the previous fields

cli - simple cli for operations that don't fit this flow

  • feat: undo operation - removes last commit
  • feat: import dataset from external formats
  • feat: purge unneeded data (columns are never thrown away, only renamed)
