Ammonoid is a repository server for dataset management. It is open source and will be developed in Python. As of July 2023, it is only an idea; we hope the first lines of code will be checked in soon.

Planned features:
- Generate hash of dataset versions.
- JSON/JSONL format dataset support.
- Git-style check-in/check-out command-line operations.
- Show diff between versions.
- Check in or check out a dataset in one of the raw/TFRecord/zip formats.
- Set operations on datasets: Union/Intersection/Subtraction/Complement.
- Select/Join/Slice.
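
Since no code exists yet, the following is only a rough sketch of how a few of the features above (version hashing, JSONL support, diff, and some of the set operations) might look in Python. Every function name here (`record_key`, `parse_jsonl`, `dataset_hash`, `diff`, `union`, `intersection`, `subtraction`) is hypothetical and not part of any Ammonoid API. The sketch assumes that a dataset is an unordered collection of JSON records and that SHA-256 is an acceptable hash; the real design may differ.

```python
from __future__ import annotations

import hashlib
import json


def record_key(record: dict) -> str:
    """Canonical JSON text of a record, used as its identity (assumption)."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def parse_jsonl(text: str) -> list[dict]:
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def dataset_hash(records: list[dict]) -> str:
    """Order-independent SHA-256 over the canonical form of every record."""
    digest = hashlib.sha256()
    for key in sorted(record_key(r) for r in records):
        digest.update(key.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def diff(old: list[dict], new: list[dict]) -> dict[str, list[str]]:
    """Canonical records added to and removed from `old` to obtain `new`."""
    old_keys = {record_key(r) for r in old}
    new_keys = {record_key(r) for r in new}
    return {"added": sorted(new_keys - old_keys), "removed": sorted(old_keys - new_keys)}


def _index(records: list[dict]) -> dict[str, dict]:
    """Map canonical key -> record, dropping exact duplicates."""
    return {record_key(r): r for r in records}


def union(a: list[dict], b: list[dict]) -> list[dict]:
    merged = _index(a)
    merged.update(_index(b))
    return list(merged.values())


def intersection(a: list[dict], b: list[dict]) -> list[dict]:
    b_keys = set(_index(b))
    return [r for k, r in _index(a).items() if k in b_keys]


def subtraction(a: list[dict], b: list[dict]) -> list[dict]:
    b_keys = set(_index(b))
    return [r for k, r in _index(a).items() if k not in b_keys]


if __name__ == "__main__":
    v1 = parse_jsonl('{"id": 1, "text": "a"}\n{"id": 2, "text": "b"}\n')
    v2 = parse_jsonl('{"text": "b", "id": 2}\n{"id": 3, "text": "c"}\n')
    print(dataset_hash(v1))
    print(diff(v1, v2))
    print(union(v1, v2))
```

Hashing the sorted canonical records makes the version hash independent of record order and key order, which suits the set-style operations; a design that treats record order as significant would hash the records in file order instead.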