krassowski / data-vault
IPython magic for simple, organized, compressed and encrypted: storage & transfer of files between notebooks.
License: MIT License
It is often useful to have the data storage structure reflect the structure of the notebooks. However, as notebooks get renamed and moved around, the paths need to be updated. I propose using a dot (.) to indicate that data should be saved in the path equivalent to the currently running notebook. Running:
%vault store data in .
in a notebook located at analyses/main_analysis.ipynb would save the data in the analyses/main_analysis path of the vault, i.e. it would be equivalent to running:
%vault store data in analyses/main_analysis
An alternative syntax would use double underscores, e.g.:
%vault store data in __here__
The dot syntax is closer to Python's relative import syntax (from . import x) and is thus slightly preferred.
The dot syntax could allow further path specification:
%vault store data_clean_1 in ./processed
%vault store data_clean_2 in ./processed
%vault store data_raw_1 in ./raw
%vault store data_raw_2 in ./raw
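A minimal sketch of how the dot syntax might be resolved, assuming the magic already knows the current notebook's path relative to the vault root (how to obtain that path inside Jupyter is not covered here). resolve_vault_path is a hypothetical name, not part of data-vault; it also accepts the __here__ alternative for comparison.

```python
from pathlib import PurePosixPath


def resolve_vault_path(path: str, notebook_path: str) -> str:
    """Hypothetical helper: expand '.' (or '__here__') to the current notebook's path.

    Assumes notebook_path is relative to the vault root and uses forward slashes.
    """
    # Strip the .ipynb suffix: 'analyses/main_analysis.ipynb' -> 'analyses/main_analysis'
    here = PurePosixPath(notebook_path).with_suffix('')
    if path in ('.', '__here__'):
        return str(here)
    if path.startswith('./'):
        # Sub-paths such as './processed' are appended to the notebook path
        return str(here / path[2:])
    # Anything else (including '..', out of scope here) is used unchanged
    return path
```

For example, resolve_vault_path('./raw', 'analyses/main_analysis.ipynb') would give 'analyses/main_analysis/raw', so renaming or moving the notebook changes the storage path without editing the magic calls.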
Support for .. (the parent path) could be considered too, but is outside the scope of this proposal.
This may require a cell (%%) magic, but would be useful when importing multiple things at once, e.g.:
%%vault from x import (
a,
b,
c,
d
)
This makes sense if a, b, c and d are longer identifiers.
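One way such a cell magic could handle the statement is to join the magic line with the cell body and parse the parenthesised list. The sketch assumes the usual cell-magic split, where the text after %%vault on the first line arrives as line and the remaining lines as cell; parse_multiline_import is a hypothetical helper and does not handle "as" aliases.

```python
import re
from typing import List, Tuple

# 'from <path> import ( ... )', with the names spread over several lines
_MULTILINE_IMPORT = re.compile(r'^\s*from\s+(\S+)\s+import\s*\((.*)\)\s*$', re.DOTALL)


def parse_multiline_import(line: str, cell: str) -> Tuple[str, List[str]]:
    """Hypothetical parser for a %%vault cell: returns the vault path and imported names."""
    # For a cell magic, `line` is the rest of the %%vault line and `cell` is the body
    text = line + '\n' + cell
    match = _MULTILINE_IMPORT.match(text)
    if match is None:
        raise ValueError(f'Not a multi-line import: {text!r}')
    path, body = match.groups()
    # Split on commas, ignoring whitespace, newlines and a trailing comma
    names = [name.strip() for name in body.split(',') if name.strip()]
    return path, names
```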
The example generates errors on the line:
%vault store salaries in datasets
It is not clear what 'datasets' is; it is not declared in the example.
I am trying to use this on a Windows computer. A Windows example would be appreciated.
del x from y requires x to be in the global namespace, which is overly eager validation.
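A sketch of the less eager behaviour: check only that the name is stored under the given vault path, without consulting the notebook's global namespace. The vault is modelled here as a plain dict mapping paths to sets of stored names; delete_member and this storage layout are assumptions for illustration, not data-vault internals.

```python
from typing import Dict, Set


def delete_member(storage: Dict[str, Set[str]], name: str, path: str) -> None:
    """Hypothetical `del <name> from <path>` that validates against the vault only.

    The notebook's global namespace is never consulted, so the variable does
    not need to be defined in the current session to be deleted.
    """
    stored = storage.get(path)
    if stored is None or name not in stored:
        raise KeyError(f'{name!r} is not stored in {path!r}')
    stored.remove(name)
```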
For example, when in a notebook, one can access the current vault with:
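from IPython import get_ipython
ip = get_ipython()
# magics instances are registered under their class name (here VaultMagics)
vault = ip.magics_manager.registry['VaultMagics'].current_vault
vault.list_members()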
To enable high-performance subsetting, simple grep-like pre-filtering will be provided:
Import only the first five rows:
%vault from notebook import large_frame.rows[:5] as large_frame_head
When subsetting, the use of as would be required to prevent confusing the original large_frame object with its subset.
To import only rows containing the text "SNP":
%vault from notebook import large_frame.grep("SNP") as large_frame_snps
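A sketch of how this subsetting syntax could be parsed, including the rule that a subset must be given a new name with as. The grammar (only rows[...] and grep(...) after the variable name) and the ImportSpec and parse_import names are hypothetical, chosen to match the two examples above.

```python
import re
from typing import NamedTuple, Optional


class ImportSpec(NamedTuple):
    path: str
    name: str
    subset: Optional[str]  # e.g. 'rows[:5]' or 'grep("SNP")'
    alias: Optional[str]


_IMPORT = re.compile(
    r'^from\s+(?P<path>\S+)\s+import\s+'
    r'(?P<name>\w+)'
    r'(?:\.(?P<subset>rows\[[^\]]*\]|grep\([^)]*\)))?'
    r'(?:\s+as\s+(?P<alias>\w+))?$'
)


def parse_import(statement: str) -> ImportSpec:
    """Hypothetical parser for `from <path> import <name>[.<subset>] [as <alias>]`."""
    match = _IMPORT.match(statement.strip())
    if match is None:
        raise ValueError(f'Cannot parse: {statement!r}')
    spec = ImportSpec(**match.groupdict())
    # A subset must not silently shadow the full object, so `as` is mandatory
    if spec.subset is not None and spec.alias is None:
        raise ValueError('Subsetting requires "as <new_name>"')
    return spec
```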
However, if your file is too big to fit into memory and you need more advanced filtering, you can provide a custom import function to the low-level load_storage_object magic:
def your_function(f):
    # keep every other line (line numbers 0, 2, 4, ...) of the open file
    return [
        line
        for i, line in enumerate(f)
        if i % 2 == 0  # replace with fancy filtering as needed
    ]
%vault import 'notebook_path/variable.tsv' as variable with your_function
The advanced filtering can already be achieved with existing code.
Import the first 5 rows:
from data_vault import subset
%vault import 'notebook_path/variable.tsv' as variable with subset.rows[:5]
To be implemented with nrows.
Import the first 5 columns:
%vault import 'notebook_path/variable.tsv' as variable with subset.columns[:5]
To be implemented with usecols.
Import rows containing a string:
%vault import 'notebook_path/variable.tsv' as variable with subset.contains('text')
Import rows matching a regular expression:
%vault import 'notebook_path/variable.tsv' as variable with subset.matches('.*? text')
Both to be implemented with a custom IO iterator that discards non-matching lines on the fly.
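The subset helpers could be sketched as plain callables that take an open text file (or any iterable of lines) and return the kept lines, the same shape as your_function above. This is a stdlib-only approximation, not the data-vault implementation: it assumes the first line is a header that is always kept, that columns are tab-separated, and that matches anchors the pattern at the start of each line (re.match). Non-matching lines are dropped as they are read, so only kept lines are held in memory. Note that itertools.islice rejects negative bounds, so rows[-5:] would raise.

```python
import itertools
import re
from types import SimpleNamespace
from typing import Callable, Iterable, Iterator, List

# A filter takes an iterable of lines (e.g. an open text file) and returns the kept lines
LineFilter = Callable[[Iterable[str]], List[str]]


def _keep_header_then(lines: Iterable[str], predicate: Callable[[str], bool]) -> Iterator[str]:
    """Yield the header line, then only the data lines satisfying `predicate`."""
    iterator = iter(lines)
    header = next(iterator, None)
    if header is None:
        return
    yield header
    # Non-matching lines are discarded as they are read, never stored
    for line in iterator:
        if predicate(line):
            yield line


class _Rows:
    """Hypothetical subset.rows[...]: header plus a slice of the data rows."""

    def __getitem__(self, selection: slice) -> LineFilter:
        def keep_rows(lines: Iterable[str]) -> List[str]:
            iterator = iter(lines)
            header = next(iterator, None)
            if header is None:
                return []
            # islice stops reading once `stop` rows were consumed
            data = itertools.islice(iterator, selection.start, selection.stop, selection.step)
            return [header, *data]
        return keep_rows


class _Columns:
    """Hypothetical subset.columns[...]: slice the fields of every line."""

    def __init__(self, sep: str = '\t') -> None:
        self.sep = sep

    def __getitem__(self, selection: slice) -> LineFilter:
        def keep_columns(lines: Iterable[str]) -> List[str]:
            return [
                self.sep.join(line.rstrip('\n').split(self.sep)[selection]) + '\n'
                for line in lines
            ]
        return keep_columns


def contains(text: str) -> LineFilter:
    """Keep the header and the rows in which `text` occurs."""
    def keep_containing(lines: Iterable[str]) -> List[str]:
        return list(_keep_header_then(lines, lambda line: text in line))
    return keep_containing


def matches(pattern: str) -> LineFilter:
    """Keep the header and the rows whose beginning matches the regular expression."""
    compiled = re.compile(pattern)

    def keep_matching(lines: Iterable[str]) -> List[str]:
        return list(_keep_header_then(lines, lambda line: compiled.match(line) is not None))
    return keep_matching


subset = SimpleNamespace(rows=_Rows(), columns=_Columns(), contains=contains, matches=matches)
```

With this shape, subset.rows[:5] evaluates to an ordinary function, so it could be passed after with exactly like your_function.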
Challenges:
subset.using(sep='csv').rows[:5]?
%vault from path in variable should raise.
%vault from variable import path should raise.
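Both statements could be rejected by a validation step: the first fails the from <path> import <name> grammar, while the second is well-formed but names a path that does not exist in the vault. The sketch assumes the set of existing vault paths is available; validate_from_import is a hypothetical helper, and the choice of SyntaxError versus ValueError is illustrative. If "variable" happened to be an existing vault path, the second statement would pass this check.

```python
import re
from typing import Set, Tuple

_FROM_IMPORT = re.compile(r'^from\s+(\S+)\s+import\s+(\S+)$')


def validate_from_import(statement: str, known_paths: Set[str]) -> Tuple[str, str]:
    """Hypothetical check run before a `%vault from <path> import <name>` statement."""
    match = _FROM_IMPORT.match(statement.strip())
    if match is None:
        # e.g. 'from path in variable': 'in' is not valid in a from-import
        raise SyntaxError(f'Expected "from <path> import <name>", got {statement!r}')
    path, name = match.groups()
    if path not in known_paths:
        # e.g. 'from variable import path': well-formed, but 'variable' is not a vault path
        raise ValueError(f'{path!r} is not a path in the vault')
    return path, name
```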