I've been looking for efficient ways to help users search Jupyter notebooks, and stori

Feature Request: Import Jupyter notebooks about sqlitebiter HOT 7 CLOSED

thombashi commented on May 28, 2024

Feature Request: Import Jupyter notebooks

from sqlitebiter.

Comments (7)

thombashi commented on May 28, 2024

@psychemedia
I have implemented Jupyter Notebook support in sqlitebiter 0.16.0.

For example, https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb will be converted into the following:

$ sqlitebiter url "https://raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb"
[INFO] sqlitebiter url: convert 'raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb' to 'cells_source' table
[INFO] sqlitebiter url: convert 'raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb' to 'cells_outputs' table
[INFO] sqlitebiter url: convert 'raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb' to 'cells_outputs_kv' table
[INFO] sqlitebiter url: convert 'raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb' to 'cells_kv' table
[INFO] sqlitebiter url: convert 'raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb' to 'metadata_kernelspec' table
[INFO] sqlitebiter url: convert 'raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb' to 'metadata_language_info' table
[INFO] sqlitebiter url: convert 'raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb' to 'metadata_kv' table
[INFO] sqlitebiter url: convert 'raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb' to 'kv' table
[INFO] sqlitebiter url: number of created tables: 8
[INFO] sqlitebiter url: database path: out.sqlite

schema

sqlite> .schema
CREATE TABLE IF NOT EXISTS 'cells_source' (cell_id INTEGER NOT NULL, line_no INTEGER NOT NULL, text TEXT);
CREATE TABLE IF NOT EXISTS 'cells_outputs' (cell_id INTEGER NOT NULL, type TEXT NOT NULL, line_no INTEGER, data BLOB);
CREATE TABLE IF NOT EXISTS 'cells_outputs_kv' (cell_id INTEGER NOT NULL, key TEXT NOT NULL, value TEXT);
CREATE TABLE IF NOT EXISTS 'cells_kv' (cell_id INTEGER NOT NULL, key TEXT NOT NULL, value TEXT);
CREATE TABLE IF NOT EXISTS 'metadata_kernelspec' (key TEXT NOT NULL, value TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS 'metadata_language_info' (key TEXT NOT NULL, value TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS 'metadata_kv' (key TEXT NOT NULL, value TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS 'kv' (key TEXT NOT NULL, value TEXT);
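
As a minimal sketch of how this schema might be inspected from Python, the snippet below recreates two of the tables above in an in-memory database and lists their columns with `PRAGMA table_info`. The helper name `table_columns` is made up for illustration; against a real conversion you would presumably connect to `out.sqlite` instead of `:memory:`.

```python
import sqlite3

# Two of the CREATE TABLE statements from the schema above, copied verbatim.
SCHEMA = """
CREATE TABLE IF NOT EXISTS 'cells_source' (cell_id INTEGER NOT NULL, line_no INTEGER NOT NULL, text TEXT);
CREATE TABLE IF NOT EXISTS 'cells_kv' (cell_id INTEGER NOT NULL, key TEXT NOT NULL, value TEXT);
"""


def table_columns(conn, table):
    """Return the column names of `table` (hypothetical helper)."""
    # PRAGMA arguments cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    return [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]


conn = sqlite3.connect(":memory:")  # assumption: use "out.sqlite" for real data
conn.executescript(SCHEMA)

# sqlite_master lists every table that exists in the database.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
for name in tables:
    print(name, table_columns(conn, name))
```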

tables

cells_source

cell_id	line_no	text
0	0	# Recurrent Neural Network Example
0	1
0	2	Build a recurrent neural network (LSTM) with TensorFlow.
0	3
0	4	- Author: Aymeric Damien
0	5	- Project: https://github.com/aymericdamien/TensorFlow-Examples/
1	0	## RNN Overview
1	1
1	2
1	3
1	4	References:
1	5	- Long Short Term Memory, Sepp Hochreiter & Jurgen Schmidhuber, Neural Computation 9(8): 1735-1780, 1997.
1	6
1	7	## MNIST Dataset Overview
1	8
1	9	This example is using MNIST handwritten digits. The dataset contains 60,000 examples for training and 10,000 examples for testing. The digits have been size-normalized and centered in a fixed-size image (28x28 pixels) with values from 0 to 1. For simplicity, each image has been flattened and converted to a 1-D numpy array of 784 features (28*28).
1	10
1	11
1	12
1	13	To classify images using a recurrent neural network, we consider every image row as a sequence of pixels. Because MNIST image shape is 28*28px, we will then handle 28 sequences of 28 timesteps for every sample.
1	14
1	15	More info: http://yann.lecun.com/exdb/mnist/
2	0	from __future__ import print_function
2	1
2	2	import tensorflow as tf
2	3	from tensorflow.contrib import rnn
2	4
2	5	# Import MNIST data
2	6	from tensorflow.examples.tutorials.mnist import input_data
2	7	mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
3	0	# Training Parameters
3	1	learning_rate = 0.001
3	2	training_steps = 10000
3	3	batch_size = 128
3	4	display_step = 200
3	5
3	6	# Network Parameters
3	7	num_input = 28 # MNIST data input (img shape: 28*28)
3	8	timesteps = 28 # timesteps
3	9	num_hidden = 128 # hidden layer num of features
3	10	num_classes = 10 # MNIST total classes (0-9 digits)
3	11
3	12	# tf Graph input
3	13	X = tf.placeholder("float", [None, timesteps, num_input])
3	14	Y = tf.placeholder("float", [None, num_classes])
4	0	# Define weights
4	1	weights = {
4	2	    'out': tf.Variable(tf.random_normal([num_hidden, num_classes]))
4	3	}
4	4	biases = {
4	5	    'out': tf.Variable(tf.random_normal([num_classes]))
4	6	}
5	0	def RNN(x, weights, biases):
5	1
5	2	    # Prepare data shape to match `rnn` function requirements
5	3	    # Current data input shape: (batch_size, timesteps, n_input)
5	4	    # Required shape: 'timesteps' tensors list of shape (batch_size, n_input)
5	5
5	6	    # Unstack to get a list of 'timesteps' tensors of shape (batch_size, n_input)
5	7	    x = tf.unstack(x, timesteps, 1)
5	8
5	9	    # Define a lstm cell with tensorflow
5	10	    lstm_cell = rnn.BasicLSTMCell(num_hidden, forget_bias=1.0)
5	11
5	12	    # Get lstm cell output
5	13	    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
5	14
5	15	    # Linear activation, using rnn inner loop last output
5	16	    return tf.matmul(outputs[-1], weights['out']) + biases['out']
6	0	logits = RNN(X, weights, biases)
6	1	prediction = tf.nn.softmax(logits)
6	2
6	3	# Define loss and optimizer
6	4	loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
6	5	logits=logits, labels=Y))
6	6	optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
6	7	train_op = optimizer.minimize(loss_op)
6	8
6	9	# Evaluate model (with test logits, for dropout to be disabled)
6	10	correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
6	11	accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
6	12
6	13	# Initialize the variables (i.e. assign their default value)
6	14	init = tf.global_variables_initializer()
7	0	# Start training
7	1	with tf.Session() as sess:
7	2
7	3	    # Run the initializer
7	4	    sess.run(init)
7	5
7	6	    for step in range(1, training_steps+1):
7	7	        batch_x, batch_y = mnist.train.next_batch(batch_size)
7	8	        # Reshape data to get 28 seq of 28 elements
7	9	        batch_x = batch_x.reshape((batch_size, timesteps, num_input))
7	10	        # Run optimization op (backprop)
7	11	        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
7	12	        if step % display_step == 0 or step == 1:
7	13	            # Calculate batch loss and accuracy
7	14	            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
7	15	                                                                 Y: batch_y})
7	16	            print("Step " + str(step) + ", Minibatch Loss= " + \
7	17	                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
7	18	                  "{:.3f}".format(acc))
7	19
7	20	    print("Optimization Finished!")
7	21
7	22	    # Calculate accuracy for 128 mnist test images
7	23	    test_len = 128
7	24	    test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))
7	25	    test_label = mnist.test.labels[:test_len]
7	26	    print("Testing Accuracy:", \
7	27	        sess.run(accuracy, feed_dict={X: test_data, Y: test_label}))

cells_outputs

cell_id	type	line_no	data
2	text	0	Extracting /tmp/data/train-images-idx3-ubyte.gz
2	text	1	Extracting /tmp/data/train-labels-idx1-ubyte.gz
2	text	2	Extracting /tmp/data/t10k-images-idx3-ubyte.gz
2	text	3	Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
7	text	0	Step 1, Minibatch Loss= 2.6268, Training Accuracy= 0.102
7	text	1	Step 200, Minibatch Loss= 2.0722, Training Accuracy= 0.328
7	text	2	Step 400, Minibatch Loss= 1.9181, Training Accuracy= 0.336
7	text	3	Step 600, Minibatch Loss= 1.8858, Training Accuracy= 0.336
7	text	4	Step 800, Minibatch Loss= 1.7022, Training Accuracy= 0.422
7	text	5	Step 1000, Minibatch Loss= 1.6365, Training Accuracy= 0.477
7	text	6	Step 1200, Minibatch Loss= 1.6691, Training Accuracy= 0.516
7	text	7	Step 1400, Minibatch Loss= 1.4626, Training Accuracy= 0.547
7	text	8	Step 1600, Minibatch Loss= 1.4707, Training Accuracy= 0.539
7	text	9	Step 1800, Minibatch Loss= 1.4087, Training Accuracy= 0.570
7	text	10	Step 2000, Minibatch Loss= 1.3033, Training Accuracy= 0.570
7	text	11	Step 2200, Minibatch Loss= 1.3773, Training Accuracy= 0.508
7	text	12	Step 2400, Minibatch Loss= 1.3092, Training Accuracy= 0.570
7	text	13	Step 2600, Minibatch Loss= 1.2272, Training Accuracy= 0.609
7	text	14	Step 2800, Minibatch Loss= 1.1827, Training Accuracy= 0.633
7	text	15	Step 3000, Minibatch Loss= 1.0453, Training Accuracy= 0.641
7	text	16	Step 3200, Minibatch Loss= 1.0400, Training Accuracy= 0.648
7	text	17	Step 3400, Minibatch Loss= 1.1145, Training Accuracy= 0.656
7	text	18	Step 3600, Minibatch Loss= 0.9884, Training Accuracy= 0.688
7	text	19	Step 3800, Minibatch Loss= 1.0395, Training Accuracy= 0.703
7	text	20	Step 4000, Minibatch Loss= 1.0096, Training Accuracy= 0.664
7	text	21	Step 4200, Minibatch Loss= 0.8806, Training Accuracy= 0.758
7	text	22	Step 4400, Minibatch Loss= 0.9090, Training Accuracy= 0.766
7	text	23	Step 4600, Minibatch Loss= 1.0060, Training Accuracy= 0.703
7	text	24	Step 4800, Minibatch Loss= 0.8954, Training Accuracy= 0.703
7	text	25	Step 5000, Minibatch Loss= 0.8163, Training Accuracy= 0.750
7	text	26	Step 5200, Minibatch Loss= 0.7620, Training Accuracy= 0.773
7	text	27	Step 5400, Minibatch Loss= 0.7388, Training Accuracy= 0.758
7	text	28	Step 5600, Minibatch Loss= 0.7604, Training Accuracy= 0.695
7	text	29	Step 5800, Minibatch Loss= 0.7459, Training Accuracy= 0.734
7	text	30	Step 6000, Minibatch Loss= 0.7448, Training Accuracy= 0.734
7	text	31	Step 6200, Minibatch Loss= 0.7208, Training Accuracy= 0.773
7	text	32	Step 6400, Minibatch Loss= 0.6557, Training Accuracy= 0.773
7	text	33	Step 6600, Minibatch Loss= 0.8616, Training Accuracy= 0.758
7	text	34	Step 6800, Minibatch Loss= 0.6089, Training Accuracy= 0.773
7	text	35	Step 7000, Minibatch Loss= 0.5020, Training Accuracy= 0.844
7	text	36	Step 7200, Minibatch Loss= 0.5980, Training Accuracy= 0.812
7	text	37	Step 7400, Minibatch Loss= 0.6786, Training Accuracy= 0.766
7	text	38	Step 7600, Minibatch Loss= 0.4891, Training Accuracy= 0.859
7	text	39	Step 7800, Minibatch Loss= 0.7042, Training Accuracy= 0.797
7	text	40	Step 8000, Minibatch Loss= 0.4200, Training Accuracy= 0.859
7	text	41	Step 8200, Minibatch Loss= 0.6442, Training Accuracy= 0.742
7	text	42	Step 8400, Minibatch Loss= 0.5569, Training Accuracy= 0.828
7	text	43	Step 8600, Minibatch Loss= 0.5838, Training Accuracy= 0.836
7	text	44	Step 8800, Minibatch Loss= 0.5579, Training Accuracy= 0.812
7	text	45	Step 9000, Minibatch Loss= 0.4337, Training Accuracy= 0.867
7	text	46	Step 9200, Minibatch Loss= 0.4366, Training Accuracy= 0.844
7	text	47	Step 9400, Minibatch Loss= 0.5051, Training Accuracy= 0.844
7	text	48	Step 9600, Minibatch Loss= 0.5244, Training Accuracy= 0.805
7	text	49	Step 9800, Minibatch Loss= 0.4932, Training Accuracy= 0.805
7	text	50	Step 10000, Minibatch Loss= 0.4833, Training Accuracy= 0.852
7	text	51	Optimization Finished!
7	text	52	Testing Accuracy: 0.882812
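
Because each output line is stored as its own row, the training log above can be queried like any other table. The following is only a sketch: it loads three of the rows shown above into an in-memory `cells_outputs` table and pulls out step, loss, and accuracy with a regular expression. The regex and the function name `parse_log` are assumptions made for this example, not part of sqlitebiter.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cells_outputs "
    "(cell_id INTEGER NOT NULL, type TEXT NOT NULL, line_no INTEGER, data BLOB)")
conn.executemany("INSERT INTO cells_outputs VALUES (?, ?, ?, ?)", [
    (7, "text", 0, "Step 1, Minibatch Loss= 2.6268, Training Accuracy= 0.102"),
    (7, "text", 50, "Step 10000, Minibatch Loss= 0.4833, Training Accuracy= 0.852"),
    (7, "text", 51, "Optimization Finished!"),
])

# Illustrative pattern for the "Step N, Minibatch Loss= ..." lines.
LOG_LINE = re.compile(
    r"Step (\d+), Minibatch Loss= ([\d.]+), Training Accuracy= ([\d.]+)")


def parse_log(conn, cell_id):
    """Return (step, loss, accuracy) tuples for one cell's text output."""
    query = ("SELECT data FROM cells_outputs "
             "WHERE cell_id = ? AND type = 'text' ORDER BY line_no")
    history = []
    for (data,) in conn.execute(query, (cell_id,)):
        if isinstance(data, bytes):  # the column is declared BLOB
            data = data.decode("utf-8")
        match = LOG_LINE.match(data or "")
        if match:
            step, loss, acc = match.groups()
            history.append((int(step), float(loss), float(acc)))
    return history


history = parse_log(conn, 7)
print(history)
```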

cells_outputs_kv

cell_id	key	value
2	name	stdout
2	output_type	stream
7	name	stdout
7	output_type	stream

cells_kv

cell_id	key	value
0	cell_type	markdown
0	metadata	{'collapsed': True}
1	cell_type	markdown
1	metadata
2	cell_type	code
2	execution_count	1
2	metadata	{'collapsed': False}
3	cell_type	code
3	execution_count	2
3	metadata	{'collapsed': False}
4	cell_type	code
4	execution_count	3
4	metadata	{'collapsed': True}
5	cell_type	code
5	execution_count	4
5	metadata	{'collapsed': False}
6	cell_type	code
6	execution_count	5
6	metadata	{'collapsed': True}
7	cell_type	code
7	execution_count	6
7	metadata	{'collapsed': False}
8	cell_type	code
8	execution_count
8	metadata	{'collapsed': True}
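
The key/value layout of `cells_kv` is easy to fold back into one dictionary per cell, which is handy for restricting a search to code cells. This is a rough sketch using a few of the rows above; `cell_attributes` is a name invented for the example, and storing a missing `execution_count` as NULL is an assumption.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cells_kv (cell_id INTEGER NOT NULL, key TEXT NOT NULL, value TEXT)")
conn.executemany("INSERT INTO cells_kv VALUES (?, ?, ?)", [
    (0, "cell_type", "markdown"),
    (0, "metadata", "{'collapsed': True}"),
    (2, "cell_type", "code"),
    (2, "execution_count", "1"),
    (2, "metadata", "{'collapsed': False}"),
    (8, "cell_type", "code"),
    (8, "execution_count", None),  # assumption: empty value stored as NULL
])


def cell_attributes(conn):
    """Fold the key/value rows into {cell_id: {key: value}}."""
    cells = defaultdict(dict)
    query = "SELECT cell_id, key, value FROM cells_kv ORDER BY cell_id"
    for cell_id, key, value in conn.execute(query):
        cells[cell_id][key] = value
    return dict(cells)


attrs = cell_attributes(conn)
code_cells = sorted(
    cell_id for cell_id, kv in attrs.items() if kv.get("cell_type") == "code")
print(code_cells)
```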

metadata_kernelspec

key	value
display_name	Python [default]
language	python
name	python2

metadata_language_info

key	value
codemirror_mode_name	ipython
codemirror_mode_version	2
file_extension	.py
mimetype	text/x-python
name	python
nbconvert_exporter	python
pygments_lexer	ipython2
version	2.7.12

metadata_kv

kv

key	value
nbformat	4
nbformat_minor	0

Any feedback or comments would be appreciated.

thombashi commented on May 28, 2024

Thank you for your request.

That's an interesting feature. I will consider it for a future release.

psychemedia commented on May 28, 2024

@thombashi oh, great - looks interesting; I'll try to have a play in the next day or two and get comments back to you.

Thanks,

--tony

psychemedia commented on May 28, 2024

One of the things I'd like to be able to do is build a simple search tool around notebooks. It's great that I can extract content from multiple notebooks into the same tables, but it would also be useful if there were another table that listed notebook_id and the notebook filepath/filename (or maybe directory path and filename as separate columns), with notebook_id then used, e.g., as part of a compound primary key in the other tables. Then I could run queries on the content of a single notebook, or pull back the name of a notebook that contained a particular search term.

As an example, I note I can get all the content back from a single cell as a string separated by line breaks using a query of the form:

SELECT cell_id, GROUP_CONCAT(text, '\n')
FROM (
   SELECT cell_id, text
   FROM cells_source 
   ORDER BY cell_id, line_no
   )
GROUP BY cell_id;

It would be nice if I could also do something like:

SELECT notebook, cell_id, GROUP_CONCAT(text, '\n')
FROM (
   SELECT notebook_filename AS notebook, cell_id, text
   FROM cells_source, notebooks
   WHERE notebooks.notebook_id = cells_source.notebook_id
   ORDER BY cell_id, line_no
   )
GROUP BY cell_id;
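
Two caveats may be worth keeping in mind with queries of this form: in the sqlite3 shell a literal `'\n'` is the two characters backslash and `n` (SQLite does not process backslash escapes), and SQLite does not promise that `GROUP_CONCAT` follows the subquery's `ORDER BY`. One hedged alternative is to sort the rows in SQL and join them in Python. The sketch below uses the 0.16.0 `cells_source` layout with a handful of sample rows; `cell_sources` is a name made up here, and treating blank source lines as NULL is an assumption.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cells_source "
    "(cell_id INTEGER NOT NULL, line_no INTEGER NOT NULL, text TEXT)")
conn.executemany("INSERT INTO cells_source VALUES (?, ?, ?)", [
    (3, 1, "learning_rate = 0.001"),  # inserted out of order on purpose
    (3, 0, "# Training Parameters"),
    (0, 0, "# Recurrent Neural Network Example"),
    (0, 1, None),  # assumption: blank lines may be stored as NULL
    (0, 2, "Build a recurrent neural network (LSTM) with TensorFlow."),
])


def cell_sources(conn):
    """Return {cell_id: full cell text}, with lines joined by real newlines."""
    rows = conn.execute(
        "SELECT cell_id, text FROM cells_source ORDER BY cell_id, line_no")
    # Rows arrive sorted, so consecutive rows with the same cell_id form a cell.
    return {
        cell_id: "\n".join(text or "" for _, text in group)
        for cell_id, group in groupby(rows, key=lambda row: row[0])
    }


sources = cell_sources(conn)
print(sources[3])
```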

thombashi commented on May 28, 2024

Thank you for your feedback.
Notebook IDs (and the notebook filepath/filename) would certainly be useful.
I will consider that for a future release (probably the next release).

thombashi commented on May 28, 2024

@psychemedia

another table that listed notebook_id and the notebook filepath/filename (or maybe, directory path and filename as separate columns)

I have added the above as the _source_info_ table.
As of sqlitebiter 0.19.0, the output SQLite database schema is as follows when you convert IPython Notebooks (you can specify multiple notebooks):

CREATE TABLE IF NOT EXISTS '_source_info_' (source_id INTEGER NOT NULL, dir_name TEXT, base_name TEXT NOT NULL, format_name TEXT NOT NULL, dst_table TEXT NOT NULL, size INTEGER, mtime INTEGER);
CREATE TABLE IF NOT EXISTS 'cells_source' (source_id INTEGER NOT NULL, cell_id INTEGER NOT NULL, line_no INTEGER NOT NULL, text TEXT);
CREATE TABLE IF NOT EXISTS 'cells_kv' (source_id INTEGER NOT NULL, cell_id INTEGER NOT NULL, key TEXT NOT NULL, value TEXT);
CREATE TABLE IF NOT EXISTS 'cells_outputs' (source_id INTEGER NOT NULL, cell_id INTEGER NOT NULL, type TEXT NOT NULL, line_no INTEGER NOT NULL, data BLOB);
CREATE TABLE IF NOT EXISTS 'cells_outputs_kv' (source_id INTEGER NOT NULL, cell_id INTEGER NOT NULL, key TEXT NOT NULL, value TEXT);
CREATE TABLE IF NOT EXISTS 'metadata_kernelspec' (source_id INTEGER NOT NULL, key TEXT NOT NULL, value TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS 'metadata_language_info' (source_id INTEGER NOT NULL, key TEXT NOT NULL, value TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS 'kv' (source_id INTEGER NOT NULL, key TEXT NOT NULL, value TEXT);

Some of the attribute names differ from the ones you suggested, so that they can also be used for file formats other than IPython Notebook:

notebook_id -> source_id
filepath/filename -> dir_name, base_name

Now you can execute the example queries that you provided in the previous post as follows:

SELECT cell_id, GROUP_CONCAT(text, '\n')
FROM (
   SELECT cell_id, text
   FROM cells_source 
   ORDER BY cell_id, line_no
   )
GROUP BY cell_id;

SELECT notebook, cell_id, GROUP_CONCAT(text, '\n')
FROM (
   SELECT base_name AS notebook, cell_id, text
   FROM cells_source, _source_info_
   WHERE _source_info_.source_id = cells_source.source_id
   ORDER BY cell_id, line_no
   )
GROUP BY cell_id;
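
For the search use case with several notebooks in one database, it seems safer to key on `source_id` as well as `cell_id`, on the assumption that cell IDs restart at 0 for each notebook. The sketch below builds a tiny database with a subset of the 0.19.0 columns and returns the notebook name and cell for every match. The sample rows, the second notebook name, and the function `find_cells` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only the columns needed for the example; the real tables have more.
conn.executescript("""
CREATE TABLE _source_info_ (source_id INTEGER NOT NULL, dir_name TEXT, base_name TEXT NOT NULL);
CREATE TABLE cells_source (source_id INTEGER NOT NULL, cell_id INTEGER NOT NULL, line_no INTEGER NOT NULL, text TEXT);
""")
conn.executemany("INSERT INTO _source_info_ VALUES (?, ?, ?)", [
    (1, "notebooks", "recurrent_network.ipynb"),
    (2, "notebooks", "linear_regression.ipynb"),  # hypothetical second notebook
])
conn.executemany("INSERT INTO cells_source VALUES (?, ?, ?, ?)", [
    (1, 0, 0, "import tensorflow as tf"),
    (1, 0, 1, "from tensorflow.contrib import rnn"),
    (2, 0, 0, "import numpy"),
    (2, 1, 0, "import tensorflow as tf"),
])


def find_cells(conn, term):
    """Return (notebook name, cell_id) pairs whose source contains `term`."""
    query = """
        SELECT DISTINCT i.base_name, s.cell_id
        FROM cells_source AS s
        JOIN _source_info_ AS i ON i.source_id = s.source_id
        WHERE instr(s.text, ?) > 0
        ORDER BY i.base_name, s.cell_id
    """
    return conn.execute(query, (term,)).fetchall()


hits = find_cells(conn, "tensorflow")
print(hits)
```

`instr` performs a case-sensitive substring match; a `LIKE` pattern or SQLite's full-text search extension might suit a real search tool better.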

thombashi commented on May 28, 2024

I'll close the issue.
Feel free to reopen it if you still have any problems.
