tensorflow / privacy Goto Github PK

Library for training machine learning models with privacy for training data

License: Apache License 2.0

Python 68.58% Shell 1.04% Jupyter Notebook 28.79% Starlark 1.59%

privacy's Introduction

TensorFlow Privacy

This repository contains the source code for TensorFlow Privacy, a Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy. The library comes with tutorials and analysis tools for computing the privacy guarantees provided.

The TensorFlow Privacy library is under continual development, always welcoming contributions. In particular, we always welcome help towards resolving the issues currently open.

Latest Updates

2024-02-14: As of version 0.9.0, the TensorFlow Privacy github repository will be published as two separate PyPI packages. The first will inherit the name tensorflow-privacy and contain the parts related to training of DP models. The second, tensorflow-empirical-privacy, will contain the parts related to testing for empirical privacy.

2023-02-21: A new implementation of efficient per-example gradient clipping is now available for DP keras models consisting only of Dense and Embedding layers. The models use the fast gradient calculation results of this paper. The implementation should allow for doing DP training without any meaningful memory or runtime overhead. It also removes the need for tuning the number of microbatches as it clips the gradient with respect to each example.

Setting up TensorFlow Privacy

Dependencies

This library uses TensorFlow to define machine learning models. Therefore, installing TensorFlow (>= 1.14) is a pre-requisite. You can find instructions here. For better performance, it is also recommended to install TensorFlow with GPU support (detailed instructions on how to do this are available in the TensorFlow installation documentation).

Installing TensorFlow Privacy

If you only want to use TensorFlow Privacy as a library, you can simply execute

pip install tensorflow-privacy

Otherwise, you can clone this GitHub repository into a directory of your choice:

git clone https://github.com/tensorflow/privacy

You can then install the local package in "editable" mode in order to add it to your PYTHONPATH:

cd privacy
pip install -e .

If you'd like to make contributions, we recommend first forking the repository and then cloning your fork rather than cloning this repository directly.

Contributing

Contributions are welcomed! Bug fixes and new features can be initiated through GitHub pull requests. To speed the code review process, we ask that:

When making code contributions to TensorFlow Privacy, you follow the PEP8 with two spaces coding style (the same as the one used by TensorFlow) in your pull requests. In most cases this can be done by running autopep8 -i --indent-size 2 <file> on the files you have edited.
You should also check your code with pylint and TensorFlow's pylint configuration file by running pylint --rcfile=/path/to/the/tf/rcfile <edited file.py>.
When making your first pull request, you sign the Google CLA
We do not accept pull requests that add git submodules because of the problems that arise when maintaining git submodules

Tutorials directory

To help you get started with the functionalities provided by this library, we provide a detailed walkthrough here that will teach you how to wrap existing optimizers (e.g., SGD, Adam, ...) into their differentially private counterparts using TensorFlow (TF) Privacy. You will also learn how to tune the parameters introduced by differentially private optimization and how to measure the privacy guarantees provided using analysis tools included in TF Privacy.

In addition, the tutorials/ folder comes with scripts demonstrating how to use the library features. The list of tutorials is described in the README included in the tutorials directory.

NOTE: the tutorials are maintained carefully. However, they are not considered part of the API and they can change at any time without warning. You should not write 3rd party code that imports the tutorials and expect that the interface will not break.

Research directory

This folder contains code to reproduce results from research papers related to privacy in machine learning. It is not maintained as carefully as the tutorials directory, but rather intended as a convenient archive.

TensorFlow 2.x

TensorFlow Privacy now works with TensorFlow 2! You can use the new Keras-based estimators found in privacy/tensorflow_privacy/privacy/optimizers/dp_optimizer_keras.py.

For this to work with tf.keras.Model and tf.estimator.Estimator, however, you need to install TensorFlow 2.4 or later.

Remarks

The content of this repository supersedes the following existing folder in the tensorflow/models repository

Contacts

If you have any questions that cannot be addressed by raising an issue, feel free to contact:

Galen Andrew (@galenmandrew)
Steve Chien (@schien1729)
Nicolas Papernot (@npapernot)

Copyright

privacy's People

Contributors

Stargazers

Watchers

Forkers

kastnerkyle github30 shankar0206 daniellsm andow7 vikram-bhati petrichorcode cclauss cxepale gauravkumar3738 greninja yuhonghong7035 carlini knaggita knik1234 themavencoder catyans maxghenis sogunda devhttps codeaudit akhilesh97 innovationexploited tony32769 markdaoust locussam badbayesian wanghuazhong qizhao96 zhongyunuestc isuranimalasri 0101011 reihaneh-torkzadehmahani hobbit19 msbarnes axi57 dhanishashraf01 mshubhankar leix28 kaiya madonokouki shadirahimian ashiakerwang wanlixue perseuslee jackm969 thaneacheron peihsuan11 meitianjinbu zicofish kvthr jarodlkp ceteongvanness mr-ravin zkzhangke zhaoyang626 an1006634493 spethw jaytoday batermj atiannini hsulin0806 darthpumpkin awesomemachinelearning xiyueyiwan eurofou wkzsgxxn ccj99 cxfyxl yuxiangw splatonline xiaowei0202 fftd waterponey ratmcu mparizet manishanker mtroglia derekozhang stjordanis weihongli a2un weitong zeitgeistqian r78zyang georgian-io-archive deisler134 nidhoggurz numairfazili redshiftzero fanghongbiao ashah044 hyeamykim aramrami chenlihai fkautz rbarraud jvmncs qwinpin hzrpku

privacy's Issues

add pointer to paper documenting the epsilon computation when it is available

Hi all,

The code seems to implement Abadi et al.'s work. However, the way you compute the overall epsilon does not follow the Abadi's paper. I was wondering if you can give me some references about how you calculated the epsilon.

Thanks,

Document how to set delta

No file named 'gaussian_query' in privacy.optimizers

2 examples in the tutorials ('mnist_dpsgd_tutorial_eager.py' and 'mnist_dpsgd_tutorial_keras.py') contain the import statement 'from privacy.optimizers.gaussian_query import GaussianAverageQuery', resulting in an error "ImportError: No module named 'privacy.optimizers.gaussian_query'".

Minimicrobatch size error, only function when the minibatchsize = = 1

I implemented the other neural network model and take loss function by dp_optimizer.DPGradientDescentGaussianOptimizer.

In that time, I successed when the num_microbatch is 1. But when the num_microbatch is over 1, I got an error.

Traceback (most recent call last):
  File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py", line 686, in _call_cpp_shape_fn_impl
    input_tensors_as_shapes, status)
  File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension size must be evenly divisible by 2 but is 1 for 'Reshape' (op: 'Reshape') with input shapes: [], [2] and with input tensors computed as partial shapes: input[1] = [2,?].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "autoencoder_dp.py", line 70, in <module>
    population_size=60000).minimize(cost)
  File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 399, in minimize
    grad_loss=grad_loss)
  File "/home/madono/madono/test2/dpgan/privacy/optimizers/dp_optimizer.py", line 68, in compute_gradients
    microbatches_losses = tf.reshape(loss, [self._num_microbatches, -1])
  File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5782, in reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
  File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3292, in create_op
    compute_device=compute_device)
  File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3332, in _create_op_helper
    set_shapes_for_outputs(op)
  File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2496, in set_shapes_for_outputs
    return _set_shapes_for_outputs(op)
  File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2469, in _set_shapes_for_outputs
    shapes = shape_func(op)
  File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2399, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py", line 627, in call_cpp_shape_fn
    require_shape_fn)
  File "/home/madono/.pyenv/versions/anaconda3-2018.12/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py", line 691, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Dimension size must be evenly divisible by 2 but is 1 for 'Reshape' (op: 'Reshape') with input shapes: [], [2] and with input tensors computed as partial shapes: input[1] = [2,?].

I implement like

optimizer = dp_optimizer.DPGradientDescentGaussianOptimizer(
          l2_norm_clip=1.0,
          noise_multiplier=1.1,
          num_microbatches=2,
          learning_rate=0.0002,
population_size=60000).minimize(cost)
#optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)

# Initializing the variables
init = tf.initialize_all_variables()

# Explore trainable variables (weight_bias)
var = [v for v in tf.trainable_variables() if 'mimiciii/fc/autoencoder' in v.name] # (784, 128), (128,), (128, 784), (784,)
var_grad = tf.gradients(cost, var) # gradient of cost w.r.t. trainable variables, len(var_grad): 8, type(var_grad): list
norm_gradient_variables = []

# Launch the graph
with tf.Session() as sess:
    writer = tf.summary.FileWriter("./graph/my_graph", sess.graph)
    sess.run(init)
    total_batch = int(mnist.train.num_examples/batch_size)
    # Training cycle
    for epoch in range(training_epochs):
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={X: batch_xs})
            var_grad_val = sess.run(var_grad, feed_dict={X: batch_xs})

            # var_grad_val = [var_grad_val[0], var_grad_val[2]] # no bias, change for different network
            if type(var_grad_val) != type([0]):  # if a is not a list, which indicate it contains only one weight matrix
                var_grad_val = [var_grad_val]
            norm_gradient_variables.append(norm_w(var_grad_val))  # compute the norm of all trainable variables
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1),
                  "cost=", "{:.9f}".format(c))

Python3 tutorial training fails: "ValueError: No variables provided"

Running the default tutorial with no arguments in Python 3.5, TF 1.12 gives the following error

  File "tutorials/mnist_dpsgd_tutorial.py", line 184, in <module>                                                                                                                 
    tf.app.run()                                                                                                                                                                  
  File "/home/ncarlini/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run                                                                    
    _sys.exit(main(argv))                                                                                                                                                         
  File "tutorials/mnist_dpsgd_tutorial.py", line 169, in main                                                                                                                     
    mnist_classifier.train(input_fn=train_input_fn, steps=steps_per_epoch)                                                                                                        
  File "/home/ncarlini/.local/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train                                                           
    loss = self._train_model(input_fn, hooks, saving_listeners)                                                                                                                   
  File "/home/ncarlini/.local/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model                                                   
    return self._train_model_default(input_fn, hooks, saving_listeners)                                                                                                           
  File "/home/ncarlini/.local/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1237, in _train_model_default                                           
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)                                                                                                                   
  File "/home/ncarlini/.local/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1195, in _call_model_fn                                                 
    model_fn_results = self._model_fn(features=features, **kwargs)                                                                                                                
  File "tutorials/mnist_dpsgd_tutorial.py", line 86, in cnn_model_fn                                                                                                              
    train_op = optimizer.minimize(loss=vector_loss, global_step=global_step)                                                                                                      
  File "/home/ncarlini/.local/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py", line 410, in minimize                                                         
    name=name)                                                                                                                                                                    
  File "/home/ncarlini/.local/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py", line 570, in apply_gradients                                                  
    raise ValueError("No variables provided.")                                                                                                                                    
ValueError: No variables provided.

Pytorch implementation of DPGradientDescentGaussianOptimizer

I implemented DPGradientDescentGaussianOptimizer in pytorch, however, I cannot get the similar accuracies as you mentioned in the tutorial. Using the default parameters in the current file I get about 94% accuracy I tried to tune the parameters I got about 95% accuracy at most . Do you have idea why this is happening?

https://github.com/srxzr/DPSGD

Interpreting the epsilon calculated using moments accountant

I have a question which might be a bit naive. Is the epsilon calculated using the moments accountant the privacy guarantee of the full model or only a single entry of the weight matrix? I have read the Abadi et al but I couldn't find where the privacy costs of each of the noisily updated weights is composed (not across time steps but across the model). Since computing a label for a new datapoint would require the release of the full model, will the privacy guarantee be a product of the calculated epsilon with the number of parameters of the model?

Adding an example of a convex loss (eg, linear regression) to the tutorial.

softmax_cross_entropy_with_logits_v2 -> sparse_softmax_cross_entropy_with_logits

Replace softmax_cross_entropy_with_logits_v2 in cnn_model_fn (mnist_vizier.py) with sparse_softmax_cross_entropy_with_logits.

It does not change the numerical value of the loss function, but it simplifies the code (no need to transform labels into one-hot vectors) and it is conceptually a better fit for the task at hand (since the classes are disjoint).

tf.contrib.framework not found with tensorflow 2.0

In lm_dpsgd_tutorial.py:

File "privacy/tutorials/lm_dpsgd_tutorial.py", line 45, in <module>
    from privacy.optimizers import dp_optimizer
  File "privacy/optimizers/dp_optimizer.py", line 22, in <module>
    from privacy.analysis import privacy_ledger
  File "privacy/analysis/privacy_ledger.py", line 29, in <module>
    nest = tf.contrib.framework.nest
AttributeError: module 'tensorflow' has no attribute 'contrib'

The contrib module seems to have been deleted in tensorflow 2.0.
Is it still possible to call framework.nest from another module or should I downgrade tensorflow ?

error when running mnist_dpsgd_tutorial_keras

when i was running the code , there is a error as follows " module 'tensorflow.python.keras.losses' has no attribute 'CategoricalCrossentropy'" , i am wondering if it is because i have an old version of tensorflow , if it is ,please tell me which version you are using ,thx.

Document orders list in eps computation

Tensorflow version problem in dp_optimizer.py

The TF version in dp_optimizer.py is required to be 2.0, but for other .py files, 1.X is required (tf.contrib is needed for them).

Out of date comment

https://github.com/tensorflow/privacy/blob/master/tutorials/mnist_dpsgd_tutorial.py#L72

Training fails with --dpsgd=False

It makes no difference to the diagnostic message (below) whether the training is done on CPU vs GPU.

INFO:tensorflow:loss = 3.3560193, step = 0
ERROR:tensorflow:Model diverged with loss = NaN.
...
tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.

Accuracy on MNIST is 0

I have run mnist_dpsgd_tutorial_keras.py without changing any parameters, except the epoch is set to 1, but the result is like "3000/3000 [==============================] - 153s 51ms/sample - loss: 2.1090 - acc: 0.0000e+00". And its test accuracy is 10%. If I only train it on digit 0 and digit 1, the training accuracy is around 80%, but if I train on other digits, the training accuracy is 0.

Create abstract DPAverageQuery

Some algorithms like DP Federated Averaging need to support weighted DP means. The "weight" could be an argument to preprocess_record, but not all DPQueries need to support weighted records. An abstract subclass of DPQuery called DPAverageQuery could add that argument. As another advantage, algorithms that require an average could check the type of DPQuery to ensure that it is a subclass of DPAverageQuery.

Python3 tutorial: TypeError

There is a new issue after applying the fix in #4 . After finishing the first epoch:

Test accuracy after 1 epochs is: 0.715
Traceback (most recent call last):
  File "mnist_dpsgd_tutorial.py", line 184, in <module>
    tf.app.run()
  File "/home/leon/anaconda3/envs/privacy2/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "mnist_dpsgd_tutorial.py", line 178, in main
    eps = compute_epsilon(epoch * steps_per_epoch)
  File "mnist_dpsgd_tutorial.py", line 156, in compute_epsilon
    orders = [1 + x / 10. for x in range(1, 100)] + range(12, 64)
TypeError: can only concatenate list (not "range") to list

Edit: There's a PR for this: #3

mnist_dpsgd_tutorial_keras error

When I run the mnist_dpsgd_tutorial_keras.py, get a error "ValueError: Dimension size must be evenly divisible by 250 but is 1 for 'training/TFOptimizer/Reshape' (op: 'Reshape') with input shapes: [], [2] and with input tensors computed as partial shapes: input[1] = [250,?]."

Error using MNIST dataset parameters are same as given in MNIST DPSGD

I have just run the mnist_dpsgd_tutorial_keras.py file and I got this error.
kindly reply asap.
thanks in advance

WARNING:tensorflow:From c:\users\tom-16s1\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From D:\DP\privacy\dp_query\gaussian_query.py:49: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Train on 60000 samples, validate on 10000 samples
WARNING:tensorflow:From c:\users\tom-16s1\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\ops\array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
c:\users\tom-16s1\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\ops\gradients_impl.py:110: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Epoch 1/60

InvalidArgumentError Traceback (most recent call last)
in ()
----> 1 tf.app.run()

c:\users\tom-16s1\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\platform\app.py in run(main, argv)
123 # Call the main function, passing through any arguments
124 # to the final program.
--> 125 _sys.exit(main(argv))
126

in main(unused_argv)
49 epochs=FLAGS.epochs,
50 validation_data=(test_data, test_labels),
---> 51 batch_size=FLAGS.batch_size)
52
53 # Compute the privacy budget expended.

c:\users\tom-16s1\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, max_queue_size, workers, use_multiprocessing, **kwargs)
878 initial_epoch=initial_epoch,
879 steps_per_epoch=steps_per_epoch,
--> 880 validation_steps=validation_steps)
881
882 def evaluate(self,

c:\users\tom-16s1\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\keras\engine\training_arrays.py in model_iteration(model, inputs, targets, sample_weights, batch_size, epochs, verbose, callbacks, val_inputs, val_targets, val_sample_weights, shuffle, initial_epoch, steps_per_epoch, validation_steps, mode, validation_in_fit, **kwargs)
327
328 # Get outputs.
--> 329 batch_outs = f(ins_batch)
330 if not isinstance(batch_outs, list):
331 batch_outs = [batch_outs]

c:\users\tom-16s1\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\keras\backend.py in call(self, inputs)
3074
3075 fetched = self._callable_fn(*array_vals,
-> 3076 run_metadata=self.run_metadata)
3077 self._call_fetch_callbacks(fetched[-len(self._fetches):])
3078 return nest.pack_sequence_as(self._outputs_structure,

c:\users\tom-16s1\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\client\session.py in call(self, *args, **kwargs)
1437 ret = tf_session.TF_SessionRunCallable(
1438 self._session._session, self._handle, args, status,
-> 1439 run_metadata_ptr)
1440 if run_metadata:
1441 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

c:\users\tom-16s1\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\framework\errors_impl.py in exit(self, type_arg, value_arg, traceback_arg)
526 None, None,
527 compat.as_text(c_api.TF_Message(self.status.status)),
--> 528 c_api.TF_GetCode(self.status.status))
529 # Delete the underlying status object from memory otherwise it stays alive
530 # as there is a reference to status from this from the traceback due to

InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [250,10] and labels shape [2500]
[[{{node loss/dense_1_loss/CategoricalCrossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]

Python3 rdp_accountant: "name 'long' is not defined"

rdp_accountant.py contains an assert statement assert isinstance(alpha, (int, long)). However, Python3 doesn't have a long type anymore.

Error message:

Traceback (most recent call last):
  File "mnist_dpsgd_tutorial.py", line 184, in <module>
    tf.app.run()
  File "/home/leon/anaconda3/envs/privacy2/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "mnist_dpsgd_tutorial.py", line 178, in main
    eps = compute_epsilon(epoch * steps_per_epoch)
  File "mnist_dpsgd_tutorial.py", line 161, in compute_epsilon
    orders=orders)
  File "/home/leon/Ferrum/machine_learning_prototypes/privacy_prototype/privacy/privacy/analysis/rdp_accountant.py", line 261, in compute_rdp
    for order in orders])
  File "/home/leon/Ferrum/machine_learning_prototypes/privacy_prototype/privacy/privacy/analysis/rdp_accountant.py", line 261, in <listcomp>
    for order in orders])
  File "/home/leon/Ferrum/machine_learning_prototypes/privacy_prototype/privacy/privacy/analysis/rdp_accountant.py", line 240, in _compute_rdp
    return _compute_log_a(q, sigma, alpha) / (alpha - 1)
  File "/home/leon/Ferrum/machine_learning_prototypes/privacy_prototype/privacy/privacy/analysis/rdp_accountant.py", line 145, in _compute_log_a
    return _compute_log_a_int(q, sigma, int(alpha))
  File "/home/leon/Ferrum/machine_learning_prototypes/privacy_prototype/privacy/privacy/analysis/rdp_accountant.py", line 89, in _compute_log_a_int
    assert isinstance(alpha, (int, long))
NameError: name 'long' is not defined

Edit: Looks like there's a PR for this: #1

Error when reload mnist data and labels

When I run 'train_teachers.py' under Python3 for the first time,
python train_teachers.py --nb_teachers=100 --teacher_id=0 --dataset=mnist --max_steps=1000
datas and labels are downloaded and loaded successfully.
But when run for a second time,
python train_teachers.py --nb_teachers=100 --teacher_id=1 --dataset=mnist --max_steps=1000
the error occur:

Traceback (most recent call last):
  File "train_teachers.py", line 102, in <module>
    tf.app.run()
  File "D:\Users\10066\Anaconda3\envs\nn\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "train_teachers.py", line 99, in main
    assert train_teacher(FLAGS.dataset, FLAGS.nb_teachers, FLAGS.teacher_id)
  File "train_teachers.py", line 60, in train_teacher
    train_data, train_labels, test_data, test_labels = input.ld_mnist()
  File "E:\privacy-master\research\pate_2017\input.py", line 421, in ld_mnist
    train_data = extract_mnist_data(local_urls[0], 60000, 28, 1)
  File "E:\privacy-master\research\pate_2017\input.py", line 285, in extract_mnist_data
    return np.load(file_obj)
  File "D:\Users\10066\Anaconda3\envs\nn\lib\site-packages\numpy\lib\npyio.py", line 416, in load
    magic = fid.read(N)
  File "D:\Users\10066\Anaconda3\envs\nn\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 132, in read
    pywrap_tensorflow.ReadFromStream(self._read_buf, length, status))
  File "D:\Users\10066\Anaconda3\envs\nn\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 100, in _prepare_value
    return compat.as_str_any(val)
  File "D:\Users\10066\Anaconda3\envs\nn\lib\site-packages\tensorflow\python\util\compat.py", line 107, in as_str_any
    return as_str(value)
  File "D:\Users\10066\Anaconda3\envs\nn\lib\site-packages\tensorflow\python\util\compat.py", line 80, in as_text
    return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte

I try to fix this error by making some change in input.py. Original code is:

def extract_mnist_data(filename, num_images, image_size, pixel_depth):
  """
  Extract the images into a 4D tensor [image index, y, x, channels].

  Values are rescaled from [0, 255] down to [-0.5, 0.5].
  """
  # if not os.path.exists(file):
  if not tf.gfile.Exists(filename+".npy"):
    with gzip.open(filename) as bytestream:
      bytestream.read(16)
      buf = bytestream.read(image_size * image_size * num_images)
      data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
      data = (data - (pixel_depth / 2.0)) / pixel_depth
      data = data.reshape(num_images, image_size, image_size, 1)
      np.save(filename, data)
      return data
  else:
    with tf.gfile.Open(filename+".npy", mode='r') as file_obj:
      return np.load(file_obj)


def extract_mnist_labels(filename, num_images):
  """
  Extract the labels into a vector of int64 label IDs.
  """
  # if not os.path.exists(file):
  if not tf.gfile.Exists(filename+".npy"):
    with gzip.open(filename) as bytestream:
      bytestream.read(8)
      buf = bytestream.read(1 * num_images)
      labels = np.frombuffer(buf, dtype=np.uint8).astype(np.int32)
      np.save(filename, labels)
    return labels
  else:
    with tf.gfile.Open(filename+".npy", mode='r') as file_obj:
      return np.load(file_obj)

I delete judgements of existance, as follows:


def extract_mnist_data(filename, num_images, image_size, pixel_depth):
  """
  Extract the images into a 4D tensor [image index, y, x, channels].

  Values are rescaled from [0, 255] down to [-0.5, 0.5].
  """
  # if not os.path.exists(file):
  with gzip.open(filename) as bytestream:
    bytestream.read(16)
    buf = bytestream.read(image_size * image_size * num_images)
    data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
    data = (data - (pixel_depth / 2.0)) / pixel_depth
    data = data.reshape(num_images, image_size, image_size, 1)
    np.save(filename, data)
    return data


def extract_mnist_labels(filename, num_images):
  """
  Extract the labels into a vector of int64 label IDs.
  """
  # if not os.path.exists(file):
  with gzip.open(filename) as bytestream:
    bytestream.read(8)
    buf = bytestream.read(1 * num_images)
    labels = np.frombuffer(buf, dtype=np.uint8).astype(np.int32)
    np.save(filename, labels)
    return labels

It could work, but I'm wondering what cause the error, and also not sure if there is any risk to make such changes.

Add an LSTM model to the tutorial

error during training the student model

@npapernot Hi, I try to run pate_2017 code.
I have succesfully trained the teacher model, but when I train the student model using command "python train_student.py --nb_teachers=100 --dataset=mnist --stdnt_share=5000"
However, there is an error like this:

Traceback (most recent call last):
File "train_student.py", line 208, in
tf.app.run()
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "train_student.py", line 205, in main
assert train_student(FLAGS.dataset, FLAGS.nb_teachers)
File "train_student.py", line 177, in train_student
stdnt_dataset = prepare_student_data(dataset, nb_teachers, save=True)
File "train_student.py", line 111, in prepare_student_data
test_data, test_labels = input.ld_mnist(test_only=True)
File "C:\Users\eleva\privacy\research\pate_2017\input.py", line 386, in ld_mnist
train_data = extract_mnist_data(local_urls[0], 60000, 28, 1)
File "C:\Users\eleva\privacy\research\pate_2017\input.py", line 274, in extract_mnist_data
return np.load(file_obj)
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\numpy\lib\npyio.py", line 416, in load
magic = fid.read(N)
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 132, in read
pywrap_tensorflow.ReadFromStream(self._read_buf, length, status))
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 100, in _prepare_value
return compat.as_str_any(val)
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\tensorflow\python\util\compat.py", line 107, in as_str_any
return as_str(value)
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\tensorflow\python\util\compat.py", line 80, in as_text
return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte

Add support for processing of PrivacyLedger by privacy accountant.

setting value of L2 clipping and noise multiplier

What is the range of L2 clipping and noise multiplier?
I read from the tutorial, it was mentioned to use L2 = [0.3, 1.0] and noise = [0.3, ]
Are these parameters data independent?
Is there any relationship between them?
If I use L2/noise > 10, does it make any sense?

update nest imports to be compatible w/ tf2

I tried to modify "tensorflow/python/keras/engine/training_utils.py" according the keras_tutorial:def get_loss_function():
...

return losses.LossFunctionWrapper(loss_fn, name=loss_fn.name)

return losses.LossFunctionWrapper(loss_fn,

                                name=loss_fn.__name__,

                                reduction=losses_impl.Reduction.NONE)

I found out only tensorflow>1.12 has the function "get_loss_function(loss):".
And only tensorflow2.0 has the line of code"return losses.LossFunctionWrapper(loss_fn, name=loss_fn.name)". So I guess the tutorial use tensorflow2.0.0?

But When import privacy.analysis. privacy_ledger.py, there are lots of error. like "nest = tf.contrib.framework.nest" since tf2.0 does not have module"contrib".

My question is which tensorflow version should I use if I try to use keras.
Thanks

Keras tutorial is waiting on an internal Keras modification to execute properly

We are waiting for Keras' compile method to accept a non-scalar loss so the corresponding tutorial can run without making changes to the tensorflow library.

See b/124011218

mnist_dpsgd_tutorial_keras.py getting accuracies of about 10%

When running mnist_dpsgd_tutorial_keras.py the accuracies are only about 10%, however the standard model without DPGradientDescentGaussianOptimizer gets close to 100% accuracy. mnist_dpsgd_tutorial.py with DP however gives close to 95% accuracy.

Can TF Privacy be used for generating synthetic data?

Congrats on today's launch! I'm involved in a project to generate a synthetic version of a restricted dataset, to preserve privacy of the original data. A few months ago, we reviewed some of the research around differential privacy and synthetic data, and determined that it was too nascent to try to apply. Instead we've been synthesizing the dataset using simpler ML models like random forests to sequentially sample from the conditional distribution (our data is flat, and basically only has continuous features).

However, TF Privacy hitting the shelves seems like an advance worth us reconsidering our abandonment of DP. Has TF Privacy been used for synthesizing data before? To do it the way we've done it so far, we'd need to be able to predict quantiles, which we've seen some ways to do like this blog post, and TF Probability seems promising too. I've used TF a bit, but we're not experts by any means, so we'd appreciate any guidance you can offer. Thanks!

changes to the Optimizer class API on tf2

Because the Optimizer class was moved and does not have a compute_gradients method anymore:

the check for unchanged compute_gradients does not work anymore on tf2
This will most likely mean we have to make a few changes to the calls we make (e.g., when we call compute_gradients for each microbatch). I haven't investigated this yet

Add more information about 'orders' to walkthrough.md

I'm working through the tutorial in walkthrough.md, and there's currently no explanation of how 'orders' are generated, in the section where the RDP is computed. I'd appreciated it if you could add a definition for 'orders' and how to calculate them for new datasets. Thanks!

rdp_accountant.py 　　Function understanding

I'm new to DP, it's hard for me to understand some functions especially '_compute_rdp' , '_compute_log_a'
is there some papers i can reference? Thanks!:)

Some details about implements

I have known the reason about why you add noise after accumulating all gradients in one batch, but I wonder why you use'Query' which sounds like an operation in database to accumulate gradients?
In GaussianSumQuery class, there is variable called 'global_state', but during debug stage, it is always None?
Thanks

consider making microbatches an optional argument to the optimizers and set its default value to minibatch size

tf.train.AdagradOptimizer not found with tensorflow 2.0

I know you fixed the tf.nest problem. But tf 2.0 also removed tf.train.AdagradOptimizer. Could you please do an update for this one as well?
here is the error log:
Traceback (most recent call last):
File "mnist_dpsgd_tutorial_keras.py", line 55, in
from privacy.optimizers.dp_optimizer import DPGradientDescentOptimizer
File "/data/repositories/privacy/privacy/optimizers/dp_optimizer.py", line 187, in
DPAdagradOptimizer = make_optimizer_class(tf.train.AdagradOptimizer)
AttributeError: module 'tensorflow._api.v2.train' has no attribute 'AdagradOptimizer'

Rényi DP Accountant - Sampled Gaussian Mechanism - Reference?

Hi all,

I was wondering whether there was a reference for the Rényi DP accountant, more specifically, the code applied when sub-sampling is involved? I saw this issue: #13

and was wondering whether the paper had been released yet. If not, would it be possible to share a draft?

Best,

Failure to train using dp_optimizer, run ok for non private optimizer

Hi all,
I'm training tensorflow privacy on APS dataset.
The code ran without error for non private mode.
But when I set flag for dpsgd == True. It shows error. I think it must be something with the vector_loss which is the only different between the 2.

The errors are listed below:
Traceback (most recent call last):
File "aps_log_reg.py", line 210, in
app.run(main)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "aps_log_reg.py", line 198, in main
model.train(input_fn, steps=step_per_epoch)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1154, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1112, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "aps_log_reg.py", line 145, in model_fn
global_step=global_step)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow/python/training/optimizer.py", line 403, in minimize
grad_loss=grad_loss)
File "/Users/rachelton/DP/privacy/privacy/optimizers/dp_optimizer.py", line 170, in compute_gradients
_, sample_state = tf.while_loop(cond_fn, body_fn, [idx, sample_state])
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3556, in while_loop
return_same_structure)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3087, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3022, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/Users/rachelton/DP/privacy/privacy/optimizers/dp_optimizer.py", line 168, in
body_fn = lambda i, state: [tf.add(i, 1), process_microbatch(i, state)] # pylint: disable=line-too-long
File "/Users/rachelton/DP/privacy/privacy/optimizers/dp_optimizer.py", line 149, in process_microbatch
sample_params, sample_state, grads_list)
File "/Users/rachelton/DP/privacy/privacy/dp_query/dp_query.py", line 159, in accumulate_record
preprocessed_record = self.preprocess_record(params, record)
File "/Users/rachelton/DP/privacy/privacy/analysis/privacy_ledger.py", line 250, in preprocess_record
return self._query.preprocess_record(params, record)
File "/Users/rachelton/DP/privacy/privacy/dp_query/normalized_query.py", line 74, in preprocess_record
return self._numerator.preprocess_record(params, record)
File "/Users/rachelton/DP/privacy/privacy/dp_query/gaussian_query.py", line 100, in preprocess_record
preprocessed_record, _ = self.preprocess_record_impl(params, record)
File "/Users/rachelton/DP/privacy/privacy/dp_query/gaussian_query.py", line 96, in preprocess_record_impl
clipped_as_list, norm = tf.clip_by_global_norm(record_as_list, l2_norm_clip)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/clip_ops.py", line 278, in clip_by_global_norm
constant_op.constant(1.0, dtype=use_norm.dtype) / clip_norm)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/math_ops.py", line 812, in binary_op_wrapper
return func(x, y, name=name)
File "/Users/rachelton/Library/Python/3.7/lib/python/site-packages/tensorflow/python/ops/math_ops.py", line 912, in _truediv_python3
(x_dtype, y_dtype))
TypeError: x and y must have the same dtype, got tf.float64 != tf.float32

my code:
`
from future import absolute_import
from future import division
from future import print_function

from absl import app
from absl import flags
from distutils.version import LooseVersion
from privacy.analysis import privacy_ledger
from privacy.analysis.rdp_accountant import compute_rdp_from_ledger
from privacy.analysis.rdp_accountant import get_privacy_spent
from privacy.optimizers import dp_optimizer

import matplotlib.pyplot as plt
import tensorflow as tf
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.preprocessing import MaxAbsScaler

if LooseVersion(tf.version) < LooseVersion('2.0.0'):
GradientDescentOptimizer = tf.train.GradientDescentOptimizer
else:
GradientDescentOptimizer = tf.optimizers.SGD # pylint: disable=invalid-name

FLAGS = flags.FLAGS

flags.DEFINE_boolean(
'dpsgd', True, 'If True, train with DP-SGD. If False, '
'train with vanilla SGD.')
flags.DEFINE_float('learning_rate', 0.1, 'Learning rate for training')
flags.DEFINE_float('noise_multiplier', 1.1,
'Ratio of the standard deviation to the clipping norm')
flags.DEFINE_float('l2_norm_clip', 1.0, 'Clipping norm')
flags.DEFINE_integer('batch_size', 128, 'Batch size')
flags.DEFINE_integer('epochs', 1, 'Number of epochs')

flags.DEFINE_integer('num_steps', 1000, 'Number of steps')

flags.DEFINE_integer('num_classes', 2, 'Number of classes')
flags.DEFINE_integer('microbatches', 128, 'Number of microbatches ''(must evenly divide batch_size)')
flags.DEFINE_string('model_dir', None, 'Model directory')

class EpsilonPrintingTrainingHook(tf.estimator.SessionRunHook):
"""Training hook to print current value of epsilon after an epoch."""

def __init__(self, ledger):
	"""Initalizes the EpsilonPrintingTrainingHook.
	Args:
	ledger: The privacy ledger.
	"""
	self._samples, self._queries = ledger.get_unformatted_ledger()

def end(self, session):
	orders = [1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64))
	samples = session.run(self._samples)
	queries = session.run(self._queries)
	formatted_ledger = privacy_ledger.format_ledger(samples, queries)
	rdp = compute_rdp_from_ledger(formatted_ledger, orders)
	eps = get_privacy_spent(orders, rdp, target_delta=1e-5)[0]
	print('For delta=1e-5, the current epsilon is: %.2f' % eps)

def get_data():

df_train = pd.read_csv('data_original/aps_failure_training_set.csv')
df_test = pd.read_csv('data_original/aps_failure_test_set.csv')
# 
df_train.replace('na','-1', inplace=True)
df_test.replace('na','-1', inplace=True)
# categorical for label: 0: neg, 1: pos
df_train['class'] = pd.Categorical(df_train['class']).codes
df_test['class']  = pd.Categorical(df_test['class']).codes

# split data into x and y
Y_train = df_train['class'].copy(deep=True)
X_train = df_train.copy(deep=True)
X_train.drop(['class'], inplace=True, axis=1)

Y_test = df_test['class'].copy(deep=True)
X_test = df_test.copy(deep=True)
X_test.drop(['class'], inplace=True, axis=1)

# strings to float
X_train = X_train.astype('float64')
X_test  = X_test.astype('float64')

# scale the dataset
scaler = MaxAbsScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test  = scaler.transform(X_test)

return X_train, Y_train, X_test, Y_test

def linear_layer(x_dict):
x = x_dict['images']
out_layer = tf.keras.layers.Dense(FLAGS.num_classes).apply(x)
return out_layer

def model_fn(features, labels, mode):
logits = linear_layer(features)

# vector loss: each component of the vector correspond to an individual training point and label.
# Use for per example gradient later.
vector_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
	labels=tf.cast(labels, dtype=tf.int64))


scalar_loss = tf.reduce_mean(vector_loss)
print('*******************')
print(vector_loss.dtype)
print(scalar_loss.dtype)
if mode == tf.estimator.ModeKeys.TRAIN:

	if FLAGS.dpsgd:
		ledger = privacy_ledger.PrivacyLedger(
			population_size=60000,
			selection_probability=(FLAGS.batch_size / 60000))

		optimizer = dp_optimizer.DPGradientDescentGaussianOptimizer(
			l2_norm_clip=FLAGS.l2_norm_clip,
			noise_multiplier=FLAGS.noise_multiplier,
			num_microbatches=FLAGS.microbatches,
			ledger=ledger,
			learning_rate=FLAGS.learning_rate)
		training_hooks = [
			EpsilonPrintingTrainingHook(ledger)
		]
		opt_loss = vector_loss
	else:
		optimizer = tf.train.GradientDescentOptimizer(learning_rate=FLAGS.learning_rate)
		opt_loss = scalar_loss
		training_hooks = []
	global_step = tf.train.get_global_step()		
	train_op = optimizer.minimize(loss=opt_loss,
		global_step=global_step)
	return tf.estimator.EstimatorSpec(mode=mode,
		loss=scalar_loss,
		train_op=train_op,
		training_hooks=training_hooks)
elif mode == tf.estimator.ModeKeys.EVAL:
	pred_classes = tf.argmax(logits, axis=1)
	acc_op = tf.metrics.accuracy(labels=labels, predictions=pred_classes)
	return tf.estimator.EstimatorSpec(mode=mode,
		loss=scalar_loss,
		eval_metric_ops={'accuracy':acc_op})
#if mode == tf.estimator.ModeKeys.PREDICT:

return tf.estimator.EstimatorSpec(mode, predictions=pred_classes)

def main(unused_argv):

tf.logging.set_verbosity(tf.logging.INFO)
if FLAGS.dpsgd and FLAGS.batch_size % FLAGS.microbatches != 0:
	raise ValueError('Number of microbatches should divide evenly batch_size')

# get data: train_data, train_label, test_data, test_label
x_train, y_train, x_test, y_test = get_data()
# print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)

# Init estimator
# model_fn, model_dir
model = tf.estimator.Estimator(model_fn)

# define train input
input_fn = tf.estimator.inputs.numpy_input_fn(
	x={'images':x_train},
	y=y_train,
	batch_size=FLAGS.batch_size,
	num_epochs=None,
	shuffle=True)

eval_input_fn = tf.estimator.inputs.numpy_input_fn(
	x={'images': x_test},
	y=y_test,
	batch_size=FLAGS.batch_size,
	shuffle=False)


step_per_epoch = 60000 // FLAGS.batch_size
# train model on train input
for epoch in range(FLAGS.epochs):
	model.train(input_fn, steps=step_per_epoch)

if name == "main":
app.run(main)
`

flags have moved in tf2

BoostedTreesClassifier

Does TF Privacy support tree-based classification models for example: tf.estimator.BoostedTreesClassifier?
Since, as per my understanding the existing DP optimizers are all Gradient Descent based.

Remove magical numbers in tutorials

error during training teachers

in the pate_2017, when i try to train teachers ,a error occured : in the deep_cnn.py , there is a module "from differential_privacy.multiple_teachers import utils", i thought it should be changed into "from pate_2017 import utils" since folder have changed , if i am wrong ,please let me know ,thx.

Add expected output to the tutorial.

It'd be useful to have the first few lines of the tutorial's output and include some target accuracy numbers.

Pip Install error

As far as I know, pip install requires a setup.py file.
Running the command given in the "Installing TensorFlow Privacy" (pip install -e ./privacy)
throws the following error:

Directory './privacy' is not installable. File 'setup.py' not found.

Am I missing something here?

Document orders list in eps computation

No module named "tensorflow_privacy"

4 hours ago a change was made to privacy/privacy/init.py. I'm using Google Colab for a project and this change leads to a module error: 'No module named "tensorflow_privacy' ". Removing all the "tensorflow_privacy." corrects it.

shape problem for Keras example

tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [250,10] and labels shape [2500]

it looks different from other issues like: #17 and #27

Fails to train a Keras non-sequential model with DP optimizer

For a non-sequential model if number of examples is not a multiple of number of microbatches (see the data file below) the training with DPAdamOptimizer fails.

Numpy 1.14.4
Tensorflow 1.13.1

Error:

 6/13 [============>.................] - ETA: 0s - loss: 6.5262Traceback (most recent call last):
  File "/home/andrey/Documents/experiments/dp.py", line 114, in <module>
    epochs=epochs)
  File "/home/andrey/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 880, in fit
    validation_steps=validation_steps)
  File "/home/andrey/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 329, in model_iteration
    batch_outs = f(ins_batch)
  File "/home/andrey/.local/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3076, in __call__
    run_metadata=self.run_metadata)
  File "/home/andrey/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/home/andrey/.local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 1 values, but the requested shape requires a multiple of 3
	 [[{{node training/TFOptimizer/Reshape}}]]
         [[{{node loss/mul}}]]

Code:

import tensorflow as tf
from tensorflow.keras import losses
from privacy.optimizers.dp_optimizer import DPAdamOptimizer
from privacy.optimizers.gaussian_query import GaussianAverageQuery

batch_size = 6
microbatches = 3

noise_multiplier = 1.1
l2_norm_clip = 1.0
epochs = 30

names = pd.read_csv('names.csv', delimiter=',')
names = names['ch1'].values
names.shape = names.shape + (1,)

my_input = tf.keras.layers.Input(shape=(1,))
my_dense = tf.keras.layers.Dense(7)(my_input)
model = tf.keras.Model(my_input, my_dense)

dp_average_query = GaussianAverageQuery(l2_norm_clip, l2_norm_clip * noise_multiplier,        microbatches)

optimizer = DPAdamOptimizer(dp_average_query,microbatches,learning_rate=learning_rate,    unroll_microbatches=True)

loss = losses.CategoricalCrossentropy(from_logits=False, reduction=tf.losses.Reduction.NONE)

model.fit(x=names,y=names,batch_size=batch_size,epochs=epochs)

Dataset - names.csv:

id,ch1
1,1
2,3
3,4
4,1
5,2
6,3
7,3
8,1
9,1
10,2
11,3
12,4
13,4

If there is only 12 examples in the dataset, the model trains for batch_size = 6 and microbatches = 3

amount of added noise for low number of microbatches is too harmful for performance.

Sorry, it's probably a stupid question. If I understand correctly reducing number of microbatches from 256 to 32 will result in total norm of the sum to be less than 32S, whereas for 256 batches it should be less than 256S. Applying same amount of noise zS on a vector of a smaller norm results in a poorer performance. Can't we treat the gradient of each microbatch as a sum of gradients for k individual images and therefore clip it by kS, instead of mean and clip it by S?