advancedphotonsource / tike Goto Github PK
View Code? Open in Web Editor NEWRepository for ptychography software
Home Page: http://tike.readthedocs.io
License: Other
Repository for ptychography software
Home Page: http://tike.readthedocs.io
License: Other
Can we use real probes from a ptychography experiment to model data in Jupiter notebook examples? Flat-top gaussian is not a good approximation of real probes. So we cannot correctly test the developed functionality.
Computing the full cost function requires communications and reductions overhead which may not be needed.
According to the manuscript, the least-squares grad common gradients denominator should be the norm across all batches not just the current batch. This is probably implemented incorrectly and needs to be fixed.
i.e. the inputs need to be checked for NaNs, negative numbers, INF, etc.
I seems that restarting a reconstruction has a memory overhead that increases with each restart. This causes out of memory error. For example, restarting the reconstruction 10 times with 10 iterations each time consumes more memory than doing one reconstruction with 100 iterations.
The least-squares solver for maximum-likelihood ptychography proposed two ways to partition batches in the reconstruction: sparse and compact.
Which method is currently in tike? Is there a plan to support both of them?
Doga wants to reorder the input parameters for some of the functions.
I hope numpy should have some predefined constant for the minimal float32 value greater than 0. So we can replace 1e-32 everywhere
https://numpy.org/doc/stable/reference/generated/numpy.finfo.html?highlight=finfo#numpy.finfo
Currently, the Laminography operator has a fixes size ratio between projections sampling and field of view sampling. i.e. for projections sized 32x32, the reconstructed volume is 32x32x32 and is sampled in a 64x64x64 frequency space. We should break this fixed ratio so that we can reconstruct downsampled fields of view i.e. 32x32 projections through a 8x8x8 field of view or 32x32 projections in a 32x32x32 frequency space for 32x32x32 field of view.
What regridding algorithm to use though?
I would suggest having a separate module for cost functions and for gradients. Any cost function and gradient only need fwd and adj operators to be computed. So users can define their own cost functions and gradients. It would be nice to have def cost_gaussian(fwd,adj,...), def gradient_gaussian(fwd,adj,..),etc. I think the same concept is used in ODL.https://odlgroup.github.io/odl/ @nikitinvv
Cuda kernels in usfft.cu, convolution.cu are not optimal.
They are heavy and use a huge number of registers which may significantly slowdown the code. They should be split into smaller ones.
If there exist 'if,else' statement like for fwd/adj operator then it is better to split it into 2, like it was at the beginning.
Loops with many iterations inside the kernels should be avoided by adding new threads, this way allowing the scheduler to switch threads more efficiently, like it was at the beginning
Non-sequential and uncoallesced memory access in the loops make the code slower, the following loop structure is unacceptable from cuda optimization point of view. It also looks unreadable.
// for each image
for (int ti = blockIdx.z; ti < nimage; ti += gridDim.z) {
// for each scan position
for (int ts = blockIdx.y; ts < nscan; ts += gridDim.y) {
....
// for x,y coords in patch
for (int py = blockIdx.x; py < patch_shape; py += gridDim.x) {
for (int px = threadIdx.x; px < patch_shape; px += blockDim.x) {
Use 3d thread blocks and grids associated with each array dimension, like it was at the beginning. This will give you natural coalesced memory access.
block = (min(self.scatter_kernel.max_threads_per_block, (2 * m)**3),)
Say m=3, then block = 216 - not optimal.
Would It be better to return to my initial kernels implementations?
Reduce the number of cost function calls?
Tweak the step size parameters?
http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#substitutions
http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#substitution-definitions
We can use substitutions to only write docstrings once for parameters that are used for multiple functions:
Example:
"""Do something module
.. |theta_docstring| replace:: This text describes theta for all functions.
"""
import numpy
def func_a(theta):
"""Something
Parameters
---------------
theta :
|theta_docstring|
"""
def func_b(theta):
"""Something
Parameters
---------------
theta :
|theta_docstring|
"""
This approach is beneficial because it can reduce source code length and makes maintenance easier because updating one string updates many strings.
Add some tests for a poor initial probe guess.
We want to set up a system for logging information about the progress of reconstruction algorithms such as convergence criteria and which number of iterations have been completed. This system should probably use the standard logger of python and print statements for storing information optionally.
Instead of changing the API of tike
to accommodate development, we can use the following tools to capture data in text files and display messages to stout
.
Here's an example that demonstrates these concepts.
dev_logging.py:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Define an example of the logging system to be used in tike."""
import numpy as np
import logging
# First set the logging level to logging.INFO.
logging.basicConfig(level=logging.INFO)
# Then we need to grab a reference to the existing logger or make a new one.
logger = logging.getLogger(__name__)
if __name__ == '__main__':
# Go to https://pyformat.info/ for a summary of how to use format()
print("{} is a random number.\nThis string will go to the standard output "
"stream".format(np.random.randint(0, 100)))
logger.info("This is an info level statement.\nLogging statements go to "
"standard error.")
logger.debug("This is a debugging level statement.\nIt appears when the "
"logging level is det to DEBUG or higher.")
if __debug__:
print("This statement appears if Python was unoptimized.\n"
"i.e. if it was started without the -O option.")
logger.info("Good bye!")
Run the script in a bash terminal:
$ python -O dev_logging.py > some_data.txt
INFO:__main__:This is an info level statement.
INFO:__main__:Good bye!
some_data.txt:
46 is a random number.
This string will go to the standard outputstream
There is a bug in the example where the cost function printed is empty.
There are a lot of these types of switches in the solver implementations. The Comm API should be reworked to avoid these switches.
if comm.use_mpi:
cost = comm.Allreduce_reduce(cost, 'cpu')
else:
cost = comm.reduce(cost, 'cpu')
It looks like the largest memory allocation events are for fftshift
, fftplan
, and pad
.
While cupy.fft.fftn
has an overwrite_x
parameter, fftshift
does not. This is because implementation of fftshift
is completely written in Python and relies on another Python-only implementation of numpy.roll
.
We should write our own in-place fftshift
, ifftshift
, and roll
functions using a CUDA Kernel because I don't think the NumPy API was designed to handle in-place reordering of dimensions.
There is a CuPy option to avoid plans and use a series of 1D plans. These plans appear to be 686MiB, which seems quite large, but maybe they are worth it.
This padding operation in eq2us
may be avoidable if we do some modular arithmetic in the gather function? Likewise, there is an _unpad
function which could be avoided after scatter
if this function used modular arithmetic.
Fe = xp.pad(Fe0, m, mode='wrap')
F = gather(xp, Fe, x, n, m, mu)
If the detector and probe sizes are set the same, ptycho.simulate
fails.
Currently, linear interpolation in the near plane domain is performed in order to handle floating-point scan positions. Another and accurate way of doing so is to perform shifts in the frequency domain. Say a scan position is given as s = si+sf where si is integer part and sf is floating part. We need to compute the Fourier transform of Q\psi(x+si+sf). By using Fourier properties, F(Q\psi(x+si+sf))(\xi)=F(Q\psi(x+si))(\xi) * exp(-2\pi i \xi sf), and the floating-point position problem boils down to multiplication in the frequency domain.
It would be good to compare two approaches (linear interpolation and multiplication in the frequency domain) in terms of accuracy and performance on CPU and GPU.
In order to avoid amplitude ambiguity in psi and keep it 1 on average, on each iteration we can compute |FQψ| and compare its norm with the norm of data. Then compute the constant c = norm(|FQψ) / norm(sqrt(data)) and do Q/=c. So this way we simply replace prb by prb/c and ψ by ψc, which does not affect the functional |FQ/c ψc| = |FQψ|). @nikitinvv
A lot of ptycho-tomography/laminography data we acquired recently are from thick objects for which the single-slice ptychography model no longer holds. We need to incorporate the multislice model to get better reconstruction.
An ideal interface would be something like this:
import ptychopy
# Function accepts inputs such as integers, tuples, and numpy arrays
ptychopy.epieinit(
jobID='sim512',
beamSize=110e-9,
scanDims=(30, 30),
step=(50e-9, 50e-9),
i=60,
size=512,
lamda=2.4796837508399954e-10,
dx_d=172e-6,
z=1,
simulate=1,
blind=0,
)
ptychopy.epiestep(its=3)
data = ptychopy.epieresobj()
ptychopy.epiepost()
def epieinit(psi, probe, data, scan, ....):
"""Initializes a reconstruction for the ePIE algorithm.
Parameters
----------
psi : (..., nz, n) complex64
The complex wavefront modulation of the object.
probe (..., 1, 1, nprobe, probe_shape, probe_shape) : complex64
The complex illuminatio function.
data (..., nframe, detector_shape, detector_shape) : float32
The square of the absolute value of `farplane` summed over `fly` and
`modes`.
scan : (..., nscan, 2) float32
Coordinates of the minimum corner of the probe grid for each
measurement in the coordinate system of psi. Vertical coordinates
first, horizontal coordinates second.
"""
pass
def epiestep(num_iter=1):
"""Execute one step of ePIe"""
pass
35.249 <module> profile_admm.py:3
└─ 35.225 admm_profile_workload profile_admm.py:59
├─ 29.160 admm tike/tike.py:88
x │ ├─ 15.025 forward tike/tomo.py:256
│ ├─ 12.937 reconstruct tike/ptycho.py:433
│ │ └─ 12.929 grad tike/ptycho.py:345
│ │ ├─ 1.873 combine_grids tike/ptycho.py:204
│ │ │ └─ 1.616 shift scipy/ndimage/interpolation.py:474
x │ │ ├─ 1.768 angle numpy/lib/function_base.py:1914
│ │ ├─ 1.688 uncombine_grids tike/ptycho.py:274
│ │ │ └─ 1.535 shift scipy/ndimage/interpolation.py:474
│ │ ├─ 1.069 ifft2 numpy/fft_intel/fftpack.py:880
│ │ │ └─ 1.069 ifftn numpy/fft_intel/fftpack.py:692
│ │ ├─ 1.062 fft2 numpy/fft_intel/fftpack.py:791
│ │ │ └─ 1.062 fftn numpy/fft_intel/fftpack.py:594
x │ │ └─ 0.956 pad numpy/lib/arraypad.py:1094
│ │ ├─ 0.525 _append_const numpy/lib/arraypad.py:111
│ │ └─ 0.371 _prepend_const numpy/lib/arraypad.py:77
│ └─ 0.798 reconstruct tike/tomo.py:161
│ └─ 0.793 recon tomopy/recon/algorithm.py:75
│ └─ 0.777 _dist_recon tomopy/recon/algorithm.py:362
│ └─ 0.665 __exit__ concurrent/futures/_base.py:580
│ └─ 0.665 shutdown concurrent/futures/thread.py:133
│ └─ 0.665 join threading.py:1022
│ └─ 0.665 _wait_for_tstate_lock threading.py:1060
tike.forward
with tomopy.forward
np.angle
with np.arctan2
np.pad
with tike.fast_pad
There are still 6 seconds of time inside tike.ptycho.grad
which need to be investigated line-by-line.
Recently I've been using more and more compact batch + momentum accelerations for object and probe updates during the LSQ-ML reconstruction. For some datasets, this combo can significantly improve reconstruction quality.
Add a utility function which computes the probe power from an array of probe modes.
theta, h, v
coordinates convention should be changed to theta, v, h
coordinates because of the following:
h
is a slower axis than v
tomopy
input arraysThis change will remove the ability of ptycho.reconstruct
to process more than one view at a time, and require adding a for
loop around the call to ptycho.reconstruct
in tike.admm
.
This is because our language for describing the probe geometry is not unified between the tomography and ptychography problems. Each is described by only part of the parameters: theta or (v and h).
I think it is strange that our tomography solver accepts 3D input, but our ptychography solver accepts only 2D input. The mismatch between the two is a constraint that I do not like.
tike
has the potential for having a lot of optional dependencies, so instead of importing all external solvers at the module level, let's do lazy imports, where they are only imported if actually needed.
import numpy # importing here requires it to be installed even if never used
def recon(x, algorithm='default'):
"""Reconstruct some stuff."""
if algorithm is 'default':
return numpy.default_method(x)
elif algorithm is 'optional':
from option import optional_method
return option.optional_method(x) # this method is only imported if chosen
We want to reduce the number of parameters for outward facing functions or maybe all functions in general by packing the long lists of parameters into topical dictionaries. For examples:
Parameters theta
, v
, h
could become trajectory
by defining
trajectory = {
'theta' : None,
'v' : None,
'h' : None,
}
This kind of packing will definitely be employed for the reconstruction parameters of the tomo.reconstruct
and pytcho.reconstruct
because those need to be passed through tike.admm
.
Allow setting the random seed for repeatability.
1 Need to add more functionality in ptychopy.c.
2 The new calling sequence would be different from the original one, keeping the original one to keep the ability for doing reconstruction from passing a file path.Ex, ePIE_File(), ePIE_Array();
3 Two sets of API, one set is for step mode to check the middle results. Another set could be for the whole reconstruction process.
Enforce the use of 32bit data types in order to make the creation plugins and extensions easier. Also, reduce the the complexity of tike.utils; I think there is some redundancy that could be removed.
Writing functions that specifically CuPy is not necessary anymore after NumPy 1.17 because of the array function protocol. Due to this API feature, NumPy functions that are passed CuPy arrays will automatically call the corresponding CuPy functions instead of trying to convert them to NumPy arrays.
https://numpy.org/neps/nep-0018-array-function-protocol.html
The number of batches, is a much more natural way of expressing how to divide something in the context of multiple-devices. Given the size of a batch, you have to do some math to divide the size between all the devices.
However, the real problem is that these two methods of expressing the same thing: num_batch vs batch_size are both used which is confusing for users.
This issue is to discuss if we need a version control on Notebooks in the repository.
The Notebooks are meant to be interactive, which means we encourage users to play with them. So it is a question if we need versioning for these files? I'd rather have static Python scripts for tutorials that devs have control over and optimize in time, and have ipynb extensions listed in the .gitignore file.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.