Image Analysis Workflows, servicing NIAID's "Hedwig" project.
Please see our Sphinx Docs for details.
Workflows related to the project previously referred to as "Hedwig".
License: BSD 3-Clause "New" or "Revised" License
The virus checker is breaking the clean-up step; this might be irrelevant after the tmp-dirs issue is implemented.
Test latency for file IO via NFS, reading and writing files from RML machines and FL machines.
Also tell Seva that the IP has changed.
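A quick way to measure that latency is a throwaway timing probe run from each machine against the shared mount. This is a sketch; `time_io`, the probe filename, and the payload size are invented for illustration, and the directory should be pointed at the actual NFS mount rather than the local temp dir used here:

```python
import os
import tempfile
import time

def time_io(dir_path: str, size_bytes: int = 1024 * 1024):
    """Write then read one file of size_bytes in dir_path; return (write_s, read_s)."""
    payload = os.urandom(size_bytes)
    path = os.path.join(dir_path, "latency_probe.bin")
    t0 = time.perf_counter()
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # force the write through to the (NFS) server
    write_s = time.perf_counter() - t0
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    read_s = time.perf_counter() - t0
    os.remove(path)
    assert data == payload  # sanity check the round trip
    return write_s, read_s

# Replace gettempdir() with the NFS path visible from both RML and FL machines.
w, r = time_io(tempfile.gettempdir())
print(f"write: {w:.4f}s  read: {r:.4f}s")
```

Running the same probe from an RML node and an FL node against the same mount gives a rough per-site comparison; repeat a few times, since NFS caching makes the first read atypical.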
Need QA and Prod accounts; use my user account for the time being.
Singularity with GPU support is now available on BigSky. I still need to write the user-facing documentation. The singularity command is available on all compute and GPU nodes, but not the login nodes (ai-rmlsbatch1/2/3).
Here's a quick example of how to pull down TensorFlow and check that it can see the allocated GPUs.
[andrew@ai-rmlsbatch2 ~]$ srun --mem=64G -c 4 singularity pull docker://tensorflow/tensorflow:latest-gpu
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Getting image source signatures
Copying blob sha256:35807b77a593c1147d13dc926a91dcc3015616ff7307cc30442c5a8e07546283
Copying blob sha256:1ad5f8a5dfc8f6de8b3c0cfc6cca51ab00cde78ed6a2ff85f7bd21a70a9873c6
Copying blob sha256:896d09e7c91fe5894814fe2a66547b6f646e51c4a06f9dba26d189503059ffb0
Copying blob sha256:3ecacd24229f8af1c730109c559cf633da56416e7a29afc1d8f3b41ec4b253aa
Copying blob sha256:653a1d0c4ca99c7fdfe718718575a7afbadc0f16aad069b7ccb9056a8e73c4cb
Copying blob sha256:941a7e13b559568555600eef3f774e1a6724db55b9a0bde61ff406cdebb5d0f2
Copying blob sha256:64b81251449b595f2141bf378dee9554f08d4fb21cc48e560961dfd24750fc5a
Copying blob sha256:fdc3a7a7495131570921454aad3fa5b2379121e9cef907e8b0894fac11528efd
Copying blob sha256:13a0e6521855f5ab95e498316c6b8a70791320796ca7eaa2bb4d40093ca37864
Copying blob sha256:1ae7fa288f51c2a475eecc3fd8ea623ca38edd7c59aa3c324359fe6c7f311300
Copying blob sha256:d1aa09c367e6b3d470fe5da1ed0546eee0ef3bb96a46a5bdc8ec0612cdbe7b8a
Copying blob sha256:811975ef9c3b83e3ff5e3fc7c4bf5149898824d5ebd503097d23c3239fa603e8
Copying blob sha256:e7f4ebe0a303b165f67f327dced320356f2edc78a4dc30d5f16096b81fc64698
Copying blob sha256:9af2616478318f35e0af3c5a7763a9dc0052528f49dc630016f08577f38f8894
Copying config sha256:0202153ea9c79ca357619a166097e0a67e1846f2bec4fff788df4ffec9f174ca
Writing manifest to image destination
Storing signatures
2021/11/22 15:51:47 info unpack layer: sha256:35807b77a593c1147d13dc926a91dcc3015616ff7307cc30442c5a8e07546283
2021/11/22 15:51:49 info unpack layer: sha256:1ad5f8a5dfc8f6de8b3c0cfc6cca51ab00cde78ed6a2ff85f7bd21a70a9873c6
2021/11/22 15:51:49 info unpack layer: sha256:896d09e7c91fe5894814fe2a66547b6f646e51c4a06f9dba26d189503059ffb0
2021/11/22 15:51:49 info unpack layer: sha256:3ecacd24229f8af1c730109c559cf633da56416e7a29afc1d8f3b41ec4b253aa
2021/11/22 15:51:49 info unpack layer: sha256:653a1d0c4ca99c7fdfe718718575a7afbadc0f16aad069b7ccb9056a8e73c4cb
2021/11/22 15:51:49 info unpack layer: sha256:941a7e13b559568555600eef3f774e1a6724db55b9a0bde61ff406cdebb5d0f2
2021/11/22 15:51:51 warn rootless{usr/lib/x86_64-linux-gnu/gstreamer1.0/gstreamer-1.0/gst-ptp-helper} ignoring (usually) harmless EPERM on setxattr "security.capability"
2021/11/22 15:52:26 info unpack layer: sha256:64b81251449b595f2141bf378dee9554f08d4fb21cc48e560961dfd24750fc5a
2021/11/22 15:52:30 info unpack layer: sha256:fdc3a7a7495131570921454aad3fa5b2379121e9cef907e8b0894fac11528efd
2021/11/22 15:52:30 info unpack layer: sha256:13a0e6521855f5ab95e498316c6b8a70791320796ca7eaa2bb4d40093ca37864
2021/11/22 15:52:31 info unpack layer: sha256:1ae7fa288f51c2a475eecc3fd8ea623ca38edd7c59aa3c324359fe6c7f311300
2021/11/22 15:52:31 info unpack layer: sha256:d1aa09c367e6b3d470fe5da1ed0546eee0ef3bb96a46a5bdc8ec0612cdbe7b8a
2021/11/22 15:52:31 info unpack layer: sha256:811975ef9c3b83e3ff5e3fc7c4bf5149898824d5ebd503097d23c3239fa603e8
2021/11/22 15:52:48 info unpack layer: sha256:e7f4ebe0a303b165f67f327dced320356f2edc78a4dc30d5f16096b81fc64698
2021/11/22 15:52:48 info unpack layer: sha256:9af2616478318f35e0af3c5a7763a9dc0052528f49dc630016f08577f38f8894
INFO: Creating SIF file...
[andrew@ai-rmlsbatch2 ~]$ srun -N 1 -n 1 -c 4 -p gpu --gres=gpu:1 --mem=64G --pty singularity run --nv tensorflow_latest-gpu.sif
[TensorFlow ASCII-art banner]
You are running this container as user with ID 2002 and group 2002,
which should map to the ID and group for your user on the Docker host. Great!
Singularity> python
Python 3.8.10 (default, Sep 28 2021, 16:10:42)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
2021-11-22 15:57:42.040062: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-11-22 15:57:47.752082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /device:GPU:0 with 30985 MB memory: -> device: 0, name: Tesla V100-PCIE-32GB, pci bus id: 0000:3b:00.0, compute capability: 7.0
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 7850956048544687292
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 32490651648
locality {
bus_id: 1
links {
}
}
incarnation: 11985504320836316292
physical_device_desc: "device: 0, name: Tesla V100-PCIE-32GB, pci bus id: 0000:3b:00.0, compute capability: 7.0"
xla_global_id: 416903419
]
>>> quit()
Singularity> exit
exit
[andrew@ai-rmlsbatch2 ~]$
It's already under-powered.
Can't reuse existing dirs, as more than one job can be run with the same sample_id.
E.g., Cindy mentioned that she would like to test BRT runs by picking a single tomogram and repeatedly running jobs until some desired result is achieved.
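One way to avoid the collision is to mint a fresh, uniquely suffixed work dir per job rather than keying on sample_id alone. A minimal sketch; `make_work_dir` and the `hedwig_work` base dir are hypothetical names, not taken from the codebase:

```python
import tempfile
from pathlib import Path

def make_work_dir(base: Path, sample_id: str) -> Path:
    """Create a unique work dir even when several jobs share one sample_id."""
    base.mkdir(parents=True, exist_ok=True)
    # mkdtemp appends a random suffix, so concurrent jobs never collide.
    return Path(tempfile.mkdtemp(prefix=f"{sample_id}-", dir=base))

base = Path(tempfile.gettempdir()) / "hedwig_work"
a = make_work_dir(base, "GZH-001")
b = make_work_dir(base, "GZH-001")
assert a != b  # same sample_id, distinct dirs
```

The sample_id stays in the dir name for traceability, while the random suffix makes repeated runs of the same tomogram safe.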
ai-fas12\RMLEMHedwigQA\Projects\RTB\efischer\hansenbry-2021-0317-NIAID-QA\TEM-Tomo-2021-0317-MarchQA\Sept8Test/GZH-001
looks like: https://hedwig2qa.niaid.nih.gov/neuroglancer/UmVjb3JkOjE1OTIwNQ==
E.g., the user "hedwig_bot" needs read-only permissions on inputs, and write permission to a specific directory.
This arg is only ever used to override the IMOD header value.
AFAIK they only ever want to use the header version, so remove it for now; it's confusing, and it can be added back later if required.
Confirm with RTB.
Various files are useful for debugging workflows.
If there's an issue anywhere, copy the complete work dir into the Assets dir.
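A minimal sketch of that drop-on-failure step, assuming a `preserve_work_dir` helper (the name and layout are invented here, not from the repo):

```python
import shutil
import tempfile
from pathlib import Path

def preserve_work_dir(work_dir: Path, assets_dir: Path) -> Path:
    """Copy the whole work dir into the Assets dir so failed runs can be debugged."""
    assets_dir.mkdir(parents=True, exist_ok=True)
    dest = assets_dir / work_dir.name
    # dirs_exist_ok lets a retried job overwrite its earlier partial copy.
    shutil.copytree(work_dir, dest, dirs_exist_ok=True)
    return dest

# Illustrative usage with throwaway dirs:
work = Path(tempfile.mkdtemp(prefix="job-"))
(work / "run.log").write_text("debug info")
saved = preserve_work_dir(work, Path(tempfile.mkdtemp()) / "Assets")
print(saved)
```

Copying (rather than moving) keeps the work dir intact in case the cleanup step still wants to inspect or remove it on its own schedule.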
It's annoying to have to log into Artifactory to get this module.
Alternatively, find a better way than swapping the pip.conf file in and out.
import sys
import SimpleITK as sitk

# Read only the header/metadata, not the whole image.
file_reader = sitk.ImageFileReader()
file_reader.SetFileName(sys.argv[1])
file_reader.ReadImageInformation()
print(f"image size: {file_reader.GetSize()}")
print(f"image spacing: {file_reader.GetSpacing()}")
import SimpleITK as sitk

img = sitk.ReadImage("PrP-Protein.004.tif")

# Find the intensity range of the input.
mm = sitk.MinimumMaximumImageFilter()
mm.Execute(img)

# Linearly map [min, max] onto the uint8 range: out = (in + shift) * scale.
# Note the scale uses 255, not 256, so the maximum lands exactly on 255.
ss = sitk.ShiftScaleImageFilter()
ss.SetOutputPixelType(sitk.sitkUInt8)
ss.SetShift(-mm.GetMinimum())
ss.SetScale(255.0 / (mm.GetMaximum() - mm.GetMinimum()))
out = ss.Execute(img)
print(ss)

# Equivalent one-liner: sitk.Cast(sitk.RescaleIntensity(img, 0, 255), sitk.sitkUInt8)
sitk.WriteImage(out, "sitk_out.tif")
Need the address of a server with a filesystem I can write to.
E.g., I get notified of an input dir, process the inputs, write to my outputs dir, and send a notification about the outputs dir.
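That notify/process/notify loop could be prototyped with simple polling until the real notification mechanism is known; `watch_for_new_dirs` below is an illustrative stand-in, not an existing API:

```python
import time
from pathlib import Path

def watch_for_new_dirs(inbox: Path, poll_s: float = 5.0):
    """Yield input dirs as they appear; a crude stand-in for a real notification."""
    seen = set()
    while True:
        for d in sorted(p for p in inbox.iterdir() if p.is_dir()):
            if d not in seen:
                seen.add(d)
                yield d
        time.sleep(poll_s)

# Hypothetical usage: process each newly delivered input dir as it arrives.
# for job_dir in watch_for_new_dirs(Path("/mnt/inputs")):
#     process(job_dir)  # process() is a placeholder
```

Polling over NFS is cheap at this scale and sidesteps the fact that inotify-style events generally do not propagate across NFS mounts.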
These labels are hard-coded into the flow. Figure out a nicer way of doing this that doesn't hard-code them; preferably use environment variables (similar to the dm pipeline).
\ai-fas12\RMLEMHedwigQA\Projects\RTB\efischer\hansenbry-2021-0317-NIAID-QA\TEM-2D-2021-0716-Format\Neta_DM4
see: /home/macmenaminpe/Downloads/dm_too_dark.png vs dm_ok.png
sets env and registers flow
The job-started callback will fail during transient network outages in NIAID hardware, of which there are a lot.
https://gist.github.com/philipmac/fe577a7f896dc62aac2e8a841700bd8e
Retry network connections on all related tasks.
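If the flow framework's built-in retries aren't sufficient here (Prefect tasks, for instance, accept `max_retries` and `retry_delay`), a plain retry decorator can wrap the callback's network calls. This is a generic sketch, not code from the repo:

```python
import functools
import time

def retry(attempts: int = 3, delay_s: float = 1.0,
          exceptions=(ConnectionError, OSError)):
    """Retry a flaky network call a few times before giving up."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts, surface the real error
                    time.sleep(delay_s)
        return inner
    return wrap
```

Decorating the job-started callback (e.g. `@retry(attempts=5, delay_s=10.0)`) rides out short outages while still failing loudly if the network stays down.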
There may be no need for HPC compute for the time being.
If a reasonably current server exists within RML, then containers etc. can be run directly on it.
Once logging is working with a single file, remove the Slurm logs.
Modify settings per Andrew's suggestions:
cores: 8 # Total number of cores per job
memory: "64 GB" # Total amount of memory per job
#processes: 1 # Number of Python processes per job
#extra: []
worker_extra_args: ["--gres=gpu:1"]