portrait-segmentation's People

Contributors

anilsathyan7


portrait-segmentation's Issues

Dataset used for SlimNet-512 model

Which dataset was used for the SlimNet-512 model? The AISegment dataset on Kaggle seems to have a different directory structure and needs preprocessing. Can you share the dataset directly used by the Colab file for training SlimNet-512?

SINet couldn't convert to tensorflow as frozen graph

Based on the approaches I found online, both conversion paths still fail:
PyTorch -> ONNX -> Keras model (failed)
PyTorch -> ONNX -> TensorFlow pb (got a pb folder, but it could not be loaded)

Can anyone help me out? It would be appreciated!
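For reference, a minimal sketch of the PyTorch -> ONNX -> TensorFlow path (assuming a SINet model instance named sinet and a 224x224 input, which are placeholders; the onnx and onnx-tf packages are required):

    import torch
    import onnx
    from onnx_tf.backend import prepare

    # Export the PyTorch model to ONNX (sinet and the input size are assumptions)
    dummy = torch.randn(1, 3, 224, 224)
    torch.onnx.export(sinet, dummy, "sinet.onnx", opset_version=11,
                      input_names=["input"], output_names=["output"])

    # Convert ONNX to a TensorFlow SavedModel / pb directory
    tf_rep = prepare(onnx.load("sinet.onnx"))
    tf_rep.export_graph("sinet_tf")

If the exported pb folder fails to load, it is often worth checking the opset version and whether all of SINet's ops are supported by onnx-tf.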

An issue on training slim-net model

Hello
How are you?
Thanks for contributing this project.
I tried to train a model with slim-net on the AISegment dataset but hit the following issue.

(long dump of pixel/label values from the error message: 2 226 229 232 235 ... 228 228 228)
[[{{node loss/conv2d_transpose_4_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]

I think that this may be related to the label value range.
I used the entire AISegment dataset.
A mask image in the AISegment dataset is a PNG with 4 channels.
How should I decode this mask image in the dataloader?
Should I load this mask image as grayscale?
I found a strange part in the mask-loading code in your slim512.ipynb.
(image attachment)
Thanks
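A hedged sketch of one way to decode a 4-channel matting PNG into {0, 1} labels (an assumption about the AISegment layout and file name, not the repo's actual dataloader):

    import cv2
    import numpy as np

    # Read the PNG with its alpha channel intact (H x W x 4)
    matte = cv2.imread("matting_example.png", cv2.IMREAD_UNCHANGED)
    alpha = matte[:, :, 3]                   # the matte lives in the 4th (alpha) channel
    mask = (alpha > 127).astype(np.uint8)    # binary {0, 1} labels for the loss

Loading the mask as plain grayscale mixes the RGB channels and ignores the alpha matte, which can easily produce label values outside the expected range.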

Issue on Applying Custom Model on Android

Hi there. I would say this is a good repo you have open-sourced here!
Just want to ask why the app crashes when I use another model?
It only works with your deconv_fin_munet.tflite.

The 'other model' that I mentioned above accepts an input frame of size 224 x 224.
So I changed the declarations below (in ImageSegmentorFloatMobileUnet.java) to 224:

int opsize = 224;
int getImageSizeX() { return 224; }
int getImageSizeY() { return 224; }

It builds successfully with a few warnings, but when I open the app on my device, it crashes.

Your response regarding this is so much appreciated. Thank you!

Mask overlay calculator CPU

Hi,
I am trying to run the model on the CPU. I wrote a custom calculator which simply multiplies the input with the mask and the background with bitwise_not of the mask, but the output I am getting is not good.

But when I build the model for Android and use the mask overlay calculator, the result is much better.
Help would mean a lot.
Thanks
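For comparison, a minimal NumPy/OpenCV sketch of the per-pixel blend described above (a soft alpha blend rather than a hard bitwise composite; the mask range is an assumption):

    import numpy as np

    def blend(frame, background, mask):
        # mask: single-channel uint8 in [0, 255]; treat it as a soft alpha matte
        alpha = (mask.astype(np.float32) / 255.0)[..., None]
        out = frame.astype(np.float32) * alpha + background.astype(np.float32) * (1.0 - alpha)
        return out.astype(np.uint8)

A bitwise_not-based composite only behaves like this when the mask is strictly {0, 255}; with a soft (grayscale) mask, or with unnormalized uint8 multiplication, the two approaches can give visibly different results.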

Training dataset for the deconv_bnoptimized_munet.h5 model?

Just want to preface by saying that this is excellent work, @anilsathyan7.

However, after reading the README, I am still a bit confused about which dataset was used for training the deconv_bnoptimized_munet.h5 model.

I am guessing it was the training data-set at https://drive.google.com/file/d/1UBLzvcqvt_fin9Y-48I_-lWQYfYpt_6J/view?usp=sharing that you have in the README.

But which datasets are the img_uint8.npy and msk_uint8.npy files based on: the augmented PFCN, AISegment, or PASCAL VOC Person?

Thanks!

SlimNet paper

Hi,
Is there a paper for SlimNet or an arXiv article?
I could not find any reference for this model.

Thank you

making mask of the pictures

Hi, @anilsathyan7 ,
I'm trying to train with 224x224 images on a small dataset. Can you please help me with creating masks for the pictures that I have? I've been through the DeepLab demo.ipynb in the acknowledgements that you mentioned, but I'm in a grey area about which mask images need to be produced and used for conversion along with the source image.
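If the goal is just to bootstrap masks for a small dataset, here is a hedged sketch using torchvision's pretrained DeepLabV3 (a stand-in for the DeepLab demo notebook; the image path is a placeholder):

    import numpy as np
    import torch
    import torchvision
    from torchvision import transforms
    from PIL import Image

    model = torchvision.models.segmentation.deeplabv3_resnet50(pretrained=True).eval()
    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    img = Image.open("photo.jpg").convert("RGB").resize((224, 224))
    with torch.no_grad():
        scores = model(preprocess(img).unsqueeze(0))["out"][0]          # [21, H, W] class scores
    mask = (scores.argmax(0) == 15).numpy().astype(np.uint8) * 255      # class 15 = person (PASCAL VOC)
    Image.fromarray(mask).save("photo_mask.png")

The training pair is then simply the source image plus this single-channel person mask, both at the same resolution.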

How to get the most accurate result

Hello, when I test your test.py with models/mnv3_seg/munet_mnv3_wm10.h5, I find the result is not very good. What model and parameters should I set to get an accurate result? Can you give me some advice?

Thanks.

Video models, two outputs?

Hi,

I cannot figure out why the published video models have two different outputs. Why is that?

Thanks

About training set used in portrait-net for video

Congrats on the awesome work done and thanks for sharing.
I want to train a PortraitNet for video.
The performance is not as good as a usual semantic segmentation net whose number of input channels is 3.
Could you tell me how many empty previous masks and augmented previous masks are in your training set?
Thanks.

Slimnet pre-trained model questions

Hi! And thanks for your amazing source of DL information. I'm exploring the repository and the pre-trained 'slim_reshape.tflite' model.

In my Python script I'm feeding in a 512x512x3 RGB frame, but I can't seem to get any sensible result out of the model. I suspect that my input tensor data is ill-formatted, or that I'm interpreting the output tensor wrong (alpha mask?).

Thanks for any tips!
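A hedged sketch of how such a frame could be fed to the TFLite model (the [0, 1] float normalization and the single-channel alpha-mask output are assumptions to verify against the model's input/output details):

    import cv2
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="slim_reshape.tflite")
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    frame = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
    frame = cv2.resize(frame, (512, 512))
    x = np.expand_dims(frame.astype(np.float32) / 255.0, 0)   # 1 x 512 x 512 x 3

    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
    mask = interpreter.get_tensor(out["index"])               # check out["shape"] to interpret this

Printing inp and out (shape, dtype, quantization) is usually the quickest way to spot a formatting mismatch.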

PortraitNet Model

@anilsathyan7 thanks for the great repo.
A question about the PortraitNet model that you have used in this repo: did you train the model from scratch or take the author's pre-trained model directly? I'm trying to train from scratch, so far unsuccessfully; the training loss doesn't decrease after a few hundred epochs. Any suggestions on this would be useful.

Need help to change input shape.

Thanks for this awesome repository! Thanks for your great researching!

I finished training the slim512 net and got what I wanted. My questions are:

  1. How can I change the input tensor shape to [256, 256, 3]?
    I changed the Input to [256, 256, 3], but it failed with:
    ValueError: Operands could not be broadcast together with shapes (32, 32, 128) (16, 16, 128)
    so it is not enough to change only the Input shape and image_size.

  2. "The inputs are initially downsampled from a size of 512 to 128 (i.e. 1/4th)". If I change the input tensor shape to [256, 256, 3], can I save inference time with the same accuracy, or not?

  3. I want to keep the current accuracy and save more inference time. How can I improve the slim-net? Can changing the input shape achieve that?

Thanks again for your great work!

Manipulating Camera Frames

Hi
Thanks for the great tutorial, and I appreciate your effort.
I came across a requirement in my sample app to change the background of the camera frames, like when a person is on a video call. I need to manipulate the frame and change the background behind the person (bokeh/image) before it is rendered to a TextureView.
I use the following method to obtain the frames:

public Mat onCameraFrame(CameraBridgeViewBase.CvCameraViewFrame inputFrame) {
    return inputFrame.rgba();   // current camera frame as an RGBA Mat; process it here
}

I had gone through your tutorial and didn't find the frame manipulation part. I am a beginner in image processing. Can you suggest some directions to achieve this functionality?
Thanks in Advance

Increase segmentation accuracy

Hi, thank you very much for your previous help. How can I significantly increase the segmentation accuracy of the slim512 network? Speed is not important in principle; I need very high segmentation accuracy and the ability to convert the model to TFLite.

SlimNet training

Hello, how did you train SlimNet with a GTX 1080 Ti? I tried to train the network in Google Colab (slim512.ipynb), but 13 GB of GPU memory is not enough and the environment crashes.

Regarding each model

Hi,
Thank you so much for uploading such a wonderful work.

I am getting a little confused as to which name refers to which model.

I have the following questions

  1. Is the model described in train.py the same as bilinear_fin_munet.h5? How did you get the .tflite model from this? When I export the train.py model I get a different architecture.

  2. Which models do Model 1, Model 2 and Model 3 refer to? Can you tell me their corresponding saved models?

Thank you

deconv_fin_munet.tflite

Which network do I have to train to get deconv_fin_munet.tflite as the final file?

train.py (and the subsequent steps mentioned in the README) gives bilinear_fin_munet.tflite as output.
Do I need to make any other modification to this file to get deconv_fin_munet.tflite?
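For reference, the .h5 to .tflite conversion step itself is a short script; a hedged sketch with a placeholder path (the bilinear vs. deconv difference is in the model architecture, not in this conversion):

    import tensorflow as tf

    # Point the path at the trained checkpoint you want to convert
    model = tf.keras.models.load_model("bilinear_fin_munet.h5", compile=False)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()
    with open("bilinear_fin_munet.tflite", "wb") as f:
        f.write(tflite_model)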

SlimNet

Hi, I just wonder where I can get the training code for SlimNet, as I want to train SlimNet myself.

Architecture for Human Segmentation?

Thanks for the amazing library!

I am looking to implement high-quality semantic segmentation on a mobile device for human cutout (full body).

  1. Architecture?

What architecture/encoder would be a good choice for the task at hand? MobileNetV2, MobileNetV3, DeepLabV3+, ShuffleNet, PortraitNet, SINet... There are so many, it's confusing.
https://github.com/qubvel/segmentation_models.pytorch

I want the highest accuracy, rather than the smallest or fastest model.

  2. Objects held by Person?

In the final output mask, how can I even get the objects that a person is holding, say a cup, a purse, a tennis racquet, a balloon, a toy, a magazine. It could be just about anything.

I am very much perplexed with this problem.

For training of human segmentation, I was planning to use the Supervisely Person dataset. If I am not mistaken, the Supervisely dataset doesn't contain masks for objects that the person might be holding. To achieve this, would a dataset like Supervisely be unfit for the job? Or do we need to train on a dataset with more labels than just "person"?

But ideally, if an object is lying on the side, it is ok if it does not come in the mask. But if the person is holding the object, it should definitely come in the final mask.

How can this be achieved?

Thanks!

Removing background for 480p (wide) video instead of square

Hi,

It seems that all the models currently receive square images as input. In order to make them work with wider ones (for example 320x240), would I need to retrain them on images of that size, or is there another way to adapt them to a different size?

Thanks!

Cannot run the Segme_v2 application

Hi,
I am building the segme_v2 Android application using Android Studio. The build is successful, but on running the APK, I get an internal error: Failed to apply delegate. The following is the whole error log I am getting.

java.lang.IllegalArgumentException: Internal error: Failed to apply delegate: TfLiteGpuDelegate Init: New object definition is not supported.
TfLiteGpuDelegate Prepare: delegate is not initialized
Node number 94 (TfLiteGpuDelegateV2) failed to prepare.

Restored previous execution plan after delegate application failure.
 at org.tensorflow.lite.NativeInterpreterWrapper.applyDelegate(Native Method)
 at org.tensorflow.lite.NativeInterpreterWrapper.init(NativeInterpreterWrapper.java:85)
 at org.tensorflow.lite.NativeInterpreterWrapper.<init>(NativeInterpreterWrapper.java:61)
 at org.tensorflow.lite.Interpreter.<init>(Interpreter.java:223)
 at com.example.android.tflitecamerademo4.ImageSegmentor.recreateInterpreter(ImageSegmentor.java:147)
 at com.example.android.tflitecamerademo4.ImageSegmentor.useGpu(ImageSegmentor.java:156)
 at com.example.android.tflitecamerademo4.Camera2BasicFragment.lambda$updateActiveModel$0$Camera2BasicFragment(Camera2BasicFragment.java:457)
 at com.example.android.tflitecamerademo4.-$$Lambda$Camera2BasicFragment$eoo74V6qLXRuzvkCmA1gEfC0acM.run(Unknown Source:8)
 at android.os.Handler.handleCallback(Handler.java:873)
 at android.os.Handler.dispatchMessage(Handler.java:99)
 at android.os.Looper.loop(Looper.java:201)
 at android.os.HandlerThread.run(HandlerThread.java:65)

I am not even able to understand this error. Any help would be hugely appreciated. Thanks for uploading such brilliant work.

Can't build Android

I can't fix this error:

Build file '/Users/sondx/Desk/FirstTime/demo_android/Portrait-Segmentation-master/android/SegMe_V1/app/build.gradle' line: 50
A problem occurred evaluating project ':app'.

Cannot invoke method apply() on null object

This happens when I try to build the Android app.

Portrait Segmentation android app Integrate with React Native

Hello, I want to integrate the Portrait Segmentation Android app with React Native. I found that React Native uses a ViewManager as the view, while the Portrait Segmentation Android app uses a Fragment. I am really stuck with that. Do you have any idea how to integrate it?

Thanks.

Training with mobilenetv3-unet

Hello
How are you?
I trained a new model with Slim-net on the AISegment dataset successfully with your help.
The accuracy of the model is high but the inference time is a little slow.
I am going to train a new model with mobilenetv3-unet architecture.
But I found a strange part in your script for MobileNetv3 network.

(image attachment)

The number of channels is 4 rather than 3.
So I changed this channel value to 3.
I also used the DataLoader class from slim512.ipynb.
I used mask images with pixel values of 0 or 255.
But while training, the training loss value was negative.
So I used the same mask images (pixel values 0 or 1) as in training with Slim-Net.
The training completed successfully, but the accuracy of the model is low.
How should I understand all of these facts?
Thanks
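On the mask-value point above, a tiny hedged sketch of the {0, 255} to {0, 1} mapping that is usually needed before computing the loss (the file name is a placeholder):

    import cv2
    import numpy as np

    mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
    labels = (mask > 127).astype(np.uint8)   # {0, 255} pixels become {0, 1} class labels

Feeding raw 0/255 values into a loss that expects class indices or probabilities in [0, 1] is a common cause of negative or exploding loss values.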

Android build error

After I cloned the SegMe_V2 folder and tried to build it with Gradle in Android Studio, I got the following error:

Execution failed for task ':app:compileDebugRenderscript'.

com.android.ide.common.process.ProcessException: Error while executing process /home/zpc/Android/Sdk/build-tools/29.0.2/llvm-rs-cc with arguments {-I /home/zpc/Android/Sdk/build-tools/29.0.2/renderscript/include/ -I /home/zpc/Android/Sdk/build-tools/29.0.2/renderscript/clang-include/ -rs-package-name=android.support.v8.renderscript -p /home/zpc/Documents/Portrait-Segmentation-master/android/SegMe_V2/app/build/generated/renderscript_source_output_dir/debug/out -target-api 24 /home/zpc/Documents/Portrait-Segmentation-master/android/SegMe_V2/app/src/main/rs/saturation.rs -O 3 -o /home/zpc/Documents/Portrait-Segmentation-master/android/SegMe_V2/app/build/generated/res/rs/debug/raw}

It seems to be caused by saturation.rs, but I cannot figure it out.
Any solutions, please?

How to assign segmentation results into a Bitmap

I am new to machine learning, and your project is very good for learning. I'm experimenting with your code.
In the Android demo, in the function ImageSegmentorFloatMobileUnet.imageblend(...), you assign the segmentation results (segmap) to mskmat: mskmat.put(0,0,segmap[0]);
I am testing without OpenCV, so how can I assign the segmap result directly into a normal Android Bitmap? I don't want to use OpenCV and Mat.
Thanks

Dataset combine

Hi!

  • I tried combining some of the datasets from the README and trained the MobileNetV2-Unet model on them, but when I checked, the results were worse than the original model's. I think the problem is with matching the datasets.
  • Can you share how you combined the datasets?
    Thanks

Train the model on another datasets

Thanks for sharing. If I want to train the model on other datasets, how should I prepare the dataset?

For example, I have an original image apple.jpg that includes an apple, and another image mask.png that contains the apple's mask.

How should I convert the dataset? Thanks in advance.
Oh, there it is! https://github.com/anilsathyan7/Portrait-Segmentation/blob/master/utils/data.py
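A hedged sketch of turning such image/mask pairs into the uint8 .npy arrays the README refers to (the directory layout, the 128x128 size, and the file names are assumptions; utils/data.py is the authoritative reference):

    import glob
    import cv2
    import numpy as np

    imgs, msks = [], []
    for img_path in sorted(glob.glob("images/*.jpg")):
        msk_path = img_path.replace("images/", "masks/").replace(".jpg", ".png")
        img = cv2.resize(cv2.imread(img_path), (128, 128))
        msk = cv2.resize(cv2.imread(msk_path, cv2.IMREAD_GRAYSCALE), (128, 128))
        imgs.append(img)
        msks.append((msk > 127).astype(np.uint8))

    np.save("img_uint8.npy", np.array(imgs, dtype=np.uint8))
    np.save("msk_uint8.npy", np.array(msks, dtype=np.uint8))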

SavedModel checkpoints of Slimnet

Hi, it would be great if you could share the SavedModel-format checkpoints of Slimnet.
I want to try and test the quantised model.

Thank you

TFJS memory leak

Running inference in the browser leads to a memory leak issue, as GPU allocated memory rises continuously.

temporal code

Nice work!
The temporal code is not here, right?

dataset

Hi, I just want to know how many pictures were used in your training stage?

Transpose_seg (deconv) training

Hi, I'm interested in training the model models/transpose_seg/deconv_bnoptimized_munet.h5.

I've seen the model has a good ratio resources consumption vs output, but still I'd like to improve the results a little bit if possible.

However, I don't see in the source code how I should do it. I understand that train.py and portrait_segmentation.ipynb are for the bilinear model, and portrait_segmentation_v3.ipynb is for mnv3_seg.

How could I generate/train the deconv one?

Thanks

Dataset

Excuse me, I cannot download the dataset even if I use a VPN. What should I do? Thank you.

Question about aspect ratio of the input images

Hi,

I'm not entirely sure how the MobilenetV3 models handle input images of different aspect ratio. For example, possible inputs can be of 4:3, 16:9, 3:4 or 9:16.

Does the model work equally well under these different aspect ratios? Or should the image be padded to a square before being sent to inference (so that the actual content is not distorted by resizing)?

Thanks in advance for any clarification.
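One common alternative to retraining is to letterbox the wide frame to a square before inference and crop the mask back afterwards; a hedged sketch:

    import cv2

    def pad_to_square(img, value=0):
        # Pad a wide or tall frame to a square canvas so resizing to the model
        # input size does not distort the content.
        h, w = img.shape[:2]
        size = max(h, w)
        top = (size - h) // 2
        left = (size - w) // 2
        return cv2.copyMakeBorder(img, top, size - h - top, left, size - w - left,
                                  cv2.BORDER_CONSTANT, value=value)

Whether accuracy holds up under padding versus a simple (distorting) resize depends on what the model saw during training, so it is worth testing both.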

Deeplab training & inference normalization

Hi, first of all great work!

I would like to try to retrain the deeplab_nchw.onnx model with more images to try to improve its accuracy. I understand that train.py generates the Model Type - 1 (bilinear), but how do I train the one based on DeepLab?

Apart from that, on inference, should the inputs of deeplab_nchw.onnx be normalized? If so, should I use the same parameters, i.e. (imgs - np.array([0.50693673, 0.47721124, 0.44640532])) / np.array([0.28926975, 0.27801928, 0.28596011])?

Thanks

An Issue when loading SlimNet model on Mobile GPU

Hello
How are you?
Thanks for contributing to this project.
I trained a model with SlimNet architecture on my dataset and got a TFlite model.
This TFlite model works well on the CPU.
I am going to use this TFLite model on mobile GPU.
When loading this model on Android GPU, I got the following issue.

Caused by: java.lang.IllegalArgumentException: Internal error: Failed to apply delegate: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors.

So I searched for any solution but I did NOT find a correct solution yet.
tensorflow/tensorflow#38036

Could you help me?
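One common workaround worth trying (not necessarily the fix for this exact model) is to rebuild the Keras model with a fully static input shape, batch dimension included, before converting to TFLite; a hedged sketch with placeholder paths and size:

    import tensorflow as tf

    model = tf.keras.models.load_model("slimnet.h5", compile=False)
    inp = tf.keras.Input(shape=(512, 512, 3), batch_size=1)   # no dynamic dimensions
    static_model = tf.keras.Model(inp, model(inp))

    converter = tf.lite.TFLiteConverter.from_keras_model(static_model)
    with open("slimnet_static.tflite", "wb") as f:
        f.write(converter.convert())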

how to train sinet?

Thank you for your contribution. Can you upload your SINet training code, or tell me how to train SINet?

PortraitNet Quant Inference

@anilsathyan7
Thanks for such a good collection of networks in this repo.

I was trying to run inference with the quantized portrait_video model and am getting wrong results.

model file : models->portrait_video->portrait_video_quant.tflite
When I checked the input and output details:

input: [{'name': 'input.1', 'index': 1, 'shape': [1, 224, 224, 4], 'dtype': numpy.int8, 'quantization': (scale=0.0183159988373518, zero_point=-13)}]

output: [{'name': 'Identity', 'index': 0, 'shape': [1, 50176], 'dtype': numpy.int8, 'quantization': (scale=0.00390625, zero_point=-128)}]

It needs INT8 inputs, so I converted the image (after resize and normalize) to INT8. Changes made to portrait_video.py:

    # Add prior as fourth channel
    image = np.dstack([image, prior])
    prepimg = image[np.newaxis, :, :, :]
    print('prepimg type:', prepimg.dtype)
    prepimg = np.array(prepimg, dtype=np.int8)
    print('after prepimg type:', prepimg.dtype)
    print(prepimg.shape)

When I feed this image to the network, I get some random noise (garbage output).

Here is my question: do we need to normalize the image using the quantization parameters before we feed it to the network?
I would highly appreciate any suggestions on the preprocessing required for the input image.
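On the normalization question: for a fully quantized model, the usual float preprocessing generally has to be mapped through the input tensor's scale and zero point rather than cast directly to int8. A hedged sketch using the values from the dump above (whether this model also expects a mean/std normalization beforehand is an assumption to verify):

    import numpy as np

    def quantize_input(image, input_details):
        # image: float array after the model's normal resize/normalize step
        scale, zero_point = input_details[0]["quantization"]   # e.g. (0.0183, -13) above
        return np.round(image / scale + zero_point).astype(np.int8)

A plain astype(np.int8) on raw 0-255 pixel values overflows the int8 range, and casting normalized 0-1 values collapses them to 0 or 1; either would explain the garbage output.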
