When I feed images to the detector in a tight loop with no delay, the per-image processing time for yolov7-face or yolov8-face is short. However, when I pass the images to the detection function one at a time, with a 1-second pause between photos, the processing time becomes much longer. What might be causing this?
Here are the processing times for images in a loop:
../images//test1.jpg average: 389.05ms
../images//test3.jpg average: 134.054ms
../images//cam4.jpg average: 104.824ms
../images//test11.jpg average: 93.1855ms
../images//test7.jpg average: 86.4966ms
../images//test8.jpg average: 85.9823ms
../images//arac2.jpg average: 67.5789ms
../images//arac3.jpg average: 69.3688ms
../images//arac4.jpg average: 68.7759ms
../images//test9.jpg average: 75.8391ms
And here are the processing times with 1-second intervals between images:
../images//test1.jpg average: 267.529ms
../images//test3.jpg average: 313.996ms
../images//cam4.jpg average: 159.6ms
../images//test11.jpg average: 315.25ms
../images//test7.jpg average: 296.985ms
../images//test8.jpg average: 237.869ms
../images//arac2.jpg average: 206.976ms
../images//arac3.jpg average: 244.924ms
../images//arac4.jpg average: 185.883ms
../images//test9.jpg average: 239.323ms
Upon analyzing the detect function, I've identified that the following line is taking a long time:
CHECK(cudaMemcpyAsync(decode_ptr_host[i], decode_ptr_device, sizeof(float) * (1 + MAX_OBJECTS * NUM_BOX_ELEMENT), cudaMemcpyDeviceToHost, stream));
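One caveat about this measurement: operations in a single CUDA stream run in order, so host-side timing around this line (or around the synchronize that follows it) can include the time of kernels queued earlier in the same stream, not only the copy. The sketch below shows one way the two could be separated with CUDA events. It is a minimal, standalone illustration, not the project's code: the `fake_decode` kernel is a hypothetical stand-in for inference and decoding, the values of `MAX_OBJECTS` and `NUM_BOX_ELEMENT` are assumed, and the use of pinned host memory is an assumption as well.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Assumed values for illustration; the real constants come from the project.
constexpr int MAX_OBJECTS = 1024;
constexpr int NUM_BOX_ELEMENT = 7;
constexpr int OUT_FLOATS = 1 + MAX_OBJECTS * NUM_BOX_ELEMENT;

// Hypothetical stand-in for inference + decode: burns GPU time, then writes the buffer.
__global__ void fake_decode(float* out, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = static_cast<float>(i);
    for (int k = 0; k < iters; ++k) v = v * 1.0001f + 0.5f;
    out[i] = v;
}

int main() {
    const size_t bytes = sizeof(float) * OUT_FLOATS;

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    float* decode_ptr_device = nullptr;
    float* decode_ptr_host = nullptr;
    CHECK(cudaMalloc(reinterpret_cast<void**>(&decode_ptr_device), bytes));
    // Pinned host memory (an assumption here); with pageable memory the
    // "async" copy may block the host until the transfer has finished.
    CHECK(cudaMallocHost(reinterpret_cast<void**>(&decode_ptr_host), bytes));

    cudaEvent_t start, kernels_done, copy_done;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&kernels_done));
    CHECK(cudaEventCreate(&copy_done));

    // Events are recorded in stream order, so each interval covers exactly
    // the work enqueued between the two records.
    CHECK(cudaEventRecord(start, stream));
    fake_decode<<<(OUT_FLOATS + 255) / 256, 256, 0, stream>>>(
        decode_ptr_device, OUT_FLOATS, 100000);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(kernels_done, stream));
    CHECK(cudaMemcpyAsync(decode_ptr_host, decode_ptr_device, bytes,
                          cudaMemcpyDeviceToHost, stream));
    CHECK(cudaEventRecord(copy_done, stream));
    CHECK(cudaEventSynchronize(copy_done));

    float kernel_ms = 0.0f, copy_ms = 0.0f;
    CHECK(cudaEventElapsedTime(&kernel_ms, start, kernels_done));
    CHECK(cudaEventElapsedTime(&copy_ms, kernels_done, copy_done));
    std::printf("kernel: %.3f ms, copy: %.3f ms\n", kernel_ms, copy_ms);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(kernels_done));
    CHECK(cudaEventDestroy(copy_done));
    CHECK(cudaFreeHost(decode_ptr_host));
    CHECK(cudaFree(decode_ptr_device));
    CHECK(cudaStreamDestroy(stream));
    return 0;
}
```

If the copy interval stays small in both scenarios while the kernel interval grows when the images are spaced out, the extra time would be coming from the GPU work ahead of the copy rather than from the transfer itself.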
What could be the issue, and how can I solve it? The cudaMemcpyAsync call appears to be slower when the images are given one by one.