- Visual-Linguistic Fusion: These models understand images and generate human-like captions by combining visual and textual elements.
- Security/Threat Classification: One possible future application is image and video security classification: generate a caption with the Image Captioning model, then apply NLP to that caption to classify whether the image depicts a threat.
- Data Preparation: Collect paired image and text data, preprocess images, and tokenize captions.
- Feature Extraction: Use ResNet-50 to extract image features.
- Sequence Model: Employ an LSTM-based decoder for caption generation.
- Training: Train the model with image-caption pairs using a loss function.
- Inference: During inference, initialize the decoder with image features and use a greedy algorithm to generate captions one word at a time.
- Caption Generation: Predict the most likely word at each step based on the current state and previously generated words.
- Completion: Continue generating words until an end token or a preset maximum length is reached, producing the final image caption.

- DevCloud
- Intel Extension for TensorFlow
- OpenVINO
- TensorFlow

DevCloud Utilization: DevCloud provided valuable computational resources for training and inference. It enabled efficient scaling and resource allocation for model training, ensuring timely completion of training tasks.
Intel Extension for TensorFlow: The project made use of the Intel Extension for TensorFlow to optimize TensorFlow-based operations for better performance on Intel hardware. This extension ensured that the model ran efficiently during both training and inference stages, leveraging the hardware acceleration features.
OpenVINO Integration: OpenVINO was used to further optimize the model for inference. By converting the model to OpenVINO's format and leveraging its hardware-specific optimizations, the project achieved faster and more efficient image caption generation on Intel hardware, making the system suitable for real-time applications and deployment on a variety of Intel-powered platforms.
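
As a rough illustration of the greedy inference loop described in the steps above, the sketch below picks the highest-scoring word at each step and stops at an end token or a maximum length. It is not the project's actual code: `VOCAB`, `greedy_decode`, and `toy_step` are hypothetical names, and `toy_step` is a hand-written score table standing in for one step of the LSTM decoder. In the real pipeline, the initial state would presumably be derived from the ResNet-50 image features.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy vocabulary; a real project would build it from tokenized captions.
VOCAB = ["<start>", "<end>", "a", "dog", "runs"]
START_ID, END_ID = 0, 1

# A decoder step takes (state, previous token id) and returns (scores, new state).
StepFn = Callable[[object, int], Tuple[Sequence[float], object]]


def greedy_decode(step_fn: StepFn, init_state: object, max_len: int = 20) -> List[int]:
    """Generate token ids one at a time, always taking the highest-scoring word."""
    token = START_ID
    state = init_state
    caption: List[int] = []
    for _ in range(max_len):
        scores, state = step_fn(state, token)
        # Greedy choice: index of the largest score.
        token = max(range(len(scores)), key=scores.__getitem__)
        if token == END_ID:
            break
        caption.append(token)
    return caption


def toy_step(state: object, prev_token: int) -> Tuple[Sequence[float], object]:
    """Assumed stand-in for an LSTM step: scores depend only on the previous token."""
    table = {
        0: [0.0, 0.1, 0.9, 0.0, 0.0],  # <start> -> "a"
        2: [0.0, 0.1, 0.0, 0.8, 0.1],  # "a"     -> "dog"
        3: [0.0, 0.2, 0.0, 0.0, 0.8],  # "dog"   -> "runs"
        4: [0.0, 0.9, 0.1, 0.0, 0.0],  # "runs"  -> <end>
    }
    return table[prev_token], state


ids = greedy_decode(toy_step, init_state=None)
print(" ".join(VOCAB[i] for i in ids))  # prints "a dog runs"
```

Greedy decoding is simple and fast, but it commits to one word per step and cannot revisit earlier choices. Beam search is a common alternative when caption quality matters more than latency.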
- Addition of a web app using Gradio, Flask, or a Hugging Face Space.