
Hi there, I'm Nam 👋

I'm a postdoc with a passion for machine learning (ML), deep learning (DL), and computer vision. I enjoy working on the ANR CHAMdoc project. Check out some of my work below!

  • 🔭 I’m currently working on the ANR CHAMdoc project.
  • 🌱 I’m currently learning NLP, including LLMs and RAG.
  • 👯 I’m looking to collaborate on NLP and computer vision problems.
  • 💬 Ask me about Document Processing, OCR, Denoising, Generative Modeling, ASR, and Action Recognition.
  • 📫 How to reach me: [email protected]

🚀 Projects

Overview

The goal of this project is to develop a comprehensive, automated workflow for analyzing Cham documents, which are written in an ancient script of cultural and historical significance. The workflow consists of three primary stages: Image Enhancement, Text Line Segmentation, and Text Line Transliteration (OCR). Each stage is designed to address the unique challenges of Cham manuscripts, including their age, the complexity of the script, and the varying quality of preserved documents.

Duration: Sep 2023 - Now

1. Image Retrieval

  • Objective: Retrieve all stamps similar to a given input stamp.
  • Solution: Triplet loss with a custom miner.
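The custom miner is not described, so the sketch below swaps in batch-hard mining as a stand-in: for each anchor it takes the farthest embedding with the same stamp label and the closest embedding with a different label in the batch, then applies the triplet hinge loss. It is a NumPy illustration under that assumption, not the project's training code, and `margin=0.2` is an arbitrary placeholder.

```python
import numpy as np


def pairwise_distances(emb: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between all rows of `emb` (shape [n, d])."""
    sq = np.sum(emb ** 2, axis=1)
    d2 = sq[:, None] - 2.0 * emb @ emb.T + sq[None, :]
    return np.sqrt(np.maximum(d2, 0.0))  # clamp tiny negatives caused by rounding


def batch_hard_triplet_loss(emb, labels, margin=0.2):
    """Batch-hard triplet loss (assumed stand-in for the project's custom miner)."""
    emb = np.asarray(emb, dtype=float)
    labels = np.asarray(labels)
    dist = pairwise_distances(emb)
    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(len(labels), dtype=bool)  # same stamp, not the anchor itself
    neg_mask = ~same                                     # different stamp
    # Hardest positive: the farthest embedding that shares the anchor's label.
    hardest_pos = np.where(pos_mask, dist, -np.inf).max(axis=1)
    # Hardest negative: the closest embedding with a different label.
    hardest_neg = np.where(neg_mask, dist, np.inf).min(axis=1)
    # Only anchors with at least one positive and one negative contribute.
    valid = pos_mask.any(axis=1) & neg_mask.any(axis=1)
    losses = np.maximum(hardest_pos - hardest_neg + margin, 0.0)
    return float(losses[valid].mean()) if valid.any() else 0.0
```

In training, the same computation would run on framework tensors so gradients can flow; at retrieval time, database stamps would be ranked by their embedding distance to the query stamp.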

Duration: Feb 2020 - Jul 2023

2. Image Enhancement
  • Objective: Improve the quality of scanned Cham document images.
  • Solution: Pix2Pix model with multi-scale attention.
3. Text Line Segmentation
  • Objective: Accurately segment text lines from enhanced images.
  • Solution: Seam carving algorithm with an additional cost function.
4. Text Line Transliteration (OCR)
  • Objective: Convert segmented Cham text lines into machine-readable text.
  • Solution: Transformer-based Seq2Seq model.
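The seam-carving step of the segmentation stage can be sketched as a dynamic program that finds the cheapest left-to-right path through a band of rows lying between two text lines, moving at most one row up or down per column. This assumes a grayscale page where dark pixels are ink. The project's additional cost function is not specified; `line_energy` adds a hypothetical penalty (weight `lam`) for drifting away from an expected gap row `center_row`, purely for illustration.

```python
import numpy as np


def line_energy(gray: np.ndarray, center_row: int, lam: float = 0.01) -> np.ndarray:
    """Ink cost plus a hypothetical extra term (not the project's actual cost)."""
    ink = 1.0 - gray.astype(float) / 255.0        # dark ink -> high cost
    rows = np.arange(gray.shape[0])[:, None]
    # Hypothetical extra term: penalise drifting away from the expected gap row.
    return ink + lam * np.abs(rows - center_row)


def horizontal_seam(energy: np.ndarray, row_lo: int, row_hi: int) -> np.ndarray:
    """Row index, per column, of the minimum-cost seam within rows [row_lo, row_hi)."""
    band = energy[row_lo:row_hi].astype(float)
    h, w = band.shape
    cost = band.copy()
    back = np.zeros((h, w), dtype=int)
    for x in range(1, w):
        prev = cost[:, x - 1]
        # Predecessor candidates: row above, same row, row below (inf outside the band).
        up = np.concatenate(([np.inf], prev[:-1]))
        down = np.concatenate((prev[1:], [np.inf]))
        stacked = np.stack([up, prev, down])       # shape [3, h]
        choice = np.argmin(stacked, axis=0)        # 0 = up, 1 = same, 2 = down
        cost[:, x] += stacked[choice, np.arange(h)]
        back[:, x] = np.arange(h) + choice - 1     # predecessor row for each row
    # Trace back from the cheapest end point in the last column.
    seam = np.empty(w, dtype=int)
    seam[-1] = int(np.argmin(cost[:, -1]))
    for x in range(w - 1, 0, -1):
        seam[x - 1] = back[seam[x], x]
    return seam + row_lo
```

Running `horizontal_seam` once per gap between consecutive lines yields the separating paths; the pixels between two successive seams form one text-line image.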
Final Deliverable

This workflow automates the analysis of Cham documents (inscriptions and manuscripts): it enhances the images, segments the text lines, and converts them into digital text. This process supports the preservation and further study of Cham cultural heritage.
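For the enhancement stage, the standard Pix2Pix generator objective combines a conditional adversarial term with an L1 reconstruction term weighted by λ (100 in the original Pix2Pix paper). The sketch below computes that loss in NumPy from the discriminator's logits on generated images; the multi-scale attention component is not modelled, and the function name is hypothetical.

```python
import numpy as np


def pix2pix_generator_loss(d_fake_logits, fake, target, lam=100.0):
    """Generator objective: adversarial term + lam * L1 reconstruction.

    d_fake_logits: raw discriminator outputs (e.g. one per PatchGAN patch)
                   for (degraded input, generated image) pairs.
    fake, target:  generated and ground-truth clean images, same shape.
    """
    d_fake_logits = np.asarray(d_fake_logits, dtype=float)
    # Non-saturating GAN loss: BCE against the "real" label on fake outputs,
    # i.e. -log(sigmoid(z)) = log(1 + exp(-z)), computed stably with logaddexp.
    adversarial = np.mean(np.logaddexp(0.0, -d_fake_logits))
    # L1 keeps the enhanced image close to the clean target (less blurry than L2).
    l1 = np.mean(np.abs(np.asarray(target, float) - np.asarray(fake, float)))
    return float(adversarial + lam * l1)
```

The discriminator is trained with the mirror-image objective, labelling real pairs 1 and generated pairs 0.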

Duration: 2020 - 2021

Overview

This project focused on developing an automated system for extracting key information from invoices. The project involved creating a robust pre-processing algorithm to handle rotated invoice images and integrating a micro AI service into a complete information extraction pipeline.

Key Activities

1. Pre-Processing Algorithm Development

  • Objective: Correctly rotate invoice images to standardize orientation for accurate data extraction.
  • Tasks:
    • Developed a pre-processing algorithm to detect and correct the orientation of scanned or photographed invoices.
    • Utilized image processing techniques to identify text and layout patterns that indicate the correct orientation.
    • Integrated the algorithm into the pipeline, ensuring that all invoices are properly aligned before further processing.

2. Micro AI Service Development

  • Objective: Create a micro AI service to handle the extraction of key information from invoices.
  • Tasks:
    • Designed and implemented a microservice using AI models capable of identifying and extracting relevant fields (e.g., invoice number, date, total amount) from invoices.
    • Integrated the micro AI service into the broader pipeline, ensuring seamless data flow and processing.
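The orientation-correction step from the pre-processing algorithm can be illustrated with projection profiles, one common image-processing cue: when text lines are horizontal, per-row ink counts alternate between dense and empty rows, so their variance is high. This is an assumed technique, not necessarily the one used in the project, and it only resolves quarter turns; telling 0° from 180° needs an extra signal such as OCR confidence. The input is assumed to be a binarized page with ink = 1.

```python
import numpy as np


def profile_score(binary: np.ndarray) -> float:
    """Variance of the horizontal projection profile (ink count per row)."""
    return float(np.var(binary.sum(axis=1)))


def fix_quarter_turn(binary: np.ndarray) -> np.ndarray:
    """Return the page rotated by 0 or 90 degrees, whichever makes text lines horizontal.

    Projection profiles cannot tell 0 from 180 degrees (or 90 from 270);
    that needs another cue, e.g. comparing OCR confidence on both candidates.
    """
    candidates = [binary, np.rot90(binary)]
    return max(candidates, key=profile_score)
```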

Tools & Technologies

  • Image Processing: Used for developing the rotation correction algorithm, focusing on text detection and orientation analysis.
  • AI/ML Models: Implemented to recognize and extract key invoice fields, adaptable to different invoice templates and layouts.
  • Microservices Architecture: Designed the AI service as a microservice to enable modular and scalable integration with the extraction pipeline.
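The microservice's contract (OCR text in, structured fields out) can be sketched with the standard library alone. The project used AI models for extraction; the regular expressions below are a hypothetical rule-based stand-in so the example runs without a trained model, and the field names mirror the examples given above (invoice number, date, total amount).

```python
import json
import re

# Hypothetical patterns standing in for a trained extraction model.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-/]+)", re.IGNORECASE
    ),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4})\b"),
    "total_amount": re.compile(
        r"total\s*(?:amount)?\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def extract_fields(ocr_text: str) -> dict:
    """Return {field: value or None} for each known field."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        result[name] = match.group(1) if match else None
    return result


def handle_request(body: str) -> str:
    """JSON in ({"text": ...}), JSON out: the shape a service endpoint might expose."""
    text = json.loads(body)["text"]
    return json.dumps(extract_fields(text))
```

Wrapping `handle_request` in a web framework route (Flask appears in the tools list) would turn it into an HTTP endpoint, keeping the extractor swappable behind a stable interface.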

Automatic Speech Recognition (ASR) Project

Project: Voice Trigger System for Russian, Spanish, and French

Duration: August 2018 - October 2019

Overview

This project focused on developing a Voice Trigger System tailored for Russian, Spanish, and French languages. The system utilizes Automatic Speech Recognition (ASR) technologies to detect specific voice commands or "triggers." The primary tools used were HTK (Hidden Markov Model Toolkit) and Kaldi, both widely recognized in the speech recognition community.

Key Activities

1. Speech Recognition Investigation

  • Objective: Explore and evaluate ASR methodologies based on HTK and Kaldi tools.
  • Tasks:
    • Investigated combining HTK and Kaldi tools for voice trigger systems.
    • Investigated acoustic and language modeling techniques to optimize recognition accuracy for each target language.

2. Voice Trigger Model Fine-Tuning

  • Objective: Adapt and fine-tune the Voice Trigger models for Russian, Spanish, and French.
  • Tasks:
    • Customized and trained models using language-specific datasets to enhance trigger detection accuracy.
    • Addressed language-specific challenges such as phonetic variability and acoustic differences.
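HMM-based trigger detection is often framed as comparing a keyword model against a filler (garbage) model. The sketch below is a simplified, assumed version of that decision, not HTK or Kaldi code: a Viterbi best-path score for a left-to-right keyword HMM versus a single-state filler model, normalized per frame. Emission log-likelihoods are taken as given (in practice they come from the acoustic model), and the transition probabilities are placeholders.

```python
import numpy as np


def viterbi_left_to_right(log_emis: np.ndarray, log_stay: float, log_next: float) -> float:
    """Best-path log-likelihood of a left-to-right HMM that starts in state 0
    and ends in the last state. log_emis[t, s] = log p(frame t | state s)."""
    T, S = log_emis.shape
    delta = np.full(S, -np.inf)
    delta[0] = log_emis[0, 0]
    for t in range(1, T):
        stay = delta + log_stay                                   # self-loop
        advance = np.concatenate(([-np.inf], delta[:-1] + log_next))  # move one state right
        delta = np.maximum(stay, advance) + log_emis[t]
    return float(delta[-1])


def trigger_score(kw_log_emis, filler_log_emis, log_stay=np.log(0.6), log_next=np.log(0.4)):
    """Per-frame log-likelihood ratio: keyword HMM vs. a one-state filler model."""
    T = kw_log_emis.shape[0]
    keyword = viterbi_left_to_right(kw_log_emis, log_stay, log_next)
    filler = float(np.sum(filler_log_emis))  # single looping state: sum of frame scores
    return (keyword - filler) / T
```

A trigger fires when `trigger_score` exceeds a threshold tuned per language on held-out data, which trades false accepts against false rejects.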

Tools & Technologies

  • HTK (Hidden Markov Model Toolkit): Used for initial ASR model training and evaluation.
  • Kaldi: Employed for advanced modeling, including deep learning-based approaches for speech recognition.
  • Datasets: Language-specific datasets for Russian, Spanish, and French to train and validate the models.

Outcomes

  • Successfully developed and fine-tuned Voice Trigger models for Russian, Spanish, and French languages.
  • Achieved high accuracy in detecting voice triggers across different languages by leveraging the strengths of both HTK and Kaldi.
  • The project laid the groundwork for further advancements in multilingual ASR systems, particularly in voice-activated applications.

šŸ› ļø Technologies & Tools

  • Programming Languages: Python, MATLAB, JavaScript, Objective-C, Bash
  • Frameworks: PyTorch, TensorFlow, Transformers, Pandas
  • Tools: Flask, HTK, Kaldi

nguyennampfiev's Projects

nmflibrary

MATLAB library for non-negative matrix factorization (NMF): Version 1.8.0
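For reference, the classic Lee and Seung multiplicative updates for NMF under the Frobenius (squared error) objective can be written in a few lines of Python. This is a generic sketch, not a port of nmflibrary's MATLAB code; `eps` guards against division by zero and the random initialization is an assumption.

```python
import numpy as np


def nmf(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Factorize non-negative V (n x m) as W (n x rank) @ H (rank x m)."""
    V = np.asarray(V, dtype=float)
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep entries non-negative and do not
        # increase ||V - WH||_F (Lee and Seung).
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```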

reinforcement-learning

Implementations of reinforcement learning algorithms in Python, using OpenAI Gym and TensorFlow. Exercises and solutions to accompany Sutton's book and David Silver's course.
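As an example of the kind of algorithm covered in Sutton's book, here is a compact value iteration sketch for a finite MDP. The array layout (`P[a, s, s']` transition probabilities, `R[a, s]` expected rewards) is an assumption made for this example and is not taken from the repository.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Return optimal state values and a greedy policy for a finite MDP.

    P: array [A, S, S] of transition probabilities.
    R: array [A, S] of expected immediate rewards.
    """
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * P @ V        # one-step lookahead, shape [A, S]
        V_new = Q.max(axis=0)        # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
```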
