Santu Hazra: Artificial Intelligence, Deep Learning, Computer Vision, Natural Language Processing, Data Science

Senior Software Engineer - AI/ML

Machani Robotics | July 2024 - Present

- Multilingual Speech Recognition: Developed a real-time speech-to-text pipeline using GStreamer and whisper.cpp, supporting 5 languages (English, Spanish, Italian, German, Portuguese) for multilingual user interactions.
- Whisper Fine-Tuning for STT: Fine-tuned the multilingual Whisper Base model on 40+ hours of Common Voice and LibriSpeech data, achieving a 15% reduction in Word Error Rate (WER) across all supported languages.
- LLM Fine-Tuning with Phi-3: Fine-tuned the Phi-3 (3B) language model on 2,000+ domain-specific Q&A pairs generated with GPT-4, improving contextual understanding and reducing hallucinations in chatbot responses.
- Retrieval-Augmented Generation (RAG): Designed and implemented a RAG framework that combines LLMs with live face and voice input, enabling dynamic, context-aware conversations and real-time personalization.

SDE-I - AI Engineer

Machani Robotics | February 2021 - June 2024

- Face Recognition Pipeline on Edge: Built an end-to-end face recognition pipeline using ArcFace (fine-tuned on Indian faces, 92% accuracy), DeepStream, and ONNX, optimized for low-latency deployment on NVIDIA Jetson AGX.
- Real-Time Vector Search with Milvus: Built an embedding storage and search system using FAISS and Milvus, enabling fast, scalable vector lookup for face and speaker identity verification.
- Custom Text-to-Speech System: Integrated the OpenAI TTS and CereProc APIs for voice synthesis and led development of a custom TTS model supporting emotional speech and voice cloning, improving voice clarity and context-awareness by 20%.
- Gesture Generation Using LLMs: Developed a deep learning-based gesture generation module using LLMs, improving the realism of body language and facial animation by 30%.
- Optimized Edge AI Deployment: Engineered lightweight, ONNX-converted AI models with GStreamer pipelines, reducing inference latency for face and voice tasks by 20% on NVIDIA Jetson hardware.

Deep Learning and AI Instructor (Part Time)

AnalytixLabs | April 2022 - Present

- Completed 8 batches, training over 150 students in Deep Learning and AI fundamentals.
- Conducted corporate training on the same topics for Tredence Inc. and Samsung.

Associate Data Scientist

Cognizant Technology Solutions | April 2015 - January 2021

- Developed and implemented advanced analytics solutions to drive customer insights and business strategies across various projects.
- Developed predictive models to identify customers likely to churn, contributing to a 15% improvement in retention and supporting targeted promotional strategies.
- Prioritized high-revenue leads using customer acquisition analytics, optimizing marketing resources and boosting acquisition efficiency by 20%.
- Conducted sentiment analysis on 10,000+ consumer reviews, delivering actionable insights that directly influenced product and marketing strategies.
- Implemented machine learning models to classify driver behavior from 2D dashcam images, improving safety outcomes and reducing incident detection time by 25%.
- Collaborated with cross-functional teams to deliver scalable, data-driven solutions and effectively presented analytical findings to key stakeholders.

Hi, I'm Santu Hazra

About Me

Senior Software Engineer - AI/ML

Name: Santu Hazra

Birthday: 8 March 1992

Degree: B.Tech

Experience: 10+ Years

Phone: +91 740-663-9000

Email: ec.santuh@gmail.com

My Experience

Senior Software Engineer - AI/ML

SDE-I - AI Engineer

Deep Learning and AI Instructor (Part Time)

Associate Data Scientist

My Education

B.Tech in Electronics and Communication Engineering (ECE)

Higher Secondary

Secondary

Certifications

My Skills

- Python: 90%
- PyTorch: 90%
- RAG: 85%
- Transformers & LLMs: 90%
- C++: 65%
- MLOps: 85%
- Deep Learning: 95%
- NLP & Speech Processing: 90%
- Stable Diffusion: 85%
- Multimodal Models (CLIP, GPT-4, etc.): 85%
- Reinforcement Learning: 80%
- Machine Learning: 95%

My Portfolio

Image Classification

Note: Because the models are hosted on AWS Lambda, the first request to each model may take about 60 seconds while the function starts up.

Facial Application

Pose Application

Latest Blog Posts

Integrating YOLOv11 with NVIDIA DeepStream for Real-time Object Detection

Live Speech-to-Text with Distil-Whisper and PyTorch

Deploying a ResNet Model to AWS Lambda: A Step-by-Step Guide

Interests

Contact Me