Hi, I'm

Santu Hazra

Deep Learning Engineer, AI Developer, Computer Vision Scientist, NLP Engineer, Data Scientist

About

About Me

Senior Software Engineer - AI/ML

I’m an AI/ML Engineer with 10+ years of experience building intelligent, real-time systems across computer vision, speech processing, and large language models. I currently lead AI initiatives at Machani Robotics, where I’ve developed end-to-end pipelines for humanoid robots, from voice transcription in 5 languages to RAG-based contextual conversations and LLM-driven gesture generation.

I specialize in fine-tuning models such as Phi3, ArcFace, and Whisper, and in deploying them efficiently on edge devices such as the NVIDIA Jetson AGX using GStreamer and ONNX to reduce latency and improve real-time performance. I love the challenge of making AI feel human, whether through voice, facial recognition, or expressive body movements.

Name: Santu Hazra
Birthday: 8 March 1992
Degree: B.Tech
Experience: 10+ Years

Experience

My Experience

Senior Software Engineer - AI/ML

Machani Robotics | July 2024 - Present

    • Multilingual Speech Recognition: Developed a real-time speech-to-text pipeline using GStreamer and whisper.cpp, supporting 5 languages (English, Spanish, Italian, German, Portuguese) for multilingual user interactions.

    • Whisper Fine-Tuning for STT: Fine-tuned the Whisper Base multilingual model on 40+ hours of audio from the Common Voice and LibriSpeech datasets, achieving a 15% reduction in Word Error Rate (WER) across all supported languages.

    • LLM Fine-Tuning with Phi3: Fine-tuned the Phi3 (3B) language model with 2000+ domain-specific Q&A pairs generated using GPT-4, improving contextual understanding and reducing hallucination in chatbot responses.

    • Retrieval-Augmented Generation (RAG): Designed and implemented a RAG framework combining LLMs with live face and voice input, enabling dynamic, context-aware conversations and enhancing real-time personalization.
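
For readers unfamiliar with the Word Error Rate (WER) metric cited above: it is conventionally the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch under that standard definition. The function names and sample values are hypothetical, and this is not the evaluation code used in the project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative improvement of a metric where lower is better."""
    return (baseline - improved) / baseline


# Illustrative values only, not measured results.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
print(relative_reduction(0.20, 0.17))
```

Whether a reported WER reduction is relative (as in `relative_reduction`) or absolute depends on the evaluation setup, so the helper above shows only one possible reading.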

SDE-I - AI Engineer

Machani Robotics | February 2021 - June 2024

    • Face Recognition Pipeline on Edge: Built a full face recognition pipeline using ArcFace (fine-tuned for Indian faces, 92% accuracy), DeepStream, and ONNX, optimized for low-latency deployment on NVIDIA Jetson AGX.

    • Real-Time Vector Search with Milvus: Created an embedding storage and search system using FAISS and Milvus, supporting fast and scalable vector-based lookup for face and speaker identity verification.

    • Custom Text-to-Speech System: Integrated OpenAI TTS and CereProc APIs for voice synthesis and led custom TTS model development to support emotion and voice cloning, improving voice clarity and context-awareness by 20%.

    • Gesture Generation Using LLMs: Developed a deep learning-based gesture generation module using LLMs, improving the realism of body language and facial animation by 30%.

    • Optimized Edge AI Deployment: Engineered lightweight, ONNX-converted AI models with GStreamer pipelines, reducing inference latency for face and voice tasks by 20% on NVIDIA Jetson hardware.
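
The vector-based identity lookup mentioned above comes down to nearest-neighbor search over embeddings. The following is a minimal brute-force sketch in plain Python. It stands in for what FAISS or Milvus do at scale with dedicated indexes and does not use either library's API. The function names, the 0.6 threshold, and the toy 3-dimensional vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def identify(query, gallery, threshold=0.6):
    """Return (name, score) of the best match, or (None, score) if the
    best score falls below the threshold (assumed value)."""
    best_name, best_score = None, -1.0
    for name, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Toy gallery; real face or speaker embeddings have hundreds of dimensions.
gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
print(identify([0.9, 0.1, 0.0], gallery))
print(identify([0.0, 0.0, 1.0], gallery))
```

A linear scan like this is only practical for small galleries; an approximate nearest-neighbor index is what keeps lookups fast as the number of stored identities grows.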

Deep Learning and AI Instructor (Part Time)

AnalytiixLab | April 2022 - Present

    • Successfully completed 8 batches and trained over 150 students in Deep Learning and AI basics.

    • Conducted corporate training on the same topics for Tredence Inc. and Samsung.

Associate Data Scientist

Cognizant Technology Solutions | April 2015 - January 2021

    • Developed and implemented advanced analytics solutions to drive customer insights and business strategies across various projects.

    • Developed predictive models to identify customers at risk of churn, enabling a 15% improvement in retention and supporting the creation of targeted promotional strategies.

    • Prioritized high-revenue leads using customer acquisition analytics, optimizing marketing resources and boosting acquisition efficiency by 20%.

    • Conducted sentiment analysis on 10,000+ consumer reviews, delivering actionable insights that directly influenced product and marketing strategies.

    • Implemented machine learning models to classify driver behavior from 2D dashcam images, improving safety outcomes and reducing incident detection time by 25%.

    • Collaborated with cross-functional teams to deliver scalable, data-driven solutions and effectively presented analytical findings to key stakeholders.

Education

My Education

B.Tech in ECE

West Bengal University of Technology | August 2010 - July 2014

Higher Secondary

West Bengal Board of Higher Secondary Education | August 2007 - July 2009

Secondary

West Bengal Board of Secondary Education | July 2007

Certifications

Skills

My Skills

Python: 90%
PyTorch: 90%
RAG: 85%
Transformer & LLM: 90%
C++: 65%
MLOps: 85%
Deep Learning: 95%
NLP & Speech Processing: 90%
Stable Diffusion: 85%
Multimodal Models (CLIP, GPT-4, etc.): 85%
Reinforcement Learning: 80%
Machine Learning: 95%

Projects

My Portfolio

Blog

Latest Blog

Interests

Trekking, skiing, swimming, surfing

Contact

Contact Me

©santuh. All Rights Reserved. Designed by Santu Hazra