Galib Mahmud Jim

I am a Machine Learning Engineer with a research background in applied ML and data-driven system design. My work has spanned predictive modeling, clinical ML, and automated data pipelines, with an emphasis on grounding research in production constraints. Currently, I am working on ML pipelines for construction and architectural data, with a growing focus on LLM integration and domain-specific adaptation.

Education

2020 – 2025

University of Dhaka

BSc in Computer Science and Engineering

Dhaka, Bangladesh

Research

ACL ARR 2026 | May 2026 | Under Review

5-DIALECTS-BN: Unmasking the Impact of Transliteration on Bangla Dialectal LLMs

Md Mahir Jawad, Galib Mahmud Jim, Rafid Ahmed, Mir Sazzat Hossain, Md Fahim, Md Farhad Alam Bhuiyan

Large Language Models (LLMs) have achieved remarkable progress across natural language processing (NLP) tasks, yet their capabilities degrade sharply for low-resource languages and dialectally diverse settings. Bangla, the world's sixth most spoken language, exemplifies this gap: existing resources overwhelmingly target Standard Bangla, leaving its regional dialects without the benchmarks needed to develop or evaluate dialect-aware systems. We address this gap with 5-Dialects-BN, the first multi-faceted benchmark dataset for Bangla dialectal NLP. The dataset comprises 6,000 manually annotated entries spanning five major dialects: Chittagong, Barisal, Noakhali, Sylhet, and Rangpur, enriched with Romanized transliterations, English and Standard Bangla translations, and subjectivity labels. The resource supports dialect identification, dialect-to-standard normalization, machine translation, subjectivity classification, and parameter-efficient fine-tuning of multilingual LLMs.

Bangla NLP, Dialectal LLMs, Low-resource NLP, Benchmark Dataset, Transliteration

2025 – Present | In Progress

Assessing Ensemble Techniques for Diabetes Prediction and Their Application in Symptom-Based Diagnosis

This study aims to develop a predictive model that estimates an individual's risk of diabetes before any laboratory tests are performed, using only noninvasive and easily obtainable information: age, gender, body measurements, blood pressure, family history, lifestyle factors, and common symptoms. The goal is to improve screening accuracy and resource use in clinical decision-making prior to laboratory testing, enabling early identification of high-risk individuals in settings with limited laboratory facilities.

Ensemble Learning, Diabetes Prediction, Clinical ML, Symptom-Based Diagnosis, Noninvasive Screening

Projects

DiaRisk

Diabetes Early Prediction using ML

Python, FastAPI, MongoDB, Docker, Flutter

Developed an ensemble machine learning model to predict Type 2 Diabetes using a dataset from BIRDEM General Hospital, Bangladesh. Optimized model performance through data preprocessing and hyperparameter tuning. Built an Android application that lets users interact with the model and receive diabetes risk assessments.

Save The World BD

Plastic Pollution Awareness Portal

Next.js

A portal developed as part of a collaborative research initiative between the University of Dhaka and East West University, Bangladesh. It raises awareness about plastic pollution and advocates for sustainable solutions.

Learning-Hub

LMS for coding enthusiasts

React, FastAPI, MongoDB

A web application for online learning with user and admin functionalities. Users can create and access courses and quizzes, while admins manage content and user accounts.

DropTel

Group expense tracking mobile app

Flutter, MongoDB

A mobile application for tracking shared expenses, built for individuals and groups who need to manage finances during events.

Experience

Nov 2025 – Present

Machine Learning Engineer

Penta Global Ltd.

Dhaka, Bangladesh

Feb 2025 – May 2025

Junior Software Engineer

Data Elysium Software Inc.

Remote
