Data Scientist

Sanidhya Karnik

Building ML models, data pipelines, and AI-powered solutions. Turning complex data into actionable insights.


Background

I'm a Data Scientist with 3.5+ years of experience building machine learning models, forecasting systems, and analytics solutions across finance, hospitality, and tech.

I hold an MS in Data Analytics Engineering from Northeastern University and a BTech in Mechanical Engineering from IIT Madras. My work spans LLM applications, time-series forecasting, pricing optimization, and conversion analytics.

I'm currently exploring opportunities in data science and ML engineering where I can build systems that drive real business impact.

Data Scientist Co-op

HarbourVest Partners • 2024
  • Built Prophet-based forecasting models across 10M+ records
  • Engineered Python data pipelines for real-time SIEM integration
  • Designed Power BI dashboards for VP-level decision making

Analytics Consultant

Niwish • 2023
  • Increased lead conversion rates from 3% to 19%
  • Implemented funnel drop-off analysis for web and mobile
  • Built automated transaction auditing with Python and SQL

Senior Business Analyst

OYO Hotels • 2020–2023
  • Deployed regression-based pricing model across 50K+ storefronts
  • Launched OYO360 pilot generating $12M+ in revenue
  • Designed A/B tests and fraud-prevention algorithms

Featured Projects

01

Agentic Pipeline Repair

An autonomous multi-agent system for data pipeline monitoring and repair, built on Amazon Nova 2 Lite. Features four specialized agents (Monitor, Diagnostics, Repair, Orchestrator), integration with real dbt models, a FastAPI backend, a React dashboard, and a full AWS deployment on EC2 and RDS PostgreSQL.

Amazon Nova 2 • LangGraph • AI Agents • dbt • React • FastAPI • EC2 • RDS

02

MCP Operations Infrastructure

A production-ready Model Context Protocol server for secure LLM interactions. Features token-based authentication, role-based access control (RBAC), PostgreSQL audit trails, and multi-API integrations including Tavily search and weather APIs. Containerized with Docker for easy deployment.

MCP • Python • Docker • PostgreSQL • Pydantic • RBAC

03

EHR MLOps Pipeline

An end-to-end MLOps platform for predicting 30-day hospital readmissions using MIMIC-IV clinical data. Features an XGBoost model (AUC: 0.72) with SHAP explainability, dbt data transformations, FastAPI serving with real-time explanations, Airflow orchestration, and production-ready Kubernetes/Terraform infrastructure.

XGBoost • SHAP • dbt • FastAPI • Airflow • Docker • Kubernetes • Terraform

04

Intelligent Learning Advisor

An LLM-powered course recommendation system using Agentic RAG architecture. Combines content-based and collaborative filtering with real-time web search to deliver personalized learning paths for students transitioning into data science.

LangChain • LangGraph • FAISS • Python • RAG

05

Figurative Language Detection

Fine-tuned transformer models (RoBERTa, BERT, DeBERTa) on the VUA Metaphor Corpus and multi-domain data from Reddit, IMDb, and news sources. Implemented class-weighted loss functions to handle imbalanced datasets for metaphor and irony detection.

RoBERTa • BERT • DeBERTa • NLP • Transfer Learning
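
As a rough illustration of the class-weighted loss mentioned for this project, the sketch below computes inverse-frequency class weights and a weighted cross-entropy in plain Python. It is not the project's actual training code: the function names, the toy labels, and the toy probabilities are all hypothetical, and the weighting rule (n_samples / (n_classes * class_count)) is one common choice assumed here, not one confirmed by the project description.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get larger weights. This rule is an assumption for
    illustration, not necessarily what the project used.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by total weight."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


# Toy, hypothetical data: 8 literal examples (class 0), 2 figurative (class 1).
labels = [0] * 8 + [1] * 2
# Predicted class probabilities: confident on the majority class,
# wrong on the minority class.
probs = [[0.9, 0.1]] * 8 + [[0.6, 0.4]] * 2

weights = inverse_frequency_weights(labels)
uniform = {0: 1.0, 1: 1.0}

weighted_loss = weighted_cross_entropy(probs, labels, weights)
unweighted_loss = weighted_cross_entropy(probs, labels, uniform)
print(weights)
print(weighted_loss, unweighted_loss)
```

With these toy numbers the weighted loss comes out higher than the unweighted one, because mistakes on the rare class count for more. In a fine-tuning setup, the same per-class weights would typically be passed to the framework's own cross-entropy loss instead of a hand-written function.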

Skills

AI & Machine Learning

LLMs • RAG Pipelines • LangChain • PyTorch • TensorFlow • Scikit-learn • NLP • Reinforcement Learning

Data & Cloud

SQL • Python • Spark • Airflow • AWS • GCP • Azure • Docker

Analytics & Visualization

Power BI • Tableau • Looker • BigQuery • dbt • Alteryx

Get in Touch

Open to collaborating on AI/ML projects or discussing opportunities in data science. Feel free to reach out.