Het415/README.md

Het Prajapati

Data Scientist · ML Engineer · NLP & Agentic AI

LinkedIn · GitHub · Email · Boston


MS Data Science @ Northeastern University. I build end-to-end ML systems, from ETL pipelines and predictive models to production-deployed LLM applications. Focused on retail analytics, agentic AI, and turning messy data into decisions.


Projects

ListingLens — Amazon Seller Intelligence Platform

Multi-stage NLP pipeline processing 250 reviews/product · BERT sentiment scoring · XGBoost return risk classifier (96.5% acc, 0.997 ROC-AUC) · RAG pipeline with FAISS + LLaMA 3 70B · Deployed on Railway + Vercel

Distributed Backtesting Engine — Algorithmic Trading

PySpark parallel framework · 123 strategies × 100 S&P 500 stocks · 12,300 backtests on 303K real market records · 5-step data governance pipeline · 9-panel Plotly BI dashboards

Spotify Breakout Predictor — Viral Music Classification

99.2% accuracy · 0.998 ROC-AUC · temporal + 5-fold cross-validation · TikTok views as dominant predictor (41% importance) · Interactive Streamlit dashboard


Stack

Languages · Python · R · SQL · Java · JavaScript

ML / AI · Scikit-learn · XGBoost · PyTorch · TensorFlow · HuggingFace · LangChain · FAISS · RAG Pipelines

Data Engineering · PySpark · ETL · Data Warehousing · Snowflake · MySQL · PostgreSQL

Deployment · FastAPI · Next.js · Railway · Vercel · AWS (Certified Cloud Practitioner, CLF-C02)


Experience

Data Science Intern · Compatible Solutions (Jan 2025 – Jun 2025)

  • Built ETL pipelines processing 100K+ records with data governance for BI reporting
  • Improved predictive model accuracy by 20% over the legacy system via feature engineering and hyperparameter tuning
  • Conducted A/B testing and delivered interactive dashboards for stakeholder decision-making

Data Science Intern · Yhills / IIT Hyderabad (Mar 2023 – May 2023)

  • Built ML models for H1N1 vaccine prediction (84% accuracy) and NYC taxi fare prediction (RMSE $3.20)
  • Engineered 25+ features from temporal, geographic, and demographic data; improved performance by 30%
  • Communicated insights via visual dashboards for non-technical stakeholders

Currently Exploring

  • LLM applications & agentic workflows
  • Scalable ML systems & deep learning
  • Distributed data processing (Spark + cloud)

Pinned

  1. listinglens

    AI-powered Amazon seller intelligence platform — BERT sentiment, XGBoost return risk prediction, and RAG chatbot grounded in real customer reviews.

    TypeScript 1

  2. algorithmic-trading-backtest

    Distributed trading strategy backtesting with PySpark - 12,300 backtests on 100 stocks

    Jupyter Notebook

  3. Spotify_predictor_enhanced

    A machine learning tool that predicts upcoming Spotify breakout artists using TikTok and Shazam data. Features a Random Forest backend and a Streamlit frontend.

    Jupyter Notebook