Saad Yaqine - ML Engineer & Data Scientist

I build production-ready AI systems that solve real business problems. Specializing in NLP, LLMs, and end-to-end MLOps pipelines, I've deployed sentiment analysis models for financial trading, automated code review systems, and intelligent document retrieval platforms. From data streaming architecture to production deployment, I deliver measurable results.

End-to-End ML Pipelines
NLP & LLM Engineering
MLOps & Production Systems
Real-Time Data Processing
Cloud & Infrastructure

About Me

I'm a Machine Learning Engineer and Data Scientist with 2 years of experience building AI-driven solutions that go beyond proof-of-concept. My focus is on production-grade systems: models that run reliably in real-world environments, handle scale, and deliver business value.

🎓 Education

Master's Degree in Computer Science

Polytech Marseille

Artificial Intelligence, Machine Learning, Data Science

📜 Certifications

MLOps Practitioner & Advanced Designer

Dataiku • 2024

Google Cloud Professional Data Scientist

Google Cloud • In Progress

End-to-end ownership

I don't just train models. I architect data pipelines, containerize services, set up monitoring, and deploy to production. My projects span the full ML lifecycle.

Production-first mindset

Whether it's a real-time trading bot processing Kafka streams or an automated code review system integrated with GitHub, my work is designed for reliability, not just demos.

NLP/LLM expertise

I've fine-tuned transformer models for financial sentiment analysis, built RAG systems with vector databases, and integrated LLMs for summarization and generation tasks.

Pragmatic problem-solving

I choose technologies based on requirements, not hype. Sometimes that means FAISS over a managed vector DB, or a well-structured FastAPI service over a heavyweight framework.

French • Fluent
English • Fluent
Arabic • Native

Professional Experience

Bouygues Telecom

Data Scientist (Work-Study Apprenticeship)

March 2024 - March 2025

Meudon, France

  • Collaborated with business teams to identify criteria for prioritizing high-potential calls
  • Designed an XGBoost scoring model that prioritizes calls with high conversion potential: +46% useful contact rate, 94% accuracy
  • Built a complete CI/CD pipeline with automated tests (accuracy, latency), progressive deployment, and drift monitoring
  • Reduced time-to-market by 30% through deployment automation (Docker, Azure ML)
Python • PySpark • Dataiku • Azure ML • Databricks • Power BI • Teradata • XGBoost

LeBonCoin

Data Scientist (Internship)

April 2023 - October 2023

Paris, France

  • Automated classification of user ads (manual categorization was causing delays and errors)
  • Built an NLP model combining TF-IDF and embedding features with a Random Forest classifier, raising accuracy from 78% to 92% across 15 categories
  • Deployed to production as a FastAPI microservice on Vertex AI for real-time processing of new ads
Python • Scikit-learn • Vertex AI • FastAPI • PostgreSQL • MLflow • NLP

Universidad Politécnica de Valencia

Data Scientist (Internship)

December 2022 - March 2023

Spain

  • Built the MAGENTA dataset: 25K+ Arabic texts across 3 domains for AI-generated text detection
  • Generated synthetic texts with LLMs (AraGPT-2, BLOOM) and fine-tuned AraBERT: 96.4% Macro-F1
  • Publication: ICALP 2023 (International Conference on Arabic Language Processing), Springer
Python • PyTorch • Transformers • Scikit-learn • AraBERT • AraGPT-2 • BLOOM

Marsa Maroc

Data Analyst (Internship)

July 2022 - August 2022

Mohammadia, Morocco

  • Analyzed maintenance data (sensors, ERP, intervention history) to support the transition from corrective to predictive maintenance
  • Built an ETL pipeline to centralize heterogeneous sources in Snowflake and created Tableau dashboards (40% less reporting time)
Python • SQL • Snowflake • Tableau • ETL

Featured Projects

FinSentBot

Real-Time Trading Signal Generation

Built an end-to-end automated trading intelligence system that combines real-time news sentiment analysis with live market data to generate Buy/Hold/Sell signals.

🎯 Problem

Financial traders need to process vast amounts of news and market data to make informed decisions. Manual sentiment analysis is slow, subjective, and can't keep pace with market movements.

💡 Solution

Developed a production-ready system using Apache Kafka for real-time data streaming, FinBERT for financial sentiment analysis, and custom ML models for signal generation.
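
As a rough illustration of the final step, the sketch below maps a sentiment score and a recent price move to a Buy/Hold/Sell label. It is a simplified rule-based stand-in, not the custom ML models used in the project; the function name `generate_signal`, the score range of -1 to 1, and the 0.3 threshold are all assumptions made for this example.

```python
def generate_signal(sentiment: float, price_change_pct: float,
                    threshold: float = 0.3) -> str:
    """Combine a sentiment score with a price move into a trading label.

    sentiment: assumed to be in [-1, 1] (negative = bearish news).
    price_change_pct: recent price change in percent.
    threshold: hypothetical cutoff for acting on sentiment.
    """
    # Only act when news sentiment and price direction agree.
    if sentiment >= threshold and price_change_pct >= 0:
        return "Buy"
    if sentiment <= -threshold and price_change_pct <= 0:
        return "Sell"
    # Weak or conflicting evidence defaults to no action.
    return "Hold"


if __name__ == "__main__":
    print(generate_signal(0.8, 1.2))
    print(generate_signal(-0.6, -0.4))
    print(generate_signal(0.1, 2.0))
```

Requiring sentiment and price direction to agree keeps the rule conservative: conflicting inputs fall through to "Hold".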

📊 Key Results

  • Automated analysis of 100+ financial news articles daily
  • 87% accuracy on financial sentiment classification
  • End-to-end latency under 5 seconds from news to signal
  • Production-ready with comprehensive logging and error handling
Python • PyTorch • Transformers (FinBERT) • Apache Kafka • Docker • Streamlit • yfinance • BeautifulSoup
View on GitHub

Code Review AI

Automated Python Analysis System

Developed an AI-powered code review system that combines Abstract Syntax Tree (AST) parsing with Claude AI to automatically analyze pull requests and provide detailed, actionable feedback.

🎯 Problem

Manual code reviews are time-consuming and often miss subtle bugs, security issues, or style inconsistencies. Development teams need automated quality checks that integrate seamlessly into their workflow.

💡 Solution

Created a production system using FastAPI webhooks, Python AST parsing, and Claude AI for intelligent code analysis with automatic PR commenting.
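
To give a sense of what the AST parsing stage can catch before any LLM is involved, here is a minimal sketch using Python's standard `ast` module. The two checks (missing docstrings and bare `except:` clauses) and the function name `find_issues` are illustrative assumptions, not the project's actual rule set.

```python
import ast


def find_issues(source: str) -> list[tuple[int, str]]:
    """Return (line number, message) findings for a Python source string."""
    findings = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Flag functions that lack a docstring.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if ast.get_docstring(node) is None:
                findings.append(
                    (node.lineno, f"function '{node.name}' has no docstring")
                )
        # Flag bare 'except:' handlers, which swallow every exception.
        elif isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append((node.lineno, "bare 'except:' hides errors"))
    # ast.walk is breadth-first, so sort to report in source order.
    return sorted(findings)


if __name__ == "__main__":
    sample = (
        "def f(x):\n"
        "    try:\n"
        "        return 1 / x\n"
        "    except:\n"
        "        return 0\n"
    )
    for lineno, message in find_issues(sample):
        print(f"line {lineno}: {message}")
```

Structural findings like these are cheap and deterministic, so they could be passed to the LLM as context or posted directly as PR comments.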

📊 Key Results

  • Deployed to production on Railway with live webhook integration
  • 2,400+ lines of production code with modular architecture
  • Average review time reduced to <30 seconds per PR
  • 100% automation with zero manual intervention required
Python • FastAPI • Claude AI (Anthropic) • AST • PyGithub • Docker • pytest • Railway
View on GitHub

DocuMind

Intelligent RAG System for Semantic Search

Built a Retrieval-Augmented Generation (RAG) system that combines semantic embeddings with vector similarity search to enable intelligent document discovery.

🎯 Problem

Traditional keyword search fails to capture semantic meaning, making it difficult to find relevant information in large document collections. Users need intelligent systems that understand context and intent.

💡 Solution

Implemented a RAG architecture using Sentence Transformers for embeddings, FAISS for vector search, and planned LLM integration for generation.
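
The retrieval step boils down to ranking documents by vector similarity to the query. The sketch below shows that idea with a brute-force cosine-similarity scan in plain Python. It deliberately swaps out both Sentence Transformers and FAISS: the vectors are hand-written toy values, and the names `cosine` and `top_k` are assumptions for this example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as dissimilar to everything.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
    toy_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(top_k([1.0, 0.1], toy_index, k=2))
```

A linear scan like this is adequate for a small collection; a vector index such as FAISS earns its place as the collection grows or latency budgets tighten.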

📊 Key Results

  • 123 documents indexed with sub-second query response times
  • 92% of test queries return relevant results in top-3
  • Fully containerized MVP ready for production deployment
  • Roadmap includes LoRA fine-tuning and multi-modal support
Python • Sentence Transformers • FAISS • PyTorch • Hugging Face • Streamlit • Docker • NumPy
View on GitHub

AI News Agent

Automated Multi-Source News Pipeline

Created an end-to-end automated pipeline that scrapes major tech publications, deduplicates and processes articles, generates AI summaries, and distributes daily email digests.

🎯 Problem

Staying current with technology news across multiple publications is time-consuming. Professionals need curated, summarized content delivered automatically without manual aggregation.

💡 Solution

Built a fully automated pipeline using Selenium for scraping, OpenAI for summarization, and GitHub Actions for scheduling.
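
Because the same story often appears in several publications, a pipeline like this needs a deduplication pass before summarization. A minimal sketch of one way to do it follows, hashing a normalized title. The `title` field, the normalization rule, and the function names are assumptions for illustration; the project's actual matching logic may differ.

```python
import hashlib
import re


def fingerprint(title: str) -> str:
    """Hash a title after lowercasing and collapsing punctuation and spaces."""
    normalized = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(articles: list[dict]) -> list[dict]:
    """Keep the first article seen for each distinct title fingerprint."""
    seen = set()
    unique = []
    for article in articles:
        key = fingerprint(article["title"])
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique


if __name__ == "__main__":
    scraped = [
        {"title": "OpenAI ships new model!", "source": "site-a"},
        {"title": "openai ships  new model", "source": "site-b"},
        {"title": "Rust 2.0 rumors", "source": "site-a"},
    ]
    for article in deduplicate(scraped):
        print(article["source"], "-", article["title"])
```

Exact-match hashing only catches near-identical headlines; rewritten headlines for the same story would need fuzzy or embedding-based matching.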

📊 Key Results

  • Automated aggregation from 5 major tech publications daily
  • 100% automated pipeline with zero manual intervention
  • French summaries optimized for quick consumption (2-3 sentences per article)
  • Free hosting via GitHub Actions with no server costs
  • Average 20-30 articles processed and summarized daily
Python • OpenAI API • Selenium • SQLite • GitHub Actions • Streamlit
View on GitHub

Technical Skills

Programming & Data

Python • Advanced
SQL • Proficient
Bash/Shell • Proficient
Pandas • Advanced
NumPy • Advanced
PySpark • Proficient
Kafka • Proficient

Machine Learning & AI

PyTorch • Advanced
TensorFlow • Proficient
scikit-learn • Advanced
Hugging Face Transformers • Advanced
BERT/FinBERT • Advanced
Model Training & Evaluation • Advanced
Hyperparameter Tuning • Advanced

NLP & LLMs

spaCy • Advanced
NLTK • Proficient
Sentiment Analysis • Advanced
Text Classification • Advanced
LangChain • Proficient
RAG Systems • Proficient
Sentence Transformers • Proficient
Claude AI API • Proficient
OpenAI API • Proficient

MLOps & Deployment

Docker • Advanced
FastAPI • Advanced
GitHub Actions • Proficient
pytest • Proficient
CI/CD • Proficient
Dataiku • Advanced

Cloud & Infrastructure

AWS (Lambda, SageMaker, S3, EC2) • Proficient
GCP • Proficient
Azure (Data Factory) • Proficient

Data Visualization & Tools

Streamlit • Advanced
Tableau • Proficient
Matplotlib/Seaborn • Advanced
Git/GitHub • Advanced

Let's Connect

I'm currently available for full-time ML Engineer / Data Scientist roles and freelance AI/ML projects. Whether you're looking to build production NLP systems, deploy real-time ML pipelines, or implement RAG solutions, I'd love to hear from you.

📍 Based in Paris, France

Open to remote opportunities across Europe and international projects

⏱️ Response Time

I typically respond within 24 hours.