Projects

Here are some of the projects I’ve enjoyed working on. Click on them to learn more!

ModernBERT for Patents: Faster Insights, Smarter Classification

Applied ModernBERT to complex patent classification, demonstrating >2x faster inference than traditional BERT while reaching state-of-the-art accuracy with a hierarchical loss. Introduced USPTO-3M, a large public dataset of 3 million patents.

Silent Sabotage: Backdooring Code-Executing LLM Agents

Investigated the unique backdoor vulnerabilities of CodeAct LLM agents, demonstrating highly effective attacks via fine-tuning poisoning, even with minimal poisoned data, highlighting critical security risks in autonomous systems.

Ethical AI Recommendations: Benchmarking LLM Bias in Cold-Start Scenarios

Developed and applied a novel benchmark to evaluate ethical biases (gender, nationality, etc.) in LLM-based recommender systems, especially for new users (cold-start), revealing significant stereotype replication and providing tools for...

Deep Learning Mastery: From Foundations to Advanced Generative Models with PyTorch

Implemented, trained, and evaluated diverse deep learning models (MLPs, CNNs, Transformers, GANs, Diffusion Models) using PyTorch and NumPy for tasks like image classification/generation, sequence modeling, and robotic control.

NLP for Patent Search & Generation at DeepIP (Kili Technology)

Developed and evaluated patent similarity search using embeddings and LLMs. Specialized LLMs for patent generation through fine-tuning exploration and advanced instruction design, including chain-of-thought (CoT) prompting. Integrated style transfer through architectural refactoring.

Advanced Machine Learning Toolkit

A deep dive into supervised, unsupervised, randomized optimization, and reinforcement learning algorithms using Scikit-learn, Matplotlib, Gymnasium, and custom libraries.

Data Processing Platform Development at Lingua Custodia

Developed core Python microservice components for the Datomatic data platform. Created NLP tools, a web scraping framework, and a client library, enhancing financial translation data pipelines.

Education

  • University of Technology of Compiègne (2020 – 2025)
    • BS & MEng in Computer Science & Artificial Intelligence
  • University of Calabria, Erasmus (Sep 2024 – Feb 2025)
    • MS in Artificial Intelligence
  • Relevant Coursework: Deep Learning, Machine Learning, Computer Vision, Agentic AI, NLP, Distributed Systems.

Experience

  • Research Scientist Intern supervised by Dr. Wafa Ben Jaballah, Thales (CortAIx Lab) – Paris (Jun – Dec 2024). Deep learning research within the EU CyberNemo project, specializing in federated learning for log anomaly detection.
    • Applied advanced techniques, including PEFT methods and multi-GPU parallelism, to train models with up to several billion parameters in PyTorch.
    • Developed SOTA log anomaly detection systems using transformer architectures for cybersecurity threat identification.
    • Conceived novel FL aggregation strategies that improved accuracy in both IID and non-IID federated scenarios while reducing communication costs by 10x.
    • Generated synthetic data to train a tokenizer without any access to data, enabling the use of domain-specific tokenizers in FL settings.
  • Research Assistant supervised by Prof. Insaf Setitra, University of Technology of Compiègne (Feb – Jul 2024)
    • Developed a facial empathy analysis system using deep learning and computer vision for mixed-reality applications.
    • Implemented SOTA models achieving superior performance in emotion recognition.
    • Designed a comprehensive experimental protocol integrating real-time video analysis with empathy questionnaires.
  • NLP & Data Engineer Intern, Lingua Custodia – Paris (Sep 2023 – Feb 2024)
    • Optimized algorithms for similarity calculations & caching, boosting speed 6x and reducing memory usage 10x.
    • Developed a scalable RESTful API for managing translation memories.
    • Applied DevOps practices, including containerization and CI/CD pipeline setup.

Technologies

  • Languages: Python (PyTorch, NumPy, Scikit-learn, PEFT, Transformers, Flower, FastAPI, Multithreading, Pytest), Go, C++, R, SQL (MySQL).
  • Tools: Git, Docker, Bash, Weights & Biases, Jupyter, Conda, Google Cloud Platform (GCP).
  • OS: Ubuntu, Windows (WSL).