Hi! I am

Jijun TANG


About

About Me

Big Data Engineer with 5 years of experience, specializing in the design, development, optimization and maintenance of data pipelines (NiFi, Kafka, Spark, Elasticsearch, Airflow, etc.). I help business teams extract value from their data while ensuring reliable and scalable processing. I also have solid experience in backend API development (FastAPI, Java, Spring) and cloud deployment (GCP, AWS) with DevOps practices (Kubernetes, GitLab CI, Jenkins).


Download CV

Python
Java
Scala
JavaScript
TypeScript
Apache Spark
Apache Kafka
Apache Airflow
MongoDB
PostgreSQL
Ubuntu
Git
GitLab CI
Kubernetes
Google Cloud Platform
Amazon Web Services

Experience

Aug 2025 - Feb 2026

Big Data Engineer

Ellisphere

As part of a 9-person Agile team (developers, product managers), I designed new features and maintained 4 strategic ETL chains processing judicial and financial data on European companies.

  • Design and development of Natural Language Processing (NLP) features in Java and Scala
  • Implementation of a pipeline validating daily XML files via JAXB using XSD schemas
  • Deployment and maintenance of 4 batch ETL chains in REXX
  • Creation of an Angular 20 UI for business data visualization
  • Administration of Kafka topics and streams for data streaming
  • Implementation of observability (SLO/monitoring) via Prometheus and Grafana dashboards
  • Management of CI/CD pipelines (GLPI, Jenkins, SVN) including documentation, unit tests and code reviews
Oct 2024 - Feb 2025

Big Data Engineer

Orange

Orange, a leading French telecommunications company, undertook the migration of its data to GCP. Within a team of 15 people, I supported this transition by optimizing ETL pipelines.

  • Design and development of new features for ETL chains based on Kafka, Spark and Elasticsearch
  • Development and implementation of a RAG (Retrieval-Augmented Generation) solution for an internal LLM (Dinootoo)
  • Operational maintenance of Big Data applications (Kafka, Spark, NiFi, Elasticsearch, Airflow)
  • Testing of applications migrated from on-premises to GCP environments (Cloud Composer, BigQuery, Bigtable)
  • Update and optimization of GCP Dataflow packages in Python
  • Industrialization via CI/CD integration (GitLab CI, Jenkins, GCP)
  • Participation in Agile (Scrum) rituals via JIRA/Confluence
Mar 2023 - Aug 2024

Production Big Data Engineer

La Française des jeux

Within the FDJ Data Lake Production team, I was responsible for the reliability and availability of critical data, ensuring ETL pipeline maintenance, API development and monitoring dashboard creation.

  • Operational maintenance of Big Data infrastructure (Apache Hadoop HDFS, Kafka, Spark, NiFi, Elasticsearch, Airflow, GitLab)
  • Data Quality control via automated Python scripts
  • Design, implementation and maintenance of ETL pipelines (creation and optimization of NiFi flowfiles)
  • Development and deployment of APIs (FastAPI) for documentation data exposure
  • Security certificate management (Kafka/Elasticsearch/NiFi)
  • Continuous integration and deployment (GitLab CI, Kubernetes, Docker, Nexus)
  • Agile Scrum rituals and technical documentation writing
Oct 2022 - Feb 2023

Python Developer

Consort Groupe

Development of an OCR (Computer Vision) algorithm to extract and structure the text in organizational chart images, automating document entry and saving 90% of manual processing time.

  • End-to-end Data Science project management
  • Design of mathematical algorithms for identification and generation of bounding boxes covering essential texts
  • Development, model accuracy evaluation and application testing
  • Deployment of the packaged model on Microsoft Azure cloud
May 2022 - Sep 2022

Full Stack Python Vue.js Developer

FactSet

I joined the 'Deep Sector' project, which provides enriched financial data on TMT sectors. Within an Agile team of 20 people, my role was to improve data access for clients under high-availability and low-latency constraints.

  • Design, development, documentation and maintenance of 3 critical APIs in Python/FastAPI
  • Fullstack participation (component enrichment via Vue.js)
  • Implementation of continuous integration via GitHub and deployment on AWS
  • Writing unit tests (PyTest) and creation of Mocks via Swagger
  • Daily collaboration with international project teams
Apr 2021 - Oct 2021

Data Scientist

Axens
  • Modeling of a complex physico-chemical process
  • Creation, optimization and iteration of several machine learning and deep learning models in Python: LSTM and GRU combined with a custom Transformer encoder layer for time series forecasting
  • Deployment of the models via a Flask API
  • Development environment: Linux + Azure (using GPU NC6)
  • Discussions with experts
  • Digital twin: developed machine learning-based models combined with physicochemical models to predict the end of the catalyst cycle

Education

Sep 2019 - Dec 2021

Master of Science in Engineering and Mathematics for Business

Sorbonne University

Demanding curriculum in the foundations of applied mathematics, such as functional analysis, advanced probability and statistics, with computer implementation of cutting-edge industry algorithms and deep learning and artificial intelligence projects.

  • Python
  • CUDA
  • Time Series
  • Partial Differential Equations
  • Monte Carlo
Sep 2016 - July 2019

Bachelor of Mathematics and Computer Science

Sorbonne University

Curriculum combining mathematical theory with computer science practice.

Major Technical Skills

OOP & Algorithms

99%

ML, DL, Data Science

90%

Big Data

80%

AWS Cloud

85%

FrontEnd

70%

DevOps

90%

Python

99%

Spark

85%

Java

90%

Kafka

85%

SQL

90%

Docker / Kubernetes

85%

Certificates

In Progress

Certified Kubernetes Administrator (CKA)

Linux Foundation

Training in progress, expected completion: June 2026

Dec 2022

AWS Certified Developer – Associate

Amazon Web Services
Oct 2022

Big Data Analysis with Scala and Spark

École polytechnique fédérale de Lausanne
  • Basics of Spark's RDDs
  • Reduction Operations & Distributed Key-Value Pairs
  • Partitioning and Shuffling
  • Structured data: SQL, Dataframes, and Datasets
Mar 2022

Algorithms on Graphs

University of California San Diego
  • Decomposition of Graphs: Undirected & Directed Graphs
  • Topological Sort and Strongly Connected Components
  • Breadth-First Search and Shortest Path Tree
  • Dijkstra, Bellman-Ford
  • Minimum Spanning Trees, Kruskal's and Prim's algorithms
Mar 2022

Deep Learning Specialization

DeepLearning.AI
  • Neural Networks and Deep Learning
  • Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization
  • Structuring Machine Learning Projects
  • Convolutional Neural Networks
  • Sequence Models

Projects

My Projects

Click on a project to see details

NLP Features Development

Java, Scala, XML, XSD - Ellisphere

RAG for Internal LLM

Python, GCP, LLM - Orange

ETL Pipeline Monitoring

Kafka, Prometheus, Grafana - FDJ

OCR for Org Charts

Python, Keras, Azure - Consort Groupe

Financial APIs

Python, FastAPI, Vue.js, AWS - FactSet

Time Series Prediction

Python, Keras, Azure ML - Axens

Blog

My Blogs

Thank you for stopping by and reading my little thoughts!

Check My Medium Blog Posts

April 30, 2026 Admin 4

My notes on AI, Big Data, Spirituality and Life philosophy.

My LeetCode Memos

Oct. 28, 2022 Victor TANG 0

Come and check out what I learned from the most challenging exercises on LeetCode!

Solution for Out of Boundary Paths

July 16, 2022 Jijun TANG 0

An innovative solution using BFS and state memoization.
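
For readers unfamiliar with the problem, here is a minimal sketch of one way a BFS-style traversal with memoized state can count the paths. It is an illustration only and not necessarily the approach taken in the blog post. The function name `find_paths` and its parameter names are assumptions chosen for this example. The idea is to expand the grid level by level, one move at a time, and to remember for each cell how many distinct ways lead to it, so that identical states are merged instead of being explored again.

```python
from collections import defaultdict

MOD = 10**9 + 7  # the problem asks for the count modulo 1e9 + 7


def find_paths(m, n, max_move, start_row, start_col):
    """Count the paths that leave an m x n grid in at most max_move moves."""
    # frontier maps a cell to the number of distinct ways to stand on it
    # after the current number of moves; this is the memoized state.
    frontier = {(start_row, start_col): 1}
    exits = 0
    for _ in range(max_move):
        next_frontier = defaultdict(int)
        for (r, c), ways in frontier.items():
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < m and 0 <= nc < n:
                    # Still inside the grid: merge into the next level.
                    next_frontier[(nr, nc)] = (next_frontier[(nr, nc)] + ways) % MOD
                else:
                    # Crossed the boundary: every way of reaching (r, c) exits here.
                    exits = (exits + ways) % MOD
        frontier = next_frontier
    return exits


print(find_paths(2, 2, 2, 0, 0))
print(find_paths(1, 3, 3, 0, 1))
```

Because each level stores at most one entry per cell, the work is bounded by the number of moves times the number of cells, rather than growing with the number of individual paths.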

I'm available for a contract

Hire me!

Contact

Contact Me

Feel free to contact me through any of the channels below for business collaborations!