VLG Data Engineering


A blog on Data Engineering & Machine Learning Operations

Why quick-sort works

 Posted on January 7, 2023

Proving the correctness of quick-sort algorithms with high-school maths. [Read More]
algorithms  computer science  sorting  theory 
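
For readers who want the algorithm in front of them, here is a minimal sketch of a quick-sort in Python. It is an illustrative, non-in-place variant with a middle-element pivot; the function name `quick_sort` and the pivot choice are assumptions made here, not necessarily the version analysed in the post.

```python
def quick_sort(items):
    """Return a new sorted list using a simple (not in-place) quick-sort."""
    # Base case: lists of length 0 or 1 are already sorted.
    if len(items) <= 1:
        return list(items)

    # Assumption: use the middle element as the pivot.
    pivot = items[len(items) // 2]

    # Partition into three groups relative to the pivot.
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]

    # 'equal' always contains the pivot, so both recursive calls
    # receive strictly shorter lists and the recursion terminates.
    return quick_sort(smaller) + equal + quick_sort(larger)
```

A correctness argument for a version like this typically proceeds by induction on the length of the list: the two recursive calls work on strictly shorter lists, and every element of `smaller` precedes every element of `larger`.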

A multi container ML app (3/3): Kubernetes

 Posted on September 17, 2022

Deploying our 3-container translation app on a Kubernetes cluster to get scalability and resilience. [Read More]
kubernetes  k8s  gcp  google cloud  gke  swarm 

A multi container ML app (2/3): deploying with Docker Compose

 Posted on September 15, 2022

Now that we have Docker containers, let’s deploy them together with Docker Compose. Also covered: security with Docker secrets, data persistence with Docker volumes and dependency ordering. [Read More]
docker  container  compose  secrets  volume  swarm 

A multi container ML app (1/3): Docker

 Posted on September 11, 2022

Building a translation app by putting together 3 containerized microservices: a Flask frontend, a FastAPI backend and a MySQL database. Let’s skim through the development process and the containerization. Also covered: Docker registry and CI/CD with GitHub Actions. [Read More]
docker  container  api  nlp  database  flask  fastapi  python  sql  ci/cd  registry 

Build a real-time stream of air quality data with Apache Kafka

 Posted on August 10, 2022

Let’s build a data stream with Kafka today! We will retrieve air quality data using the World Air Quality Index project’s API, then push it to a Kafka cluster. [Read More]
kafka  stream  API  data engineering  distributed  http  request  air  cluster 

An asymmetric loss for regression models

 Posted on January 9, 2022

Drive regression models towards under/overestimation while keeping accurate outputs with the linear-exponential loss. [Read More]
loss  custom  asymmetric  underestimation  overestimation  regression  python 
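
As a rough illustration of the idea, the sketch below uses the commonly cited LINEX form, exp(a·e) − a·e − 1 with e = prediction − target. The function name `linex_loss` and the parameter `a` are assumptions for this example; the post's exact formulation may differ.

```python
import math


def linex_loss(y_true, y_pred, a=1.0):
    """Linear-exponential (LINEX) loss for a single prediction.

    Assumed form: exp(a * e) - a * e - 1, where e = y_pred - y_true.
    With a > 0, overestimation (e > 0) is penalised roughly exponentially
    and underestimation roughly linearly; a < 0 reverses the asymmetry.
    """
    error = y_pred - y_true
    return math.exp(a * error) - a * error - 1.0
```

With `a=1.0`, overestimating by one unit costs more than underestimating by one unit, so a model trained on this loss would be nudged towards underestimation; flipping the sign of `a` pushes it the other way.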

NLP with 🤗 Hugging Face

 Posted on July 12, 2021

Zero-shot classification is basically text classification with no training at all. How does it compare with transfer learning/fine-tuning? We’ll see using the beloved 🤗 transformers library. [Read More]
NLP  zero-shot classification  text classification  distilbert  transformers  hugging face 

Interpretable machine learning with SHAP

 Posted on January 24, 2021

In this post, we predict health insurance costs with an efficient black-box model, namely random forest. Then we interpret individual predictions as well as the global behavior of the estimator using SHapley Additive exPlanations. [Read More]
interpretability  explainability  Shapley  SHAP  correlation  multicollinearity  python  black box  insurance 

Image recognition with PyTorch and fastai

 Posted on December 22, 2020

Computer vision is one of the most fascinating domains in Machine Learning. Libraries like PyTorch and, more recently, fastai have made these kinds of models extraordinarily accessible. In this post, we build an aircraft classifier from gathering data to training and deployment. [Read More]
computer vision  transfer learning  pre-trained models  deployment  pytorch  torchvision  fastai  fast.ai  python 

Shiny Central Limit Theorem

 Posted on November 28, 2020

The central limit theorem is one of the greatest hits in the history of statistics. I wrote a little Shiny app to visualize it and to illustrate its infamous “counterexample”, the Cauchy distribution: https://datatrigger.shinyapps.io/CLT_Visualization/. [Read More]
r  shiny  data visualization  central limit theorem 

Vincent Le Goualher  • © 2023  •  VLG Data Engineering
