Weighted Random Forest with Spark 3

Apache Spark 3.0, the third major version of the distributed computing framework, was released in June 2020. It added sample weight support for tree-based algorithms: decision trees, gradient-boosted trees and random forests. Today we experiment with this new feature on an imbalanced credit card fraud dataset. [Read More]
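
To make the idea of sample weights concrete, here is a minimal sketch in plain Python of one common way to weight an imbalanced dataset, so that every class carries the same total weight. The "balanced" heuristic, the function name `balanced_weights` and the toy labels are assumptions for illustration, not necessarily what the post uses. In Spark 3, per-row weights like these would typically live in a DataFrame column whose name is passed to the estimator's `weightCol` parameter.

```python
from collections import Counter


def balanced_weights(labels):
    """Return one weight per row so that every class has the same total weight."""
    counts = Counter(labels)
    n_rows, n_classes = len(labels), len(counts)
    # Assumed heuristic: weight = n_rows / (n_classes * class_count),
    # so rare classes get proportionally larger weights.
    per_class = {c: n_rows / (n_classes * k) for c, k in counts.items()}
    return [per_class[y] for y in labels]


# Hypothetical toy labels: 98 legitimate transactions (0) and 2 frauds (1).
labels = [0] * 98 + [1] * 2
weights = balanced_weights(labels)
```

With these weights, the two fraud rows together count as much as the 98 legitimate rows, which is the effect a weighted random forest relies on.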

Outlier detection

In this post, I try to define what an outlier is and present several ways to approach anomaly detection. I then introduce the Local Outlier Factor algorithm and apply it to a specific dataset, using both Python and R, to show its power. I also compare its performance with the Isolation Forest method. [Read More]
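
As a rough illustration of what the Local Outlier Factor measures, here is a brute-force sketch in plain Python. It is a simplified version under stated assumptions (exactly `k` neighbours per point with ties broken by index, Euclidean distance, no duplicate points), not the implementation used in the post. The function name `lof_scores` and the toy points are hypothetical.

```python
import math


def lof_scores(points, k=2):
    """Local Outlier Factor of each point (brute force, exactly k neighbours)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding the point itself.
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from a point to its k-th nearest neighbour.
    k_dist = [dist[i][neighbours[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(i, j) = max(k_dist[j], dist[i][j]).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical toy data: four points on a unit square plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
```

Scores close to 1 mean a point is about as dense as its neighbours. Here the isolated point `(5, 5)` gets the largest score, well above 1, which is what flags it as an outlier.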