I’ve given talks at various Python, data science, and machine learning conferences over the years. Here are some recordings.
From scientific research to innovation – :probabl.
Open Science Days 2025
A talk about the journey from scientific research to innovation, featuring :probabl. (probabl.ai), the company behind the sustainable development of scikit-learn and other open source machine learning tools. I discuss how open source scientific software can be turned into sustainable products while keeping the spirit of open science and community collaboration.
Writing a scikit-learn compatible estimator in the modern age
PyCon Italia 2025 · Bologna, Italy
A modern take on writing custom scikit-learn estimators and transformers. This talk covers the latest best practices for creating estimators that work seamlessly with scikit-learn pipelines, including the new estimator tags system, metadata routing API, and testing infrastructure. An updated guide for anyone wanting to extend scikit-learn with their own algorithms.
EU AI Act: Unveiling Lesser-Known Aspects and Implementation Entities
EuroPython 2024 · Prague, Czech Republic
The EU AI Act is now partly in effect, prohibiting certain AI systems. This talk explores the lesser-known aspects of the regulation and the various entities responsible for its implementation. I discuss what practitioners and developers need to know about compliance, the classification of AI systems by risk level, and the practical implications for those building and deploying machine learning systems in Europe.
Let’s exploit pickle, and skops to the rescue!
EuroSciPy 2023 · Basel, Switzerland · Conference page
Pickle files can be evil and simply loading them can run arbitrary code on your system. This talk presents why that is, how it can be exploited, and how skops is tackling the issue for scikit-learn/statistical ML models. I go through the process used by the pickle module to persist Python objects, demonstrating how __getstate__, __setstate__, and __reduce__ methods can be exploited to create malicious pickle files. Then I introduce the skops library’s alternative format for securely storing scikit-learn, xgboost, lightgbm, and catboost estimators.
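The mechanism described above can be sketched with the standard library alone. This is an illustrative example, not code from the talk: `Payload`, `NoGlobalsUnpickler`, and `restricted_loads` are hypothetical names, and a harmless `eval("6 * 7")` stands in for a real attack. The restricted unpickler follows the "restricting globals" pattern from the Python documentation; it is not how skops works, which uses its own file format instead of pickle.

```python
import io
import pickle


class Payload:
    """Hypothetical malicious object: unpickling it calls eval()."""

    def __reduce__(self):
        # pickle stores "call eval('6 * 7')" as the recipe for rebuilding
        # this object; a real attack would put something harmful here.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading runs the stored call instead of rebuilding a Payload instance.
result = pickle.loads(blob)


class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuses every global lookup, so no callable can be resolved."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data):
    """Load pickled bytes while rejecting any reference to a global."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

Here `result` is the value returned by `eval`, not a `Payload`, which shows that the loader executed code chosen by whoever wrote the file. `restricted_loads` still accepts plain containers of built-in values but raises `pickle.UnpicklingError` on `blob`.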
Dynamically generated methods with a non-generic signature
EuroPython 2023 · Prague, Czech Republic · Conference page
A deep dive into Python metaprogramming using scikit-learn as a case study. This talk explores how to add methods to all subclasses of a base class where the generated method signatures depend on existing subclass methods. I cover inspecting method signatures with the inspect module, traversing the Method Resolution Order (MRO), using descriptors for method generation, applying PEP-362 to attach signature objects, dynamically creating docstrings, and leveraging PEP-487’s __init_subclass__ to attach methods to child classes.
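A minimal sketch of the idea, under simplifying assumptions: it is not scikit-learn's implementation, and `RequestMixin`, `set_fit_request`, and `Weighted` are names chosen for illustration. It uses `__init_subclass__` (PEP 487) to attach a method to each subclass and a `__signature__` attribute (PEP 362) to give that method a signature derived from the subclass's own `fit`. It relies on `getattr` to follow the MRO rather than walking it explicitly, and it attaches a plain function rather than a descriptor.

```python
import inspect


class RequestMixin:
    """Attach a set_fit_request method whose signature mirrors fit()."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        fit = getattr(cls, "fit", None)  # getattr resolves through the MRO
        if fit is None:
            return

        # Every named fit() parameter other than self, X and y is metadata.
        extra = [
            name
            for name, param in inspect.signature(fit).parameters.items()
            if name not in {"self", "X", "y"}
            and param.kind in (param.POSITIONAL_OR_KEYWORD, param.KEYWORD_ONLY)
        ]

        def set_fit_request(self, **requests):
            unknown = set(requests) - set(extra)
            if unknown:
                raise TypeError(f"unexpected arguments: {sorted(unknown)}")
            current = getattr(self, "_fit_requests", {})
            self._fit_requests = {**current, **requests}
            return self

        # Replace the generic (**requests) signature with an explicit one.
        params = [inspect.Parameter("self", inspect.Parameter.POSITIONAL_OR_KEYWORD)]
        params += [
            inspect.Parameter(name, inspect.Parameter.KEYWORD_ONLY, default=None)
            for name in extra
        ]
        set_fit_request.__signature__ = inspect.Signature(params)
        set_fit_request.__doc__ = "Request metadata for fit: " + ", ".join(extra)
        cls.set_fit_request = set_fit_request


class Weighted(RequestMixin):
    def fit(self, X, y=None, sample_weight=None, groups=None):
        return self


sig = inspect.signature(Weighted.set_fit_request)
```

With this in place, tools that read signatures (`help()`, IDEs, documentation generators) see `sample_weight` and `groups` as keyword-only parameters of `Weighted.set_fit_request`, even though the underlying function only accepts `**requests`.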
Best practices to open source a product and creating a community around it
EuroPython 2022 · Dublin, Ireland · Conference page
In certain areas of the industry, open source has become mainstream. But making source code publicly available on platforms like GitHub is not enough. This talk explores critical factors for successful open source projects including licensing, governance structures, and common mistakes that discourage newcomers. I cover various governance models (do-ocracy, founder-led, electoral, corporate-backed, and foundation-backed approaches) and community pitfalls such as poor onboarding, unclear leadership, weak communication, and insufficient transparency.
scikit-learn and fairness, tools and challenges
NeurIPS 2020 · Virtual · SlidesLive
Presented with Nana Yamazaki at the NeurIPS Expo. We start with a common classification pipeline, then assess the fairness and bias of the data and outputs using the disparate impact ratio as an example metric, and finally mitigate unfair outputs while searching for hyperparameters that maximize accuracy subject to fairness constraints. This workflow exposes limitations of the scikit-learn API related to passing feature names and sample metadata through a pipeline down to the scorers. We discuss workarounds and the ongoing work to address these issues.
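For readers unfamiliar with the metric, the disparate impact ratio compares the rate of positive predictions between two groups. The following is a plain-Python sketch rather than the code used in the talk; the function name, argument names, and toy data are assumptions made for illustration.

```python
def disparate_impact_ratio(y_pred, sensitive, unprivileged, privileged):
    """Positive-prediction rate of one group divided by that of another.

    y_pred holds 0/1 predictions and sensitive holds each sample's group.
    A value of 1.0 means both groups receive positive predictions equally often.
    """

    def positive_rate(group):
        preds = [p for p, s in zip(y_pred, sensitive) if s == group]
        if not preds:
            raise ValueError(f"no samples for group {group!r}")
        return sum(preds) / len(preds)

    privileged_rate = positive_rate(privileged)
    if privileged_rate == 0:
        raise ValueError("privileged group has no positive predictions")
    return positive_rate(unprivileged) / privileged_rate


# Toy data: group "a" gets 1 positive out of 4, group "b" gets 3 out of 4.
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
sensitive = ["a", "a", "a", "a", "b", "b", "b", "b"]
ratio = disparate_impact_ratio(y_pred, sensitive, unprivileged="a", privileged="b")
```

Note that the metric needs the sensitive attribute for every sample in addition to the predictions, which is exactly the kind of sample metadata that has to travel through the pipeline to the scorer.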
How to write a scikit-learn compatible estimator/transformer
FOSDEM 2020 · Brussels, Belgium · Conference page
A hands-on tutorial on writing custom estimators or transformers that work seamlessly in scikit-learn pipelines. The session covers tips and tricks, testing your estimator against scikit-learn’s common tests, and integration with pipelines and grid searches. This was my first time presenting to ~500 people in the Python room.
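As a rough illustration of what such an estimator looks like, here is a minimal transformer sketch. It is not taken from the tutorial: `MeanCenterer` and its `with_mean` parameter are hypothetical, input validation is reduced to a NumPy conversion, and it assumes scikit-learn and NumPy are installed. A production estimator would also be run against scikit-learn's common tests.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.utils.validation import check_is_fitted


class MeanCenterer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that subtracts the per-column training mean."""

    def __init__(self, with_mean=True):
        # __init__ only stores hyperparameters; no validation or logic here.
        self.with_mean = with_mean

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Attributes learned from data end with an underscore.
        self.mean_ = X.mean(axis=0) if self.with_mean else np.zeros(X.shape[1])
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        check_is_fitted(self)
        return np.asarray(X, dtype=float) - self.mean_


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
y = np.array([1.0, 2.0, 3.0])

centered = MeanCenterer().fit_transform(X)
pipe = make_pipeline(MeanCenterer(), LinearRegression()).fit(X, y)
```

Inheriting from `BaseEstimator` provides `get_params` and `set_params`, which is what lets pipelines and grid searches clone the estimator and vary its hyperparameters; `TransformerMixin` supplies `fit_transform`.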
Current affairs, updates, and the roadmap of scikit-learn
PyConDE & PyData Berlin 2019 · Conference page
An update on recent scikit-learn changes, current affairs, and the project roadmap. I discuss the challenges of managing scikit-learn with significant community engagement but limited core developer capacity, covering recent events around the scikit-learn community and major ongoing initiatives. The talk also addresses the broader ecosystem, including scikit-learn-contrib and scikit-learn-extra; the latter houses models and methods that don’t pass the inclusion criteria of scikit-learn.
The path between developing and serving machine learning models
PyData Berlin 2017 · Conference page
A tutorial on deploying machine learning models using PipelineIO and PMML (Predictive Model Markup Language). This talk explores the gap between developing models in Python and serving them in production environments, covering practical approaches to model serialization and deployment.