Profile picture of Benjamin Bluhm, Machine Learning Engineer from Frankfurt

Benjamin Bluhm

Partially available

Last update: 7 April 2024

Machine Learning Engineer

Degree: PhD in Statistics/Econometrics
Languages: German (native) | English (business fluent)

Attachments

CV-BenjaminBluhm-German-2024-03_180324.pdf

Skills

  • Several years of project experience in data science and AI across various industries (banking, logistics, telecom, retail, pharma, media, chemicals)
  • Software development in machine learning, artificial intelligence, and generative AI
  • Development of generative AI applications and deployment of those applications using microservices
  • Deployment of machine learning models and AI applications in various cloud environments
    • Azure
    • AWS
    • Databricks
  • Prototyping and operationalization of models
  • CI/CD via DevOps & MLOps
  • Development of production-ready Python code and in-depth knowledge of relevant libraries (e.g. scikit-learn, mlflow, openai, langchain, semantic kernel)
  • Confident use of common development environments such as PyCharm and Visual Studio Code
  • Scrum / working in agile teams

Project history

01/2023 - 10/2023
Senior Consultant ML Engineer
Vaillant Group (Energy, Water and Environment, >10,000 employees)

- ML Engineer on a large data science project
- Deployment of ML models via Databricks and Azure ML endpoints
- Implementation of Python coding best practices
- Design & implementation of CI/CD pipelines in Azure DevOps & Azure Databricks
- Development of the code promotion strategy for the release process of analytics models
- Development of the MLOps strategy for productionizing ML models
- Regular pull request reviews to ensure code quality

01/2021 - 11/2022
Senior Consultant ML Engineer
BASF (Industry and Mechanical Engineering, >10,000 employees)

- Lead ML Engineer on a large data science project
- Developer & maintainer of 2 Python libraries
- Implementation of Python coding best practices
- Design & implementation of CI/CD pipelines in Azure DevOps & Azure Databricks
- Development of the code promotion strategy for the release process of analytics models
- Development of the MLOps strategy for productionizing ML models
- Regular pull request reviews to ensure code quality
- Consultant and mentor to the data science community within a large data science project

06/2020 - 12/2020
Senior Consultant Data Science
Axel Springer (Media and Publishing, >10,000 employees)

Creation and evaluation of a dynamic pricing model for one of their major online media products
  • Requirements engineering with key business stakeholders to translate the business objective into a suitable machine learning approach
  • Creation of a large-scale PySpark job workflow in Palantir Foundry to prepare training and test datasets by combining large tables from various sources with historical online user engagement data
  • Prototyping, training and evaluation of a binary classification model to predict conversion probabilities using Spark MLlib
  • Implementation of a mathematical optimization framework to compute the optimal price allocation based on price elasticities
  • Creation of a build schedule for daily calculation of dynamic prices based on the developed model
  • Implementation of an A/B test to evaluate the developed model against a suitable control group
  • Communication of results and coordination of tasks in regular meetings with stakeholders
  • Key technologies: Python, PySpark, Palantir Foundry, AWS, Spark MLlib, Adobe Analytics
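
As an illustration of the elasticity-based price optimization mentioned in this project, here is a minimal sketch. It assumes a constant-elasticity demand curve and profit maximization over a grid of candidate prices. Those modeling choices, all function names, and all numbers are hypothetical and not taken from the project itself.

```python
# Illustrative sketch only: constant-elasticity demand is an assumption,
# not the model used in the project. All names and numbers are hypothetical.


def demand(price, base_price, base_demand, elasticity):
    """Constant-elasticity demand: q = q0 * (p / p0) ** elasticity."""
    return base_demand * (price / base_price) ** elasticity


def profit(price, unit_cost, base_price, base_demand, elasticity):
    """Profit at a given price under the assumed demand curve."""
    return (price - unit_cost) * demand(price, base_price, base_demand, elasticity)


def best_price(candidates, unit_cost, base_price, base_demand, elasticity):
    """Pick the candidate price with the highest profit (simple grid search)."""
    return max(
        candidates,
        key=lambda p: profit(p, unit_cost, base_price, base_demand, elasticity),
    )


def closed_form_price(unit_cost, elasticity):
    """Profit-maximizing price for elastic demand (elasticity < -1)."""
    if elasticity >= -1:
        raise ValueError("closed form requires elasticity < -1")
    return unit_cost * elasticity / (1 + elasticity)


# Candidate prices from 5.00 to 20.00 in steps of 0.01
candidates = [round(5 + 0.01 * i, 2) for i in range(1501)]
grid_optimum = best_price(
    candidates, unit_cost=4.0, base_price=10.0, base_demand=1000.0, elasticity=-2.0
)
analytic_optimum = closed_form_price(unit_cost=4.0, elasticity=-2.0)
```

Under these assumptions the grid search and the closed-form markup rule should agree, which gives a quick sanity check before moving to a more realistic demand model.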

01/2020 - 03/2020
Senior Consultant Data Science
Panther Pricing (Consumer Goods and Retail, <10 employees)

Implementation of a cloud-based Python data science framework as well as creation of a Spark ETL workflow for turning retailer transaction level data into actionable data for model training

  • Implementation of Netflix’s Metaflow framework in AWS to enable automatic versioning and tracking of machine learning experiments, as well as hybrid execution of machine learning runs (locally and in the cloud)
  • Implementation of a scalable ETL workflow in PySpark to transform retailer’s transaction level data into data tables used for model training. Extraction of raw data from AWS Aurora, creation of multiple complex feature engineering tasks in PySpark and automation of transformation workflow via AWS EMR
  • Model prototyping for retail demand prediction using tree-based approaches; decomposing predictions into feature contributions for improved model interpretability
  • Key technologies: Metaflow, AWS, Python, PySpark, Jupyter, scikit-learn, Parquet, GitLab

06/2019 - 12/2019
Senior Consultant Data Science
Boehringer Ingelheim (Pharma and Medical Technology, >10,000 employees)

Project lead on regional level analysis for the U.S. drug market. The objective is to create customer value by producing regional patient clusters as an input to optimize regional patient assistance programs. The project involves both data preparation tasks including feature engineering as well as model training using clustering algorithms for producing regions. The entire project is implemented in Python and Spark.

  • Development and prototyping of different clustering algorithms in Jupyter using scikit-learn
  • Data exploration, joining and cleaning of different data sources containing patient-level pharmacy transactions using Spark SQL and the DataFrame API
  • Creation of new features extracted from raw transaction data and feature aggregation to regional level using Spark SQL and the DataFrame API
  • Visualization of results in plotly and matplotlib
  • Implementation of production-ready code in Visual Studio Code
  • Regular presentation of results to stakeholders in the U.S.
  • Key technologies: Python, PySpark, Jupyter, Visual Studio Code, Parquet, Plotly 

06/2017 - 05/2019
Senior Consultant Data Science
REWE Systems (Consumer Goods and Retail, >10,000 employees)

Design and implementation of a large-scale demand forecasting system using linear and non-linear regression models with the objective of improving product availability and reducing out-of-stock rates in REWE food stores.

  • Development and prototyping of machine learning algorithms and classical statistical approaches for time series demand forecasting
  • Development of approaches to deal with typical time series patterns including outliers, seasonal patterns, structural breaks and holiday effects
  • Hyperparameter tuning using grid search and randomized search
  • Prototyping of potential new features to improve prediction accuracy
  • Implementation of a distributed machine learning system on a Hadoop cluster using PySpark and scikit-learn
  • Implementation of an interactive dashboard for monitoring of KPIs and productive models
  • Key technologies: Python, PySpark, PyCharm, Jupyter, RStudio, RMarkdown, Zeppelin, HDFS, Drill, Parquet, Teradata, DB2

05/2017 - 06/2017
Senior Consultant Data Science
GLS Group (Transport and Logistics, >10,000 employees)

Proof of concept design for recipient segmentation in order to optimize last mile package delivery using different clustering algorithms.

  • Data exploration, data cleansing and feature engineering in Python
  • Implementation of a simple K-means algorithm, extended to a Gaussian Mixture Model to improve discriminatory power across clusters using probability thresholds
  • Development of a generic workflow to test and evaluate different clustering approaches in a Zeppelin notebook
  • Visualization of key results using the ggplot library
  • Key technologies: Spark MLlib, PySpark, Zeppelin, RStudio
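
The probability-threshold idea from the clustering step can be sketched as follows. This is a minimal illustration using a fixed one-dimensional, two-component Gaussian mixture whose parameters are given rather than fitted. The function names, the mixture parameters, and the 0.9 threshold are hypothetical and not taken from the project.

```python
import math

# Illustrative sketch only: mixture parameters and the threshold are
# hypothetical; a real workflow would fit the mixture to data first.


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def posteriors(x, weights, means, stds):
    """Posterior probability of each mixture component given x (Bayes' rule)."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]


def assign(x, weights, means, stds, threshold=0.9):
    """Return the most likely cluster index, or None if it is not confident enough."""
    probs = posteriors(x, weights, means, stds)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


weights, means, stds = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
labels = [assign(x, weights, means, stds) for x in (0.2, 2.0, 3.9)]
```

Unlike a hard K-means assignment, points that sit between clusters (here the point at 2.0) are left unassigned instead of being forced into the nearest cluster.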

02/2017 - 04/2017
Senior Consultant Data Science
Deutsche Telekom (Telecommunications, >10,000 employees)

Development of an application for predicting customer satisfaction on the basis of technical data from public WiFi hotspots.

  • Prototyping of forecasting algorithms using autoregressive time series models for predicting short-term dynamics of hotspot data 
  • Implementation of a distributed forecasting system using a Spark time series library 
  • Use of forecasts as input to a classification algorithm to predict future customer satisfaction at different hotspot locations
  • Data loading and writing via Apache Cassandra
  • Data visualization in Zeppelin and R
  • Key technologies: Scala, Cassandra, Zeppelin, R

10/2013 - 10/2015
Research Analyst
European Central Bank (Public Sector, 5,000-10,000 employees)

Development of an analytical toolset for the evaluation of statistical models used in the ECB’s quarterly projection exercise and contribution to the econometric research agenda in the ECB’s economic research department.

  • Implementation of a Bayesian model averaging framework in R for data-driven identification of policy-relevant factors using posterior model and inclusion probabilities
  • Co-author of ECB working paper on panel data estimation in a Bayesian setting (see publication list)
  • Co-author of ECB occasional paper on addressing model uncertainty using Bayesian Model Averaging  
  • Programming of a user-friendly R routine for BMA estimation which has been used in various projects at the ECB and the Bank of England 
  • Preparation and coordination of two ECB research workshops held in Madrid and Lisbon 
  • Key technologies: RStudio, Matlab

10/2013 - 11/2014
Research Analyst
Deutsche Bank (Banking and Financial Services, >10,000 employees)

Prototyping and implementation of statistical models to predict financial market trends and support data-driven investment decisions in the mutual fund segment of DWS asset management.

  • Development of a machine learning algorithm for selecting the most important predictors among equity return drivers
  • Implementation of a two-step procedure combining linear regularization and dimensionality reduction approaches
  • Extensive evaluation of different in-sample estimation windows and forecasting horizons 
  • Key technologies: Matlab

Willingness to travel

Available worldwide