Data annotation tools: Prodigy and Dataturks, Git, Docker, CI/CD with GitHub Actions or Travis CI, Heroku
Big Data platforms: Hadoop (HDFS and MapReduce), Spark (Spark Streaming and SparkML) with Java/Scala/Python, Kafka
R and MATLAB programming
Java, SQL, JDBC, XML, JSON
OOP, TDD using JUnit and Mockito, DbC
Project history
08/2018
-
present
Data Scientist
SuisseCoGmbH
Resume Parsing and Analysis Based on NLP and Machine Learning
Hybrid approach based on content and layout techniques
Extract and categorize the resume information into specific fields
Personal details, education, work experience, projects, skills etc.
Layout features extraction
Content features extraction: Fuzzy matching, Named Entity Recognition (NER), POS Tagging, Topic Modeling, Word2Vec
Rule-based grammar for information extraction (IE)
Custom NER models using Prodigy annotation tool with Active Learning
spaCy NER models for every section of content: personal info, education, work experience etc.
Search indexing using Skill2Vec
Target: converting semi-structured data from PDFs to structured JSON files
Ranking score that describes how well a candidate fits based on education, skills and experience
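The fuzzy-matching and PDF-to-JSON steps listed above can be illustrated with a minimal sketch. It is not the project's actual implementation: the field names, the heading aliases in `SECTION_ALIASES`, the 0.8 threshold and the helper names `match_section` and `parse_resume` are all assumptions made for illustration, and the standard-library `difflib` stands in for whatever fuzzy matcher was really used.

```python
import json
from difflib import SequenceMatcher

# Hypothetical canonical fields and heading variants (assumed, not from the project).
SECTION_ALIASES = {
    "personal_details": ["personal details", "personal information", "contact"],
    "education": ["education", "academic background", "studies"],
    "work_experience": ["work experience", "professional experience", "employment history"],
    "projects": ["projects", "project history"],
    "skills": ["skills", "technical skills", "competencies"],
}


def match_section(line, threshold=0.8):
    """Return the canonical field whose alias is most similar to the line, or None."""
    text = line.strip().lower().rstrip(":")
    best_field, best_score = None, 0.0
    for field, aliases in SECTION_ALIASES.items():
        for alias in aliases:
            score = SequenceMatcher(None, text, alias).ratio()
            if score > best_score:
                best_field, best_score = field, score
    return best_field if best_score >= threshold else None


def parse_resume(lines):
    """Group text lines under the most recently recognised section heading."""
    result = {}
    current = None
    for line in lines:
        if not line.strip():
            continue
        field = match_section(line)
        if field is not None:
            # The line looks like a heading, even if misspelled.
            current = field
            result.setdefault(current, [])
        elif current is not None:
            result[current].append(line.strip())
    return result


if __name__ == "__main__":
    sample = ["Educaton", "MSc Computer Science", "Work Experiance:", "Data Scientist"]
    print(json.dumps(parse_resume(sample), indent=2))
```

Matching headings fuzzily rather than exactly lets misspelled or unusually worded section titles still map to a fixed set of JSON fields; the threshold trades missed headings against content lines mistaken for headings.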
10/2017
-
07/2018
Data Scientist
Recognizing User's Activity for the case of Public Transportation (Master Thesis Project)
FAIRTIQ AG
The goal of this project is to design, build and evaluate prediction models for recognising human activities in the context of fine-grained transportation mode detection.
The project involves collecting data from mobile device sensors such as the accelerometer and GPS, then extracting meaningful features from the raw signals in the statistical, time and frequency domains. The extracted features are used to train a supervised classifier that recognises the transportation mode of new data samples.
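As a rough illustration of the feature extraction described above, the sketch below computes a few statistical, time-domain and frequency-domain features for one window of accelerometer magnitudes. The particular features, the function names and the sampling rate are assumptions chosen for the example, not the feature set used in the thesis; a naive DFT from the standard library is used only to keep the sketch dependency-free.

```python
import cmath
import math
import statistics


def magnitude(samples):
    """Combine (x, y, z) accelerometer readings into orientation-independent magnitudes."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


def dominant_frequency(signal, sampling_rate_hz):
    """Return the frequency in Hz of the strongest non-DC component (naive DFT)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k * sampling_rate_hz / n


def extract_features(signal, sampling_rate_hz):
    """Compute illustrative features for one fixed-length window of the signal."""
    mean = statistics.fmean(signal)
    # Time domain: how often the signal crosses its own mean.
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a - mean) * (b - mean) < 0
    )
    return {
        "mean": mean,                          # statistical
        "std": statistics.pstdev(signal),      # statistical
        "range": max(signal) - min(signal),    # statistical
        "mean_crossings": crossings,           # time domain
        "dominant_freq_hz": dominant_frequency(signal, sampling_rate_hz),  # frequency domain
    }
```

Each window's feature dictionary would then become one row of the training matrix for a supervised classifier, with the transportation mode as the label.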
Study projects
University of Bern/University of Fribourg
Building Hadoop MapReduce applications to analyze large Twitter datasets
Live Twitter analysis using Apache Spark Streaming + Kafka
Online course on Hadoop Streaming: processing forum logs using Hadoop MapReduce
Building an ML model for predicting heart disease in patients based on a biomedical dataset from Zurich, Basel and Lugano hospitals (Best model award)
Building an ML model for digit recognition on the MNIST dataset (Best model award)
Building an ML model for signature verification
Willingness to travel
Available in the following countries
Switzerland
Data Scientist - NLP