Projects
Assessing credit default risk with supervised learning > link
- Data preprocessing and feature engineering of data from the Home Credit Default Risk dataset.
- Evaluating, comparing and tuning the performance of several supervised learning models.
- Models tested: DecisionTreeClassifier, RandomForestClassifier, GradientBoostingClassifier, XGBClassifier, LGBMClassifier, LinearSVC, RidgeClassifier
- Applying oversampling and undersampling techniques, cross-validation and curve plotting
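The resampling and cross-validation steps above can be sketched as follows. This is a minimal illustration on synthetic data (a stand-in for the Home Credit table), using plain random undersampling with NumPy; the project may well use imbalanced-learn's samplers instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the imbalanced credit data (~8% defaults).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.92, 0.08], random_state=0)

# Random undersampling: keep every minority row, draw an equal
# number of majority rows without replacement.
rng = np.random.default_rng(0)
minority = np.where(y == 1)[0]
majority = rng.choice(np.where(y == 0)[0], size=len(minority), replace=False)
idx = np.concatenate([minority, majority])
X_bal, y_bal = X[idx], y[idx]

# 5-fold cross-validated ROC AUC on the balanced sample.
model = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(model, X_bal, y_bal, cv=5, scoring="roc_auc")
print(f"ROC AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Undersampling throws data away, which is why it is usually compared against oversampling (e.g. SMOTE) rather than used alone.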
E-commerce customer segmentation with unsupervised learning > link
- Data exploration, analysis, visualization and preprocessing: notebook link
- Feature engineering
- Choosing the number of clusters with KMeans and assigning each client to a cluster
- Testing the temporal stability of the model
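Choosing the number of clusters can be sketched like this, scoring a range of k values with the silhouette coefficient on synthetic data (a stand-in for the engineered customer features); the actual feature set and criterion used in the project may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the engineered customer features.
X, _ = make_blobs(n_samples=600, centers=4, random_state=0)

# Score each candidate cluster count with the silhouette coefficient.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print("best k:", best_k)

# Assign each client to a cluster with the chosen k.
clusters = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
```

The elbow method on inertia is a common alternative criterion; silhouette has the advantage of a bounded score that peaks at the best separation.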
Natural Language Processing and Computer Vision for the Yelp reviews database > link
Natural Language Processing
- Filtering a sample (5000 reviews) down to its negative reviews
- Preprocessing reviews to a format compatible with the NLP model
- Extraction of topics from negative reviews with Gensim's Latent Dirichlet Allocation
- Results analysis and visualization
Computer Vision: Image Classification
- Equalizing the histograms for each photo in the sample (100 photos per label, 500 photos total)
- Testing ORB for feature extraction
- Dimensionality reduction and KMeans clustering
- Using transfer learning with VGG16 for feature extraction
- Dimensionality reduction and KMeans clustering
- Visualizing and analyzing results
- Analyzing some examples of mislabeled photos
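The histogram-equalization step above can be sketched in plain NumPy. The project most likely uses OpenCV on color photos; this is the grayscale version of the same idea, shown on a synthetic low-contrast image.

```python
import numpy as np

def equalize_histogram(img: np.ndarray) -> np.ndarray:
    """Spread a grayscale image's intensity histogram over [0, 255]."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # first non-zero CDF value
    # Classic equalization formula, mapped back to 0..255 as a lookup table.
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[img]

# Low-contrast test image: intensities squeezed into 100..150.
rng = np.random.default_rng(0)
img = rng.integers(100, 151, size=(64, 64), dtype=np.uint8)
out = equalize_histogram(img)
print(out.min(), out.max())   # contrast stretched to the full 0..255 range
```

Equalizing contrast before feature extraction (ORB or VGG16) makes the descriptors less sensitive to lighting differences between photos.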
A summary of the project can be found here: https://katrinmisel.github.io/project_synthesis.html
Comparing deep learning model performances on binary sentiment classification > link
- Exploratory data analysis of Tweet data
- Evaluating two pretrained embedding dictionaries: Wiki2Vec and GloVe
- Evaluating the performance of several models:
- Simple neural network vs. bidirectional LSTM
- GloVe vs. Wiki2Vec embedding
- Tweet preprocessor library vs. simple text prep function vs. no text prep
- Deploying two apps to the cloud
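A "simple text prep function" of the kind compared above can be sketched like this; the exact rules used in the project may differ, but the idea is to strip tweet-specific noise before embedding.

```python
import re

def clean_tweet(text: str) -> str:
    """Minimal tweet cleaner: drop URLs and mentions, keep hashtag words,
    remove non-letters, lowercase, collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)           # @mentions
    text = re.sub(r"#", " ", text)              # keep the hashtag word, drop '#'
    text = re.sub(r"[^a-zA-Z\s]", " ", text)    # punctuation, digits, emoji
    return re.sub(r"\s+", " ", text).strip().lower()

print(clean_tweet("LOVED it!! @friend check https://t.co/xyz #BestDay"))
```

The tweet-preprocessor library automates the same categories (URLs, mentions, hashtags, emoji) with configurable options, which is what makes the three-way comparison interesting.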
Image segmentation for autonomous vehicles > link
- Creating a custom data generator with the Keras Sequence class
- Image augmentation with the Albumentations library
- Benchmarking multiple architectures, backbones and metrics:
- Mini U-net as baseline > architecture
- Architectures tested: U-net, PSPnet, Linknet
- Backbones tested: VGG16, ResNet34
- Loss functions: Categorical Focal Dice Loss, Categorical Focal Jaccard Loss
- Performance obtained for best model (U-net with ResNet34 backbone, Categorical Focal Dice Loss, augmented data):
- MeanIoU = 0.73
- Loss = 0.20
- Creating a webapp with Flask and Streamlit, deploying baseline > link
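The custom data generator above can be sketched as follows. A Keras `Sequence` subclass needs exactly two methods, `__len__` and `__getitem__`; this stand-alone version shows that batching logic with placeholder loaders (real code would subclass `tensorflow.keras.utils.Sequence` and read the Cityscapes-style images and masks from disk; all paths and shapes here are hypothetical).

```python
import math
import numpy as np

class BatchGenerator:
    """Sketch of the batching logic behind a Keras Sequence-style generator."""

    def __init__(self, image_paths, mask_paths, batch_size=8, augment=None):
        self.image_paths = image_paths   # hypothetical lists of file paths
        self.mask_paths = mask_paths
        self.batch_size = batch_size
        self.augment = augment           # e.g. an Albumentations transform

    def __len__(self):
        # Number of batches per epoch (last partial batch included).
        return math.ceil(len(self.image_paths) / self.batch_size)

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        # Stand-in loader: real code would read and resize images/masks here.
        imgs = [np.zeros((128, 256, 3)) for _ in self.image_paths[lo:hi]]
        masks = [np.zeros((128, 256), dtype=np.int32)
                 for _ in self.mask_paths[lo:hi]]
        if self.augment is not None:
            pairs = [self.augment(image=i, mask=m)
                     for i, m in zip(imgs, masks)]
            imgs = [p["image"] for p in pairs]
            masks = [p["mask"] for p in pairs]
        return np.stack(imgs), np.stack(masks)

gen = BatchGenerator([f"img_{i}.png" for i in range(20)],
                     [f"mask_{i}.png" for i in range(20)], batch_size=8)
xb, yb = gen[0]
print(len(gen), xb.shape)
```

Passing an Albumentations transform through `augment` applies image and mask augmentation jointly, which is essential for segmentation: both must receive the same geometric transform.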
Deploying a content recommendation app with Azure > link
- Exploring recommendation algorithm options:
- Content-Based Filtering with article embeddings
- Collaborative Filtering, comparing two libraries: Surprise and Implicit
- Deploying a webapp using Azure functions (HTTP triggered) to Streamlit > link
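The content-based option can be sketched like this: build a user profile by averaging the embeddings of clicked articles, then rank everything else by cosine similarity. The embedding matrix here is random, a stand-in for the dataset's precomputed article embeddings.

```python
import numpy as np

def recommend(user_clicked, article_embeddings, n=3):
    """Content-based filtering sketch: average the embeddings of clicked
    articles, rank all other articles by cosine similarity to that profile."""
    profile = article_embeddings[user_clicked].mean(axis=0)
    # Cosine similarity between the user profile and every article.
    norms = np.linalg.norm(article_embeddings, axis=1) * np.linalg.norm(profile)
    sims = article_embeddings @ profile / norms
    sims[user_clicked] = -np.inf          # never re-recommend clicked items
    return np.argsort(sims)[::-1][:n]

# Toy embedding matrix standing in for the article embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 50))
recs = recommend([3, 17], emb, n=5)
print(recs)
```

Wrapped in an HTTP-triggered Azure Function, a routine like this takes a user id, looks up their click history, and returns the top-n article ids to the Streamlit front end.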
Chatbot, built with the following tools:
- Microsoft Bot Framework SDK v4 for Python
- Azure Cognitive Services LUIS
- Web App with Azure
- Bot Framework Emulator