Hello

About Me

I am a data scientist and data analyst who is passionate about helping teams leverage their data to increase impact and amplify their mission. As a problem-solver, strategic thought partner, and technical leader with experience in statistical modeling and a variety of software tools and databases, I am eager to dive into solving problems collaboratively alongside your team!

SKILLS

Programming Languages, Software, & Databases

Python - NumPy, Pandas, scikit-learn, Statsmodels, Keras, TensorFlow

Git

SQL

PostgreSQL

PySpark

Databricks

AWS

Salesforce

Streamlit

Microsoft Office Suite

Google Sheets

Statistical Methods & Modeling

Linear & Logistic Regression

Feature Engineering

Natural Language Processing

Time Series

Clustering

Neural Networks

Grid Searching & Pipelines

KNN

Decision Trees

Random Forests

Data Visualization

Tableau

PowerBI

Matplotlib

Seaborn

Plotly

PROJECTS

Subreddit Classification & Sentiment Analysis with NLP

GitHub Repo

I used Natural Language Processing tools, Pandas, NumPy, statistical modeling strategies, and other libraries to build a binary classification model that distinguishes posts from two subreddits ('SkincareAddiction' and 'AsianBeauty'), and I performed a sentiment analysis on the posts. I used the findings and insights to suggest potential marketing strategies.
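As a rough illustration of what a subreddit classifier can look like, here is a minimal sketch assuming a TF-IDF plus logistic regression pipeline in scikit-learn. The project description does not say which vectorizer or model performed best, so this pairing is an assumption, and the four posts below are hypothetical toy examples, not the scraped data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy posts for illustration only (not real subreddit data).
posts = [
    "my retinol routine is drying out my skin",
    "dermatologist recommended a gentle cleanser",
    "best korean sunscreen with no white cast",
    "japanese essence haul from my trip",
]
labels = ["SkincareAddiction", "SkincareAddiction", "AsianBeauty", "AsianBeauty"]


def build_classifier() -> Pipeline:
    """Assumed design: TF-IDF features (unigrams + bigrams) fed to logistic regression."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(stop_words="english", ngram_range=(1, 2))),
        ("logreg", LogisticRegression(max_iter=1000)),
    ])


model = build_classifier()
model.fit(posts, labels)

# Predict the subreddit for a new, unseen post.
predictions = model.predict(["which korean sunscreen should i buy"])
print(predictions)
```

Wrapping the vectorizer and model in one `Pipeline` keeps the text preprocessing inside cross-validation, which matters if the pipeline is later passed to `GridSearchCV`.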


School District Standardized Assessment Analysis

GitHub Repo

I used Pandas, Matplotlib, and Seaborn to explore and analyze the 2019 LAUSD SAT, ACT, and Free & Reduced Price Lunch datasets from the California Department of Education data portal, and I proposed focus areas to improve students' college and career readiness.


Predictors of Housing Price in Ames, Iowa

GitHub Repo

In this project, I applied a variety of pre-processing techniques (SimpleImputer, OrdinalEncoder), feature engineering (OneHotEncoder, a custom-built variance inflation factor function), and models (LinearRegression, Ridge, ElasticNet, GridSearchCV with Lasso) to determine the most predictive features of housing price in Ames, Iowa, in order to help first-time homebuyers make more informed decisions throughout their search.
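A variance inflation factor (VIF) measures how well one feature is explained by the others: VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing feature j on the remaining features. The sketch below is one possible way to write such a function with scikit-learn; it is not the project's actual code, the function name `variance_inflation_factors` is hypothetical, and the data is synthetic (not the Ames dataset).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def variance_inflation_factors(X: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: VIF_j = 1 / (1 - R^2_j) for each numeric column j."""
    vifs = {}
    for col in X.columns:
        others = X.drop(columns=col)
        # R^2 of regressing this column on all other columns (with an intercept).
        r2 = LinearRegression().fit(others, X[col]).score(others, X[col])
        vifs[col] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(vifs, name="VIF").sort_values(ascending=False)


# Synthetic example: x3 is nearly a copy of x1, so both should get large VIFs.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = pd.DataFrame({
    "x1": x1,
    "x2": rng.normal(size=200),
    "x3": x1 + 0.1 * rng.normal(size=200),
})
vif = variance_inflation_factors(X)
print(vif.round(1))
```

A common rule of thumb flags features with a VIF above roughly 5 to 10 as candidates to drop or combine before fitting a linear model.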


Regression Modeling - Predictors of Discipline Incidents in Illinois Schools

GitHub Repo

I used regression modeling to identify predictors of suspension and expulsion rates across Illinois public schools. I wrangled data from several sources and applied a variety of pre-processing and feature engineering steps to build a regression model with features such as school and student demographics, student mobility and truancy rates, district size, content proficiency rates, and per-pupil expenditure. I tested multiple models, including linear regression, XGBoost, ExtraTrees, and a KNN regressor, to determine which characteristics are the strongest predictors of student discipline and to help districts and organizations prioritize resources.
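Comparing several regressors on equal footing usually means scoring each with the same cross-validation folds, then inspecting feature importances from the strongest model. The sketch below assumes 5-fold cross-validated R² and uses synthetic data from `make_regression` as a stand-in for the school-level features. XGBoost is left out so the example depends only on scikit-learn; this is an illustration of the approach, not the project's actual code or results.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix (hypothetical, not the Illinois data).
X, y = make_regression(n_samples=300, n_features=8, n_informative=4,
                       noise=10.0, random_state=0)

# Scale features for the distance- and coefficient-based models; trees don't need it.
models = {
    "linear_regression": make_pipeline(StandardScaler(), LinearRegression()),
    "knn_regressor": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=10)),
    "extra_trees": ExtraTreesRegressor(n_estimators=200, random_state=0),
}

# Mean 5-fold cross-validated R^2 for each candidate model.
cv_r2 = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    cv_r2[name] = scores.mean()

# Refit the tree ensemble on all rows and rank feature indices by impurity-based importance.
trees = models["extra_trees"].fit(X, y)
ranking = np.argsort(trees.feature_importances_)[::-1]

print(cv_r2)
print("Feature indices, most to least important:", ranking)
```

Impurity-based importances can favor high-cardinality features, so permutation importance is a reasonable cross-check before recommending where districts focus resources.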


FREELANCE SERVICES

If you are interested in support for a data project, please email dbsim88@gmail.com to set up an exploratory call to discuss your data needs and rates.

Data Cleaning & Analysis

Reporting & Data Visualization

Thought-Partnership & Strategy
