
Hello
About Me
I am a data scientist and data analyst who is passionate about helping teams better leverage their data to increase their impact and amplify their mission. As a problem-solver, strategic thought partner, and technical leader experienced in statistical modeling and a variety of software and databases, I am eager to solve problems collaboratively alongside your team!
SKILLS
Programming Languages, Software, & Databases
Python - NumPy, Pandas, scikit-learn, statsmodels, Keras, TensorFlow
Git
SQL
PostgreSQL
PySpark
Databricks
AWS
Salesforce
Streamlit
Microsoft Office Suite
Google Sheets
Statistical Methods & Modeling
Linear & Logistic Regression
Feature Engineering
Natural Language Processing
Time Series
Clustering
Neural Networks
Grid Searching & Pipelines
KNN
Decision Trees
Random Forests
Data Visualization
Tableau
Power BI
Matplotlib
Seaborn
Plotly
PROJECTS
Subreddit Classification & Sentiment Analysis with NLP
I used the Natural Language Processing toolbox, Pandas, NumPy, statistical modeling strategies, and other libraries to build a binary classification model that distinguishes posts from two subreddits ('SkincareAddiction' and 'AsianBeauty') and to perform a sentiment analysis of those posts. The findings and insights were used to suggest potential marketing strategies.

School District Standardized Assessment Analysis
I applied Pandas, Matplotlib, and Seaborn to explore and analyze 2019 LAUSD SAT, ACT, and Free & Reduced Price Lunch datasets from the California Department of Education data portal. Based on the analysis, I proposed focus areas to improve students' college and career readiness.

Predictors of Housing Price in Ames, Iowa
In this project, I applied a variety of pre-processing (SimpleImputer, OrdinalEncoder), feature engineering (OneHotEncoder, a custom-built variance inflation factor function), and modeling (LinearRegression, Ridge, ElasticNet, GridSearchCV with Lasso) techniques to determine the most predictive features of housing price in Ames, Iowa, in order to help first-time homebuyers be more informed throughout their search.
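For readers unfamiliar with the variance inflation factor (VIF), here is a minimal sketch of how such a function could look. It is not the function used in the project: the name `variance_inflation_factors`, the NumPy-only approach, and the sample data are all assumptions made for illustration. Each feature is regressed on the remaining features, and its VIF is 1 / (1 - R²), so values well above 1 flag features that are largely explained by the others.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (rows = observations, columns = features).

    Illustrative sketch only; assumes no column is constant.
    """
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    vifs = np.empty(n_cols)
    for j in range(n_cols):
        target = X[:, j]
        # Regress column j on all other columns plus an intercept.
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        # VIF = 1 / (1 - R^2); a perfectly collinear column yields infinity.
        vifs[j] = np.inf if np.isclose(r_squared, 1.0) else 1.0 / (1.0 - r_squared)
    return vifs


# Hypothetical data: the third column is nearly the sum of the first two,
# so all three columns should show strongly inflated VIFs.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + b + rng.normal(scale=0.1, size=200)
vifs = variance_inflation_factors(np.column_stack([a, b, c]))
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity, though the right cutoff depends on the dataset and the modeling goal.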

Regression Modeling - Predictors of Discipline Incidents in Illinois Schools
I used regression modeling to identify predictors of school suspension and expulsion rates across Illinois public schools. I wrangled data from several sources and applied a variety of pre-processing and feature engineering techniques to build a regression model with features such as school and student demographics, student mobility and truancy rates, district size, content proficiency rates, and per-pupil expenditure. I tested multiple models, including linear regression, XGBoost, ExtraTrees, and a KNN regressor, to determine which characteristics are the strongest predictors of student discipline and to help districts and organizations prioritize resources.

FREELANCE SERVICES
If you are interested in support for a data project, please send an email to dbsim88@gmail.com to set up an exploratory call to discuss your data needs and rates.
Data Cleaning & Analysis
Reporting & Data Visualization
Thought-Partnership & Strategy