About me
Welcome to my portfolio! I'm Julian, a Polish-American with a multidisciplinary professional and
academic background, eager to apply my broad skill set to a career in data.
In addition to being a master’s student in data science, I hold a master's in psychology,
a bachelor's in biology, and a minor in mathematics.
I am skilled in employing Python, R, and SQL for machine learning,
Bayesian modeling, deep learning, spatiotemporal analysis, and big data,
alongside visualization tools like Power BI, Tableau, Excel, and
various Python packages. I also hold Microsoft Azure Fundamentals (AZ-900)
and Power BI Analyst Associate (PL-300) certifications.
Aside from my technical skills, my professional experience is diverse —
ranging from managing program operations and conducting exposure therapy
at McLean Hospital’s OCD Institute to interviewing documentary subjects at
Big Other Productions. Through these roles, I’ve strengthened the communication,
organizational, and administrative skills necessary to excel in collaborative,
data-driven environments.
In this portfolio, you’ll find projects that showcase my ability to extract
insights from varied and complex real-world data.
SQL Projects
Projects 1-3 share the theme of progress, which I frame as two questions: how equal is our world
in terms of education and economic well-being, and how far have we come in developing green
technologies, specifically in the production and use of electric vehicles?
In these projects, I explore data from various sources using MySQL Workbench.
My final project is a data cleaning SQL query for a publicly available data set concerning
affordable housing construction in New York City.
Project 1: Electric Car Use
In my first project, I analyze the manufacture, sale, and use of electric vehicles (EVs)
between 2010 and 2022 among various countries. Some of the statistics generated include
the change in EV sales over time per country, most recent rankings of EV sales and use,
as well as EV use and sales relative to the total car market.
Project 2: Worldwide Education and Literacy
In my next project, I explore the most recent and robust data on differences in education
and literacy between countries. Metrics include youth literacy rate
(between 16 and 24 years old), average years of schooling, and GDP per capita vs. literacy
test scores.
Project 3: Economic Inequality and GDP
In my final data exploration project, I examine worldwide economic inequality before 2020,
since the COVID-19 pandemic had far-reaching negative effects on the global economy.
I analyze various GDP metrics (total GDP, GDP per capita, and GDP per employee)
as well as two measures of economic inequality: the Gini coefficient and the Atkinson index.
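For readers unfamiliar with these two inequality measures, the sketch below shows how each can be computed from a list of incomes. It is a plain-Python illustration, not the project's SQL; the function names, the default `epsilon`, and the sample incomes are assumptions made for the example.

```python
import math

def gini(values):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted form: G = 2 * sum(rank * x) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def atkinson(values, epsilon=0.5):
    """Atkinson index for strictly positive values and inequality aversion epsilon >= 0."""
    if not values or any(x <= 0 for x in values):
        raise ValueError("values must be non-empty and strictly positive")
    n = len(values)
    mean = sum(values) / n
    if epsilon == 1:
        # Limit case: the equally distributed equivalent is the geometric mean.
        ede = math.exp(sum(math.log(x) for x in values) / n)
    else:
        ede = (sum(x ** (1 - epsilon) for x in values) / n) ** (1 / (1 - epsilon))
    return 1 - ede / mean

# Hypothetical incomes, invented for illustration only.
sample_incomes = [12_000, 18_000, 25_000, 40_000, 105_000]
sample_gini = gini(sample_incomes)
sample_atkinson = atkinson(sample_incomes, epsilon=0.5)
```

Both measures equal 0 when every value is identical. A larger `epsilon` makes the Atkinson index more sensitive to differences at the low end of the distribution.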
Project 4: NYC Affordable Housing Construction
This SQL script is designed to clean a dataset covering the construction of affordable housing units
in New York City from 2014 to 2018. Data cleaning tasks included: renaming columns, changing data
types, removing duplicates, and miscellaneous reformatting tasks.
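The cleaning steps listed above (renaming columns, changing data types, removing duplicates) can be sketched in a few lines. This example uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs anywhere; the table names, column names, and sample rows are invented for illustration and do not come from the actual dataset or script.

```python
import sqlite3

# Invented sample rows: untrimmed text, numbers stored as text, one exact duplicate.
RAW_ROWS = [
    ("Example Houses ", "bronx", "120"),
    ("Example Houses ", "bronx", "120"),
    ("Sample Towers", " Queens", "85"),
]

def build_clean_table(conn):
    """Load the raw rows, then create a cleaned copy with one SELECT."""
    conn.execute(
        'CREATE TABLE housing_raw ("Project Name" TEXT, "Borough" TEXT, "Total Units" TEXT)'
    )
    conn.executemany("INSERT INTO housing_raw VALUES (?, ?, ?)", RAW_ROWS)
    conn.execute(
        """
        CREATE TABLE housing_clean AS
        SELECT DISTINCT                                    -- removes duplicate rows
            TRIM("Project Name")            AS project_name,  -- rename + trim
            UPPER(TRIM("Borough"))          AS borough,       -- consistent casing
            CAST("Total Units" AS INTEGER)  AS total_units    -- text to integer
        FROM housing_raw
        """
    )
    return conn.execute(
        "SELECT project_name, borough, total_units FROM housing_clean ORDER BY project_name"
    ).fetchall()

conn = sqlite3.connect(":memory:")
clean_rows = build_clean_table(conn)
```

Writing the cleaned data to a new table leaves the raw table untouched, which makes it easy to compare the two while checking the result.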
Tableau Dashboards
To complement my SQL projects 1-3, I created Tableau dashboards corresponding to each project.
For these dashboards, I selected the most salient data from my SQL queries and generated interactive
visualizations.
Click on "See Dashboards" to view them in my Tableau profile.
Python Projects
The following projects demonstrate my ability to use Python for a variety of end-to-end data science
workflows.
They range from early explorations in programming (object-oriented programming, file handling,
SQL integration, and automation) to advanced applications in machine learning, geospatial analysis,
and time series forecasting.
I apply modern data science techniques such as feature engineering, model tuning, interpretability,
and performance evaluation with libraries including NumPy, Pandas, Matplotlib, Scikit-learn,
Seaborn, sktime, SciPy, GeoPandas, libpysal, and XGBoost.
Together, these projects showcase both breadth and depth, from core programming foundations to
real-world data science pipelines.
Repository 1: Airbnb Revenue Prediction with XGBoost
This repository contains a Jupyter Notebook in which I analyze and predict Airbnb listing revenue
using machine learning techniques. In this project I use the NumPy, Pandas, Matplotlib,
Scikit-learn, and XGBoost libraries.
The pipeline includes preprocessing of numerical (imputation and scaling) and categorical
(imputation and one-hot encoding) features, regression modeling with XGBoost, evaluation via mean
absolute error (MAE), feature importance analysis, and learning curve visualization. An alternative
pipeline using HistGradientBoostingRegressor is also provided for comparison.
All functionality is available in the Python scripts:
xgb_pipeline.py (XGBoost pipeline), hgb_pipeline.py (HistGradientBoostingRegressor pipeline),
and analysis.py (feature importance and learning curve functions).
Repository 2: Geospatial Analysis of Voting Patterns in the Netherlands (2023)
This repository contains a Jupyter Notebook in which I analyze and predict the
percentage of votes for the Party for Freedom (PVV) per municipality in the Netherlands
(2023) using geospatial and machine learning techniques. In this project I use the
Pandas, NumPy, Matplotlib, GeoPandas, PySAL, ESDA, spreg, and Scikit-learn libraries.
The analyses include spatial autocorrelation (Global Moran’s I),
evaluation of different spatial weighting schemes (contiguity-based and KNN-based),
linear and spatial autoregressive modeling (GM Lag), as well as regression modeling
with cross-validation and hyperparameter tuning using K-Nearest Neighbors,
Random Forest, and Gradient Boosting Regressors. I employed nested group-wise
cross-validation to assess prediction accuracy, with results visualized via
choropleth maps showing municipal-level prediction errors.
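To make the spatial autocorrelation statistic concrete, here is a minimal plain-Python sketch of Global Moran's I with binary rook-contiguity weights. It is a toy stand-in, not the notebook's code: it uses a regular grid instead of municipal polygons, and the function names and test patterns are assumptions made for the example.

```python
def rook_weights(n_rows, n_cols):
    """Binary rook-contiguity weights for a grid: cells sharing an edge are neighbors."""
    weights = {}
    for r in range(n_rows):
        for c in range(n_cols):
            neighbors = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    neighbors.append(rr * n_cols + cc)
            weights[r * n_cols + c] = neighbors
    return weights

def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij * z_i * z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    denom = sum(d * d for d in z)
    if denom == 0:
        raise ValueError("Moran's I is undefined for constant values")
    s0 = sum(len(nb) for nb in weights.values())  # sum of all binary weights
    cross = sum(z[i] * z[j] for i, nb in weights.items() for j in nb)
    return (n / s0) * cross / denom

w = rook_weights(4, 4)
# Checkerboard: every neighbor has the opposite value (strong negative autocorrelation).
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]
# Two halves: similar values cluster together (positive autocorrelation).
halves = [1 if c < 2 else 0 for r in range(4) for c in range(4)]
i_checkerboard = morans_i(checkerboard, w)
i_halves = morans_i(halves, w)
```

Values near +1 indicate that similar values cluster in space, values near -1 indicate that neighbors tend to differ, and values near 0 suggest no spatial pattern.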
Repository 3: Forecasting Emergency Room Visits Using Time Series Analysis
This repository contains a Jupyter Notebook in which I forecast daily emergency room visits at an
Iowa hospital (Jan 2014–Aug 2017) using time series analysis.
In this project I use the NumPy, Pandas, Matplotlib,
sktime, Scikit-learn, and XGBoost libraries.
The analyses include time series decomposition (STL), autocorrelation
(ACF/PACF), ARIMA/AutoARIMA, exponential smoothing, and
machine learning approaches such as K-Nearest Neighbors (KNN) and
XGBoost, with accuracy evaluated using mean absolute percentage error
(MAPE).
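As a reference point for the accuracy metric, the sketch below computes MAPE in plain Python and pairs it with a seasonal-naive baseline. The baseline, the weekly period of 7, and the visit counts are assumptions added for illustration; they are not taken from the notebook.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def seasonal_naive(history, horizon, period=7):
    """Forecast by repeating the last full season of `history`."""
    if len(history) < period:
        raise ValueError("history must cover at least one full period")
    last_season = history[-period:]
    return [last_season[i % period] for i in range(horizon)]

# Invented daily visit counts: two weeks of history, then one week to score against.
history = [52, 48, 47, 50, 49, 60, 63, 55, 47, 46, 51, 50, 62, 64]
actual_next_week = [54, 49, 45, 52, 48, 61, 66]
baseline = seasonal_naive(history, horizon=7, period=7)
baseline_mape = mape(actual_next_week, baseline)
```

A simple baseline like this gives a floor to compare against: a more complex model is only worth its cost if it achieves a lower MAPE on the same held-out period.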
Repository 4: Analysis of 5 Largest Green ETFs
This repository contains a Jupyter Notebook in which I analyze the performance of the 5
largest ETFs geared towards sustainability and green technologies,
relative to each other and to the Vanguard Total Market ETF (VTI). In this project I use the
following libraries to conduct my analyses:
NumPy, Pandas, Matplotlib, Seaborn, and SciPy.
Repository 5: Central Park Temperature Time-Series Analysis
For this project, I conducted a time-series analysis of the average monthly temperatures in Central
Park, NY from 1870 to 2023. My analysis culminated in a SARIMA model, which I used to predict the
temperatures for the remainder of 2024. To run my analyses, I used the following libraries:
NumPy, Pandas, Matplotlib, statsmodels, and pmdarima.
Repository 6: Remote SQL Query Tool
This is a script that allows you to connect to a remote MySQL server and execute an SQL query.
Repository 7: English/Polish Number Translator
This script asks the user to input an integer from 0 to 1,000,000 and returns that number
written out in English and Polish, along with the IPA and English pronunciations of the
Polish numbers.
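The English half of this conversion can be sketched as follows. This is an illustrative outline under my own assumptions (function names, hyphenated compounds, no "and" between hundreds and tens), not the repository's code, and it leaves out the Polish spelling and pronunciation steps.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
    "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

def below_thousand(n):
    """Spell out an integer from 1 to 999."""
    parts = []
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        word = TENS[n // 10]
        if n % 10:
            word += "-" + ONES[n % 10]
        parts.append(word)
    elif n > 0:
        parts.append(ONES[n])
    return " ".join(parts)

def number_to_english(n):
    """Spell out an integer from 0 to 1,000,000 inclusive."""
    if not 0 <= n <= 1_000_000:
        raise ValueError("n must be between 0 and 1,000,000")
    if n == 0:
        return "zero"
    if n == 1_000_000:
        return "one million"
    thousands, rest = divmod(n, 1000)
    parts = []
    if thousands:
        parts.append(below_thousand(thousands) + " thousand")
    if rest:
        parts.append(below_thousand(rest))
    return " ".join(parts)

# number_to_english(342_015) returns "three hundred forty-two thousand fifteen"
```

Splitting the number into groups of three digits keeps the logic small: the same helper spells out both the thousands group and the remainder.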
Repository 8: Text File Generator and Word Counter
This script asks the user to write to a text file they created, and subsequently
returns a total count of the English words in the text,
together with a count of the unique English words it contains.
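The counting step might look like the sketch below. How the actual script decides what counts as an English word is not stated here, so this example assumes a simple rule: runs of the letters a-z, optionally with one apostrophe, compared case-insensitively. The function name and sample sentence are also assumptions.

```python
import re

# Assumed word rule: letters a-z with an optional apostrophe part (e.g. "don't").
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")

def count_words(text):
    """Return (total word count, unique word count), ignoring case."""
    words = WORD_PATTERN.findall(text.lower())
    return len(words), len(set(words))

total, unique = count_words("The cat saw the other cat. Don't panic!")
# total is 8 and unique is 6 ("the" and "cat" each appear twice)
```

Lowercasing before matching means "The" and "the" count as one unique word.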
Repository 9: Games
This repository contains simple scripts for the games tic-tac-toe and blackjack in which I employ
object-oriented programming.
R Projects
This section highlights my work in R. While currently featuring one project, it demonstrates my
ability to apply R
for advanced statistical analysis and Bayesian modeling. I am continuing to build this section as I
expand my R portfolio.
Repository 1: Bayesian Multilevel Analysis of Cardiovascular Disease Mortality
This repository contains an R project in which I analyze cardiovascular disease (CVD) mortality
among European women aged 65–69 using Bayesian multilevel modeling via the
brms library.
The analysis pipeline includes data preprocessing (handling missing values, log-transforming skewed
variables, scaling, and centering), Bayesian multilevel Beta regression modeling with hierarchical
structures for region and country, posterior predictive checks, model comparison via k-fold
cross-validation, and interpretation of model parameters. Sensitivity analyses for prior
specifications and phi parameter adjustments are also included to ensure model robustness.
Excel Projects
The following Excel projects were selected to demonstrate my ability to work with large data sets
that need to be cleaned and adjusted for clarity.
The first project contains a dashboard that visualizes data on the rates of heart
disease among Americans in 2020,
and the second provides a breakdown of the construction of affordable housing in New York City from
2014 to 2018.
Dashboard 1: U.S. Heart Disease Statistics
This dashboard explores the effects of lifestyle and various health conditions on the rates of heart
disease among Americans. Specifically, it examines these effects between men and women, and between
different races and age groups.
Dashboard 2: NYC Affordable Housing Construction
This dashboard organizes the statistics on the construction of affordable housing in New York City
from 2014 to 2018 according to borough, start/finish year, and level of income.