About me

Welcome to my portfolio! I’m Julian, a Polish-American with a multidisciplinary professional and academic background, eager to apply my broad skillset to a career in data. In addition to being a master’s student in data science, I hold a master’s in psychology, a bachelor’s in biology, and a minor in mathematics.

I am skilled in employing Python, R, and SQL for machine learning, Bayesian modeling, deep learning, spatiotemporal analysis, and big data, alongside visualization tools like Power BI, Tableau, Excel, and various Python packages. I also hold Microsoft Azure Fundamentals (AZ-900) and Power BI Analyst Associate (PL-300) certifications.

Aside from my technical skills, my professional experience is diverse — ranging from managing program operations and conducting exposure therapy at McLean Hospital’s OCD Institute to interviewing documentary subjects at Big Other Productions. Through these roles, I’ve strengthened the communication, organizational, and administrative skills necessary to excel in collaborative, data-driven environments.

In this portfolio, you’ll find projects that showcase my ability to extract insights from varied and complex real-world data.

SQL Projects

Projects 1-3 share the theme of progress, which I framed as two questions: how equal is our world in terms of education and economic outcomes, and how far have we come in developing green technologies, specifically the production and use of electric vehicles? In these projects, I explore data from various sources using MySQL Workbench. My final project is a data-cleaning SQL script for a publicly available dataset on affordable housing construction in New York City.

Project 1: Electric Car Use

In my first project, I analyze the manufacture, sale, and use of electric vehicles (EVs) across various countries between 2010 and 2022. The statistics generated include the change in EV sales over time per country, the most recent rankings of EV sales and use, and EV sales and use relative to the total car market.


Project 2: Worldwide Education and Literacy

In my next project, I explore the most recent and robust data on differences in education and literacy across countries. Metrics include the youth literacy rate (ages 16 to 24), average years of schooling, and GDP per capita versus literacy test scores.


Project 3: Economic Inequality and GDP

In my final data exploration project, I examine worldwide economic inequality before the COVID-19 pandemic. Because the pandemic had far-reaching negative effects on the global economy, I restricted the analysis to measures recorded before 2020. I analyzed several GDP metrics (total GDP, GDP per capita, and GDP per employee) as well as two measures of economic inequality: the Gini coefficient and the Atkinson index.
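For readers unfamiliar with the Gini coefficient, the following is a minimal Python sketch of how the measure is defined. It is not the SQL used in this project, and the function name and sample values are illustrative assumptions only.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where x_i are sorted ascending and i runs from 1 to n.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Illustrative (made-up) income vectors, not data from the project.
equal_incomes = [1, 1, 1, 1]
one_holds_all = [0, 0, 0, 1]
print(gini(equal_incomes), gini(one_holds_all))
```

With a finite sample of n values, the maximum of this estimator is (n - 1) / n rather than 1, which is why the second example does not reach 1 exactly.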


Project 4: NYC Affordable Housing Construction

This SQL script cleans a dataset covering the construction of affordable housing units in New York City from 2014 to 2018. Cleaning tasks include renaming columns, changing data types, removing duplicates, and miscellaneous reformatting.
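The kinds of cleaning steps listed above can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name, column names, and rows are hypothetical rather than taken from the actual dataset or script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw table: awkward column names, everything stored as text.
conn.execute(
    'CREATE TABLE raw_housing ("Project ID" TEXT, "Borough" TEXT, "Total Units" TEXT)'
)
conn.executemany(
    "INSERT INTO raw_housing VALUES (?, ?, ?)",
    [("1", " bronx", "12"), ("1", " bronx", "12"), ("2", "Queens ", "30")],
)

# One pass that renames columns (AS), changes data types (CAST),
# reformats text (TRIM/UPPER), and removes duplicate rows (DISTINCT).
conn.execute(
    """
    CREATE TABLE clean_housing AS
    SELECT DISTINCT
        CAST("Project ID" AS INTEGER)  AS project_id,
        UPPER(TRIM("Borough"))         AS borough,
        CAST("Total Units" AS INTEGER) AS total_units
    FROM raw_housing
    """
)

rows = conn.execute(
    "SELECT project_id, borough, total_units FROM clean_housing ORDER BY project_id"
).fetchall()
conn.close()
print(rows)
```

Building a new cleaned table instead of altering the raw one in place keeps the original data available for comparison, which is one common design choice for cleaning scripts.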

Tableau Dashboards

To complement my SQL projects 1-3, I created Tableau dashboards corresponding to each project. For these dashboards, I selected the most salient data from my SQL queries and generated interactive visualizations. Click on "See Dashboards" to view them in my Tableau profile.

Python Projects

The following projects demonstrate my ability to use Python for a variety of end-to-end data science workflows. They range from early explorations in programming — including object-oriented programming, file handling, SQL integration and automation — to advanced applications in machine learning, geospatial analysis, and time series forecasting.

I apply modern data science techniques such as feature engineering, model tuning, interpretability, and performance evaluation with libraries including NumPy, Pandas, Matplotlib, Scikit-learn, Seaborn, sktime, SciPy, GeoPandas, libpysal, and XGBoost.

Together, these projects showcase both breadth and depth — from core programming foundations to real-world data science pipelines.


Repository 1: Airbnb Revenue Prediction with XGBoost

This repository contains a Jupyter Notebook in which I analyze and predict Airbnb listing revenue using machine learning techniques. In this project I use NumPy, Pandas, Matplotlib, Scikit-learn, and XGBoost libraries.

The pipeline includes preprocessing of numerical (imputation and scaling) and categorical (imputation and one-hot encoding) features, regression modeling with XGBoost, evaluation via mean absolute error (MAE), feature importance analysis, and learning curve visualization. An alternative pipeline using HistGradientBoostingRegressor is also provided for comparison.

All functionality is available in the Python scripts: xgb_pipeline.py (XGBoost pipeline), hgb_pipeline.py (HistGradientBoostingRegressor pipeline), and analysis.py (feature importance and learning curve functions).


Repository 2: Geospatial Analysis of Voting Patterns in the Netherlands (2023)

This repository contains a Jupyter Notebook in which I analyze and predict the percentage of votes for the Party for Freedom (PVV) per municipality in the Netherlands (2023) using geospatial and machine learning techniques. In this project I use Pandas, NumPy, Matplotlib, GeoPandas, PySAL, ESDA, spreg, and Scikit-learn libraries.

The analyses include spatial autocorrelation (Global Moran’s I), evaluation of different spatial weighting schemes (contiguity-based and KNN-based), linear and spatial autoregressive modeling (GM Lag), as well as regression modeling with cross-validation and hyperparameter tuning using K-Nearest Neighbors, Random Forest, and Gradient Boosting Regressors. I employed nested group-wise cross-validation to assess prediction accuracy, with results visualized via choropleth maps showing municipal-level prediction errors.
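As a rough illustration of what Global Moran's I measures, here is a minimal pure-Python sketch. The notebook itself relies on PySAL/ESDA; the function name, the four-area "line" layout, and the values below are assumptions made up for this example, and the weights are binary contiguity weights without row standardization.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    # Cross-products of deviations for every pair of areas, weighted by w_ij.
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_term = sum(d * d for d in dev)
    return (n / s0) * cross / variance_term


# Four hypothetical areas in a row; each borders only its direct neighbors.
line_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

print(morans_i([1, 2, 3, 4], line_weights))    # similar neighbors: positive
print(morans_i([1, -1, 1, -1], line_weights))  # alternating values: negative
```

Positive values indicate that neighboring areas tend to have similar values, and negative values indicate that neighbors tend to differ.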


Repository 3: Forecasting Emergency Room Visits Using Time Series Analysis

This repository contains a Jupyter Notebook in which I forecast daily emergency room visits at an Iowa hospital (Jan 2014–Aug 2017) using time series analysis. In this project I use NumPy, Pandas, Matplotlib, sktime, Scikit-learn, and XGBoost libraries.

The analyses include time series decomposition (STL), autocorrelation (ACF/PACF), ARIMA/AutoARIMA, exponential smoothing, and machine learning approaches such as K-Nearest Neighbors (KNN) and XGBoost, with accuracy evaluated using mean absolute percentage error (MAPE).
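The evaluation metric named above, MAPE, can be sketched in a few lines of plain Python. The function name and the visit counts are hypothetical and are not results from the notebook, which presumably uses library implementations.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned as a percentage."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)


# Made-up daily visit counts and forecasts, for illustration only.
actual_visits = [100, 200]
forecast_visits = [110, 180]
print(mape(actual_visits, forecast_visits))
```

Because each error is divided by the actual value, MAPE is scale-free but cannot be computed for days with zero visits, so the sketch raises an error in that case.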


Repository 4: Analysis of 5 Largest Green ETFs

This repository contains a Jupyter Notebook in which I analyze the performance of the five largest ETFs geared towards sustainability and green technologies, comparing them with each other and with the Vanguard Total Stock Market ETF (VTI). In this project I use the following libraries to conduct my analyses: NumPy, Pandas, Matplotlib, Seaborn, and SciPy.


Repository 5: Central Park Temperature Time-Series Analysis

For this project, I conducted a time-series analysis of the average monthly temperatures in Central Park, NY from 1870 to 2023. My analysis culminated in a SARIMA model, which I used to predict the temperatures for the remainder of 2024. To run my analyses, I used the following libraries: NumPy, Pandas, Matplotlib, statsmodels, and pmdarima.


Repository 6: Remote SQL Query Tool

This is a script that allows you to connect to a remote MySQL server and execute an SQL query.


Repository 7: English/Polish Number Translator

This script asks the user to input an integer from 0 to 1,000,000 and returns that number written out in English and Polish, along with the IPA and English pronunciations of the Polish numbers.
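The English half of this idea can be sketched as below. This is a minimal illustration rather than the repository's code: the function names are hypothetical, and the Polish output and pronunciations are left out.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def below_thousand(n):
    """Spell out an integer from 1 to 999."""
    parts = []
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        word = TENS[n // 10]
        if n % 10:
            word += "-" + ONES[n % 10]
        parts.append(word)
    elif n > 0:
        parts.append(ONES[n])
    return " ".join(parts)


def to_english(n):
    """Spell out an integer from 0 to 1,000,000 inclusive."""
    if not 0 <= n <= 1_000_000:
        raise ValueError("number must be between 0 and 1,000,000")
    if n == 0:
        return "zero"
    if n == 1_000_000:
        return "one million"
    thousands, rest = divmod(n, 1000)
    parts = []
    if thousands:
        parts.append(below_thousand(thousands) + " thousand")
    if rest:
        parts.append(below_thousand(rest))
    return " ".join(parts)


print(to_english(342015))
```

Splitting the number into a thousands group and a remainder lets the same three-digit helper handle both halves, which keeps the lookup tables small.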


Repository 8: Text File Generator and Word Counter

This script asks the user to write to a text file they created and then returns a total count of the English words in the text, together with a count of the unique English words it contains.
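The counting step might look roughly like the sketch below. It is an assumption-laden illustration: the function name is hypothetical, file handling is omitted, and words are matched with a simple regular expression instead of being checked against an English dictionary as the script's description suggests.

```python
import re


def count_words(text):
    """Return (total words, unique words), ignoring case and punctuation."""
    # Letters only, with an optional internal apostrophe (e.g. "don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return len(words), len(set(words))


sample = "The cat saw the other cat. Don't panic!"
print(count_words(sample))
```

Lowercasing before matching means "The" and "the" count as the same unique word.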


Repository 9: Games

This repository contains simple scripts for the games tic-tac-toe and blackjack in which I employ object-oriented programming.

R Projects

This section highlights my work in R. While currently featuring one project, it demonstrates my ability to apply R for advanced statistical analysis and Bayesian modeling. I am continuing to build this section as I expand my R portfolio.

Repository 1: Bayesian Multilevel Analysis of Cardiovascular Disease Mortality

This repository contains an R project in which I analyze cardiovascular disease (CVD) mortality among European women aged 65–69 using Bayesian multilevel modeling via the brms library.

The analysis pipeline includes data preprocessing (handling missing values, log-transforming skewed variables, scaling, and centering), Bayesian multilevel Beta regression modeling with hierarchical structures for region and country, posterior predictive checks, model comparison via k-fold cross-validation, and interpretation of model parameters. Sensitivity analyses for prior specifications and phi parameter adjustments are also included to ensure model robustness.


Excel Projects

The following Excel projects were selected to demonstrate my ability to work with large datasets that need to be cleaned and adjusted for clarity. The first project contains a dashboard that visualizes data on the rates of heart disease among Americans in 2020, and the second provides a breakdown of the construction of affordable housing in New York City from 2014 to 2018.

Dashboard 1: U.S. Heart Disease Statistics

This dashboard explores the effects of lifestyle and various health conditions on the rates of heart disease among Americans. Specifically, it compares these effects between men and women and across different races and age groups.


Dashboard 2: NYC Affordable Housing Construction

This dashboard organizes the statistics on the construction of affordable housing in New York City from 2014 to 2018 according to borough, start/finish year, and level of income.
