Okay, so today I’m gonna walk you through my little project: playing around with flight data from Romania’s Otopeni International Airport. Nothing too fancy, just a bit of data wrangling and visualization. Let’s dive in!

First things first, I needed the data. I managed to snag some flight data – you know, arrivals, departures, delays, all that jazz – from a few different sources. Scraped some off FlightAware, downloaded some datasets from open data portals, and even pieced together some info from the airport’s own website. It was a bit of a mess, honestly.
Then came the fun part: cleaning. Oh man, the cleaning. Dates in different formats, inconsistent naming conventions, missing values everywhere. I basically lived in Pandas for a week. Used .fillna() like it was going out of style, converted a million different date formats with to_datetime(), and spent way too long figuring out regex to extract relevant info from free-text fields. It was tedious, but crucial. You can’t analyze garbage data, right?
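To make that a bit more concrete, here’s a rough sketch of what a cleaning pass like this can look like. The column names, the sample rows, the list of date formats, and the choice to fill missing delays with the median are all assumptions made for illustration, not the actual dataset or the exact steps from the project.

```python
import pandas as pd

# Hypothetical raw rows; the column names and values are invented for illustration.
raw = pd.DataFrame({
    "scheduled": ["2023-05-01 14:30", "01/05/2023 16:45", "02.05.2023 08:15", None],
    "airline": [" TAROM", "tarom ", "Wizz Air", None],
    "delay_min": [12.0, None, 45.0, 5.0],
    "remarks": ["Gate B12, delayed 12 min", "on time", "Gate A3", ""],
})

# Assumed set of formats seen in the sources; extend as new ones turn up.
DATE_FORMATS = ["%Y-%m-%d %H:%M", "%d/%m/%Y %H:%M", "%d.%m.%Y %H:%M"]


def parse_mixed_dates(values, formats):
    """Try each known format in turn and keep the first successful parse per row."""
    parsed = pd.to_datetime(values, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        parsed = parsed.fillna(pd.to_datetime(values, format=fmt, errors="coerce"))
    return parsed


def clean_flights(df):
    out = pd.DataFrame(index=df.index)
    out["scheduled"] = parse_mixed_dates(df["scheduled"], DATE_FORMATS)
    # Normalize whitespace and casing so "TAROM" and "tarom " become one label.
    out["airline"] = df["airline"].str.strip().str.title().fillna("Unknown")
    # Assumption: fill missing delays with the median of the known ones.
    out["delay_min"] = df["delay_min"].fillna(df["delay_min"].median())
    # Pull a gate code such as "B12" out of the free-text remarks, if present.
    out["gate"] = df["remarks"].str.extract(r"Gate\s+([A-Z]\d+)", expand=False)
    return out


clean = clean_flights(raw)
print(clean)
```

Rows whose date matches none of the listed formats end up as NaT instead of raising an error, which makes the leftovers easy to find and inspect.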
After the data was somewhat presentable, I started exploring. I wanted to see the trends – busiest days, most common destinations, airlines with the most delays, the usual suspects. Used Matplotlib and Seaborn for the plots. Nothing super polished, just quick and dirty visuals to get a sense of what was going on. Bar charts for flight counts per airline, line plots for daily arrival trends, scatter plots to see if there was any correlation between flight distance and delays. The usual drill.
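For a flavor of those quick and dirty visuals, here’s a minimal sketch of two of them: a bar chart of flights per airline and a line plot of flights per day. It uses the pandas plotting wrappers around Matplotlib only (no Seaborn, to keep it short), and the tiny DataFrame is made-up sample data with hypothetical column names.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data; "date" and "airline" are hypothetical column names.
flights = pd.DataFrame({
    "date": pd.to_datetime([
        "2023-05-01", "2023-05-01", "2023-05-02",
        "2023-05-02", "2023-05-02", "2023-05-03",
    ]),
    "airline": ["Tarom", "Wizz Air", "Tarom", "Tarom", "Ryanair", "Wizz Air"],
})

# Aggregate first, then plot the aggregates.
flights_per_airline = flights["airline"].value_counts()
daily_flights = flights.groupby("date").size()

fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(10, 4))
flights_per_airline.plot.bar(ax=ax_bar, title="Flights per airline")
daily_flights.plot.line(ax=ax_line, marker="o", title="Flights per day")
ax_bar.set_ylabel("Flights")
ax_line.set_ylabel("Flights")
fig.tight_layout()
# fig.savefig("overview.png")  # or plt.show() in an interactive session
```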
One thing that caught my eye was the delay data. Turns out, some airlines were consistently worse than others. I dug a bit deeper and found that the average delay time varied significantly based on the destination. Flights to certain cities were almost always delayed. Interesting stuff!
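A comparison like that boils down to a groupby. Here’s a small sketch, assuming hypothetical columns airline, destination, and delay_min, with invented numbers rather than real results. The min_flights cutoff is also my own assumption: it keeps a single late flight from making a whole route look terrible.

```python
import pandas as pd

# Invented sample rows, not real Otopeni figures.
flights = pd.DataFrame({
    "airline": ["Airline A", "Airline A", "Airline B",
                "Airline B", "Airline B", "Airline C"],
    "destination": ["LHR", "CDG", "LHR", "LHR", "FCO", "CDG"],
    "delay_min": [10, 20, 40, 50, 60, 5],
})


def delay_summary(df, by, min_flights=2):
    """Flight count plus mean and median delay per group, worst groups first."""
    summary = df.groupby(by)["delay_min"].agg(
        flights="count", mean_delay="mean", median_delay="median"
    )
    # Drop groups with too few flights to say anything meaningful.
    summary = summary[summary["flights"] >= min_flights]
    return summary.sort_values("mean_delay", ascending=False)


by_airline = delay_summary(flights, "airline")
by_destination = delay_summary(flights, "destination")
print(by_airline)
print(by_destination)
```

Looking at the median next to the mean helps tell "always a bit late" apart from "usually fine, with the occasional disaster".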
Next, I tried to build a simple model to predict flight delays. Nothing fancy, just a basic linear regression using features like time of day, day of week, airline, and destination. The accuracy wasn’t amazing, but it was a starting point. I think with more data and some feature engineering, I could definitely improve it.
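Here’s a minimal sketch of that kind of model, under some loud assumptions: the data is synthetic (generated so that later flights and one airline are delayed more), the column names are hypothetical, and the fit uses plain least squares from NumPy on one-hot encoded features instead of any particular ML library. It isn’t the model from the project, just the same idea in a few lines.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic flights; every column name and value here is an assumption.
flights = pd.DataFrame({
    "hour": rng.integers(5, 23, size=n),
    "day_of_week": rng.integers(0, 7, size=n),
    "airline": rng.choice(["Airline A", "Airline B"], size=n),
    "destination": rng.choice(["LHR", "CDG", "FCO"], size=n),
})
# Synthetic target: delays grow through the day, Airline B adds 15 min, plus noise.
flights["delay_min"] = (
    2.0 * (flights["hour"] - 5)
    + 15.0 * (flights["airline"] == "Airline B")
    + rng.normal(0, 5, size=n)
)


def build_features(df):
    """One-hot encode the categorical columns and add an intercept column."""
    X = pd.get_dummies(
        df[["hour", "day_of_week", "airline", "destination"]],
        columns=["day_of_week", "airline", "destination"],
        drop_first=True,
        dtype=float,
    )
    X.insert(0, "intercept", 1.0)
    return X


X = build_features(flights)
y = flights["delay_min"].to_numpy()

# Simple holdout split: first 150 rows to fit, last 50 to evaluate.
X_train, X_test = X.iloc[:150].to_numpy(dtype=float), X.iloc[150:].to_numpy(dtype=float)
y_train, y_test = y[:150], y[150:]

coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
coefs = pd.Series(coef, index=X.columns)

pred = X_test @ coef
mae = np.mean(np.abs(y_test - pred))
# Baseline: always predict the mean training delay.
baseline_mae = np.mean(np.abs(y_test - y_train.mean()))
print(f"model MAE: {mae:.1f} min, baseline MAE: {baseline_mae:.1f} min")
```

Comparing against the "always predict the average" baseline is a handy sanity check: if the regression can’t beat that, the features aren’t carrying much signal yet.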

I also messed around with some geographic visualizations. Plotted flight routes on a map using GeoPandas. It was cool to see the busiest routes visually. Also tried creating a heatmap of flight frequencies across different destinations. Gave a nice overview of the airport’s network.
Finally, I put together a small dashboard using Streamlit. Just a simple web app with a few interactive plots and filters. It lets you select different airlines, destinations, or date ranges and see the corresponding flight statistics. Nothing groundbreaking, but a nice way to present the data and play around with it.
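The core of a dashboard like that is just a filter function plus a few widgets. Below is a rough sketch, not the actual app: filter_flights and render_dashboard are hypothetical names, the columns are assumed, and the sample rows are invented. Streamlit is imported inside render_dashboard so the filtering logic can be run and tested without it.

```python
import pandas as pd

# Invented sample rows with assumed column names.
flights = pd.DataFrame({
    "date": pd.to_datetime(["2023-05-01", "2023-05-02", "2023-05-02", "2023-05-03"]),
    "airline": ["Airline A", "Airline B", "Airline A", "Airline B"],
    "destination": ["LHR", "LHR", "CDG", "FCO"],
    "delay_min": [10, 40, 20, 60],
})


def filter_flights(df, airlines=None, destinations=None, start=None, end=None):
    """Return rows matching the selected filters; an empty filter means 'all'."""
    mask = pd.Series(True, index=df.index)
    if airlines:
        mask &= df["airline"].isin(airlines)
    if destinations:
        mask &= df["destination"].isin(destinations)
    if start is not None:
        mask &= df["date"] >= pd.Timestamp(start)
    if end is not None:
        mask &= df["date"] <= pd.Timestamp(end)
    return df[mask]


def render_dashboard(df):
    """Sketch of the Streamlit wiring; call this at the end of the app script."""
    import streamlit as st

    st.title("Otopeni flight explorer")
    airlines = st.sidebar.multiselect("Airlines", sorted(df["airline"].unique()))
    destinations = st.sidebar.multiselect("Destinations", sorted(df["destination"].unique()))
    date_range = st.sidebar.date_input(
        "Date range", value=(df["date"].min().date(), df["date"].max().date())
    )
    # While the user is still picking, date_input may hold a single date.
    start, end = date_range if len(date_range) == 2 else (None, None)

    selected = filter_flights(df, airlines, destinations, start, end)
    st.metric("Flights", len(selected))
    st.bar_chart(selected.groupby("airline")["delay_min"].mean())
    st.dataframe(selected)


subset = filter_flights(flights, airlines=["Airline A"], start="2023-05-02")
print(subset)
```

To try it as an app, save it to a file, add a render_dashboard(flights) call at the bottom, and launch it with streamlit run.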
Overall, it was a fun little project. Learned a lot about data cleaning, visualization, and basic machine learning. Plus, I now know way more about the Otopeni Airport than I ever thought I would. Maybe I’ll try another airport next time!