NBA Seasonal Analysis

Introduction

The NBA player dataset under investigation contains accumulated total stats per player for each regular season and playoffs from the 2012 through 2021 seasons. The dataset consists of 29 columns and 7293 rows.

Questions for our Analysis

Since we have data from 10 seasons, it would be interesting to explore the questions taken up in the sections below.

Data Cleaning for analysis

In this section, we will point out instances and occurrences in the dataset that might need some cleaning and wrangling. We will first check for null values and any duplicate values that we might need to take note of.
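A minimal sketch of the null and duplicate checks described above. The tiny dataframe and its column names (`PLAYER_ID`, `Season_type`, and so on) are assumptions made for illustration, not the actual dataset.

```python
import pandas as pd

# Hypothetical miniature of the dataset; column names and values are assumptions.
df = pd.DataFrame({
    "PLAYER_ID": [1, 1, 2, 3],
    "PLAYER": ["A", "A", "B", "C"],
    "Season_type": ["Regular Season", "Playoffs", "Regular Season", "Regular Season"],
    "PTS": [1500, 200, 900, 40],
})

# Count missing values in each column.
null_counts = df.isna().sum()

# Count rows that are identical in every column.
full_duplicates = int(df.duplicated().sum())

# Rows whose PLAYER_ID appears more than once (for example, the same player
# in the regular season and in the playoffs).
repeated_ids = df[df.duplicated(subset="PLAYER_ID", keep=False)]

print(null_counts)
print(full_duplicates)
print(repeated_ids)
```

Checking `duplicated()` both on whole rows and on `PLAYER_ID` alone helps separate true duplicate records from players who simply appear in more than one row.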

It looks like we have no nulls to worry about. The duplicates we noted are players who played in both the regular season and the playoffs; their Player_ID repeats for each season in which they were active.

Below I will create my own lists of the columns that I want to keep from the datasets for the analysis in the next steps.

What player stats are correlated with each other?

Because our data are totals, we will divide each stat by minutes played so that players are compared fairly on a per-minute basis. To do this, we will group by player, player_id, and season before computing the correlations.
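The per-minute conversion described above can be sketched as follows. The column names (`PLAYER`, `PLAYER_ID`, `season_start_year`, `MIN`) and the sample numbers are assumptions; only `REB` and `FG3A` are named in the analysis itself.

```python
import pandas as pd

# Hypothetical rows; player "A" appears twice in the same season.
df = pd.DataFrame({
    "PLAYER": ["A", "A", "B"],
    "PLAYER_ID": [1, 1, 2],
    "season_start_year": [2020, 2020, 2020],
    "MIN": [2000, 400, 1000],
    "REB": [600, 120, 100],
    "FG3A": [50, 10, 400],
})

stat_cols = ["REB", "FG3A"]
keys = ["PLAYER", "PLAYER_ID", "season_start_year"]

# Collapse to one row per player per season by summing the totals.
season_totals = df.groupby(keys, as_index=False)[["MIN"] + stat_cols].sum()

# Divide each stat by minutes played to get per-minute rates.
per_min = season_totals.copy()
per_min[stat_cols] = season_totals[stat_cols].div(season_totals["MIN"], axis=0)

print(per_min)
```

Summing before dividing matters: averaging per-minute rates across rows would give short stints the same weight as long ones.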

Now we want to keep only players who played at least 50 total minutes in a season.

Below is a correlation heatmap of the per-minute stats.
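As a rough sketch of the two steps above, the snippet below applies the 50-minute cutoff and computes the correlation matrix that a heatmap would display. The per-minute values are made up, and the column names other than `REB` and `FG3A` are assumptions.

```python
import pandas as pd

# Hypothetical per-minute stats with each player's total minutes.
per_min = pd.DataFrame({
    "MIN": [2400, 1800, 30, 900, 1500],
    "REB": [0.30, 0.25, 0.50, 0.10, 0.08],
    "BLK": [0.05, 0.04, 0.00, 0.01, 0.01],
    "FG3A": [0.02, 0.05, 0.00, 0.30, 0.35],
})

MIN_THRESHOLD = 50  # minimum total minutes in a season

# Drop players below the minutes threshold; tiny samples give noisy rates.
qualified = per_min[per_min["MIN"] >= MIN_THRESHOLD]

# Pearson correlation between every pair of per-minute stats.
corr = qualified.drop(columns="MIN").corr()

print(corr.round(3))
```

The `corr` matrix is the kind of input a plotting function such as seaborn's `heatmap` accepts.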

What player stats are correlated with each other?

From the heatmap above, we can see the correlations between player stats.

The one that stands out the most is that REB and FG3A are strongly negatively correlated (-0.527). This makes sense, as a player who shoots a lot of threes will not usually be a great rebounder, likely because most players who shoot a lot of threes are not in a good position to rebound.

When looking at rebounding and blocks, we can see they are strongly positively correlated (0.627). This also makes sense, as a player who is near the rim in a more defensive role, protecting the basket, tends to go for more blocks and is in a better rebounding position.

Below are the scatterplots for those two specific correlations, to further support our findings.

Distribution of Minutes

We will look at the minutes distribution in our dataset for the regular season and the playoffs, and see if we notice any differences in the way minutes are distributed between those two parts of the season.

We can see from the histogram above that 642 players played fewer than 100 minutes in a season, and as the minutes increase, the count of players falls, showing that a lot of NBA players actually don't get to play many minutes.

The histogram above gives percentages instead of counts for the regular-season minutes distribution.
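The count and percentage views of the minutes distribution can be sketched as below. The minutes values are invented and the 100-minute bin width is an assumption chosen to match the "fewer than 100 minutes" bucket discussed above.

```python
import numpy as np

# Hypothetical season minute totals for ten players.
minutes = np.array([12, 45, 80, 95, 150, 420, 990, 1500, 2100, 2600])

# Bin edges every 100 minutes: 0, 100, ..., 2700.
bins = np.arange(0, 2800, 100)

# Number of players falling in each bin.
counts, edges = np.histogram(minutes, bins=bins)

# The same histogram expressed as a percentage of all players.
percentages = counts / counts.sum() * 100

print(counts[0], percentages[0])
```

Using the same bin edges for the regular season and the playoffs keeps the two histograms directly comparable.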

Below we will plot the same histogram for the playoffs dataframe.

How does the distribution of minutes compare between the regular season and playoffs?

From the histogram above, we can see that minutes in the playoffs are not as evenly distributed as in the regular season. This makes sense, as most players in the playoffs play fewer minutes, and the players who play close to 40 or more minutes a game make up less than about 1.5% of players.

How does the distribution of points compare between the regular season and playoffs?

From the histogram above, we can infer that our data are right-skewed and that players don't score as much in the playoffs as in the regular season. This comes as no surprise, since most players play fewer minutes in the playoffs while the stars play more. It is still interesting to examine the distribution because you can see how small a percentage of NBA players actually score more than 20 points per game in the playoffs, which seems to be just around 1%.

Which players have the most Points and Assists?

I want to look at accumulated stats for each player, summed over the last 10 seasons. To do this, I will create a new dataframe where I group each player's regular-season stats and get their totals.

I then created five new columns (FG%, 3PT%, Avg_ORB, Avg_DRB, and FT%) from the accumulated stats over the 10 seasons and selected the top 10 players with the most points.
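A minimal sketch of the grouping and the five derived columns. The input column names (`GP`, `FGM`, `OREB`, and so on) are assumptions, as is the reading of Avg_ORB and Avg_DRB as rebounds per game played.

```python
import pandas as pd

# Hypothetical regular-season totals, two seasons for each of two players.
rs = pd.DataFrame({
    "PLAYER": ["A", "A", "B", "B"],
    "PLAYER_ID": [1, 1, 2, 2],
    "GP": [80, 70, 60, 82],
    "PTS": [2000, 1800, 900, 1100],
    "FGM": [700, 650, 350, 400],
    "FGA": [1500, 1300, 800, 900],
    "FG3M": [100, 150, 50, 70],
    "FG3A": [300, 400, 150, 200],
    "FTM": [500, 350, 150, 230],
    "FTA": [600, 400, 200, 300],
    "OREB": [80, 70, 120, 164],
    "DREB": [400, 350, 300, 410],
})

# Sum every stat across seasons to get one row per player.
career = rs.groupby(["PLAYER", "PLAYER_ID"], as_index=False).sum()

# Shooting percentages computed from the summed makes and attempts.
career["FG%"] = career["FGM"] / career["FGA"]
career["3PT%"] = career["FG3M"] / career["FG3A"]
career["FT%"] = career["FTM"] / career["FTA"]

# Assumed definition: rebounds per game played.
career["Avg_ORB"] = career["OREB"] / career["GP"]
career["Avg_DRB"] = career["DREB"] / career["GP"]

# Top 10 players by total points (only two exist in this toy data).
top_scorers = career.nlargest(10, "PTS")

print(top_scorers[["PLAYER", "PTS", "FG%", "3PT%", "FT%"]])
```

Computing percentages from the summed makes and attempts, rather than averaging season percentages, weights each season by its shot volume.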

Who were the most effective shooters?

To see who was actually an effective shooter, I added two columns: TSA (true shooting attempts) and true shooting percentage.
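For reference, the commonly used definitions are TSA = FGA + 0.44 × FTA and TS% = PTS / (2 × TSA). The sketch below applies them to made-up totals; the column names and the `TS%` label are assumptions.

```python
import pandas as pd

# Hypothetical career totals for two players.
shooting = pd.DataFrame({
    "PLAYER": ["A", "B"],
    "PTS": [3800, 2000],
    "FGA": [2800, 1700],
    "FTA": [1000, 500],
})

# True shooting attempts: field-goal attempts plus a 0.44 weight on free throws.
shooting["TSA"] = shooting["FGA"] + 0.44 * shooting["FTA"]

# True shooting percentage: points per two true shooting attempts.
shooting["TS%"] = shooting["PTS"] / (2 * shooting["TSA"])

print(shooting.sort_values("TS%", ascending=False))
```

Unlike FG%, this measure credits three-pointers and free throws, so it compares scorers with different shot profiles on a more even footing.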

Who were the best players defensively, in terms of rebounds and blocks, over the past 10 seasons?

How has the game changed over the past 10 years?

For this analysis, I want to group all the stats by the column 'season_start_year' and then add the same derived columns we created on the earlier dataframe. I will do this in a new dataframe, change_df, below.
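A small sketch of how change_df could be built. The sample rows are invented, and the definition of FG3A% as the share of field-goal attempts that are threes is an assumption, since the analysis does not spell it out.

```python
import pandas as pd

# Hypothetical player rows for two seasons.
df = pd.DataFrame({
    "season_start_year": [2012, 2012, 2013, 2013],
    "FGA": [1000, 800, 1000, 900],
    "FG3A": [200, 160, 300, 270],
    "FG3M": [72, 58, 105, 95],
})

# One row per season with league-wide totals.
change_df = df.groupby("season_start_year").sum()

# Three-point accuracy for the season.
change_df["3PT%"] = change_df["FG3M"] / change_df["FG3A"]

# Assumed definition: share of all field-goal attempts taken from three.
change_df["FG3A%"] = change_df["FG3A"] / change_df["FGA"]

print(change_df)
```

Indexing by season makes the result easy to pass to a line chart with one trace per statistic.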

How has the game changed in the past 10 years?

From the visual above, when we look at specific statistics (double-tap on the legend) over the past 10 years, we can see how the statistics of the game have changed.

Some notable changes over the past 10 seasons.

Playoff Comparison

When comparing the playoffs to the regular season, we see that possessions per 48 minutes decrease in the playoffs, telling us that the game slows down. This could be a result of possessions simply being better defended in the playoffs than in the regular season.
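One way possessions per 48 minutes could be computed is sketched below. It uses a common box-score estimate, possessions ≈ FGA − OREB + TOV + 0.44 × FTA, which is an assumption and may differ from the formula used in this analysis. The totals and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical league-wide totals; MIN is the sum of all player minutes.
pace_df = pd.DataFrame({
    "Season_type": ["Regular Season", "Playoffs"],
    "MIN": [240000, 48000],
    "FGA": [85000, 16500],
    "FTA": [22000, 4500],
    "OREB": [10000, 1900],
    "TOV": [14000, 2700],
}).set_index("Season_type")

# Assumed box-score estimate of possessions.
pace_df["POSS"] = (
    pace_df["FGA"] - pace_df["OREB"] + pace_df["TOV"] + 0.44 * pace_df["FTA"]
)

# Five players share the floor, so team minutes are player minutes / 5.
team_minutes = pace_df["MIN"] / 5

# Possessions per team per 48 minutes of play.
pace_df["POSS_per_48"] = pace_df["POSS"] / team_minutes * 48

print(pace_df["POSS_per_48"])
```

Normalizing by minutes rather than by games keeps overtime periods from inflating the pace figure.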

Fouls also increase in the playoffs, showing that the game tends to be more physical than in the regular season. Consistent with the earlier results, teams also tend to shoot more threes in the playoffs than in the regular season. Interestingly, turnovers also seem to increase in the playoffs, maybe because the increased defensive pressure and focus make it harder to execute on offense.

Conclusion

From the results of our analysis, we can recognize and agree that the game has changed over the past 10 seasons. We are seeing an increase in perimeter play, as FG3A and FG3A% have increased, as well as assists per game. Although three-point attempts have increased, there does seem to be a decrease in three-point percentage within our datasets. This raises the question of whether the higher volume of 3PT attempts is coming at the cost of accuracy.

Looking at the distributions of minutes played in both the playoffs and the regular season, it is an eye-opener to see how little most NBA players actually get to play and how few are getting around 40 minutes of playing time.

That being said, a couple of useful insights were drawn from the questions posed earlier at the top of the page which are as follows: