Home Runs heating up?

My intuition tells me that objects traveling through the air would meet more resistance when there is more moisture in the air. It turns out that my intuition is wrong. It still doesn’t feel right to me, but humid air is apparently less dense than dry air at the same temperature and pressure, because water vapor molecules are lighter than the nitrogen and oxygen molecules they displace. This matters for baseball because of the common belief that more home runs are hit in the latter half of the season: many parks are in humid areas (east coast bias), and as the summer progresses it gets hotter and more humid. A lot of this is purely anecdotal: “The ball’s really going to start flying out of the park as the weather heats up” and other such nonsense from the mouths of the talking heads we’re forced to listen to while watching a game.
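A rough way to check the density claim is to treat humid air as a mix of two ideal gases, dry air and water vapor, and add up their partial densities. The sketch below is only an approximation under stated assumptions: the function names are made up for illustration, the saturation vapor pressure comes from a Magnus-type empirical fit, and the pressure defaults to standard sea level.

```python
import math

R = 8.314462618     # universal gas constant, J/(mol*K)
M_DRY = 0.0289652   # molar mass of dry air, kg/mol
M_WATER = 0.018016  # molar mass of water vapor, kg/mol


def saturation_vapor_pressure_pa(temp_c):
    """Approximate saturation vapor pressure over water (Magnus-type fit), in Pa."""
    return 610.94 * math.exp(17.625 * temp_c / (temp_c + 243.04))


def air_density(temp_c, rel_humidity, pressure_pa=101325.0):
    """Density of humid air in kg/m^3, assuming ideal-gas behavior.

    rel_humidity is a fraction between 0.0 and 1.0.
    """
    temp_k = temp_c + 273.15
    # Partial pressure of water vapor; the remainder is dry air.
    p_vapor = rel_humidity * saturation_vapor_pressure_pa(temp_c)
    p_dry = pressure_pa - p_vapor
    # Sum of the two partial densities, each from p * M / (R * T).
    return (p_dry * M_DRY + p_vapor * M_WATER) / (R * temp_k)


if __name__ == "__main__":
    for temp_c, rh in [(15, 0.2), (30, 0.2), (30, 0.9)]:
        print(f"{temp_c} C, {rh:.0%} humidity: {air_density(temp_c, rh):.4f} kg/m^3")
```

At a fixed temperature and pressure, raising the humidity lowers the computed density, and so does raising the temperature. The humidity effect is the smaller of the two, since water vapor only ever makes up a few percent of the air.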

Anyway, after seeing this post at Revolution Analytics I wanted to use the calendar heat map function created by Paul Bleicher (source code is available here). It seemed like a fitting opportunity to look at how cumulative daily home runs fluctuated over the course of the MLB season. Based on the science behind the humidity factor, you would expect a fairly obvious increasing trend, at least until it starts to cool off at the end of September. Here is how that data looks in one of these calendar heat maps.

From this perspective I’m seeing home-run-heavy days sprinkled all over the course of the season. The only conclusions I can come to are that 1) the science is obviously right, but the daily sample size is too small not to be skewed by one big game, and 2) the announcers who perpetuate these myths are just parroting each other with no actual check on what comes out of their pie-holes.


6 thoughts on “Home Runs heating up?”

  1. Got the data by date? I suspect a good old line chart (time series) would be easier to interpret. Perhaps using a (say) 7-day moving average to smooth out the fluctuations.

    • You’re probably right, but I mostly wanted to try out that chart. I think I’ll do a time series plot and add it on to this post though. I’m curious why you think a 7-day average would give additional insight? I’ll check it out both ways. I love an excuse to get more comfortable with ggplot2.

      • I agree with Jon: check the data according to different time bases, such as days across a season, weeks of the season or months of the season. In your current representation, days of the week, there might be other effects masking the one you are looking for. For instance, some teams only play night games on certain days of the week and some teams only play day games on certain days of the week. And in some locations, day-to-night fluctuations in humidity can be large.

  2. Does the color scale include white (near 25)? It’s a bit hard to tell from the image. How should the “off-season” boxes be interpreted, then? I guess another color scale would solve this (0 = white). Nice figure anyhow, thanks for sharing.

  3. I think you are confusing density and viscosity. Higher temperature would surely decrease viscosity, hence giving more home runs. On the other hand, higher temperature and humidity might be harder on the players, giving fewer home runs. Hence there would be an optimum temperature.
    (P.S. I also like the calendar heat map, don’t know what to do with it yet.)
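
The 7-day moving average suggested in the first comment can be sketched in a few lines. This is a minimal illustration, not the analysis from the post: the function name is hypothetical, the daily counts are made-up placeholder numbers rather than real MLB data, and the window is a trailing one (each value averages the current day and up to six days before it).

```python
from collections import deque


def moving_average(values, window=7):
    """Trailing moving average; early entries average over the days seen so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # holds at most `window` most recent values
    total = 0.0
    smoothed = []
    for value in values:
        if len(recent) == window:
            # The oldest value is about to fall out of the window.
            total -= recent[0]
        recent.append(value)
        total += value
        smoothed.append(total / len(recent))
    return smoothed


if __name__ == "__main__":
    # Made-up daily home run totals, purely for illustration.
    daily_home_runs = [22, 31, 18, 40, 27, 25, 33, 29, 45, 21]
    for day, avg in enumerate(moving_average(daily_home_runs), start=1):
        print(f"day {day}: {avg:.1f}")
```

Smoothing like this damps the effect of a single big game, which is what makes a slow seasonal trend easier to see in a line chart than in day-by-day values.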
