Friday, May 22, 2009

Correlation = Causation?

One thing that you hear all the time is that correlation does not equal causation. Basically, just because two things have outcomes that appear to be correlated doesn't mean that the underlying reasons for the outcomes are truly dependent. The lesson that you're supposed to take away from this is that just because you see a correlation, you shouldn't place too much weight on it, unless you can show causation.

So what the heck do you do when you have a very good conceptual argument for causation, but the correlation is nil?

A case study: 2008-09 O.N. Thugs, UPL Basketball.

If you think about the stats that we use (PTS, 3PM, FG%, FT%, AST, BLK, STL, REB, OREB, A/TO), you'd probably guess that 3PM and PTS would be correlated (though not necessarily strongly), and you'd figure that the accuracy stats would be correlated with PTS, and also that AST would be correlated with A/TO. But two categories should stand out: OREB and REB, because each OREB is also a REB. So, if a player is a good offensive rebounder, then he'll likely be a good rebounder. In fact, it would be somewhat shocking if a team was very good at OREB, but bad at REB. Yet somehow, halfway through the season, I was something like 2nd in offensive rebounding (and had the highest rate by far), but was 9th in overall rebounding (with a rate that was about 7th). Eventually, things caught up a bit, and I finished 1st in OREB, and 4th in REB, but even that was a bit strange. So maybe it's reasonable to expect things to even out over time.

The current problem: 2009 O.N. Thugs, UPL Baseball.

If you think about the stats in baseball (R, HR, RBI, SB, OBP, SLG, W, L, SV, K, ERA, and WHIP), the ones that you'd figure to be the most correlated are HR with RBI and HR with R, since each HR you hit guarantees 1 R and 1 RBI. And you'd expect a somewhat weaker correlation between HR and SLG. But take a look at this:

Team                    R  HR  RBI  SB    OBP    SLG  RBI/HR   R/HR
'90 Reds              249  55  229  30  0.388  0.506   4.164  4.527
O.N. Thugs            256  47  253  43  0.397  0.471   5.383  5.447
IamJabrone            248  78  238  41  0.358  0.495   3.051  3.179
Westy's Sluggers      247  74  259  37  0.389  0.522   3.500  3.338
Black Sox             231  66  249  26  0.361  0.476   3.773  3.500
Cheeseheads           230  61  223  43  0.346  0.458   3.656  3.770
Muddy Mush Heads      239  51  195  54  0.360  0.444   3.824  4.686
IStillSuckCurveballs  234  58  244  14  0.382  0.489   4.207  4.034
Phatsnapper           189  54  219  23  0.346  0.456   4.056  3.500
TheJimmyDixLongballs  202  54  226  40  0.334  0.450   4.185  3.741
Benver Droncos        244  63  238  29  0.338  0.460   3.778  3.873
Hats for Bats         224  51  212  40  0.338  0.424   4.157  4.392
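
For anyone who wants to put a number on how tightly HR tracks R, RBI, and SLG across these 12 teams, here is a rough sketch in Python. It uses only the rows in the table above and a hand-rolled Pearson correlation; the names `TEAMS`, `pearson`, and `correlations` are made up for illustration, and 12 teams partway through a season is a small sample, so treat whatever comes out as suggestive at best.

```python
import math

# (team, R, HR, RBI, SLG), copied from the table above.
TEAMS = [
    ("'90 Reds", 249, 55, 229, 0.506),
    ("O.N. Thugs", 256, 47, 253, 0.471),
    ("IamJabrone", 248, 78, 238, 0.495),
    ("Westy's Sluggers", 247, 74, 259, 0.522),
    ("Black Sox", 231, 66, 249, 0.476),
    ("Cheeseheads", 230, 61, 223, 0.458),
    ("Muddy Mush Heads", 239, 51, 195, 0.444),
    ("IStillSuckCurveballs", 234, 58, 244, 0.489),
    ("Phatsnapper", 189, 54, 219, 0.456),
    ("TheJimmyDixLongballs", 202, 54, 226, 0.450),
    ("Benver Droncos", 244, 63, 238, 0.460),
    ("Hats for Bats", 224, 51, 212, 0.424),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


hr = [team[2] for team in TEAMS]

# Correlate team HR totals against each of the other three columns.
correlations = {
    label: pearson(hr, [team[col] for team in TEAMS])
    for label, col in (("R", 1), ("RBI", 3), ("SLG", 4))
}

for label, r in correlations.items():
    print(f"HR vs {label}: r = {r:.2f}")
```

A value near 1 would mean the big-HR teams are also the big-R (or RBI, or SLG) teams; a value near 0 would mean HR tells you very little about the other column in this particular league.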

Somehow, I have managed to lead the league in R, and am a close 2nd in RBI. But I am dead last in HR. Compared to the Jabrones, I have 31 fewer HR, which means that I've managed to score 8 more R, despite giving away 31 R from my lack of HR. And historically, R:HR and RBI:HR ratios come in around 4 (just a quick glance suggests that 3.5 to 4.5 are reasonable values to expect). Note that R:HR can stray a lot further than RBI:HR: guys who score 90+ runs on only 10 or so HR aren't all that rare, whereas someone like Adam Dunn, with 100 RBI on 40 HR, is about as low as you'd probably get on the RBI side. But overall, as you look at how UPL teams are put together, you see some stability in these ratios. And then you have the '09 O.N. Thugs, who are at about 5.4 to 1 for both R and RBI.
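
The Thugs-versus-Jabrones arithmetic can be spelled out in a few lines. This is only a sketch using the two rows from the table; "non-HR runs" is an illustrative label for R minus HR (every HR scores the batter himself), not one of the UPL categories, and the function names are made up for the example.

```python
# Rows copied from the table: the two extremes in HR.
THUGS = {"R": 256, "HR": 47, "RBI": 253}
JABRONE = {"R": 248, "HR": 78, "RBI": 238}


def rbi_per_hr(team):
    """RBI:HR ratio, the second-to-last column of the table."""
    return team["RBI"] / team["HR"]


def r_per_hr(team):
    """R:HR ratio, the last column of the table."""
    return team["R"] / team["HR"]


def non_hr_runs(team):
    """Runs scored by some means other than the batter's own HR."""
    return team["R"] - team["HR"]


for name, team in (("O.N. Thugs", THUGS), ("IamJabrone", JABRONE)):
    print(
        f"{name}: RBI/HR = {rbi_per_hr(team):.3f}, "
        f"R/HR = {r_per_hr(team):.3f}, "
        f"non-HR runs = {non_hr_runs(team)}"
    )

# The 8-run lead plus the 31-HR deficit: how many more runs the Thugs
# have scored without the help of their own home runs.
gap = non_hr_runs(THUGS) - non_hr_runs(JABRONE)
print(f"Non-HR run gap: {gap}")
```

Put that way, the 8-run lead and the 31-HR deficit add up to a 39-run edge in runs that didn't come from a guy driving himself in.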

So what does this mean? I have no clue. Moving forward, you can either make the case that a) I'm due for a bunch of HR since my team is good, but just underachieving right now in power, or b) my team sucks and has been overachieving with everything other than HR. I think that a) is more likely than b), although I'm definitely biased on this one.

I don't really know what to make of this right now, and will think about this more, but I definitely have had some interesting questions open up regarding how teams should be constructed, if you take a statistical look at the way the UPL is structured. For example, I'd bet that if you were to take a poll that asked fantasy players which of the 6 offensive categories would be the most useful in predicting fantasy success, you'd probably get HR as the overwhelming answer, with SB being the worst. However, my first look at the numbers suggests that if you were to use only one criterion in evaluating offense, you should look at RBI over anything else (though this is very preliminary, and restricted to historical UPL numbers).

-Chairman (aka O.N. Thugs)

6 comments:

Westy said...

Interesting. Especially considering that C-Lauff and I are at the other end of the spectrum, featuring the fewest R and RBI per HR. Are we also due for a correction? And which way will it go?

Chairman said...

Well, I'm pretty sure that Inge and Ibanez aren't going to combine for 105 HR this year (which is about what they're on pace for), so I'm guessing that your HR rate will slow down a bit and even things out. But your rate isn't that out of line. And like I said earlier in the season, your offense was going to be pretty good.

On the other hand, I have a feeling that C-Lauff's offense is living on borrowed time. Honestly, if you look at Aaron Hill's numbers so far this year, the best explanation is that he's on HGH, so I'm guessing that he's due for a serious correction. Mark Reynolds' numbers look to be a little inflated. And Damon probably doesn't end up with over 40 HR, either (though the new ball park is a mess...).

With my guys, I don't have a single hitter who's hit more HR than you'd expect, so I'm figuring that I'm due... and even though I'm in 2nd place right now, I can't help but feel that my team's underachieved across the board. I mean, David Wright's good for 35 HR, not 15, right?

On the bright side, my pitching's getting hot, and I'm assuming that some W's will start flowing.

clauff said...

Roland,

How many times have I told you to stop making my players into voodoo dolls for your own amusement?

Seriously.

Chairman said...

Well, so far, the voodoo has hit Greg's pitching and your hitting. Next up are Rup's pitching and Westy's hitting.

If my witch doctor keeps up the good work, I'll be on my way to a 130 point season :-)

clauff said...

So, if I can be serious for a moment...

What is interesting is that I'm leading the league in HRs, but I'm in like 8th place in RBIs. More than anything, I think it's probably a bit more believable than your OREB versus REB discrepancy, but I think there's an element of "being clutch" that players need to have in order to make your team successful. I've noticed most of my HRs have been the solo variety, which is very frustrating to say the least.

Is a stat like BA with RISP fairly consistent for an established player, year over year? If so, I believe that may be a stat I look for in assessing players in the future. Of course, RBI's are probably a better indicator of "clutch", but that's also a product of other things like where you hit in a lineup and what team you're on.

Clearly, I have very few "clutch" hitters on my team. I need to do something about it.

Chairman said...

I haven't looked at "clutch" stats super-closely. What I seem to recall from reading various articles is that clutch is mostly a myth, but there's enough support for clutch performance that there's likely something there (though probably a small impact). But all told, my advice is to forget about "clutch" and just go for "good."

As far as your RBI totals, I'd look at a couple things. HR is only one way of getting RBI. You can also get plenty of RBI with your hits (and occasionally walks). But Davis and Bruce have very low OBP (really, batting average is probably a better indicator). And Reynolds is on the low side, as well. So it's not surprising that they have low RBI totals. Given that even for Adam Dunn, 70% of his hits are not HR, it's these other hits that are making up a majority of the RBI total.

Of course, you also need people on base in front of you, so position in the batting order matters. Guys who hit 3rd through 6th will get more RBI, just by accident.

Hill bats 2nd. Napoli bats 7th. Ellsbury leads off. Fukudome bats 2nd. So you've got a bunch of non-RBI spots, plus a couple HR hitters w/ really bad OBP.

All told = low RBI totals.