Art: Correlation, Causation and More

So, after much prodding from me, you have finally become data analysts. You have started looking into your data and started trusting it. And the day you decide to do all this, you notice something creepy. You look at all the movies Nicolas Cage has starred in and wonder whether they mean something. Lo and behold, you come up with a chart in which the number of Nicolas Cage movies tracks the number of people who drowned in a pool.


You, as the last hope of people drowning in pools, decide to intervene and make sure Nicolas Cage quits Hollywood altogether. After all, you have seen the correlation: the more movies he makes, the more people drown in a pool.

Needless to say, this is bad analytics. We need to understand one thing: correlation does not imply causation. Just because two variables are tightly correlated does not mean that one has any effect on the other. You will come across this situation often, and if you base your decisions on such spurious correlations, the result can only be wrong decisions, and maybe a death blow to your own career.
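To see how easily this happens, here is a minimal sketch in Python. It assumes two made-up series (the names `series_a` and `series_b` and all the numbers are invented for illustration, not real movie or drowning data) that are generated independently but both happen to grow over time. A shared trend is only one of several ways a spurious correlation can arise, but it is a common one.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def differences(xs):
    """Year-over-year changes, which strip out a steady trend."""
    return [b - a for a, b in zip(xs, xs[1:])]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
years = range(20)

# Hypothetical data: each series is its own trend plus its own noise.
# Neither series is computed from the other.
series_a = [10 + 2.0 * t + rng.gauss(0, 1) for t in years]
series_b = [50 + 3.0 * t + rng.gauss(0, 1) for t in years]

r_levels = pearson(series_a, series_b)
r_changes = pearson(differences(series_a), differences(series_b))

print(f"correlation of raw values:   {r_levels:.2f}")
print(f"correlation of yearly moves: {r_changes:.2f}")
```

The raw values come out very strongly correlated simply because both series rise over time. Once you compare the year-over-year changes instead, most of that apparent relationship should disappear, which is a hint that the original correlation was telling you about time, not about cause and effect.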

Whenever you are analyzing data in whatever analytics tool you use, ask yourself: does this data explain my business problem? Who reported this data? Who audited it? Can I ascertain its quality?

We all know that data collection is always a huge issue. We all get it, so now let's get over it. Once you have, take a long, hard look at your data and ask whether it has any significance for, or relation to, your business problem.

Now think about whether there could be a bias in how the data was reported. Who benefits from the data that was provided to you? Could there be an alternative hypothesis or explanation for what it shows?

Keep a reasonable skepticism about the data that is provided to you, and run your own tests to make sure everything is as good as it looks. Once you have run your tests and are reasonably satisfied with the data, run your correlations and trust your data to deliver valuable insights.
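What "run your tests" looks like depends on your data, but a minimal sketch of a first pass might be the following. The `audit` helper, the `order_id` and `amount` fields, and the allowed range are all assumptions made up for this example, not part of any particular tool.

```python
def audit(rows, field, lo, hi):
    """Count basic quality problems for one numeric field in a list of dicts."""
    missing = sum(1 for r in rows if r.get(field) is None)
    out_of_range = sum(
        1 for r in rows
        if r.get(field) is not None and not (lo <= r[field] <= hi)
    )
    # A row counts as a duplicate if an identical row was already seen.
    seen = set()
    duplicates = 0
    for r in rows:
        key = tuple(sorted(r.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {
        "rows": len(rows),
        "missing": missing,
        "out_of_range": out_of_range,
        "duplicates": duplicates,
    }

# Hypothetical records with the usual problems mixed in.
rows = [
    {"order_id": 1, "amount": 120.0},
    {"order_id": 2, "amount": None},   # missing value
    {"order_id": 3, "amount": -5.0},   # negative amount, outside the range
    {"order_id": 3, "amount": -5.0},   # exact duplicate of the row above
    {"order_id": 4, "amount": 80.0},
]

report = audit(rows, "amount", lo=0.0, hi=10_000.0)
print(report)
```

Checks like these will not tell you whether the data is biased or answers your business question, but they catch the cheap problems before those problems leak into your correlations.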

While you are making decisions on the basis of the insights you have derived from this data, make sure you are also improving your data quality: improve your data collection methods, optimize your ETL operations, and store your data in a form and place from which you can easily extract it.

Things don't go 100% according to plan, so of course even the best business decisions (at the time) can look stupid in retrospect. But don't let a poor understanding of your data be one of the reasons for your failure.