The bias in our stars

Aboli Marathe
3 min read · Jun 23, 2021


Just when the results start making sense and the data finally stops mysteriously changing formats, a silent whisper creeps into the head of the data wrangler: “Hey! What about the bias?”

Ah, here we go again. Bias, bias, bias! A word we hear so many times, and yet never quite enough. Is bias in algorithms merely a reflection of bias in data? Can any dataset be truly unbiased? In how many miscellaneous forms does bias creep into our lives, wreaking havoc left and right? These questions are but a short summary of the bias headache. Identifying and correcting bias (where possible) is an essential step in ensuring the reliability of results, and it has been discussed in detail in many works.

But recently, while working on yet another mind-boggling dataset, I wondered whether there is a type of bias that indirectly influences our results yet hides in plain sight: the bias of intuition. As data scientists, we are often asked to find methods that “intuitively make sense” while exploring datasets and extracting insights. But can our intuition lead us down the wrong path? Might we assume that our dataset will show results of a particular nature, interpret the statistical insights to fit our preconceived notions, and end up at the wrong answer? Or should that be blamed on insufficient exploration rather than on human bias?

This puzzled me for a long time, so I tried an experiment in real life. We organized a data science competition and gave the participants a completely mislabeled dataset: the column names, values and problem definition were all changed so that the dataset intuitively made sense but described an entirely imaginary situation. As part of the data cleaning, we asked the participants to submit their notebooks, and to my utter amusement, these data scientists submitted elaborate explanations for results that scored well on the leaderboard and, according to them, made perfect sense. Their algorithms were correct; we hadn’t changed the data distribution or the prediction labels, so their math was right. What was wrong was their insights. They had simply fitted interpretations on top of their results, interpretations that looked quite decent but, as we knew from the original dataset, made no sense at all. Assuming a correlation between house prices and neighborhoods as a basis for fitting the algorithm sounds reasonable, right? But what if the columns were actually Indian guava market prices and American city names? Would the insights still hold? Even if you obtained an 87% accuracy, would the conclusions be correct?
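To see why the leaderboard cannot tell these stories apart, here is a minimal sketch with invented toy numbers (none of it is from the actual competition): a trivial nearest-class-mean classifier is trained twice on the same values, once under “house” column names and once under “guava” column names. The model only ever consumes numbers, never column names, so the score is identical either way; only the human interpretation changes.

```python
from collections import defaultdict

def train_and_score(dataset, target):
    """Fit a trivial nearest-class-mean model and return training accuracy.
    The model only ever sees numeric values, never the column names."""
    cols = [c for c in dataset if c != target]
    rows = list(zip(*(dataset[c] for c in cols)))
    labels = dataset[target]

    # Compute the mean feature vector of each class.
    totals = defaultdict(lambda: [0.0] * len(cols))
    counts = defaultdict(int)
    for row, y in zip(rows, labels):
        counts[y] += 1
        for i, v in enumerate(row):
            totals[y][i] += v
    means = {y: [t / counts[y] for t in ts] for y, ts in totals.items()}

    # Predict the class whose mean is closest in squared distance.
    def predict(row):
        return min(means, key=lambda y: sum((v - m) ** 2
                                            for v, m in zip(row, means[y])))

    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

# The same numbers under two completely different stories (toy data):
house = {"sqft": [1200, 1500, 800, 900],
         "neighborhood_score": [3, 4, 1, 2],
         "price_tier": ["high", "high", "low", "low"]}
guava = {"guava_market_price": [1200, 1500, 800, 900],
         "city_index": [3, 4, 1, 2],
         "demand_tier": ["high", "high", "low", "low"]}

# Identical metrics, very different "insights".
assert train_and_score(house, "price_tier") == train_and_score(guava, "demand_tier")
```

Whatever interpretation a participant wrote down, the fitted model and its score were exactly the same under both framings; the “insight” lived entirely in the column names.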

This made me wonder how intuition based on common sense may mislead us as data scientists. Of course, in real life this wouldn’t happen; we wouldn’t be handed an out-of-context, manipulated dataset to apply our skills to. But the problem persists. Since our motivation is to tell stories and bring datasets to life, we may try to fit a not-so-linear line to the perfect graph of the story and distort the truth. While pondering this problem, I also came up with another possibility. If our intuition can lead us to “sensible results” even on fake datasets, could it also be used to find approaches to seemingly unsolvable problems? If thinking a certain way induces a certain bias in how we follow a data science procedure, could we leverage that bias with the help of better statistical approaches? Taking the same example as before: suppose we did in fact get excellent results with Indian guava market prices and American city names, and, tracing our algorithm back to the beginning, we found that the dataset was actually about the prices of guavas exported to American cities. Perhaps our misguided predictions were correct after all. Is it truly the fault in our stars that guides our fate, or is our destiny in our own hands?



Aboli Marathe

Machine Learning Engineer @ Omdena | AI for Social Good | Writer @ The Innovation