Let’s say there’s a dam to store water.
And there are lots of pipes connected which deliver water to homes.
In the summer months, there’s less rain and there’s little water in the dam.
Water flows without any issues.
This is small data.
In the rainy season, there are torrential rains and now the dam has exceeded its capacity.
There’s so much water that it’s overflowing everywhere.
Your pipes are under tremendous stress. They can no longer hold the fort and they start leaking.
This is big data without infrastructure.
You try to mitigate this problem. You distribute your water storage across many smaller reservoirs using something called HDFS (the Hadoop Distributed File System) and process the water in parallel using a technique called MapReduce.
You call this Hadoop.
With this, you’re able to control the flow of water.
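To make the MapReduce idea concrete, here is a toy sketch in plain Python, keeping the water analogy. This is not real Hadoop: the station names and readings are hypothetical, and in a real cluster the map and reduce phases would run distributed across many machines over data stored in HDFS.

```python
from collections import defaultdict

# Hypothetical water-quality readings: (station, reading) records.
records = [
    ("station_a", 3.1),
    ("station_b", 7.4),
    ("station_a", 2.9),
    ("station_b", 8.0),
]

def map_phase(records):
    # Map step: emit (key, value) pairs from each record.
    # In Hadoop, this runs in parallel on many nodes.
    for station, reading in records:
        yield station, reading

def reduce_phase(pairs):
    # Reduce step: group values by key, then aggregate each group
    # (here, the average reading per station).
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}

averages = reduce_phase(map_phase(records))
print(averages)  # → {'station_a': 3.0, 'station_b': 7.7}
```

The point of the pattern is that the map step needs no coordination between records, so the work splits cleanly across machines; only the reduce step brings values for the same key back together.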
You now want to understand what's flowing. Is it just water? Is there something else?
You take a sample of the water and apply something called Machine Learning to it.
Your research helps you understand there’s some industrial discharge into the water.
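A minimal stand-in for that analysis step, in Python: flag readings that deviate sharply from the rest using a simple two-standard-deviation rule. The readings are hypothetical, and a real pipeline would use an actual trained model rather than this threshold, but the shape of the task is the same: learn what "normal" looks like, then surface what doesn't fit.

```python
import statistics

# Hypothetical contaminant readings from water samples.
# The 9.8 is the kind of spike industrial discharge might produce.
readings = [3.0, 3.2, 2.9, 3.1, 9.8, 3.0]

mean = statistics.mean(readings)
stdev = statistics.stdev(readings)

# Flag any reading more than 2 standard deviations from the mean.
outliers = [x for x in readings if abs(x - mean) > 2 * stdev]
print(outliers)  # → [9.8]
```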
You approach the authorities and fix it.
And now, you are happy because big data, Hadoop, and ML have helped you organize everything!