Cleaning data is like cleaning the walls in your house: you wipe off any scribbles, remove the dust, and get rid of whatever is unnecessary and makes your walls ugly.
The same thing happens when you clean your data: you keep what you want and remove what you don’t, so that the raw data becomes useful and is no longer raw.
You can do the cleaning with Python, R, or whatever language you prefer, but in this tutorial I’m going to explain how you can clean your text files at the command line, drawing on insights from a paper researching clickbait and non-clickbait data.