From the course: Predictive Analytics Essential Training: Data Mining

Selecting relevant data

- [Instructor] This next topic is one of my favorites because there's so much confusion around it. Folks often think that it's easier, or better, or both to use all of your data. When big data first became a popular phrase, a widely read book came out suggesting that drawing a sample from a population was old-fashioned, and that the only reason we ever used a sample was that computers at the time couldn't handle large datasets. That creates the image that we just throw the data in and let the algorithm figure it out. But we still sample, for lots of reasons. One good one: you wouldn't want to drain the whole river to test the water. And there are other reasons you can't or shouldn't use all the data. So our next element is that you have to be thoughtful about the data that you select. Here, we're focused on selecting the cases or instances, in other words, the rows of the dataset. The most important reason that we…
