From the course: Predictive Analytics Essential Training: Data Mining
Selecting relevant data
- [Instructor] This next topic is one of my favorites because there's so much confusion around it. Folks often think that it's easier, or better, or both, to use all of your data. When big data first became a popular phrase, a widely read book suggested that drawing a sample from a population was old-fashioned, and that the only reason we ever used a sample was that computers at the time couldn't handle large datasets. That creates an image of just throwing the data in and letting the algorithm figure it out. But we still sample for lots of reasons. One good one: you wouldn't want to drain the whole river to test the water. There are also other reasons you can't or shouldn't use all the data. So our next element is that you have to be thoughtful about the data that you select. And here, we're focused on selecting the cases or instances, in other words, the rows of the dataset. The most important reason that we…
Contents
- Understanding data requirements (1m 9s)
- Gathering historical data (1m 45s)
- Meeting the flat file requirement (1m 42s)
- Determining your target variable (1m 40s)
- Selecting relevant data (3m 14s)
- Hints on effective data integration (2m 49s)
- Understanding feature engineering (2m 45s)
- Developing your craft (1m 20s)