@ -10,8 +10,8 @@ Classification is a form of [supervised learning](https://wikipedia.org/wiki/Sup
Remember:
- **Linear regression**, helped you predict relationships between variables and make accurate predictions on where a new datapoint would fall in relationship to that line. So, you could predict _what price a pumpkin would be in September vs. December_, for example.
- **Logistic regression**, helped you discover "binary categories": at this price point, _is this pumpkin orange or not-orange_?
- **Linear regression** helped you predict relationships between variables and make accurate predictions on where a new datapoint would fall in relationship to that line. So, you could predict _what price a pumpkin would be in September vs. December_, for example.
- **Logistic regression** helped you discover "binary categories": at this price point, _is this pumpkin orange or not-orange_?
Classification uses various algorithms to determine other ways of determining a data point's label or class. Let's work with this cuisine data to see whether, by observing a group of ingredients, we can determine its cuisine of origin.
In this section you will use the techniques in the previous lessons to do some exploratory data analysis of a large dataset. Once you have a good understanding of the usefulness of the various columns, you will learn how to remove the unneeded columns, calculate some new data based on the existing columns, and save the resulting dataset for use in the final challenge.
@ -93,7 +93,7 @@ Here they are grouped in a way that might be easier to examine:
##### Examples
| Average Score | Total Number Reviews | Reviewer Score | Negative <br/>Review | Positive Review | Tags |
| 7.8 | 1945 | 2.5 | This is currently not a hotel but a construction site I was terroized from early morning and all day with unacceptable building noise while resting after a long trip and working in the room People were working all day i e with jackhammers in the adjacent rooms I asked for a room change but no silent room was available To make thinks worse I was overcharged I checked out in the evening since I had to leave very early flight and received an appropiate bill A day later the hotel made another charge without my concent in excess of booked price It s a terrible place Don t punish yourself by booking here | Nothing Terrible place Stay away | Business trip Couple Standard Double Room Stayed 2 nights |
As you can see from this guest, they did not have a happy stay at this hotel. The hotel has a good average score of 7.8 and 1945 reviews, but this reviewer gave it 2.5 and wrote 115 words about how negative their stay was. If they wrote nothing at all in the Positive_Review column, you might surmise there was nothing positive, but alas they wrote 7 words of warning. If we just counted words instead of the meaning, or sentiment of the words, we might have a skewed view of the reviewers intent. Strangely, their score of 2.5 is confusing, because if that hotel stay was so bad, why give it any points at all? Investigating the dataset closely, you'll see that the lowest possible score is 2.5, not 0. The highest possible score is 10.
@ -496,6 +496,16 @@ print("Saving results to Hotel_Reviews_Filtered.csv")
df.to_csv(r'Hotel_Reviews_Filtered.csv', index = False)