a few clarifications so notebooks will run

pull/73/head
Jen Looper 4 years ago
parent 6332008aa7
commit fb9ef0f02a

@@ -202,7 +202,7 @@ Finally, and this is delightful (because it didn't take much processing at all),
 | Family with older children | 26349 |
 | With a pet | 1405 |
-You could argue that `Travellers with friends` is the same as `Group` more or less, and that would be fair to combine the two as above. The code for identifying the correct tags is [the Tags notebook](solution/notebook-tags.ipynb).
+You could argue that `Travellers with friends` is the same as `Group` more or less, and that would be fair to combine the two as above. The code for identifying the correct tags is [the Tags notebook](solution/1-notebook.ipynb).
 The final step is to create new columns for each of these tags. Then, for every review row, if the `Tag` column matches one of the new columns, add a 1, if not, add a 0. The end result will be a count of how many reviewers chose this hotel (in aggregate) for, say, business vs leisure, or to bring a pet to, and this is useful information when recommending a hotel.
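The tag-column step in the paragraph above (one 0/1 column per tag, summed to get counts) can be sketched as follows. This is a minimal illustration on a made-up four-row DataFrame, not the lesson notebook's code; the `tag_columns` mapping is a hypothetical name, the column names follow the `reindex` list later in this commit, and the `Travellers with friends` spelling follows the text, so match whatever spelling your data actually uses.

```python
import pandas as pd

# Toy reviews (hypothetical data): each Tags value is one string holding
# all of that review's tags, similar in spirit to the lesson's Tags column.
df = pd.DataFrame({
    "Tags": [
        "[' Leisure trip ', ' Couple ']",
        "[' Business trip ', ' Solo traveler ']",
        "[' Leisure trip ', ' Group ', ' With a pet ']",
        "[' Leisure trip ', ' Travellers with friends ']",
    ]
})

# New column name -> tag strings that should set it to 1.
# "Group" also matches "Travellers with friends", combining the two tags.
tag_columns = {
    "Leisure_trip": ("Leisure trip",),
    "Couple": ("Couple",),
    "Solo_traveler": ("Solo traveler",),
    "Business_trip": ("Business trip",),
    "Group": ("Group", "Travellers with friends"),
    "Family_with_young_children": ("Family with young children",),
    "Family_with_older_children": ("Family with older children",),
    "With_a_pet": ("With a pet",),
}

for column, needles in tag_columns.items():
    # 1 if any of the tag strings appears in the row's Tags text, else 0.
    # needles=needles binds the current tuple inside the lambda.
    df[column] = df.Tags.apply(
        lambda tags, needles=needles: 1 if any(n in tags for n in needles) else 0
    )

# Summing each 0/1 column gives how many reviews carry that tag.
counts = df[list(tag_columns)].sum()
print(counts)
```

Plain substring matching is used because the tags are stored as text; it works here because no tag string is contained in another.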
@@ -227,11 +227,11 @@ df["With_a_pet"] = df.Tags.apply(lambda tag: 1 if "With a pet" in tag else 0)
 Finally, save the dataset as it is now with a new name.
 ```python
-df.drop(["Tags", "Review_Total_Negative_Word_Counts", "Review_Total_Positive_Word_Counts", "days_since_review", "Total_Number_of_Reviews_Reviewer_Has_Given"], axis = 1, inplace=True)
+df.drop(["Review_Total_Negative_Word_Counts", "Review_Total_Positive_Word_Counts", "days_since_review", "Total_Number_of_Reviews_Reviewer_Has_Given"], axis = 1, inplace=True)
 # Saving new data file with calculated columns
 print("Saving results to Hotel_Reviews_Filtered.csv")
-df.to_csv(r'Hotel_Reviews_Filtered.csv', index = False)
+df.to_csv(r'../data/Hotel_Reviews_Filtered.csv', index = False)
 ```
 ## Sentiment Analysis Operations
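The hunk above stops dropping `Tags` and writes the filtered file into a `../data` folder. A minimal sketch of the same drop-then-save step follows, assuming a hypothetical helper `save_without` that also creates the target folder, since `DataFrame.to_csv` does not create missing parent directories. The columns and values are made up, and a temporary folder stands in for `../data`.

```python
import os
import tempfile

import pandas as pd


def save_without(df, path, drop_columns):
    """Hypothetical helper: drop unneeded columns, then write the rest to CSV.

    The parent folder is created first, so a relative target such as
    '../data/Hotel_Reviews_Filtered.csv' does not fail if the folder is missing.
    """
    slim = df.drop(columns=drop_columns)  # returns a new frame; df is untouched
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    slim.to_csv(path, index=False)
    return slim


# Usage with made-up columns, written into a throwaway temporary folder.
df = pd.DataFrame({
    "Hotel_Name": ["Hotel A", "Hotel B"],
    "Tags": ["[' Couple ']", "[' With a pet ']"],
    "days_since_review": ["0 days", "3 days"],
})
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data", "Hotel_Reviews_Filtered.csv")
    save_without(df, target, ["days_since_review"])
    reloaded = pd.read_csv(target)

print(list(reloaded.columns))  # ['Hotel_Name', 'Tags']
```

Returning a new frame instead of using `inplace=True` keeps the in-memory `df` intact, which makes it easier to re-run a notebook cell.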
@@ -245,8 +245,10 @@ Note that now you are loading the filtered dataset that was saved in the previou
 ```python
 import time
 import pandas as pd
+import nltk as nltk
 from nltk.corpus import stopwords
 from nltk.sentiment.vader import SentimentIntensityAnalyzer
+nltk.download('vader_lexicon')
 # Load the filtered hotel reviews from CSV
 df = pd.read_csv('../../data/Hotel_Reviews_Filtered.csv')
@@ -256,7 +258,7 @@ df = pd.read_csv('../../data/Hotel_Reviews_Filtered.csv')
 # Finally remember to save the hotel reviews with new NLP data added
 print("Saving results to Hotel_Reviews_NLP.csv")
-df.to_csv(r'../../data/Hotel_Reviews_NLP.csv', index = False)
+df.to_csv(r'../data/Hotel_Reviews_NLP.csv', index = False)
 ```
 ### Removing stop words
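Relative paths such as `../data/...` and `../../data/...` resolve against the process's current working directory, which Jupyter typically sets to the notebook's own folder. A minimal sketch of how the two forms in this hunk resolve is below. The `/course/lesson/solution` layout and the `resolve` helper are hypothetical, and nothing touches the disk.

```python
import posixpath


def resolve(cwd, relative):
    """Collapse a relative path against a working directory (POSIX-style, no disk access)."""
    return posixpath.normpath(posixpath.join(cwd, relative))


# Hypothetical layout: the notebook runs with its own folder as the working directory.
notebook_dir = "/course/lesson/solution"

print(resolve(notebook_dir, "../data/Hotel_Reviews_NLP.csv"))
# /course/lesson/data/Hotel_Reviews_NLP.csv
print(resolve(notebook_dir, "../../data/Hotel_Reviews_NLP.csv"))
# /course/data/Hotel_Reviews_NLP.csv
```

Each `..` climbs one folder, so the one-level and two-level forms point at different `data` folders unless the working directory changes.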
@@ -342,7 +344,7 @@ The very last thing to do with the file before using it in the challenge, is to
 df = df.reindex(["Hotel_Name", "Hotel_Address", "Total_Number_of_Reviews", "Average_Score", "Reviewer_Score", "Negative_Sentiment", "Positive_Sentiment", "Reviewer_Nationality", "Leisure_trip", "Couple", "Solo_traveler", "Business_trip", "Group", "Family_with_young_children", "Family_with_older_children", "With_a_pet", "Negative_Review", "Positive_Review"], axis=1)
 print("Saving results to Hotel_Reviews_NLP.csv")
-df.to_csv(r"Hotel_Reviews_NLP.csv", index = False)
+df.to_csv(r"../data/Hotel_Reviews_NLP.csv", index = False)
 ```
 You should run the entire code for [the analysis notebook](solution/notebook-sentiment-analysis.ipynb) (after you've run [your filtering notebook](solution/notebook-filtering.ipynb) to generate the Hotel_Reviews_Filtered.csv file).
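The context in this last hunk uses `df.reindex([...], axis=1)` to put the columns in a fixed order before saving. `reindex` also drops every column not listed and silently adds an all-NaN column for any name it cannot find, so a misspelled column name produces no error. A minimal sketch of a guarded version is below, using a hypothetical helper `reorder_columns` and a made-up one-row DataFrame.

```python
import pandas as pd


def reorder_columns(df, order):
    """Hypothetical helper: return df with columns in `order`, failing loudly on typos.

    DataFrame.reindex(order, axis=1) silently adds an all-NaN column for any
    name it cannot find and drops every column not listed, so check first.
    """
    missing = [name for name in order if name not in df.columns]
    if missing:
        raise KeyError(f"columns not in DataFrame: {missing}")
    return df.reindex(order, axis=1)


df = pd.DataFrame({
    "Positive_Review": ["Great location"],
    "Hotel_Name": ["Hotel A"],
    "Reviewer_Score": [9.2],
})

ordered = reorder_columns(df, ["Hotel_Name", "Reviewer_Score", "Positive_Review"])
print(list(ordered.columns))
# ['Hotel_Name', 'Reviewer_Score', 'Positive_Review']

# Plain reindex with a misspelled name: no error, just an empty column.
sloppy = df.reindex(["Hotel_Name", "Reviewer_Scor"], axis=1)
print(sloppy)
```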
