@ -380,108 +380,7 @@ Treat the following questions as coding tasks and attempt to answer them without
You may have noticed that there are 127 rows that have both "No Negative" and "No Positive" values for the columns `Negative_Review` and `Positive_Review` respectively. That means that the reviewer gave the hotel a numerical score, but declined to write either a positive or negative review. Luckily this is a small number of rows (127 out of 515738, or about 0.02%), so it probably won't skew our model or results in any particular direction. Still, you might not have expected a dataset of reviews to contain rows with no review text, so it's worth exploring the data to discover rows like this.
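As a minimal sketch of how such rows could be counted, the snippet below builds a tiny stand-in dataframe (the four rows are made up for illustration and are not taken from the real dataset) and combines two boolean masks. It assumes the dataframe is named `df` and uses the column names mentioned above; the variable names `both_empty`, `empty_count`, and `empty_share` are hypothetical.

```python
import pandas as pd

# Toy stand-in for the hotel reviews dataframe (made-up rows, not real data)
df = pd.DataFrame({
    "Negative_Review": ["No Negative", "Too noisy", "No Negative", "Small room"],
    "Positive_Review": ["No Positive", "Great staff", "Lovely view", "No Positive"],
    "Reviewer_Score": [7.5, 8.0, 9.2, 4.6],
})

# Boolean mask: True where the reviewer left both text fields empty
both_empty = (df["Negative_Review"] == "No Negative") & (df["Positive_Review"] == "No Positive")

# Summing a boolean mask counts the True values
empty_count = int(both_empty.sum())
empty_share = empty_count / len(df) * 100

print(f"{empty_count} of {len(df)} rows have no review text ({empty_share:.2f}%)")

# The same arithmetic applied to the figures quoted above
print(f"{127 / 515738 * 100:.2f}%")
```

Run against the full dataset instead of the toy frame, the same mask should reproduce the 127 rows described above, assuming the placeholder strings match exactly.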
### Modifying the dataframe
Now that you've explored the dataset, you can see some issues with it. Some columns are filled with useless information, and others are simply incorrect. Even where the values may be correct, it's unclear how they were calculated, so you cannot verify them independently with your own calculations.
Next, you will add columns that will be useful later, change the values in other columns, and drop certain columns completely.
2. Replace `Hotel_Address` values with the following values (if the address contains the name of the city and the country, change it to just the city and the country).
These are the only cities and countries in the dataset:
Amsterdam, Netherlands
Barcelona, Spain
London, United Kingdom
Milan, Italy
Paris, France
Vienna, Austria
```python
def replace_address(row):
    if "Netherlands" in row["Hotel_Address"]:
        return "Amsterdam, Netherlands"
    elif "Barcelona" in row["Hotel_Address"]:
        return "Barcelona, Spain"
    elif "United Kingdom" in row["Hotel_Address"]:
        return "London, United Kingdom"
    elif "Milan" in row["Hotel_Address"]:
        return "Milan, Italy"
    elif "France" in row["Hotel_Address"]:
        return "Paris, France"
    elif "Vienna" in row["Hotel_Address"]:
        return "Vienna, Austria"

# Replace all the addresses with a shortened, more useful form