diff --git a/机器学习竞赛实战_优胜解决方案/基于相似度的酒店推荐系统/.ipynb_checkpoints/酒店推荐-checkpoint.ipynb b/机器学习竞赛实战_优胜解决方案/基于相似度的酒店推荐系统/.ipynb_checkpoints/酒店推荐-checkpoint.ipynb index 4c5ff63..14a53b2 100644 --- a/机器学习竞赛实战_优胜解决方案/基于相似度的酒店推荐系统/.ipynb_checkpoints/酒店推荐-checkpoint.ipynb +++ b/机器学习竞赛实战_优胜解决方案/基于相似度的酒店推荐系统/.ipynb_checkpoints/酒店推荐-checkpoint.ipynb @@ -12,7 +12,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 71, "metadata": {}, "outputs": [ { @@ -57,7 +57,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 2, "metadata": {}, "outputs": [ { @@ -144,7 +144,7 @@ "4 Situated amid incredible shopping and iconic a... " ] }, - "execution_count": 4, + "execution_count": 2, "metadata": {}, "output_type": "execute_result" } @@ -163,7 +163,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 3, "metadata": {}, "outputs": [ { @@ -172,7 +172,7 @@ "(152, 3)" ] }, - "execution_count": 5, + "execution_count": 3, "metadata": {}, "output_type": "execute_result" } @@ -183,7 +183,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 4, "metadata": {}, "outputs": [ { @@ -192,7 +192,7 @@ "\"Located on the southern tip of Lake Union, the Hilton Garden Inn Seattle Downtown hotel is perfectly located for business and leisure. \\nThe neighborhood is home to numerous major international companies including Amazon, Google and the Bill & Melinda Gates Foundation. A wealth of eclectic restaurants and bars make this area of Seattle one of the most sought out by locals and visitors. Our proximity to Lake Union allows visitors to take in some of the Pacific Northwest's majestic scenery and enjoy outdoor activities like kayaking and sailing. over 2,000 sq. ft. of versatile space and a complimentary business center. State-of-the-art A/V technology and our helpful staff will guarantee your conference, cocktail reception or wedding is a success. 
Refresh in the sparkling saltwater pool, or energize with the latest equipment in the 24-hour fitness center. Tastefully decorated and flooded with natural light, our guest rooms and suites offer everything you need to relax and stay productive. Unwind in the bar, and enjoy American cuisine for breakfast, lunch and dinner in our restaurant. The 24-hour Pavilion Pantry? stocks a variety of snacks, drinks and sundries.\"" ] }, - "execution_count": 6, + "execution_count": 4, "metadata": {}, "output_type": "execute_result" } @@ -211,7 +211,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 5, "metadata": {}, "outputs": [], "source": [ @@ -221,7 +221,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 6, "metadata": {}, "outputs": [ { @@ -236,7 +236,7 @@ " [0, 0, 0, ..., 1, 0, 0]], dtype=int64)" ] }, - "execution_count": 8, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } @@ -247,7 +247,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 7, "metadata": {}, "outputs": [ { @@ -256,7 +256,7 @@ "(152, 3200)" ] }, - "execution_count": 9, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } @@ -267,7 +267,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 8, "metadata": {}, "outputs": [ { @@ -276,7 +276,7 @@ "matrix([[ 1, 11, 11, ..., 2, 6, 2]], dtype=int64)" ] }, - "execution_count": 10, + "execution_count": 8, "metadata": {}, "output_type": "execute_result" } @@ -288,7 +288,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 9, "metadata": {}, "outputs": [ { @@ -1297,7 +1297,7 @@ " ...]" ] }, - "execution_count": 13, + "execution_count": 9, "metadata": {}, "output_type": "execute_result" } @@ -1309,7 +1309,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 10, "metadata": {}, "outputs": [ { @@ -2318,7 +2318,7 @@ " ...]" ] }, - "execution_count": 14, + "execution_count": 10, 
"metadata": {}, "output_type": "execute_result" } @@ -2337,7 +2337,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 11, "metadata": {}, "outputs": [], "source": [ @@ -2353,7 +2353,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 12, "metadata": {}, "outputs": [ { @@ -2381,7 +2381,7 @@ " ('on', 129)]" ] }, - "execution_count": 16, + "execution_count": 12, "metadata": {}, "output_type": "execute_result" } @@ -2393,7 +2393,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 13, "metadata": {}, "outputs": [ { @@ -2460,7 +2460,7 @@ "4 to 471" ] }, - "execution_count": 17, + "execution_count": 13, "metadata": {}, "output_type": "execute_result" } @@ -2472,7 +2472,7 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 14, "metadata": { "scrolled": false }, @@ -2483,7 +2483,7 @@ "Text(0.5, 1.0, 'top 20')" ] }, - "execution_count": 28, + "execution_count": 14, "metadata": {}, "output_type": "execute_result" }, @@ -2516,7 +2516,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 15, "metadata": {}, "outputs": [], "source": [ @@ -2532,7 +2532,7 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 16, "metadata": {}, "outputs": [ { @@ -2599,7 +2599,7 @@ "4 free 123" ] }, - "execution_count": 30, + "execution_count": 16, "metadata": {}, "output_type": "execute_result" } @@ -2612,7 +2612,7 @@ }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 17, "metadata": {}, "outputs": [ { @@ -2621,7 +2621,7 @@ "Text(0.5, 1.0, 'top 20')" ] }, - "execution_count": 31, + "execution_count": 17, "metadata": {}, "output_type": "execute_result" }, @@ -2654,7 +2654,7 @@ }, { "cell_type": "code", - "execution_count": 32, + "execution_count": 18, "metadata": {}, "outputs": [], "source": [ @@ -2670,7 +2670,7 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 19, "metadata": {}, "outputs": [ { @@ -2679,7 
+2679,7 @@ "Text(0.5, 1.0, 'top 20')" ] }, - "execution_count": 33, + "execution_count": 19, "metadata": {}, "output_type": "execute_result" }, @@ -2712,6 +2712,730 @@ "Now the words are joined into phrases: the top entry, Pike Place, is a well-known marketplace in Seattle, and key terms such as wifi also appear." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Text cleaning\n", + "Some statistics on the descriptions" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "df['word_count'] = df['desc'].apply(lambda x:len(str(x).split())) # number of words in each description" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr><th></th><th>name</th><th>address</th><th>desc</th><th>word_count</th></tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr><th>0</th><td>Hilton Garden Seattle Downtown</td><td>1821 Boren Avenue, Seattle Washington 98101 USA</td><td>Located on the southern tip of Lake Union, the...</td><td>184</td></tr>\n",
+ "    <tr><th>1</th><td>Sheraton Grand Seattle</td><td>1400 6th Avenue, Seattle, Washington 98101 USA</td><td>Located in the city's vibrant core, the Sherat...</td><td>152</td></tr>\n",
+ "    <tr><th>2</th><td>Crowne Plaza Seattle Downtown</td><td>1113 6th Ave, Seattle, WA 98101</td><td>Located in the heart of downtown Seattle, the ...</td><td>147</td></tr>\n",
+ "    <tr><th>3</th><td>Kimpton Hotel Monaco Seattle</td><td>1101 4th Ave, Seattle, WA98101</td><td>What?s near our hotel downtown Seattle locatio...</td><td>150</td></tr>\n",
+ "    <tr><th>4</th><td>The Westin Seattle</td><td>1900 5th Avenue, Seattle, Washington 98101 USA</td><td>Situated amid incredible shopping and iconic a...</td><td>151</td></tr>\n",
+ "  </tbody>\n", + "</table>
\n", + "
" + ], + "text/plain": [ + " name \\\n", + "0 Hilton Garden Seattle Downtown \n", + "1 Sheraton Grand Seattle \n", + "2 Crowne Plaza Seattle Downtown \n", + "3 Kimpton Hotel Monaco Seattle \n", + "4 The Westin Seattle \n", + "\n", + " address \\\n", + "0 1821 Boren Avenue, Seattle Washington 98101 USA \n", + "1 1400 6th Avenue, Seattle, Washington 98101 USA \n", + "2 1113 6th Ave, Seattle, WA 98101 \n", + "3 1101 4th Ave, Seattle, WA98101 \n", + "4 1900 5th Avenue, Seattle, Washington 98101 USA \n", + "\n", + " desc word_count \n", + "0 Located on the southern tip of Lake Union, the... 184 \n", + "1 Located in the city's vibrant core, the Sherat... 152 \n", + "2 Located in the heart of downtown Seattle, the ... 147 \n", + "3 What?s near our hotel downtown Seattle locatio... 150 \n", + "4 Situated amid incredible shopping and iconic a... 151 " + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXAAAAD4CAYAAAD1jb0+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAANR0lEQVR4nO3db4xld13H8ffHLv8K1QKdktoyThubBkKgJRNorVEsYBbaUB/0AQ1g1Zp5IloMCW5DIvFZjQbQaNCNrZDYFGOB0LRRuikQYoLF3bLAlm1pwRXWVrak/DFoLNWvD+ZsMw67M7P3npnZ79z3K7m59/zub+75/s7c+cyZc8/5TaoKSVI/P7HdBUiSJmOAS1JTBrgkNWWAS1JTBrgkNbVrK1d2zjnn1MLCwlauUpLaO3DgwHeqam51+5YG+MLCAvv379/KVUpSe0n+9UTtHkKRpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKa29EpM9bCw556TPnfklqu3sBJJa3EPXJKaMsAlqSkDXJKaMsAlqSkDXJKaMsAlqSkDXJKaMsAlqSkDXJKaMsAlqSkDXJKaWjfAk9yW5FiSQyva/ijJQ0m+nOQTSc7e3DIlSattZA/8w8DuVW37gFdU1SuBrwE3j1yXJGkd6wZ4VX0OeHJV271V9fSw+E/ABZtQmyRpDWMcA/8N4O9HeB1J0imYKsCTvBd4Grh9jT5LSfYn2f/EE09MszpJ0goTB3iSG4BrgLdVVZ2sX1XtrarFqlqcm5ubdHWSpFUm+o88SXYDvwf8YlX957glSZI2YiOnEd4BfB64JMnRJDcCfwacBexLcjDJX2xynZKkVdbdA6+q60/QfOsm1CJJOgVeiSlJTRngktSUAS5JTRngktSUAS5JTRngktSUAS5JTRngktSUAS5JTRngktSUAS5JTRngktSUAS5JTRngktSUAS5JTRngktSUAS5JTRngktSUAS5JTRngktSUAS5JTRngktSUAS5JTa0b4EluS3IsyaEVbS9Ksi/JI8P9Cze3TEnSahvZA/8wsHtV2x7gvqq6GLhvWJYkbaF1A7yqPgc8uar5WuAjw+OPAL8ycl2SpHVMegz8JVX1OMBwf+54JUmSNmLXZq8gyRKwBDA/P7/Zq9MJLOy554TtR265eosrkTSmSffAv53kPIDh/tjJOlbV3qparKrFubm5CVcnSVpt0gC/C7hheHwD8MlxypEkbdRGTiO8A/g8cEmSo0luBG4B3pjkEeCNw7IkaQutewy8qq4/yVOvH7kWSdIp8EpMSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpjZ9OllNbrOngT3Z60/yNZs9Na1T4ko/zj1wSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpgxwSWrKAJekpqYK8CS/m+TBJIeS3JHkuWMVJkla28QBnuR84HeAxap6BXAG8NaxCpMkrW3aQyi7gOcl2QWcCTw2fUmSpI2YeD7wqvq3JH8MfBP4L+Deqrp3db8kS8ASwPz8/KSr0w7j/N7S9KY5hPJC4FrgQuCngecnefvqflW1t6oWq2pxbm5u8kolSf/PNIdQ3gD8S1U9UVU/Aj4O/Nw4ZUmS1jNNgH8TuDzJmUkCvB44PE5ZkqT1TBzgVXU/cCfwAPCV4bX2jlSXJGkdU/1T46p6H/C+kWqRJJ0Cr8SUpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKYMcElqygCXpKYMcElqaqpL6Ts62TzUcPK5qLvMXb3W2CTtPO6BS1JTBrgkNWWAS1JTBrgkNWWAS1JTBrgkNWWAS1JTBrgkNWWAS1JTBrgkNWWAS1JTBrgkNTVVgCc5O8mdSR5KcjjJFWMVJkla27SzEf4J8A9VdV2SZwNnjlCTJGkDJg7wJD8J/AL
wawBV9RTw1DhlSZLWM80e+EXAE8BfJ3kVcAC4qap+uLJTkiVgCWB+fn6K1Z2a03Fu7C7zio/pVL8Pp+P3TTpdTXMMfBfwauBDVXUZ8ENgz+pOVbW3qharanFubm6K1UmSVpomwI8CR6vq/mH5TpYDXZK0BSYO8Kr6d+BbSS4Zml4PfHWUqiRJ65r2LJTfBm4fzkD5BvDr05ckSdqIqQK8qg4CiyPVIkk6BV6JKUlNGeCS1JQBLklNGeCS1JQBLklNGeCS1JQBLklNGeCS1JQBLklNGeCS1JQBLklNGeCS1JQBLklNGeCS1JQBLklNGeCS1JQBLklNGeCS1JQBLklNGeCS1JQBLklNGeCS1JQBLklNTR3gSc5I8sUkd49RkCRpY8bYA78JODzC60iSTsFUAZ7kAuBq4K/GKUeStFG7pvz6DwLvAc46WYckS8ASwPz8/JSr25kW9tyzqf13spNtiyO3XL2prz/mOqRJTbwHnuQa4FhVHVirX1XtrarFqlqcm5ubdHWSpFWmOYRyJfCWJEeAjwJXJfmbUaqSJK1r4gCvqpur6oKqWgDeCny6qt4+WmWSpDV5HrgkNTXth5gAVNVngc+O8VqSpI1xD1ySmjLAJakpA1ySmjLAJakpA1ySmjLAJakpA1ySmjLAJakpA1ySmjLAJampUS6l3ymcl1tSJ+6BS1JTBrgkNWWAS1JTBrgkNWWAS1JTBrgkNWWAS1JTBrgkNWWAS1JTBrgkNWWAS1JTBrgkNTVxgCd5aZLPJDmc5MEkN41ZmCRpbdPMRvg08O6qeiDJWcCBJPuq6qsj1SZJWsPEe+BV9XhVPTA8/g/gMHD+WIVJktY2ynzgSRaAy4D7T/DcErAEMD8/P8bqdBrqMjf6yeo8csvVW1yJNL2pP8RM8gLgY8C7quoHq5+vqr1VtVhVi3Nzc9OuTpI0mCrAkzyL5fC+vao+Pk5JkqSNmOYslAC3Aoer6v3jlSRJ2ohp9sCvBN4BXJXk4HB780h1SZLWMfGHmFX1j0BGrEWSdAq8ElOSmjLAJakpA1ySmjLAJakpA1ySmjLAJakpA1ySmjLAJakpA1ySmjLAJampUeYD3wpd5pvW6eFU3y9jvr9Odc7xsdbdaU7z021e9rHqWet7uRljcw9ckpoywCWpKQNckpoywCWpKQNckpoywCWpKQNckpoywCWpKQNckpoywCWpKQNckpoywCWpqakCPMnuJA8neTTJnrGKkiStb+IAT3IG8OfAm4CXA9cneflYhUmS1jbNHvhrgEer6htV9RTwUeDaccqSJK0nVTXZFybXAbur6jeH5XcAr62qd67qtwQsDYuXAA+veqlzgO9MVMTOMMvjn+Wxg+Of5fGf6th/pqrmVjdO8w8dcoK2H/ttUFV7gb0nfZFkf1UtTlFHa7M8/lkeOzj+WR7/WGOf5hDKUeClK5YvAB6brhxJ0kZNE+D/DFyc5MIkzwbeCtw1TlmSpPVMfAilqp5O8k7gU8AZwG1V9eAEL3XSwyszYpbHP8tjB8c/y+MfZewTf4gpSdpeXokpSU0Z4JLU1LYF+Cxchp/ktiTHkhxa0faiJPuSPDLcv3BoT5I/HbbHl5O8evsqH0eSlyb5TJLDSR5MctPQvuO3QZLnJvlCki8NY/+Dof3CJPcPY//b4QQAkjxnWH50eH5hO+sfS5Izknwxyd3D8syMP8mRJF9JcjDJ/qFt1Pf+tgT4DF2G/2Fg96q2PcB9VXUxcN+wDMvb4uLhtgR8aItq3ExPA++uqpcBlwO/NXyfZ2Eb/DdwVVW9CrgU2J3kcuAPgQ8MY/8ucOPQ/0bgu1X1s8AHhn47wU3A4RXLszb+X6qqS1ec8z3ue7+qtvwGXAF8asXyzcDN21HLFox1ATi0Yvlh4Lzh8XnAw8PjvwSuP1G/nXIDPgm8cda2AXAm8ADwWpavvts1tD/zc8Dy2VxXDI93Df2y3bVPOe4LhpC6Crib5Yv/Zmn8R4BzVrWN+t7frkM
o5wPfWrF8dGibBS+pqscBhvtzh/YdvU2GP4kvA+5nRrbBcPjgIHAM2Ad8HfheVT09dFk5vmfGPjz/feDFW1vx6D4IvAf432H5xczW+Au4N8mBYUoRGPm9P82l9NPY0GX4M2bHbpMkLwA+Bryrqn6QnGioy11P0NZ2G1TV/wCXJjkb+ATwshN1G+531NiTXAMcq6oDSV53vPkEXXfk+AdXVtVjSc4F9iV5aI2+E41/u/bAZ/ky/G8nOQ9guD82tO/IbZLkWSyH9+1V9fGheaa2QVV9D/gsy58DnJ3k+I7TyvE9M/bh+Z8CntzaSkd1JfCWJEdYnqn0Kpb3yGdl/FTVY8P9MZZ/gb+Gkd/72xXgs3wZ/l3ADcPjG1g+Lny8/VeHT6MvB75//E+trrK8q30rcLiq3r/iqR2/DZLMDXveJHke8AaWP8z7DHDd0G312I9vk+uAT9dwMLSjqrq5qi6oqgWWf74/XVVvY0bGn+T5Sc46/hj4ZeAQY7/3t/EA/5uBr7F8XPC92/2BwyaN8Q7gceBHLP+GvZHl43r3AY8M9y8a+oblM3O+DnwFWNzu+kcY/8+z/Gfgl4GDw+3Ns7ANgFcCXxzGfgj4/aH9IuALwKPA3wHPGdqfOyw/Ojx/0XaPYcRt8Trg7lka/zDOLw23B49n3NjvfS+ll6SmvBJTkpoywCWpKQNckpoywCWpKQNckpoywCWpKQNckpr6PxsSaXwiGw21AAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plt.hist(df['word_count'], bins=50)\n", + "plt.show() # 绝大多数是250内的,不会是太长的" + ] + }, + { + "cell_type": "code", + "execution_count": 132, + "metadata": {}, + "outputs": [], + "source": [ + "def clean_txt(text):\n", + " sub_replace = re.compile('[^0-9a-z]') # 去掉非数值及英文的\n", + " text = sub_replace.sub(' ', text)\n", + " return text" + ] + }, + { + "cell_type": "code", + "execution_count": 153, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr><th></th><th>name</th><th>address</th><th>desc</th><th>word_count</th><th>desc_clean</th></tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr><th>0</th><td>Hilton Garden Seattle Downtown</td><td>1821 Boren Avenue, Seattle Washington 98101 USA</td><td>Located on the southern tip of Lake Union, the...</td><td>184</td><td>located southern tip lake union hilton garden...</td></tr>\n",
+ "    <tr><th>1</th><td>Sheraton Grand Seattle</td><td>1400 6th Avenue, Seattle, Washington 98101 USA</td><td>Located in the city's vibrant core, the Sherat...</td><td>152</td><td>located city vibrant core sheraton grand seat...</td></tr>\n",
+ "    <tr><th>2</th><td>Crowne Plaza Seattle Downtown</td><td>1113 6th Ave, Seattle, WA 98101</td><td>Located in the heart of downtown Seattle, the ...</td><td>147</td><td>located heart downtown seattle award winning ...</td></tr>\n",
+ "    <tr><th>3</th><td>Kimpton Hotel Monaco Seattle</td><td>1101 4th Ave, Seattle, WA98101</td><td>What?s near our hotel downtown Seattle locatio...</td><td>150</td><td>near hotel downtown seattle location better ...</td></tr>\n",
+ "    <tr><th>4</th><td>The Westin Seattle</td><td>1900 5th Avenue, Seattle, Washington 98101 USA</td><td>Situated amid incredible shopping and iconic a...</td><td>151</td><td>situated amid incredible shopping iconic attra...</td></tr>\n",
+ "  </tbody>\n", + "</table>
\n", + "
" + ], + "text/plain": [ + " name \\\n", + "0 Hilton Garden Seattle Downtown \n", + "1 Sheraton Grand Seattle \n", + "2 Crowne Plaza Seattle Downtown \n", + "3 Kimpton Hotel Monaco Seattle \n", + "4 The Westin Seattle \n", + "\n", + " address \\\n", + "0 1821 Boren Avenue, Seattle Washington 98101 USA \n", + "1 1400 6th Avenue, Seattle, Washington 98101 USA \n", + "2 1113 6th Ave, Seattle, WA 98101 \n", + "3 1101 4th Ave, Seattle, WA98101 \n", + "4 1900 5th Avenue, Seattle, Washington 98101 USA \n", + "\n", + " desc word_count \\\n", + "0 Located on the southern tip of Lake Union, the... 184 \n", + "1 Located in the city's vibrant core, the Sherat... 152 \n", + "2 Located in the heart of downtown Seattle, the ... 147 \n", + "3 What?s near our hotel downtown Seattle locatio... 150 \n", + "4 Situated amid incredible shopping and iconic a... 151 \n", + "\n", + " desc_clean \n", + "0 located southern tip lake union hilton garden... \n", + "1 located city vibrant core sheraton grand seat... \n", + "2 located heart downtown seattle award winning ... \n", + "3 near hotel downtown seattle location better ... \n", + "4 situated amid incredible shopping iconic attra... " + ] + }, + "execution_count": 153, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# filter out tokens that should not be kept\n", + "from nltk.corpus import stopwords\n", + "set_stopwords = set(stopwords.words('english'))\n", + "\n", + "df['desc_clean'] = df['desc'].str.lower() # lowercase everything\n", + "df['desc_clean'] = df['desc_clean'].apply(clean_txt)\n", + "df['desc_clean'] = df['desc_clean'].str.split(' ').apply(lambda x: ' '.join(k for k in x if k not in set_stopwords)) # drop stopwords\n", + "\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 154, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\"Located on the southern tip of Lake Union, the Hilton Garden Inn Seattle Downtown hotel is perfectly located for business and leisure. 
\nThe neighborhood is home to numerous major international companies including Amazon, Google and the Bill & Melinda Gates Foundation. A wealth of eclectic restaurants and bars make this area of Seattle one of the most sought out by locals and visitors. Our proximity to Lake Union allows visitors to take in some of the Pacific Northwest's majestic scenery and enjoy outdoor activities like kayaking and sailing. over 2,000 sq. ft. of versatile space and a complimentary business center. State-of-the-art A/V technology and our helpful staff will guarantee your conference, cocktail reception or wedding is a success. Refresh in the sparkling saltwater pool, or energize with the latest equipment in the 24-hour fitness center. Tastefully decorated and flooded with natural light, our guest rooms and suites offer everything you need to relax and stay productive. Unwind in the bar, and enjoy American cuisine for breakfast, lunch and dinner in our restaurant. The 24-hour Pavilion Pantry? stocks a variety of snacks, drinks and sundries.\"" + ] + }, + "execution_count": 154, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['desc'][0] # compare the raw text with the cleaned text" + ] + }, + { + "cell_type": "code", + "execution_count": 155, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'located southern tip lake union hilton garden inn seattle downtown hotel perfectly located business leisure neighborhood home numerous major international companies including amazon google bill melinda gates foundation wealth eclectic restaurants bars make area seattle one sought locals visitors proximity lake union allows visitors take pacific northwest majestic scenery enjoy outdoor activities like kayaking sailing 2 000 sq ft versatile space complimentary business center state art v technology helpful staff guarantee conference cocktail reception wedding success refresh sparkling saltwater pool energize latest equipment 24 hour fitness center tastefully decorated flooded 
natural light guest rooms suites offer everything need relax stay productive unwind bar enjoy american cuisine breakfast lunch dinner restaurant 24 hour pavilion pantry stocks variety snacks drinks sundries '" + ] + }, + "execution_count": 155, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df['desc_clean'][0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Similarity computation" + ] + }, + { + "cell_type": "code", + "execution_count": 156, + "metadata": {}, + "outputs": [], + "source": [ + "df.set_index('name', inplace=True) # use name as the index" + ] + }, + { + "cell_type": "code", + "execution_count": 158, + "metadata": {}, + "outputs": [], + "source": [ + "# compute the TF-IDF weight of each term\n", + "tf = TfidfVectorizer(analyzer='word', ngram_range=(1,3), stop_words='english')\n", + "tfidf_matrix = tf.fit_transform(df['desc_clean']) # fit on and transform the cleaned descriptions" + ] + }, + { + "cell_type": "code", + "execution_count": 161, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(152, 26631)" + ] + }, + "execution_count": 161, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tfidf_matrix.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 165, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(152, 152)" + ] + }, + "execution_count": 165, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "consine_similarity = linear_kernel(tfidf_matrix, tfidf_matrix) # pairwise similarity; TF-IDF rows are L2-normalized by default, so the dot product is the cosine\n", + "consine_similarity.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 166, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([1. 
, 0.01406466, 0.03391973, 0.00993816, 0.03246863,\n", + " 0.01501356, 0.02084233, 0.01581231, 0.00776991, 0.01999756,\n", + " 0.0182464 , 0.01231142, 0.01684817, 0.0119307 , 0.01085672,\n", + " 0.01791009, 0.0111671 , 0.04070581, 0.00971403, 0.02608081,\n", + " 0.03035044, 0.00885341, 0.01056546, 0.02009413, 0.01868132,\n", + " 0.02816165, 0.0321467 , 0.00681797, 0.02538754, 0.01969646,\n", + " 0.01638717, 0.04434173, 0.0167791 , 0.02169556, 0.03728075,\n", + " 0.03902235, 0.0069193 , 0.01352541, 0.04098731, 0.03227337,\n", + " 0.0172481 , 0.01166389, 0.01520804, 0.03544255, 0.04699436,\n", + " 0.01310661, 0.03274589, 0.0161937 , 0.03786155, 0.01421505,\n", + " 0.0266454 , 0.01830098, 0.03764235, 0.01329187, 0.02744756,\n", + " 0.01454037, 0.02460386, 0.03082779, 0.01229374, 0.02683908,\n", + " 0.03151467, 0.01008901, 0.04523004, 0.0312478 , 0.0323932 ,\n", + " 0.01846074, 0.03120115, 0.01118123, 0.02208553, 0.01201834,\n", + " 0.02355357, 0.01679123, 0.02597236, 0.02219805, 0.02335901,\n", + " 0.04484254, 0.00131829, 0.02258004, 0.01596417, 0.02875198,\n", + " 0.00728455, 0.01550146, 0.00586358, 0.00886017, 0.01505134,\n", + " 0.04805398, 0.01154452, 0.00439089, 0.00890586, 0.01341109,\n", + " 0.00761107, 0.00443603, 0.0146058 , 0.00493675, 0.01795282,\n", + " 0.01702045, 0.01116872, 0.02318485, 0.01508132, 0.02823554,\n", + " 0.01212307, 0.00548954, 0.00335406, 0.02440467, 0.00912747,\n", + " 0.02412254, 0.04179826, 0.02109056, 0.01228275, 0.03570519,\n", + " 0.05331295, 0.00886831, 0.0258668 , 0.01566466, 0.0267365 ,\n", + " 0.07529637, 0.01660016, 0.0371029 , 0.0114389 , 0.01876546,\n", + " 0.00671789, 0.01194306, 0.01871489, 0.00346884, 0.00876216,\n", + " 0.00946862, 0.04517183, 0.07370297, 0.00884079, 0.01411685,\n", + " 0.01406232, 0.0124469 , 0.02123197, 0.01859324, 0.02939583,\n", + " 0.00481356, 0.0358775 , 0.01307147, 0.0136874 , 0.01567845,\n", + " 0.01888209, 0.02270796, 0.02684905, 0.01715449, 0.00317041,\n", + " 0.00237712, 0.0237994 , 0.00739057, 
0.00643772, 0.01595671,\n", + " 0.00239758, 0.00730286])" + ] + }, + "execution_count": 166, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "consine_similarity[0] # similarity of hotel 0 to every hotel" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Recommendation results" + ] + }, + { + "cell_type": "code", + "execution_count": 167, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 Hilton Garden Seattle Downtown\n", + "1 Sheraton Grand Seattle\n", + "2 Crowne Plaza Seattle Downtown\n", + "3 Kimpton Hotel Monaco Seattle \n", + "4 The Westin Seattle\n", + "Name: name, dtype: object" + ] + }, + "execution_count": 167, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "indices = pd.Series(df.index) # hotel names as a Series, used to look up row positions\n", + "indices[:5]" + ] + }, + { + "cell_type": "code", + "execution_count": 172, + "metadata": {}, + "outputs": [], + "source": [ + "def recommendations(name, consine_similarity):\n", + " \"\"\"\n", + " Recommend hotels\n", + " name: the hotel the user is viewing\n", + " consine_similarity: pairwise hotel similarity matrix\n", + " \"\"\"\n", + " recommended_hotels = [] # recommendation list\n", + " idx = indices[indices == name].index[0] # row position of the hotel being viewed\n", + " score_series = pd.Series(consine_similarity[idx]).sort_values(ascending=False) # that hotel's similarity scores, in descending order\n", + " top_10_indexes = list(score_series[1:11].index) # top 10 positions; the first entry is the hotel itself, so skip it\n", + " for i in top_10_indexes:\n", + " recommended_hotels.append(list(df.index)[i])\n", + " \n", + " return recommended_hotels" + ] + }, + { + "cell_type": "code", + "execution_count": 169, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr><th></th><th>address</th><th>desc</th><th>word_count</th><th>desc_clean</th></tr>\n", + "    <tr><th>name</th><th></th><th></th><th></th><th></th></tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr><th>Hilton Garden Seattle Downtown</th><td>1821 Boren Avenue, Seattle Washington 98101 USA</td><td>Located on the southern tip of Lake Union, the...</td><td>184</td><td>located southern tip lake union hilton garden...</td></tr>\n",
+ "    <tr><th>Sheraton Grand Seattle</th><td>1400 6th Avenue, Seattle, Washington 98101 USA</td><td>Located in the city's vibrant core, the Sherat...</td><td>152</td><td>located city vibrant core sheraton grand seat...</td></tr>\n",
+ "    <tr><th>Crowne Plaza Seattle Downtown</th><td>1113 6th Ave, Seattle, WA 98101</td><td>Located in the heart of downtown Seattle, the ...</td><td>147</td><td>located heart downtown seattle award winning ...</td></tr>\n",
+ "    <tr><th>Kimpton Hotel Monaco Seattle</th><td>1101 4th Ave, Seattle, WA98101</td><td>What?s near our hotel downtown Seattle locatio...</td><td>150</td><td>near hotel downtown seattle location better ...</td></tr>\n",
+ "    <tr><th>The Westin Seattle</th><td>1900 5th Avenue, Seattle, Washington 98101 USA</td><td>Situated amid incredible shopping and iconic a...</td><td>151</td><td>situated amid incredible shopping iconic attra...</td></tr>\n",
+ "  </tbody>\n", + "</table>
\n", + "
" + ], + "text/plain": [ + " address \\\n", + "name \n", + "Hilton Garden Seattle Downtown 1821 Boren Avenue, Seattle Washington 98101 USA \n", + "Sheraton Grand Seattle 1400 6th Avenue, Seattle, Washington 98101 USA \n", + "Crowne Plaza Seattle Downtown 1113 6th Ave, Seattle, WA 98101 \n", + "Kimpton Hotel Monaco Seattle 1101 4th Ave, Seattle, WA98101 \n", + "The Westin Seattle 1900 5th Avenue, Seattle, Washington 98101 USA \n", + "\n", + " desc \\\n", + "name \n", + "Hilton Garden Seattle Downtown Located on the southern tip of Lake Union, the... \n", + "Sheraton Grand Seattle Located in the city's vibrant core, the Sherat... \n", + "Crowne Plaza Seattle Downtown Located in the heart of downtown Seattle, the ... \n", + "Kimpton Hotel Monaco Seattle What?s near our hotel downtown Seattle locatio... \n", + "The Westin Seattle Situated amid incredible shopping and iconic a... \n", + "\n", + " word_count \\\n", + "name \n", + "Hilton Garden Seattle Downtown 184 \n", + "Sheraton Grand Seattle 152 \n", + "Crowne Plaza Seattle Downtown 147 \n", + "Kimpton Hotel Monaco Seattle 150 \n", + "The Westin Seattle 151 \n", + "\n", + " desc_clean \n", + "name \n", + "Hilton Garden Seattle Downtown located southern tip lake union hilton garden... \n", + "Sheraton Grand Seattle located city vibrant core sheraton grand seat... \n", + "Crowne Plaza Seattle Downtown located heart downtown seattle award winning ... \n", + "Kimpton Hotel Monaco Seattle near hotel downtown seattle location better ... \n", + "The Westin Seattle situated amid incredible shopping iconic attra... 
" + ] + }, + "execution_count": 169, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head() # 看下大概信息,拿一个出来试试结果" + ] + }, + { + "cell_type": "code", + "execution_count": 173, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Silver Cloud Inn - Seattle Lake Union',\n", + " 'Staybridge Suites Seattle Downtown - Lake Union',\n", + " 'Residence Inn by Marriott Seattle Downtown/Lake Union',\n", + " 'The Loyal Inn',\n", + " 'The Arctic Club Seattle - a DoubleTree by Hilton Hotel',\n", + " 'Embassy Suites by Hilton Seattle Tacoma International Airport',\n", + " 'The Charter Hotel Seattle, Curio Collection by Hilton',\n", + " 'MarQueen Hotel',\n", + " 'Residence Inn by Marriott Seattle Downtown/Convention Center',\n", + " 'EVEN Hotel Seattle - South Lake Union']" + ] + }, + "execution_count": 173, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "recommendations('Hilton Garden Seattle Downtown', consine_similarity)" + ] + }, + { + "cell_type": "code", + "execution_count": 202, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\"Located on the southern tip of Lake Union, the Hilton Garden Inn Seattle Downtown hotel is perfectly located for business and leisure. \\nThe neighborhood is home to numerous major international companies including Amazon, Google and the Bill & Melinda Gates Foundation. A wealth of eclectic restaurants and bars make this area of Seattle one of the most sought out by locals and visitors. Our proximity to Lake Union allows visitors to take in some of the Pacific Northwest's majestic scenery and enjoy outdoor activities like kayaking and sailing. over 2,000 sq. ft. of versatile space and a complimentary business center. State-of-the-art A/V technology and our helpful staff will guarantee your conference, cocktail reception or wedding is a success. 
Refresh in the sparkling saltwater pool, or energize with the latest equipment in the 24-hour fitness center. Tastefully decorated and flooded with natural light, our guest rooms and suites offer everything you need to relax and stay productive. Unwind in the bar, and enjoy American cuisine for breakfast, lunch and dinner in our restaurant. The 24-hour Pavilion Pantry? stocks a variety of snacks, drinks and sundries.\"" + ] + }, + "execution_count": 202, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# look at the descriptions of the two closest matches\n", + "df.loc[['Hilton Garden Seattle Downtown'],[\"desc\"]]['desc'][0] # the hotel being viewed" + ] + }, + { + "cell_type": "code", + "execution_count": 203, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'Located on the south end of scenic Lake Union, the Silver Cloud Hotel Seattle \\x96 Lake Union is two miles from downtown Seattle\\x92s finest shops and restaurants and just five miles from the University of Washington campus. The hotel is also located near Seattle Center and the Fred Hutchinson Cancer Research Center, and is only a 20 minute walk to the\\xa0iconic Space Needle. Most of the guestrooms have spectacular waterfront views of Lake Union and Seattle. All rooms have 55? high-definition flat-screen TVs complimentary high-speed wired and wireless Internet access, microwaves, refrigerators and laptop safes. A complimentary Silver Cloud breakfast is offered daily in addition to use of our expansive fitness center. The Silver Cloud Hotel \\x96 Lake Union features complimentary parking and a local area shuttle service. Located along the south shore of Lake Union, the Silver Cloud Inn is nestled in one of the fastest growing and most vibrant neighborhoods in the Seattle area. 
Offering a blend of the Pacific Northwest outdoors and modern urban living, Lake Union\\x92s distinctive neighborhood is home to several globally-recognized companies including Amazon, Fred Hutchinson Cancer Research Center and University of Washington Medicine. Whether you are traveling by streetcar or seaplane, visit Lake Union for a uniquely Seattle experience. Explore all the nearby attractions below.'" + ] + }, + "execution_count": 203, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.loc[['Silver Cloud Inn - Seattle Lake Union'],[\"desc\"]]['desc'][0] # recommendation 1" + ] + }, + { + "cell_type": "code", + "execution_count": 204, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\"The Staybridge Suites Seattle- South Lake Union, opening in 2018, offers guests a home away from home in the heart of the Emerald City of Seattle. Our upscale residential style hotel is conveniently located in the South Lake Union neighborhood, steps away from beautiful Lake Union and a quick jaunt to the Downtown corridor. Our all-suite hotel features full kitchens and your choice of studio or one bedroom suites. We can also accommodate your private business meetings or special events. Be sure to take advantage of our complimentary Monday through Wednesday evening social reception and free breakfast buffet every morning. Experience the many attractions that the South Lake Union area has to offer including the Space Needle, MOPOP, Pacific Science Center and The Museum of History & Industry. With our proximity to Lake Union, you'll have easy access to fun excursions such as boat rentals or taking in the Pacific Northwest by sea plane! South Lake Union is home to many businesses including Amazon, Facebook, Fred Hutchinson Cancer Research Center, Seattle Cancer Care Alliance and many more within walking distance of the hotel. All of our comfortable suites are newly renovated and beautifully appointed with their furnishings. 
Each suite includes a full kitchen with all the supporting utensils to prepare food, complimentary high speed internet, flat screen televisions with a variety of programming options and daily housekeeping.\"" + ] + }, + "execution_count": 204, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.loc[['Staybridge Suites Seattle Downtown - Lake Union'],[\"desc\"]]['desc'][0] # recommendation 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All three descriptions share key phrases such as `Lake Union` and `Pacific Northwest`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Summary:\n", + "1. Start by analyzing the most frequent words and cleaning them up.\n", + "2. The techniques used here include stopwords, which drop uninformative words, and a pattern built with re.compile, which strips characters that carry no information.\n", + "3. Use TfidfVectorizer to compute the weight of each word in every description, and linear_kernel to compute the similarities.\n", + "4. Finally, recommend hotels based on the description of the hotel the user is viewing." + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/机器学习竞赛实战_优胜解决方案/基于相似度的酒店推荐系统/酒店推荐.ipynb b/机器学习竞赛实战_优胜解决方案/基于相似度的酒店推荐系统/酒店推荐.ipynb index 53950ff..14a53b2 100644 --- a/机器学习竞赛实战_优胜解决方案/基于相似度的酒店推荐系统/酒店推荐.ipynb +++ b/机器学习竞赛实战_优胜解决方案/基于相似度的酒店推荐系统/酒店推荐.ipynb @@ -3151,6 +3151,291 @@ "consine_similarity[0] # similarity between item 0 and every row of the matrix" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Producing the recommendations" + ] + }, + { + "cell_type": "code", + "execution_count": 167, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 Hilton Garden Seattle Downtown\n", + "1 Sheraton Grand Seattle\n", + "2 Crowne Plaza Seattle Downtown\n", + "3 Kimpton Hotel Monaco Seattle \n", + "4 The Westin Seattle\n", + "Name: name, dtype: object" + ] + }, + "execution_count": 167, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "indices = pd.Series(df.index) # hotel names as a Series, used to look up row positions\n", + "indices[:5]" + ] + }, + { + "cell_type": "code", + "execution_count": 172, + "metadata": {}, + "outputs": [], + "source": [ + "def recommendations(name, consine_similarity):\n", + " \"\"\"\n", + " Recommend hotels.\n", + " name: the hotel the user is viewing\n", + " 
consine_similarity: pairwise hotel similarity matrix\n", + " \"\"\"\n", + " recommended_hotels = [] # list of recommendations\n", + " idx = indices[indices == name].index[0] # row position of the hotel the user is viewing\n", + " score_series = pd.Series(consine_similarity[idx]).sort_values(ascending=False) # that hotel's similarity scores, in descending order\n", + " top_10_indexes = list(score_series.iloc[1:11].index) # top 10 positions; the first entry is the hotel itself, so skip it\n", + " for i in top_10_indexes:\n", + "     recommended_hotels.append(df.index[i])\n", + " \n", + " return recommended_hotels" + ] + }, + { + "cell_type": "code", + "execution_count": 169, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
[HTML table for df.head() omitted; the text/plain output shows the same five rows with columns address, desc, word_count, desc_clean]" + ], + "text/plain": [ + " address \\\n", + "name \n", + "Hilton Garden Seattle Downtown 1821 Boren Avenue, Seattle Washington 98101 USA \n", + "Sheraton Grand Seattle 1400 6th Avenue, Seattle, Washington 98101 USA \n", + "Crowne Plaza Seattle Downtown 1113 6th Ave, Seattle, WA 98101 \n", + "Kimpton Hotel Monaco Seattle 1101 4th Ave, Seattle, WA98101 \n", + "The Westin Seattle 1900 5th Avenue, Seattle, Washington 98101 USA \n", + "\n", + " desc \\\n", + "name \n", + "Hilton Garden Seattle Downtown Located on the southern tip of Lake Union, the... \n", + "Sheraton Grand Seattle Located in the city's vibrant core, the Sherat... \n", + "Crowne Plaza Seattle Downtown Located in the heart of downtown Seattle, the ... \n", + "Kimpton Hotel Monaco Seattle What?s near our hotel downtown Seattle locatio... \n", + "The Westin Seattle Situated amid incredible shopping and iconic a... \n", + "\n", + " word_count \\\n", + "name \n", + "Hilton Garden Seattle Downtown 184 \n", + "Sheraton Grand Seattle 152 \n", + "Crowne Plaza Seattle Downtown 147 \n", + "Kimpton Hotel Monaco Seattle 150 \n", + "The Westin Seattle 151 \n", + "\n", + " desc_clean \n", + "name \n", + "Hilton Garden Seattle Downtown located southern tip lake union hilton garden... \n", + "Sheraton Grand Seattle located city vibrant core sheraton grand seat... \n", + "Crowne Plaza Seattle Downtown located heart downtown seattle award winning ... \n", + "Kimpton Hotel Monaco Seattle near hotel downtown seattle location better ... \n", + "The Westin Seattle situated amid incredible shopping iconic attra... 
" + ] + }, + "execution_count": 169, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head() # glance at the data and pick one hotel to try" + ] + }, + { + "cell_type": "code", + "execution_count": 173, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Silver Cloud Inn - Seattle Lake Union',\n", + " 'Staybridge Suites Seattle Downtown - Lake Union',\n", + " 'Residence Inn by Marriott Seattle Downtown/Lake Union',\n", + " 'The Loyal Inn',\n", + " 'The Arctic Club Seattle - a DoubleTree by Hilton Hotel',\n", + " 'Embassy Suites by Hilton Seattle Tacoma International Airport',\n", + " 'The Charter Hotel Seattle, Curio Collection by Hilton',\n", + " 'MarQueen Hotel',\n", + " 'Residence Inn by Marriott Seattle Downtown/Convention Center',\n", + " 'EVEN Hotel Seattle - South Lake Union']" + ] + }, + "execution_count": 173, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "recommendations('Hilton Garden Seattle Downtown', consine_similarity)" + ] + }, + { + "cell_type": "code", + "execution_count": 202, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\"Located on the southern tip of Lake Union, the Hilton Garden Inn Seattle Downtown hotel is perfectly located for business and leisure. \\nThe neighborhood is home to numerous major international companies including Amazon, Google and the Bill & Melinda Gates Foundation. A wealth of eclectic restaurants and bars make this area of Seattle one of the most sought out by locals and visitors. Our proximity to Lake Union allows visitors to take in some of the Pacific Northwest's majestic scenery and enjoy outdoor activities like kayaking and sailing. over 2,000 sq. ft. of versatile space and a complimentary business center. State-of-the-art A/V technology and our helpful staff will guarantee your conference, cocktail reception or wedding is a success. 
Refresh in the sparkling saltwater pool, or energize with the latest equipment in the 24-hour fitness center. Tastefully decorated and flooded with natural light, our guest rooms and suites offer everything you need to relax and stay productive. Unwind in the bar, and enjoy American cuisine for breakfast, lunch and dinner in our restaurant. The 24-hour Pavilion Pantry? stocks a variety of snacks, drinks and sundries.\"" + ] + }, + "execution_count": 202, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Look at the descriptions of the two closest matches\n", + "df.loc[['Hilton Garden Seattle Downtown'],[\"desc\"]]['desc'][0] # the hotel being viewed" + ] + }, + { + "cell_type": "code", + "execution_count": 203, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'Located on the south end of scenic Lake Union, the Silver Cloud Hotel Seattle \\x96 Lake Union is two miles from downtown Seattle\\x92s finest shops and restaurants and just five miles from the University of Washington campus. The hotel is also located near Seattle Center and the Fred Hutchinson Cancer Research Center, and is only a 20 minute walk to the\\xa0iconic Space Needle. Most of the guestrooms have spectacular waterfront views of Lake Union and Seattle. All rooms have 55? high-definition flat-screen TVs complimentary high-speed wired and wireless Internet access, microwaves, refrigerators and laptop safes. A complimentary Silver Cloud breakfast is offered daily in addition to use of our expansive fitness center. The Silver Cloud Hotel \\x96 Lake Union features complimentary parking and a local area shuttle service. Located along the south shore of Lake Union, the Silver Cloud Inn is nestled in one of the fastest growing and most vibrant neighborhoods in the Seattle area. 
Offering a blend of the Pacific Northwest outdoors and modern urban living, Lake Union\\x92s distinctive neighborhood is home to several globally-recognized companies including Amazon, Fred Hutchinson Cancer Research Center and University of Washington Medicine. Whether you are traveling by streetcar or seaplane, visit Lake Union for a uniquely Seattle experience. Explore all the nearby attractions below.'" + ] + }, + "execution_count": 203, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.loc[['Silver Cloud Inn - Seattle Lake Union'],[\"desc\"]]['desc'][0] # recommendation 1" + ] + }, + { + "cell_type": "code", + "execution_count": 204, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\"The Staybridge Suites Seattle- South Lake Union, opening in 2018, offers guests a home away from home in the heart of the Emerald City of Seattle. Our upscale residential style hotel is conveniently located in the South Lake Union neighborhood, steps away from beautiful Lake Union and a quick jaunt to the Downtown corridor. Our all-suite hotel features full kitchens and your choice of studio or one bedroom suites. We can also accommodate your private business meetings or special events. Be sure to take advantage of our complimentary Monday through Wednesday evening social reception and free breakfast buffet every morning. Experience the many attractions that the South Lake Union area has to offer including the Space Needle, MOPOP, Pacific Science Center and The Museum of History & Industry. With our proximity to Lake Union, you'll have easy access to fun excursions such as boat rentals or taking in the Pacific Northwest by sea plane! South Lake Union is home to many businesses including Amazon, Facebook, Fred Hutchinson Cancer Research Center, Seattle Cancer Care Alliance and many more within walking distance of the hotel. All of our comfortable suites are newly renovated and beautifully appointed with their furnishings. 
Each suite includes a full kitchen with all the supporting utensils to prepare food, complimentary high speed internet, flat screen televisions with a variety of programming options and daily housekeeping.\"" + ] + }, + "execution_count": 204, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.loc[['Staybridge Suites Seattle Downtown - Lake Union'],[\"desc\"]]['desc'][0] # recommendation 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All three descriptions share key phrases such as `Lake Union` and `Pacific Northwest`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Summary:\n", + "1. Start by analyzing the most frequent words and cleaning them up.\n", + "2. The techniques used here include stopwords, which drop uninformative words, and a pattern built with re.compile, which strips characters that carry no information.\n", + "3. Use TfidfVectorizer to compute the weight of each word in every description, and linear_kernel to compute the similarities.\n", + "4. Finally, recommend hotels based on the description of the hotel the user is viewing." + ] + }, { "cell_type": "code", "execution_count": null,