|
|
|
@ -243,7 +243,7 @@
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"id": "f81f2de6",
|
|
|
|
|
"id": "6091bf47",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"## 分类特征\n",
|
|
|
|
@ -253,7 +253,7 @@
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": 10,
|
|
|
|
|
"id": "bf7c9e8a",
|
|
|
|
|
"id": "6061ae00",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [
|
|
|
|
|
{
|
|
|
|
@ -280,7 +280,7 @@
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"id": "3412c82b",
|
|
|
|
|
"id": "94a95b95",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"## Splitting\n",
|
|
|
|
@ -289,10 +289,49 @@
|
|
|
|
|
"例如,id_30诸如\"Mac OS X 10_9_5\"之类的字符串列可以拆分为操作系统\"Mac OS X\"和版本\"10_9_5\"。或者例如数字\"1230.45\"可以拆分为元\" 1230\"和分\"45\"。LGBM 无法单独看到这些片段,需要将它们拆分。"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"id": "87e8b887",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"## 组合/转化/交互\n",
|
|
|
|
|
"两个(字符串或数字)列可以合并为一列。例如card1,card2可以成为一个新列"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
"id": "92515211",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"df['uid'] = df[‘card1’].astype(str)+’_’+df[‘card2’].astype(str)"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"id": "b9c66a13",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"这有助于LGBM将card1和card2一起去与目标关联,并不会在树节点分裂他们。\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"但这种uid = card1_card2可能与目标相关,现在LGBM会将其拆分。数字列可以与加法、减法、乘法等组合使用。一个数字示例是"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
"id": "d50f2c15",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": [
|
|
|
|
|
"df['x1_x2'] = df['x1'] * df['x2']"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
"id": "e78b77c4",
|
|
|
|
|
"id": "3ce3cb5f",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": []
|
|
|
|
|