Add. Combining / Transforming / Interaction

master
benjas 4 years ago
parent a141ea68d7
commit 004fea8295

@ -243,7 +243,7 @@
},
{
"cell_type": "markdown",
"id": "f81f2de6",
"id": "6091bf47",
"metadata": {},
"source": [
"## 分类特征\n",
@ -253,7 +253,7 @@
{
"cell_type": "code",
"execution_count": 10,
"id": "bf7c9e8a",
"id": "6061ae00",
"metadata": {},
"outputs": [
{
@ -280,7 +280,7 @@
},
{
"cell_type": "markdown",
"id": "3412c82b",
"id": "94a95b95",
"metadata": {},
"source": [
"## Splitting\n",
@ -289,10 +289,49 @@
"例如id_30诸如\"Mac OS X 10_9_5\"之类的字符串列可以拆分为操作系统\"Mac OS X\"和版本\"10_9_5\"。或者例如数字\"1230.45\"可以拆分为元\" 1230\"和分\"45\"。LGBM 无法单独看到这些片段,需要将它们拆分。"
]
},
{
"cell_type": "markdown",
"id": "87e8b887",
"metadata": {},
"source": [
"## 组合/转化/交互\n",
"两个字符串或数字列可以合并为一列。例如card1card2可以成为一个新列"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "92515211",
"metadata": {},
"outputs": [],
"source": [
"df['uid'] = df[card1].astype(str)+_+df[card2].astype(str)"
]
},
{
"cell_type": "markdown",
"id": "b9c66a13",
"metadata": {},
"source": [
"这有助于LGBM将card1和card2一起去与目标关联并不会在树节点分裂他们。\n",
"\n",
"但这种uid = card1_card2可能与目标相关现在LGBM会将其拆分。数字列可以与加法、减法、乘法等组合使用。一个数字示例是"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d50f2c15",
"metadata": {},
"outputs": [],
"source": [
"df['x1_x2'] = df['x1'] * df['x2']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e78b77c4",
"id": "3ce3cb5f",
"metadata": {},
"outputs": [],
"source": []

@ -243,7 +243,7 @@
},
{
"cell_type": "markdown",
"id": "f81f2de6",
"id": "6091bf47",
"metadata": {},
"source": [
"## 分类特征\n",
@ -253,7 +253,7 @@
{
"cell_type": "code",
"execution_count": 10,
"id": "bf7c9e8a",
"id": "6061ae00",
"metadata": {},
"outputs": [
{
@ -280,7 +280,7 @@
},
{
"cell_type": "markdown",
"id": "3412c82b",
"id": "94a95b95",
"metadata": {},
"source": [
"## Splitting\n",
@ -289,10 +289,49 @@
"例如id_30诸如\"Mac OS X 10_9_5\"之类的字符串列可以拆分为操作系统\"Mac OS X\"和版本\"10_9_5\"。或者例如数字\"1230.45\"可以拆分为元\" 1230\"和分\"45\"。LGBM 无法单独看到这些片段,需要将它们拆分。"
]
},
{
"cell_type": "markdown",
"id": "87e8b887",
"metadata": {},
"source": [
"## 组合/转化/交互\n",
"两个字符串或数字列可以合并为一列。例如card1card2可以成为一个新列"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "92515211",
"metadata": {},
"outputs": [],
"source": [
"df['uid'] = df[card1].astype(str)+_+df[card2].astype(str)"
]
},
{
"cell_type": "markdown",
"id": "b9c66a13",
"metadata": {},
"source": [
"这有助于LGBM将card1和card2一起去与目标关联并不会在树节点分裂他们。\n",
"\n",
"但这种uid = card1_card2可能与目标相关现在LGBM会将其拆分。数字列可以与加法、减法、乘法等组合使用。一个数字示例是"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d50f2c15",
"metadata": {},
"outputs": [],
"source": [
"df['x1_x2'] = df['x1'] * df['x2']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e78b77c4",
"id": "3ce3cb5f",
"metadata": {},
"outputs": [],
"source": []

Loading…
Cancel
Save