diff --git a/机器学习竞赛实战_优胜解决方案/建筑能源利用率预测/.ipynb_checkpoints/建筑能源利用率预测-checkpoint.ipynb b/机器学习竞赛实战_优胜解决方案/建筑能源利用率预测/.ipynb_checkpoints/建筑能源利用率预测-checkpoint.ipynb
new file mode 100644
index 0000000..ad03c1d
--- /dev/null
+++ b/机器学习竞赛实战_优胜解决方案/建筑能源利用率预测/.ipynb_checkpoints/建筑能源利用率预测-checkpoint.ipynb
@@ -0,0 +1,879 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Building Metrics Data\n",
+ "Goal: predict an energy-efficiency score between 1 and 100 for each building; this is a regression task."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
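+ "## Workflow\n",
+ "1. Data cleaning and format conversion\n",
+ "2. Exploratory data analysis\n",
+ "3. Feature engineering\n",
+ "4. Build baseline models and try several algorithms\n",
+ "5. Hyperparameter tuning\n",
+ "6. Evaluation and testing\n",
+ "7. Model interpretation\n",
+ "8. Submit the results\n",
+ "\n",
+ "These steps are not strictly linear: at step 4 you may discover a problem with the data cleaning from step 1 and have to go back and redo it."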
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Import the Basic Packages\n",
+ "\n",
+ "Some default options can be set here."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "D:\\Anaconda3\\lib\\importlib\\_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject\n",
+ " return f(*args, **kwds)\n"
+ ]
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "import numpy as np\n",
+ "\n",
+ "pd.options.mode.chained_assignment = None # suppress pandas' chained-assignment warning (SettingWithCopyWarning)\n",
+ "\n",
+ "pd.set_option('display.max_columns', 60) # display up to 60 columns\n",
+ "\n",
+ "import matplotlib.pyplot as plt\n",
+ "%matplotlib inline\n",
+ "\n",
+ "plt.rcParams['font.size'] = 24 # default font size for plots\n",
+ "\n",
+ "from IPython.core.pylabtools import figsize # helper for setting the figure size\n",
+ "\n",
+ "import seaborn as sns # plotting library\n",
+ "sns.set(font_scale=2)\n",
+ "\n",
+ "from sklearn.model_selection import train_test_split # utility for splitting the dataset"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Data Cleaning"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Order Property Id Property Name \\\n",
+ "0 1 13286 201/205 \n",
+ "1 2 28400 NYP Columbia (West Campus) \n",
+ "2 3 4778226 MSCHoNY North \n",
+ "3 4 4778267 Herbert Irving Pavilion & Millstein Hospital \n",
+ "4 5 4778288 Neuro Institute \n",
+ "\n",
+ " Parent Property Id Parent Property Name BBL - 10 digits \\\n",
+ "0 13286 201/205 1013160001 \n",
+ "1 28400 NYP Columbia (West Campus) 1021380040 \n",
+ "2 28400 NYP Columbia (West Campus) 1021380030 \n",
+ "3 28400 NYP Columbia (West Campus) 1021390001 \n",
+ "4 28400 NYP Columbia (West Campus) 1021390085 \n",
+ "\n",
+ " NYC Borough, Block and Lot (BBL) self-reported \\\n",
+ "0 1013160001 \n",
+ "1 1-02138-0040 \n",
+ "2 1-02138-0030 \n",
+ "3 1-02139-0001 \n",
+ "4 1-02139-0085 \n",
+ "\n",
+ " NYC Building Identification Number (BIN) \\\n",
+ "0 1037549 \n",
+ "1 1084198; 1084387;1084385; 1084386; 1084388; 10... \n",
+ "2 1063380 \n",
+ "3 1087281; 1076746 \n",
+ "4 1063403 \n",
+ "\n",
+ " Address 1 (self-reported) Address 2 Postal Code \\\n",
+ "0 201/205 East 42nd st. Not Available 10017 \n",
+ "1 622 168th Street Not Available 10032 \n",
+ "2 3975 Broadway Not Available 10032 \n",
+ "3 161 Fort Washington Ave 177 Fort Washington Ave 10032 \n",
+ "4 710 West 168th Street Not Available 10032 \n",
+ "\n",
+ " Street Number Street Name Borough DOF Gross Floor Area \\\n",
+ "0 675 3 AVENUE Manhattan 289356.0 \n",
+ "1 180 FT WASHINGTON AVENUE Manhattan 3693539.0 \n",
+ "2 3975 BROADWAY Manhattan 152765.0 \n",
+ "3 161 FT WASHINGTON AVENUE Manhattan 891040.0 \n",
+ "4 193 FT WASHINGTON AVENUE Manhattan 211400.0 \n",
+ "\n",
+ " Primary Property Type - Self Selected \\\n",
+ "0 Office \n",
+ "1 Hospital (General Medical & Surgical) \n",
+ "2 Hospital (General Medical & Surgical) \n",
+ "3 Hospital (General Medical & Surgical) \n",
+ "4 Hospital (General Medical & Surgical) \n",
+ "\n",
+ " List of All Property Use Types at Property \\\n",
+ "0 Office \n",
+ "1 Hospital (General Medical & Surgical) \n",
+ "2 Hospital (General Medical & Surgical) \n",
+ "3 Hospital (General Medical & Surgical) \n",
+ "4 Hospital (General Medical & Surgical) \n",
+ "\n",
+ " Largest Property Use Type \\\n",
+ "0 Office \n",
+ "1 Hospital (General Medical & Surgical) \n",
+ "2 Hospital (General Medical & Surgical) \n",
+ "3 Hospital (General Medical & Surgical) \n",
+ "4 Hospital (General Medical & Surgical) \n",
+ "\n",
+ " Largest Property Use Type - Gross Floor Area (ft²) \\\n",
+ "0 293447 \n",
+ "1 3889181 \n",
+ "2 231342 \n",
+ "3 1305748 \n",
+ "4 179694 \n",
+ "\n",
+ " 2nd Largest Property Use Type \\\n",
+ "0 Not Available \n",
+ "1 Not Available \n",
+ "2 Not Available \n",
+ "3 Not Available \n",
+ "4 Not Available \n",
+ "\n",
+ " 2nd Largest Property Use - Gross Floor Area (ft²) \\\n",
+ "0 Not Available \n",
+ "1 Not Available \n",
+ "2 Not Available \n",
+ "3 Not Available \n",
+ "4 Not Available \n",
+ "\n",
+ " 3rd Largest Property Use Type \\\n",
+ "0 Not Available \n",
+ "1 Not Available \n",
+ "2 Not Available \n",
+ "3 Not Available \n",
+ "4 Not Available \n",
+ "\n",
+ " 3rd Largest Property Use Type - Gross Floor Area (ft²) Year Built \\\n",
+ "0 Not Available 1963 \n",
+ "1 Not Available 1969 \n",
+ "2 Not Available 1924 \n",
+ "3 Not Available 1971 \n",
+ "4 Not Available 1932 \n",
+ "\n",
+ " Number of Buildings - Self-reported Occupancy Metered Areas (Energy) \\\n",
+ "0 2 100 Whole Building \n",
+ "1 12 100 Whole Building \n",
+ "2 1 100 Not Available \n",
+ "3 1 100 Not Available \n",
+ "4 1 100 Not Available \n",
+ "\n",
+ " Metered Areas (Water) ENERGY STAR Score Site EUI (kBtu/ft²) \\\n",
+ "0 Not Available Not Available 305.6 \n",
+ "1 Whole Building 55 229.8 \n",
+ "2 Not Available Not Available Not Available \n",
+ "3 Not Available Not Available Not Available \n",
+ "4 Not Available Not Available Not Available \n",
+ "\n",
+ " Weather Normalized Site EUI (kBtu/ft²) \\\n",
+ "0 303.1 \n",
+ "1 228.8 \n",
+ "2 Not Available \n",
+ "3 Not Available \n",
+ "4 Not Available \n",
+ "\n",
+ " Weather Normalized Site Electricity Intensity (kWh/ft²) \\\n",
+ "0 37.8 \n",
+ "1 24.8 \n",
+ "2 Not Available \n",
+ "3 Not Available \n",
+ "4 Not Available \n",
+ "\n",
+ " Weather Normalized Site Natural Gas Intensity (therms/ft²) \\\n",
+ "0 Not Available \n",
+ "1 2.4 \n",
+ "2 Not Available \n",
+ "3 Not Available \n",
+ "4 Not Available \n",
+ "\n",
+ " Weather Normalized Source EUI (kBtu/ft²) Fuel Oil #1 Use (kBtu) \\\n",
+ "0 614.2 Not Available \n",
+ "1 401.1 Not Available \n",
+ "2 Not Available Not Available \n",
+ "3 Not Available Not Available \n",
+ "4 Not Available Not Available \n",
+ "\n",
+ " Fuel Oil #2 Use (kBtu) Fuel Oil #4 Use (kBtu) Fuel Oil #5 & 6 Use (kBtu) \\\n",
+ "0 Not Available Not Available Not Available \n",
+ "1 1.96248472E7 Not Available Not Available \n",
+ "2 Not Available Not Available Not Available \n",
+ "3 Not Available Not Available Not Available \n",
+ "4 Not Available Not Available Not Available \n",
+ "\n",
+ " Diesel #2 Use (kBtu) District Steam Use (kBtu) Natural Gas Use (kBtu) \\\n",
+ "0 Not Available 5.15506751E7 Not Available \n",
+ "1 Not Available -3.914148026E8 933073441 \n",
+ "2 Not Available Not Available Not Available \n",
+ "3 Not Available Not Available Not Available \n",
+ "4 Not Available Not Available Not Available \n",
+ "\n",
+ " Weather Normalized Site Natural Gas Use (therms) \\\n",
+ "0 Not Available \n",
+ "1 9330734.4 \n",
+ "2 Not Available \n",
+ "3 Not Available \n",
+ "4 Not Available \n",
+ "\n",
+ " Electricity Use - Grid Purchase (kBtu) \\\n",
+ "0 38139374.2 \n",
+ "1 332365924 \n",
+ "2 Not Available \n",
+ "3 Not Available \n",
+ "4 Not Available \n",
+ "\n",
+ " Weather Normalized Site Electricity (kWh) \\\n",
+ "0 1.10827705E7 \n",
+ "1 9.62613121E7 \n",
+ "2 Not Available \n",
+ "3 Not Available \n",
+ "4 Not Available \n",
+ "\n",
+ " Total GHG Emissions (Metric Tons CO2e) \\\n",
+ "0 6962.2 \n",
+ "1 55870.4 \n",
+ "2 0 \n",
+ "3 0 \n",
+ "4 0 \n",
+ "\n",
+ " Direct GHG Emissions (Metric Tons CO2e) \\\n",
+ "0 0 \n",
+ "1 51016.4 \n",
+ "2 0 \n",
+ "3 0 \n",
+ "4 0 \n",
+ "\n",
+ " Indirect GHG Emissions (Metric Tons CO2e) \\\n",
+ "0 6962.2 \n",
+ "1 4854.1 \n",
+ "2 0 \n",
+ "3 0 \n",
+ "4 0 \n",
+ "\n",
+ " Property GFA - Self-Reported (ft²) Water Use (All Water Sources) (kgal) \\\n",
+ "0 762051 Not Available \n",
+ "1 3889181 Not Available \n",
+ "2 231342 Not Available \n",
+ "3 1305748 Not Available \n",
+ "4 179694 Not Available \n",
+ "\n",
+ " Water Intensity (All Water Sources) (gal/ft²) Source EUI (kBtu/ft²) \\\n",
+ "0 Not Available 619.4 \n",
+ "1 Not Available 404.3 \n",
+ "2 Not Available Not Available \n",
+ "3 Not Available Not Available \n",
+ "4 Not Available Not Available \n",
+ "\n",
+ " Release Date Water Required? DOF Benchmarking Submission Status \\\n",
+ "0 05/01/2017 05:32:03 PM No In Compliance \n",
+ "1 04/27/2017 11:23:27 AM No In Compliance \n",
+ "2 04/27/2017 11:23:27 AM No In Compliance \n",
+ "3 04/27/2017 11:23:27 AM No In Compliance \n",
+ "4 04/27/2017 11:23:27 AM No In Compliance \n",
+ "\n",
+ " Latitude Longitude Community Board Council District Census Tract \\\n",
+ "0 40.750791 -73.973963 6.0 4.0 88.0 \n",
+ "1 40.841402 -73.942568 12.0 10.0 251.0 \n",
+ "2 40.840427 -73.940249 12.0 10.0 251.0 \n",
+ "3 40.840746 -73.942854 12.0 10.0 255.0 \n",
+ "4 40.841559 -73.942528 12.0 10.0 255.0 \n",
+ "\n",
+ " NTA \n",
+ "0 Turtle Bay-East Midtown ... \n",
+ "1 Washington Heights South ... \n",
+ "2 Washington Heights South ... \n",
+ "3 Washington Heights South ... \n",
+ "4 Washington Heights South ... "
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Read in data into a dataframe\n",
+ "data = pd.read_csv('data/Energy_and_Water_Data_Disclosure_for_Local_Law_84_2017__Data_for_Calendar_Year_2016_.csv')\n",
+ "\n",
+ "data.head() # display top of dataframe"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Details of the data are described in the PDF in the data folder."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Data Types and Missing Values\n",
+ "\n",
+ "`dataframe.info()` gives a quick overview of the data types and missing values."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "<class 'pandas.core.frame.DataFrame'>\n",
+ "RangeIndex: 11746 entries, 0 to 11745\n",
+ "Data columns (total 60 columns):\n",
+ "Order 11746 non-null int64\n",
+ "Property Id 11746 non-null int64\n",
+ "Property Name 11746 non-null object\n",
+ "Parent Property Id 11746 non-null object\n",
+ "Parent Property Name 11746 non-null object\n",
+ "BBL - 10 digits 11735 non-null object\n",
+ "NYC Borough, Block and Lot (BBL) self-reported 11746 non-null object\n",
+ "NYC Building Identification Number (BIN) 11746 non-null object\n",
+ "Address 1 (self-reported) 11746 non-null object\n",
+ "Address 2 11746 non-null object\n",
+ "Postal Code 11746 non-null object\n",
+ "Street Number 11622 non-null object\n",
+ "Street Name 11624 non-null object\n",
+ "Borough 11628 non-null object\n",
+ "DOF Gross Floor Area 11628 non-null float64\n",
+ "Primary Property Type - Self Selected 11746 non-null object\n",
+ "List of All Property Use Types at Property 11746 non-null object\n",
+ "Largest Property Use Type 11746 non-null object\n",
+ "Largest Property Use Type - Gross Floor Area (ft²) 11746 non-null object\n",
+ "2nd Largest Property Use Type 11746 non-null object\n",
+ "2nd Largest Property Use - Gross Floor Area (ft²) 11746 non-null object\n",
+ "3rd Largest Property Use Type 11746 non-null object\n",
+ "3rd Largest Property Use Type - Gross Floor Area (ft²) 11746 non-null object\n",
+ "Year Built 11746 non-null int64\n",
+ "Number of Buildings - Self-reported 11746 non-null int64\n",
+ "Occupancy 11746 non-null int64\n",
+ "Metered Areas (Energy) 11746 non-null object\n",
+ "Metered Areas (Water) 11746 non-null object\n",
+ "ENERGY STAR Score 11746 non-null object\n",
+ "Site EUI (kBtu/ft²) 11746 non-null object\n",
+ "Weather Normalized Site EUI (kBtu/ft²) 11746 non-null object\n",
+ "Weather Normalized Site Electricity Intensity (kWh/ft²) 11746 non-null object\n",
+ "Weather Normalized Site Natural Gas Intensity (therms/ft²) 11746 non-null object\n",
+ "Weather Normalized Source EUI (kBtu/ft²) 11746 non-null object\n",
+ "Fuel Oil #1 Use (kBtu) 11746 non-null object\n",
+ "Fuel Oil #2 Use (kBtu) 11746 non-null object\n",
+ "Fuel Oil #4 Use (kBtu) 11746 non-null object\n",
+ "Fuel Oil #5 & 6 Use (kBtu) 11746 non-null object\n",
+ "Diesel #2 Use (kBtu) 11746 non-null object\n",
+ "District Steam Use (kBtu) 11746 non-null object\n",
+ "Natural Gas Use (kBtu) 11746 non-null object\n",
+ "Weather Normalized Site Natural Gas Use (therms) 11746 non-null object\n",
+ "Electricity Use - Grid Purchase (kBtu) 11746 non-null object\n",
+ "Weather Normalized Site Electricity (kWh) 11746 non-null object\n",
+ "Total GHG Emissions (Metric Tons CO2e) 11746 non-null object\n",
+ "Direct GHG Emissions (Metric Tons CO2e) 11746 non-null object\n",
+ "Indirect GHG Emissions (Metric Tons CO2e) 11746 non-null object\n",
+ "Property GFA - Self-Reported (ft²) 11746 non-null int64\n",
+ "Water Use (All Water Sources) (kgal) 11746 non-null object\n",
+ "Water Intensity (All Water Sources) (gal/ft²) 11746 non-null object\n",
+ "Source EUI (kBtu/ft²) 11746 non-null object\n",
+ "Release Date 11746 non-null object\n",
+ "Water Required? 11628 non-null object\n",
+ "DOF Benchmarking Submission Status 11716 non-null object\n",
+ "Latitude 9483 non-null float64\n",
+ "Longitude 9483 non-null float64\n",
+ "Community Board 9483 non-null float64\n",
+ "Council District 9483 non-null float64\n",
+ "Census Tract 9483 non-null float64\n",
+ "NTA 9483 non-null object\n",
+ "dtypes: float64(6), int64(6), object(48)\n",
+ "memory usage: 5.4+ MB\n"
+ ]
+ }
+ ],
+ "source": [
+ "data.info()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
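+ "Most columns above are reported as fully non-null, but that does not mean nothing is missing (np.nan): the missing values may simply be marked differently. The preview above contains many `Not Available` entries, so `Not Available` is most likely the missing-value marker."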
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Replace all occurrences of Not Available with numpy not a number\n",
+ "data = data.replace({'Not Available':np.nan})"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
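
The notebook stops right after replacing `Not Available` with `np.nan`, leaving an empty cell. A plausible next step, sketched here on a tiny hypothetical frame rather than the real CSV, is to cast the measurement columns to numeric types and check how much of each column is missing. The `unit_markers` tuple and the three sample rows are assumptions made for illustration; they are not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the raw file: numeric columns arrive as strings
# and missing entries are spelled "Not Available".
raw = pd.DataFrame({
    "Property Name": ["A", "B", "C"],
    "ENERGY STAR Score": ["Not Available", "55", "80"],
    "Site EUI (kBtu/ft²)": ["305.6", "229.8", "Not Available"],
})

# Same replacement the notebook performs.
clean = raw.replace({"Not Available": np.nan})

# Assumed follow-up: columns whose names carry a unit or "Score" are treated
# as measurements and converted to numbers. The marker list is illustrative.
unit_markers = ("ft²", "kBtu", "Score")
for col in clean.columns:
    if any(marker in col for marker in unit_markers):
        clean[col] = pd.to_numeric(clean[col])

# Percentage of missing values per column.
missing_pct = clean.isna().mean() * 100

print(clean.dtypes)
print(missing_pct.round(1))
```

After the conversion, `data.info()` would report real non-null counts for these columns instead of listing every `object` column as fully populated.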