Create 建筑能源利用率预测-checkpoint.ipynb

pull/2/head
benjas 5 years ago
parent ce271085d0
commit 68dc09c179

@ -0,0 +1,879 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 建筑指标数据\n",
"目标对每个建筑的能源利用率评分1-100之间回归任务"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 工作流程\n",
"1. 数据清洗与格式转换\n",
"2. 探索性数据分析\n",
"3. 特征工程\n",
"4. 建立基础模型,尝试多种算法\n",
"5. 模型调参\n",
"6. 评估与测试\n",
"7. 解释模型\n",
"8. 提交答案\n",
"\n",
"这些过程并不是完全的从头到尾可能在4的时候发现1的数据清洗有问题再回来做1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 导入所需的基本工具包\n",
"\n",
"有些默认参数可以设置"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"D:\\Anaconda3\\lib\\importlib\\_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject\n",
" return f(*args, **kwds)\n"
]
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"pd.options.mode.chained_assignment = None # 消除警告,比如说提示版本升级之类的\n",
"\n",
"pd.set_option('display.max_columns', 60) # 设置最大显示列为60\n",
"\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"plt.rcParams['font.size'] = 24 # 设置字体大小\n",
"\n",
"from IPython.core.pylabtools import figsize # 设置画图大小\n",
"\n",
"import seaborn as sns # 画图工具\n",
"sns.set(font_scale=2)\n",
"\n",
"from sklearn.model_selection import train_test_split # 切分数据集工具"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 数据清洗"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Order</th>\n",
" <th>Property Id</th>\n",
" <th>Property Name</th>\n",
" <th>Parent Property Id</th>\n",
" <th>Parent Property Name</th>\n",
" <th>BBL - 10 digits</th>\n",
" <th>NYC Borough, Block and Lot (BBL) self-reported</th>\n",
" <th>NYC Building Identification Number (BIN)</th>\n",
" <th>Address 1 (self-reported)</th>\n",
" <th>Address 2</th>\n",
" <th>Postal Code</th>\n",
" <th>Street Number</th>\n",
" <th>Street Name</th>\n",
" <th>Borough</th>\n",
" <th>DOF Gross Floor Area</th>\n",
" <th>Primary Property Type - Self Selected</th>\n",
" <th>List of All Property Use Types at Property</th>\n",
" <th>Largest Property Use Type</th>\n",
" <th>Largest Property Use Type - Gross Floor Area (ft²)</th>\n",
" <th>2nd Largest Property Use Type</th>\n",
" <th>2nd Largest Property Use - Gross Floor Area (ft²)</th>\n",
" <th>3rd Largest Property Use Type</th>\n",
" <th>3rd Largest Property Use Type - Gross Floor Area (ft²)</th>\n",
" <th>Year Built</th>\n",
" <th>Number of Buildings - Self-reported</th>\n",
" <th>Occupancy</th>\n",
" <th>Metered Areas (Energy)</th>\n",
" <th>Metered Areas (Water)</th>\n",
" <th>ENERGY STAR Score</th>\n",
" <th>Site EUI (kBtu/ft²)</th>\n",
" <th>Weather Normalized Site EUI (kBtu/ft²)</th>\n",
" <th>Weather Normalized Site Electricity Intensity (kWh/ft²)</th>\n",
" <th>Weather Normalized Site Natural Gas Intensity (therms/ft²)</th>\n",
" <th>Weather Normalized Source EUI (kBtu/ft²)</th>\n",
" <th>Fuel Oil #1 Use (kBtu)</th>\n",
" <th>Fuel Oil #2 Use (kBtu)</th>\n",
" <th>Fuel Oil #4 Use (kBtu)</th>\n",
" <th>Fuel Oil #5 &amp; 6 Use (kBtu)</th>\n",
" <th>Diesel #2 Use (kBtu)</th>\n",
" <th>District Steam Use (kBtu)</th>\n",
" <th>Natural Gas Use (kBtu)</th>\n",
" <th>Weather Normalized Site Natural Gas Use (therms)</th>\n",
" <th>Electricity Use - Grid Purchase (kBtu)</th>\n",
" <th>Weather Normalized Site Electricity (kWh)</th>\n",
" <th>Total GHG Emissions (Metric Tons CO2e)</th>\n",
" <th>Direct GHG Emissions (Metric Tons CO2e)</th>\n",
" <th>Indirect GHG Emissions (Metric Tons CO2e)</th>\n",
" <th>Property GFA - Self-Reported (ft²)</th>\n",
" <th>Water Use (All Water Sources) (kgal)</th>\n",
" <th>Water Intensity (All Water Sources) (gal/ft²)</th>\n",
" <th>Source EUI (kBtu/ft²)</th>\n",
" <th>Release Date</th>\n",
" <th>Water Required?</th>\n",
" <th>DOF Benchmarking Submission Status</th>\n",
" <th>Latitude</th>\n",
" <th>Longitude</th>\n",
" <th>Community Board</th>\n",
" <th>Council District</th>\n",
" <th>Census Tract</th>\n",
" <th>NTA</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>13286</td>\n",
" <td>201/205</td>\n",
" <td>13286</td>\n",
" <td>201/205</td>\n",
" <td>1013160001</td>\n",
" <td>1013160001</td>\n",
" <td>1037549</td>\n",
" <td>201/205 East 42nd st.</td>\n",
" <td>Not Available</td>\n",
" <td>10017</td>\n",
" <td>675</td>\n",
" <td>3 AVENUE</td>\n",
" <td>Manhattan</td>\n",
" <td>289356.0</td>\n",
" <td>Office</td>\n",
" <td>Office</td>\n",
" <td>Office</td>\n",
" <td>293447</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>1963</td>\n",
" <td>2</td>\n",
" <td>100</td>\n",
" <td>Whole Building</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>305.6</td>\n",
" <td>303.1</td>\n",
" <td>37.8</td>\n",
" <td>Not Available</td>\n",
" <td>614.2</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>5.15506751E7</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>38139374.2</td>\n",
" <td>1.10827705E7</td>\n",
" <td>6962.2</td>\n",
" <td>0</td>\n",
" <td>6962.2</td>\n",
" <td>762051</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>619.4</td>\n",
" <td>05/01/2017 05:32:03 PM</td>\n",
" <td>No</td>\n",
" <td>In Compliance</td>\n",
" <td>40.750791</td>\n",
" <td>-73.973963</td>\n",
" <td>6.0</td>\n",
" <td>4.0</td>\n",
" <td>88.0</td>\n",
" <td>Turtle Bay-East Midtown ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>28400</td>\n",
" <td>NYP Columbia (West Campus)</td>\n",
" <td>28400</td>\n",
" <td>NYP Columbia (West Campus)</td>\n",
" <td>1021380040</td>\n",
" <td>1-02138-0040</td>\n",
" <td>1084198; 1084387;1084385; 1084386; 1084388; 10...</td>\n",
" <td>622 168th Street</td>\n",
" <td>Not Available</td>\n",
" <td>10032</td>\n",
" <td>180</td>\n",
" <td>FT WASHINGTON AVENUE</td>\n",
" <td>Manhattan</td>\n",
" <td>3693539.0</td>\n",
" <td>Hospital (General Medical &amp; Surgical)</td>\n",
" <td>Hospital (General Medical &amp; Surgical)</td>\n",
" <td>Hospital (General Medical &amp; Surgical)</td>\n",
" <td>3889181</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>1969</td>\n",
" <td>12</td>\n",
" <td>100</td>\n",
" <td>Whole Building</td>\n",
" <td>Whole Building</td>\n",
" <td>55</td>\n",
" <td>229.8</td>\n",
" <td>228.8</td>\n",
" <td>24.8</td>\n",
" <td>2.4</td>\n",
" <td>401.1</td>\n",
" <td>Not Available</td>\n",
" <td>1.96248472E7</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>-3.914148026E8</td>\n",
" <td>933073441</td>\n",
" <td>9330734.4</td>\n",
" <td>332365924</td>\n",
" <td>9.62613121E7</td>\n",
" <td>55870.4</td>\n",
" <td>51016.4</td>\n",
" <td>4854.1</td>\n",
" <td>3889181</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>404.3</td>\n",
" <td>04/27/2017 11:23:27 AM</td>\n",
" <td>No</td>\n",
" <td>In Compliance</td>\n",
" <td>40.841402</td>\n",
" <td>-73.942568</td>\n",
" <td>12.0</td>\n",
" <td>10.0</td>\n",
" <td>251.0</td>\n",
" <td>Washington Heights South ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>4778226</td>\n",
" <td>MSCHoNY North</td>\n",
" <td>28400</td>\n",
" <td>NYP Columbia (West Campus)</td>\n",
" <td>1021380030</td>\n",
" <td>1-02138-0030</td>\n",
" <td>1063380</td>\n",
" <td>3975 Broadway</td>\n",
" <td>Not Available</td>\n",
" <td>10032</td>\n",
" <td>3975</td>\n",
" <td>BROADWAY</td>\n",
" <td>Manhattan</td>\n",
" <td>152765.0</td>\n",
" <td>Hospital (General Medical &amp; Surgical)</td>\n",
" <td>Hospital (General Medical &amp; Surgical)</td>\n",
" <td>Hospital (General Medical &amp; Surgical)</td>\n",
" <td>231342</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>1924</td>\n",
" <td>1</td>\n",
" <td>100</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>231342</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>04/27/2017 11:23:27 AM</td>\n",
" <td>No</td>\n",
" <td>In Compliance</td>\n",
" <td>40.840427</td>\n",
" <td>-73.940249</td>\n",
" <td>12.0</td>\n",
" <td>10.0</td>\n",
" <td>251.0</td>\n",
" <td>Washington Heights South ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>4778267</td>\n",
" <td>Herbert Irving Pavilion &amp; Millstein Hospital</td>\n",
" <td>28400</td>\n",
" <td>NYP Columbia (West Campus)</td>\n",
" <td>1021390001</td>\n",
" <td>1-02139-0001</td>\n",
" <td>1087281; 1076746</td>\n",
" <td>161 Fort Washington Ave</td>\n",
" <td>177 Fort Washington Ave</td>\n",
" <td>10032</td>\n",
" <td>161</td>\n",
" <td>FT WASHINGTON AVENUE</td>\n",
" <td>Manhattan</td>\n",
" <td>891040.0</td>\n",
" <td>Hospital (General Medical &amp; Surgical)</td>\n",
" <td>Hospital (General Medical &amp; Surgical)</td>\n",
" <td>Hospital (General Medical &amp; Surgical)</td>\n",
" <td>1305748</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>1971</td>\n",
" <td>1</td>\n",
" <td>100</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1305748</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>04/27/2017 11:23:27 AM</td>\n",
" <td>No</td>\n",
" <td>In Compliance</td>\n",
" <td>40.840746</td>\n",
" <td>-73.942854</td>\n",
" <td>12.0</td>\n",
" <td>10.0</td>\n",
" <td>255.0</td>\n",
" <td>Washington Heights South ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>4778288</td>\n",
" <td>Neuro Institute</td>\n",
" <td>28400</td>\n",
" <td>NYP Columbia (West Campus)</td>\n",
" <td>1021390085</td>\n",
" <td>1-02139-0085</td>\n",
" <td>1063403</td>\n",
" <td>710 West 168th Street</td>\n",
" <td>Not Available</td>\n",
" <td>10032</td>\n",
" <td>193</td>\n",
" <td>FT WASHINGTON AVENUE</td>\n",
" <td>Manhattan</td>\n",
" <td>211400.0</td>\n",
" <td>Hospital (General Medical &amp; Surgical)</td>\n",
" <td>Hospital (General Medical &amp; Surgical)</td>\n",
" <td>Hospital (General Medical &amp; Surgical)</td>\n",
" <td>179694</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>1932</td>\n",
" <td>1</td>\n",
" <td>100</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>179694</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>Not Available</td>\n",
" <td>04/27/2017 11:23:27 AM</td>\n",
" <td>No</td>\n",
" <td>In Compliance</td>\n",
" <td>40.841559</td>\n",
" <td>-73.942528</td>\n",
" <td>12.0</td>\n",
" <td>10.0</td>\n",
" <td>255.0</td>\n",
" <td>Washington Heights South ...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Order Property Id Property Name \\\n",
"0 1 13286 201/205 \n",
"1 2 28400 NYP Columbia (West Campus) \n",
"2 3 4778226 MSCHoNY North \n",
"3 4 4778267 Herbert Irving Pavilion & Millstein Hospital \n",
"4 5 4778288 Neuro Institute \n",
"\n",
" Parent Property Id Parent Property Name BBL - 10 digits \\\n",
"0 13286 201/205 1013160001 \n",
"1 28400 NYP Columbia (West Campus) 1021380040 \n",
"2 28400 NYP Columbia (West Campus) 1021380030 \n",
"3 28400 NYP Columbia (West Campus) 1021390001 \n",
"4 28400 NYP Columbia (West Campus) 1021390085 \n",
"\n",
" NYC Borough, Block and Lot (BBL) self-reported \\\n",
"0 1013160001 \n",
"1 1-02138-0040 \n",
"2 1-02138-0030 \n",
"3 1-02139-0001 \n",
"4 1-02139-0085 \n",
"\n",
" NYC Building Identification Number (BIN) \\\n",
"0 1037549 \n",
"1 1084198; 1084387;1084385; 1084386; 1084388; 10... \n",
"2 1063380 \n",
"3 1087281; 1076746 \n",
"4 1063403 \n",
"\n",
" Address 1 (self-reported) Address 2 Postal Code \\\n",
"0 201/205 East 42nd st. Not Available 10017 \n",
"1 622 168th Street Not Available 10032 \n",
"2 3975 Broadway Not Available 10032 \n",
"3 161 Fort Washington Ave 177 Fort Washington Ave 10032 \n",
"4 710 West 168th Street Not Available 10032 \n",
"\n",
" Street Number Street Name Borough DOF Gross Floor Area \\\n",
"0 675 3 AVENUE Manhattan 289356.0 \n",
"1 180 FT WASHINGTON AVENUE Manhattan 3693539.0 \n",
"2 3975 BROADWAY Manhattan 152765.0 \n",
"3 161 FT WASHINGTON AVENUE Manhattan 891040.0 \n",
"4 193 FT WASHINGTON AVENUE Manhattan 211400.0 \n",
"\n",
" Primary Property Type - Self Selected \\\n",
"0 Office \n",
"1 Hospital (General Medical & Surgical) \n",
"2 Hospital (General Medical & Surgical) \n",
"3 Hospital (General Medical & Surgical) \n",
"4 Hospital (General Medical & Surgical) \n",
"\n",
" List of All Property Use Types at Property \\\n",
"0 Office \n",
"1 Hospital (General Medical & Surgical) \n",
"2 Hospital (General Medical & Surgical) \n",
"3 Hospital (General Medical & Surgical) \n",
"4 Hospital (General Medical & Surgical) \n",
"\n",
" Largest Property Use Type \\\n",
"0 Office \n",
"1 Hospital (General Medical & Surgical) \n",
"2 Hospital (General Medical & Surgical) \n",
"3 Hospital (General Medical & Surgical) \n",
"4 Hospital (General Medical & Surgical) \n",
"\n",
" Largest Property Use Type - Gross Floor Area (ft²) \\\n",
"0 293447 \n",
"1 3889181 \n",
"2 231342 \n",
"3 1305748 \n",
"4 179694 \n",
"\n",
" 2nd Largest Property Use Type \\\n",
"0 Not Available \n",
"1 Not Available \n",
"2 Not Available \n",
"3 Not Available \n",
"4 Not Available \n",
"\n",
" 2nd Largest Property Use - Gross Floor Area (ft²) \\\n",
"0 Not Available \n",
"1 Not Available \n",
"2 Not Available \n",
"3 Not Available \n",
"4 Not Available \n",
"\n",
" 3rd Largest Property Use Type \\\n",
"0 Not Available \n",
"1 Not Available \n",
"2 Not Available \n",
"3 Not Available \n",
"4 Not Available \n",
"\n",
" 3rd Largest Property Use Type - Gross Floor Area (ft²) Year Built \\\n",
"0 Not Available 1963 \n",
"1 Not Available 1969 \n",
"2 Not Available 1924 \n",
"3 Not Available 1971 \n",
"4 Not Available 1932 \n",
"\n",
" Number of Buildings - Self-reported Occupancy Metered Areas (Energy) \\\n",
"0 2 100 Whole Building \n",
"1 12 100 Whole Building \n",
"2 1 100 Not Available \n",
"3 1 100 Not Available \n",
"4 1 100 Not Available \n",
"\n",
" Metered Areas (Water) ENERGY STAR Score Site EUI (kBtu/ft²) \\\n",
"0 Not Available Not Available 305.6 \n",
"1 Whole Building 55 229.8 \n",
"2 Not Available Not Available Not Available \n",
"3 Not Available Not Available Not Available \n",
"4 Not Available Not Available Not Available \n",
"\n",
" Weather Normalized Site EUI (kBtu/ft²) \\\n",
"0 303.1 \n",
"1 228.8 \n",
"2 Not Available \n",
"3 Not Available \n",
"4 Not Available \n",
"\n",
" Weather Normalized Site Electricity Intensity (kWh/ft²) \\\n",
"0 37.8 \n",
"1 24.8 \n",
"2 Not Available \n",
"3 Not Available \n",
"4 Not Available \n",
"\n",
" Weather Normalized Site Natural Gas Intensity (therms/ft²) \\\n",
"0 Not Available \n",
"1 2.4 \n",
"2 Not Available \n",
"3 Not Available \n",
"4 Not Available \n",
"\n",
" Weather Normalized Source EUI (kBtu/ft²) Fuel Oil #1 Use (kBtu) \\\n",
"0 614.2 Not Available \n",
"1 401.1 Not Available \n",
"2 Not Available Not Available \n",
"3 Not Available Not Available \n",
"4 Not Available Not Available \n",
"\n",
" Fuel Oil #2 Use (kBtu) Fuel Oil #4 Use (kBtu) Fuel Oil #5 & 6 Use (kBtu) \\\n",
"0 Not Available Not Available Not Available \n",
"1 1.96248472E7 Not Available Not Available \n",
"2 Not Available Not Available Not Available \n",
"3 Not Available Not Available Not Available \n",
"4 Not Available Not Available Not Available \n",
"\n",
" Diesel #2 Use (kBtu) District Steam Use (kBtu) Natural Gas Use (kBtu) \\\n",
"0 Not Available 5.15506751E7 Not Available \n",
"1 Not Available -3.914148026E8 933073441 \n",
"2 Not Available Not Available Not Available \n",
"3 Not Available Not Available Not Available \n",
"4 Not Available Not Available Not Available \n",
"\n",
" Weather Normalized Site Natural Gas Use (therms) \\\n",
"0 Not Available \n",
"1 9330734.4 \n",
"2 Not Available \n",
"3 Not Available \n",
"4 Not Available \n",
"\n",
" Electricity Use - Grid Purchase (kBtu) \\\n",
"0 38139374.2 \n",
"1 332365924 \n",
"2 Not Available \n",
"3 Not Available \n",
"4 Not Available \n",
"\n",
" Weather Normalized Site Electricity (kWh) \\\n",
"0 1.10827705E7 \n",
"1 9.62613121E7 \n",
"2 Not Available \n",
"3 Not Available \n",
"4 Not Available \n",
"\n",
" Total GHG Emissions (Metric Tons CO2e) \\\n",
"0 6962.2 \n",
"1 55870.4 \n",
"2 0 \n",
"3 0 \n",
"4 0 \n",
"\n",
" Direct GHG Emissions (Metric Tons CO2e) \\\n",
"0 0 \n",
"1 51016.4 \n",
"2 0 \n",
"3 0 \n",
"4 0 \n",
"\n",
" Indirect GHG Emissions (Metric Tons CO2e) \\\n",
"0 6962.2 \n",
"1 4854.1 \n",
"2 0 \n",
"3 0 \n",
"4 0 \n",
"\n",
" Property GFA - Self-Reported (ft²) Water Use (All Water Sources) (kgal) \\\n",
"0 762051 Not Available \n",
"1 3889181 Not Available \n",
"2 231342 Not Available \n",
"3 1305748 Not Available \n",
"4 179694 Not Available \n",
"\n",
" Water Intensity (All Water Sources) (gal/ft²) Source EUI (kBtu/ft²) \\\n",
"0 Not Available 619.4 \n",
"1 Not Available 404.3 \n",
"2 Not Available Not Available \n",
"3 Not Available Not Available \n",
"4 Not Available Not Available \n",
"\n",
" Release Date Water Required? DOF Benchmarking Submission Status \\\n",
"0 05/01/2017 05:32:03 PM No In Compliance \n",
"1 04/27/2017 11:23:27 AM No In Compliance \n",
"2 04/27/2017 11:23:27 AM No In Compliance \n",
"3 04/27/2017 11:23:27 AM No In Compliance \n",
"4 04/27/2017 11:23:27 AM No In Compliance \n",
"\n",
" Latitude Longitude Community Board Council District Census Tract \\\n",
"0 40.750791 -73.973963 6.0 4.0 88.0 \n",
"1 40.841402 -73.942568 12.0 10.0 251.0 \n",
"2 40.840427 -73.940249 12.0 10.0 251.0 \n",
"3 40.840746 -73.942854 12.0 10.0 255.0 \n",
"4 40.841559 -73.942528 12.0 10.0 255.0 \n",
"\n",
" NTA \n",
"0 Turtle Bay-East Midtown ... \n",
"1 Washington Heights South ... \n",
"2 Washington Heights South ... \n",
"3 Washington Heights South ... \n",
"4 Washington Heights South ... "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Read in data into a dataframe\n",
"data = pd.read_csv('data/Energy_and_Water_Data_Disclosure_for_Local_Law_84_2017__Data_for_Calendar_Year_2016_.csv')\n",
"\n",
"data.head() # display top of dataframe"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"数据具体情况在数据文件夹下的pdf里"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 数据类型与缺失值\n",
"\n",
"dataframe.info 可以快速查看数据类型与缺失值"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 11746 entries, 0 to 11745\n",
"Data columns (total 60 columns):\n",
"Order 11746 non-null int64\n",
"Property Id 11746 non-null int64\n",
"Property Name 11746 non-null object\n",
"Parent Property Id 11746 non-null object\n",
"Parent Property Name 11746 non-null object\n",
"BBL - 10 digits 11735 non-null object\n",
"NYC Borough, Block and Lot (BBL) self-reported 11746 non-null object\n",
"NYC Building Identification Number (BIN) 11746 non-null object\n",
"Address 1 (self-reported) 11746 non-null object\n",
"Address 2 11746 non-null object\n",
"Postal Code 11746 non-null object\n",
"Street Number 11622 non-null object\n",
"Street Name 11624 non-null object\n",
"Borough 11628 non-null object\n",
"DOF Gross Floor Area 11628 non-null float64\n",
"Primary Property Type - Self Selected 11746 non-null object\n",
"List of All Property Use Types at Property 11746 non-null object\n",
"Largest Property Use Type 11746 non-null object\n",
"Largest Property Use Type - Gross Floor Area (ft²) 11746 non-null object\n",
"2nd Largest Property Use Type 11746 non-null object\n",
"2nd Largest Property Use - Gross Floor Area (ft²) 11746 non-null object\n",
"3rd Largest Property Use Type 11746 non-null object\n",
"3rd Largest Property Use Type - Gross Floor Area (ft²) 11746 non-null object\n",
"Year Built 11746 non-null int64\n",
"Number of Buildings - Self-reported 11746 non-null int64\n",
"Occupancy 11746 non-null int64\n",
"Metered Areas (Energy) 11746 non-null object\n",
"Metered Areas (Water) 11746 non-null object\n",
"ENERGY STAR Score 11746 non-null object\n",
"Site EUI (kBtu/ft²) 11746 non-null object\n",
"Weather Normalized Site EUI (kBtu/ft²) 11746 non-null object\n",
"Weather Normalized Site Electricity Intensity (kWh/ft²) 11746 non-null object\n",
"Weather Normalized Site Natural Gas Intensity (therms/ft²) 11746 non-null object\n",
"Weather Normalized Source EUI (kBtu/ft²) 11746 non-null object\n",
"Fuel Oil #1 Use (kBtu) 11746 non-null object\n",
"Fuel Oil #2 Use (kBtu) 11746 non-null object\n",
"Fuel Oil #4 Use (kBtu) 11746 non-null object\n",
"Fuel Oil #5 & 6 Use (kBtu) 11746 non-null object\n",
"Diesel #2 Use (kBtu) 11746 non-null object\n",
"District Steam Use (kBtu) 11746 non-null object\n",
"Natural Gas Use (kBtu) 11746 non-null object\n",
"Weather Normalized Site Natural Gas Use (therms) 11746 non-null object\n",
"Electricity Use - Grid Purchase (kBtu) 11746 non-null object\n",
"Weather Normalized Site Electricity (kWh) 11746 non-null object\n",
"Total GHG Emissions (Metric Tons CO2e) 11746 non-null object\n",
"Direct GHG Emissions (Metric Tons CO2e) 11746 non-null object\n",
"Indirect GHG Emissions (Metric Tons CO2e) 11746 non-null object\n",
"Property GFA - Self-Reported (ft²) 11746 non-null int64\n",
"Water Use (All Water Sources) (kgal) 11746 non-null object\n",
"Water Intensity (All Water Sources) (gal/ft²) 11746 non-null object\n",
"Source EUI (kBtu/ft²) 11746 non-null object\n",
"Release Date 11746 non-null object\n",
"Water Required? 11628 non-null object\n",
"DOF Benchmarking Submission Status 11716 non-null object\n",
"Latitude 9483 non-null float64\n",
"Longitude 9483 non-null float64\n",
"Community Board 9483 non-null float64\n",
"Council District 9483 non-null float64\n",
"Census Tract 9483 non-null float64\n",
"NTA 9483 non-null object\n",
"dtypes: float64(6), int64(6), object(48)\n",
"memory usage: 5.4+ MB\n"
]
}
],
"source": [
"data.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"上面都是non-null不一定是没有缺失值np.nan可能是缺失值的标记符号不一样查看上面的数据中间有很大部分是Not Available所以Not Available应该就是缺失值"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Replace all occurrences of Not Available with numpy not a number\n",
"data = data.replace({'Not Available':np.nan})"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Loading…
Cancel
Save