diff --git a/机器学习竞赛实战_优胜解决方案/建筑能源利用率预测/.ipynb_checkpoints/建筑能源利用率预测-checkpoint.ipynb b/机器学习竞赛实战_优胜解决方案/建筑能源利用率预测/.ipynb_checkpoints/建筑能源利用率预测-checkpoint.ipynb new file mode 100644 index 0000000..ad03c1d --- /dev/null +++ b/机器学习竞赛实战_优胜解决方案/建筑能源利用率预测/.ipynb_checkpoints/建筑能源利用率预测-checkpoint.ipynb @@ -0,0 +1,879 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Building Metrics Data\n", + "Goal: predict an energy-efficiency score between 1 and 100 for each building; this is a regression task" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Workflow\n", + "1. Data cleaning and formatting\n", + "2. Exploratory data analysis\n", + "3. Feature engineering\n", + "4. Build baseline models and try several algorithms\n", + "5. Hyperparameter tuning\n", + "6. Evaluation and testing\n", + "7. Model interpretation\n", + "8. Submit the results\n", + "\n", + "These steps are not strictly sequential: while working on step 4 we may find a problem with the data cleaning from step 1 and have to go back and redo it" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Import the Required Packages\n", + "\n", + "Some default display and plotting parameters can be set here" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "D:\\Anaconda3\\lib\\importlib\\_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. 
Expected 192 from C header, got 216 from PyObject\n", + " return f(*args, **kwds)\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "pd.options.mode.chained_assignment = None # disable pandas' chained-assignment (SettingWithCopy) warning\n", + "\n", + "pd.set_option('display.max_columns', 60) # display up to 60 columns\n", + "\n", + "import matplotlib.pyplot as plt\n", + "%matplotlib inline\n", + "\n", + "plt.rcParams['font.size'] = 24 # default font size for plots\n", + "\n", + "from IPython.core.pylabtools import figsize # helper for setting the figure size\n", + "\n", + "import seaborn as sns # statistical plotting library\n", + "sns.set(font_scale=2)\n", + "\n", + "from sklearn.model_selection import train_test_split # utility for splitting the dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Data Cleaning" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Order Property Id Property Name \\\n", + "0 1 13286 201/205 \n", + "1 2 28400 NYP Columbia (West Campus) \n", + "2 3 4778226 MSCHoNY North \n", + "3 4 4778267 Herbert Irving Pavilion & Millstein Hospital \n", + "4 5 4778288 Neuro Institute \n", + "\n", + " Parent Property Id Parent Property Name BBL - 10 digits \\\n", + "0 13286 201/205 1013160001 \n", + "1 28400 NYP Columbia (West Campus) 1021380040 \n", + "2 28400 NYP Columbia (West Campus) 1021380030 \n", + "3 28400 NYP Columbia (West Campus) 1021390001 \n", + "4 28400 NYP Columbia (West Campus) 1021390085 \n", + "\n", + " NYC Borough, Block and Lot (BBL) self-reported \\\n", + "0 1013160001 \n", + "1 1-02138-0040 \n", + "2 1-02138-0030 \n", + "3 1-02139-0001 \n", + "4 1-02139-0085 \n", + "\n", + " NYC Building Identification Number (BIN) \\\n", + "0 1037549 \n", + "1 1084198; 1084387;1084385; 1084386; 1084388; 10... \n", + "2 1063380 \n", + "3 1087281; 1076746 \n", + "4 1063403 \n", + "\n", + " Address 1 (self-reported) Address 2 Postal Code \\\n", + "0 201/205 East 42nd st. 
Not Available 10017 \n", + "1 622 168th Street Not Available 10032 \n", + "2 3975 Broadway Not Available 10032 \n", + "3 161 Fort Washington Ave 177 Fort Washington Ave 10032 \n", + "4 710 West 168th Street Not Available 10032 \n", + "\n", + " Street Number Street Name Borough DOF Gross Floor Area \\\n", + "0 675 3 AVENUE Manhattan 289356.0 \n", + "1 180 FT WASHINGTON AVENUE Manhattan 3693539.0 \n", + "2 3975 BROADWAY Manhattan 152765.0 \n", + "3 161 FT WASHINGTON AVENUE Manhattan 891040.0 \n", + "4 193 FT WASHINGTON AVENUE Manhattan 211400.0 \n", + "\n", + " Primary Property Type - Self Selected \\\n", + "0 Office \n", + "1 Hospital (General Medical & Surgical) \n", + "2 Hospital (General Medical & Surgical) \n", + "3 Hospital (General Medical & Surgical) \n", + "4 Hospital (General Medical & Surgical) \n", + "\n", + " List of All Property Use Types at Property \\\n", + "0 Office \n", + "1 Hospital (General Medical & Surgical) \n", + "2 Hospital (General Medical & Surgical) \n", + "3 Hospital (General Medical & Surgical) \n", + "4 Hospital (General Medical & Surgical) \n", + "\n", + " Largest Property Use Type \\\n", + "0 Office \n", + "1 Hospital (General Medical & Surgical) \n", + "2 Hospital (General Medical & Surgical) \n", + "3 Hospital (General Medical & Surgical) \n", + "4 Hospital (General Medical & Surgical) \n", + "\n", + " Largest Property Use Type - Gross Floor Area (ft²) \\\n", + "0 293447 \n", + "1 3889181 \n", + "2 231342 \n", + "3 1305748 \n", + "4 179694 \n", + "\n", + " 2nd Largest Property Use Type \\\n", + "0 Not Available \n", + "1 Not Available \n", + "2 Not Available \n", + "3 Not Available \n", + "4 Not Available \n", + "\n", + " 2nd Largest Property Use - Gross Floor Area (ft²) \\\n", + "0 Not Available \n", + "1 Not Available \n", + "2 Not Available \n", + "3 Not Available \n", + "4 Not Available \n", + "\n", + " 3rd Largest Property Use Type \\\n", + "0 Not Available \n", + "1 Not Available \n", + "2 Not Available \n", + "3 Not Available 
\n", + "4 Not Available \n", + "\n", + " 3rd Largest Property Use Type - Gross Floor Area (ft²) Year Built \\\n", + "0 Not Available 1963 \n", + "1 Not Available 1969 \n", + "2 Not Available 1924 \n", + "3 Not Available 1971 \n", + "4 Not Available 1932 \n", + "\n", + " Number of Buildings - Self-reported Occupancy Metered Areas (Energy) \\\n", + "0 2 100 Whole Building \n", + "1 12 100 Whole Building \n", + "2 1 100 Not Available \n", + "3 1 100 Not Available \n", + "4 1 100 Not Available \n", + "\n", + " Metered Areas (Water) ENERGY STAR Score Site EUI (kBtu/ft²) \\\n", + "0 Not Available Not Available 305.6 \n", + "1 Whole Building 55 229.8 \n", + "2 Not Available Not Available Not Available \n", + "3 Not Available Not Available Not Available \n", + "4 Not Available Not Available Not Available \n", + "\n", + " Weather Normalized Site EUI (kBtu/ft²) \\\n", + "0 303.1 \n", + "1 228.8 \n", + "2 Not Available \n", + "3 Not Available \n", + "4 Not Available \n", + "\n", + " Weather Normalized Site Electricity Intensity (kWh/ft²) \\\n", + "0 37.8 \n", + "1 24.8 \n", + "2 Not Available \n", + "3 Not Available \n", + "4 Not Available \n", + "\n", + " Weather Normalized Site Natural Gas Intensity (therms/ft²) \\\n", + "0 Not Available \n", + "1 2.4 \n", + "2 Not Available \n", + "3 Not Available \n", + "4 Not Available \n", + "\n", + " Weather Normalized Source EUI (kBtu/ft²) Fuel Oil #1 Use (kBtu) \\\n", + "0 614.2 Not Available \n", + "1 401.1 Not Available \n", + "2 Not Available Not Available \n", + "3 Not Available Not Available \n", + "4 Not Available Not Available \n", + "\n", + " Fuel Oil #2 Use (kBtu) Fuel Oil #4 Use (kBtu) Fuel Oil #5 & 6 Use (kBtu) \\\n", + "0 Not Available Not Available Not Available \n", + "1 1.96248472E7 Not Available Not Available \n", + "2 Not Available Not Available Not Available \n", + "3 Not Available Not Available Not Available \n", + "4 Not Available Not Available Not Available \n", + "\n", + " Diesel #2 Use (kBtu) District Steam Use 
(kBtu) Natural Gas Use (kBtu) \\\n", + "0 Not Available 5.15506751E7 Not Available \n", + "1 Not Available -3.914148026E8 933073441 \n", + "2 Not Available Not Available Not Available \n", + "3 Not Available Not Available Not Available \n", + "4 Not Available Not Available Not Available \n", + "\n", + " Weather Normalized Site Natural Gas Use (therms) \\\n", + "0 Not Available \n", + "1 9330734.4 \n", + "2 Not Available \n", + "3 Not Available \n", + "4 Not Available \n", + "\n", + " Electricity Use - Grid Purchase (kBtu) \\\n", + "0 38139374.2 \n", + "1 332365924 \n", + "2 Not Available \n", + "3 Not Available \n", + "4 Not Available \n", + "\n", + " Weather Normalized Site Electricity (kWh) \\\n", + "0 1.10827705E7 \n", + "1 9.62613121E7 \n", + "2 Not Available \n", + "3 Not Available \n", + "4 Not Available \n", + "\n", + " Total GHG Emissions (Metric Tons CO2e) \\\n", + "0 6962.2 \n", + "1 55870.4 \n", + "2 0 \n", + "3 0 \n", + "4 0 \n", + "\n", + " Direct GHG Emissions (Metric Tons CO2e) \\\n", + "0 0 \n", + "1 51016.4 \n", + "2 0 \n", + "3 0 \n", + "4 0 \n", + "\n", + " Indirect GHG Emissions (Metric Tons CO2e) \\\n", + "0 6962.2 \n", + "1 4854.1 \n", + "2 0 \n", + "3 0 \n", + "4 0 \n", + "\n", + " Property GFA - Self-Reported (ft²) Water Use (All Water Sources) (kgal) \\\n", + "0 762051 Not Available \n", + "1 3889181 Not Available \n", + "2 231342 Not Available \n", + "3 1305748 Not Available \n", + "4 179694 Not Available \n", + "\n", + " Water Intensity (All Water Sources) (gal/ft²) Source EUI (kBtu/ft²) \\\n", + "0 Not Available 619.4 \n", + "1 Not Available 404.3 \n", + "2 Not Available Not Available \n", + "3 Not Available Not Available \n", + "4 Not Available Not Available \n", + "\n", + " Release Date Water Required? 
DOF Benchmarking Submission Status \\\n", + "0 05/01/2017 05:32:03 PM No In Compliance \n", + "1 04/27/2017 11:23:27 AM No In Compliance \n", + "2 04/27/2017 11:23:27 AM No In Compliance \n", + "3 04/27/2017 11:23:27 AM No In Compliance \n", + "4 04/27/2017 11:23:27 AM No In Compliance \n", + "\n", + " Latitude Longitude Community Board Council District Census Tract \\\n", + "0 40.750791 -73.973963 6.0 4.0 88.0 \n", + "1 40.841402 -73.942568 12.0 10.0 251.0 \n", + "2 40.840427 -73.940249 12.0 10.0 251.0 \n", + "3 40.840746 -73.942854 12.0 10.0 255.0 \n", + "4 40.841559 -73.942528 12.0 10.0 255.0 \n", + "\n", + " NTA \n", + "0 Turtle Bay-East Midtown ... \n", + "1 Washington Heights South ... \n", + "2 Washington Heights South ... \n", + "3 Washington Heights South ... \n", + "4 Washington Heights South ... " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Read the data into a DataFrame\n", + "data = pd.read_csv('data/Energy_and_Water_Data_Disclosure_for_Local_Law_84_2017__Data_for_Calendar_Year_2016_.csv')\n", + "\n", + "data.head() # display the first rows of the DataFrame" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Details of the dataset are described in the PDF in the data folder" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Data Types and Missing Values\n", + "\n", + "`DataFrame.info()` gives a quick view of each column's data type and non-null count" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 11746 entries, 0 to 11745\n", + "Data columns (total 60 columns):\n", + "Order 11746 non-null int64\n", + "Property Id 11746 non-null int64\n", + "Property Name 11746 non-null object\n", + "Parent Property Id 11746 non-null object\n", + "Parent Property Name 11746 non-null object\n", + "BBL - 10 digits 11735 non-null object\n", + "NYC Borough, Block and Lot (BBL) self-reported 11746 non-null object\n", + "NYC Building 
Identification Number (BIN) 11746 non-null object\n", + "Address 1 (self-reported) 11746 non-null object\n", + "Address 2 11746 non-null object\n", + "Postal Code 11746 non-null object\n", + "Street Number 11622 non-null object\n", + "Street Name 11624 non-null object\n", + "Borough 11628 non-null object\n", + "DOF Gross Floor Area 11628 non-null float64\n", + "Primary Property Type - Self Selected 11746 non-null object\n", + "List of All Property Use Types at Property 11746 non-null object\n", + "Largest Property Use Type 11746 non-null object\n", + "Largest Property Use Type - Gross Floor Area (ft²) 11746 non-null object\n", + "2nd Largest Property Use Type 11746 non-null object\n", + "2nd Largest Property Use - Gross Floor Area (ft²) 11746 non-null object\n", + "3rd Largest Property Use Type 11746 non-null object\n", + "3rd Largest Property Use Type - Gross Floor Area (ft²) 11746 non-null object\n", + "Year Built 11746 non-null int64\n", + "Number of Buildings - Self-reported 11746 non-null int64\n", + "Occupancy 11746 non-null int64\n", + "Metered Areas (Energy) 11746 non-null object\n", + "Metered Areas (Water) 11746 non-null object\n", + "ENERGY STAR Score 11746 non-null object\n", + "Site EUI (kBtu/ft²) 11746 non-null object\n", + "Weather Normalized Site EUI (kBtu/ft²) 11746 non-null object\n", + "Weather Normalized Site Electricity Intensity (kWh/ft²) 11746 non-null object\n", + "Weather Normalized Site Natural Gas Intensity (therms/ft²) 11746 non-null object\n", + "Weather Normalized Source EUI (kBtu/ft²) 11746 non-null object\n", + "Fuel Oil #1 Use (kBtu) 11746 non-null object\n", + "Fuel Oil #2 Use (kBtu) 11746 non-null object\n", + "Fuel Oil #4 Use (kBtu) 11746 non-null object\n", + "Fuel Oil #5 & 6 Use (kBtu) 11746 non-null object\n", + "Diesel #2 Use (kBtu) 11746 non-null object\n", + "District Steam Use (kBtu) 11746 non-null object\n", + "Natural Gas Use (kBtu) 11746 non-null object\n", + "Weather Normalized Site Natural Gas Use (therms) 11746 
non-null object\n", + "Electricity Use - Grid Purchase (kBtu) 11746 non-null object\n", + "Weather Normalized Site Electricity (kWh) 11746 non-null object\n", + "Total GHG Emissions (Metric Tons CO2e) 11746 non-null object\n", + "Direct GHG Emissions (Metric Tons CO2e) 11746 non-null object\n", + "Indirect GHG Emissions (Metric Tons CO2e) 11746 non-null object\n", + "Property GFA - Self-Reported (ft²) 11746 non-null int64\n", + "Water Use (All Water Sources) (kgal) 11746 non-null object\n", + "Water Intensity (All Water Sources) (gal/ft²) 11746 non-null object\n", + "Source EUI (kBtu/ft²) 11746 non-null object\n", + "Release Date 11746 non-null object\n", + "Water Required? 11628 non-null object\n", + "DOF Benchmarking Submission Status 11716 non-null object\n", + "Latitude 9483 non-null float64\n", + "Longitude 9483 non-null float64\n", + "Community Board 9483 non-null float64\n", + "Council District 9483 non-null float64\n", + "Census Tract 9483 non-null float64\n", + "NTA 9483 non-null object\n", + "dtypes: float64(6), int64(6), object(48)\n", + "memory usage: 5.4+ MB\n" + ] + } + ], + "source": [ + "data.info()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Most columns above report every value as non-null, but that does not mean nothing is missing (np.nan): missing values may simply be encoded with a different marker. The preview above shows many cells containing Not Available, so Not Available is most likely the missing-value marker" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "# Replace every occurrence of 'Not Available' with NumPy's NaN\n", + "data = data.replace({'Not Available': np.nan})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": 
"ipython3", + "version": "3.7.3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}