data sources

pull/38/head
Jasmine 4 years ago
parent 26277a1a14
commit 11ec381b78

@ -1,15 +1,14 @@
# Defining Data
## Introduction
This lesson focuses on identifying and classifying data by its characteristics and its sources.
Data are facts, information, observations and measurements that are used to make discoveries and to support informed decisions. A data point is a single unit of data within a dataset, which is a collection of data points. Datasets come in different formats and structures, which usually depend on their source, or where the data came from. For example, a company's monthly earnings might be in a spreadsheet, but hourly heart rate data from a smartwatch may be in [JSON](https://stackoverflow.com/a/383699) format. It's common for data scientists to work with different types of data within a dataset.
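To make the terms concrete, here is a minimal Python sketch of a dataset and a data point, using the smartwatch example. The field names (`time`, `bpm`) and the readings are made up for illustration; a real device export will likely look different.

```python
import json

# Hypothetical smartwatch export: field names and values are invented for this sketch.
raw = """
[
  {"time": "2021-03-01T08:00", "bpm": 62},
  {"time": "2021-03-01T09:00", "bpm": 71},
  {"time": "2021-03-01T10:00", "bpm": 68}
]
"""

dataset = json.loads(raw)   # the dataset: a collection of data points
data_point = dataset[0]     # a data point: one hourly reading

# Summarize the dataset with a simple average of the readings.
average_bpm = sum(point["bpm"] for point in dataset) / len(dataset)
print(len(dataset), data_point["bpm"], average_bpm)
```

Each dictionary in the list is one data point, and the list as a whole is the dataset.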
## Pre-Lecture Quiz
[Pre-lecture quiz]()
## How Data is Described
*Raw data* are data that have come from their source in their initial state and have not been analyzed or organized. To make sense of what is happening in a dataset, it needs to be organized into a format that can be understood by humans as well as by the technology they may use to analyze it further. The structure of a dataset describes how it's organized and can be classified as structured, unstructured or semi-structured. The details vary depending on the source, but every dataset ultimately fits into one of these three categories.
@ -36,11 +35,11 @@ Semi-structured data has features that make it a combination of structured and u
Examples of semi-structured data: HTML, CSV files, JavaScript Object Notation (JSON)
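As a rough illustration of how two of these formats carry their own labels, the following Python sketch reads a small CSV snippet and a small JSON snippet using only the standard library. The names and values are invented for the example.

```python
import csv
import io
import json

# Hypothetical CSV content: the header row labels each column.
csv_text = "name,city\nAda,London\nGrace,New York\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Hypothetical JSON content: keys label each value, and records can nest.
json_text = '{"name": "Ada", "languages": ["English", "French"], "address": {"city": "London"}}'
record = json.loads(json_text)

print(rows[1]["city"])            # value looked up by column label
print(record["address"]["city"])  # value looked up through nested keys
```

In both cases the labels travel with the data, but nothing enforces a fixed set of fields the way a database table would.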
## Sources of Data
A data source is the initial location where the data was generated, or where it "lives", and will vary based on how and when the data was collected. Data generated by its user(s) are known as primary data, while secondary data comes from a source that has collected data for general use. For example, observations collected by a group of scientists in a rainforest would be primary data for those scientists; if they decide to share the observations with other scientists, the data would be secondary to those who use it.
Databases are a common source and rely on a database management system to host and maintain the data, which users explore with commands called queries. Files used as data sources can be audio, image, and video files, as well as spreadsheets such as Excel workbooks. The internet is a common location for hosting data, where both databases and files can be found. Application programming interfaces, also known as APIs, allow programmers to share data with external users over the internet, while web scraping extracts data from a web page. The [lessons in Working with Data](/2-Working-With-Data) focus on how to use various data sources.
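As a small sketch of what a query looks like, the example below uses Python's built-in `sqlite3` module with an in-memory database. The `earnings` table, its columns, and the amounts are hypothetical and exist only for this illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table of monthly earnings (names and values are made up).
conn.execute("CREATE TABLE earnings (month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO earnings VALUES (?, ?)",
    [("Jan", 1200.0), ("Feb", 950.0), ("Mar", 1430.0)],
)

# A query: ask the database for the months that earned more than 1000,
# largest amount first.
rows = conn.execute(
    "SELECT month, amount FROM earnings WHERE amount > ? ORDER BY amount DESC",
    (1000,),
).fetchall()
conn.close()

print(rows)
```

The database management system does the filtering and sorting here; the program only states what it wants through the query.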
## 🚀 Challenge
