You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
Data-Science-For-Beginners/translations/en/2-Working-With-Data/06-non-relational/README.md

13 KiB

Working with Data: Non-Relational Data

 Sketchnote by (@sketchthedocs)
Working with NoSQL Data - Sketchnote by @nitya

Pre-Lecture Quiz

Data isn't limited to relational databases. This lesson focuses on non-relational data and introduces the basics of spreadsheets and NoSQL.

Spreadsheets

Spreadsheets are a widely used tool for storing and analyzing data because they are easy to set up and use. In this lesson, you'll learn the fundamental components of a spreadsheet, along with formulas and functions. The examples will use Microsoft Excel, but most spreadsheet software has similar features and terminology.

An empty Microsoft Excel workbook with two worksheets

A spreadsheet is a file that can be accessed on a computer, device, or cloud-based file system. The software itself might be browser-based or require installation as an application or app. In Excel, these files are referred to as workbooks, and this term will be used throughout the lesson.

A workbook contains one or more worksheets, which are labeled with tabs. Each worksheet consists of rectangles called cells, where the actual data is stored. A cell is the intersection of a row and column, with columns labeled alphabetically and rows labeled numerically. Some spreadsheets include headers in the first few rows to describe the data in the cells.

Using these basic elements of an Excel workbook, we'll explore an example from Microsoft Templates focused on inventory management to demonstrate additional spreadsheet features.

Managing an Inventory

The spreadsheet file named "InventoryExample" is a formatted inventory sheet containing three worksheets, with tabs labeled "Inventory List," "Inventory Pick List," and "Bin Lookup." Row 4 of the Inventory List worksheet serves as the header, describing the value of each cell in the corresponding column.

A highlighted formula from an example inventory list in Microsoft Excel

Sometimes, a cell's value depends on other cells to calculate its value. For example, the Inventory List spreadsheet tracks the cost of each item, but what if we need to calculate the total value of the inventory? Formulas perform operations on cell data, and in this case, a formula is used in the Inventory Value column to calculate the value of each item by multiplying the quantity (under the QTY header) by the cost (under the COST header). Double-clicking or highlighting a cell reveals the formula. Formulas always start with an equals sign, followed by the calculation or operation.

A highlighted function from an example inventory list in Microsoft Excel

To find the total inventory value, we can use another formula to sum up all the values in the Inventory Value column. While adding each cell manually is possible, it would be tedious. Excel provides functions, which are predefined formulas for performing calculations. Functions require arguments, which are the values needed for the calculation. If a function requires multiple arguments, they must be listed in the correct order to ensure accurate results. In this example, the SUM function is used to add up the values in the Inventory Value column, with the total displayed in row 3, column B (B3).

NoSQL

NoSQL is a broad term for various methods of storing non-relational data. It can mean "non-SQL," "non-relational," or "not only SQL." These database systems are generally categorized into four types.

Graphical representation of a key-value data store showing 4 unique numerical keys that are associated with 4 various values

Source from Michał Białecki Blog

Key-value databases store data as pairs of unique keys and their associated values. These pairs are organized using a hash table with a suitable hashing function.

Graphical representation of a graph data store showing the relationships between people, their interests and locations

Source from Microsoft

Graph databases represent relationships between data as nodes and edges. A node represents an entity, such as a student or a bank statement, while edges represent the relationships between entities. Both nodes and edges have properties that provide additional information.

Graphical representation of a columnar data store showing a customer database with two column families named Identity and Contact Info

Columnar data stores organize data into rows and columns, similar to relational databases. However, columns are grouped into column families, where all data within a column family is related and can be retrieved or modified as a single unit.

Document Data Stores with the Azure Cosmos DB

Document data stores expand on the concept of key-value stores by using fields and objects. This section explores document databases using the Cosmos DB emulator.

Cosmos DB fits the "Not Only SQL" definition, as its document database uses SQL for querying data. The previous lesson on SQL covers the basics of the language, and we'll apply similar queries to a document database here. The Cosmos DB Emulator allows you to create and explore a document database locally. Learn more about the Emulator here.

A document consists of fields and object values, where fields describe the object values. Below is an example of a document.

{
    "firstname": "Eva",
    "age": 44,
    "id": "8c74a315-aebf-4a16-bb38-2430a9896ce5",
    "_rid": "bHwDAPQz8s0BAAAAAAAAAA==",
    "_self": "dbs/bHwDAA==/colls/bHwDAPQz8s0=/docs/bHwDAPQz8s0BAAAAAAAAAA==/",
    "_etag": "\"00000000-0000-0000-9f95-010a691e01d7\"",
    "_attachments": "attachments/",
    "_ts": 1630544034
}

Key fields in this document include firstname, id, and age. The other fields with underscores are automatically generated by Cosmos DB.

Exploring Data with the Cosmos DB Emulator

You can download and install the emulator for Windows here. For macOS and Linux, refer to this documentation.

The Emulator opens in a browser window, where the Explorer view lets you navigate documents.

The Explorer view of the Cosmos DB Emulator

If you're following along, click "Start with Sample" to generate a sample database called SampleDB. Expanding SampleDB reveals a container named Persons. A container holds a collection of items, which are the documents within it. You can explore the four individual documents under Items.

Exploring sample data in the Cosmos DB Emulator

Querying Document Data with the Cosmos DB Emulator

You can query the sample data by clicking the "New SQL Query" button (second button from the left).

SELECT * FROM c retrieves all documents in the container. To find everyone younger than 40, add a WHERE clause:

SELECT * FROM c where c.age < 40

Running a SELECT query on sample data in the Cosmos DB Emulator to find documents that have an age field value that is less than 40

The query returns two documents, both with age values less than 40.

JSON and Documents

If you're familiar with JavaScript Object Notation (JSON), you'll notice that documents resemble JSON. A PersonsData.json file in this directory contains additional data that you can upload to the Persons container in the Emulator using the Upload Item button.

In many cases, APIs that return JSON data can directly store this data in document databases. Below is another document, representing tweets from the Microsoft Twitter account. This data was retrieved using the Twitter API and inserted into Cosmos DB.

{
    "created_at": "2021-08-31T19:03:01.000Z",
    "id": "1432780985872142341",
    "text": "Blank slate. Like this tweet if youve ever painted in Microsoft Paint before. https://t.co/cFeEs8eOPK",
    "_rid": "dhAmAIUsA4oHAAAAAAAAAA==",
    "_self": "dbs/dhAmAA==/colls/dhAmAIUsA4o=/docs/dhAmAIUsA4oHAAAAAAAAAA==/",
    "_etag": "\"00000000-0000-0000-9f84-a0958ad901d7\"",
    "_attachments": "attachments/",
    "_ts": 1630537000

Key fields in this document include created_at, id, and text.

🚀 Challenge

A TwitterData.json file is available for upload to the SampleDB database. It's recommended to add it to a separate container. To do this:

  1. Click the "New Container" button in the top right.
  2. Select the existing database (SampleDB), create a container ID for the new container.
  3. Set the partition key to /id.
  4. Click OK (you can ignore the other settings since this is a small dataset running locally).
  5. Open your new container and upload the Twitter Data file using the Upload Item button.

Try running a few SELECT queries to find documents containing "Microsoft" in the text field. Hint: Use the LIKE keyword.

Post-lecture quiz

Review & Self Study

Assignment

Soda Profits


Disclaimer:
This document has been translated using the AI translation service Co-op Translator. While we strive for accuracy, please note that automated translations may contain errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is recommended. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.