# Store location data Add a sketchnote if possible/appropriate ![Embed a video here if available](video-url) ## Pre-lecture quiz [Pre-lecture quiz](https://brave-island-0b7c7f50f.azurestaticapps.net/quiz/23) ## Introduction In the last lesson, you learned how to use a GPS sensor to capture location data. To use this data to visualize the location of a truck laden with food, and it's journey, it needs to be sent to an IoT service in the cloud, and then stored somewhere. In this lesson you will learn about the different ways to store IoT data, and learn how to store data from your IoT service using serverless code. In this lesson we'll cover: * [Structured and unstructured data](#structured-and-unstructured-data) * [Send GPS data to an IoT Hub](#send-gps-data-to-an-iot-hub) * [Hot, warm, and cold paths](#hot-warm-and-cold-paths) * [Handle GPS events using serverless code](#handle-gps-events-using-serverless-code) * [Azure Storage Accounts](#azure-storage-accounts) * [Connect your serverless code to storage](#connect-your-serverless-code-to-storage) ## Structured and unstructured data Computer systems deal with data, and this data comes in all manner of different shapes and sizes. It can vary from single numbers, to large amounts of text, to videos and images, and to IoT data. Data can usually be divided into one of two categories - *structured* data and *unstructured* data. * **Structured data** is data with a well-defined, rigid structure that doesn't change and usually maps to tables of data with relationships. One example is a persons details including their name, date of birth and address. * **Unstructured data** is data without a well-defined, rigid structure, including data that can change structure frequently. One example is documents such as written documents or spreadsheets. ✅ Do some research: Can you think of some other examples of structured and unstructured data? > 💁 There is also semi-structured data that is structured but doesn't fit into fixed tables of data IoT data is usually considered to be unstructured data. Imagine you were adding IoT devices to a fleet of vehicles for a large commercial farm. You might want to use different devices for different types of vehicle. For example: * For farm vehicles like tractors you want GPS data to ensure they are working on the correct fields * For delivery trucks transporting food to warehouses you want GPS data as well as speed and acceleration data to ensure the driver is driving safely, and drive identity and start/stop data to ensure drive compliance with local laws on working hours * For refrigerated trucks you also want temperature data to ensure the food doesn't get too hot or cold and spoil in transit This data can change constantly. For example, if the IoT device is in a truck cab, then the data it sends may change as the trailer changes, for example only sending temperature data when a refrigerated trailer is used. ✅ What other IoT data might be captured? Think about the kinds of loads trucks can carry, as well as maintenance data. This data varies from vehicle to vehicle, but it all gets sent to the same IoT service for processing. The IoT service needs to be able to process this unstructured data, storing it in a way that allows it to be searched or analyzed, but works with different structures to this data. ### SQL vs NoSQL storage Databases are services that allow you to store and query data. Database come in two types - SQL and NoSQL #### SQL databases The first databases were Relational Database Management Systems (RDBMS), or relational database. These are also known as SQL databases after the Structured Query Language (SQL) used to interact with them to add, remove, update or query data. These database consist of a schema - a well-defined set of tables of data, similar to a spreadsheet. Each table has multiple named columns. When you insert data, you add a row to the table, putting values into each of the columns. This keeps the data in a very rigid structure - although you can leave columns empty, if you want to add a new column you have to do this on the database, populating values for the existing rows. These databases are relational - in that one table can have a relationship to another. ![A relational database with the ID of the User table relating to the user ID column of the purchases table, and the ID of the products table relating to the product ID of the purchases table](../../../images/sql-database.png) For example, if you stored a users personal details in a table, you would have some kind of internal unique ID per user that is used in a row in a table that contains the users name and address. If you then wanted to store other details about that user, such as their purchases, in another table, you would have one column in the new table for that users ID. When you look up a user, you can use their ID to get their personal details from one table, and their purchases from another. SQL databases are ideal for storing structured data, and for when you want to ensure the data matches your schema. ✅ If you haven't used SQL before, take a moment to read up on it on the [SQL page on Wikipedia](https://wikipedia.org/wiki/SQL). Some well known SQL databases are Microsoft SQL Server, MySQL, and PostgreSQL. ✅ Do some research: Read up on some of these SQL databases and their capabilities. #### NoSQL database NoSQL databases are called NoSQL because they don't have the same rigid structure of SQL databases. They are also known as document databases as they can store unstructured data such as documents. > 💁 Despite their name, some NoSQL databases allow you to use SQL to query the data. ![Documents in folders in a NoSQL database](../../../images/noqsl-database.png) NoSQL database do not have a pre-defined schema that limits how data is stored, instead you can insert any unstructured data, usually using JSON documents. These documents can be organized into folders, similar to files on your computer. Each document can have different fields from other documents - for example if you were storing IoT data from your farm vehicles, some may have fields for accelerometer and speed data, others may have fields for the temperature in the trailer. If you were to add a new truck type, such as one with built in scales to track the weight of produce carried, then your IoT device could add this new field and it could be stored without any changes to the database. Some well known NoSQL databases include Azure CosmosDB, MongoDB, and CouchDB. ✅ Do some research: Read up on some of these NoSQL databases and their capabilities. In this lesson, you will be using NoSQL storage to store IoT data. ## Send GPS data to an IoT Hub In the last lesson you captured GPS data from a GPS sensor connected to your IoT device. To store this IoT data in the cloud, you need to send it to an IoT service. Once again, you will be using Azure IoT Hub, the same IoT cloud service you used in the previous project. ![Sending GPS telemetry from an IoT device to IoT Hub](../../../images/gps-telemetry-iot-hub.png) ***Sending GPS telemetry from an IoT device to IoT Hub. GPS by mim studio / Microcontroller by Template - all from the [Noun Project](https://thenounproject.com)*** ### Task - send GPS data to an IoT Hub 1. Create a new IoT Hub using the free tier. > ⚠️ You can refer to the [instructions for creating an IoT Hub from project 2, lesson 4](../../../2-farm/lessons/4-migrate-your-plant-to-the-cloud/README.md#create-an-iot-service-in-the-cloud) if needed. Remember to create a new Resource Group. Name the new Resource Group `gps-sensor`, and the new IoT Hub a unique name based on `gps-sensor`, such as `gps-sensor-`. > 💁 If you still have your IoT Hub from the previous project, you can re-use it. Remember to use the name of this IoT Hub and the Resource Group it is in when creating other services. 1. Add a new device to the IoT Hub. Call this device `gps-sensor`. Grab the connection string for the device. 1. Update your device code to send the GPS data to the new IoT Hub using the device connection string from the previous step. > ⚠️ You can refer to the [instructions for connecting your device to an IoT from project 2, lesson 4](../../../2-farm/lessons/4-migrate-your-plant-to-the-cloud/README.md#connect-your-device-to-the-iot-service) if needed. 1. When you send the GPS data, do it as JSON in the following format: ```json { "gps" : { "lat" : , "lon" : } } ``` 1. Send GPS data every minute so you don't use up your daily message allocation. If you are using the Wio Terminal, remember to add all the necessary libraries, and set the time using an NTP server. Your code will also need to ensure that it has read all the data from the serial port before sending the GPS location, using the existing code from the last lesson. Use the following code to construct the JSON document: ```cpp DynamicJsonDocument doc(1024); doc["gps"]["lat"] = gps.location.lat(); doc["gps"]["lon"] = gps.location.lng(); ``` If you are using a Virtual IoT device, remember to install all the needed libraries using a virtual environment. For both the Raspberry Pi and Virtual IoT device, use the existing code from the last lesson to get the latitude and longitude values, then send them in the correct JSON format with the following code: ```python message_json = { "gps" : { "lat":lat, "lon":lon } } print("Sending telemetry", message_json) message = Message(json.dumps(message_json)) ``` > 💁 You can find this code in the [code/wio-terminal](code/wio-terminal), [code/pi](code/pi) or [code/virtual-device](code/virtual-device) folder. Run your device code and ensure messages are flowing into IoT Hub using the `az iot hub monitor-events` CLI command. ## Hot, warm, and cold paths Data that flows from an IoT device to the cloud is not always processed in real time. Some data needs real time processing, other data can be processed a short time later, and other data can be processed much later. The flow of data to different services that process the data at different times is referred to hot, warm and cold paths. ### Hot path The hot path refers to data that needs to be processed in real time or near real time. You would use hot path data for alerts, such as getting alerts that a vehicle is approaching a depot, or that the temperature in a refrigerated truck is too high. To use hot path data, your code would respond to events as soon as they are received by your cloud services. ### Warm path The warm path refers to data that can be processed a short while after being received, for example for reporting or short term analytics. You would use warm path data for daily reports on vehicle mileage, using data gathered the previous day. Warm path data is stored once it is received by the cloud service inside some kind of storage that can be quickly accessed. ### Cold path THe cold path refers to historic data, storing data for the long term to be processed whenever needed. For example, you could use the cold path to get annual mileage reports for vehicles, or run analytics on routes to find the most optimal route to reduce fuel costs. Cold path data is stored in data warehouses - databases designed for storing large amounts of data that will never change and can be queried quickly and easily. You would normally have a regular job in your cloud application that would run at a regular time each day, week, or month to move data from warm path storage into the data warehouse. ✅ Think about the data you have captured so far in these lessons. Is it hot, warm or cold path data? ## Handle GPS events using serverless code Once data is flowing into your IoT Hub, you can write some serverless code to listen for events published to the Event-Hub compatible endpoint. This is the warm path - this data will be stored and used in the next lesson for reporting on the journey. ![Sending GPS telemetry from an IoT device to IoT Hub, then to Azure Functions via an event hub trigger](../../../images/gps-telemetry-iot-hub-functions.png) ***Sending GPS telemetry from an IoT device to IoT Hub, then to Azure Functions via an event hub trigger. GPS by mim studio / Microcontroller by Template - all from the [Noun Project](https://thenounproject.com)*** ### Task - handle GPS events using serverless code 1. Create an Azure Functions app using the Azure Functions CLI. Use the Python runtime, and create it in a folder called `gps-trigger`, and use the same name for the Functions App project name. Make sure you create a virtual environment to use for this. > ⚠️ You can refer to the [instructions for creating an Azure Functions Project from project 2, lesson 5](../../../2-farm/lessons/5-migrate-application-to-the-cloud/README.md#create-a-serverless-application) if needed. 1. Add an IoT Hub event trigger that uses the IoT Hub's Event Hub compatible endpoint. > ⚠️ You can refer to the [instructions for creating an IoT Hub event trigger from project 2, lesson 5](../../../2-farm/lessons/5-migrate-application-to-the-cloud/README.md#create-an-iot-hub-event-trigger) if needed. 1. Set the Event Hub compatible endpoint connection string in the `local.settings.json` file, and use the key for that entry in the `function.json` file. 1. Use the Azurite app as a local storage emulator 1. Run your functions app to ensure it is receiving events from your GPS device. Make sure your IoT device is also running and sending GPS data. ```output Python EventHub trigger processed an event: {"gps": {"lat": 47.73481, "lon": -122.25701}} ``` ## Azure Storage Accounts ![The Azure Storage logo](../../../images/azure-storage-logo.png) Azure Storage Accounts is a general purpose storage service that can store data in a variety of different ways. You can store data as blobs, in queues, in tables, or as files, and all at the same time. ### Blob storage The word *Blob* means binary large objects, but has become the term for any unstructured data. You can store any data in blob storage, from JSON documents containing IoT data, to image and movie files. Blob storage has the concept of *containers*, named buckets that you can store data in, similar to tables in a relational database. These containers can have one or more folders to store blobs, and each folder can contain other folders, similar to how files are stored on your computer hard disk. You will use blob storage in this lesson to store IoT data. ✅ Do some research: Read up on [Azure Blob Storage](https://docs.microsoft.com/azure/storage/blobs/storage-blobs-overview?WT.mc_id=academic-17441-jabenn) ### Table storage Table storage allows you to store semi-structured data. Table storage is actually a NoSQL database, so doesn't require a defined set of tables up front, but it is designed to store data in one or more tables, with unique keys to define each row. ✅ Do some research: Read up on [Azure Table Storage](https://docs.microsoft.com/azure/storage/tables/table-storage-overview?WT.mc_id=academic-17441-jabenn) ### Queue storage Queue storage allows you to store messages of up to 64KB in size in a queue. You can add messages to the back of the queue, and read them off the front. Queues store messages indefinitely as long as there is still storage space, so it allows messages to be stored long term. then read off when needed. For example, if you wanted to run a monthly job to process GPS data you could add it to a queue every day for a month, then at the end of the month process all the messages off the queue. ✅ Do some research: Read up on [Azure Queue Storage](https://docs.microsoft.com/azure/storage/queues/storage-queues-introduction?WT.mc_id=academic-17441-jabenn) ### File storage File storage is storage of files in the cloud, and any apps or devices can connect using industry standard protocols. You can write files to file storage, then mount it as a drive on your PC or Mac. ✅ Do some research: Read up on [Azure File Storage](https://docs.microsoft.com/azure/storage/files/storage-files-introduction?WT.mc_id=academic-17441-jabenn) ## Connect your serverless code to storage Your function app now needs to connect to blob storage to store the messages from the IoT Hub. There's 2 ways to do this: * Inside the function code, connect to blob storage using the blob storage Python SDK and write the data as blobs * Use an output function binding to bind the return value of the function to blob storage and have the blob saved automatically In this lesson, you will use the Python SDK to see how to interact with blob storage. ![Sending GPS telemetry from an IoT device to IoT Hub, then to Azure Functions via an event hub trigger, then saving it to blob storage](../../../images/save-telemetry-to-storage-from-functions.png) ***Sending GPS telemetry from an IoT device to IoT Hub, then to Azure Functions via an event hub trigger, then saving it to blob storage. GPS by mim studio / Microcontroller by Template - all from the [Noun Project](https://thenounproject.com)*** The data will be saved as a JSON blob with the following format: ```json { "device_id": , "timestamp" :