adding introduction to using computing power

pull/681/head
Simsuk 2 years ago
parent 7f34728f2f
commit c78b2ecca9

@ -32,22 +32,7 @@
- [Saving and Loading Models](https://aman.ai/primers/pytorch/#saving-and-loading-models)
- [Using the GPU](https://aman.ai/primers/pytorch/#using-the-gpu)
- [Painless Debugging](https://aman.ai/primers/pytorch/#painless-debugging)
- [Vision: Predicting Labels from Images of Hand Signs](https://aman.ai/primers/pytorch/#vision-predicting-labels-from-images-of-hand-signs)
- [Goals of This Tutorial](https://aman.ai/primers/pytorch/#goals-of-this-tutorial-1)
- [Problem Setup](https://aman.ai/primers/pytorch/#problem-setup)
- [Structure of the Dataset](https://aman.ai/primers/pytorch/#structure-of-the-dataset)
- [Creating a PyTorch Dataset](https://aman.ai/primers/pytorch/#creating-a-pytorch-dataset)
- [Loading Data Batches](https://aman.ai/primers/pytorch/#loading-data-batches)
- [Convolutional Network Model](https://aman.ai/primers/pytorch/#convolutional-network-model)
- [Resources](https://aman.ai/primers/pytorch/#resources-1)
- [NLP: Named Entity Recognition (NER) Tagging](https://aman.ai/primers/pytorch/#nlp-named-entity-recognition-ner-tagging)
- [Goals of This Tutorial](https://aman.ai/primers/pytorch/#goals-of-this-tutorial-2)
- [Problem Setup](https://aman.ai/primers/pytorch/#problem-setup-1)
- [Structure of the Dataset](https://aman.ai/primers/pytorch/#structure-of-the-dataset-1)
- [Loading Text Data](https://aman.ai/primers/pytorch/#loading-text-data)
- [Preparing a Batch](https://aman.ai/primers/pytorch/#preparing-a-batch)
- [Recurrent Network Model](https://aman.ai/primers/pytorch/#recurrent-network-model)
- [Writing a Custom Loss Function](https://aman.ai/primers/pytorch/#writing-a-custom-loss-function)
- [Selected Methods](https://aman.ai/primers/pytorch/#selected-methods)
- [Tensor Shape/size](https://aman.ai/primers/pytorch/#tensor-shapesize)
- [Initialization](https://aman.ai/primers/pytorch/#initialization)
@ -85,154 +70,40 @@
[![Colab Notebook](https://aman.ai/primers/assets/colab-open.svg)](https://colab.research.google.com/github/amanchadha/aman-ai/blob/master/pytorch.ipynb)
## Introduction
- This tutorial offers an overview of the preliminary setup, training process, loss functions and optimizers in PyTorch.
- We cover a practical demonstration of PyTorch with an example from Vision and another from NLP.
## Overview
- PyTorch is an open-source machine learning library developed by Facebook's AI Research lab. It provides a flexible platform for building deep learning models and is known for its dynamic computational graph, which makes it particularly suitable for research. Originating as a research-centric tool, PyTorch has gained immense popularity among researchers and developers alike due to its ease of use, flexibility, and powerful capabilities.
- This tutorial gives an overview of PyTorch for deep learning model building, training, and evaluation, with practical examples, as well as a description of possible project environments.
## Getting Started
### Creating a Virtual Environment
- Since different projects you'll be working on utilize different versions of Python modules, it is good practice to maintain a separate virtual environment for each project.
- [Python Setup: Remote vs. Local](https://aman.ai/primers/python-setup) offers an in-depth coverage of the various remote and local options available.
### Using a GPU?
- Note that your GPU needs to be set up first (drivers, CUDA and CuDNN).
- For PyTorch, code changes are needed to support a GPU (unlike TensorFlow, which can handle GPU usage transparently); follow the instructions [here](https://pytorch.org/docs/stable/notes/cuda.html).
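- The core pattern is small; below is a minimal sketch using the standard `torch` device API (the model and tensor here are placeholders for illustration):
```
import torch
import torch.nn as nn

# use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2).to(device)   # move the model's parameters to the device
x = torch.randn(4, 10).to(device)     # move each batch of data to the same device
output = model(x)                     # computation now runs on the selected device
```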
### Recommended Code Structure
- We recommend the following code hierarchy to organize your data, model code, experiments, results and logs:
```
data/
    train/
    dev/
    test/
experiments/
model/
    *.py
build_dataset.py
train.py
search_hyperparams.py
synthesize_results.py
evaluate.py
```
- The purpose each file or directory serves:
    - `data/`: will contain all the data of the project (generally not stored on GitHub), with an explicit train/dev/test split.
    - `experiments/`: contains the different experiments (will be explained in the [following](https://aman.ai/primers/pytorch/#running-experiments) section).
    - `model/`: module defining the model and functions used in train or eval. Different for our PyTorch and TensorFlow examples.
    - `build_dataset.py`: creates or transforms the dataset, and builds the train/dev/test split.
    - `train.py`: trains the model on the input data, and evaluates each epoch on the dev set.
    - `search_hyperparams.py`: runs `train.py` multiple times with different hyperparameters.
    - `synthesize_results.py`: explores the different experiments in a directory and displays a nice table of the results.
    - `evaluate.py`: evaluates the model on the test set (should be run once at the end of your project).
### Running Experiments
- To train a model on the data, the recommended user-interface for `train.py` should be:
```
python train.py --model_dir experiments/base_model
```
- We need to pass the model directory as an argument; the hyperparameters are stored there in a JSON file named `params.json`. Different experiments will be stored in different directories, each with their own `params.json` file. Here is an example:
`experiments/base_model/params.json`:
```
{
"learning_rate": 1e-3,
"batch_size": 32,
"num_epochs": 20
}
```
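- To make these hyperparameters available inside `train.py`, they can be read into a small helper object. The `Params` class below is an illustrative sketch, not a library API:
```
import json

class Params:
    """Loads hyperparameters from a JSON file as attributes."""
    def __init__(self, json_path):
        with open(json_path) as f:
            self.__dict__.update(json.load(f))

params = Params("experiments/base_model/params.json")
print(params.learning_rate, params.batch_size, params.num_epochs)
```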
- The structure of `experiments` after running a few different models might look like this (try to give meaningful names to the directories depending on the experiment you are running):
```
experiments/
    base_model/
        params.json
        ...
    learning_rate/
        lr_0.1/
            params.json
        lr_0.01/
            params.json
    batch_norm/
        params.json
```
Each directory after training will contain multiple things:
- `params.json`: the list of hyperparameters, in JSON format
- `train.log`: the training log (everything we print to the console)
- `train_summaries`: train summaries for TensorBoard (TensorFlow only)
- `eval_summaries`: eval summaries for TensorBoard (TensorFlow only)
- `last_weights`: weights saved from the 5 last epochs
- `best_weights`: best weights (based on dev accuracy)
#### Training and Evaluation
- To train a model with the parameters provided in the configuration file `experiments/base_model/params.json`, the recommended user-interface is:
```
python train.py --model_dir experiments/base_model
```
- Once training is done, we can evaluate on the test set using:
```
python evaluate.py --model_dir experiments/base_model
```
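- Inside `train.py`, the `--model_dir` flag can be handled with `argparse`; the snippet below is a plausible sketch of such an entry point rather than the tutorial's verbatim code:
```
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('--model_dir', default='experiments/base_model',
                    help="Directory containing params.json")
args = parser.parse_args()

json_path = os.path.join(args.model_dir, 'params.json')
assert os.path.isfile(json_path), "No params.json found at {}".format(json_path)
```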
#### Hyperparameter Search
- We provide an example that will call `train.py` with different values of learning rate. We first create a directory with a `params.json` file that contains the other hyperparameters.
```
experiments/
    learning_rate/
        params.json
```
- Next, call `python search_hyperparams.py --parent_dir experiments/learning_rate` to train and evaluate a model with the different values of learning rate defined in `search_hyperparams.py`. This will create a new directory for each experiment under `experiments/learning_rate/`.
- The output would resemble the hierarchy below:
```
experiments/
    learning_rate/
        learning_rate_0.001/
            metrics_eval_best_weights.json
        learning_rate_0.01/
            metrics_eval_best_weights.json
        ...
```
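- For reference, here is a minimal sketch of what `search_hyperparams.py` could look like under this layout (the set of learning rates and the job-launching mechanism are assumptions):
```
import json
import os
import subprocess
import sys

parent_dir = 'experiments/learning_rate'
with open(os.path.join(parent_dir, 'params.json')) as f:
    params = json.load(f)

for lr in [0.001, 0.01]:
    # create a sub-directory holding the modified hyperparameters
    job_dir = os.path.join(parent_dir, 'learning_rate_{}'.format(lr))
    os.makedirs(job_dir, exist_ok=True)
    params['learning_rate'] = lr
    with open(os.path.join(job_dir, 'params.json'), 'w') as f:
        json.dump(params, f)

    # launch train.py on this configuration
    subprocess.run([sys.executable, 'train.py', '--model_dir', job_dir], check=True)
```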
#### Display the Results of Multiple Experiments
- If you want to aggregate the metrics computed in each experiment (the `metrics_eval_best_weights.json` files), the recommended user-interface is:
```
python synthesize_results.py --parent_dir experiments/learning_rate
```
- It will display a table synthesizing the results, like the one below, which is compatible with markdown:
|   | accuracy | loss |
| --- | --- | --- |
| base\_model | 0.989 | 0.0550 |
| learning\_rate/learning\_rate\_0.01 | 0.939 | 0.0324 |
| learning\_rate/learning\_rate\_0.001 | 0.979 | 0.0623 |
## Options for Utilising a GPU
- When diving into PyTorch, you have multiple options in terms of working environments. Two of the most popular choices are Visual Studio Code (VSCode) and Google Colab.
### Visual Studio Code (VSCode)
- VSCode is a free, open-source code editor developed by Microsoft. It supports a variety of programming languages and has a rich ecosystem of extensions, including support for Python and PyTorch.
- **GPU usage:** VSCode utilizes your local machine's GPU, so performance depends on your hardware. If you have a local GPU, you can set up PyTorch to use it directly from VSCode; this might require additional configuration (proper drivers and a CUDA installation), especially for CUDA compatibility.
### Google Colab
- Colab offers cloud-based GPUs, which can be especially beneficial if your local machine doesn't have a powerful GPU, or any GPU at all. It is essentially a Jupyter notebook environment that requires no setup. However, there are usage limits to be aware of: prolonged or heavy usage might lead to temporary restrictions, and a longer training run might stop unexpectedly due to lack of memory. **_Unfortunately, there is only a limited amount of computing power without credits, and there is no free trial._**
- **GPU usage:** Google Colab provides free GPU access. To enable it, go to Runtime > Change runtime type and select GPU under hardware accelerator.
- Should you prefer using VSCode or another IDE such as PyCharm but need the GPU performance of Google Colab, there are ways to connect from VSCode to your account. For this, please refer to other tutorials:
    - [Connecting to Google Colab from VSCode](https://saturncloud.io/blog/is-it-possible-to-connect-vscode-on-a-local-machine-with-google-colab-the-free-service-runtime/#:~:text=In%20conclusion%2C%20it%20is%20possible,GPU%20runtime%20offered%20by%20Colab.)
### Free GPU Power? Use an Azure Student Account! 👌👌👌
- Microsoft Azure gives students $100 of free credit, which might come in handy when working with larger models. Creating multiple accounts for this purpose can cover most or all of the processing needs of the group in the AI Track. It is again possible to connect to a remote compute target while using these credits, and simply run on an Azure GPU while still working on your project in VSCode:
    - [Tutorial on remote computes using VSCode and Microsoft Azure](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-launch-vs-code-remote?view=azureml-api-2&tabs=extension)
- Alternatively, avoid training large models from scratch; this will be addressed in the next tutorial.
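- Whichever environment you choose, a quick sanity check that PyTorch can actually see a GPU looks like the sketch below (standard `torch` calls; the printed device name depends on your machine):
```
import torch

print(torch.__version__)
print(torch.cuda.is_available())           # True if a CUDA GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # e.g., the GPU assigned by Colab
```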
## PyTorch Introduction
### Goals of This Tutorial
- Learn more about PyTorch.
@ -247,37 +118,6 @@ python synthesize_results.py --parent_dir experiments/learning_rate
- Justin Johnson's [repository](https://github.com/jcjohnson/pytorch-examples) introduces fundamental PyTorch concepts through self-contained examples.
- Tons of resources in this [list](https://github.com/ritchieng/the-incredible-pytorch).
### Code Layout
- We recommend the following code hierarchy to organize your data, model code, experiments, results and logs:
```
data/
experiments/
model/
    net.py
    data_loader.py
train.py
search_hyperparams.py
synthesize_results.py
evaluate.py
utils.py
```
- `model/net.py`: specifies the neural network architecture, the loss function and evaluation metrics
- `model/data_loader.py`: specifies how the data should be fed to the network
- `train.py`: contains the main training loop
- `evaluate.py`: contains the main loop for evaluating the model
- `utils.py`: utility functions for handling hyperparams/logging/storing model
- We recommend reading through `train.py` to get a high-level overview.
- Once you get the high-level idea, depending on your task and dataset, you might want to modify:
- `model/net.py` to change the model, i.e., how you transform your input into your prediction as well as your loss, etc.
- `model/data_loader.py` to change the way you feed data to the model.
- `train.py` and `evaluate.py` to make changes specific to your problem, if required
## Tensors and Variables
- Before going further, we strongly suggest going through [60 Minute Blitz with PyTorch](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) to gain an understanding of PyTorch basics. This section offers a sneak peek into the same concepts.
@ -811,435 +651,11 @@ if params.cuda:
- That concludes the introduction to the PyTorch code examples. Next, we take up an example from [vision](https://aman.ai/primers/pytorch/#vision-predicting-labels-from-images-of-hand-signs) and [NLP](https://aman.ai/primers/pytorch/#nlp-named-entity-recognition-ner-tagging) to understand how we load data and define models specific to each domain.
## Vision: Predicting Labels from Images of Hand Signs
### Goals of This Tutorial
- Learn how to use PyTorch to load image data efficiently.
- Formulate a convolutional neural network in code.
- Understand the key aspects of the code well enough to modify it to suit your needs.
### Problem Setup
- We'll use the SIGNS dataset from [deeplearning.ai](https://www.deeplearning.ai/). The dataset consists of 1080 training images and 120 test images.
- Each image from this dataset is a picture of a hand making a sign that represents a number between 0 and 5. For our particular use-case, we'll scale down images to size $64 \times 64$.
### Structure of the Dataset
- For the vision example, we will use the **SIGNS dataset** created for the Coursera Deep Learning Specialization. The dataset is hosted on Google Drive; download it [here](https://drive.google.com/file/d/1ufiR6hUKhXoAyiBNsySPkUwlvE_wfEHC/view).
- This will download the SIGNS dataset (`~1.1 GB`) containing photos of hand signs representing numbers between `0` and `5`. Here is the structure of the data:
```
SIGNS/
    train_signs/
        0_IMG_5864.jpg
        ...
    test_signs/
        0_IMG_5942.jpg
        ...
```
- The images are named following `{label}_IMG_{id}.jpg` where the label is in $\text{[0, 5]}$.
- Once the download is complete, move the dataset into the `data/SIGNS` folder. Run `python build_dataset.py` which will resize the images to size $(64, 64)$. The new resized dataset will be located by default in `data/64x64_SIGNS`.
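- The resizing step inside `build_dataset.py` can be as simple as the sketch below (the actual script may differ in details; `resize_and_save` is an illustrative helper):
```
import os
from PIL import Image

SIZE = 64

def resize_and_save(filename, output_dir, size=SIZE):
    """Resize the image contained in `filename` and save it to `output_dir`."""
    image = Image.open(filename)
    # use bilinear interpolation instead of the default "nearest neighbor" method
    image = image.resize((size, size), Image.BILINEAR)
    image.save(os.path.join(output_dir, filename.split('/')[-1]))
```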
### Creating a PyTorch Dataset
- `torch.utils.data` provides some nifty functionality for loading data. We use `torch.utils.data.Dataset`, which is an abstract class representing a dataset. To make our own `SIGNSDataset` class, we need to inherit the `Dataset` class and override the following methods:
- `__len__`: so that `len(dataset)` returns the size of the dataset
- `__getitem__`: to support indexing using `dataset[i]` to get the ith image
- We then define our class as below:
```
import os

from PIL import Image
from torch.utils.data import Dataset, DataLoader

class SIGNSDataset(Dataset):
    def __init__(self, data_dir, transform):
        # store filenames
        self.filenames = os.listdir(data_dir)
        self.filenames = [os.path.join(data_dir, f) for f in self.filenames]

        # the first character of the filename contains the label
        self.labels = [int(filename.split('/')[-1][0]) for filename in self.filenames]
        self.transform = transform

    def __len__(self):
        # return size of dataset
        return len(self.filenames)

    def __getitem__(self, idx):
        # open image, apply transforms and return with label
        image = Image.open(self.filenames[idx])  # PIL image
        image = self.transform(image)
        return image, self.labels[idx]
```
- Notice that when we return an image-label pair using `__getitem__`, we apply a `transform` on the image. These transformations are a part of the `torchvision.transforms` [package](https://pytorch.org/docs/master/torchvision/transforms.html), which allows us to manipulate images easily. Consider the following composition of multiple transforms:
```
from torchvision import transforms

train_transformer = transforms.Compose([
    transforms.Resize(64),              # resize the image to 64x64
    transforms.RandomHorizontalFlip(),  # randomly flip image horizontally
    transforms.ToTensor()])             # transform it into a PyTorch Tensor
```
- When we apply `self.transform(image)` in `__getitem__`, we pass it through the above transformations before using it as a training example. The final output is a PyTorch Tensor. To augment the dataset during training, we also use the `RandomHorizontalFlip` transform when loading the image.
- We can specify a similar `eval_transformer` for evaluation without the random flip. To load a `Dataset` object for the different splits of our data, we simply use:
```
train_dataset = SIGNSDataset(train_data_path, train_transformer)
val_dataset = SIGNSDataset(val_data_path, eval_transformer)
test_dataset = SIGNSDataset(test_data_path, eval_transformer)
```
### Loading Data Batches
- `torch.utils.data.DataLoader` provides an iterator that takes in a `Dataset` object and performs **batching**, **shuffling** and **loading** of the data. This is crucial when images are big in size and take time to load. In such cases, the GPU can be left idling while the CPU fetches the images from file and then applies the transforms.
- In contrast, the DataLoader class (using multiprocessing) fetches the data asynchronously and prefetches batches to be sent to the GPU. Initializing the `DataLoader` is quite easy:
```
train_dataloader = DataLoader(SIGNSDataset(train_data_path, train_transformer),
                              batch_size=hyperparams.batch_size, shuffle=True,
                              num_workers=hyperparams.num_workers)
```
- We can then iterate through batches of examples as follows:
```
for train_batch, labels_batch in train_dataloader:
    # wrap Tensors in Variables
    train_batch, labels_batch = Variable(train_batch), Variable(labels_batch)

    # pass through model, perform backpropagation and updates
    output_batch = model(train_batch)
    ...
```
- Applying transformations on the data loads them as PyTorch Tensors. We wrap them in PyTorch Variables before passing them into the model. The `for` loop ends after one pass over the data, i.e., after one epoch. It can be reused again for another epoch without any changes. We can use similar data loaders for validation and test data.
- To read more on splitting the dataset into train/dev/test, see our tutorial on [splitting datasets](https://aman.ai/primers/ai/data-split).
### Convolutional Network Model
- Now that we've figured out how to load our images, let's have a look at the pièce de résistance: the CNN model. As mentioned in the section on [tensors and variables](https://aman.ai/primers/pytorch/#tensors-and-variables), we first define the components of our model, followed by its functional form. Let's have a look at the `__init__` function for our model that takes in a $3 \times 64 \times 64$ image:
```
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        # we define convolutional layers
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1)
        self.bn3 = nn.BatchNorm2d(128)

        # 2 fully connected layers to transform the output of the convolution layers to the final output
        self.fc1 = nn.Linear(in_features=8*8*128, out_features=128)
        self.fcbn1 = nn.BatchNorm1d(128)
        self.fc2 = nn.Linear(in_features=128, out_features=6)
        self.dropout_rate = hyperparams.dropout_rate
```
- The first parameter to the convolutional filter `nn.Conv2d` is the number of input channels, the second is the number of output channels, and the third is the size of the square filter ($3 \times 3$ in this case). Similarly, the batch normalization layer takes as input the number of channels for 2D images and the number of features in the 1D case. The fully connected `Linear` layers take the input and output dimensions.
- In this example, we explicitly specify each of the values. In order to make the initialization of the model more flexible, you can pass in parameters such as the image size to the `__init__` function and use them to specify the layer sizes. You must be very careful when specifying parameter dimensions, since mismatches will lead to errors in the forward propagation. Let's now look at the forward propagation:
```
def forward(self, s):
    # we apply the convolution layers, followed by batch normalisation,
    # max pooling and relu x 3
    s = self.bn1(self.conv1(s))        # batch_size x 32 x 64 x 64
    s = F.relu(F.max_pool2d(s, 2))     # batch_size x 32 x 32 x 32
    s = self.bn2(self.conv2(s))        # batch_size x 64 x 32 x 32
    s = F.relu(F.max_pool2d(s, 2))     # batch_size x 64 x 16 x 16
    s = self.bn3(self.conv3(s))        # batch_size x 128 x 16 x 16
    s = F.relu(F.max_pool2d(s, 2))     # batch_size x 128 x 8 x 8

    # flatten the output for each image
    s = s.view(-1, 8*8*128)            # batch_size x 8*8*128

    # apply 2 fully connected layers with dropout
    s = F.dropout(F.relu(self.fcbn1(self.fc1(s))),
                  p=self.dropout_rate, training=self.training)  # batch_size x 128
    s = self.fc2(s)                    # batch_size x 6

    return F.log_softmax(s, dim=1)
```
- We pass the image through 3 layers of `conv > bn > max_pool > relu`, followed by flattening the image and then applying 2 fully connected layers. In flattening the output of the convolution layers to a single vector per image, we use `s.view(-1, 8*8*128)`. Here the size `-1` is [implicitly inferred](https://aman.ai/primers/numpy/#-1-in-reshape) from the other dimension (batch size in this case). The output is a `log_softmax` over the 6 labels for each example in the batch. We use `log_softmax` since it is numerically more stable than first taking the softmax and then the log.
- And that's it! We use an appropriate loss function (Negative Log Likelihood, since the output is already softmax-ed and log-ed) and train the model as discussed in the previous post. Remember, you can set a breakpoint using `import pdb; pdb.set_trace()` at any place in the forward function, examine the dimensions of variables, tinker around, and diagnose what's wrong. That's the beauty of PyTorch :).
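- Putting it together, one training step could look like the sketch below (assuming the `hyperparams` object and the `train_dataloader` from earlier; this mirrors the usual PyTorch recipe rather than quoting the tutorial's `train.py`):
```
import torch.nn as nn
import torch.optim as optim

model = Net()
loss_fn = nn.NLLLoss()  # expects log-probabilities, which log_softmax provides
optimizer = optim.Adam(model.parameters(), lr=1e-3)

for train_batch, labels_batch in train_dataloader:
    output_batch = model(train_batch)           # batch_size x 6 log-probabilities
    loss = loss_fn(output_batch, labels_batch)  # labels are class indices in [0, 5]
    optimizer.zero_grad()                       # clear gradients from the previous step
    loss.backward()                             # compute gradients of the loss w.r.t. parameters
    optimizer.step()                            # update the parameters
```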
### Resources
- [Data Loading and Processing Tutorial](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html): an official tutorial from the PyTorch website
- [ImageNet](https://github.com/pytorch/examples/blob/master/imagenet/main.py): Code for training on ImageNet in PyTorch
## NLP: Named Entity Recognition (NER) Tagging
### Goals of This Tutorial
- Learn how to use PyTorch to load sequential data.
- Define a recurrent neural network that operates on text (or more generally, sequential data).
- Understand the key aspects of the code well enough to modify it to suit your needs.
### Problem Setup
- We explore the problem of [Named Entity Recognition](https://en.wikipedia.org/wiki/Named-entity_recognition) (NER) tagging of sentences.
- The task is to tag each token in a given sentence with an appropriate tag such as Person, Location, etc.
```
John lives in New York
B-PER O O B-LOC I-LOC
```
- Our dataset will thus need to load both the sentences and the labels. We will store those in two different files: a `sentences.txt` file containing the sentences (one per line) and a `labels.txt` file containing the labels. For example:
```
# sentences.txt
John lives in New York
Where is John ?
```
```
# labels.txt
B-PER O O B-LOC I-LOC
O O B-PER O
```
- Here we assume that we ran the `build_vocab.py` script that creates the vocabulary files in our `/data` directory. Running the script gives us one file for the words and one file for the labels. They will contain one token per line. For instance,
```
# words.txt
John
lives
in
...
```
and
```
# tags.txt
B-PER
B-LOC
...
```
### Structure of the Dataset
- **Download the dataset:** download the original version of the dataset, `ner_dataset.csv`, from [Kaggle](https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus/data) and save it under the `nlp/data/kaggle` directory. Make sure you download the simple version `ner_dataset.csv` and NOT the full version `ner.csv`.
- **Build the dataset:** Run the following script:
```
python build_kaggle_dataset.py
```
- It will extract the sentences and labels from the dataset, split it into train/test/dev, and save it in a convenient format for our model. Here is the structure of the data:
```
kaggle/
    train/
        sentences.txt
        labels.txt
    test/
        sentences.txt
        labels.txt
    dev/
        sentences.txt
        labels.txt
```
- If this errors out, check that you downloaded the right file and saved it in the right directory. If you have issues with encoding, try running the script with Python 2.7.
- **Build the vocabulary:** for both datasets, `data/small` and `data/kaggle`, you need to build the vocabulary, with:
```
python build_vocab.py --data_dir data/small
```
or
```
python build_vocab.py --data_dir data/kaggle
```
### Loading Text Data
- In NLP applications, a sentence is represented by the sequence of indices of the words in the sentence. For example if our vocabulary is `{'is':1, 'John':2, 'Where':3, '.':4, '?':5}` then the sentence “Where is John ?” is represented as `[3,1,2,5]`. We read the `words.txt` file and populate our vocabulary:
```
vocab = {}
with open(words_path) as f:
    for i, l in enumerate(f.read().splitlines()):
        vocab[l] = i
```
- In a similar way, we load a mapping `tag_map` from the labels in `tags.txt` to indices. Doing so gives us indices for labels in the range $\text{[0, 1, …, NUM_TAGS-1]}$.
- In addition to words read from English sentences, `words.txt` contains two special tokens: an `UNK` token to represent any word that is not present in the vocabulary, and a `PAD` token that is used as a filler token at the end of a sentence when one batch has sentences of unequal lengths.
- We are now ready to load our data. We read the sentences in our dataset (either train, validation or test) and convert them to a sequence of indices by looking up the vocabulary:
```
train_sentences = []
train_labels = []

with open(train_sentences_file) as f:
    for sentence in f.read().splitlines():
        # replace each token by its index if it is in vocab,
        # else use index of UNK
        s = [vocab[token] if token in vocab
             else vocab['UNK']
             for token in sentence.split(' ')]
        train_sentences.append(s)

with open(train_labels_file) as f:
    for sentence in f.read().splitlines():
        # replace each label by its index
        l = [tag_map[label] for label in sentence.split(' ')]
        train_labels.append(l)
```
- We can load the validation and test data in a similar fashion.
### Preparing a Batch
- This is where it gets fun. When we sample a batch of sentences, not all of them usually have the same length. Let's say we have a batch of sentences `batch_sentences` that is a Python list of lists, with a corresponding `batch_tags` which has a tag for each token in `batch_sentences`. We convert them into a batch of PyTorch Variables as follows:
```
import numpy as np
import torch
from torch.autograd import Variable

# compute length of longest sentence in batch
batch_max_len = max([len(s) for s in batch_sentences])

# prepare a numpy array with the data, initializing the data with 'PAD'
# and all labels with -1; initializing labels to -1 differentiates tokens
# with tags from 'PAD' tokens
batch_data = vocab['PAD']*np.ones((len(batch_sentences), batch_max_len))
batch_labels = -1*np.ones((len(batch_sentences), batch_max_len))

# copy the data to the numpy array
for j in range(len(batch_sentences)):
    cur_len = len(batch_sentences[j])
    batch_data[j][:cur_len] = batch_sentences[j]
    batch_labels[j][:cur_len] = batch_tags[j]

# since all data are indices, we convert them to torch LongTensors
batch_data, batch_labels = torch.LongTensor(batch_data), torch.LongTensor(batch_labels)

# convert Tensors to Variables
batch_data, batch_labels = Variable(batch_data), Variable(batch_labels)
```
- A lot of things happened in the above code. We first calculated the length of the longest sentence in the batch. We then initialized NumPy arrays of dimension `(num_sentences, batch_max_len)` for the sentence and labels, and filled them in from the lists.
- Since the values are indices (and not floats), PyTorch's Embedding layer expects inputs to be of the `Long` type. We hence convert them to `LongTensor`.
- After filling them in, we observe that the sentences that are shorter than the longest sentence in the batch have the special token `PAD` to fill in the remaining space. Moreover, the `PAD` tokens, introduced as a result of packaging the sentences in a matrix, are assigned a label of `-1`. Doing so differentiates them from other tokens that have label indices in the range $\text{[0, 1, …, NUM_TAGS-1]}$. This will be crucial when we calculate the loss for our model's prediction, and we'll come to that in a bit.
- In our code, we package the above code in a custom `data_iterator` function. Hyperparameters are stored in a data structure called “params”. We can then use the generator as follows:
```
# train_data contains train_sentences and train_labels
# params contains batch_size
train_iterator = data_iterator(train_data, params, shuffle=True)

for _ in range(num_training_steps):
    batch_sentences, batch_labels = next(train_iterator)

    # pass through model, perform backpropagation and updates
    output_batch = model(batch_sentences)
    ...
```
### Recurrent Network Model
- Now that we have figured out how to load our sentences and tags, let's have a look at the Recurrent Neural Network model. As mentioned in the section on [tensors and variables](https://aman.ai/primers/pytorch/#tensors-and-variables), we first define the components of our model, followed by its functional form. Let's have a look at the `__init__` function for our model that takes in `(batch_size, batch_max_len)` dimensional data:
```
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, params):
        super(Net, self).__init__()

        # maps each token to an embedding_dim vector
        self.embedding = nn.Embedding(params.vocab_size, params.embedding_dim)

        # the LSTM takes the embedded sentence
        self.lstm = nn.LSTM(params.embedding_dim, params.lstm_hidden_dim, batch_first=True)

        # FC layer transforms the output to give the final output layer
        self.fc = nn.Linear(params.lstm_hidden_dim, params.number_of_tags)
```
- We use an LSTM for the recurrent network. Before running the LSTM, we first transform each word in our sentence to a vector of dimension `embedding_dim`. We then run the LSTM over this sentence. Finally, we have a fully connected layer that transforms the output of the LSTM for each token to a distribution over tags. This is implemented in the forward propagation function:
```
def forward(self, s):
    # apply the embedding layer that maps each token to its embedding
    s = self.embedding(s)       # dim: batch_size x batch_max_len x embedding_dim

    # run the LSTM along the sentences of length batch_max_len
    s, _ = self.lstm(s)         # dim: batch_size x batch_max_len x lstm_hidden_dim

    # reshape the Variable so that each row contains one token
    s = s.view(-1, s.shape[2])  # dim: batch_size*batch_max_len x lstm_hidden_dim

    # apply the fully connected layer and obtain the output for each token
    s = self.fc(s)              # dim: batch_size*batch_max_len x num_tags

    return F.log_softmax(s, dim=1)   # dim: batch_size*batch_max_len x num_tags
```
- The embedding layer adds an extra dimension to our input, which then has shape `(batch_size, batch_max_len, embedding_dim)`. We run it through the LSTM which gives an output for each token of length `lstm_hidden_dim`. In the next step, we open up the 3D Variable and reshape it such that we get the hidden state for each token, i.e., the new dimension is `(batch_size*batch_max_len, lstm_hidden_dim)`. Here the `-1` is implicitly inferred to be equal to `batch_size*batch_max_len`. The reason behind this reshaping is that the fully connected layer assumes a 2D input, with one example along each row.
- After the reshaping, we apply the fully connected layer which gives a vector of `NUM_TAGS` for each token in each sentence. The output is a `log_softmax` over the tags for each token. We use `log_softmax` since it is numerically more stable than first taking the softmax and then the log.
- All that is left is to compute the loss. But there's a catch: we can't use a `torch.nn.loss` function straight out of the box, because that would add the loss from the `PAD` tokens as well. Here's where the power of PyTorch comes into play: we can write our own custom loss function!
### Writing a Custom Loss Function
- In the section on [loading data batches](https://aman.ai/primers/pytorch/#loading-data-batches), we ensured that the labels for the `PAD` tokens were set to `-1`. We can leverage this to filter out the `PAD` tokens when we compute the loss. Let us see how:
```
def loss_fn(outputs, labels):
    # reshape labels to give a flat vector of length batch_size*seq_len
    labels = labels.view(-1)

    # mask out 'PAD' tokens
    mask = (labels >= 0).float()

    # the number of tokens is the sum of elements in mask
    num_tokens = int(torch.sum(mask).data[0])

    # pick the values corresponding to labels and multiply by mask
    outputs = outputs[range(outputs.shape[0]), labels]*mask

    # cross entropy loss for all non 'PAD' tokens
    return -torch.sum(outputs)/num_tokens
```
- The input `labels` has dimension `(batch_size, batch_max_len)`, while `outputs` has dimension `(batch_size*batch_max_len, NUM_TAGS)`. We compute a mask using the fact that all `PAD` tokens in `labels` have the value `-1`. We then compute the Negative Log Likelihood Loss (remember, the output from the network is already softmax-ed and log-ed!) for all the non-`PAD` tokens. We can now compute derivatives by simply calling `.backward()` on the loss returned by this function, as sketched below.
- Remember, you can set a breakpoint using `import pdb; pdb.set_trace()` at any place in the forward function, loss function, or virtually anywhere, examine the dimensions of the Variables, tinker around, and diagnose what's wrong. That's the beauty of PyTorch :).
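- For completeness, a short sketch of how this loss plugs into training (assuming the model and the batch variables prepared above):
```
output_batch = model(batch_data)            # (batch_size*batch_max_len) x num_tags
loss = loss_fn(output_batch, batch_labels)  # custom loss that skips 'PAD' tokens
loss.backward()                             # gradients flow only from non-'PAD' tokens
```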
## Selected Methods
- PyTorch provides a host of useful functions for performing computations on arrays. Below, we've touched upon some of the most useful ones that you'll encounter regularly in projects.
@ -1981,14 +1397,7 @@ Estimated Total Size (MB): 746.96
## Citation
This tutorial was partially adapted from:
```
@article{Chadha2020PyTorchPrimer,
title = {PyTorch Primer},
author = {Chadha, Aman},
journal = {Distilled AI},
year = {2020},
note = {\url{https://aman.ai}}
}
```
Chadha, A. (2020). PyTorch Primer. Distilled AI. https://aman.ai

@ -11,6 +11,7 @@ Embrace the future of AI with confidence. Let's embark on this transformative jo
**✍️ Hearty thanks to our authors** ........
**🎨 Thanks as well to our illustrators** .....
@ -19,39 +20,12 @@ Embrace the future of AI with confidence. Let's embark on this transformative jo
# Getting Started
**[Students](https://aka.ms/student-page)**, to use this curriculum, fork the entire repo to your own GitHub account and complete the exercises on your own or with a group:
- Start with a pre-lecture quiz.
- Read the lecture and complete the activities, pausing and reflecting at each knowledge check.
- Try to create the projects by comprehending the lessons rather than running the solution code; however, that code is available in the `/solution` folders in each project-oriented lesson.
- Take the post-lecture quiz.
- Complete the challenge.
- Complete the assignment.
- After completing a lesson group, visit the [Discussion Board](https://github.com/microsoft/ML-For-Beginners/discussions) and "learn out loud" by filling out the appropriate PAT rubric. A 'PAT' is a Progress Assessment Tool that is a rubric you fill out to further your learning. You can also react to other PATs so we can learn together.
> For further study, we recommend following these [Microsoft Learn](https://docs.microsoft.com/en-us/users/jenlooper-2911/collections/k7o7tg1gp306q4?WT.mc_id=academic-77952-leestott) modules and learning paths.
**Teachers**, we have [included some suggestions](for-teachers.md) on how to use this curriculum.
This repository includes pages with introductory tutorials and materials that are crucial for the AI Track projects.
The curriculum does not include everything, but it covers the necessary materials on deep learning in Python, as well as useful tips from the 2022-2023 HA AI Track members. If you see there is more to be added, feel free to contribute and extend the materials or link your own project for reference in the topics.
---
## Video walkthroughs
Some of the lessons are available as short form video. You can find all these in-line in the lessons, or on the [ML for Beginners playlist on the Microsoft Developer YouTube channel](https://aka.ms/ml-beginners-videos) by clicking the image below.
[![ML for beginners banner](./ml-for-beginners-video-banner.png)](https://aka.ms/ml-beginners-videos)
---
## Meet the Team
[![Promo video](ml.gif)](https://youtu.be/Tj1XWrDSYJU "Promo video")
**Gif by** [Mohit Jaisal](https://linkedin.com/in/mohitjaisal)
> 🎥 Click the image above for a video about the project and the folks who created it!
---
## Pedagogy
@ -61,53 +35,12 @@ By ensuring that the content aligns with projects, the process is made more enga
> Find our [Code of Conduct](CODE_OF_CONDUCT.md), [Contributing](CONTRIBUTING.md), and [Translation](TRANSLATIONS.md) guidelines. We welcome your constructive feedback!
## Each lesson includes:
- optional sketchnote
- optional supplemental video
- video walkthrough (some lessons only)
- pre-lecture warmup quiz
- written lesson
- for project-based lessons, step-by-step guides on how to build the project
- knowledge checks
- a challenge
- supplemental reading
- assignment
- post-lecture quiz
> **A note about languages**: These lessons are primarily written in Python, but many are also available in R. To complete an R lesson, go to the `/solution` folder and look for R lessons. They include an .rmd extension that represents an **R Markdown** file which can be simply defined as an embedding of `code chunks` (of R or other languages) and a `YAML header` (that guides how to format outputs such as PDF) in a `Markdown document`. As such, it serves as an exemplary authoring framework for data science since it allows you to combine your code, its output, and your thoughts by allowing you to write them down in Markdown. Moreover, R Markdown documents can be rendered to output formats such as PDF, HTML, or Word.
> **A note about quizzes**: All quizzes are contained [in this app](https://gray-sand-07a10f403.1.azurestaticapps.net/), for 52 total quizzes of three questions each. They are linked from within the lessons, but the quiz app can be run locally; follow the instructions in the `quiz-app` folder.
| Lesson Number | Topic | Lesson Grouping | Learning Objectives | Linked Lesson | Author |
| :-----------: | :------------------------------------------------------------: | :-------------------------------------------------: | ------------------------------------------------------------------------------------------------------------------------------- | :--------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------: |
| 01 | Introduction to machine learning | [Introduction](1-Introduction/README.md) | Learn the basic concepts behind machine learning | [Lesson](1-Introduction/1-intro-to-ML/README.md) | Muhammad |
| 02 | The History of machine learning | [Introduction](1-Introduction/README.md) | Learn the history underlying this field | [Lesson](1-Introduction/2-history-of-ML/README.md) | Jen and Amy |
| 03 | Fairness and machine learning | [Introduction](1-Introduction/README.md) | What are the important philosophical issues around fairness that students should consider when building and applying ML models? | [Lesson](1-Introduction/3-fairness/README.md) | Tomomi |
| 04 | Techniques for machine learning | [Introduction](1-Introduction/README.md) | What techniques do ML researchers use to build ML models? | [Lesson](1-Introduction/4-techniques-of-ML/README.md) | Chris and Jen |
| 05 | Introduction to regression | [Regression](2-Regression/README.md) | Get started with Python and Scikit-learn for regression models | <ul><li>[Python](2-Regression/1-Tools/README.md)</li><li>[R](2-Regression/1-Tools/solution/R/lesson_1.html)</li></ul> | <ul><li>Jen</li><li>Eric Wanjau</li></ul> |
| 06 | North American pumpkin prices 🎃 | [Regression](2-Regression/README.md) | Visualize and clean data in preparation for ML | <ul><li>[Python](2-Regression/2-Data/README.md)</li><li>[R](2-Regression/2-Data/solution/R/lesson_2.html)</li></ul> | <ul><li>Jen</li><li>Eric Wanjau</li></ul> |
| 07 | North American pumpkin prices 🎃 | [Regression](2-Regression/README.md) | Build linear and polynomial regression models | <ul><li>[Python](2-Regression/3-Linear/README.md)</li><li>[R](2-Regression/3-Linear/solution/R/lesson_3.html)</li></ul> | <ul><li>Jen and Dmitry</li><li>Eric Wanjau</li></ul> |
| 08 | North American pumpkin prices 🎃 | [Regression](2-Regression/README.md) | Build a logistic regression model | <ul><li>[Python](2-Regression/4-Logistic/README.md) </li><li>[R](2-Regression/4-Logistic/solution/R/lesson_4.html)</li></ul> | <ul><li>Jen</li><li>Eric Wanjau</li></ul> |
| 09 | A Web App 🔌 | [Web App](3-Web-App/README.md) | Build a web app to use your trained model | [Python](3-Web-App/1-Web-App/README.md) | Jen |
| 10 | Introduction to classification | [Classification](4-Classification/README.md) | Clean, prep, and visualize your data; introduction to classification | <ul><li> [Python](4-Classification/1-Introduction/README.md) </li><li>[R](4-Classification/1-Introduction/solution/R/lesson_10.html) | <ul><li>Jen and Cassie</li><li>Eric Wanjau</li></ul> |
| 11 | Delicious Asian and Indian cuisines 🍜 | [Classification](4-Classification/README.md) | Introduction to classifiers | <ul><li> [Python](4-Classification/2-Classifiers-1/README.md)</li><li>[R](4-Classification/2-Classifiers-1/solution/R/lesson_11.html) | <ul><li>Jen and Cassie</li><li>Eric Wanjau</li></ul> |
| 12 | Delicious Asian and Indian cuisines 🍜 | [Classification](4-Classification/README.md) | More classifiers | <ul><li> [Python](4-Classification/3-Classifiers-2/README.md)</li><li>[R](4-Classification/3-Classifiers-2/solution/R/lesson_12.html) | <ul><li>Jen and Cassie</li><li>Eric Wanjau</li></ul> |
| 13 | Delicious Asian and Indian cuisines 🍜 | [Classification](4-Classification/README.md) | Build a recommender web app using your model | [Python](4-Classification/4-Applied/README.md) | Jen |
| 14 | Introduction to clustering | [Clustering](5-Clustering/README.md) | Clean, prep, and visualize your data; Introduction to clustering | <ul><li> [Python](5-Clustering/1-Visualize/README.md)</li><li>[R](5-Clustering/1-Visualize/solution/R/lesson_14.html) | <ul><li>Jen</li><li>Eric Wanjau</li></ul> |
| 15 | Exploring Nigerian Musical Tastes 🎧 | [Clustering](5-Clustering/README.md) | Explore the K-Means clustering method | <ul><li> [Python](5-Clustering/2-K-Means/README.md)</li><li>[R](5-Clustering/2-K-Means/solution/R/lesson_15.html) | <ul><li>Jen</li><li>Eric Wanjau</li></ul> |
| 16 | Introduction to natural language processing ☕️ | [Natural language processing](6-NLP/README.md) | Learn the basics about NLP by building a simple bot | [Python](6-NLP/1-Introduction-to-NLP/README.md) | Stephen |
| 17 | Common NLP Tasks ☕️ | [Natural language processing](6-NLP/README.md) | Deepen your NLP knowledge by understanding common tasks required when dealing with language structures | [Python](6-NLP/2-Tasks/README.md) | Stephen |
| 18 | Translation and sentiment analysis ♥️ | [Natural language processing](6-NLP/README.md) | Translation and sentiment analysis with Jane Austen | [Python](6-NLP/3-Translation-Sentiment/README.md) | Stephen |
| 19 | Romantic hotels of Europe ♥️ | [Natural language processing](6-NLP/README.md) | Sentiment analysis with hotel reviews 1 | [Python](6-NLP/4-Hotel-Reviews-1/README.md) | Stephen |
| 20 | Romantic hotels of Europe ♥️ | [Natural language processing](6-NLP/README.md) | Sentiment analysis with hotel reviews 2 | [Python](6-NLP/5-Hotel-Reviews-2/README.md) | Stephen |
| 21 | Introduction to time series forecasting | [Time series](7-TimeSeries/README.md) | Introduction to time series forecasting | [Python](7-TimeSeries/1-Introduction/README.md) | Francesca |
| 22 | ⚡️ World Power Usage ⚡️ - time series forecasting with ARIMA | [Time series](7-TimeSeries/README.md) | Time series forecasting with ARIMA | [Python](7-TimeSeries/2-ARIMA/README.md) | Francesca |
| 23 | ⚡️ World Power Usage ⚡️ - time series forecasting with SVR | [Time series](7-TimeSeries/README.md) | Time series forecasting with Support Vector Regressor | [Python](7-TimeSeries/3-SVR/README.md) | Anirban |
| 24 | Introduction to reinforcement learning | [Reinforcement learning](8-Reinforcement/README.md) | Introduction to reinforcement learning with Q-Learning | [Python](8-Reinforcement/1-QLearning/README.md) | Dmitry |
| 25 | Help Peter avoid the wolf! 🐺 | [Reinforcement learning](8-Reinforcement/README.md) | Reinforcement learning Gym | [Python](8-Reinforcement/2-Gym/README.md) | Dmitry |
| Postscript | Real-World ML scenarios and applications | [ML in the Wild](9-Real-World/README.md) | Interesting and revealing real-world applications of classical ML | [Lesson](9-Real-World/1-Applications/README.md) | Team |
| Postscript | Model Debugging in ML using RAI dashboard | [ML in the Wild](9-Real-World/README.md) | Model Debugging in Machine Learning using Responsible AI dashboard components | [Lesson](9-Real-World/2-Debugging-ML-Models/README.md) | Ruth Yakubu |
| 01 | ⚡️ Introduction to PyTorch ⚡️ | [Introduction](1-Introduction/README.md) | Learn the basic concepts of deep learning using PyTorch | [Lesson](1-Introduction/1-intro-to-ML/README.md) | Muhammad |
## Offline access
