Co-authored-by: leestott <2511341+leestott@users.noreply.github.com>copilot/fix-0de5e46c-afe2-43ab-8c38-67d5a3358ccc
parent
fc45572aa6
commit
3503f04860
@ -0,0 +1,239 @@
# Installation Guide

This guide helps you set up your environment to work with the Data Science for Beginners curriculum.

## Table of Contents

- [Prerequisites](#prerequisites)
- [Quick Start Options](#quick-start-options)
- [Local Installation](#local-installation)
- [Verify Your Installation](#verify-your-installation)
- [Using VS Code Dev Containers](#using-vs-code-dev-containers)
- [Next Steps](#next-steps)
- [Getting Help](#getting-help)

## Prerequisites

Before you begin, you should have:

- Basic familiarity with command line/terminal
- A GitHub account (free)
- Stable internet connection for initial setup

## Quick Start Options

### Option 1: GitHub Codespaces (Recommended for Beginners)

The easiest way to get started is with GitHub Codespaces, which provides a complete development environment in your browser.

1. Navigate to the [repository](https://github.com/microsoft/Data-Science-For-Beginners)
2. Click the **Code** dropdown menu
3. Select the **Codespaces** tab
4. Click **Create codespace on main**
5. Wait for the environment to initialize (2-3 minutes)

Your environment is now ready with all dependencies pre-installed!

### Option 2: Local Development

For working on your own computer, follow the detailed instructions below.

## Local Installation

### Step 1: Install Git

Git is required to clone the repository and track your changes.

**Windows:**
- Download from [git-scm.com](https://git-scm.com/download/win)
- Run the installer with default settings

**macOS:**
- Install via Homebrew: `brew install git`
- Or download from [git-scm.com](https://git-scm.com/download/mac)

**Linux:**
```bash
# Debian/Ubuntu
sudo apt-get update
sudo apt-get install git

# Fedora
sudo dnf install git

# Arch
sudo pacman -S git
```

### Step 2: Clone the Repository

```bash
# Clone the repository
git clone https://github.com/microsoft/Data-Science-For-Beginners.git

# Navigate to the directory
cd Data-Science-For-Beginners
```

### Step 3: Install Python and Jupyter

Python 3.7 or higher is required for the data science lessons.

**Windows:**
1. Download Python from [python.org](https://www.python.org/downloads/)
2. During installation, check "Add Python to PATH"
3. Verify installation:
   ```bash
   python --version
   ```

**macOS:**
```bash
# Using Homebrew
brew install python3

# Verify installation
python3 --version
```

**Linux:**
```bash
# Most Linux distributions come with Python pre-installed
python3 --version

# If not installed:
# Debian/Ubuntu
sudo apt-get install python3 python3-pip

# Fedora
sudo dnf install python3 python3-pip
```

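The version requirement above can also be checked from inside Python itself, which is useful when `python` and `python3` point at different interpreters. The following is a minimal sketch rather than part of the curriculum; the names `MIN_VERSION` and `meets_minimum` are illustrative.

```python
import sys

# Minimum version stated in this guide
MIN_VERSION = (3, 7)


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the given (or running) interpreter version is at least `minimum`."""
    current = tuple((version_info or sys.version_info)[:2])
    return current >= minimum


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {found}: {status}")
```

Save it under any file name and run it with the interpreter you intend to use for the lessons; the result describes that interpreter only.
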
### Step 4: Set Up Python Environment

It's recommended to use a virtual environment to keep dependencies isolated.

```bash
# Create a virtual environment (use `python3` if `python` is not available on macOS/Linux)
python -m venv venv

# Activate the virtual environment
# On Windows:
venv\Scripts\activate

# On macOS/Linux:
source venv/bin/activate
```

### Step 5: Install Python Packages

Install the required data science libraries:

```bash
pip install jupyter pandas numpy matplotlib seaborn scikit-learn
```

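After the install finishes, it can help to confirm which versions actually landed in the active environment. The sketch below assumes Python 3.8 or newer (where `importlib.metadata` is in the standard library); the helper name `installed_versions` is illustrative and not part of the curriculum.

```python
from importlib import metadata

# Distribution names exactly as passed to `pip install` in this guide
PACKAGES = ["jupyter", "pandas", "numpy", "matplotlib", "seaborn", "scikit-learn"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:<14} {version or 'NOT INSTALLED'}")
```

Run it with the virtual environment activated; a package reported as missing was most likely installed into a different interpreter.
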
### Step 6: Install Node.js and npm (For Quiz App)

The quiz application requires Node.js and npm.

**Windows/macOS:**
- Download from [nodejs.org](https://nodejs.org/) (LTS version recommended)
- Run the installer

**Linux:**
```bash
# Debian/Ubuntu
curl -fsSL https://deb.nodesource.com/setup_lts.x | sudo -E bash -
sudo apt-get install -y nodejs

# Fedora
sudo dnf install nodejs

# Verify installation
node --version
npm --version
```

### Step 7: Install Quiz App Dependencies

```bash
# Navigate to quiz app directory
cd quiz-app

# Install dependencies
npm install

# Return to root directory
cd ..
```

### Step 8: Install Docsify (Optional)

For offline access to documentation:

```bash
npm install -g docsify-cli
```

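Once the steps above are done, a quick way to see which command-line tools are reachable is to look them up on `PATH`. This is a small sketch using only the standard library; the `REQUIRED` and `OPTIONAL` groupings mirror the steps in this guide, and the function name `find_tools` is illustrative.

```python
import shutil

# Command names used in the steps above; docsify is optional
REQUIRED = ["git", "node", "npm"]
OPTIONAL = ["docsify"]


def find_tools(names):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(REQUIRED + OPTIONAL).items():
        label = "optional" if name in OPTIONAL else "required"
        print(f"{name:<8} ({label}): {path or 'not found'}")
```

A tool reported as "not found" right after installation often only needs a new terminal session so that the updated `PATH` is picked up.
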
## Verify Your Installation

### Test Python and Jupyter

```bash
# Activate your virtual environment if not already activated
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Start Jupyter Notebook
jupyter notebook
```

Your browser should open with the Jupyter interface. You can now navigate to any lesson's `.ipynb` file.

### Test Quiz Application

```bash
# Navigate to quiz app
cd quiz-app

# Start development server
npm run serve
```

The quiz app should be available at `http://localhost:8080` (or another port if 8080 is busy).

### Test Documentation Server

```bash
# From the root directory of the repository
docsify serve
```

The documentation should be available at `http://localhost:3000`.

## Using VS Code Dev Containers

If you have Docker installed, you can use VS Code Dev Containers:

1. Install [Docker Desktop](https://www.docker.com/products/docker-desktop)
2. Install [Visual Studio Code](https://code.visualstudio.com/)
3. Install the [Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) (formerly named Remote - Containers)
4. Open the repository in VS Code
5. Press `F1` and select "Dev Containers: Reopen in Container" (older versions of the extension list it as "Remote-Containers: Reopen in Container")
6. Wait for the container to build (first time only)

## Next Steps

- Explore the [README.md](README.md) for an overview of the curriculum
- Read [USAGE.md](USAGE.md) for common workflows and examples
- Check [TROUBLESHOOTING.md](TROUBLESHOOTING.md) if you encounter issues
- Review [CONTRIBUTING.md](CONTRIBUTING.md) if you want to contribute

## Getting Help

If you encounter issues:

1. Check the [TROUBLESHOOTING.md](TROUBLESHOOTING.md) guide
2. Search existing [GitHub Issues](https://github.com/microsoft/Data-Science-For-Beginners/issues)
3. Join our [Discord community](https://aka.ms/ds4beginners/discord)
4. Create a new issue with detailed information about your problem

@ -0,0 +1,611 @@
# Troubleshooting Guide

This guide provides solutions to common issues you might encounter while working with the Data Science for Beginners curriculum.

## Table of Contents

- [Python and Jupyter Issues](#python-and-jupyter-issues)
- [Package and Dependency Issues](#package-and-dependency-issues)
- [Jupyter Notebook Issues](#jupyter-notebook-issues)
- [Quiz Application Issues](#quiz-application-issues)
- [Git and GitHub Issues](#git-and-github-issues)
- [Docsify Documentation Issues](#docsify-documentation-issues)
- [Data and File Issues](#data-and-file-issues)
- [Performance Issues](#performance-issues)
- [Getting Additional Help](#getting-additional-help)

## Python and Jupyter Issues

### Python Not Found or Wrong Version

**Problem:** `python: command not found` or wrong Python version

**Solution:**

```bash
# Check Python version
python --version
python3 --version

# If Python 3 is installed as 'python3', create an alias
# On macOS/Linux, add to ~/.bashrc or ~/.zshrc:
alias python=python3
alias pip=pip3

# Or use python3 explicitly
python3 -m pip install jupyter
```

**Windows Solution:**
1. Reinstall Python from [python.org](https://www.python.org/)
2. During installation, check "Add Python to PATH"
3. Restart your terminal/command prompt

### Virtual Environment Activation Issues

**Problem:** Virtual environment won't activate

**Solution:**

**Windows:**
```powershell
# If PowerShell reports an execution policy error, allow local scripts for your user
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

# Then activate
venv\Scripts\activate
```

**macOS/Linux:**
```bash
# The script must be sourced, not executed, so that it can modify the current shell
source venv/bin/activate

# If you see "Permission denied", check that the file exists and is readable
ls -l venv/bin/activate
```

**Verify activation:**
```bash
# Your prompt should show (venv)
# Check Python location
which python  # Should point to venv (on Windows, use: where python)
```

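Activation can also be confirmed from inside Python, which works the same way on every operating system. This is a minimal sketch, assuming the environment was created with `python -m venv` as in the installation guide; the function name `in_virtualenv` is illustrative.

```python
import sys


def in_virtualenv():
    """Return True when running inside an environment created by `python -m venv`."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtualenv())
```

The same two lines can be pasted into a notebook cell to see which interpreter a Jupyter kernel is really using.
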
### Jupyter Kernel Issues

**Problem:** "Kernel not found" or "Kernel keeps dying"

**Solution:**

```bash
# Reinstall kernel
python -m ipykernel install --user --name=datascience --display-name="Python (Data Science)"

# Or use the default kernel
python -m ipykernel install --user

# Restart Jupyter
jupyter notebook
```

**Problem:** Wrong Python version in Jupyter

**Solution:**
```bash
# Install Jupyter in your virtual environment
source venv/bin/activate  # Activate first
pip install jupyter ipykernel

# Register the kernel
python -m ipykernel install --user --name=venv --display-name="Python (venv)"

# In Jupyter, select Kernel -> Change kernel -> Python (venv)
```

## Package and Dependency Issues

### Import Errors

**Problem:** `ModuleNotFoundError: No module named 'pandas'` (or other packages)

**Solution:**

```bash
# Ensure virtual environment is activated
source venv/bin/activate  # macOS/Linux
venv\Scripts\activate  # Windows

# Install missing package
pip install pandas

# Install all common packages
pip install jupyter pandas numpy matplotlib seaborn scikit-learn

# Verify installation
python -c "import pandas; print(pandas.__version__)"
```

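To find every missing package in one pass rather than one `ModuleNotFoundError` at a time, the interpreter can be asked which modules it is able to locate. The sketch below is illustrative (the names `IMPORT_NAMES` and `missing_packages` are not from the curriculum); note that the name used with `pip install` can differ from the name used with `import`, as with `scikit-learn` and `sklearn`.

```python
from importlib import util

# pip distribution name -> module name used in `import` statements
IMPORT_NAMES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the pip names whose modules cannot be found by this interpreter."""
    return [
        pip_name
        for pip_name, module in import_names.items()
        if util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing; install with: pip install " + " ".join(missing))
    else:
        print("All packages are importable.")
```

Running this in a notebook cell checks the kernel's interpreter, which is the one that matters when an import fails inside Jupyter.
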
### Pip Installation Failures

**Problem:** `pip install` fails with permission errors

**Solution:**

```bash
# Use --user flag
pip install --user package-name

# Or use virtual environment (recommended)
python -m venv venv
source venv/bin/activate
pip install package-name
```

**Problem:** `pip install` fails with SSL certificate errors

**Solution:**

```bash
# Update pip first
python -m pip install --upgrade pip

# Try installing with trusted host (temporary workaround)
pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org package-name
```

### Package Version Conflicts

**Problem:** Incompatible package versions

**Solution:**

```bash
# Create fresh virtual environment
python -m venv venv-new
source venv-new/bin/activate  # or venv-new\Scripts\activate on Windows

# Install packages with specific versions if needed
pip install pandas==1.3.0
pip install numpy==1.21.0

# Or let pip resolve dependencies
pip install jupyter pandas numpy matplotlib seaborn scikit-learn
```

## Jupyter Notebook Issues

### Jupyter Won't Start

**Problem:** `jupyter notebook` command not found

**Solution:**

```bash
# Install Jupyter
pip install jupyter

# Or use python -m
python -m jupyter notebook

# Add to PATH if needed (macOS/Linux)
export PATH="$HOME/.local/bin:$PATH"
```

### Notebook Won't Load or Save

**Problem:** "Notebook failed to load" or save errors

**Solution:**

1. Check file permissions
   ```bash
   # Make sure you have write permissions
   ls -l notebook.ipynb
   chmod 644 notebook.ipynb  # If needed
   ```

2. Check for file corruption
   ```bash
   # Try opening in text editor to check JSON structure
   # Copy content to new notebook if corrupted
   ```

3. Clear stale Jupyter runtime files
   ```bash
   # Stop all running Jupyter servers, then print the runtime directory
   # and delete any leftover files in it
   jupyter --runtime-dir
   ```

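The corruption check in step 2 can be automated, since a notebook is a JSON document. The following sketch assumes the common nbformat 4 layout with a top-level `cells` list; the function name `check_notebook` is illustrative.

```python
import json
from pathlib import Path


def check_notebook(path):
    """Return (ok, message) describing whether `path` parses as a notebook."""
    try:
        content = json.loads(Path(path).read_text(encoding="utf-8"))
    except json.JSONDecodeError as err:
        return False, f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    # Notebook files (nbformat 4) store their cells in a top-level "cells" list
    if not isinstance(content, dict) or not isinstance(content.get("cells"), list):
        return False, "valid JSON, but no top-level 'cells' list"
    return True, f"{len(content['cells'])} cells found"
```

For example, `print(check_notebook("notebook.ipynb"))` reports the line and column of the first JSON error, which is usually where an interrupted save or a merge conflict damaged the file.
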
### Cell Won't Execute

**Problem:** Cell stuck on "In [*]" or takes forever

**Solution:**

1. **Interrupt the kernel**: Click the "Interrupt" button, or press `I` twice while in command mode
2. **Restart kernel**: Kernel menu → Restart
3. **Check for infinite loops** in your code
4. **Clear output**: Cell → All Output → Clear

### Plots Not Displaying

**Problem:** `matplotlib` plots don't show in notebook

**Solution:**

```python
# Add magic command at the top of notebook
%matplotlib inline

import matplotlib.pyplot as plt

# Create plot
plt.plot([1, 2, 3, 4])
plt.show()  # Make sure to call show()
```

**Alternative for interactive plots:**
```python
%matplotlib notebook  # Classic Jupyter Notebook (before version 7) only
# Or
%matplotlib widget  # Requires the ipympl package: pip install ipympl
```

## Quiz Application Issues

### npm install Fails

**Problem:** Errors during `npm install`

**Solution:**

```bash
# Clear npm cache
npm cache clean --force

# Remove node_modules and package-lock.json
rm -rf node_modules package-lock.json

# Reinstall
npm install

# If still failing, try with legacy peer deps
npm install --legacy-peer-deps
```

### Quiz App Won't Start

**Problem:** `npm run serve` fails

**Solution:**

```bash
# Check Node.js version
node --version  # Should be 12.x or higher

# Reinstall dependencies
cd quiz-app
rm -rf node_modules package-lock.json
npm install

# Try different port
npm run serve -- --port 8081
```

### Port Already in Use

**Problem:** "Port 8080 is already in use"

**Solution:**

```bash
# Find and kill process on port 8080
# macOS/Linux:
lsof -ti:8080 | xargs kill -9

# Windows:
netstat -ano | findstr :8080
taskkill /PID <PID> /F

# Or use a different port
npm run serve -- --port 8081
```

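Before killing a process, it can be worth checking whether the port is really taken and which nearby port is free. This is a cross-platform sketch using only the standard library; the function names are illustrative, and it probes `127.0.0.1` only, so a server bound exclusively to an IPv6 address would not be detected.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds, i.e. a listener exists
        return probe.connect_ex((host, port)) == 0


def first_free_port(start=8080, attempts=20):
    """Return the first port in [start, start + attempts) with no listener, else None."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    return None


if __name__ == "__main__":
    print("8080 in use:", port_in_use(8080))
    print("First free port from 8080:", first_free_port())
```

The port it reports can then be passed to the dev server, for example `npm run serve -- --port 8081`.
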
### Quiz Not Loading or Blank Page

**Problem:** Quiz app loads but shows blank page

**Solution:**

1. Check browser console for errors (F12)
2. Clear browser cache and cookies
3. Try a different browser
4. Ensure JavaScript is enabled
5. Check for ad blockers interfering

```bash
# Rebuild the app
npm run build
npm run serve
```

## Git and GitHub Issues

### Git Not Recognized

**Problem:** `git: command not found`

**Solution:**

**Windows:**
- Install Git from [git-scm.com](https://git-scm.com/)
- Restart terminal after installation

**macOS:**
```bash
# Install via Homebrew
brew install git

# Or install Xcode Command Line Tools
xcode-select --install
```

**Linux:**
```bash
sudo apt-get install git  # Debian/Ubuntu
sudo dnf install git      # Fedora
```

### Clone Fails

**Problem:** `git clone` fails with authentication errors

**Solution:**

```bash
# Use the HTTPS URL (cloning this public repository requires no login)
git clone https://github.com/microsoft/Data-Science-For-Beginners.git

# If Git prompts for credentials (for example, when pushing to your fork),
# GitHub does not accept account passwords; use a Personal Access Token instead
# Create token at: https://github.com/settings/tokens
# Use token as password when prompted
```

### Permission Denied (publickey)

**Problem:** SSH key authentication fails

**Solution:**

```bash
# Generate SSH key
ssh-keygen -t ed25519 -C "your_email@example.com"

# Add key to ssh-agent
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519

# Add public key to GitHub
# Copy key: cat ~/.ssh/id_ed25519.pub
# Add at: https://github.com/settings/keys
```

## Docsify Documentation Issues

### Docsify Command Not Found

**Problem:** `docsify: command not found`

**Solution:**

```bash
# Install globally
npm install -g docsify-cli

# If permission error on macOS/Linux
sudo npm install -g docsify-cli

# Verify installation
docsify --version

# If still not found, add npm global path
# Find npm global path
npm config get prefix

# Add its bin directory to PATH (add to ~/.bashrc or ~/.zshrc)
export PATH="$PATH:$(npm config get prefix)/bin"
```

### Documentation Not Loading

**Problem:** Docsify serves but content doesn't load

**Solution:**

```bash
# Ensure you're in the repository root
cd Data-Science-For-Beginners

# Check for index.html
ls index.html

# Serve with specific port
docsify serve --port 3000

# Check browser console for errors (F12)
```

### Images Not Displaying

**Problem:** Images show broken link icon

**Solution:**

1. Check image paths are relative
2. Ensure image files exist in the repository
3. Clear browser cache
4. Verify file extensions match (case-sensitive on some systems)

## Data and File Issues

### File Not Found Errors

**Problem:** `FileNotFoundError` when loading data

**Solution:**

```python
import os

import pandas as pd

# Check current working directory
print(os.getcwd())

# Use absolute path
data_path = os.path.join(os.getcwd(), 'data', 'filename.csv')
df = pd.read_csv(data_path)

# Or use relative path from notebook location
df = pd.read_csv('../data/filename.csv')

# Verify file exists
print(os.path.exists('data/filename.csv'))
```

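Because notebooks live at different depths in the repository, a relative path that works in one lesson can fail in another. One possible approach, sketched here with an illustrative helper named `find_upwards`, is to search the current directory and each parent until the file is found.

```python
from pathlib import Path


def find_upwards(relative_path, start=None):
    """Search `start` and each parent directory for `relative_path`.

    Returns the first existing match as an absolute Path, or None.
    """
    start = Path(start or Path.cwd()).resolve()
    for folder in [start, *start.parents]:
        candidate = folder / relative_path
        if candidate.exists():
            return candidate
    return None
```

For example, `find_upwards("data/filename.csv")` returns an absolute path that can be passed straight to `pd.read_csv`, or `None` if no parent directory contains the file (the file name here is the same placeholder used above).
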
### CSV Reading Errors

**Problem:** Errors reading CSV files

**Solution:**

```python
import pandas as pd

# Try different encodings
df = pd.read_csv('file.csv', encoding='utf-8')
# or
df = pd.read_csv('file.csv', encoding='latin-1')  # same codec as ISO-8859-1
# or
df = pd.read_csv('file.csv', encoding='cp1252')  # common for files exported on Windows

# Handle missing values
df = pd.read_csv('file.csv', na_values=['NA', 'N/A', ''])

# Specify delimiter if not comma
df = pd.read_csv('file.csv', delimiter=';')
```

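Rather than trying encodings and delimiters by hand, both can be guessed from the first few kilobytes of the file. The sketch below uses only the standard library; the function name `sniff_csv`, the candidate encodings, and the candidate delimiters are assumptions chosen for illustration, and the result is a guess that should still be checked against the data.

```python
import codecs
import csv


def sniff_csv(path, encodings=("utf-8", "cp1252", "latin-1"), sample_size=4096):
    """Guess (encoding, delimiter) from the first `sample_size` bytes of a file."""
    with open(path, "rb") as handle:
        raw = handle.read(sample_size)
    for encoding in encodings:
        decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
        try:
            # Incremental decoding tolerates a character cut off at the sample boundary
            sample = decoder.decode(raw)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next encoding
        try:
            delimiter = csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter
        except csv.Error:
            delimiter = ","  # sniffing failed (e.g. a single column); assume a comma
        return encoding, delimiter
    raise ValueError(f"none of {encodings} could decode {path!r}")
```

The two values can then be forwarded to pandas, for example `encoding, sep = sniff_csv('file.csv')` followed by `pd.read_csv('file.csv', encoding=encoding, sep=sep)`. Since `latin-1` accepts any byte sequence, it is listed last as a fallback.
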
### Memory Errors with Large Datasets

**Problem:** `MemoryError` when loading large files

**Solution:**

```python
import pandas as pd

# Read in chunks
chunk_size = 10000
chunks = []
for chunk in pd.read_csv('large_file.csv', chunksize=chunk_size):
    # Process (for example, filter) each chunk here so less data is kept
    chunks.append(chunk)
df = pd.concat(chunks)

# Or read specific columns only
df = pd.read_csv('file.csv', usecols=['col1', 'col2'])

# Use more efficient data types
df = pd.read_csv('file.csv', dtype={'column_name': 'int32'})
```

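Concatenating every chunk rebuilds the full table, so if the data still does not fit in memory, the alternative is to aggregate while reading and keep only the running totals. The sketch below shows that idea with the standard library's `csv` module instead of pandas (a deliberate simplification); the function name `column_mean` is illustrative.

```python
import csv


def column_mean(path, column, encoding="utf-8"):
    """Compute the mean of a numeric column while holding one row in memory at a time."""
    total = 0.0
    count = 0
    with open(path, newline="", encoding=encoding) as handle:
        for row in csv.DictReader(handle):
            value = row.get(column)
            if value in (None, ""):
                continue  # skip missing values
            total += float(value)
            count += 1
    if count == 0:
        raise ValueError(f"no numeric values found in column {column!r}")
    return total / count
```

The same pattern carries over to pandas: inside the `chunksize` loop, update a running sum and count from each chunk instead of appending the chunk to a list.
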
## Performance Issues

### Slow Notebook Performance

**Problem:** Notebooks run very slowly

**Solution:**

1. **Restart kernel and clear output**
   - Kernel → Restart & Clear Output

2. **Close unused notebooks**

3. **Optimize code:**
   ```python
   import numpy as np

   data = np.arange(1_000_000)  # example array

   # Use vectorized operations instead of loops
   # Bad:
   result = []
   for x in data:
       result.append(x * 2)

   # Good:
   result = data * 2  # NumPy/Pandas vectorization
   ```

4. **Sample large datasets:**
   ```python
   # Work with sample during development
   df_sample = df.sample(n=1000)  # or df.head(1000)
   ```

### Browser Crashes

**Problem:** Browser crashes or becomes unresponsive

**Solution:**

1. Close unused tabs
2. Clear browser cache
3. Reduce browser memory use (Chrome: review the options at `chrome://settings/system`)
4. Use JupyterLab instead:
   ```bash
   pip install jupyterlab
   jupyter lab
   ```

## Getting Additional Help

### Before Asking for Help

1. Check this troubleshooting guide
2. Search [GitHub Issues](https://github.com/microsoft/Data-Science-For-Beginners/issues)
3. Review [INSTALLATION.md](INSTALLATION.md) and [USAGE.md](USAGE.md)
4. Try searching the error message online

### How to Ask for Help

When creating an issue or asking for help, include:

1. **Operating System**: Windows, macOS, or Linux (which distribution)
2. **Python Version**: Run `python --version`
3. **Error Message**: Copy the complete error message
4. **Steps to Reproduce**: What you did before the error occurred
5. **What You've Tried**: Solutions you've already attempted

**Example:**
```
**Operating System:** macOS 12.0
**Python Version:** 3.9.7
**Error Message:** ModuleNotFoundError: No module named 'pandas'
**Steps to Reproduce:**
1. Activated virtual environment
2. Started Jupyter notebook
3. Tried to import pandas

**What I've Tried:**
- Ran pip install pandas
- Restarted Jupyter
```

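The first two items of such a report can be collected automatically. The following is a small sketch, assuming Python 3.8 or newer for `importlib.metadata`; the function name `environment_report` and the default package list are illustrative choices, not something the curriculum prescribes.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("pandas", "numpy", "jupyter")):
    """Return a Markdown-formatted summary of the OS, Python, and package versions."""
    lines = [
        f"**Operating System:** {platform.platform()}",
        f"**Python Version:** {platform.python_version()}",
        f"**Python Executable:** {sys.executable}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"**{name}:** {version}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Paste the output at the top of an issue, then add the error message, the steps to reproduce, and what you have already tried by hand.
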
### Community Resources

- **GitHub Issues**: [Create an issue](https://github.com/microsoft/Data-Science-For-Beginners/issues/new)
- **Discord**: [Join our community](https://aka.ms/ds4beginners/discord)
- **Discussions**: [GitHub Discussions](https://github.com/microsoft/Data-Science-For-Beginners/discussions)
- **Microsoft Learn**: [Q&A Forums](https://docs.microsoft.com/answers/)

### Related Documentation

- [INSTALLATION.md](INSTALLATION.md) - Setup instructions
- [USAGE.md](USAGE.md) - How to use the curriculum
- [CONTRIBUTING.md](CONTRIBUTING.md) - How to contribute
- [README.md](README.md) - Project overview

@ -0,0 +1,360 @@
# Usage Guide

This guide provides examples and common workflows for using the Data Science for Beginners curriculum.

## Table of Contents

- [How to Use This Curriculum](#how-to-use-this-curriculum)
- [Working with Lessons](#working-with-lessons)
- [Working with Jupyter Notebooks](#working-with-jupyter-notebooks)
- [Using the Quiz Application](#using-the-quiz-application)
- [Common Workflows](#common-workflows)
- [Tips for Self-Learners](#tips-for-self-learners)
- [Tips for Teachers](#tips-for-teachers)
- [Working Offline](#working-offline)
- [Accessing Translated Content](#accessing-translated-content)
- [Additional Resources](#additional-resources)
- [Getting Help](#getting-help)

## How to Use This Curriculum

This curriculum is designed to be flexible and can be used in multiple ways:

- **Self-paced learning**: Work through lessons independently at your own speed
- **Classroom instruction**: Use as a structured course with guided instruction
- **Study groups**: Learn collaboratively with peers
- **Workshop format**: Intensive short-term learning sessions

## Working with Lessons

Each lesson follows a consistent structure to maximize learning:

### Lesson Structure

1. **Pre-lesson Quiz**: Test your existing knowledge
2. **Sketchnote** (Optional): Visual summary of key concepts
3. **Video** (Optional): Supplemental video content
4. **Written Lesson**: Core concepts and explanations
5. **Jupyter Notebook**: Hands-on coding exercises
6. **Assignment**: Practice what you've learned
7. **Post-lesson Quiz**: Reinforce your understanding

### Example Workflow for a Lesson

```bash
# 1. Navigate to the lesson directory
cd 1-Introduction/01-defining-data-science

# 2. Read the README.md
# Open README.md in your browser or editor

# 3. Take the pre-lesson quiz
# Click the quiz link in the README

# 4. Open the Jupyter notebook (if available)
jupyter notebook

# 5. Complete the exercises in the notebook

# 6. Work on the assignment

# 7. Take the post-lesson quiz
```

## Working with Jupyter Notebooks

### Starting Jupyter

```bash
# Activate your virtual environment
source venv/bin/activate  # On macOS/Linux
# OR
venv\Scripts\activate  # On Windows

# Start Jupyter from the repository root
jupyter notebook
```

### Running Notebook Cells

1. **Execute a cell**: Press `Shift + Enter` or click the "Run" button
2. **Execute all cells**: Select "Cell" → "Run All" from the menu
3. **Restart kernel**: Select "Kernel" → "Restart" if you encounter issues

### Example: Working with Data in a Notebook

```python
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load a dataset
df = pd.read_csv('data/sample.csv')

# Explore the data (a cell only auto-displays its last expression, so print explicitly)
print(df.head())
df.info()
print(df.describe())

# Create a visualization
plt.figure(figsize=(10, 6))
plt.plot(df['column_name'])
plt.title('Sample Visualization')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.show()
```

### Saving Your Work

- Jupyter auto-saves periodically
- Manually save: Press `Ctrl + S` (or `Cmd + S` on macOS)
- Your progress is saved in the `.ipynb` file

## Using the Quiz Application

### Running the Quiz App Locally

```bash
# Navigate to quiz app directory
cd quiz-app

# Start the development server
npm run serve

# Access at http://localhost:8080
```

### Taking Quizzes

1. Pre-lesson quizzes are linked at the top of each lesson
2. Post-lesson quizzes are linked at the bottom of each lesson
3. Each quiz has 3 questions
4. Quizzes are designed to reinforce learning, not to test exhaustively

### Quiz Numbering

- Quizzes are numbered 0-39 (40 total quizzes)
- Each lesson typically has a pre and post quiz
- Quiz URLs include the quiz number: `https://ff-quizzes.netlify.app/en/ds/quiz/0`

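Given the numbering scheme above, the full list of quiz links can be generated rather than typed, for example to build a class handout. This is a minimal sketch: the base URL and the 0-39 range come from the notes above, while the names `QUIZ_BASE_URL`, `QUIZ_COUNT`, and `quiz_url` are illustrative. It does not assume which quiz number belongs to which lesson.

```python
# Base URL and numbering range are taken from the notes above
QUIZ_BASE_URL = "https://ff-quizzes.netlify.app/en/ds/quiz"
QUIZ_COUNT = 40  # quizzes are numbered 0-39


def quiz_url(number):
    """Return the URL for quiz `number`, validating the 0-39 range."""
    if not 0 <= number < QUIZ_COUNT:
        raise ValueError(f"quiz number must be between 0 and {QUIZ_COUNT - 1}")
    return f"{QUIZ_BASE_URL}/{number}"


if __name__ == "__main__":
    for number in range(QUIZ_COUNT):
        print(quiz_url(number))
```

Use the quiz links inside each lesson's README to find out which number is that lesson's pre-lesson or post-lesson quiz.
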
## Common Workflows

### Workflow 1: Complete Beginner Path

```bash
# 1. Set up your environment (see INSTALLATION.md)

# 2. Start with Lesson 1
cd 1-Introduction/01-defining-data-science

# 3. For each lesson:
#    - Take pre-lesson quiz
#    - Read the lesson content
#    - Work through the notebook
#    - Complete the assignment
#    - Take post-lesson quiz

# 4. Progress through all 20 lessons sequentially
```

### Workflow 2: Topic-Specific Learning

If you're interested in a specific topic:

```bash
# Example: Focus on Data Visualization
cd 3-Data-Visualization

# Explore lessons 9-13:
# - Lesson 9: Visualizing Quantities
# - Lesson 10: Visualizing Distributions
# - Lesson 11: Visualizing Proportions
# - Lesson 12: Visualizing Relationships
# - Lesson 13: Meaningful Visualizations
```

### Workflow 3: Project-Based Learning

```bash
# 1. Review the Data Science Lifecycle lessons (14-16)
cd 4-Data-Science-Lifecycle

# 2. Work through a real-world example (Lesson 20)
cd ../6-Data-Science-In-Wild/20-Real-World-Examples

# 3. Apply concepts to your own project
```

### Workflow 4: Cloud-Based Data Science

```bash
# Learn about cloud data science (Lessons 17-19)
cd 5-Data-Science-In-Cloud

# 17: Introduction to Cloud Data Science
# 18: Low-Code ML Tools
# 19: Azure Machine Learning Studio
```

## Tips for Self-Learners

### Stay Organized

```bash
# Create a learning journal
mkdir my-learning-journal

# For each lesson, create notes
echo "# Lesson 1 Notes" > my-learning-journal/lesson-01-notes.md
```

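The same journal can be set up for every lesson in one step. The sketch below follows the folder and file naming from the commands above and assumes the 20 lessons mentioned in this guide; the function name `create_journal` is illustrative.

```python
from pathlib import Path


def create_journal(root="my-learning-journal", lessons=20):
    """Create one notes file per lesson, leaving existing notes untouched."""
    folder = Path(root)
    folder.mkdir(parents=True, exist_ok=True)
    created = []
    for number in range(1, lessons + 1):
        note = folder / f"lesson-{number:02d}-notes.md"
        if not note.exists():  # never overwrite notes you have already written
            note.write_text(f"# Lesson {number} Notes\n", encoding="utf-8")
            created.append(note)
    return created
```

Calling `create_journal()` from the repository root creates any missing files and returns their paths, so it is safe to run again later.
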
### Practice Regularly

- Set aside dedicated time each day or week
- Complete at least one lesson per week
- Review previous lessons periodically

### Engage with the Community

- Join the [Discord community](https://aka.ms/ds4beginners/discord)
- Participate in [GitHub Discussions](https://github.com/microsoft/Data-Science-For-Beginners/discussions)
- Share your progress and ask questions

### Build Your Own Projects

After completing lessons, apply concepts to personal projects:

```python
# Example: Analyze your own dataset
import pandas as pd

# Load your own data
my_data = pd.read_csv('my-project/data.csv')

# Apply techniques learned
# - Data cleaning (Lesson 8)
# - Exploratory data analysis (Lesson 7)
# - Visualization (Lessons 9-13)
# - Analysis (Lesson 15)
```

## Tips for Teachers

### Classroom Setup

1. Review [for-teachers.md](for-teachers.md) for detailed guidance
2. Set up a shared environment (GitHub Classroom or Codespaces)
3. Establish a communication channel (Discord, Slack, or Teams)

### Lesson Planning

**Suggested 10-Week Schedule:**

- **Week 1-2**: Introduction (Lessons 1-4)
- **Week 3-4**: Working with Data (Lessons 5-8)
- **Week 5-6**: Data Visualization (Lessons 9-13)
- **Week 7-8**: Data Science Lifecycle (Lessons 14-16)
- **Week 9**: Cloud Data Science (Lessons 17-19)
- **Week 10**: Real-World Applications & Final Projects (Lesson 20)

### Running Docsify for Offline Access

```bash
# Serve documentation locally for classroom use
docsify serve

# Each student who runs this on their own machine can open http://localhost:3000
# No internet required after initial setup
```

### Assignment Grading

- Review student notebooks for completed exercises
- Check for understanding through quiz scores
- Evaluate final projects using data science lifecycle principles

### Creating Assignments

```python
# Example custom assignment template
"""
Assignment: [Topic]

Objective: [Learning goal]

Dataset: [Provide or have students find one]

Tasks:
1. Load and explore the dataset
2. Clean and prepare the data
3. Create at least 3 visualizations
4. Perform analysis
5. Communicate findings

Deliverables:
- Jupyter notebook with code and explanations
- Written summary of findings
"""
```

## Working Offline

### Download Resources

```bash
# Clone the entire repository
git clone https://github.com/microsoft/Data-Science-For-Beginners.git

# Download datasets in advance
# Most datasets are included in the repository
```

### Run Documentation Locally

```bash
# Serve with Docsify
docsify serve

# Access at localhost:3000
```

### Run Quiz App Locally

```bash
cd quiz-app
npm run serve
```

## Accessing Translated Content

Translations are available in 40+ languages:

```bash
# Access translated lessons
cd translations/fr  # French
cd translations/es  # Spanish
cd translations/de  # German
# ... and many more
```

Each translation maintains the same structure as the English version.

## Additional Resources

### Continue Learning

- [Microsoft Learn](https://docs.microsoft.com/learn/) - Additional learning paths
- [Student Hub](https://docs.microsoft.com/learn/student-hub) - Resources for students
- [Azure AI Foundry](https://aka.ms/foundry/forum) - Community forum

### Related Curricula

- [AI for Beginners](https://aka.ms/ai-beginners)
- [ML for Beginners](https://aka.ms/ml-beginners)
- [Web Dev for Beginners](https://aka.ms/webdev-beginners)
- [Generative AI for Beginners](https://aka.ms/genai-beginners)

## Getting Help

- Check [TROUBLESHOOTING.md](TROUBLESHOOTING.md) for common issues
- Search [GitHub Issues](https://github.com/microsoft/Data-Science-For-Beginners/issues)
- Join our [Discord](https://aka.ms/ds4beginners/discord)
- Review [CONTRIBUTING.md](CONTRIBUTING.md) to report issues or contribute