You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
Data-Science-For-Beginners/AGENTS.md

11 KiB

AGENTS.md

Project Overview

Data Science for Beginners is a comprehensive 10-week, 20-lesson curriculum created by Microsoft Azure Cloud Advocates. The repository is a learning resource that teaches foundational data science concepts through project-based lessons, including Jupyter notebooks, interactive quizzes, and hands-on assignments.

Key Technologies:

  • Jupyter Notebooks: Primary learning medium using Python 3
  • Python Libraries: pandas, numpy, matplotlib for data analysis and visualization
  • Vue.js 2: Quiz application (quiz-app folder)
  • Docsify: Documentation site generator for offline access
  • Node.js/npm: Package management for JavaScript components
  • Markdown: All lesson content and documentation

Architecture:

  • Multi-language educational repository with extensive translations
  • Structured into lesson modules (1-Introduction through 6-Data-Science-In-Wild)
  • Each lesson includes README, notebooks, assignments, and quizzes
  • Standalone Vue.js quiz application for pre/post-lesson assessments
  • GitHub Codespaces and VS Code dev containers support

Setup Commands

Repository Setup

# Clone the repository (if not already cloned)
git clone https://github.com/microsoft/Data-Science-For-Beginners.git
cd Data-Science-For-Beginners

Python Environment Setup

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install common data science libraries (no requirements.txt exists)
pip install jupyter pandas numpy matplotlib seaborn scikit-learn

Quiz Application Setup

# Navigate to quiz app
cd quiz-app

# Install dependencies
npm install

# Start development server
npm run serve

# Build for production
npm run build

# Lint and fix files
npm run lint

Docsify Documentation Server

# Install Docsify globally
npm install -g docsify-cli

# Serve documentation locally
docsify serve

# Documentation will be available at localhost:3000

Visualization Projects Setup

For visualization projects like meaningful-visualizations (lesson 13):

# Navigate to starter or solution folder
cd 3-Data-Visualization/13-meaningful-visualizations/starter

# Install dependencies
npm install

# Start development server
npm run serve

# Build for production
npm run build

# Lint files
npm run lint

Development Workflow

Working with Jupyter Notebooks

  1. Start Jupyter in the repository root: jupyter notebook
  2. Navigate to the desired lesson folder
  3. Open .ipynb files to work through exercises
  4. Notebooks are self-contained with explanations and code cells
  5. Most notebooks use pandas, numpy, and matplotlib - ensure these are installed

Lesson Structure

Each lesson typically contains:

  • README.md - Main lesson content with theory and examples
  • notebook.ipynb - Hands-on Jupyter notebook exercises
  • assignment.ipynb or assignment.md - Practice assignments
  • solution/ folder - Solution notebooks and code
  • images/ folder - Supporting visual materials

Quiz Application Development

  • Vue.js 2 application with hot-reload during development
  • Quizzes stored in quiz-app/src/assets/translations/
  • Each language has its own translation folder (en, fr, es, etc.)
  • Quiz numbering starts at 0 and goes up to 39 (40 quizzes total)

Adding Translations

  • Translations go in translations/ folder at repository root
  • Each language has complete lesson structure mirrored from English
  • Automated translation via GitHub Actions (co-op-translator.yml)

Testing Instructions

Quiz Application Testing

cd quiz-app

# Run lint checks
npm run lint

# Test build process
npm run build

# Manual testing: Start dev server and verify quiz functionality
npm run serve

Notebook Testing

  • No automated test framework exists for notebooks
  • Manual validation: Run all cells in sequence to ensure no errors
  • Verify data files are accessible and outputs are generated correctly
  • Check that visualizations render properly

Documentation Testing

# Verify Docsify renders correctly
docsify serve

# Check for broken links manually by navigating through content
# Verify all lesson links work in the rendered documentation

Code Quality Checks

# Vue.js projects (quiz-app and visualization projects)
cd quiz-app  # or visualization project folder
npm run lint

# Python notebooks - manual verification recommended
# Ensure imports work and cells execute without errors

Code Style Guidelines

Python (Jupyter Notebooks)

  • Follow PEP 8 style guidelines for Python code
  • Use clear variable names that explain the data being analyzed
  • Include markdown cells with explanations before code cells
  • Keep code cells focused on single concepts or operations
  • Use pandas for data manipulation, matplotlib for visualization
  • Common import pattern:
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    

JavaScript/Vue.js

  • Follow Vue.js 2 style guide and best practices
  • ESLint configuration in quiz-app/package.json
  • Use Vue single-file components (.vue files)
  • Maintain component-based architecture
  • Run npm run lint before committing changes

Markdown Documentation

  • Use clear headings hierarchy (# ## ### etc.)
  • Include code blocks with language specifiers
  • Add alt text for images
  • Link to related lessons and resources
  • Keep line lengths reasonable for readability

File Organization

  • Lesson content in numbered folders (01-defining-data-science, etc.)
  • Solutions in dedicated solution/ subfolders
  • Translations mirror English structure in translations/ folder
  • Keep data files in data/ or lesson-specific folders

Build and Deployment

Quiz Application Deployment

cd quiz-app

# Build production version
npm run build

# Output is in dist/ folder
# Deploy dist/ folder to static hosting (Azure Static Web Apps, Netlify, etc.)

Azure Static Web Apps Deployment

The quiz-app can be deployed to Azure Static Web Apps:

  1. Create Azure Static Web App resource
  2. Connect to GitHub repository
  3. Configure build settings:
    • App location: quiz-app
    • Output location: dist
  4. GitHub Actions workflow will auto-deploy on push

Documentation Site

# Build PDF from Docsify (optional)
npm run convert

# Docsify documentation is served directly from markdown files
# No build step required for deployment
# Deploy repository to static hosting with Docsify

GitHub Codespaces

  • Repository includes dev container configuration
  • Codespaces automatically sets up Python and Node.js environment
  • Open repository in Codespace via GitHub UI
  • All dependencies install automatically

Pull Request Guidelines

Before Submitting

# For Vue.js changes in quiz-app
cd quiz-app
npm run lint
npm run build

# Test changes locally
npm run serve

PR Title Format

  • Use clear, descriptive titles
  • Format: [Component] Brief description
  • Examples:
    • [Lesson 7] Fix Python notebook import error
    • [Quiz App] Add German translation
    • [Docs] Update README with new prerequisites

Required Checks

  • Ensure all code runs without errors
  • Verify notebooks execute completely
  • Confirm Vue.js apps build successfully
  • Check that documentation links work
  • Test quiz application if modified
  • Verify translations maintain consistent structure

Contribution Guidelines

  • Follow existing code style and patterns
  • Add explanatory comments for complex logic
  • Update relevant documentation
  • Test changes across different lesson modules if applicable
  • Review the CONTRIBUTING.md file

Additional Notes

Common Libraries Used

  • pandas: Data manipulation and analysis
  • numpy: Numerical computing
  • matplotlib: Data visualization and plotting
  • seaborn: Statistical data visualization (some lessons)
  • scikit-learn: Machine learning (advanced lessons)

Working with Data Files

  • Data files located in data/ folder or lesson-specific directories
  • Most notebooks expect data files in relative paths
  • CSV files are primary data format
  • Some lessons use JSON for non-relational data examples

Multilingual Support

  • 40+ language translations via automated GitHub Actions
  • Translation workflow in .github/workflows/co-op-translator.yml
  • Translations in translations/ folder with language codes
  • Quiz translations in quiz-app/src/assets/translations/

Development Environment Options

  1. Local Development: Install Python, Jupyter, Node.js locally
  2. GitHub Codespaces: Cloud-based instant development environment
  3. VS Code Dev Containers: Local container-based development
  4. Binder: Launch notebooks in cloud (if configured)

Lesson Content Guidelines

  • Each lesson is standalone but builds on previous concepts
  • Pre-lesson quizzes test prior knowledge
  • Post-lesson quizzes reinforce learning
  • Assignments provide hands-on practice
  • Sketchnotes provide visual summaries

Troubleshooting Common Issues

Jupyter Kernel Issues:

# Ensure correct kernel is installed
python -m ipykernel install --user --name=datascience

npm Install Failures:

# Clear npm cache and retry
npm cache clean --force
rm -rf node_modules package-lock.json
npm install

Import Errors in Notebooks:

  • Verify all required libraries are installed
  • Check Python version compatibility (Python 3.7+ recommended)
  • Ensure virtual environment is activated

Docsify Not Loading:

  • Verify you're serving from repository root
  • Check that index.html exists
  • Ensure proper network access (port 3000)

Performance Considerations

  • Large datasets may take time to load in notebooks
  • Visualization rendering can be slow for complex plots
  • Vue.js dev server enables hot-reload for quick iteration
  • Production builds are optimized and minified

Security Notes

  • No sensitive data or credentials should be committed
  • Use environment variables for any API keys in cloud lessons
  • Azure-related lessons may require Azure account credentials
  • Keep dependencies updated for security patches

Contributing to Translations

  • Automated translations managed via GitHub Actions
  • Manual corrections welcomed for translation accuracy
  • Follow existing translation folder structure
  • Update quiz links to include language parameter: ?loc=fr
  • Test translated lessons for proper rendering

Project Maintenance

  • Regular updates to keep content current
  • Community contributions welcome
  • Issues tracked on GitHub
  • PRs reviewed by curriculum maintainers
  • Monthly content reviews and updates