AGENTS.md
Project Overview
Data Science for Beginners is a comprehensive 10-week, 20-lesson curriculum created by Microsoft Azure Cloud Advocates. The repository is a learning resource that teaches foundational data science concepts through project-based lessons, including Jupyter notebooks, interactive quizzes, and hands-on assignments.
Key Technologies:
- Jupyter Notebooks: Primary learning medium using Python 3
- Python Libraries: pandas, numpy, matplotlib for data analysis and visualization
- Vue.js 2: Quiz application (quiz-app folder)
- Docsify: Documentation site generator for offline access
- Node.js/npm: Package management for JavaScript components
- Markdown: All lesson content and documentation
Architecture:
- Multi-language educational repository with extensive translations
- Structured into lesson modules (1-Introduction through 6-Data-Science-In-Wild)
- Each lesson includes README, notebooks, assignments, and quizzes
- Standalone Vue.js quiz application for pre/post-lesson assessments
- GitHub Codespaces and VS Code dev containers support
Setup Commands
Repository Setup
# Clone the repository (if not already cloned)
git clone https://github.com/microsoft/Data-Science-For-Beginners.git
cd Data-Science-For-Beginners
Python Environment Setup
# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install common data science libraries (no requirements.txt exists)
pip install jupyter pandas numpy matplotlib seaborn scikit-learn
Quiz Application Setup
# Navigate to quiz app
cd quiz-app
# Install dependencies
npm install
# Start development server
npm run serve
# Build for production
npm run build
# Lint and fix files
npm run lint
Docsify Documentation Server
# Install Docsify globally
npm install -g docsify-cli
# Serve documentation locally
docsify serve
# Documentation will be available at localhost:3000
Visualization Projects Setup
For visualization projects like meaningful-visualizations (lesson 13):
# Navigate to starter or solution folder
cd 3-Data-Visualization/13-meaningful-visualizations/starter
# Install dependencies
npm install
# Start development server
npm run serve
# Build for production
npm run build
# Lint files
npm run lint
Development Workflow
Working with Jupyter Notebooks
- Start Jupyter in the repository root: jupyter notebook
- Navigate to the desired lesson folder
- Open .ipynb files to work through exercises
- Notebooks are self-contained with explanations and code cells
- Most notebooks use pandas, numpy, and matplotlib; ensure these are installed
Lesson Structure
Each lesson typically contains:
- README.md: Main lesson content with theory and examples
- notebook.ipynb: Hands-on Jupyter notebook exercises
- assignment.ipynb or assignment.md: Practice assignments
- solution/ folder: Solution notebooks and code
- images/ folder: Supporting visual materials
Quiz Application Development
- Vue.js 2 application with hot-reload during development
- Quizzes stored in quiz-app/src/assets/translations/
- Each language has its own translation folder (en, fr, es, etc.)
- Quiz numbering starts at 0 and goes up to 39 (40 quizzes total)
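The numbering can be related to lessons with a little arithmetic. The sketch below assumes each of the 20 lessons owns two consecutive quiz ids, pre-lesson first, so that lesson 1 uses quizzes 0 and 1. That mapping is an assumption, not something this file states, and quiz_ids_for_lesson is a hypothetical helper name.

```python
TOTAL_LESSONS = 20   # lesson count stated in the project overview
TOTAL_QUIZZES = 40   # quiz ids run from 0 to 39


def quiz_ids_for_lesson(lesson_number):
    """Return (pre_quiz_id, post_quiz_id) for a 1-based lesson number.

    Assumption: quizzes are numbered in lesson order, pre-lesson quiz first.
    """
    if not 1 <= lesson_number <= TOTAL_LESSONS:
        raise ValueError(f"lesson_number must be between 1 and {TOTAL_LESSONS}")
    pre_quiz_id = 2 * (lesson_number - 1)
    return pre_quiz_id, pre_quiz_id + 1


# Under this assumption every id from 0 to 39 is used exactly once.
all_ids = [i for n in range(1, TOTAL_LESSONS + 1) for i in quiz_ids_for_lesson(n)]
assert all_ids == list(range(TOTAL_QUIZZES))
```

Verify the mapping against the actual quiz links in a lesson README before relying on it.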
Adding Translations
- Translations go in translations/ folder at repository root
- Each language has complete lesson structure mirrored from English
- Automated translation via GitHub Actions (co-op-translator.yml)
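One way to check that a language mirrors the English structure is to compare README paths. This is a minimal sketch that assumes translated files live at translations/<language code>/<same relative path>; confirm that layout in the repository first. missing_translations is a hypothetical helper, not an existing script.

```python
from pathlib import Path

# Folders that never hold English source lessons.
SKIPPED_PARTS = {"translations", "node_modules", ".git"}


def missing_translations(repo_root, language_code, filename="README.md"):
    """List English files that have no counterpart under translations/<code>/."""
    repo_root = Path(repo_root)
    translated_root = repo_root / "translations" / language_code
    missing = []
    for source in sorted(repo_root.rglob(filename)):
        relative = source.relative_to(repo_root)
        if any(part in SKIPPED_PARTS for part in relative.parts):
            continue
        # Assumed layout: the translated file sits at the same relative path.
        if not (translated_root / relative).is_file():
            missing.append(relative.as_posix())
    return missing
```

Calling missing_translations(".", "fr") from the repository root would list the English README files that have no French counterpart under that assumed layout.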
Testing Instructions
Quiz Application Testing
cd quiz-app
# Run lint checks
npm run lint
# Test build process
npm run build
# Manual testing: Start dev server and verify quiz functionality
npm run serve
Notebook Testing
- No automated test framework exists for notebooks
- Manual validation: Run all cells in sequence to ensure no errors
- Verify data files are accessible and outputs are generated correctly
- Check that visualizations render properly
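The "run all cells" check can be partly scripted with nbconvert, which is installed alongside the jupyter package. The sketch below is not an existing test harness in this repository; find_notebooks, build_execute_command, and run_notebook are hypothetical helper names, and the 600-second timeout is an arbitrary choice.

```python
import subprocess
from pathlib import Path


def find_notebooks(root):
    """Return all .ipynb files under root, skipping Jupyter checkpoint copies."""
    return sorted(
        path for path in Path(root).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


def build_execute_command(notebook):
    """Build an nbconvert command that runs every cell from top to bottom."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",   # run the cells instead of only converting
        "--stdout",    # print the result, leaving the file on disk untouched
        str(notebook),
    ]


def run_notebook(notebook, timeout_seconds=600):
    """Execute one notebook; return True when nbconvert exits without error."""
    result = subprocess.run(
        build_execute_command(notebook),
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )
    return result.returncode == 0
```

Rendering problems in plots still need a visual check; this only catches cells that raise errors.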
Documentation Testing
# Verify Docsify renders correctly
docsify serve
# Check for broken links manually by navigating through content
# Verify all lesson links work in the rendered documentation
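Part of the manual link check can be approximated offline. The following sketch scans one Markdown file for relative links and reports targets that do not exist on disk. It is an illustration under simple assumptions (inline links only, no reference-style links), and find_broken_links is a hypothetical helper name.

```python
import re
from pathlib import Path
from urllib.parse import unquote

# Matches inline Markdown links and images: [text](target) or ![alt](target "title")
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)[^)]*\)")
# Matches targets that start with a URL scheme such as https: or mailto:
SCHEME_PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def find_broken_links(markdown_file):
    """Return relative link targets in markdown_file that are missing on disk."""
    markdown_file = Path(markdown_file)
    text = markdown_file.read_text(encoding="utf-8")
    broken = []
    for target in LINK_PATTERN.findall(text):
        # External URLs and in-page anchors cannot be checked on disk.
        if SCHEME_PATTERN.match(target) or target.startswith("#"):
            continue
        # Drop any query string or fragment before touching the file system.
        path_part = unquote(re.split(r"[?#]", target, maxsplit=1)[0])
        if path_part and not (markdown_file.parent / path_part).exists():
            broken.append(target)
    return broken
```

External URLs are skipped on purpose, so they still need to be checked in the rendered site.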
Code Quality Checks
# Vue.js projects (quiz-app and visualization projects)
cd quiz-app # or visualization project folder
npm run lint
# Python notebooks - manual verification recommended
# Ensure imports work and cells execute without errors
Code Style Guidelines
Python (Jupyter Notebooks)
- Follow PEP 8 style guidelines for Python code
- Use clear variable names that explain the data being analyzed
- Include markdown cells with explanations before code cells
- Keep code cells focused on single concepts or operations
- Use pandas for data manipulation, matplotlib for visualization
- Common import pattern:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
JavaScript/Vue.js
- Follow Vue.js 2 style guide and best practices
- ESLint configuration in quiz-app/package.json
- Use Vue single-file components (.vue files)
- Maintain component-based architecture
- Run npm run lint before committing changes
Markdown Documentation
- Use clear headings hierarchy (# ## ### etc.)
- Include code blocks with language specifiers
- Add alt text for images
- Link to related lessons and resources
- Keep line lengths reasonable for readability
File Organization
- Lesson content in numbered folders (01-defining-data-science, etc.)
- Solutions in dedicated solution/ subfolders
- Translations mirror English structure in translations/ folder
- Keep data files in data/ or lesson-specific folders
Build and Deployment
Quiz Application Deployment
cd quiz-app
# Build production version
npm run build
# Output is in dist/ folder
# Deploy dist/ folder to static hosting (Azure Static Web Apps, Netlify, etc.)
Azure Static Web Apps Deployment
The quiz-app can be deployed to Azure Static Web Apps:
- Create Azure Static Web App resource
- Connect to GitHub repository
- Configure build settings:
  - App location: quiz-app
  - Output location: dist
- GitHub Actions workflow will auto-deploy on push
Documentation Site
# Build PDF from Docsify (optional)
npm run convert
# Docsify documentation is served directly from markdown files
# No build step required for deployment
# Deploy repository to static hosting with Docsify
GitHub Codespaces
- Repository includes dev container configuration
- Codespaces automatically sets up Python and Node.js environment
- Open repository in Codespace via GitHub UI
- All dependencies install automatically
Pull Request Guidelines
Before Submitting
# For Vue.js changes in quiz-app
cd quiz-app
npm run lint
npm run build
# Test changes locally
npm run serve
PR Title Format
- Use clear, descriptive titles
- Format: [Component] Brief description
- Examples:
  - [Lesson 7] Fix Python notebook import error
  - [Quiz App] Add German translation
  - [Docs] Update README with new prerequisites
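The title format is regular enough to check mechanically. This is a small sketch, not a check the repository enforces; PR_TITLE_PATTERN and parse_pr_title are hypothetical names, and the pattern assumes exactly one space between the bracketed component and the description.

```python
import re

# "[Component] Brief description": bracketed component, one space, then text.
PR_TITLE_PATTERN = re.compile(
    r"^\[(?P<component>[^\[\]]+)\] (?P<description>\S.*)$"
)


def parse_pr_title(title):
    """Return (component, description), or None if the title does not match."""
    match = PR_TITLE_PATTERN.match(title.strip())
    if match is None:
        return None
    return match.group("component"), match.group("description")
```

For the first example above, parse_pr_title returns the component "Lesson 7" and the description "Fix Python notebook import error"; a title without a bracketed component returns None.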
Required Checks
- Ensure all code runs without errors
- Verify notebooks execute completely
- Confirm Vue.js apps build successfully
- Check that documentation links work
- Test quiz application if modified
- Verify translations maintain consistent structure
Contribution Guidelines
- Follow existing code style and patterns
- Add explanatory comments for complex logic
- Update relevant documentation
- Test changes across different lesson modules if applicable
- Review the CONTRIBUTING.md file
Additional Notes
Common Libraries Used
- pandas: Data manipulation and analysis
- numpy: Numerical computing
- matplotlib: Data visualization and plotting
- seaborn: Statistical data visualization (some lessons)
- scikit-learn: Machine learning (advanced lessons)
Working with Data Files
- Data files located in data/ folder or lesson-specific directories
- Most notebooks expect data files in relative paths
- CSV files are primary data format
- Some lessons use JSON for non-relational data examples
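Relative paths break when a notebook kernel starts in a different working directory. One possible workaround is to search upward for a data/ folder instead of hard-coding the path. The helper find_data_file and the file name in the usage note are hypothetical, and the sketch assumes the file sits directly inside a folder named data.

```python
from pathlib import Path


def find_data_file(filename, start=None, data_dir_name="data"):
    """Search start and each parent folder for <data_dir_name>/<filename>."""
    current = Path(start or Path.cwd()).resolve()
    for folder in [current, *current.parents]:
        candidate = folder / data_dir_name / filename
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"{filename} not found in any '{data_dir_name}' folder above {current}"
    )
```

In a notebook this could be used as pd.read_csv(find_data_file("example.csv")), where example.csv stands in for a real data file.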
Multilingual Support
- 40+ language translations via automated GitHub Actions
- Translation workflow in .github/workflows/co-op-translator.yml
- Translations in translations/ folder with language codes
- Quiz translations in quiz-app/src/assets/translations/
Development Environment Options
- Local Development: Install Python, Jupyter, Node.js locally
- GitHub Codespaces: Cloud-based instant development environment
- VS Code Dev Containers: Local container-based development
- Binder: Launch notebooks in cloud (if configured)
Lesson Content Guidelines
- Each lesson is standalone but builds on previous concepts
- Pre-lesson quizzes test prior knowledge
- Post-lesson quizzes reinforce learning
- Assignments provide hands-on practice
- Sketchnotes provide visual summaries
Troubleshooting Common Issues
Jupyter Kernel Issues:
# Ensure correct kernel is installed
python -m ipykernel install --user --name=datascience
npm Install Failures:
# Clear npm cache and retry
npm cache clean --force
rm -rf node_modules package-lock.json
npm install
Import Errors in Notebooks:
- Verify all required libraries are installed
- Check Python version compatibility (Python 3.7+ recommended)
- Ensure virtual environment is activated
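These three checks can be run from a notebook cell or a script. The sketch below uses only the standard library. The library list mirrors the pip install command from the setup section, with scikit-learn checked under its import name sklearn; the function names are hypothetical.

```python
import importlib.util
import sys

# Import names for the libraries installed in the setup section.
REQUIRED_LIBRARIES = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn"]
MINIMUM_PYTHON = (3, 7)


def missing_libraries(names=REQUIRED_LIBRARIES):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info):
    """True when the interpreter is at least the recommended version."""
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


def in_virtual_environment():
    """True when running inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


print("Missing libraries:", missing_libraries())
print("Python supported:", python_is_supported())
print("Virtual environment active:", in_virtual_environment())
```

A library that is found can still fail to import if its installation is broken, so treat an empty "missing" list as a first check only.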
Docsify Not Loading:
- Verify you're serving from repository root
- Check that index.html exists
- Ensure proper network access (port 3000)
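The index.html and port checks can be scripted as a quick preflight. This is only a sketch: has_index_html and port_is_listening are hypothetical helpers, and the port check only tells you whether something accepts connections on localhost, not whether it is Docsify.

```python
import socket
from pathlib import Path


def has_index_html(root="."):
    """True when index.html exists in the folder Docsify is served from."""
    return (Path(root) / "index.html").is_file()


def port_is_listening(port=3000, host="127.0.0.1", timeout=0.5):
    """True when a process accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If port_is_listening() is already True before docsify serve starts, another process may be holding port 3000.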
Performance Considerations
- Large datasets may take time to load in notebooks
- Visualization rendering can be slow for complex plots
- Vue.js dev server enables hot-reload for quick iteration
- Production builds are optimized and minified
Security Notes
- No sensitive data or credentials should be committed
- Use environment variables for any API keys in cloud lessons
- Azure-related lessons may require Azure account credentials
- Keep dependencies updated for security patches
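A minimal sketch of reading a key from the environment instead of writing it into a notebook. The variable name EXAMPLE_API_KEY and the helper get_api_key are placeholders; use whatever name the specific lesson or service documents.

```python
import os


def get_api_key(name="EXAMPLE_API_KEY"):
    """Read a secret from the environment and fail loudly if it is absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "set it in your shell rather than hard-coding the key."
        )
    return value
```

Failing early with a clear message avoids a confusing authentication error later in the notebook, and keeps the key out of committed cell output.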
Contributing to Translations
- Automated translations managed via GitHub Actions
- Manual corrections welcomed for translation accuracy
- Follow existing translation folder structure
- Update quiz links to include language parameter: ?loc=fr
- Test translated lessons for proper rendering
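Adding the language parameter by hand is error-prone when a link already has a query string. The sketch below sets or replaces loc using the standard library. The example.com URL in the usage note is a placeholder, not the real quiz host, and with_language is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_language(url, language_code):
    """Return url with its loc query parameter set to language_code."""
    parts = urlsplit(url)
    # Keep every existing parameter except a previous loc value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "loc"
    ]
    query.append(("loc", language_code))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, with_language("https://example.com/quiz/0", "fr") produces https://example.com/quiz/0?loc=fr.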
Related Resources
- Main curriculum: https://aka.ms/datascience-beginners
- Microsoft Learn: https://docs.microsoft.com/learn/
- Student Hub: https://docs.microsoft.com/learn/student-hub
- Discussion Forum: https://github.com/microsoft/Data-Science-For-Beginners/discussions
- Other Microsoft curricula: ML for Beginners, AI for Beginners, Web Dev for Beginners
Project Maintenance
- Regular updates to keep content current
- Community contributions welcome
- Issues tracked on GitHub
- PRs reviewed by curriculum maintainers
- Monthly content reviews and updates