Contributing to Data Science for Beginners
Thank you for your interest in contributing to the Data Science for Beginners curriculum! We welcome contributions from the community.
Table of Contents
- Code of Conduct
- How Can I Contribute?
- Getting Started
- Contribution Guidelines
- Pull Request Process
- Style Guidelines
- Contributor License Agreement
Code of Conduct
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
How Can I Contribute?
Reporting Bugs
Before creating bug reports, please check the existing issues to avoid duplicates. When you create a bug report, include as many details as possible:
- Use a clear and descriptive title
- Describe the exact steps to reproduce the problem
- Provide specific examples (code snippets, screenshots)
- Describe the behavior you observed and what you expected
- Include your environment details (OS, Python version, browser)
Suggesting Enhancements
Enhancement suggestions are welcome! When suggesting enhancements:
- Use a clear and descriptive title
- Provide a detailed description of the suggested enhancement
- Explain why this enhancement would be useful
- List any similar features in other projects, if applicable
Contributing to Documentation
Documentation improvements are always appreciated:
- Fix typos and grammatical errors
- Improve clarity of explanations
- Add missing documentation
- Update outdated information
- Add examples or use cases
Contributing Code
We welcome code contributions including:
- New lessons or exercises
- Bug fixes
- Improvements to existing notebooks
- New datasets or examples
- Quiz application enhancements
Getting Started
Prerequisites
Before contributing, ensure you have:
- A GitHub account
- Git installed on your system
- Python 3.7+ and Jupyter installed
- Node.js and npm (for quiz app contributions)
- Familiarity with the curriculum structure
See INSTALLATION.md for detailed setup instructions.
Fork and Clone
- Fork the repository on GitHub
- Clone your fork locally:
git clone https://github.com/YOUR-USERNAME/Data-Science-For-Beginners.git
cd Data-Science-For-Beginners
- Add upstream remote:
git remote add upstream https://github.com/microsoft/Data-Science-For-Beginners.git
Create a Branch
Create a new branch for your work:
git checkout -b feature/your-feature-name
# or
git checkout -b fix/your-bug-fix
Branch naming conventions:
- feature/: New features or lessons
- fix/: Bug fixes
- docs/: Documentation changes
- refactor/: Code refactoring
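The prefixes above can be checked mechanically before pushing. The following is a minimal sketch, not part of the repository's tooling; the function name `is_valid_branch_name` is hypothetical, and the rule that the part after the slash is lowercase words joined by hyphens is an assumption inferred from the examples, not a stated requirement.

```python
import re

# Prefixes taken from the branch naming conventions in this guide.
ALLOWED_PREFIXES = ("feature", "fix", "docs", "refactor")

# Assumption: the suffix is lowercase alphanumeric words joined by hyphens,
# as in "feature/your-feature-name". The guide does not mandate this.
BRANCH_PATTERN = re.compile(
    r"(?:" + "|".join(ALLOWED_PREFIXES) + r")/[a-z0-9]+(?:-[a-z0-9]+)*"
)


def is_valid_branch_name(name: str) -> bool:
    """Return True if the whole name matches a documented prefix plus a suffix."""
    return BRANCH_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("feature/your-feature-name", "fix/your-bug-fix", "my-branch"):
        print(candidate, is_valid_branch_name(candidate))
```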
Contribution Guidelines
For Lesson Content
When contributing lessons or modifying existing ones:
- Follow the existing structure:
  - README.md with lesson content
  - Jupyter notebook with exercises
  - Assignment (if applicable)
  - Link to pre and post quizzes
- Include these elements:
  - Clear learning objectives
  - Step-by-step explanations
  - Code examples with comments
  - Exercises for practice
  - Links to additional resources
- Ensure accessibility:
  - Use clear, simple language
  - Provide alt text for images
  - Include code comments
  - Consider different learning styles
For Jupyter Notebooks
- Clear all outputs before committing:
jupyter nbconvert --clear-output --inplace notebook.ipynb
- Include markdown cells with explanations
- Use consistent formatting:
# Import libraries at the top
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Use meaningful variable names
# Add comments for complex operations
# Follow PEP 8 style guidelines
- Test your notebook completely before submitting
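To confirm that a notebook really has no outputs left before committing, a small check can help. The sketch below is an illustration rather than project tooling: it relies only on the standard notebook JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`), and the helper names `has_outputs` and `clear_outputs` are hypothetical. The `jupyter nbconvert` command above remains the documented way to clear outputs.

```python
import copy
import json
from pathlib import Path


def has_outputs(notebook: dict) -> bool:
    """Return True if any code cell still has outputs or an execution count."""
    return any(
        cell.get("cell_type") == "code"
        and (bool(cell.get("outputs")) or cell.get("execution_count") is not None)
        for cell in notebook.get("cells", [])
    )


def clear_outputs(notebook: dict) -> dict:
    """Return a copy of the notebook dict with code-cell outputs removed."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def file_has_outputs(path: str) -> bool:
    """Load a .ipynb file from disk and report whether it has outputs."""
    return has_outputs(json.loads(Path(path).read_text(encoding="utf-8")))
```

Calling `file_has_outputs("notebook.ipynb")` on a freshly cleared notebook should return `False`.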
For Python Code
Follow PEP 8 style guidelines:
# Good practices
import pandas as pd


def calculate_mean(data):
    """Calculate the mean of a dataset.

    Args:
        data (list): List of numerical values

    Returns:
        float: Mean of the dataset
    """
    return sum(data) / len(data)
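Note that `calculate_mean` as written raises `ZeroDivisionError` for an empty list. One possible variant, offered as a sketch rather than as the curriculum's required form, adds type hints (in line with the Python style guidelines in this document) and an explicit error for empty input. The choice of `ValueError` is an assumption.

```python
from typing import Sequence


def calculate_mean(data: Sequence[float]) -> float:
    """Calculate the mean of a dataset.

    Args:
        data: Sequence of numerical values; must not be empty.

    Returns:
        Mean of the dataset.

    Raises:
        ValueError: If data is empty (an illustrative choice of exception).
    """
    if not data:
        raise ValueError("data must contain at least one value")
    return sum(data) / len(data)


if __name__ == "__main__":
    print(calculate_mean([1, 2, 3]))
```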
For Quiz App Contributions
When modifying the quiz application:
- Test locally:
cd quiz-app
npm install
npm run serve
- Run linter:
npm run lint
- Build successfully:
npm run build
- Follow Vue.js style guide and existing patterns
For Translations
When adding or updating translations:
- Follow the structure in translations/ folder
- Use the language code as folder name (e.g., fr for French)
- Maintain the same file structure as English version
- Update quiz links to include language parameter: ?loc=fr
- Test all links and formatting
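When many quiz links need the language parameter, updating them by hand is error-prone. The sketch below shows one way to set `loc` on a URL with the standard library. The function name `add_language_param` and the `example.com` URL are hypothetical placeholders, not the real quiz address, and links that use a different routing scheme may need other handling.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_language_param(url: str, language_code: str) -> str:
    """Return the URL with its `loc` query parameter set to language_code."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any previous `loc` value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "loc"
    ]
    query.append(("loc", language_code))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    print(add_language_param("https://example.com/quiz/1", "fr"))
```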
Pull Request Process
Before Submitting
- Update your branch with latest changes:
git fetch upstream
git rebase upstream/main
- Test your changes:
  - Run all modified notebooks
  - Test quiz app if modified
  - Verify all links work
  - Check for spelling and grammar errors
- Commit your changes:
git add .
git commit -m "Brief description of changes"
Write clear commit messages:
  - Use present tense ("Add feature" not "Added feature")
  - Use imperative mood ("Move cursor to..." not "Moves cursor to...")
  - Limit first line to 72 characters
  - Reference issues and pull requests when relevant
- Push to your fork:
git push origin feature/your-feature-name
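The commit message guidelines above can be partly checked in code. This is a rough sketch only: the function name `check_commit_subject` is hypothetical, and the word list used to spot non-imperative wording is a small illustrative sample, so it will miss many cases. Only the 72-character limit is taken directly from this guide.

```python
from typing import List

# Limit stated in the commit message guidelines.
MAX_SUBJECT_LENGTH = 72

# Illustrative, non-exhaustive sample of past-tense or third-person first words.
NON_IMPERATIVE_WORDS = {
    "added", "adds", "fixed", "fixes", "moved", "moves", "updated", "updates",
}


def check_commit_subject(message: str) -> List[str]:
    """Return a list of problems found in the first line of a commit message."""
    lines = message.splitlines()
    subject = lines[0].strip() if lines else ""
    if not subject:
        return ["subject line is empty"]

    problems = []
    if len(subject) > MAX_SUBJECT_LENGTH:
        problems.append(
            f"subject line is {len(subject)} characters (limit {MAX_SUBJECT_LENGTH})"
        )
    first_word = subject.split()[0]
    if first_word.lower() in NON_IMPERATIVE_WORDS:
        problems.append(f"'{first_word}' is not in present-tense imperative mood")
    return problems


if __name__ == "__main__":
    for message in ("Add feature", "Added feature"):
        print(repr(message), check_commit_subject(message))
```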
Creating the Pull Request
- Go to the repository
- Click "Pull requests" → "New pull request"
- Click "compare across forks"
- Select your fork and branch
- Click "Create pull request"
PR Title Format
Use clear, descriptive titles following this format:
[Component] Brief description
Examples:
- [Lesson 7] Fix Python notebook import error
- [Quiz App] Add German translation
- [Docs] Update README with new prerequisites
- [Fix] Correct data path in visualization lesson
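The `[Component] Brief description` format is regular enough to validate with a short pattern. The sketch below is one possible reading of the format, assuming a single bracketed component followed by one space and a non-empty description. The function name `parse_pr_title` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: one bracketed component, one space, then a non-empty description.
TITLE_PATTERN = re.compile(r"\[(?P<component>[^\[\]]+)\] (?P<description>\S.*)")


def parse_pr_title(title: str) -> Optional[Tuple[str, str]]:
    """Return (component, description) if the title matches, else None."""
    match = TITLE_PATTERN.fullmatch(title)
    if match is None:
        return None
    return match.group("component"), match.group("description")


if __name__ == "__main__":
    print(parse_pr_title("[Lesson 7] Fix Python notebook import error"))
    print(parse_pr_title("Fix Python notebook import error"))
```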
PR Description
Include in your PR description:
- What: What changes did you make?
- Why: Why are these changes necessary?
- How: How did you implement the changes?
- Testing: How did you test the changes?
- Screenshots: Include screenshots for visual changes
- Related Issues: Link to related issues (e.g., "Fixes #123")
Review Process
- Automated checks will run on your PR
- Maintainers will review your contribution
- Address feedback by making additional commits
- Once approved, a maintainer will merge your PR
After Your PR is Merged
- Delete your branch:
git branch -d feature/your-feature-name
git push origin --delete feature/your-feature-name
- Update your fork:
git checkout main
git pull upstream main
git push origin main
Style Guidelines
Markdown
- Use consistent heading levels
- Include blank lines between sections
- Use code blocks with language specifiers:
```python
import pandas as pd
```
- Add alt text to images
- Keep line lengths reasonable (around 80-100 characters)
Python
- Follow PEP 8 style guide
- Use meaningful variable names
- Add docstrings to functions
- Include type hints where appropriate:
import pandas as pd


def process_data(df: pd.DataFrame) -> pd.DataFrame:
    """Process the input dataframe."""
    return df
JavaScript/Vue.js
- Follow Vue.js 2 style guide
- Use ESLint configuration provided
- Write modular, reusable components
- Add comments for complex logic
File Organization
- Keep related files together
- Use descriptive file names
- Follow existing directory structure
- Don't commit unnecessary files (.DS_Store, .pyc, node_modules, etc.)
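A `.gitignore` file is the usual way to keep such files out of a commit. As a complementary illustration, the sketch below flags paths that match the examples named above. The function name `is_unwanted` is hypothetical, and the name and pattern lists cover only the three examples from this guide, not everything the "etc." might include.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Only the examples named in this guide; extend as needed.
UNWANTED_NAMES = {".DS_Store", "node_modules"}
UNWANTED_PATTERNS = ("*.pyc",)


def is_unwanted(path: str) -> bool:
    """Return True if any path component or the file name matches the lists."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    if any(part in UNWANTED_NAMES for part in parts):
        return True
    return any(fnmatchcase(parts[-1], pattern) for pattern in UNWANTED_PATTERNS)


if __name__ == "__main__":
    for candidate in ("quiz-app/node_modules/vue/index.js", "lesson/notebook.ipynb"):
        print(candidate, is_unwanted(candidate))
```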
Contributor License Agreement
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.
Questions?
- Check our Discord Channel #data-science-for-beginners
- Join our Discord community
- Review existing issues and pull requests
Thank You!
Your contributions make this curriculum better for everyone. Thank you for taking the time to contribute!