Merge pull request #681 from microsoft/copilot/fix-68b23bbf-3e40-4d4b-a759-7d2a26a17f58
[WIP] Add beginner-friendly examples
pull/683/head
commit
f61e948df4
@@ -0,0 +1,87 @@
"""
Hello World - Data Science Style!

This is your very first data science program. It introduces you to the basic
concepts of working with data in Python.

What you'll learn:
- How to create a simple dataset
- How to display data
- How to work with Python lists and dictionaries
- Basic data manipulation

Prerequisites: Just Python installed on your computer!
"""

# Let's start with the classic "Hello, World!" but with a data science twist
print("=" * 50)
print("Hello, World of Data Science!")
print("=" * 50)
print()

# In data science, we work with data. Let's create our first simple dataset.
# We'll use a list to store information about students and their test scores.

# A list is a collection of items in Python, written with square brackets []
students = ["Alice", "Bob", "Charlie", "Diana", "Eve"]
scores = [85, 92, 78, 95, 88]

print("Our Dataset:")
print("-" * 50)
print("Students:", students)
print("Scores:", scores)
print()

# Now let's do something useful with this data!
# We can find basic statistics about the scores

# Find the highest score
highest_score = max(scores)
print(f"📊 Highest score: {highest_score}")

# Find the lowest score
lowest_score = min(scores)
print(f"📊 Lowest score: {lowest_score}")

# Calculate the average score
# sum() adds all numbers together, len() tells us how many items we have
average_score = sum(scores) / len(scores)
print(f"📊 Average score: {average_score:.2f}")  # .2f means show 2 decimal places
print()

# Let's find who got the highest score
# We use index() to find where the highest_score is in our list
top_student_index = scores.index(highest_score)
top_student = students[top_student_index]
print(f"🏆 Top student: {top_student} with a score of {highest_score}")
print()

# Now let's organize this data in a more structured way
# We'll use a dictionary - it pairs keys (student names) with values (scores)
print("Student Scores (organized as key-value pairs):")
print("-" * 50)

# Create a dictionary by pairing students with their scores
student_scores = {}
for i in range(len(students)):
    student_scores[students[i]] = scores[i]

# Display each student and their score
for student, score in student_scores.items():
    # Add a special marker for the top student
    marker = "⭐" if student == top_student else " "
    print(f"{marker} {student}: {score} points")

print()
print("=" * 50)
print("Congratulations! You've completed your first data science program!")
print("=" * 50)

# What did we just do?
# 1. Created a simple dataset (student names and scores)
# 2. Performed basic analysis (max, min, average)
# 3. Found insights (who is the top student)
# 4. Organized the data in a useful structure (dictionary)
#
# These are the fundamental building blocks of data science!
# Next, you'll learn to work with real datasets using powerful libraries.
@@ -0,0 +1,174 @@
"""
Simple Data Analysis

Learn how to analyze data and answer questions about it.
This example demonstrates common data analysis operations.

What you'll learn:
- How to calculate statistics on your data
- How to filter data based on conditions
- How to group and aggregate data
- How to sort data

Prerequisites: pandas library (install with: pip install pandas)
"""

import pandas as pd

print("=" * 70)
print("Simple Data Analysis Tutorial")
print("=" * 70)
print()

# Load a dataset - we'll use the honey production data
print("📂 Loading honey production data...")
data = pd.read_csv('../data/honey.csv')
print("✅ Data loaded!\n")

# Quick look at the data
print("-" * 70)
print("FIRST FEW ROWS")
print("-" * 70)
print(data.head(3))
print()

# SECTION 1: Basic Statistics
print("=" * 70)
print("SECTION 1: CALCULATING STATISTICS")
print("=" * 70)
print()

# Let's look at the 'totalprod' column (total production)
if 'totalprod' in data.columns:
    total_production = data['totalprod']

    print("Total Honey Production Statistics:")
    print("-" * 70)
    print(f" Mean (Average): {total_production.mean():,.2f}")
    print(f" Median (Middle): {total_production.median():,.2f}")
    print(f" Mode (Most common): {total_production.mode().values[0]:,.2f}")
    print(f" Std Dev: {total_production.std():,.2f}")
    print(f" Minimum: {total_production.min():,.2f}")
    print(f" Maximum: {total_production.max():,.2f}")
    print()

# SECTION 2: Filtering Data
print("=" * 70)
print("SECTION 2: FILTERING DATA")
print("=" * 70)
print()

# Let's filter the data to show only records from a specific year
if 'year' in data.columns:
    year_to_filter = 2000
    filtered_data = data[data['year'] == year_to_filter]

    print(f"Showing data for year {year_to_filter}:")
    print("-" * 70)
    print(f"Found {len(filtered_data)} records")
    print()
    print(filtered_data.head())
    print()

# Filter based on multiple conditions
if 'totalprod' in data.columns and 'year' in data.columns:
    # Find records where production was above 10 million pounds after 2010
    # Each condition needs its own parentheses; & combines them row by row
    high_production = data[(data['totalprod'] > 10000000) & (data['year'] > 2010)]

    print("High-production records (>10M pounds, after 2010):")
    print("-" * 70)
    print(f"Found {len(high_production)} records")
    print()

# SECTION 3: Grouping and Aggregating
print("=" * 70)
print("SECTION 3: GROUPING AND AGGREGATING DATA")
print("=" * 70)
print()

# Group by state and calculate average production
if 'state' in data.columns and 'totalprod' in data.columns:
    # Group the data by state and calculate mean production
    state_averages = data.groupby('state')['totalprod'].mean()

    # Sort to see which states have highest average production
    state_averages_sorted = state_averages.sort_values(ascending=False)

    print("Top 10 States by Average Honey Production:")
    print("-" * 70)
    for i, (state, avg_prod) in enumerate(state_averages_sorted.head(10).items(), 1):
        print(f"{i:2d}. {state:20s} {avg_prod:,.0f} pounds")
    print()

# SECTION 4: Sorting Data
print("=" * 70)
print("SECTION 4: SORTING DATA")
print("=" * 70)
print()

if 'totalprod' in data.columns:
    # Sort by total production in descending order
    sorted_data = data.sort_values('totalprod', ascending=False)

    print("Records with Highest Production:")
    print("-" * 70)
    # Show the top 5 records (fall back to the first three columns if any are missing)
    wanted_columns = ['state', 'year', 'totalprod']
    columns_to_show = wanted_columns if all(col in data.columns for col in wanted_columns) else data.columns[:3]
    print(sorted_data[columns_to_show].head())
    print()

# SECTION 5: Counting Values
print("=" * 70)
print("SECTION 5: COUNTING VALUES")
print("=" * 70)
print()

if 'state' in data.columns:
    # Count how many records we have for each state
    state_counts = data['state'].value_counts()

    print("Number of records per state (top 10):")
    print("-" * 70)
    for state, count in state_counts.head(10).items():
        print(f"{state:20s} {count:3d} records")
    print()

# SECTION 6: Answering a Question
print("=" * 70)
print("SECTION 6: ANSWERING A REAL QUESTION")
print("=" * 70)
print()

# Question: Which state had the highest honey production in 2012?
if all(col in data.columns for col in ['state', 'year', 'totalprod']):
    year_2012 = data[data['year'] == 2012]

    if len(year_2012) > 0:
        # Find the row with maximum production in 2012
        max_prod_idx = year_2012['totalprod'].idxmax()
        max_prod_state = year_2012.loc[max_prod_idx, 'state']
        max_prod_amount = year_2012.loc[max_prod_idx, 'totalprod']

        print("Question: Which state had the highest honey production in 2012?")
        print("-" * 70)
        print(f"Answer: {max_prod_state}")
        print(f"Production: {max_prod_amount:,.0f} pounds")
        print()

# Summary
print("=" * 70)
print("CONGRATULATIONS!")
print("=" * 70)
print("You've learned how to:")
print(" ✓ Calculate basic statistics (mean, median, mode, etc.)")
print(" ✓ Filter data based on conditions")
print(" ✓ Group data and calculate aggregates")
print(" ✓ Sort data to find top/bottom values")
print(" ✓ Count occurrences of values")
print(" ✓ Answer real questions using data")
print()
print("Try this yourself:")
print(" • Find the state with the lowest average production")
print(" • Calculate total production by year")
print(" • Find trends over time")
print("=" * 70)
@@ -0,0 +1,210 @@
"""
Basic Data Visualization

Learn how to create simple, effective visualizations to communicate your findings.
Visualizations help you and others understand data at a glance.

What you'll learn:
- How to create bar charts
- How to create line plots
- How to create pie charts
- How to customize and save your visualizations

Prerequisites:
- pandas library (install with: pip install pandas)
- matplotlib library (install with: pip install matplotlib)
"""

import pandas as pd
import matplotlib.pyplot as plt

print("=" * 70)
print("Basic Data Visualization Tutorial")
print("=" * 70)
print()

# Load data
print("📂 Loading honey production data...")
data = pd.read_csv('../data/honey.csv')
print("✅ Data loaded!\n")

# For better display, we'll use a subset of the data
# Let's focus on a few states in recent years
# (the charts below do not use recent_data; it is a starting point for your own plots)
if 'state' in data.columns and 'year' in data.columns:
    # Get data for a few states in recent years
    states_to_show = ['CA', 'FL', 'ND', 'SD', 'MT']
    recent_data = data[(data['year'] >= 2010) & (data['state'].isin(states_to_show))]

# VISUALIZATION 1: Bar Chart
print("=" * 70)
print("VISUALIZATION 1: BAR CHART")
print("=" * 70)
print()

if 'state' in data.columns and 'totalprod' in data.columns:
    # Calculate average production by state (for top 10 states)
    state_avg = data.groupby('state')['totalprod'].mean().sort_values(ascending=False).head(10)

    print("Creating a bar chart of average honey production by state...")
    print()

    # Create the figure
    plt.figure(figsize=(12, 6))  # Width: 12 inches, Height: 6 inches

    # Create the bar chart
    plt.bar(state_avg.index, state_avg.values, color='gold', edgecolor='orange')

    # Add labels and title
    plt.xlabel('State', fontsize=12)
    plt.ylabel('Average Production (pounds)', fontsize=12)
    plt.title('Top 10 States by Average Honey Production', fontsize=14, fontweight='bold')

    # Rotate x-axis labels for better readability
    plt.xticks(rotation=45)

    # Add a grid for easier reading (behind the bars)
    plt.gca().set_axisbelow(True)  # Without this, grid lines are drawn over the bars
    plt.grid(axis='y', alpha=0.3, linestyle='--')

    # Adjust layout to prevent label cutoff
    plt.tight_layout()

    # Save the figure
    plt.savefig('bar_chart_example.png', dpi=300, bbox_inches='tight')
    print("✅ Bar chart saved as 'bar_chart_example.png'")
    plt.close()  # Close to free memory
    print()

# VISUALIZATION 2: Line Plot
print("=" * 70)
print("VISUALIZATION 2: LINE PLOT")
print("=" * 70)
print()

if 'year' in data.columns and 'totalprod' in data.columns:
    # Calculate total production by year
    yearly_production = data.groupby('year')['totalprod'].sum()

    print("Creating a line plot of honey production over time...")
    print()

    plt.figure(figsize=(12, 6))

    # Create the line plot
    plt.plot(yearly_production.index, yearly_production.values,
             marker='o',              # Add circular markers at each data point
             linewidth=2,             # Line thickness
             color='darkorange',      # Line color
             markersize=6,            # Size of markers
             markerfacecolor='gold')  # Fill color of markers

    # Add labels and title
    plt.xlabel('Year', fontsize=12)
    plt.ylabel('Total Production (pounds)', fontsize=12)
    plt.title('Honey Production Over Time', fontsize=14, fontweight='bold')

    # Add a grid
    plt.grid(True, alpha=0.3, linestyle='--')

    plt.tight_layout()
    plt.savefig('line_plot_example.png', dpi=300, bbox_inches='tight')
    print("✅ Line plot saved as 'line_plot_example.png'")
    plt.close()
    print()

# VISUALIZATION 3: Pie Chart
print("=" * 70)
print("VISUALIZATION 3: PIE CHART")
print("=" * 70)
print()

if 'state' in data.columns and 'totalprod' in data.columns:
    # Get total production for top 5 states
    top5_states = data.groupby('state')['totalprod'].sum().sort_values(ascending=False).head(5)

    print("Creating a pie chart of production share (top 5 states)...")
    print()

    plt.figure(figsize=(10, 8))

    # Create the pie chart
    colors = ['gold', 'orange', 'lightsalmon', 'lightcoral', 'peachpuff']
    plt.pie(top5_states.values,
            labels=top5_states.index,   # State names
            autopct='%1.1f%%',          # Show percentages
            startangle=90,              # Start from top
            colors=colors,
            explode=(0.1, 0, 0, 0, 0))  # Slightly separate the first slice

    plt.title('Top 5 States Share of Total Honey Production',
              fontsize=14, fontweight='bold', pad=20)

    plt.savefig('pie_chart_example.png', dpi=300, bbox_inches='tight')
    print("✅ Pie chart saved as 'pie_chart_example.png'")
    plt.close()
    print()

# VISUALIZATION 4: Multiple Lines on One Plot
print("=" * 70)
print("VISUALIZATION 4: COMPARING MULTIPLE SERIES")
print("=" * 70)
print()

if 'year' in data.columns and 'totalprod' in data.columns and 'state' in data.columns:
    # Compare production trends for a few states
    states_to_compare = ['CA', 'ND', 'SD']

    print(f"Creating a comparison plot for states: {', '.join(states_to_compare)}...")
    print()

    plt.figure(figsize=(12, 6))

    # Plot a line for each state
    colors_map = {'CA': 'blue', 'ND': 'green', 'SD': 'red'}

    for state in states_to_compare:
        if state in data['state'].values:
            state_data = data[data['state'] == state].groupby('year')['totalprod'].sum()
            plt.plot(state_data.index, state_data.values,
                     marker='o',
                     label=state,  # This will appear in the legend
                     linewidth=2,
                     color=colors_map.get(state, 'gray'))

    plt.xlabel('Year', fontsize=12)
    plt.ylabel('Total Production (pounds)', fontsize=12)
    plt.title('Honey Production Comparison by State', fontsize=14, fontweight='bold')
    plt.legend(title='State')  # Add a legend
    plt.grid(True, alpha=0.3, linestyle='--')

    plt.tight_layout()
    plt.savefig('comparison_plot_example.png', dpi=300, bbox_inches='tight')
    print("✅ Comparison plot saved as 'comparison_plot_example.png'")
    plt.close()
    print()

# Summary
print("=" * 70)
print("CONGRATULATIONS!")
print("=" * 70)
print("You've learned how to:")
print(" ✓ Create bar charts to compare categories")
print(" ✓ Create line plots to show trends over time")
print(" ✓ Create pie charts to show proportions")
print(" ✓ Plot multiple data series on one chart")
print(" ✓ Customize colors, labels, and titles")
print(" ✓ Save your visualizations as image files")
print()
print("Your visualizations have been saved in the current working directory")
print("(the examples/ folder, if you ran the script from there)!")
print()
print("Try this yourself:")
print(" • Change the colors of your charts")
print(" • Add more states to the comparison plot")
print(" • Create a horizontal bar chart")
print(" • Experiment with different chart styles")
print()
print("Pro tip: Always choose the right chart type for your data:")
print(" • Bar charts: Compare categories")
print(" • Line plots: Show trends over time")
print(" • Pie charts: Show parts of a whole")
print(" • Scatter plots: Show relationships between variables")
print("=" * 70)
@@ -0,0 +1,135 @@
# Beginner-Friendly Data Science Examples

Welcome to the examples directory! This collection of simple, well-commented examples is designed to help you get started with data science, even if you're a complete beginner.

## 📚 What You'll Find Here

Each example is self-contained and includes:
- **Clear comments** explaining every step
- **Simple, readable code** that demonstrates one concept at a time
- **Real-world context** to help you understand when and why to use these techniques
- **Expected output** so you know what to look for

## 🚀 Getting Started

### Prerequisites
Before running these examples, make sure you have:
- Python 3.7 or higher installed
- Basic understanding of how to run Python scripts

### Installing Required Libraries
```bash
pip install pandas numpy matplotlib
```

## 📖 Examples Overview

### 1. Hello World - Data Science Style
**File:** `01_hello_world_data_science.py`

Your first data science program! Learn how to:
- Create a simple dataset from Python lists
- Display basic information about your data
- Print your first data science output

Perfect for absolute beginners who want to see their first data science program in action.

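
As a rough illustration of the ideas listed above, the sketch below pairs two lists into a dictionary and pulls out a couple of summary values. It reuses three of the names and scores from the script; the use of `zip()` and of `max()` with a `key` argument is an alternative shown here for comparison and is not how the script itself builds its dictionary. It needs only the Python standard library.

```python
# Three of the students and scores that appear in 01_hello_world_data_science.py.
students = ["Alice", "Bob", "Charlie"]
scores = [85, 92, 78]

# zip() walks both lists in step; dict() turns the (name, score) pairs
# into a lookup table keyed by student name.
student_scores = dict(zip(students, scores))

# max() over a dictionary iterates its keys; key=student_scores.get makes
# it compare the scores, so the result is the name with the largest score.
top_student = max(student_scores, key=student_scores.get)

# The average is the total divided by the number of items.
average_score = sum(scores) / len(scores)

print(f"Top student: {top_student} ({student_scores[top_student]} points)")
print(f"Average score: {average_score:.2f}")
```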

---

### 2. Loading and Exploring Data
**File:** `02_loading_data.py`

Learn the fundamentals of working with data:
- Read data from CSV files
- View the first few rows of your dataset
- Get basic statistics about your data
- Understand data types

This is often the first step in any data science project!

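
The steps listed above might look roughly like the following sketch. It is not taken from `02_loading_data.py`. To stay runnable without any file on disk, it reads a tiny CSV from an in-memory string; the column names follow the honey dataset used by the other examples, while the rows are made up purely for illustration. With a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Made-up sample rows; only the column names mirror the honey dataset.
csv_text = """state,year,totalprod
CA,2011,100
CA,2012,120
ND,2011,300
ND,2012,340
"""

# pd.read_csv accepts a file path or any file-like object, so an
# io.StringIO wrapper stands in for a real CSV file here.
data = pd.read_csv(io.StringIO(csv_text))

print(data.head(3))     # first three rows
print(data.dtypes)      # data type pandas inferred for each column
print(data.describe())  # count, mean, std, min, quartiles, max (numeric columns)
```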

---

### 3. Simple Data Analysis
**File:** `03_simple_analysis.py`

Perform your first data analysis:
- Calculate basic statistics (mean, median, mode)
- Find maximum and minimum values
- Count occurrences of values
- Filter data based on conditions

See how to answer simple questions about your data.

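
Here is a minimal sketch of three of those operations: filtering on two conditions, counting values, and finding the row with the maximum. It uses a small hand-built DataFrame whose values are invented for illustration, so it runs without `honey.csv`. The calls (`value_counts`, `idxmax`, `.loc`) are the same ones the script relies on.

```python
import pandas as pd

# Invented values for illustration; column names follow the honey dataset.
data = pd.DataFrame({
    "state": ["CA", "CA", "ND", "ND", "SD"],
    "year": [2011, 2012, 2011, 2012, 2012],
    "totalprod": [100, 120, 300, 340, 200],
})

# Filtering: each condition sits in its own parentheses and the two
# boolean masks are combined element-wise with & (not the keyword `and`).
recent_high = data[(data["year"] == 2012) & (data["totalprod"] > 150)]

# Counting: value_counts() tallies how often each value appears.
records_per_state = data["state"].value_counts()

# Maximum: idxmax() returns the index label of the largest value,
# and .loc looks up the 'state' cell in that row.
top_state = data.loc[data["totalprod"].idxmax(), "state"]

print(recent_high)
print(records_per_state)
print(f"State with the highest single record: {top_state}")
```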

---

### 4. Data Visualization Basics
**File:** `04_basic_visualization.py`

Create your first visualizations:
- Make a simple bar chart
- Create a line plot
- Generate a pie chart
- Save your visualizations as images

Learn to communicate your findings visually!

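
A stripped-down version of the bar-chart step could look like the sketch below. Several details are assumptions made for this illustration and differ from the script: the values are invented, the chart is written to the system temporary directory under the hypothetical name `bar_chart_sketch.png`, the non-interactive `Agg` backend is selected so it runs without a display, and it uses matplotlib's `fig, ax` interface where the script calls `plt.` functions directly.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt

# Invented values for illustration only.
states = ["CA", "ND", "SD"]
production = [120, 340, 200]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(states, production, color="gold", edgecolor="orange")
ax.set_xlabel("State")
ax.set_ylabel("Production (made-up units)")
ax.set_title("Sample bar chart")

# Draw the grid beneath the bars rather than on top of them.
ax.set_axisbelow(True)
ax.grid(axis="y", alpha=0.3, linestyle="--")

fig.tight_layout()

# Hypothetical output location chosen for this sketch.
output_path = os.path.join(tempfile.gettempdir(), "bar_chart_sketch.png")
fig.savefig(output_path, dpi=100)
plt.close(fig)  # release the figure's memory

print(f"Chart written to {output_path}")
```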

---

### 5. Working with Real Data
**File:** `05_real_world_example.py`

Put it all together with a complete example:
- Load real data from the repository
- Clean and prepare the data
- Perform analysis
- Create meaningful visualizations
- Draw conclusions

This example shows you a complete workflow from start to finish.

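
The contents of `05_real_world_example.py` are not shown here, so the following is only a guess at what a "clean, then analyze" step can look like in pandas, not a description of that script. The data is invented and deliberately messy: one row has no state, one row is an exact duplicate, and the production figures are stored as text.

```python
import pandas as pd

# Invented, deliberately messy data for illustration.
raw = pd.DataFrame({
    "state": ["CA", "ND", "ND", None],
    "year": [2012, 2012, 2012, 2012],
    "totalprod": ["120", "340", "340", "90"],
})

cleaned = (
    raw.dropna(subset=["state"])  # drop rows where the state is missing
       .drop_duplicates()         # drop rows that repeat an earlier row exactly
       .assign(totalprod=lambda df: pd.to_numeric(df["totalprod"]))  # text -> numbers
)

# Analysis on the cleaned data: total production per state, largest first.
totals = cleaned.groupby("state")["totalprod"].sum().sort_values(ascending=False)
top_state = totals.index[0]

print(cleaned)
print(totals)
print(f"Highest total production: {top_state}")
```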

---

## 🎯 How to Use These Examples

1. **Start from the beginning**: The examples are numbered in order of difficulty. Begin with `01_hello_world_data_science.py` and work your way through.

2. **Read the comments**: Each file has detailed comments explaining what the code does and why. Read them carefully!

3. **Experiment**: Try modifying the code. What happens if you change a value? Break things and fix them - that's how you learn!

4. **Run the code**: Execute each example and observe the output. Compare it with what you expected.

5. **Build on it**: Once you understand an example, try extending it with your own ideas.

## 💡 Tips for Beginners

- **Don't rush**: Take time to understand each example before moving to the next one
- **Type the code yourself**: Don't just copy-paste. Typing helps you learn and remember
- **Look up unfamiliar concepts**: If you see something you don't understand, search for it online or in the main lessons
- **Ask questions**: Join the [discussion forum](https://github.com/microsoft/Data-Science-For-Beginners/discussions) if you need help
- **Practice regularly**: Try to code a little bit every day rather than long sessions once a week

## 🔗 Next Steps

After completing these examples, you're ready to:
- Work through the main curriculum lessons
- Try the assignments in each lesson folder
- Explore the Jupyter notebooks for more in-depth learning
- Create your own data science projects

## 📚 Additional Resources

- [Main Curriculum](../README.md) - The complete 20-lesson course
- [For Teachers](../for-teachers.md) - Using this curriculum in your classroom
- [Microsoft Learn](https://docs.microsoft.com/learn/) - Free online learning resources
- [Python Documentation](https://docs.python.org/3/) - Official Python reference

## 🤝 Contributing

Found a bug or have an idea for a new example? We welcome contributions! Please see our [Contributing Guide](../CONTRIBUTING.md).

---

**Happy Learning! 🎉**

Remember: Every expert was once a beginner. Take it one step at a time, and don't be afraid to make mistakes - they're part of the learning process!