Data Science in the Real World
[Figure: Data Science In The Real World - Sketchnote by @nitya]
We're nearing the end of this learning journey!
We began by defining data science and ethics, explored various tools and techniques for data analysis and visualization, reviewed the data science lifecycle, and examined how to scale and automate workflows using cloud computing services. Now, you might be wondering: "How do I apply all these learnings to real-world scenarios?"
In this lesson, we'll delve into real-world applications of data science across industries and explore specific examples in research, digital humanities, and sustainability. We'll also discuss student project opportunities and wrap up with resources to help you continue your learning journey.
Pre-Lecture Quiz
Pre-lecture quiz
Data Science + Industry
The democratization of AI has made it easier for developers to design and integrate AI-driven decision-making and data-driven insights into user experiences and development workflows. Here are some examples of how data science is applied in real-world industry contexts:
- Google Flu Trends used data science to correlate search terms with flu trends. Although the approach had flaws, it highlighted the potential (and challenges) of data-driven healthcare predictions.
- UPS Routing Predictions - explains how UPS uses data science and machine learning to predict optimal delivery routes, factoring in weather conditions, traffic patterns, delivery deadlines, and more.
- NYC Taxicab Route Visualization - data obtained through Freedom Of Information Laws was used to visualize a day in the life of NYC cabs, providing insights into navigation patterns, earnings, and trip durations over a 24-hour period.
- Uber Data Science Workbench - leverages data from millions of daily Uber trips (pickup/dropoff locations, trip durations, preferred routes, etc.) to build analytics tools for pricing, safety, fraud detection, and navigation decisions.
- Sports Analytics - focuses on predictive analytics (team and player analysis, like Moneyball, and fan management) and data visualization (team dashboards, fan engagement, etc.) with applications in talent scouting, sports betting, and venue management.
- Data Science in Banking - showcases the role of data science in finance, including risk modeling, fraud detection, customer segmentation, real-time predictions, and recommender systems. Predictive analytics also drive critical metrics like credit scores.
- Data Science in Healthcare - highlights applications such as medical imaging (MRI, X-Ray, CT-Scan), genomics (DNA sequencing), drug development (risk assessment, success prediction), predictive analytics (patient care and logistics), disease tracking, and prevention.
Image Credit: Data Flair: 6 Amazing Data Science Applications
The figure illustrates other domains and examples of applying data science techniques. Interested in exploring more applications? Check out the Review & Self Study section below.
Data Science + Research
[Figure: Data Science & Research - Sketchnote by @nitya]
While industry applications often focus on large-scale use cases, research projects can be valuable for two reasons:
- Innovation opportunities - enable rapid prototyping of advanced concepts and testing of user experiences for next-generation applications.
- Deployment challenges - investigate potential harms or unintended consequences of data science technologies in real-world scenarios.
For students, research projects offer learning and collaboration opportunities that deepen understanding and foster connections with experts or teams working in areas of interest. So, what do research projects look like, and how can they make an impact?
Consider the MIT Gender Shades Study by Joy Buolamwini (MIT Media Lab) and Timnit Gebru (then at Microsoft Research), which produced a signature research paper:
- What: The study aimed to evaluate bias in automated facial analysis algorithms and datasets based on gender and skin type.
- Why: Facial analysis is used in contexts like law enforcement, airport security, and hiring systems, where inaccurate classifications (e.g., due to bias) can lead to economic and social harm. Addressing bias is crucial for fairness.
- How: Researchers noted that existing benchmarks predominantly featured lighter-skinned subjects. They curated a new dataset (1000+ images) balanced by gender and skin type, which was used to assess the accuracy of three gender classification products (Microsoft, IBM, Face++).
Results revealed that while overall accuracy was good, error rates varied significantly across subgroups, with misgendering being higher for females and individuals with darker skin tones, indicating bias.
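The core idea behind that result, comparing one overall error rate against error rates broken out by subgroup, can be sketched in a few lines of pandas. This is a minimal illustration, not the study's method or data: the records below are synthetic, and the column names and the `error_rates` helper are hypothetical.

```python
import pandas as pd

# Synthetic predictions for illustration only; these are NOT the study's data.
# Columns: gender, skin_type, true label, predicted label.
records = [
    ("female", "darker", "female", "male"),
    ("female", "darker", "female", "female"),
    ("female", "lighter", "female", "female"),
    ("female", "lighter", "female", "female"),
    ("male", "darker", "male", "male"),
    ("male", "darker", "male", "male"),
    ("male", "lighter", "male", "male"),
    ("male", "lighter", "male", "male"),
]
df = pd.DataFrame(records, columns=["gender", "skin_type", "y_true", "y_pred"])


def error_rates(frame):
    """Return (overall error rate, per-subgroup error rates, largest gap)."""
    errors = frame.assign(error=frame["y_true"] != frame["y_pred"])
    overall = errors["error"].mean()
    # Disaggregate: one error rate per (gender, skin_type) subgroup.
    by_group = errors.groupby(["gender", "skin_type"])["error"].mean()
    gap = by_group.max() - by_group.min()
    return overall, by_group, gap


overall, by_group, gap = error_rates(df)
print(f"overall error rate: {overall:.3f}")
print(by_group)
print(f"largest gap between subgroups: {gap:.3f}")
```

A low overall error rate can hide a much higher rate in a single subgroup, which is why the per-group breakdown matters. Libraries such as Fairlearn offer more complete tooling for this kind of disaggregated evaluation.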
Key Outcomes: The study underscored the need for more representative datasets (balanced subgroups) and inclusive teams (diverse backgrounds) to identify and address biases early in AI solutions. Such research also informs principles and practices for responsible AI to enhance fairness in AI products and processes.
Want to learn about relevant research efforts at Microsoft?
- Explore Microsoft Research Projects on Artificial Intelligence.
- Check out student projects from Microsoft Research Data Science Summer School.
- Learn about the Fairlearn project and Responsible AI initiatives.
Data Science + Humanities
[Figure: Data Science & Digital Humanities - Sketchnote by @nitya]
Digital Humanities is defined as "a collection of practices and approaches combining computational methods with humanistic inquiry." Stanford projects like "rebooting history" and "poetic thinking" demonstrate the connection between Digital Humanities and Data Science, emphasizing techniques like network analysis, information visualization, spatial analysis, and text analysis to revisit historical and literary datasets for new insights.
Want to explore and extend a project in this field?
Check out "Emily Dickinson and the Meter of Mood" by Jen Looper, which uses data science to reinterpret familiar poetry and examine its meaning in new contexts. For example, can we predict the season in which a poem was written by analyzing its tone or sentiment? What does this reveal about the author's mindset during that time?
To answer this, follow the steps of the data science lifecycle:
- Data Acquisition - collect a relevant dataset for analysis using APIs (e.g., Poetry DB API) or by scraping web pages (e.g., Project Gutenberg).
- Data Cleaning - format, sanitize, and simplify text using tools like Visual Studio Code and Microsoft Excel.
- Data Analysis - import the dataset into "Notebooks" for analysis using Python packages (e.g., pandas, numpy, matplotlib) to organize and visualize the data.
- Sentiment Analysis - integrate cloud services like Text Analytics and use low-code tools like Power Automate for automated workflows.
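The steps above can be sketched end to end on a tiny scale. This is a rough illustration under stated assumptions, not the notebook from the project: the four poem lines are made up, the month-to-season mapping assumes Northern Hemisphere meteorological seasons, and a toy word-list scorer stands in for a cloud sentiment service such as Text Analytics.

```python
import re

import pandas as pd

# Data acquisition (stand-in): hypothetical sample rows. In practice the text
# would come from an API or scraped pages, and the month from poem metadata.
poems = pd.DataFrame({
    "month": [1, 4, 7, 10],
    "text": [
        "  The cold, dark night brings sorrow. ",
        "Bright blossoms bring JOY and hope!",
        "Warm sun, sweet light, glad hearts",
        "Leaves fall; a sad and lonely wind",
    ],
})

# Toy sentiment lexicon (an assumption for this sketch, not a real model).
POSITIVE = {"bright", "joy", "hope", "warm", "sweet", "light", "glad"}
NEGATIVE = {"cold", "dark", "sorrow", "sad", "lonely"}
SEASONS = ["winter", "spring", "summer", "autumn"]


def clean(text):
    """Data cleaning: lowercase, drop punctuation, split into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(tokens):
    """Score in [-1, 1]: (positive words - negative words) / total words."""
    if not tokens:
        return 0.0
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return (pos - neg) / len(tokens)


def season(month):
    """Map a month number (1-12) to a season; Dec-Feb counts as winter."""
    return SEASONS[(month % 12) // 3]


# Data analysis: derive columns, then compare mean sentiment per season.
poems["tokens"] = poems["text"].apply(clean)
poems["sentiment"] = poems["tokens"].apply(sentiment)
poems["season"] = poems["month"].apply(season)
by_season = poems.groupby("season")["sentiment"].mean()
print(by_season)
```

With a real corpus, `by_season` could be plotted with matplotlib, and the toy scorer swapped for a proper sentiment model or service.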
This workflow allows you to explore seasonal impacts on poem sentiment and develop your own interpretations of the author. Try it yourself, then extend the notebook to ask new questions or visualize the data differently!
Use tools from the Digital Humanities toolkit to pursue these inquiries.
Data Science + Sustainability
[Figure: Data Science & Sustainability - Sketchnote by @nitya]
The 2030 Agenda For Sustainable Development, adopted by all United Nations members in 2015, outlines 17 goals, including those aimed at Protecting the Planet from degradation and climate change. The Microsoft Sustainability initiative supports these goals by exploring technology solutions to build a more sustainable future, focusing on four key objectives: being carbon negative, water positive, zero waste, and bio-diverse by 2030.
Addressing these challenges at scale requires cloud-scale thinking and large datasets. The Planetary Computer initiative offers four components to assist data scientists and developers:
- Data Catalog - provides petabytes of Earth Systems data (free and Azure-hosted).
- Planetary API - enables users to search for relevant data across space and time.
- Hub - a managed environment for processing massive geospatial datasets.
- Applications - showcases use cases and tools for sustainability insights.
The Planetary Computer Project is currently in preview (as of Sep 2021) - here's how you can start contributing to sustainability solutions using data science.
- Request access to begin exploring and connect with others.
- Explore documentation to learn about supported datasets and APIs.
- Check out applications like Ecosystem Monitoring for inspiration on project ideas.
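To make "searching across space and time" concrete, here is a minimal sketch of a STAC-style item search, the kind of query the Planetary Computer's API accepts. Treat the details as assumptions to check against the current documentation: the endpoint URL, the `sentinel-2-l2a` collection id, and the bounding box are illustrative, and `build_search` and `search` are hypothetical helper names. Only the standard library is used, and the network call is defined but not executed here.

```python
import json
import urllib.request

# Assumed public STAC search endpoint; verify against the current docs.
STAC_SEARCH_URL = "https://planetarycomputer.microsoft.com/api/stac/v1/search"


def build_search(collections, bbox, start, end, limit=10):
    """Build a STAC item-search body: where (bbox) and when (datetime range).

    bbox is (west, south, east, north) in degrees; start and end are
    RFC 3339 timestamps such as "2021-06-01T00:00:00Z".
    """
    west, south, east, north = bbox
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie in [-180, 180]")
    return {
        "collections": list(collections),
        "bbox": [west, south, east, north],
        "datetime": f"{start}/{end}",
        "limit": limit,
    }


def search(payload, timeout=30):
    """POST the search body and return the matching items (GeoJSON features)."""
    request = urllib.request.Request(
        STAC_SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["features"]


# Example query: one month of imagery over a small area (illustrative values).
payload = build_search(
    collections=["sentinel-2-l2a"],
    bbox=(-122.5, 47.4, -122.1, 47.8),
    start="2021-06-01T00:00:00Z",
    end="2021-06-30T23:59:59Z",
    limit=5,
)
print(json.dumps(payload, indent=2))
# items = search(payload)  # uncomment to query the live API (needs network)
```

In a notebook, a client library such as `pystac-client` wraps the same request and response handling, so the hand-built payload here is mainly useful for seeing what a space-and-time query consists of.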
Consider how you can use data visualization to highlight or amplify insights related to issues like climate change and deforestation. Or think about how these insights can be leveraged to design new user experiences that encourage behavioral changes for more sustainable living.
Data Science + Students
We've discussed real-world applications in industry and research, and looked at examples of data science applications in digital humanities and sustainability. So how can you develop your skills and share your knowledge as beginners in data science?
Here are some examples of student data science projects to inspire you:
- MSR Data Science Summer School, with GitHub projects exploring a range of topics.
- Digitizing Material Culture: Exploring socio-economic distributions in Sirkap - by Ornella Altunyan and her team at Claremont, using ArcGIS StoryMaps.
🚀 Challenge
Look for articles that suggest beginner-friendly data science projects - like these 50 topic areas, these 21 project ideas, or these 16 projects with source code that you can analyze and adapt. And don't forget to blog about your learning experiences and share your insights with the community.
Post-Lecture Quiz
Post-lecture quiz
Review & Self Study
Want to dive deeper into use cases? Here are some relevant articles:
- 17 Data Science Applications and Examples - Jul 2021
- 11 Breathtaking Data Science Applications in Real World - May 2021
- Data Science In The Real World - Article Collection
- Data Science In: Education, Agriculture, Finance, Movies & more.
Assignment
Explore A Planetary Computer Dataset
Disclaimer:
This document has been translated using the AI translation service Co-op Translator. While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.



