Data Science in the Real World

 Sketchnote by (@sketchthedocs)
Data Science In The Real World - Sketchnote by @nitya

We're nearing the end of this learning journey!

We began by defining data science and ethics, explored various tools and techniques for data analysis and visualization, reviewed the data science lifecycle, and examined how to scale and automate workflows using cloud computing services. Now, you might be wondering: "How do I apply all these learnings to real-world scenarios?"

In this lesson, we'll delve into real-world applications of data science across industries and explore specific examples in research, digital humanities, and sustainability. We'll also discuss student project opportunities and wrap up with resources to help you continue your learning journey.

Pre-Lecture Quiz

Pre-lecture quiz

Data Science + Industry

The democratization of AI has made it easier for developers to design and integrate AI-driven decision-making and data-driven insights into user experiences and development workflows. Here are some examples of how data science is applied in real-world industry contexts:

  • Google Flu Trends used data science to correlate search terms with flu trends. Although the approach had flaws, it highlighted the potential (and challenges) of data-driven healthcare predictions.

  • UPS Routing Predictions - explains how UPS uses data science and machine learning to predict optimal delivery routes, factoring in weather conditions, traffic patterns, delivery deadlines, and more.

  • NYC Taxicab Route Visualization - data obtained through Freedom of Information laws was used to visualize a day in the life of NYC cabs, providing insights into navigation patterns, earnings, and trip durations over a 24-hour period.

  • Uber Data Science Workbench - leverages data from millions of daily Uber trips (pickup/dropoff locations, trip durations, preferred routes, etc.) to build analytics tools for pricing, safety, fraud detection, and navigation decisions.

  • Sports Analytics - focuses on predictive analytics (team and player analysis, like Moneyball, and fan management) and data visualization (team dashboards, fan engagement, etc.) with applications in talent scouting, sports betting, and venue management.

  • Data Science in Banking - demonstrates the value of data science in finance, with applications like risk modeling, fraud detection, customer segmentation, real-time predictions, and recommender systems. Predictive analytics also play a key role in measures like credit scores.

  • Data Science in Healthcare - showcases applications such as medical imaging (e.g., MRI, X-ray, CT scan), genomics (DNA sequencing), drug development (risk assessment, success prediction), predictive analytics (patient care and logistics), and disease tracking and prevention.
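The Google Flu Trends example in the list above rests on correlating one time series (search volume) with another (reported illness). As a minimal sketch of that idea, the snippet below computes a Pearson correlation coefficient by hand. The function name `pearson` and both data series are assumptions made up for illustration; they are not real search or health data, and this is not the method Google used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant series has zero variance and would divide by zero here.
    return cov / sqrt(var_x * var_y)


# Made-up weekly values, for illustration only (not real search or health data).
search_volume = [120, 150, 200, 260, 240, 180]
reported_cases = [30, 38, 52, 70, 61, 45]

r = pearson(search_volume, reported_cases)
print(f"Pearson r = {r:.3f}")
```

A high coefficient only says the two series move together. As the flaws of Google Flu Trends suggest, correlation on past data does not guarantee reliable predictions.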

Data Science Applications in the Real World (image credit: Data Flair, 6 Amazing Data Science Applications)

The figure highlights other domains and examples of data science applications. Interested in exploring more? Check out the Review & Self Study section below.

Data Science + Research

Data Science & Research - Sketchnote by @nitya

While industry applications often focus on large-scale use cases, research projects can be valuable from two perspectives:

  • Innovation opportunities - enabling rapid prototyping of advanced concepts and testing user experiences for next-generation applications.
  • Deployment challenges - investigating potential harms or unintended consequences of data science technologies in real-world contexts.

For students, research projects provide learning and collaboration opportunities that deepen understanding and foster connections with experts in areas of interest. So, what do research projects look like, and how can they make an impact?

Consider the MIT Gender Shades Study by Joy Buolamwini (MIT Media Lab), co-authored with Timnit Gebru (then at Microsoft Research). This study focused on:

  • What: Evaluating bias in automated facial analysis algorithms and datasets based on gender and skin type.
  • Why: Facial analysis is used in critical areas like law enforcement, airport security, and hiring systems, where inaccuracies (e.g., due to bias) can lead to economic and social harm. Addressing bias is essential for fairness.
  • How: Researchers identified that existing benchmarks predominantly featured lighter-skinned subjects. They curated a new dataset (1000+ images) balanced by gender and skin type to evaluate the accuracy of three gender classification products (Microsoft, IBM, Face++).

Results revealed that while overall accuracy was good, error rates varied significantly across subgroups: misgendering was more frequent for women and for individuals with darker skin tones, indicating bias.
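The point that a good overall score can hide large gaps between subgroups is easy to see in code. The sketch below is only an illustration under stated assumptions: the function `error_rates_by_subgroup`, the group names, and every record are invented for this example and are not the study's data, labels, or results.

```python
from collections import defaultdict


def error_rates_by_subgroup(records):
    """Return {subgroup: error_rate} for (subgroup, true_label, predicted_label) records."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for subgroup, truth, predicted in records:
        totals[subgroup] += 1
        if truth != predicted:
            errors[subgroup] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Toy records, invented for illustration; NOT the study's data or results.
records = [
    ("group_a", "F", "F"), ("group_a", "M", "M"),
    ("group_a", "F", "F"), ("group_a", "M", "M"),
    ("group_b", "F", "M"), ("group_b", "M", "M"),
    ("group_b", "F", "F"), ("group_b", "F", "M"),
]

# The overall error rate mixes all subgroups together.
overall = sum(truth != predicted for _, truth, predicted in records) / len(records)
rates = error_rates_by_subgroup(records)

print(f"overall error rate: {overall:.2f}")
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f}")
```

Reporting the per-subgroup rates next to the overall rate is what makes a disparity visible. An evaluation that only reports the aggregate would miss it.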

Key Outcomes: The study emphasized the need for more representative datasets (balanced subgroups) and inclusive teams (diverse backgrounds) to identify and address biases early in AI solutions. Such research has influenced organizations to adopt principles and practices for responsible AI to enhance fairness in their AI products and processes.

Interested in relevant research at Microsoft?

Data Science + Humanities

Data Science & Digital Humanities - Sketchnote by @nitya

Digital Humanities is defined as "a collection of practices and approaches combining computational methods with humanistic inquiry." Stanford projects like "rebooting history" and "poetic thinking" illustrate the connection between Digital Humanities and Data Science, emphasizing techniques like network analysis, information visualization, spatial analysis, and text analysis to uncover new insights from historical and literary datasets.

Want to explore a project in this space?

Check out "Emily Dickinson and the Meter of Mood" by Jen Looper. This project uses data science to revisit familiar poetry and explore its meaning in new contexts. For example, can we predict the season in which a poem was written by analyzing its tone or sentiment? What might this reveal about the author's state of mind during that time?

To answer this, follow the steps of the data science lifecycle:

This workflow allows you to explore seasonal impacts on poem sentiment and develop your own perspectives on the author. Try it out, then extend the notebook to ask new questions or visualize the data differently!
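To make the analysis step concrete, here is a minimal sketch of scoring text for sentiment and averaging the scores by season. Everything in it is an assumption for illustration: the word lists are a tiny hypothetical lexicon standing in for a real sentiment model or cloud service, the function names are invented, and the sample lines were written for this sketch and are not quotations from any poet.

```python
from collections import defaultdict
from statistics import mean

# Tiny hypothetical word lists; a real project would use a sentiment model or service.
POSITIVE = {"bright", "bloom", "joy", "warm", "sing"}
NEGATIVE = {"cold", "dark", "grief", "frost", "silent"}


def sentiment_score(text):
    """Score in [-1, 1]: (positive - negative) / matched words, or 0.0 if none match."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


def mean_sentiment_by_season(poems):
    """Group (season, text) pairs by season and average their scores."""
    scores = defaultdict(list)
    for season, text in poems:
        scores[season].append(sentiment_score(text))
    return {season: mean(values) for season, values in scores.items()}


# Placeholder lines written for this sketch; they are not quotations from any poet.
poems = [
    ("spring", "Bright fields bloom and birds sing"),
    ("spring", "A warm joy after the frost"),
    ("winter", "Cold and dark, the silent road"),
    ("winter", "Grief keeps a warm lamp"),
]

by_season = mean_sentiment_by_season(poems)
print(by_season)
```

A word-list scorer like this ignores negation, irony, and context, so treat it as a placeholder. The grouping and averaging step stays the same if you swap in a stronger sentiment source.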

Use tools from the Digital Humanities toolkit to pursue similar inquiries.

Data Science + Sustainability

Data Science & Sustainability - Sketchnote by @nitya

The 2030 Agenda For Sustainable Development, adopted by all United Nations members in 2015, outlines 17 goals, including those aimed at Protecting the Planet from degradation and climate change. The Microsoft Sustainability initiative supports these goals by leveraging technology to build a more sustainable future, focusing on four key objectives: being carbon negative, water positive, zero waste, and bio-diverse by 2030.

Addressing these challenges requires large-scale data and cloud-based solutions. The Planetary Computer initiative provides four components to assist data scientists and developers:

  • Data Catalog - Offers petabytes of Earth Systems data, free and hosted on Azure.

  • Planetary API - Enables users to search for relevant data across space and time.

  • Hub - A managed environment for processing massive geospatial datasets.

  • Applications - Showcases tools and use cases for sustainability insights.

The Planetary Computer Project is currently in preview (as of Sep 2021). Here's how you can start contributing to sustainability solutions using data science:

  • Request access to begin exploring and connect with others in the community.

  • Explore documentation to learn about the available datasets and APIs.

  • Check out applications like Ecosystem Monitoring for ideas and inspiration.
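Searching "across space and time", as the Planetary API item above describes, usually means sending a query with a collection, a bounding box, and a date range. The sketch below only builds such a query body in the style of a STAC item search and does not send it. The endpoint URL, the collection ID `sentinel-2-l2a`, the bounding box, and the helper name `build_search_body` are all assumptions for illustration; check the official documentation for the current endpoint, available collections, and access requirements.

```python
import json

# Assumed endpoint for the Planetary Computer STAC API; confirm it in the official docs.
STAC_SEARCH_URL = "https://planetarycomputer.microsoft.com/api/stac/v1/search"


def build_search_body(collection, bbox, start, end, limit=10):
    """Build a STAC-style item-search body: one collection, a bounding box, a time range."""
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    return {
        "collections": [collection],
        "bbox": [west, south, east, north],  # order: west, south, east, north
        "datetime": f"{start}/{end}",        # RFC 3339 timestamps, start/end interval
        "limit": limit,
    }


# Hypothetical query: an assumed collection ID over an arbitrary example area.
body = build_search_body(
    "sentinel-2-l2a",
    [-122.6, 47.4, -122.1, 47.8],
    "2021-06-01T00:00:00Z",
    "2021-06-30T23:59:59Z",
)
print(f"POST {STAC_SEARCH_URL}")
print(json.dumps(body, indent=2))
```

Keeping the query construction separate from the network call makes it easy to inspect and validate the request before you send it with the HTTP client of your choice.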

Consider how you can use data visualization to highlight or enhance insights into issues like climate change and deforestation. Or think about how these insights can be leveraged to design new user experiences that encourage behavioral changes for more sustainable living.

Data Science + Students

We've discussed real-world applications in industry and research, and looked at examples of data science applications in digital humanities and sustainability. So how can you develop your skills and share your knowledge as data science beginners?

Here are some examples of student data science projects to inspire you:

🚀 Challenge

Look for articles that suggest beginner-friendly data science projects, such as these 50 topic areas, these 21 project ideas, or these 16 projects with source code that you can analyze and adapt. Don't forget to blog about your learning experiences and share your insights with the community.

Post-Lecture Quiz

Post-lecture quiz

Review & Self Study

Want to dive deeper into use cases? Here are some relevant articles:

Assignment

Explore A Planetary Computer Dataset


Disclaimer:
This document has been translated using the AI translation service Co-op Translator. While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.