{ "nbformat": 4, "nbformat_minor": 2, "metadata": { "colab": { "name": "lesson_2-R.ipynb", "provenance": [], "collapsed_sections": [], "toc_visible": true }, "kernelspec": { "name": "ir", "display_name": "R" }, "language_info": { "name": "R" }, "coopTranslator": { "original_hash": "f3c335f9940cfd76528b3ef918b9b342", "translation_date": "2025-09-06T13:52:52+00:00", "source_file": "2-Regression/2-Data/solution/R/lesson_2-R.ipynb", "language_code": "sw" } }, "cells": [ { "cell_type": "markdown", "source": [
"# Build a regression model: prepare and visualize data\n",
"\n",
"## **Linear Regression for Pumpkins - Lesson 2**\n",
"#### Introduction\n",
"\n",
"Now that you are set up with the tools you need to start building machine learning models with Tidymodels and the Tidyverse, you are ready to start asking questions of your data. As you work with data and apply ML solutions, it is very important to understand how to ask the right question so that you can properly unlock the potential of your dataset.\n",
"\n",
"In this lesson, you will learn:\n",
"\n",
"- How to prepare your data for model building.\n",
"\n",
"- How to use `ggplot2` for data visualization.\n",
"\n",
"The question you need answered determines which type of ML algorithm you will use. The quality of the answer you get back depends heavily on the nature of your data.\n",
"\n",
"Let's see this by working through a practical exercise.\n",
"\n",
"> A refresher: The pipe operator (`%>%`) performs operations in logical sequence by passing an object forward into a function or call expression. You can think of the pipe operator as saying \"and then\" in your code.\n"
],
"metadata": {
"id": "REWcIv9yX29v"
}
},
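{
"cell_type": "markdown",
"source": [
"To make the refresher above concrete, here is a minimal sketch on a made-up vector. The vector `x` is hypothetical and not part of the pumpkins data, and the code assumes the `dplyr` package is installed (it re-exports `%>%` from magrittr). Both expressions should compute the same value; the piped form reads left to right as \"take `x`, and then take square roots, and then sum\".\n"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"library(dplyr)  # re-exports the %>% pipe from magrittr\n",
"\n",
"# Hypothetical toy vector, used only for illustration\n",
"x <- c(1, 4, 9)\n",
"\n",
"# Nested call: read from the inside out\n",
"nested <- sum(sqrt(x))\n",
"\n",
"# Piped call: read from left to right\n",
"piped <- x %>% sqrt() %>% sum()\n",
"\n",
"# Check that both forms give the same result\n",
"identical(nested, piped)"
],
"outputs": [],
"metadata": {}
},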
{
"cell_type": "markdown",
"source": [
"## 2. Check for missing data\n",
"\n",
"One of the most common issues data scientists have to deal with is incomplete or missing data. R represents missing, or unknown, values with a special sentinel value: `NA` (Not Available).\n",
"\n",
"So how would we know that the data frame contains missing values?\n",
"\n",
"- One straightforward way is to use the base R function `anyNA`, which returns a single logical value: `TRUE` if any value is missing, `FALSE` otherwise.\n"
],
"metadata": {
"id": "Zxfb3AM5YbUe"
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"pumpkins %>% \n",
" anyNA()"
],
"outputs": [],
"metadata": {
"id": "G--DQutAYltj"
}
},
{
"cell_type": "markdown",
"source": [
"There seems to be some missing data! That's a good place to start.\n",
"\n",
"- Another way is to use the function `is.na()`, which marks each missing element of every column with a logical `TRUE`.\n"
],
"metadata": {
"id": "mU-7-SB6YokF"
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"pumpkins %>% \n",
" is.na() %>% \n",
" head(n = 7)"
],
"outputs": [],
"metadata": {
"id": "W-DxDOR4YxSW"
}
},
{
"cell_type": "markdown",
"source": [
"Okay, that got the job done, but with a data frame as large as this one it would be inefficient and practically impossible to review all of the rows and columns individually 😴.\n",
"\n",
"- A more intuitive way is to calculate the total number of missing values in each column:\n"
],
"metadata": {
"id": "xUWxipKYY0o7"
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"pumpkins %>% \n",
" is.na() %>% \n",
" colSums()"
],
"outputs": [],
"metadata": {
"id": "ZRBWV6P9ZArL"
}
},
{
"cell_type": "markdown",
"source": [
"Much better! There is missing data, but maybe it won't matter for the task at hand. Let's see what further analysis brings forth.\n",
"\n",
"> Along with its rich set of packages and functions, R has very good documentation. For instance, use `help(colSums)` or `?colSums` to find out more about the function.\n"
],
"metadata": {
"id": "9gv-crB6ZD1Y"
}
},
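{
"cell_type": "markdown",
"source": [
"The three checks above can be tried on a small data frame where the answer is easy to verify by eye. This is only a sketch: the `toy` data frame and its columns are made up for illustration and use base R only. The last line is a small extension that reports the share of missing values per column with `colMeans()` instead of the raw count.\n"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"# Hypothetical toy data frame with a few missing values (not the pumpkins data)\n",
"toy <- data.frame(\n",
"  city  = c(\"A\", \"B\", NA, \"D\"),\n",
"  price = c(10, NA, NA, 25)\n",
")\n",
"\n",
"# Is anything missing at all? Returns a single TRUE or FALSE\n",
"anyNA(toy)\n",
"\n",
"# Number of missing values in each column\n",
"na_counts <- colSums(is.na(toy))\n",
"na_counts\n",
"\n",
"# Share of missing values in each column\n",
"colMeans(is.na(toy))"
],
"outputs": [],
"metadata": {}
},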
{
"cell_type": "markdown",
"source": [
"## 3. Dplyr: A Grammar of Data Manipulation\n"
],
"metadata": {
"id": "VrDwF031avlR"
}
},
{
"cell_type": "markdown",
"source": [
"#### dplyr::case_when()\n",
"\n",
"**But wait! There's one more thing to do**\n",
"\n",
"Did you notice that the bushel amount varies from row to row? You need to normalize the pricing so that it shows the price per bushel, not per 1 1/9 or 1/2 bushel. Time to do some math to standardize it.\n",
"\n",
"We'll use the function [`case_when()`](https://dplyr.tidyverse.org/reference/case_when.html) to *mutate* the Price column depending on some conditions. `case_when` lets you vectorize multiple `if_else()` statements.\n"
],
"metadata": {
"id": "mLpw2jH4a0tx"
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"# Convert the price if the Package contains fractional bushel values\n",
"new_pumpkins <- new_pumpkins %>% \n",
" mutate(Price = case_when(\n",
" str_detect(Package, \"1 1/9\") ~ Price/(1 + 1/9),\n",
" str_detect(Package, \"1/2\") ~ Price/(1/2),\n",
" TRUE ~ Price))\n",
"\n",
"# View the first few rows of the data\n",
"new_pumpkins %>% \n",
" slice_head(n = 30)"
],
"outputs": [],
"metadata": {
"id": "P68kLVQmbM6I"
}
},
{
"cell_type": "markdown",
"source": [
"Now we can analyze the pricing per unit based on the bushel measurement. All this study of bushels of pumpkins, however, goes to show how very `important` it is to `understand the nature of your data`!\n",
"\n",
"> ✅ According to [The Spruce Eats](https://www.thespruceeats.com/how-much-is-a-bushel-1389308), a bushel's weight depends on the type of produce, because it is a volume measurement. \"A bushel of tomatoes, for example, is supposed to weigh 56 pounds... Leaves and greens take up more space with less weight, so a bushel of spinach is only 20 pounds.\" It's all pretty complicated! Let's not bother with a bushel-to-pound conversion, and instead price by the bushel.\n",
"\n",
"> ✅ Did you notice that pumpkins sold by the half-bushel are very expensive? Can you figure out why? Hint: little pumpkins are pricier than big ones, probably because there are so many more of them per bushel, given the unused space taken up by one big hollow pie pumpkin.\n"
],
"metadata": {
"id": "pS2GNPagbSdb"
}
},
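{
"cell_type": "markdown",
"source": [
"As a worked check of the normalization above, the sketch below applies the same `case_when()` rules to three made-up rows. The `toy` data frame and the `Price_per_bushel` column name are hypothetical, and the code assumes `dplyr` and `stringr` are installed. A price of 15 for a 1 1/9 bushel carton becomes 15 / (10/9) = 13.5 per bushel, and a price of 9 for a half bushel becomes 9 / 0.5 = 18. Note that `case_when()` uses the first condition that matches, so more specific patterns should come first.\n"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"library(dplyr)\n",
"library(stringr)\n",
"\n",
"# Hypothetical rows, made up for illustration\n",
"toy <- data.frame(\n",
"  Package = c(\"1 1/9 bushel cartons\", \"1/2 bushel cartons\", \"bushel cartons\"),\n",
"  Price   = c(15, 9, 20)\n",
")\n",
"\n",
"result <- toy %>%\n",
"  mutate(Price_per_bushel = case_when(\n",
"    str_detect(Package, \"1 1/9\") ~ Price / (1 + 1/9),  # 15 / (10/9) = 13.5\n",
"    str_detect(Package, \"1/2\")   ~ Price / (1/2),      # 9 / 0.5 = 18\n",
"    TRUE                         ~ Price               # already priced per bushel\n",
"  ))\n",
"\n",
"result"
],
"outputs": [],
"metadata": {}
},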
{
"cell_type": "markdown",
"source": [
"Now lastly, for the sheer sense of adventure, let's also move the Month column to the first position, i.e. `before` the `Package` column.\n",
"\n",
"`dplyr::relocate()` is used to change column positions.\n"
],
"metadata": {
"id": "qql1SowfbdnP"
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"# Move the Month column so that it sits before the Package column\n",
"new_pumpkins <- new_pumpkins %>% \n",
" relocate(Month, .before = Package)\n",
"\n",
"new_pumpkins %>% \n",
" slice_head(n = 7)"
],
"outputs": [],
"metadata": {
"id": "JJ1x6kw8bixF"
}
},
{
"cell_type": "markdown",
"source": [
"Good job! You now have a clean, tidy dataset on which you can build your new regression model!\n"
],
"metadata": {
"id": "y8TJ0Za_bn5Y"
}
},
{
"cell_type": "markdown",
"source": [
"## 4. Data visualization with ggplot2\n"
],
"metadata": {
"id": "Ml7SDCLQcPvE"
}
},
{
"cell_type": "markdown",
"source": [
"### **How do we make it useful?**\n",
"\n",
"To get charts to display useful data, you usually need to group the data somehow. In our case, finding the average price of pumpkins for each month would give more insight into the underlying patterns in our data. This leads us to one more **dplyr** feature:\n",
"\n",
"#### `dplyr::group_by() %>% summarize()`\n",
"\n",
"Grouped aggregation in R can be easily computed using\n",
"\n",
"`dplyr::group_by() %>% summarize()`\n",
"\n",
"- `dplyr::group_by()` changes the unit of analysis from the complete dataset to individual groups, such as one group per month.\n",
"\n",
"- `dplyr::summarize()` creates a new data frame with one column for each grouping variable and one column for each summary statistic that you specify.\n",
"\n",
"For example, we can use `dplyr::group_by() %>% summarize()` to group the pumpkins by the **Month** column and then find the **mean price** for each month.\n"
],
"metadata": {
"id": "jMakvJZIcVkh"
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"# Find the average price of pumpkins per month\n",
"new_pumpkins %>%\n",
"  group_by(Month) %>%\n",
"  summarise(mean_price = mean(Price))"
],
"outputs": [],
"metadata": {
"id": "6kVSUa2Bcilf"
}
},
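{
"cell_type": "markdown",
"source": [
"To see exactly what `group_by() %>% summarise()` returns, here is a small sketch on made-up numbers. The `toy` data frame and the column names `mean_price` and `n_rows` are hypothetical, and the code assumes `dplyr` is installed. The result has one row per month, one column for the grouping variable and one column per summary statistic. If `Price` contained `NA` values, `mean()` would return `NA` for that group unless you pass `na.rm = TRUE`.\n"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"library(dplyr)\n",
"\n",
"# Hypothetical prices for two months, made up for illustration\n",
"toy <- data.frame(\n",
"  Month = c(9, 9, 10, 10, 10),\n",
"  Price = c(10, 20, 30, 40, 50)\n",
")\n",
"\n",
"monthly <- toy %>%\n",
"  group_by(Month) %>%\n",
"  summarise(\n",
"    mean_price = mean(Price),  # average price within each month\n",
"    n_rows     = n()           # number of rows in each month\n",
"  )\n",
"\n",
"monthly"
],
"outputs": [],
"metadata": {}
},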
{
"cell_type": "markdown",
"source": [
"Succinct! ✨\n",
"\n",
"Categorical features such as months are better represented using a bar plot. The layers responsible for bar charts are `geom_bar()` and `geom_col()`. Consult `?geom_bar` to find out more.\n",
"\n",
"Let's create one!\n"
],
"metadata": {
"id": "Kds48GUBcj3W"
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"# Find the average price of pumpkins per month then plot a bar chart\n",
"new_pumpkins %>%\n",
"  group_by(Month) %>%\n",
"  summarise(mean_price = mean(Price)) %>%\n",
"  ggplot(aes(x = Month, y = mean_price)) +\n",
"  geom_col(fill = \"midnightblue\", alpha = 0.7) +\n",
"  ylab(\"Pumpkin Price\")"
],
"outputs": [],
"metadata": {
"id": "VNbU1S3BcrxO"
}
},
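{
"cell_type": "markdown",
"source": [
"The chart above uses `geom_col()` rather than `geom_bar()`, and the difference is worth a quick sketch. By default `geom_bar()` counts the rows in each category, while `geom_col()` draws bars whose height is the `y` value you supply. The `toy` data frame below is made up for illustration, and the code assumes `ggplot2` is installed; base R `aggregate()` stands in for the dplyr summary step.\n"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"library(ggplot2)\n",
"\n",
"# Hypothetical prices for two months, made up for illustration\n",
"toy <- data.frame(\n",
"  Month = factor(c(9, 9, 10, 10, 10)),\n",
"  Price = c(10, 20, 30, 40, 50)\n",
")\n",
"\n",
"# geom_bar(): bar height is the number of rows in each Month\n",
"p_count <- ggplot(toy, aes(x = Month)) +\n",
"  geom_bar()\n",
"\n",
"# geom_col(): bar height is the y value supplied, here the monthly mean price\n",
"monthly <- aggregate(Price ~ Month, data = toy, FUN = mean)\n",
"p_mean <- ggplot(monthly, aes(x = Month, y = Price)) +\n",
"  geom_col()\n",
"\n",
"p_count\n",
"p_mean"
],
"outputs": [],
"metadata": {}
},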
{
"cell_type": "markdown",
"source": [
"🤩🤩 This is a more useful data visualization! It seems to indicate that the highest price for pumpkins occurs in September and October. Does that meet your expectation? Why or why not?\n",
"\n",
"Congratulations on finishing the second lesson! You prepared your data for model building, then uncovered more insights using visualizations!\n"
],
"metadata": {
"id": "zDm0VOzzcuzR"
}
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"
]
}
]
}