From ea9f42e8953668a9d8da2ee14ccc1a5a6c06f049 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jo=C3=A3o=20Francisco=20Cocca=20Fukuda?=
Date: Tue, 6 Oct 2020 10:52:37 -0300
Subject: [PATCH] Add `Link Gatherer`

---
 Projects/3-Advanced/Link-Gatherer.md | 39 ++++++++++++++++++++++++++++
 README.md                            |  1 +
 2 files changed, 40 insertions(+)
 create mode 100644 Projects/3-Advanced/Link-Gatherer.md

diff --git a/Projects/3-Advanced/Link-Gatherer.md b/Projects/3-Advanced/Link-Gatherer.md
new file mode 100644
index 00000000..b650ec95
--- /dev/null
+++ b/Projects/3-Advanced/Link-Gatherer.md
@@ -0,0 +1,39 @@
+# Link Gatherer
+
+**Tier:** 3-Advanced
+
+This application receives a link (or an IP address), performs DNS name resolution, and crawls the page for all links that lead to other pages on the same domain.
+
+The objective is to list all pages found on the website.
+
+## User Stories
+
+- [ ] User can execute the program with an IP address
+- [ ] User can execute the program without a defined port (and port 80 will be set by default)
+- [ ] User can execute the program with a defined port
+
+## Bonus features
+
+- [ ] User can execute the program with a name address (and a DNS name resolution will occur)
+- [ ] User can scan https websites as well
+- [ ] User can accept other content encodings (like gzip)
+- [ ] User can format output to get only the local path or the full path (with domain name/IP address prepended)
+- [ ] User can choose to ignore or show dangling links (that return a 404 page or any other status code of their choice)
+
+## Useful links and resources
+
+As this program is meant to be made without too much help (from libraries), I suggest you learn:
+
+- [ ] HTML
+  - [HTML Tutorial - W3schools](https://www.w3schools.com/html/default.asp)
+  - [HTML Course - SoloLearn](https://www.sololearn.com/Course/HTML/)
+- [ ] HTTP protocol (to build requests and understand responses)
+  - [Hypertext Transfer Protocol - Wikipedia](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol): Focus on `Message Format` and `Example Session`
+  - [An overview of HTTP - Mozilla](https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview): Focus on `HTTP Flow` and `HTTP Messages`
+- [ ] Socket programming (in the language of your choice)
+  - C/C++: [Socket Programming in C/C++ - GeeksForGeeks](https://www.geeksforgeeks.org/socket-programming-cc/)
+  - Python: [Socket Programming in Python - GeeksForGeeks](https://www.geeksforgeeks.org/socket-programming-python/)
+
+## Example projects
+
+I could not find anything like this. The closest was [HTTrack](https://www.httrack.com/), but its main functionality is different.
diff --git a/README.md b/README.md
index f6557617..7923e5c2 100644
--- a/README.md
+++ b/README.md
@@ -133,6 +133,7 @@ required to complete them.
 | [Instagram Clone](./Projects/3-Advanced/Instagram-Clone-App.md) | A clone of Facebook's Instagram app | 3-Advanced |
 | [GitHub Timeline](./Projects/3-Advanced/GitHub-Timeline-App.md) | Generate a timeline of a users GitHub Repos | 3-Advanced |
 | [Kudos Slackbot](./Projects/3-Advanced/Kudos-Slackbot.md) | Give recognition to a deserving peer | 3-Advanced |
+| [Link Gatherer](./Projects/3-Advanced/Link-Gatherer.md) | Find and list all pages possibly accessible on a website. | 3-Advanced |
 | [Movie App](./Projects/3-Advanced/Movie-App.md) | Browse, Find Ratings, Check Actors and Find you next movie to watch | 3-Advanced |
 | [MyPodcast Library](./Projects/3-Advanced/MyPodcast-Library-app.md) | Create a library of favorite podcasts | 3-Advanced |
 | [NASA Exoplanet Query](./Projects/3-Advanced/NASA-Exoplanet-Query.md) | Query NASA's Exoplanet Archive | 3-Advanced |
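
The two core steps the Link Gatherer description asks for, building an HTTP request by hand and collecting links that stay on the same domain, could be sketched roughly as below. This is only an illustrative sketch and not part of the patch: the names `build_request`, `LinkParser`, and `same_site_links` are hypothetical, it assumes Python with the standard library's `html.parser` for HTML (the project may well expect a hand-written parser instead), and it leaves out the socket I/O, response parsing, https, and gzip handling.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


def build_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request suitable for a raw socket."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close so the reader can stop at EOF
        "",                   # blank line terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(html, base_url):
    """Return absolute, de-duplicated links that stay on base_url's host."""
    parser = LinkParser()
    parser.feed(html)
    base_host = urlsplit(base_url).hostname
    found = []
    for href in parser.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute = urldefrag(urljoin(base_url, href)).url
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if parts.hostname == base_host and absolute not in found:
            found.append(absolute)
    return found
```

In a full program the bytes from `build_request` would be sent over a TCP connection (for example one opened with `socket.create_connection((host, 80))`, port 80 being the default the user stories mention), and the response body would be passed to `same_site_links` along with the URL it was fetched from.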