From f426070b4b2390531677040d551f3849e9b5ee6b Mon Sep 17 00:00:00 2001
From: Patrick <50352812+Mueller-Patrick@users.noreply.github.com>
Date: Fri, 16 Apr 2021 09:00:04 +0200
Subject: [PATCH] Adjusting crawler UCS

---
 Use-Case-Specification:-Web-Crawler.md | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/Use-Case-Specification:-Web-Crawler.md b/Use-Case-Specification:-Web-Crawler.md
index 1c4c01b..722fc52 100644
--- a/Use-Case-Specification:-Web-Crawler.md
+++ b/Use-Case-Specification:-Web-Crawler.md
@@ -7,25 +7,18 @@ The web crawler is an important component of our project. In this Use-Case-Speci
 ### Activity Diagram
 ![activity diagram](https://github.com/Mueller-Patrick/Betterzon/blob/master/doku/AC_Crawler.png)
 
-At the very beginning the crawler process reads it's configuration file. If it's invalid, the process will terminate.
-If not, the crawler will check if the specified Shop is already present in the database. If not, it will create the entry and continue with fetching all products from a certain category.
-For every product in that list the following will be done:
-- Check if the product is available on amazon.
--- If not, the product is discarded
-- Check if the product is in the database
--- If not, it is added
-- Add the fetched price to the price database
-If all fetched products are processed, the process is terminated.
+At the very beginning, the crawler process reads its configuration file and loads the products to be crawled from the database. If either is invalid or unavailable, the process terminates.
+If both are valid, the load balancer distributes the crawling tasks across all registered crawling instances. Each instance then iterates over its assigned list of products. For each product, it iterates over all vendors where the product is available and fetches the price. After all prices have been fetched, the price entries are sent to the database. After all products have been crawled, the program is done and can optionally send the execution status to an HTTP endpoint.
 
 ## 3. Special Requirements
-TBD
+N/A
 
 ## 4. Preconditions
 ### 4.1 The Database has to accept connections
-### 4.2 A configuration file has to be in place
+### 4.2 A configuration file has to be in place / Environment variables have to be set properly
 
 ## 5. Postconditions
-TBD
+N/A
 
 ## 6. Function Points
 [tbd]
\ No newline at end of file