Adjusting crawler UCS

Patrick 2021-04-16 09:00:04 +02:00
parent 6f514803e8
commit f426070b4b

@ -7,25 +7,18 @@ The web crawler is an important component of our project. In this Use-Case-Speci
### Activity Diagram
![activity diagram](https://github.com/Mueller-Patrick/Betterzon/blob/master/doku/AC_Crawler.png)
At the very beginning, the crawler process reads its configuration file. If it is invalid, the process will terminate.
If it is valid, the crawler will check whether the specified shop is already present in the database. If not, it will create the entry and continue with fetching all products from a certain category.
For every product in that list the following will be done:
- Check if the product is available on Amazon.
-- If not, the product is discarded
- Check if the product is in the database
-- If not, it is added
- Add the fetched price to the price database
Once all fetched products have been processed, the process terminates.
At the very beginning, the crawler process reads its configuration file and the products to be crawled from the database. If either is invalid or not available, the process will terminate.
If both are valid, the load balancer will distribute the crawling tasks across all registered crawling instances. Each instance will then iterate over its assigned list of products. For each product, it will iterate over all vendors where the product is available and fetch the price. After all prices have been fetched, the price entries are sent to the database. Once all products are crawled, the program is done and can optionally send the status of the execution to an HTTP endpoint.
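The distribution and per-instance crawling loop described above can be sketched roughly as follows. This is a minimal sketch only: the function names, the round-robin distribution strategy, and the in-memory price map standing in for the real per-vendor fetch are assumptions, not the actual Betterzon implementation.

```python
from typing import Dict, List


def distribute(products: List[str], instances: int) -> List[List[str]]:
    """Round-robin split of the product list across registered crawler
    instances. The real load balancer may use a different strategy."""
    buckets: List[List[str]] = [[] for _ in range(instances)]
    for i, product in enumerate(products):
        buckets[i % instances].append(product)
    return buckets


def crawl(products: List[str],
          vendor_prices: Dict[str, Dict[str, float]]) -> List[dict]:
    """Per-instance loop: for each assigned product, iterate over all
    vendors where it is available and collect one price entry per vendor.
    `vendor_prices` stands in for the real per-vendor price fetch."""
    entries = []
    for product in products:
        for vendor, price in vendor_prices.get(product, {}).items():
            entries.append({"product": product,
                            "vendor": vendor,
                            "price": price})
    return entries  # in the real crawler, these are sent to the database
```

After every instance returns, the combined entries would be written to the price database, and an execution status could then be posted to an HTTP endpoint as the diagram describes.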
## 3. Special Requirements
TBD
N/A
## 4. Preconditions
### 4.1 The Database has to accept connections
### 4.2 A configuration file has to be in place
### 4.2 A configuration file has to be in place / Environment variables have to be set properly
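The precondition above (configuration file or environment variables) could be checked at startup roughly like this. A minimal sketch, assuming a JSON config file and an environment variable; the variable name `CRAWLER_DB_HOST`, the file name `crawler.json`, and the key names are illustrative, not the real ones.

```python
import json
import os
import sys


def load_config(path: str = "crawler.json") -> dict:
    """Prefer environment variables; fall back to the config file.
    Variable and key names are hypothetical placeholders."""
    if "CRAWLER_DB_HOST" in os.environ:
        return {"db_host": os.environ["CRAWLER_DB_HOST"]}
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, json.JSONDecodeError):
        # Invalid or missing configuration: terminate the process,
        # matching the first step of the activity diagram.
        sys.exit(1)
```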
## 5. Postconditions
TBD
N/A
## 6. Function Points
[tbd]