Master deployment (#67)

* BETTERZON-58: Basic Functionality with scrapy (#33)

* BETTERZON-73: Adding API endpoint that returns the lowest non-amazon prices for a given list of product ids (#32)

* BETTERZON-75: User registration API endpoint (#34)

* BETTERZON-75: Adding backend functions to enable user registration

* BETTERZON-75: Adding regex to check email and username
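For illustration only, such checks often look roughly like the following (shown in Python; these patterns are assumptions, not the ones used in the project's API backend):

    import re

    # Hypothetical validation patterns for illustration; the project's actual rules may differ.
    EMAIL_RE = re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]+$')
    USERNAME_RE = re.compile(r'^[A-Za-z0-9_-]{3,32}$')

    def is_valid_email(email: str) -> bool:
        return EMAIL_RE.match(email) is not None

    def is_valid_username(username: str) -> bool:
        return USERNAME_RE.match(username) is not None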

* BETTERZON-83: FE unit testing (#35)

* BETTERZON-83: Making pre-generated unit tests work

* BETTERZON-83: Writing unit tests for angular to improve code coverage

* BETTERZON-79: Adding API endpoint for logging in (#36)

* BETTERZON-84: Adding service method to check if a session is valid (#37)

* BETTERZON-77: Changing error behavior, as the previous behavior could have opened up security vulnerabilities (#38)

* BETTERZON-76: Adding method descriptions for backend service methods (#40)

* Adding Codacy code quality badge to README

* BETTERZON-89: Refactoring / Reformatting and adding unit tests (#41)

* BETTERZON-90: Adding API endpoint for creating price alarms (#42)

* BETTERZON-91: Adding API endpoint to GET all price alarms for the currently logged in user (#43)

* BETTERZON-92: Adding API endpoint to edit (update) price alarms (#44)

* BETTERZON-99: Adding some basic cucumber tests (#45)

* BETTERZON-100: Switching to cookies for session management (#46)

* BETTERZON-100: Switching session handling to cookies

* BETTERZON-100: Some code reformatting

* BETTERZON-100: Some more code reformatting

* BETTERZON-93: Adding API endpoint to get managed shops (#47)

* BETTERZON-94: Adding API endpoint to deactivate price listings as a vendor manager (#48)

* BETTERZON-97: Adding API endpoint to get all products listed by a specific vendor (#50)

* BETTERZON-98: Adding API endpoint for adding price entries as a registered vendor manager (#51)

* BETTERZON-95: Adding API endpoint for getting, inserting and updating contact persons (#52)

* BETTERZON-58 (#53)

* BETTERZON-58: Basic Functionality with scrapy

* Added independent crawler function, yielding price

* moved logic to amazon.py

* .

* moved scrapy files to unused folder

* Added basic amazon crawler using beautifulsoup4

* Connected Api to Crawler

* Fixed string concatenation for sql statement in getProductLinksForProduct
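For context, the fix moves to a parameterized query, as also visible in the sql.py hunk further below; a minimal sketch (the "before" line is assumed, not the original code):

    # Before (assumed sketch of the concatenated statement that was replaced):
    # query = 'SELECT vendor_id, url FROM product_links WHERE product_id = ' + str(product_id)
    # After (as shown in the sql.py hunk below): let the driver bind the parameter
    query = 'SELECT vendor_id, url FROM product_links WHERE product_id = %s'
    cur.execute(query, (product_id,))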

* BETTERZON-58: Fixing SQL insert

* BETTERZON-58: Adding access key verification

* BETTERZON-58: Fixing API endpoint of the crawler
- The list of products in the API request was treated like a string, and hence only the first product was crawled
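The corresponding change is visible in the Flask API hunk further below; in short, reqparse has to be told to collect repeated values into a list instead of keeping a single string, roughly:

    from flask_restful import reqparse

    parser = reqparse.RequestParser()
    # Before: parser.add_argument('products') keeps only a single string value
    # After: repeated 'products' values are appended into a list of ints
    parser.add_argument('products', type=int, action='append')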

* Added another selector for price on amazon (does not work for books)

Co-authored-by: root <root@DESKTOP-ARBPL82.localdomain>
Co-authored-by: Patrick Müller <patrick@mueller-patrick.tech>
Co-authored-by: Patrick <50352812+Mueller-Patrick@users.noreply.github.com>

* BETTERZON-96: Adding API endpoint for delisting a whole vendor (#54)

* BETTERZON-101: Adding service functions for pricealarms api (#55)

- Not properly tested yet, as the login functionality required for testing has not been implemented.

* BETTERZON-110: Refactoring, reformatting and commenting api service (#56)

* BETTERZON-107: Refactoring code with Proxy as design pattern (#49)

* BETTERZON-78 (#39)

* BETTERZON-31, dependencies.

* BETTERZON-31: Fixing dependencies

* BETTERZON-31,
BETTERZON-50

info popover and footer had been changed.

* BETTERZON-74

simple top-bar has been created.

* WIP: creating footer using grid.

* BETTERZON-78 adding bottom bar and top bar

* Adding cookieconsent as dependency again since it was removed by a merge

* Adding cookieconsent as dependency again since it was removed by a merge

* Apply suggestions from code review

Switching from single to double quotes

* BETTERZON-78 - grid added, structured as in Adobe XD mockup

Co-authored-by: Patrick Müller <patrick@mueller-patrick.tech>
Co-authored-by: Patrick <50352812+Mueller-Patrick@users.noreply.github.com>

* BETTERZON-109 (#57)

* BETTERZON-31, dependencies.

* BETTERZON-31: Fixing dependencies

* BETTERZON-31,
BETTERZON-50

info popover and footer had been changed.

* BETTERZON-74

simple top-bar has been created.

* WIP: creating footer using grid.

* BETTERZON-78 adding bottom bar and top bar

* Adding cookieconsent as dependency again since it was removed by a merge

* Adding cookieconsent as dependency again since it was removed by a merge

* Apply suggestions from code review

Switching from single to double quotes

* BETTERZON-78 - grid added, structured as in Adobe XD mockup

* wip: component rewritten, simple grid applied.

* wip: new component created and added to the app.module.ts. Added a minimal grid layout.

* wip: all components are now wrapped. The grid structure has been applied to the main wrapper class "container".

* wip: component created and added to the app.module.ts

Co-authored-by: Patrick Müller <patrick@mueller-patrick.tech>
Co-authored-by: Patrick <50352812+Mueller-Patrick@users.noreply.github.com>

* BETTERZON-108 (#58)

* BETTERZON-31, dependencies.

* BETTERZON-31: Fixing dependencies

* BETTERZON-31,
BETTERZON-50

info popover and footer had been changed.

* BETTERZON-74

simple top-bar has been created.

* WIP: creating footer using grid.

* BETTERZON-78 adding bottom bar and top bar

* Adding cookieconsent as dependency again since it was removed by a merge

* Adding cookieconsent as dependency again since it was removed by a merge

* Apply suggestions from code review

Switching from single to double quotes

* BETTERZON-78 - grid added, structured as in Adobe XD mockup

* wip: component rewritten, simple grid applied.

* wip: new component created and added to the app.module.ts. Added a minimal grid layout.

* wip: all components are now wrapped. The grid structure has been applied to the main wrapper class "container".

Co-authored-by: Patrick Müller <patrick@mueller-patrick.tech>
Co-authored-by: Patrick <50352812+Mueller-Patrick@users.noreply.github.com>

* BETTERZON-106 (#59)

* BETTERZON-31, dependencies.

* BETTERZON-31: Fixing dependencies

* BETTERZON-31,
BETTERZON-50

info popover and footer had been changed.

* BETTERZON-74

simple top-bar has been created.

* WIP: creating footer using grid.

* BETTERZON-78 adding bottom bar and top bar

* Adding cookieconsent as dependency again since it was removed by a merge

* Adding cookieconsent as dependency again since it was removed by a merge

* Apply suggestions from code review

Switching from single to double quotes

* BETTERZON-78 - grid added, structured as in Adobe XD mockup

* wip: component rewritten, simple grid applied.

* wip: new component created and added to the app.module.ts. Added a minimal grid layout.

Co-authored-by: Patrick Müller <patrick@mueller-patrick.tech>
Co-authored-by: Patrick <50352812+Mueller-Patrick@users.noreply.github.com>

* BETTERZON-102 (#60)

* BETTERZON-31, dependencies.

* BETTERZON-31: Fixing dependencies

* BETTERZON-31,
BETTERZON-50

info popover and footer had been changed.

* BETTERZON-74

simple top-bar has been created.

* WIP: creating footer using grid.

* BETTERZON-78 adding bottom bar and top bar

* Adding cookieconsent as dependency again since it was removed by a merge

* Adding cookieconsent as dependency again since it was removed by a merge

* Apply suggestions from code review

Switching from single to double quotes

* BETTERZON-78 - grid added, structured as in Adobe XD mockup

* wip: component rewritten, simple grid applied.

Co-authored-by: Patrick Müller <patrick@mueller-patrick.tech>
Co-authored-by: Patrick <50352812+Mueller-Patrick@users.noreply.github.com>

* BETTERZON-113, BETTERZON-114, BETTERZON-115: Adding API endpoint for favorite shops (#61)

* BETTERZON-116: Adding API endpoint for searching a new product (#62)

* BETTERZON-117: Adding API endpoint for getting the latest crawling status (#63)

* BETTERZON-111: Adding service functions for login and registration (#64)

* BETTERZON-112: Adding service functions for managing vendor shops (#65)

* BETTERZON-118: Adding service functions for managing favorite shops (#66)

Co-authored-by: henningxtro <sextro.henning@student.dhbw-karlsruhe.de>
Co-authored-by: root <root@DESKTOP-ARBPL82.localdomain>
Co-authored-by: Reboooooorn <61185041+Reboooooorn@users.noreply.github.com>
Author: Patrick
Committed by: GitHub
Date: 2021-05-29 10:58:27 +02:00
Parent: 197c39a61d
Commit: 6e8c52857f
117 changed files with 6624 additions and 2492 deletions
+1 -2
@@ -2,13 +2,12 @@
<module type="WEB_MODULE" version="4">
  <component name="FacetManager">
    <facet type="Python" name="Python">
      <configuration sdkName="Python 3.9" />
      <configuration sdkName="Python 3.9 (venv)" />
    </facet>
  </component>
  <component name="NewModuleRootManager" inherit-compiler-output="true">
    <exclude-output />
    <content url="file://$MODULE_DIR$" />
    <orderEntry type="sourceFolder" forTests="false" />
    <orderEntry type="library" name="Python 3.9 interpreter library" level="application" />
  </component>
</module>
+12 -3
@@ -1,13 +1,17 @@
import os
from flask import Flask
from flask_restful import Resource, Api, reqparse
import crawler
app = Flask(__name__)
api = Api(app)
# To parse request data
parser = reqparse.RequestParser()
parser.add_argument('key')
parser.add_argument('products')
parser.add_argument('key', type=str)
parser.add_argument('products', type=int, action='append')
class CrawlerApi(Resource):
@@ -17,7 +21,12 @@ class CrawlerApi(Resource):
    def post(self):
        # Accept crawler request here
        args = parser.parse_args()
        return args
        access_key = os.getenv('CRAWLER_ACCESS_KEY')
        if(args['key'] == access_key):
            crawler.crawl(args['products'])
            return {'message': 'success'}
        else:
            return {'message': 'Wrong access key'}
api.add_resource(CrawlerApi, '/')
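For illustration, a call against this endpoint could look roughly like the following (host, port and key are assumptions, not taken from the repository):

    import requests

    # Assumed local development URL; the real deployment target is not part of this diff.
    response = requests.post('http://localhost:5000/', data={
        'key': 'assumed-access-key',   # must match the CRAWLER_ACCESS_KEY environment variable
        'products': [1, 2, 3],         # sent as repeated form fields, collected into a list of ints
    })
    print(response.json())             # {'message': 'success'} or {'message': 'Wrong access key'}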
+107 -78
@@ -1,78 +1,107 @@
import sql
def crawl(product_ids: [int]) -> dict:
"""
Crawls the given list of products and saves the results to sql
:param products: The list of product IDs to fetch
:return: A dict with the following fields:
total_crawls: number of total crawl tries (products * vendors per product)
successful_crawls: number of successful products
products_with_problems: list of products that have not been crawled successfully
"""
total_crawls = 0
successful_crawls = 0
products_with_problems = []
# Iterate over every product that has to be crawled
for product_id in product_ids:
# Get all links for this product
product_links = sql.getProductLinksForProduct(product_id)
crawled_data = []
# Iterate over every link / vendor
for product_vendor_info in product_links:
total_crawls += 1
# Call the appropriate vendor crawling function and append the result to the list of crawled data
if product_vendor_info['vendor_id'] == 1:
# Amazon
crawled_data.append(__crawl_amazon__(product_vendor_info))
elif product_vendor_info['vendor_id'] == 2:
# Apple
crawled_data.append(__crawl_apple__(product_vendor_info))
elif product_vendor_info['vendor_id'] == 3:
# Media Markt
crawled_data.append(__crawl_mediamarkt__(product_vendor_info))
else:
products_with_problems.append(product_vendor_info)
continue
successful_crawls += 1
# Insert data to SQL
sql.insertData(crawled_data)
return {
'total_crawls': total_crawls,
'successful_crawls': successful_crawls,
'products_with_problems': products_with_problems
}
def __crawl_amazon__(product_info: dict) -> tuple:
"""
Crawls the price for the given product from amazon
:param product_info: A dict with product info containing product_id, vendor_id, url
:return: A tuple with the crawled data, containing (product_id, vendor_id, price_in_cents)
"""
return (product_info['product_id'], product_info['vendor_id'], 123)
def __crawl_apple__(product_info: dict) -> tuple:
"""
Crawls the price for the given product from apple
:param product_info: A dict with product info containing product_id, vendor_id, url
:return: A tuple with the crawled data, containing (product_id, vendor_id, price_in_cents)
"""
return (product_info['product_id'], product_info['vendor_id'], 123)
def __crawl_mediamarkt__(product_info: dict) -> tuple:
"""
Crawls the price for the given product from media markt
:param product_info: A dict with product info containing product_id, vendor_id, url
:return: A tuple with the crawled data, containing (product_id, vendor_id, price_in_cents)
"""
pass
import sql
import requests
from bs4 import BeautifulSoup
HEADERS = ({'User-Agent':
                'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 '
                'Safari/537.36'})
def crawl(product_ids: [int]) -> dict:
    """
    Crawls the given list of products and saves the results to sql
    :param products: The list of product IDs to fetch
    :return: A dict with the following fields:
        total_crawls: number of total crawl tries (products * vendors per product)
        successful_crawls: number of successful products
        products_with_problems: list of products that have not been crawled successfully
    """
    total_crawls = 0
    successful_crawls = 0
    products_with_problems = []
    # Iterate over every product that has to be crawled
    for product_id in product_ids:
        # Get all links for this product
        product_links = sql.getProductLinksForProduct(product_id)
        crawled_data = []
        # Iterate over every link / vendor
        for product_vendor_info in product_links:
            total_crawls += 1
            # Call the appropriate vendor crawling function and append the result to the list of crawled data
            if product_vendor_info['vendor_id'] == 1:
                # Amazon
                data = __crawl_amazon__(product_vendor_info)
                if data:
                    crawled_data.append(data)
            elif product_vendor_info['vendor_id'] == 2:
                # Apple
                data = __crawl_apple__(product_vendor_info)
                if data:
                    crawled_data.append(data)
            elif product_vendor_info['vendor_id'] == 3:
                # Media Markt
                data = __crawl_mediamarkt__(product_vendor_info)
                if data:
                    crawled_data.append(data)
            else:
                products_with_problems.append(product_vendor_info)
                continue
            successful_crawls += 1
        # Insert data to SQL
        sql.insertData(crawled_data)
    return {
        'total_crawls': total_crawls,
        'successful_crawls': successful_crawls,
        'products_with_problems': products_with_problems
    }
def __crawl_amazon__(product_info: dict) -> tuple:
    """
    Crawls the price for the given product from amazon
    :param product_info: A dict with product info containing product_id, vendor_id, url
    :return: A tuple with the crawled data, containing (product_id, vendor_id, price_in_cents)
    """
    page = requests.get(product_info['url'], headers=HEADERS)
    soup = BeautifulSoup(page.content, features="lxml")
    try:
        price = int(
            soup.find(id='priceblock_ourprice').get_text().replace(".", "").replace(",", "").replace("€", "").strip())
        if not price:
            price = int(soup.find(id='price_inside_buybox').get_text().replace(".", "").replace(",", "").replace("€", "").strip())
    except RuntimeError:
        price = -1
    except AttributeError:
        price = -1
    if price != -1:
        return (product_info['product_id'], product_info['vendor_id'], price)
    else:
        return None
def __crawl_apple__(product_info: dict) -> tuple:
    """
    Crawls the price for the given product from apple
    :param product_info: A dict with product info containing product_id, vendor_id, url
    :return: A tuple with the crawled data, containing (product_id, vendor_id, price_in_cents)
    """
    # return (product_info['product_id'], product_info['vendor_id'], 123)
    pass
def __crawl_mediamarkt__(product_info: dict) -> tuple:
    """
    Crawls the price for the given product from media markt
    :param product_info: A dict with product info containing product_id, vendor_id, url
    :return: A tuple with the crawled data, containing (product_id, vendor_id, price_in_cents)
    """
    pass
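As a usage sketch (the product IDs below are made up), the crawl module shown above is driven roughly like this:

    import crawler

    # crawl() looks up the vendor links for each ID via sql.getProductLinksForProduct
    result = crawler.crawl([1, 2, 3])
    print(result['total_crawls'], result['successful_crawls'], result['products_with_problems'])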
+4 -1
@@ -1,4 +1,7 @@
pymysql
flask
flask==1.1.2
flask-sqlalchemy
flask_restful
beautifulsoup4
requests
lxml
-1
@@ -54,7 +54,6 @@ def getProductLinksForProduct(product_id: int) -> [dict]:
    cur = conn.cursor()
    query = 'SELECT vendor_id, url FROM product_links WHERE product_id = %s'
    cur.execute(query, (product_id,))
    products = list(map(lambda x: {'product_id': product_id, 'vendor_id': x[0], 'url': x[1]}, cur.fetchall()))
+33
@@ -0,0 +1,33 @@
import scrapy
from scrapy.crawler import CrawlerProcess
import re
class AmazonSpider(scrapy.Spider):
    name = 'amazon'
    allowed_domains = ['amazon.de']
    start_urls = ['https://amazon.de/dp/B083DRCPJG']
    # def __init__(self, start_urls):
    #     self.start_urls = start_urls
    def parse(self, response):
        price = response.xpath('//*[@id="priceblock_ourprice"]/text()').extract_first()
        if not price:
            price = response.xpath('//*[@data-asin-price]/@data-asin-price').extract_first() or \
                    response.xpath('//*[@id="price_inside_buybox"]/text()').extract_first()
        euros = re.match('(\d*),\d\d', price).group(1)
        cents = re.match('\d*,(\d\d)', price).group(1)
        priceincents = euros + cents
        yield {'price': priceincents}
def start_crawling():
    process = CrawlerProcess(
        settings={'COOKIES_ENABLED': 'False', 'CONCURRENT_REQUESTS_PER_IP': 1, 'ROBOTSTXT_OBEY': False,
                  'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
                  'DOWNLOAD_DELAY': 3}
        , install_root_handler=False)
    process.crawl()
    process.start()
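For reference, a CrawlerProcess is normally told which spider to run by passing the spider class to crawl(); a minimal sketch (assuming the AmazonSpider defined above):

    process = CrawlerProcess(settings={'ROBOTSTXT_OBEY': False})
    process.crawl(AmazonSpider)
    process.start()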
+12
@@ -0,0 +1,12 @@
# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html
import scrapy
class CrawlerItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    pass
@@ -0,0 +1,103 @@
# Define here the models for your spider middleware
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/spider-middleware.html
from scrapy import signals
# useful for handling different item types with a single interface
from itemadapter import is_item, ItemAdapter
class CrawlerSpiderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the spider middleware does not modify the
    # passed objects.
    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s
    def process_spider_input(self, response, spider):
        # Called for each response that goes through the spider
        # middleware and into the spider.
        # Should return None or raise an exception.
        return None
    def process_spider_output(self, response, result, spider):
        # Called with the results returned from the Spider, after
        # it has processed the response.
        # Must return an iterable of Request, or item objects.
        for i in result:
            yield i
    def process_spider_exception(self, response, exception, spider):
        # Called when a spider or process_spider_input() method
        # (from other spider middleware) raises an exception.
        # Should return either None or an iterable of Request or item objects.
        pass
    def process_start_requests(self, start_requests, spider):
        # Called with the start requests of the spider, and works
        # similarly to the process_spider_output() method, except
        # that it doesn't have a response associated.
        # Must return only requests (not items).
        for r in start_requests:
            yield r
    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)
class CrawlerDownloaderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the downloader middleware does not modify the
    # passed objects.
    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s
    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.
        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called
        return None
    def process_response(self, request, response, spider):
        # Called with the response returned from the downloader.
        # Must either;
        # - return a Response object
        # - return a Request object
        # - or raise IgnoreRequest
        return response
    def process_exception(self, request, exception, spider):
        # Called when a download handler or a process_request()
        # (from other downloader middleware) raises an exception.
        # Must either:
        # - return None: continue processing this exception
        # - return a Response object: stops process_exception() chain
        # - return a Request object: stops process_exception() chain
        pass
    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)
@@ -0,0 +1,13 @@
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
# useful for handling different item types with a single interface
from itemadapter import ItemAdapter
class CrawlerPipeline:
    def process_item(self, item, spider):
        return item
+88
@@ -0,0 +1,88 @@
# Scrapy settings for crawler project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# https://docs.scrapy.org/en/latest/topics/settings.html
# https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
# https://docs.scrapy.org/en/latest/topics/spider-middleware.html
BOT_NAME = 'crawler'
SPIDER_MODULES = ['crawler.spiders']
NEWSPIDER_MODULE = 'crawler.spiders'
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32
# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
CONCURRENT_REQUESTS_PER_IP = 1
# Disable cookies (enabled by default)
COOKIES_ENABLED = False
# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False
# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
# 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
# 'Accept-Language': 'en',
#}
# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
# 'crawler.middlewares.CrawlerSpiderMiddleware': 543,
#}
# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
# 'crawler.middlewares.CrawlerDownloaderMiddleware': 543,
#}
# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
# 'scrapy.extensions.telnet.TelnetConsole': None,
#}
# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
#ITEM_PIPELINES = {
# 'crawler.pipelines.CrawlerPipeline': 300,
#}
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
AUTOTHROTTLE_ENABLED = True
# The initial download delay
AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False
# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
+11
@@ -0,0 +1,11 @@
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# https://scrapyd.readthedocs.io/en/latest/deploy.html
[settings]
default = crawler.settings
[deploy]
#url = http://localhost:6800/
project = crawler
@@ -0,0 +1,4 @@
# This package will contain the spiders of your Scrapy project
#
# Please refer to the documentation for information on how to create and manage
# your spiders.
+25
@@ -0,0 +1,25 @@
import scrapy
import re
class AmazonSpider(scrapy.Spider):
    name = 'amazon'
    allowed_domains = ['amazon.de']
    start_urls = ['https://amazon.de/dp/B083DRCPJG']
    def parse(self, response):
        price = response.xpath('//*[@id="priceblock_ourprice"]/text()').extract_first()
        if not price:
            price = response.xpath('//*[@data-asin-price]/@data-asin-price').extract_first() or \
                    response.xpath('//*[@id="price_inside_buybox"]/text()').extract_first()
        euros = re.match('(\d*),\d\d', price).group(1)
        cents = re.match('\d*,(\d\d)', price).group(1)
        priceincents = euros + cents
        yield {'price': priceincents}