1001 Freelance Projects
Latest Projects from Freelance Marketplaces
Today is: 08-May-2024 11:39 GMT
Project title: Fix error in Python-based web scraper with GUI
Posted by: External project from PeoplePerHour
Started: 24-Apr-2024 19:27 GMT
Description: Hello Freelancers,
I'm searching for a developer familiar with web scraping and Python to fix an existing web scraper. It scrapes product data from category links on the Italian e-commerce website www.yeppon.it

The script basically works, but it throws an error at certain points when scraping. I think a slight change in the website's structure causes the error when the script tries to scrape a product's text description.

The goal of this project is to fix the errors so the script works as it used to: scraping product data from given category URLs on the website and writing the data to CSV files. I don't think this will take much effort, because it is basically this one error that needs to be located and fixed; everything else still seems to work fine. Price can be discussed.

Some facts:

1. The web scraper is based on Python with a GUI. Its final version comes as an exe file (therefore I can't attach it to the project description; I will send it in the messages or work stream).
2. It scrapes certain product data (such as product name, price, description, image links) from category links that are entered into the GUI. The GUI also has some input fields; these are just for fixed strings, which are written unchanged to the CSV files that contain the product data.
3. The scraper technically still works; however, it gives an error when scraping certain categories. You can check this by running the tool, filling out the input fields with the data explained in the "Instructions" tab of the tool, and then starting the scrape. It will produce this error (which can be found in the log file):

--------------------------------------------------------------------------------------------------------------
2024-04-24 12:44:24,765:ERROR:'descriptionHtml'
Traceback (most recent call last):
  File "async_scraper.py", line 739, in scrape
    description_html = pdata["pageProps"]["product"][
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'descriptionHtml'
2024-04-24 12:44:24,765:INFO: Finally
2024-04-24 12:44:24,765:INFO:


No data found!


------------------------------------------------------------------------------------------------------------------
4. The error seems to occur when scraping a product's text description. The text description consists of three possible elements:
- bullet points formatted as a ul element
- a text description that is cleaned (HTML code removed or replaced)
- data scraped from a table on the website and put into a given HTML structure
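
The KeyError in the log suggests that some product records no longer carry a 'descriptionHtml' key. As a rough sketch only, and not the original freelancer's code, the lookup could tolerate a missing key and the description could be assembled from whichever of the three elements are present. The function names `get_description_html` and `build_description`, and the HTML wrapper tags, are assumptions made for illustration; only `pdata`, "pageProps", "product" and 'descriptionHtml' come from the log.

```python
from html import escape
from typing import Any


def get_description_html(pdata: dict[str, Any]) -> str:
    """Return the product's description HTML, or "" when the key is absent."""
    page_props = pdata.get("pageProps") or {}
    product = page_props.get("product") or {}
    # .get() avoids the KeyError from the log when 'descriptionHtml' is missing.
    return product.get("descriptionHtml") or ""


def build_description(
    bullets: list[str], text: str, table_rows: list[tuple[str, str]]
) -> str:
    """Join the three optional description elements, skipping empty ones."""
    parts: list[str] = []
    if bullets:
        items = "".join(f"<li>{escape(b)}</li>" for b in bullets)
        parts.append(f"<ul>{items}</ul>")
    if text:
        parts.append(f"<p>{escape(text)}</p>")
    if table_rows:
        rows = "".join(
            f"<tr><td>{escape(k)}</td><td>{escape(v)}</td></tr>"
            for k, v in table_rows
        )
        parts.append(f"<table>{rows}</table>")
    return "\n".join(parts)
```

With this approach a product without a text description would still be written to the CSV file with the remaining elements, instead of the run ending in "No data found!". Whether an empty description is acceptable in the output is a decision for the project owner.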



It was developed by a freelancer from PPH for a colleague of mine. Unfortunately, I haven't been able to reach my colleague for quite some time to ask for the details or the freelancer's name, so I am posting this publicly.


Scraping some categories will result in the error mentioned above, for example:
https://www.yeppon.it/c/elettrodomestici/grandi-elettrodomestici/asciugabiancheria
or
https://www.yeppon.it/c/elettrodomestici/grandi-elettrodomestici/frigoriferi

Others work just fine, like:
https://www.yeppon.it/c/telefonia/smartphone/smart-phone
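
Because some categories fail and others work, one way to locate the change might be to compare the keys of a product record from a working category with one from a failing category. The sketch below assumes both records are already available as Python dicts; the helper name `missing_keys` and the two sample records are hand-made for illustration and are not real data from the site.

```python
from typing import Any


def missing_keys(working: dict[str, Any], failing: dict[str, Any]) -> list[str]:
    """Keys present in the working product dict but absent from the failing one."""
    return sorted(set(working) - set(failing))


# Hypothetical, hand-made records for illustration only (not real site data).
working_product = {"name": "Phone", "price": 199.0, "descriptionHtml": "<p>...</p>"}
failing_product = {"name": "Dryer", "price": 499.0}

print(missing_keys(working_product, failing_product))  # ['descriptionHtml']
```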




I am attaching the files I have about this project from my colleague. As I can't attach exe or rar files, I have attached:

- a first version of the Python code (a beta version that gives another error, which is solved in the final exe file; it is not the final code and is only meant to give you an impression), as well as the code of the GUI and the requirements. These are async_scraper.txt, gui.txt and requirements.txt
Project ID: 3382569
Copyright © 2005-2022 1001 Freelance Projects