Appearance
Homework 5
Instructions
Make sure of the following:
- Your solutions should be in form of a report in .md or .ipynb format.
- You have documented your procedure properly.
- Your answers are clear and concise.
When you are finished push you results to Github and raise an issue, just as you have done in previous homeworks. To pass the homework you will have to complete the assigments below and also finish the peer-review.
Feel free to contact me if anything is unclear.
REST API
The goal of this part of the assigment is to fetch data from the Nobel Prize REST API.
Fetch data in JSON format with information on the Nobel prizes in physics from the Nobel Prize API (v2). The docs can be found here.
The JSON structure should look something like the following.
JSON
[{'awardYear': '2022',
'category': {'en': 'Physics', 'no': 'Fysikk', 'se': 'Fysik'},
'categoryFullName': {'en': 'The Nobel Prize in Physics',
'no': 'Nobelprisen i fysikk',
'se': 'Nobelpriset i fysik'},
'dateAwarded': '2022-10-04',
'prizeAmount': 10000000,
'prizeAmountAdjusted': 10000000,
'links': {'rel': 'nobelPrize',
'href': 'https://api.nobelprize.org/2/nobelPrize/phy/2022',
'action': 'Get',
'types': 'application/json'},
'laureates': [{'id': '1012',
'knownName': {'en': 'Alain Aspect'},
'fullName': {'en': 'Alain Aspect'},
'portion': '1/3',
'sortOrder': '1',
'motivation': {'en': 'for experiments with entangled photons, establishing the violation of Bell inequalities and pioneering quantum information science',
'se': 'för experiment med sammanfätade fotoner som påvisat brott mot Bell-olikheter och banat väg för kvantinformationsvetenskap'},
...
Using the retrieved data, extract all the prize motivations from the JSON-list and visualise the frequencies using a word cloud.
You can find out how to generate word clouds here:
Think about removing stop words and white spaces!
Web Scraping
The goal of this part of the assignment is to scrape data from https://books.toscrape.com/.
Create a web scraper that generates the following table. You can find more information on each book on its own webpage.
upc | title | price | rating |
---|---|---|---|
a897fe39b1053632 | A Light in the Attic | £51.77 | Three |
90fa61229261140a | Tipping the Velvet | £53.74 | One |
6957f44c3847a760 | Soumission | £50.10 | One |
e00eb4fd7b871a48 | Sharp Objects | £47.82 | Four |
4165285e1663650f | Sapiens: A Brief History of Humankind | £54.23 | Five |
On the webpage, you can see 20 books per page. Retrieve data from first three pages, that is page 1-3. That will be 60 datapoints.
Hint: Analyse the URL structure
Good luck!