
Homework 5

Instructions

Make sure of the following:

  • Your solutions should be in the form of a report in .md or .ipynb format.
  • You have documented your procedure properly.
  • Your answers are clear and concise.

When you are finished, push your results to GitHub and raise an issue, just as you have done in previous homeworks. To pass the homework, you must complete the assignments below and also finish the peer review.

Feel free to contact me if anything is unclear.

REST API

The goal of this part of the assignment is to fetch data from the Nobel Prize REST API.

Fetch JSON data on the Nobel Prizes in Physics from the Nobel Prize API (v2). The docs can be found here.
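A minimal sketch of the download step, using only the standard library. The endpoint `https://api.nobelprize.org/2/nobelPrizes` is inferred from the `links.href` field in the sample below; the query parameters (`nobelPrizeCategory`, `offset`, `limit`, `format`) and the top-level `nobelPrizes` key are assumptions to check against the API docs. `build_url` and `fetch_physics_prizes` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint and parameter names; verify them in the API docs.
BASE_URL = "https://api.nobelprize.org/2/nobelPrizes"


def build_url(offset, limit=100):
    """Build the request URL for one page of physics prizes."""
    params = {
        "nobelPrizeCategory": "phy",  # 'phy' = physics, as in the sample link
        "offset": offset,
        "limit": limit,
        "format": "json",
    }
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_physics_prizes(limit=100):
    """Download all physics prizes, following offset/limit pagination."""
    prizes = []
    offset = 0
    while True:
        with urllib.request.urlopen(build_url(offset, limit)) as resp:
            page = json.load(resp)
        batch = page.get("nobelPrizes", [])  # assumed top-level key
        prizes.extend(batch)
        if len(batch) < limit:  # a short (or empty) page means we are done
            return prizes
        offset += limit
```

Calling `fetch_physics_prizes()` should return a list of prize dictionaries. Paging explicitly with `offset` and `limit` avoids depending on the API's default page size.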

Once loaded into Python, the JSON structure should look something like the following.

```python
[{'awardYear': '2022',
  'category': {'en': 'Physics', 'no': 'Fysikk', 'se': 'Fysik'},
  'categoryFullName': {'en': 'The Nobel Prize in Physics',
   'no': 'Nobelprisen i fysikk',
   'se': 'Nobelpriset i fysik'},
  'dateAwarded': '2022-10-04',
  'prizeAmount': 10000000,
  'prizeAmountAdjusted': 10000000,
  'links': {'rel': 'nobelPrize',
   'href': 'https://api.nobelprize.org/2/nobelPrize/phy/2022',
   'action': 'Get',
   'types': 'application/json'},
  'laureates': [{'id': '1012',
    'knownName': {'en': 'Alain Aspect'},
    'fullName': {'en': 'Alain Aspect'},
    'portion': '1/3',
    'sortOrder': '1',
    'motivation': {'en': 'for experiments with entangled photons, establishing the violation of Bell inequalities and  pioneering quantum information science',
     'se': 'för experiment med sammanfätade fotoner som påvisat brott mot Bell-olikheter och  banat väg för kvantinformationsvetenskap'},
...
```

Using the retrieved data, extract all prize motivations from the JSON list and visualise their word frequencies in a word cloud.
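One way to pull out the motivations, assuming the structure in the sample above: each prize has a `laureates` list, and each laureate a `motivation` dict keyed by language code. Entries for years without a prize may lack laureates, so the sketch uses `.get` with defaults. `extract_motivations` is a hypothetical name.

```python
def extract_motivations(prizes, lang="en"):
    """Return every laureate motivation in `lang` from a list of prize dicts."""
    motivations = []
    for prize in prizes:
        # .get() with defaults: some entries may have no laureates or no motivation
        for laureate in prize.get("laureates", []):
            text = laureate.get("motivation", {}).get(lang)
            if text:
                motivations.append(text)
    return motivations
```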

You can find out how to generate word clouds here:

Think about removing stop words and extra whitespace!
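A sketch of the cleaning and counting step. Lower-casing the text and extracting runs of letters with a regular expression discards punctuation and all whitespace in one go. The small `STOP_WORDS` set is illustrative only (the `wordcloud` package ships a fuller `STOPWORDS` set). Rendering assumes the third-party `wordcloud` package is installed; the function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative stop-word list only; swap in a fuller one such as
# wordcloud.STOPWORDS for real use.
STOP_WORDS = {
    "a", "an", "and", "by", "for", "his", "in", "its", "of",
    "on", "the", "their", "to", "with",
}


def word_frequencies(motivations, stop_words=STOP_WORDS):
    """Count words across all motivations, ignoring stop words."""
    counts = Counter()
    for text in motivations:
        # Keep only runs of letters: drops punctuation and every kind of whitespace
        words = re.findall(r"[a-z]+", text.lower())
        # Skip stop words and single letters left over from e.g. "Bell's"
        counts.update(w for w in words if w not in stop_words and len(w) > 1)
    return counts


def render_word_cloud(freqs, path="motivations_wordcloud.png"):
    """Draw and save a word cloud; needs the third-party wordcloud package."""
    from wordcloud import WordCloud  # lazy import: counting works without it

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(freqs)
    cloud.to_file(path)
    return cloud
```

Given a list of motivation strings `motivations`, `render_word_cloud(word_frequencies(motivations))` writes a PNG; in a notebook you can instead display the returned object with matplotlib's `plt.imshow`.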

Web Scraping

The goal of this part of the assignment is to scrape data from https://books.toscrape.com/.

Create a web scraper that generates the following table. Some fields, such as the UPC, appear only on each book's own webpage.

| upc | title | price | rating |
|---|---|---|---|
| a897fe39b1053632 | A Light in the Attic | £51.77 | Three |
| 90fa61229261140a | Tipping the Velvet | £53.74 | One |
| 6957f44c3847a760 | Soumission | £50.10 | One |
| e00eb4fd7b871a48 | Sharp Objects | £47.82 | Four |
| 4165285e1663650f | Sapiens: A Brief History of Humankind | £54.23 | Five |
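Every column of the table is available on a book's own page. The sketch below parses one such page with the standard library's `html.parser` (BeautifulSoup would be shorter but needs an extra install). It assumes the markup visible in the page source: the title in an `<h1>`, the price in `<p class="price_color">`, the rating as the second class of `<p class="star-rating ...">`, and a product-information row `<th>UPC</th><td>...</td>`. `BookPageParser` is a hypothetical name.

```python
from html.parser import HTMLParser


class BookPageParser(HTMLParser):
    """Collect upc, title, price and rating from one book's detail page.

    Assumed markup: <h1>title</h1>, <p class="price_color">price</p>,
    <p class="star-rating Three">, and a table row <th>UPC</th><td>...</td>.
    """

    def __init__(self):
        super().__init__()
        self.book = {}
        self._field = None    # key whose text the next data chunk belongs to
        self._last_th = None  # text of the most recent <th> cell

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h1" and "title" not in self.book:
            self._field = "title"
        elif tag == "p" and "price_color" in classes and "price" not in self.book:
            self._field = "price"
        elif tag == "p" and "star-rating" in classes and "rating" not in self.book:
            # The rating word is the other class, e.g. "star-rating Three" -> "Three"
            self.book["rating"] = next((c for c in classes if c != "star-rating"), None)
        elif tag == "th":
            self._field = "th"
        elif tag == "td" and self._last_th == "UPC":
            self._field = "upc"

    def handle_endtag(self, tag):
        if tag in ("h1", "p", "th", "td"):
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "th":
            self._last_th = text  # remember the row label for the next <td>
        else:
            self.book[self._field] = text
            if self._field == "upc":
                self._last_th = None
```

Run it with `parser = BookPageParser(); parser.feed(html)` and read `parser.book`. Decode the downloaded page as UTF-8; otherwise the pound sign may come out garbled (for example as `Â£`).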

Each page lists 20 books. Retrieve data from the first three pages (pages 1-3), which gives 60 data points.

Hint: Analyse the URL structure
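Following the hint, the listing pages appear to follow the pattern `https://books.toscrape.com/catalogue/page-N.html` (an assumption worth confirming in the browser). The sketch builds those URLs, collects each book's detail link from the `<a>` inside its `<h3>`, and resolves the relative `href` with `urljoin`. `scrape` accepts any function that turns a detail page's HTML into a row; all names here are hypothetical.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed URL pattern: listing page N lives at catalogue/page-N.html.
CATALOGUE = "https://books.toscrape.com/catalogue/"


def page_urls(n_pages=3):
    """Listing-page URLs for pages 1..n_pages."""
    return [f"{CATALOGUE}page-{n}.html" for n in range(1, n_pages + 1)]


class BookLinkParser(HTMLParser):
    """Collect the detail-page link of every book on a listing page.

    Assumes each book title is an <a href="..."> inside an <h3>.
    """

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []
        self._in_h3 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
        elif tag == "a" and self._in_h3:
            href = dict(attrs).get("href")
            if href:
                # hrefs are relative, so resolve them against the listing page
                self.links.append(urljoin(self.page_url, href))

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False


def get_html(url):
    """Download a page and decode it explicitly as UTF-8 so '£' survives."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def scrape(parse_book, n_pages=3):
    """Visit every book on the first n_pages listing pages and parse it."""
    rows = []
    for url in page_urls(n_pages):
        link_parser = BookLinkParser(url)
        link_parser.feed(get_html(url))
        for link in link_parser.links:
            rows.append(parse_book(get_html(link)))
    return rows
```

With a suitable `parse_book` function, `scrape(parse_book)` should visit 3 × 20 = 60 book pages and return 60 rows, which you can pass to `pandas.DataFrame` to get the table.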

Good luck!