
Homework 2 — Tables & Plots (Basics)

Overview

In HW2 you’ll practice:

  • Reading CSV files with varying separators/delimiters/decimal signs
  • Basic table operations (select, derive columns, sort, group, summarise)
  • Producing a simple, readable plot that supports a conclusion
  • Writing a clear, reproducible report (Markdown or Notebook)

You may use any tools (Python/R/SQL/editors). Your report must be readable on GitHub (README.md or .ipynb), and all code producing tables/figures must be committed so results are reproducible.

Repo Setup

Use your per-homework repository, username-hw-2, with this layout:

     username-hw-2/
     ├─ HW2/
     │  ├─ README.md  (or HW2.ipynb)
     │  ├─ src/       (your code, if any)
     │  └─ data/      (data; or instructions to obtain data)
     ├─ environment/  (env files, e.g. requirements.txt or environment.yml)
     └─ README.md     (repo-level notes)
     └─ .gitignore    (repo-level ignore)

Create a top-level .gitignore and exclude unnecessary files (e.g., *.DS_Store, .ipynb_checkpoints/, large raw files, virtual envs).

Data

All datasets are in the course data repo: https://github.com/su-mt4007/data

  • Booli_sold.csv: apartment sales in Ekhagen
  • 2018_R_per_kommun.csv: Swedish 2018 election results by municipality
  • stroke-data.csv: stroke outcomes with individual attributes (source: Kaggle)

Tip: CSVs may use different separators (, ; \t) and decimal signs (. ,). In pandas, for example: pd.read_csv(path, sep=';', decimal=','). In R: read.csv(..., sep=';', dec=',').
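A minimal sketch in Python/pandas of how to check that a file was parsed as intended, using a small made-up CSV in place of the real data. When the decimal sign is wrong, numeric columns come back as text, so checking dtypes right after reading catches the problem early. The helper name check_numeric is hypothetical, not part of pandas.

```python
import io

import pandas as pd

# Made-up sample (not from the course data): ';' separates fields, ',' is the decimal sign.
RAW = "Kommun;Valdeltagande\nAlfa;87,2\nBeta;84,9\n"


def check_numeric(df: pd.DataFrame, cols: list[str]) -> None:
    """Raise if any of `cols` was not parsed as a numeric dtype (hypothetical helper)."""
    text_cols = [c for c in cols if not pd.api.types.is_numeric_dtype(df[c])]
    if text_cols:
        raise ValueError(f"Parsed as text, check sep/decimal: {text_cols}")


# Separator right, decimal sign left at the default '.': values stay strings like "87,2".
wrong = pd.read_csv(io.StringIO(RAW), sep=";")

# Both settings explicit: Valdeltagande becomes a float column.
right = pd.read_csv(io.StringIO(RAW), sep=";", decimal=",")
check_numeric(right, ["Valdeltagande"])  # passes silently

try:
    check_numeric(wrong, ["Valdeltagande"])
except ValueError as err:
    print(err)
```

Calling the check once per file, right after reading, also documents in the report which columns you expect to be numeric.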

Tasks

A) Apartment Prices (Ekhagen)

Using Booli_sold.csv:

  1. Price per sqm (ppsqm): Add a derived column ppsqm = price / living_area (use the dataset’s column names as appropriate; handle missing/zero areas).
  2. Top 5 by ppsqm: Show a table of the five most expensive apartments by ppsqm (columns: id/address, size, price, ppsqm).
  3. Average ppsqm: Compute and report the mean ppsqm (state how you handled missing values/outliers).
  4. One insight: Briefly highlight one interesting aspect/pattern (2–4 sentences) and show a small supporting table or plot.
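One possible way to do steps 1 to 3 with pandas, sketched on a few made-up rows. The column names address, living_area and price are placeholders (the real file's names may differ, so rename the columns or pass your own names). Missing or zero areas are turned into NaN before dividing, so those rows get no ppsqm and are skipped by the mean instead of producing infinities. Outlier handling is not shown; reporting the median next to the mean is one simple option.

```python
import numpy as np
import pandas as pd


def add_ppsqm(df: pd.DataFrame, price_col: str = "price", area_col: str = "living_area") -> pd.DataFrame:
    """Return a copy of df with a ppsqm column; missing or non-positive areas give NaN."""
    out = df.copy()
    area = out[area_col].where(out[area_col] > 0)  # 0, negative or NaN -> NaN (no division by zero)
    out["ppsqm"] = out[price_col] / area
    return out


def top_by_ppsqm(df: pd.DataFrame, n: int = 5, cols=("address", "living_area", "price", "ppsqm")) -> pd.DataFrame:
    """The n rows with the highest ppsqm, ignoring rows where ppsqm is NaN."""
    ranked = df.dropna(subset=["ppsqm"]).sort_values("ppsqm", ascending=False)
    return ranked.head(n)[list(cols)]


# Made-up rows for illustration only (hypothetical column names).
toy = pd.DataFrame({
    "address": ["A 1", "B 2", "C 3", "D 4"],
    "living_area": [50.0, 0.0, 40.0, np.nan],
    "price": [3_000_000, 2_000_000, 3_200_000, 1_000_000],
})

sales = add_ppsqm(toy)
top5 = top_by_ppsqm(sales)
mean_ppsqm = sales["ppsqm"].mean()  # NaN rows are skipped by default
n_excluded = int(sales["ppsqm"].isna().sum())
print(top5)
print(f"mean ppsqm: {mean_ppsqm:.0f} ({n_excluded} rows excluded)")
```

Stating n_excluded in the report is a simple way to answer "how you handled missing values".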

B) Swedish Election 2018

Using 2018_R_per_kommun.csv:

  1. Legitimate votes in Stockholm: Compute the total number of legitimate votes (Giltiga Röster) across all Stockholm municipalities (explain how you filtered Stockholm).
  2. Highest S%: Identify the municipality with the highest vote percentage for Socialdemokraterna (S). Report municipality name and percentage.
  3. Participation ranking: Produce a table of the top 3 municipalities by participation (Valdeltagande) with municipality name and participation rate.

Be explicit about how you parsed the file (separator/decimal), and include any renaming you did so your code is easy to follow.
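A sketch of the three election computations, again on made-up rows. The layout (a county column Län plus Kommun, Giltiga röster, S and Valdeltagande) is an assumption, as is filtering Stockholm by county name; check the real header and say in your report which filter you used. Renaming to short ASCII names is optional but keeps later code readable. If vote counts in the real file contain thousands separators, read_csv's thousands= parameter may also be needed.

```python
import io

import pandas as pd

# Made-up rows in an assumed layout; not the real election file.
RAW = (
    "Län;Kommun;Giltiga röster;S;Valdeltagande\n"
    "Stockholms län;Alfa;1000;25,5;85,1\n"
    "Stockholms län;Beta;2000;30,0;88,4\n"
    "Skåne län;Gamma;1500;35,2;83,0\n"
    "Skåne län;Delta;500;28,9;89,7\n"
)

# Parse explicitly: ';' between fields, ',' as decimal sign.
votes = pd.read_csv(io.StringIO(RAW), sep=";", decimal=",")

# Rename to short ASCII names so the rest of the code is easy to follow.
votes = votes.rename(columns={
    "Län": "county",
    "Kommun": "municipality",
    "Giltiga röster": "valid_votes",
    "S": "s_pct",
    "Valdeltagande": "turnout_pct",
})

# 1. Legitimate (giltiga) votes summed over municipalities in Stockholm county (assumed filter).
stockholm_total = votes.loc[votes["county"] == "Stockholms län", "valid_votes"].sum()

# 2. Municipality with the highest S share (idxmax returns the row label of the maximum).
top_s = votes.loc[votes["s_pct"].idxmax(), ["municipality", "s_pct"]]

# 3. Top 3 municipalities by participation.
top3_turnout = votes.nlargest(3, "turnout_pct")[["municipality", "turnout_pct"]]

print(stockholm_total)
print(top_s)
print(top3_turnout)
```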

C) Predicting Strokes (one simple plot)

Using stroke-data.csv:

  • Create one simple, well-labeled plot that supports a single, clear conclusion (e.g., relationship between a binary/categorical feature and stroke outcome, or a summary of rates across groups).
  • State your conclusion in one or two sentences, referring to the plot (avoid causal claims; stick to descriptive insight).
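A minimal sketch of a rate-by-group plot with pandas and matplotlib. It assumes a binary 0/1 outcome column named stroke and a grouping column such as hypertension, which matches the commonly shared Kaggle version of the file but should be checked against your copy; the toy rows are made up. Showing the group sizes n next to the rates helps readers judge how much a difference means without implying a cause.

```python
import pandas as pd


def stroke_rate_by(df: pd.DataFrame, group_col: str, outcome_col: str = "stroke") -> pd.DataFrame:
    """Share of rows with outcome 1 per group, with group sizes."""
    grouped = df.groupby(group_col)[outcome_col]
    return pd.DataFrame({"rate": grouped.mean(), "n": grouped.size()})


def plot_rates(rates: pd.DataFrame, group_label: str, path: str = "stroke_rate.png") -> None:
    """Bar chart of stroke rate (%) per group, annotated with group sizes."""
    import matplotlib.pyplot as plt  # imported here so the rate helper works without matplotlib

    fig, ax = plt.subplots(figsize=(5, 3.5))
    labels = rates.index.astype(str)
    ax.bar(labels, rates["rate"] * 100)
    for x, (rate, n) in enumerate(zip(rates["rate"], rates["n"])):
        ax.text(x, rate * 100, f"n={n}", ha="center", va="bottom")
    ax.set_xlabel(group_label)
    ax.set_ylabel("Stroke rate (%)")
    ax.set_title(f"Stroke rate by {group_label}")
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "hypertension": [0, 0, 0, 0, 1, 1],
    "stroke": [0, 0, 0, 1, 1, 0],
})
rates = stroke_rate_by(toy, "hypertension")
print(rates)
# With the real data (assumed column names):
# plot_rates(stroke_rate_by(df, "hypertension"), "hypertension (0 = no, 1 = yes)")
```

Committing the saved PNG alongside the code lets the figure render on GitHub while staying reproducible.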

Report Requirements (Markdown or Notebook)

Your report should be concise and easy to follow on GitHub:

  • Title & brief intro (what you did, data files used)
  • Methods/steps with short prose and code cells/blocks
  • Tables/figures placed near the text that discusses them
  • Reproducibility notes: how to run (e.g., pip install -r environment/requirements.txt, then open the notebook)
  • Conclusion (1–3 bullets or sentences)

Keep large data out of Git; provide download instructions if needed.

Acceptance Criteria (what we look for)

  • Reads the data correctly (handles separator/decimal, shows code)
  • Derives ppsqm, shows Top-5 table, computes average ppsqm, and states one insight
  • Election tasks completed (Stockholm total legitimate votes; municipality with highest S%; top-3 participation table)
  • One stroke plot + a one- or two-sentence conclusion
  • Reproducibility: code and minimal environment notes included
  • Clarity: report is readable on GitHub with headings/markdown

Submission

  1. Push your work to GitHub.

  2. Open an Issue titled HW2 – Submission (optional label: ready-for-grading). In the Issue body, include:

    • Link to your report file (HW2/README.md or HW2/HW2.ipynb)
    • 2–3 lines summarising your results
    • Any notes for the grader (e.g., parsing choices)

Deadline: Monday 23:59 (Europe/Stockholm)

Peer Review (after the deadline)

Comment under your partner’s HW2 – Submission Issue. Copy this checklist:

  • Coverage: Are all HW2 tasks completed?
  • Parsing: Is the separator/decimal handling explicit and correct?
  • Results: Do Top-5, averages, and rankings look consistent with the code?
  • Clarity: Is the report organised and readable on GitHub?
  • Reproducibility: Can you reproduce (or see how to reproduce) the outputs?
  • One suggestion: A specific, actionable improvement.
  • Detailed question: What is the name of the data frame the author used to identify the municipality with the highest vote percentage for Socialdemokraterna (S)?

Peer-review deadline: Thursday 23:59 (Europe/Stockholm)

Grading

Per-homework scale U / G / VG based on:

  • Completeness (all tasks + submission/peer review)
  • Clarity (well-structured, labelled tables/plots, brief explanations)
  • Correctness & Reproducibility (parsing handled; code produces shown outputs)

Notes

  • Late submissions or reviews require an extra task and are graded Pass/Fail only (no VG).