Homework 2 — Tables & Plots (Basics)
Overview
In HW2 you’ll practice:
- Reading CSV files with varying separators/delimiters/decimal signs
- Basic table operations (select, derive columns, sort, group, summarise)
- Producing a simple, readable plot that supports a conclusion
- Writing a clear, reproducible report (Markdown or Notebook)
You may use any tools (Python/R/SQL/editors). Your report must be readable on GitHub (README.md or .ipynb), and all code producing tables/figures must be committed so results are reproducible.
Repo Setup
Use your per-homework repo, username-hw-2, organised as follows:
username-hw-2/
├─ HW2/
│ ├─ README.md (or HW2.ipynb)
│ ├─ src/ (your code, if any)
│ └─ data/ (data; or instructions to obtain data)
├─ environment/ (env files, e.g. requirements.txt or environment.yml)
├─ README.md (repo-level notes)
└─ .gitignore (repo-level ignore)

Create a top-level .gitignore and exclude unnecessary files (e.g., *.DS_Store, .ipynb_checkpoints/, large raw files, virtual envs).
Data
All datasets are in the course data repo: https://github.com/su-mt4007/data
- Booli_sold.csv: apartment sales in Ekhagen
- 2018_R_per_kommun.csv: Swedish 2018 election results by municipality
- stroke-data.csv: stroke outcomes with individual attributes (source: Kaggle)
Tip: CSVs may use different separators (',', ';', '\t') and decimal signs ('.' or ','). In pandas, for example: pd.read_csv(path, sep=';', decimal=','). In R: read.csv(..., sep=';', dec=',').
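If you are not sure which separator a file uses, Python's standard-library csv.Sniffer can guess it from a sample of the text. The sketch below makes two assumptions. First, the file has already been read into a string. Second, you choose the decimal sign yourself after inspecting the data; the to_float helper only swaps ',' for '.'. The function names and the two-column sample are illustrative and are not part of the course material.

```python
import csv
import io


def sniff_delimiter(text: str) -> str:
    """Guess the field separator (comma, semicolon or tab) from the first few KB."""
    return csv.Sniffer().sniff(text[:4096], delimiters=",;\t").delimiter


def to_float(value: str, decimal: str = ".") -> float:
    """Parse a number that may use ',' as its decimal sign (e.g. '87,5')."""
    if decimal == ",":
        value = value.replace(",", ".")
    return float(value)


def read_rows(text: str) -> tuple[str, list[dict[str, str]]]:
    """Return the detected separator and the rows as dicts of strings."""
    sep = sniff_delimiter(text)
    reader = csv.DictReader(io.StringIO(text), delimiter=sep)
    return sep, list(reader)


# Made-up sample: semicolon-separated, decimal comma.
sample = "kommun;valdeltagande\nA;87,5\nB;90,1\n"
sep, rows = read_rows(sample)
print(sep, to_float(rows[0]["valdeltagande"], decimal=","))  # ; 87.5
```

Once you know the separator and decimal sign, pass them explicitly to pd.read_csv (via sep and decimal) so the choice is visible in your report.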
Tasks
A) Apartment Prices (Ekhagen)
Using Booli_sold.csv:
- Price per sqm (ppsqm): Add a derived column ppsqm = price / living_area (use the dataset’s column names as appropriate; handle missing/zero areas).
- Top 5 by ppsqm: Show a table of the five most expensive apartments by ppsqm (columns: id/address, size, price, ppsqm).
- Average ppsqm: Compute and report the mean ppsqm (state how you handled missing values/outliers).
- One insight: Briefly highlight one interesting aspect/pattern (2–4 sentences) and show a small supporting table or plot.
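Below is a minimal standard-library sketch of the Task A steps. It assumes the rows are dicts of strings, as produced by csv.DictReader. The names price, living_area and rooms are hypothetical column names, so replace them with the dataset's actual ones. Rows with a missing, unparseable or zero area get ppsqm = None and are left out of the ranking and the mean. That is one possible policy, and you would state it in your report. In pandas the same idea is roughly df["price"] / df["living_area"].replace(0, float("nan")), followed by df.nlargest(5, "ppsqm") and df["ppsqm"].mean(). The mean skips NaN by default.

```python
from collections import defaultdict
from statistics import mean


def add_ppsqm(rows, price_col="price", area_col="living_area"):
    """Copy each row and add 'ppsqm'; None when price/area is missing, invalid or zero."""
    out = []
    for row in rows:
        try:
            price = float(row[price_col])
            area = float(row[area_col])
        except (KeyError, TypeError, ValueError):
            price = area = None
        # A zero or missing area would cause a division error or a meaningless value.
        ppsqm = price / area if price is not None and area else None
        out.append({**row, "ppsqm": ppsqm})
    return out


def top_by_ppsqm(rows, n=5):
    """The n rows with the highest ppsqm, skipping rows where it is undefined."""
    valid = [r for r in rows if r["ppsqm"] is not None]
    return sorted(valid, key=lambda r: r["ppsqm"], reverse=True)[:n]


def mean_ppsqm(rows):
    """Mean ppsqm over rows where it is defined (undefined values are dropped)."""
    return mean(r["ppsqm"] for r in rows if r["ppsqm"] is not None)


def mean_ppsqm_by(rows, key="rooms"):
    """Group-and-summarise: mean ppsqm per value of `key` (hypothetical column)."""
    groups = defaultdict(list)
    for r in rows:
        if r["ppsqm"] is not None:
            groups[r[key]].append(r["ppsqm"])
    return {k: mean(v) for k, v in sorted(groups.items())}


# Made-up rows for illustration only.
sample_rows = [
    {"id": "a", "price": "3000000", "living_area": "50", "rooms": "2"},
    {"id": "b", "price": "2000000", "living_area": "0", "rooms": "1"},
    {"id": "c", "price": "", "living_area": "40", "rooms": "1"},
    {"id": "d", "price": "4000000", "living_area": "50", "rooms": "2"},
]
enriched = add_ppsqm(sample_rows)
print([r["id"] for r in top_by_ppsqm(enriched)])  # ['d', 'a']
print(mean_ppsqm(enriched))  # 70000.0
```

A per-group summary such as mean_ppsqm_by is one simple way to produce the small supporting table for the "one insight" item.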
B) Swedish Election 2018
Using 2018_R_per_kommun.csv:
- Legitimate votes in Stockholm: Compute the total number of legitimate votes (Giltiga Röster) across all Stockholm municipalities (explain how you filtered Stockholm).
- Highest S%: Identify the municipality with the highest vote percentage for Socialdemokraterna (S). Report municipality name and percentage.
- Participation ranking: Produce a table of the top 3 municipalities by participation (Valdeltagande) with municipality name and participation rate.
Be explicit about how you parsed the file (separator/decimal), and include any renaming you did so your code is easy to follow.
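The sketch below shows one way to make the parsing and filtering explicit. It assumes the file is semicolon-separated with decimal commas. It also assumes the columns have already been renamed to the hypothetical English names county, municipality, valid_votes, s_pct and turnout. The county label "Stockholms län" used for the filter is likewise an assumption to check against the file. The sample rows are made up.

```python
import csv
import io

# Made-up sample; column names are hypothetical (after renaming).
SAMPLE = """county;municipality;valid_votes;s_pct;turnout
Stockholms län;A;23000;12,1;92,3
Stockholms län;B;50000;38,4;78,9
Uppsala län;C;140000;27,5;88,0
"""


def to_float(value: str) -> float:
    """Parse Swedish-style numbers such as '87,5' (decimal comma)."""
    return float(value.replace(",", "."))


def parse_election(text: str, sep: str = ";") -> list[dict]:
    """Parse rows and convert the numeric columns to int/float."""
    rows = []
    for r in csv.DictReader(io.StringIO(text), delimiter=sep):
        rows.append({
            "county": r["county"],
            "municipality": r["municipality"],
            "valid_votes": int(r["valid_votes"]),
            "s_pct": to_float(r["s_pct"]),
            "turnout": to_float(r["turnout"]),
        })
    return rows


def total_valid_votes(rows, county: str) -> int:
    """Sum valid votes over all municipalities in one county."""
    return sum(r["valid_votes"] for r in rows if r["county"] == county)


def top_by(rows, col: str, n: int = 1):
    """Rows sorted by `col`, largest first, limited to n."""
    return sorted(rows, key=lambda r: r[col], reverse=True)[:n]


rows = parse_election(SAMPLE)
print(total_valid_votes(rows, "Stockholms län"))  # 73000
print(top_by(rows, "s_pct")[0]["municipality"])  # B
```

If the file has a county code column, filtering on the code instead of the name avoids problems with spelling or encoding differences.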
C) Predicting Strokes (one simple plot)
Using stroke-data.csv:
- Create one simple, well-labeled plot that supports a single, clear conclusion (e.g., relationship between a binary/categorical feature and stroke outcome, or a summary of rates across groups).
- State your conclusion in one or two sentences, referring to the plot (avoid causal claims; stick to descriptive insight).
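One possible plot for Task C is the stroke rate per group of a single categorical feature. The sketch assumes 0/1 columns named hypertension and stroke, which you should check against the file, and rows given as dicts of strings. It keeps the group sizes because a rate computed from a small group is much less stable, and the plot should show that. Plotting uses matplotlib, which is imported inside plot_rates so the rate computation also runs where matplotlib is not installed.

```python
from collections import defaultdict


def stroke_rate_by_group(rows, feature, outcome="stroke"):
    """Map each group of `feature` to (stroke rate, group size); outcome is 0/1."""
    counts = defaultdict(lambda: [0, 0])  # group -> [number of strokes, number of people]
    for r in rows:
        c = counts[r[feature]]
        c[0] += int(r[outcome])
        c[1] += 1
    return {g: (s / n, n) for g, (s, n) in sorted(counts.items())}


def plot_rates(rates, feature, path="stroke_rate.png"):
    """Bar chart of stroke rate (%) per group, with group sizes in the tick labels."""
    import matplotlib.pyplot as plt  # imported here so the rate code runs without matplotlib

    labels = [f"{g}\n(n={n})" for g, (_, n) in rates.items()]
    values = [100 * rate for rate, _ in rates.values()]
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.bar(labels, values)
    ax.set_xlabel(feature)
    ax.set_ylabel("Stroke rate (%)")
    ax.set_title(f"Stroke rate by {feature}")
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)


# Made-up rows for illustration only.
sample = [{"hypertension": h, "stroke": s}
          for h, s in [("0", "0"), ("0", "1"), ("1", "1"), ("0", "0"), ("0", "0")]]
print(stroke_rate_by_group(sample, "hypertension"))  # {'0': (0.25, 4), '1': (1.0, 1)}
```

A single bar per group, with n in the label, is usually enough to support a descriptive sentence comparing group rates without implying causation.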
Report Requirements (Markdown or Notebook)
Your report should be concise and easy to follow on GitHub:
- Title & brief intro (what you did, data files used)
- Methods/steps with short prose and code cells/blocks
- Tables/figures placed near the text that discusses them
- Reproducibility notes: how to run (e.g., pip install -r requirements.txt, then open the notebook)
- Conclusion (1–3 bullets or sentences)
Keep large data out of Git; provide download instructions if needed.
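If you keep the data out of Git, a small fetch helper makes the download step reproducible. This sketch rests on explicit assumptions that you should verify on GitHub: the raw-file URL pattern, the branch name main, and the files sitting at the top level of the su-mt4007/data repo are all guesses. The target folder HW2/data comes from the suggested repo layout. The helper skips the download when the file already exists, so the downloaded files can stay listed in .gitignore.

```python
from pathlib import Path
from urllib.request import urlretrieve

# ASSUMPTION: raw-file URL for the default branch (guessed as "main") of the
# course data repo, with the CSV files at the repository root. Verify before use.
RAW_BASE = "https://raw.githubusercontent.com/su-mt4007/data/main"

DATA_FILES = ["Booli_sold.csv", "2018_R_per_kommun.csv", "stroke-data.csv"]


def fetch(filename: str, dest_dir: str = "HW2/data") -> Path:
    """Download one data file into dest_dir unless it is already present."""
    dest = Path(dest_dir) / filename
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(f"{RAW_BASE}/{filename}", dest)
    return dest
```

For example, calling fetch(name) for each name in DATA_FILES at the top of the notebook documents exactly where the inputs come from.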
Acceptance Criteria (what we look for)
- Reads the data correctly (handles separator/decimal, shows code)
- Derives ppsqm, shows Top-5 table, computes average ppsqm, and states one insight
- Election tasks completed (Stockholm total legitimate votes; municipality with highest S%; top-3 participation table)
- One stroke plot + one-sentence conclusion
- Reproducibility: code and minimal environment notes included
- Clarity: report is readable on GitHub with headings/markdown
Submission
Push your work to GitHub.
Open an Issue titled HW2 – Submission (optional label: ready-for-grading). In the Issue body, include:
- Link to your report file (HW2/README.md or HW2/HW2.ipynb)
- 2–3 lines summarising your results
- Any notes for the grader (e.g., parsing choices)
Deadline: Monday 23:59 (Europe/Stockholm)
Peer Review (after the deadline)
Comment under your partner’s HW2 – Submission Issue. Copy this checklist:
- Coverage: Are all HW2 tasks completed?
- Parsing: Is the separator/decimal handling explicit and correct?
- Results: Do Top-5, averages, and rankings look consistent with the code?
- Clarity: Is the report organised and readable on GitHub?
- Reproducibility: Can you reproduce (or see how to reproduce) the outputs?
- One suggestion: A specific, actionable improvement.
- Detailed question: What is the name of the data frame the author used to identify the municipality with the highest vote percentage for Socialdemokraterna (S)?
Peer-review deadline: Thursday 23:59 (Europe/Stockholm)
Grading
Per-homework scale U / G / VG based on:
- Completeness (all tasks + submission/peer review)
- Clarity (well-structured, labelled tables/plots, brief explanations)
- Correctness & Reproducibility (parsing handled; code produces shown outputs)
Notes
- Late submissions or reviews require an extra task and are graded Pass/Fail only (no VG).