Project
- Deadline: January 10th, 2025 at 23:59
- Presentation: January 17th, 2025, 13.00-18.00
Overview
For your individual project you will create your own blog post. The post should illustrate an issue using a unique data set that you have collected yourself, and it should showcase the tools taught in the course. Hand-in is done in the usual fashion: push the project to your GitHub repository under a subdirectory called project and raise an issue. You may push to your GitHub repository earlier, and we recommend that you do; we will only assess the version of your code at the time of the deadline. You will present your project in front of the class. The presentation should be no longer than 5 minutes. Attendance is compulsory for the entire session in which you present.
Inspiration
You can find some inspiration for what I am expecting of you in the following posts:
- The Olympic Medal Table Visualized Gapminder Style
- On a First Name Basis with Statistics Sweden
- Baby Weight Shiny app
- Are #python users more likely to get into Slytherin?
Data sources
During the course, you were introduced to many possible data sources. Additional public web-based data sources include, for example, the Stockholm Open Data Portal or an API for querying data from Sweden’s national data portal. Another example of a contemporary source of relevant data is the COVID-19 data page of the Swedish public health agency, Folkhälsomyndigheten.
We strongly recommend that you use data that you have collected yourself and that you find interesting.
Details
1. Find data out in the wild: this can be an open-access SQL database, API data, scraped data, personal surveillance data (e.g., running watch, log files), or data collected as part of a hobby. The raw data should not be of a sensitive nature (data protection!) and should be accessible and uploadable to GitHub without violating any copyright or access rights.
2. Determine a good story you can tell based on the data, e.g., a specific hypothesis you want to investigate, a cool visualization of numbers, a data journalism type of story, an educational post, or something that might interest your fellow students. Your post can be about a serious matter, but it can also be about a not so serious one. However, before you start writing, make clear who your intended readership is (general public, fellow B.Sc. students, ornithologists, …).
3. Read the data, wrangle the data, visualize the data, make simple statistical summaries, and interpret the results in accordance with the story you chose in step 2.
4. Write a story worth reading for the selected target audience. It can be written in Swedish or English. The story needs to be reproducible, e.g., written as a notebook, and should not be excessively long. As a rough guideline: 1000-1500 words of text, no more than 7 figures (tables count as figures), and no more than 7 visible code chunks (if you decide to have any at all).
5. Create a 5-minute presentation about your work to present to your fellow students. The main aim of your presentation is to convince your fellow students that they should read your blog post.
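The read, wrangle and summarise step above can be sketched in a few lines. This is only a minimal sketch, not a required template: it uses the Python standard library alone, and the running-log columns (`date`, `km`, `minutes`) as well as the values in `RAW` are made up purely for illustration. In a real project the data would be read from a file inside your project subdirectory via a relative path, so that the analysis still runs after the repository is cloned.

```python
import csv
import io
import statistics

# Hypothetical raw data standing in for a self-collected running log.
# In a real project this would be a file opened via a relative path.
RAW = """date,km,minutes
2024-11-02,5.0,30
2024-11-05,10.0,62
2024-11-09,,45
2024-11-12,8.0,44
"""


def read_runs(handle):
    """Read the CSV and drop rows with a missing distance (wrangling)."""
    runs = []
    for row in csv.DictReader(handle):
        if not row["km"]:
            continue  # skip incomplete records instead of guessing a value
        km = float(row["km"])
        minutes = float(row["minutes"])
        # Derived variable: pace in minutes per kilometre
        runs.append({"date": row["date"], "km": km, "pace": minutes / km})
    return runs


def summarise(runs):
    """Return simple statistical summaries of the cleaned data."""
    paces = [r["pace"] for r in runs]
    return {
        "n_runs": len(runs),
        "total_km": sum(r["km"] for r in runs),
        "mean_pace": statistics.mean(paces),
        "median_pace": statistics.median(paces),
    }


runs = read_runs(io.StringIO(RAW))
summary = summarise(runs)
print(summary)
```

Keeping reading, cleaning and summarising in separate small functions makes it easier to start with a working project and then scale up, since each part can be swapped out without touching the others.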
The biggest challenge of the project will be to be realistic about what you can achieve within the given deadline. Once you have an estimate of how much that could be, take 50% of that and you are still likely to be busy. Make sure you have a working project early on and then scale up iteratively, so you’re always ready. Start early.
Technical Details
You will hand in the project as you have done with the homework: via GitHub, by pushing it to your repository and raising an issue. The project should be under a subdirectory called project. For the sake of reproducibility, submit a clear notebook together with the data you have used. We should be able to reproduce your analysis by cloning your repository and running the notebook.
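Before pushing, it can help to check the notebook against the rough length guideline (1000-1500 words, no more than 7 visible code chunks). The following is a minimal sketch under stated assumptions: it assumes the notebook is a Jupyter `.ipynb` file in the nbformat 4 JSON layout, the file name `project/report.ipynb` in the comment is hypothetical, and it counts all code cells because it cannot tell which ones are visible in the rendered post. The word count is approximate, since markdown markup such as `#` is counted as a word.

```python
import json


def notebook_stats(nb):
    """Count words in markdown cells and the number of code cells.

    `nb` is a notebook already parsed from JSON (nbformat 4 layout assumed).
    """
    words = 0
    code_cells = 0
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores the source either as one string or as a list of lines
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "markdown":
            words += len(text.split())
        elif cell.get("cell_type") == "code":
            code_cells += 1
    return {"words": words, "code_cells": code_cells}


# Tiny in-memory notebook so the sketch runs on its own. For a real check,
# load your own file instead (hypothetical path):
#   with open("project/report.ipynb", encoding="utf-8") as f:
#       nb = json.load(f)
example = json.loads("""
{
  "cells": [
    {"cell_type": "markdown", "source": ["# My post\\n", "Some text here."]},
    {"cell_type": "code", "source": "print(1)"}
  ]
}
""")
stats = notebook_stats(example)
print(stats)
```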
Grading
The project will be graded based on the following five dimensions, which have equal weight:
- Technical difficulty of the project, i.e., how hard it was to import the data, how much time the wrangling needed, how advanced the methods used for the statistical summaries are, and the use of additional technical shenanigans.
- Coding style and reproducibility of the submitted file. This includes an assessment of whether the code is readable.
- Quality of the visualisations and their interpretation.
- Readability of the project report (is the story concise, are the aims clear, is the readership happy, decent spelling & grammar). In particular: Less is sometimes more!
- Quality of the presentation (slides, snore factor, staying in time, …).