Data Science in Everyday Practice

Course Motivation
This course is offered as part of the "University in the Colleges" programme and is hosted by Collegio Nuovo - Sezione Laureati (Graduate Section).
It is aimed at a medical and life science audience with no prior background in data analysis and a minimal background in statistics. The goal of the course is to give students the most important tools and decision criteria to import and visualise data from different sources (structured medical data, laboratory measures, biological experiments), and to explore and understand the key elements of the datasets they might encounter in their studies or everyday practice.
 
Where
Via E. Tibaldi, 4, 27100 Pavia PV
 
When
The course will take place during the week of 19 to 23 January 2026.
Requirements
None. This course is designed for students with no prior background in data analysis and minimal knowledge of statistics.
 
CFU: 3
 
Intended Learning Outcomes
1. Understand the role of data science in their discipline
2. Use Python and Jupyter to explore different datasets
3. Apply basic criteria and tools to transform and visualise their data
4. Interpret their data based on the results of an exploratory data analysis (EDA)
Contents and Structure
The course focuses on practical data science techniques using Python, leveraging Jupyter Notebooks and/or Google Colab for hands-on practice. Key topics include:
  1. Introduction to Python for Data Science:
    1. Overview of Python's relevance to life sciences.
    2. Introduction to Jupyter Notebooks and Google Colab.
    3. Setting up your environment and working with online/cloud-based tools.
  2. Python Basics for Data Analysis:
    1. Data types and basic operations in Python.
    2. Using pandas to manipulate tabular data.
    3. NumPy for numerical computations.
  3. Data Wrangling and Preparation:
    1. Understanding and cleaning messy datasets.
    2. The concept of “tidy” (tabular) data in Python.
    3. Merging, grouping, and transforming datasets.
  4. Data Visualization:
    1. Exploratory plots using matplotlib and seaborn.
    2. Interactive visualizations with Plotly.
    3. Visualization techniques for biological and medical datasets.
  5. Introduction to Statistics in Python:
    1. Descriptive statistics and their applications.
    2. Hypothesis testing using Python’s SciPy and statsmodels.
  6. Data Lab:
    1. Hands-on exercises using Google Colab or local Jupyter Notebooks.
    2. (Bonus track) Use of ChatGPT in the data science workflow.
  7. Applied Examples of Data Analysis:
    1. Real-world case studies with medical and biological datasets.
    2. Exploratory data analysis workflow from data import to visualization.
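As an illustration of the kind of workflow listed above, the import, cleaning, and summary steps might look like the following minimal sketch in pandas. The column names, the values, and the helper `summarise_by_group` are hypothetical and are not taken from the course material; a real analysis would read a file rather than an inline string.

```python
import io

import pandas as pd

# Hypothetical laboratory measurements, inlined so the example is self-contained.
# In practice this would be a CSV file exported from a lab system.
RAW_CSV = """patient_id,group,hemoglobin_g_dl
P01,control,13.5
P02,control,14.1
P03,treated,12.0
P04,treated,
P05,treated,12.6
"""


def summarise_by_group(csv_text: str) -> pd.DataFrame:
    """Import the data, drop incomplete rows, and summarise each group."""
    # Import: one row per patient, one column per variable (tidy layout).
    df = pd.read_csv(io.StringIO(csv_text))
    # Clean: remove rows where the measurement is missing.
    df = df.dropna(subset=["hemoglobin_g_dl"])
    # Summarise: number of patients and mean value in each group.
    return df.groupby("group")["hemoglobin_g_dl"].agg(["count", "mean"])


summary = summarise_by_group(RAW_CSV)
print(summary)
```

The resulting table could then be passed to a plotting library such as matplotlib or seaborn for the visualisation step.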
Methods
Class activities will focus on demonstrations, discussion and problem solving through interaction: demos, group work, quizzes and real-time feedback.
Jupyter notebooks and Google Colab will be used to give easy access to coding and command-line tools; no prior experience is required.
Faculty
Massimiliano Ruocco
Massimiliano teaches this course. He is Adjunct Associate Professor at the Department of Computer Science, in the Data and Artificial Intelligence Group, with a strong focus on machine learning and artificial intelligence. He has extensive expertise in deep neural networks, modern AI for time series analysis, active learning, and self-supervised learning, and he applies recent advances in AI to real-world applications.
Francesco Lescai
Francesco hosts the course. He is Associate Professor of Bioinformatics at the Department of Biology and Biotechnology of the University of Pavia, and a member of the Scientific Advisory Board of Collegio Nuovo, the college hosting this initiative.
Detailed Schedule