Advanced Python for Biologists

Date

15 - 26 April 2024

Times

Half days

Venue

Remote course using Zoom and Slack

Trainer

Martin Jones

Description

Python is a dynamic, readable language that is a popular platform for all types of bioinformatics work, from simple one-off scripts to large, complex software projects. This workshop is aimed at people who already have a basic knowledge of Python and are interested in using the language to tackle larger problems. We will focus on three main themes:

  1. learning about advanced language features (recursion, complex data structures, comprehensions, exceptions) that are relevant to bioinformatics work
  2. learning about development tools (benchmarking, profiling, unit testing) that can make it easier for us to write code that is both fast and correct
  3. learning about different programming styles and concepts (object-oriented programming, functional programming) that are suitable for different kinds of problems

The workshop will use examples and exercises drawn from various aspects of bioinformatics work. After completing the workshop, students should be in a position to (1) take advantage of the advanced language features in their own programs and (2) use appropriate tools when developing software programs. They will also have a deeper understanding of how Python works internally, which will be invaluable when making sense of existing code and packages.

Feedback from recent attendees

"Fantastic course - excellent organisation and course content. Martin is a great teacher. Learnt a lot, especially coming from a programming-naïve background."

"This course exceeded all my expectations. Martin was a great instructor, who clearly knows how to frame any programming topic into a biology question. Now I feel very confident to keep improving my Python skills (after a couple of failed attempts with other courses in the past)."

Course fee

£500

Cancellation policy

A refund will be issued if a booking is cancelled more than one week prior to the workshop.

Who is the workshop for

This course is designed for people who already know some Python and who are interested in tackling more ambitious programs, particularly ones that will deal with large or complex datasets and will therefore need to work efficiently. Students should have a basic biological background (or be prepared to ask a lot of questions!) as the examples and exercises assume some knowledge of what DNA is, what is meant by gene expression, how to read a phylogenetic tree, etc.

The course is **not** suitable for complete beginners to Python as we will assume quite a lot of knowledge of the basic syntax of the language. The material covered in the Introduction to Python for Biologists

About the trainer

Martin started his programming career by learning Perl during the course of his PhD in evolutionary biology, and started teaching other people to program soon after. Since then he has taught introductory programming to hundreds of biologists, from undergraduates to PIs, and has maintained a philosophy that programming courses must be friendly, approachable, and practical. In his academic career, Martin mixed research and teaching at the University of Edinburgh, culminating in a two-year stint as Lecturer in Bioinformatics. He now runs programming courses for biological researchers as a full-time freelancer.

Logistics

We will deliver the course over ten days, from Monday 15 April to Friday 26 April 2024. Training will consist of lectures, demonstrations and practical exercises, with the trainer on hand to assist and offer one-to-one support. Slack will be used to share important updates and for asking questions. Lectures/input will be recorded and made available to participants as soon as possible for anyone who needs to catch up. You will need accounts for both Zoom and Slack. We recommend that you download the clients for these rather than using the browser versions. Martin will post links to the software and material for the course on the Slack workspace the week before the course starts.

Detailed Syllabus

Session 1 : Recursion and trees

In this session we will cover two very closely related concepts: trees (i.e. the various ways that we can store hierarchical data) and recursive functions (the best way to operate on treelike data). As recursion is inherently confusing, we'll start with a gentle introduction using biological examples before moving on to consider a number of core tree algorithms concerning parents, children, and common ancestors. In the practical session we'll look in detail at one particular way of identifying the last common ancestor of a group of nodes, which will give us an opportunity to explore the role of recursion. Core concepts introduced: nested lists, storing hierarchical data, recursive functions, relationship between recursion and iteration.
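As a flavour of the kind of problem this session deals with, here is a minimal sketch of a recursive last-common-ancestor search. It is not taken from the course material: the toy taxonomy, the child-to-parent dictionary and the function names are all assumptions made for illustration, and a node is not counted as its own ancestor.

```python
# Hypothetical toy taxonomy: each child maps to its parent (the root has no entry).
parents = {
    "human": "primates",
    "chimp": "primates",
    "mouse": "rodents",
    "primates": "mammals",
    "rodents": "mammals",
}


def ancestors(node):
    """Return the ancestors of node, nearest first, using recursion."""
    if node not in parents:  # base case: we have reached the root
        return []
    parent = parents[node]
    return [parent] + ancestors(parent)  # recursive case: climb one level


def last_common_ancestor(nodes):
    """Return the nearest ancestor shared by every node, or None."""
    first, *rest = nodes
    for candidate in ancestors(first):
        if all(candidate in ancestors(other) for other in rest):
            return candidate
    return None


print(last_common_ancestor(["human", "chimp"]))  # primates
print(last_common_ancestor(["human", "mouse"]))  # mammals
```

The same climb could be written with a while loop, which is one way to see the relationship between recursion and iteration.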

Session 2 : Complex data structures

In this session we will briefly recap Python's basic data structures, before looking at a couple of new data types — tuples and sets — and discussing where each should be used. We will then see how we can combine these basic types to make more complex data structures for solving specific problems. We'll finish our discussion by looking at specialized data types that are found in the Python core library. This session will also be our first introduction to benchmarking as we talk about the relative performance of different data types. In the practical session we'll learn how to parse an input file into a complex data structure which we can then use to rapidly query the data. Core concepts introduced: tuples, sets, higher-order data structures, default dicts, Counters, big-O notation.
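The idea of parsing input into a queryable structure might look something like the sketch below. The comma-separated gene/tissue records are invented for illustration and are not course data; `defaultdict` and `Counter` come from the standard library `collections` module.

```python
from collections import Counter, defaultdict

# Hypothetical input: one "gene,tissue" record per line.
records = """BRCA1,breast
BRCA1,ovary
TP53,breast
TP53,lung
TP53,breast"""

tissues_by_gene = defaultdict(set)  # gene -> set of tissues (duplicates collapse)
tissue_counts = Counter()           # tissue -> number of records seen

for line in records.splitlines():
    gene, tissue = line.split(",")
    tissues_by_gene[gene].add(tissue)
    tissue_counts[tissue] += 1

print(sorted(tissues_by_gene["TP53"]))  # ['breast', 'lung']
print(tissue_counts.most_common(1))     # [('breast', 3)]
```

Once the file has been read, each lookup is a dictionary access rather than a scan of every line.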

Session 3 : Classes and objects

In this session we will introduce the core concepts of object-oriented programming, and see how the data types that we use all the time in Python are actually examples of classes. We'll take a very simple example and use it to examine how we can construct our own classes, moving from an imperative style of programming to an object-oriented style. As we do so, we'll discuss where and when object-orientation is a good idea. In the practical we will practise writing classes to solve simple biological problems and familiarize ourselves with the division of code into library and client that object-oriented programming demands. Core concepts introduced: classes, instances, methods vs. functions, self, constructors, magic methods.
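A minimal sketch of the concepts listed above (a constructor, an ordinary method and one magic method) could look like this. The `DNASequence` class is a hypothetical example, not one from the course.

```python
class DNASequence:
    """A named DNA sequence (hypothetical example class)."""

    def __init__(self, name, bases):
        # The constructor stores data on the instance via self.
        self.name = name
        self.bases = bases.upper()

    def gc_content(self):
        """Return the fraction of bases that are G or C."""
        if not self.bases:
            return 0.0
        gc = self.bases.count("G") + self.bases.count("C")
        return gc / len(self.bases)

    def __len__(self):
        # Magic method: lets the built-in len() work on our objects.
        return len(self.bases)


seq = DNASequence("example", "atgc")
print(len(seq))          # 4
print(seq.gc_content())  # 0.5
```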

Session 4 : Object-oriented programming

Following on from the previous session, we will go over some advanced ideas that are common to most object-oriented programming languages. For each idea we'll discuss the basic concept, the scenarios in which it's useful, and the details of how it works in Python. This overview will also allow us to consider the challenges involved in designing object-oriented code. In the practical we will work on a simulation which will involve multiple classes working together. Core concepts introduced: inheritance and class hierarchies, method overriding, superclasses and subclasses, polymorphism, composition, multiple inheritance.
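Inheritance, method overriding and polymorphism can be illustrated with a small class hierarchy. The classes below are assumptions made for this sketch and are not the simulation used in the practical.

```python
class Sequence:
    """Base class (hypothetical): behaviour shared by all sequence types."""

    alphabet = ""

    def __init__(self, residues):
        self.residues = residues.upper()

    def is_valid(self):
        # Uses whichever alphabet the subclass defines.
        return all(r in self.alphabet for r in self.residues)

    def describe(self):
        return f"{len(self.residues)} residues"


class DNA(Sequence):
    alphabet = "ACGT"

    def describe(self):
        # Method overriding: extend the superclass behaviour.
        return "DNA, " + super().describe()


class Protein(Sequence):
    alphabet = "ACDEFGHIKLMNPQRSTVWY"


# Polymorphism: the same calls work on any Sequence subclass.
for seq in [DNA("acgt"), Protein("MKV")]:
    print(seq.describe(), seq.is_valid())
```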

Session 5 : Functional programming in Python

This session will start with a look at a few different concepts that are important in functional programming, culminating in a discussion of the idea of state and its role in program design. We will see how functional programming is, in many ways, the complement of object-oriented programming and how that realization informs our decision about when to use each approach. We'll take a quick tour of Python's built-in tools that take advantage of functional programming and see how we can build our own. We'll finish with a brief look at how functional programming can vastly simplify the writing of parallel code. In the practical, we'll practise using Python's built-in functional tools, then implement one of our own. Core concepts introduced: state and mutability, side effects, first-class functions, declarative programming, lazy evaluation, parallelism, higher-order functions.
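To give a sense of what first-class and higher-order functions look like in practice, here is a short sketch using the built-in `map`, `filter` and `sorted`. The sequences and helper names are invented for illustration.

```python
def gc_content(seq):
    """Pure function: the result depends only on the argument."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def make_length_filter(min_length):
    """Higher-order function: builds and returns another function."""
    def long_enough(seq):
        return len(seq) >= min_length
    return long_enough


seqs = ["ATGC", "GG", "ATATAT", "GCGCGC"]

# filter() is lazy: nothing is computed until the result is consumed.
long_seqs = filter(make_length_filter(4), seqs)
gc_values = list(map(gc_content, long_seqs))
print(gc_values)  # [0.5, 0.0, 1.0]

# Functions can be passed as arguments, here as a sort key.
print(sorted(seqs, key=gc_content))  # ['ATATAT', 'ATGC', 'GG', 'GCGCGC']
```

Because `gc_content` has no side effects, the calls are independent of each other, which is the property that makes this style convenient for parallel code.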

Session 6 : Exception handling

This session will start with a reminder of the difference between syntax errors and exceptions, after which we will explore the syntax involved in catching and handling exceptions. We'll then examine the way that exceptions can be handled in multiple places and the consequences for program design. We'll finish this session by learning how we can take advantage of Python's built-in exception types to signal problems in our own code, and how we can create custom exception types to deal with specific issues. In the practical we'll modify existing code to make use of exceptions. Core concepts introduced: exception classes, try/except/else/finally blocks, context managers, exception bubbling, defining and raising exceptions.
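Defining, raising and catching a custom exception might look like the following sketch. `InvalidBaseError` and both functions are hypothetical names chosen for this example.

```python
class InvalidBaseError(ValueError):
    """Raised when a DNA sequence contains an unexpected character."""


def validate_dna(seq):
    """Return seq unchanged, or raise InvalidBaseError at the first bad base."""
    for position, base in enumerate(seq):
        if base not in "ACGT":
            raise InvalidBaseError(
                f"unexpected base {base!r} at position {position}"
            )
    return seq


def safe_length(seq):
    """Return the length of a valid sequence, or None if it is invalid."""
    try:
        validate_dna(seq)
    except InvalidBaseError:
        # The exception bubbled up from validate_dna and is handled here.
        return None
    else:
        # Runs only when no exception was raised.
        return len(seq)


print(safe_length("ACGT"))  # 4
print(safe_length("ACXT"))  # None
```

Subclassing `ValueError` means that existing code which already catches `ValueError` will also catch the new exception type.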

Session 7 : Performance optimization

In this session we'll learn about the various tools Python has for benchmarking code (i.e. measuring its memory and runtime performance) and for profiling code (identifying areas where improvements can be made). We'll see that different tools are useful in different scenarios, and collect a set of recommendations for improving program performance. We'll use these tools to illustrate and measure points about performance that have been made through the course. In the practical, we'll take real-life code examples, measure their performance, and try to improve it. Core concepts introduced: function profiling, line profiling, profiler overhead, timing.
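As one small example of timing, the standard library `timeit` module can compare two ways of doing the same thing. The identifiers and sizes below are arbitrary assumptions, and the measured times will vary from machine to machine.

```python
import timeit

# Hypothetical data: the same 10,000 identifiers stored as a list and as a set.
ids_list = [f"gene{i}" for i in range(10_000)]
ids_set = set(ids_list)


def lookup_in_list():
    return "gene9999" in ids_list  # scans the list element by element


def lookup_in_set():
    return "gene9999" in ids_set   # hash-based lookup


# Run each function 1,000 times and report the total elapsed seconds.
list_time = timeit.timeit(lookup_in_list, number=1_000)
set_time = timeit.timeit(lookup_in_set, number=1_000)

print(f"list: {list_time:.6f} s, set: {set_time:.6f} s")
```

Timing like this answers "how long does it take?"; a profiler such as the standard library `cProfile` module addresses the separate question of where the time is being spent.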

Session 8 : Unit testing

In this session we will begin with a gentle introduction to testing which will illustrate why it's useful and what type of problems it can solve. We'll run through a series of examples using Python's built-in testing tools which will cover a number of different testing scenarios. We'll then implement the same set of tests using the pytest testing framework and examine how using a framework makes the tests easier to write and interpret. After looking at a number of specialized tests for different types of code, we'll discuss the impact of program design on testing. In the practical we'll practise building and running test suites for existing code.
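A test suite in the pytest style can be as simple as a few functions whose names start with `test_` and which use plain `assert` statements. The function under test here is a hypothetical example, not code from the course.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(seq))


def test_reverse_complement_simple():
    assert reverse_complement("ATGC") == "GCAT"


def test_reverse_complement_is_its_own_inverse():
    seq = "AACGT"
    assert reverse_complement(reverse_complement(seq)) == seq


def test_reverse_complement_empty():
    assert reverse_complement("") == ""
```

If this were saved in a file whose name starts with `test_`, running `pytest` in the same directory should discover and run the three tests.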

Sessions 9 and 10 : Workshop time

The last two sessions are set aside to use as workshop time. We will use these sessions for a number of different purposes depending on what the students find most useful:

  1. we may go back and recap material from previous sessions
  2. we may discuss general topics that students are particularly interested in. In previous courses these have ranged from data visualization to Twitter text mining to building web interfaces for tools
  3. we may take a closer look at solutions to exercises from previous sessions
  4. students may use the time to apply what we've learned to their own datasets and research questions

Local Organiser

Rachael Munro

Registration and enquiries

Please register your interest. If you would like to book, please provide a Budget Code (internal) or PO number (external).