
Introduction to Python for Biologists

Date

6th - 17th December, 3.5h per day

Venue

Zoom and Slack

Trainer

Martin Jones

Description

Python is a dynamic, readable language that is a popular platform for all types of bioinformatics work, from simple one-off scripts to large, complex software projects. This workshop is aimed at complete beginners and assumes no prior programming experience. It gives an overview of the language with an emphasis on practical problem-solving, using examples and exercises drawn from various aspects of bioinformatics work. The workshop is structured so that the parts of the language most useful for bioinformatics are introduced as early as possible, and students can start writing useful programs after the first few sessions. After completing the workshop, students should be in a position to:

  1. apply the skills they have learned to tackling problems in their own research
  2. continue their Python education in a self-directed way

This event will be delivered virtually via Zoom and Slack.

Feedback from recent attendees

"Fantastic course - excellent organisation and course content. Martin is a great teacher. Learnt a lot, especially coming from a programming-naïve background."

"This course exceeded all my expectations. Martin was a great instructor, who clearly knows how to frame any programming topic into a biology question. Now I feel very confident to keep improving my Python skills (after a couple of failed attempts with other courses in the past)."

Course fee

£500

Cancellation policy

A refund will be issued if a booking is cancelled more than one week prior to the workshop.

Who is the workshop for

This workshop is aimed at researchers and technical workers with a background in biology, but no previous programming experience. Students should have enough biological/bioinformatics background to appreciate the examples and exercise problems (i.e. they should know what a protein accession number, a BLAST report, and a FASTA sequence are). The syllabus has been planned with complete beginners to programming in mind, so no particular computer skills (beyond the ability to use a text editor) are necessary. If you are unsure whether this course suits your needs, please direct your questions to Martin Jones.

About the trainer

Martin started his programming career by learning Perl during the course of his PhD in evolutionary biology, and started teaching other people to program soon after. Since then he has taught introductory programming to hundreds of biologists, from undergraduates to PIs, and has maintained a philosophy that programming courses must be friendly, approachable, and practical. In his academic career, Martin mixed research and teaching at the University of Edinburgh, culminating in a two-year stint as Lecturer in Bioinformatics. He now runs programming courses for biological researchers as a full-time freelancer.

Logistics

We will deliver the course over ten half days, from Monday 6 December to Friday 17 December 2021. Each day there will be 3.5 hours of live input from the trainer via Zoom, including breaks. All sessions will be held in the morning, except those on Wednesdays, which will be in the afternoon. Training will consist of lectures, demonstrations and practical exercises, with the trainer on hand to assist and offer one-to-one support. Slack will be used to share important updates and for asking questions. Lectures/input will be recorded and made available to participants as soon as possible for anyone who needs to catch up. You will need an account for both Zoom and Slack. We recommend that you download the clients for these rather than using the browser versions. Martin will post links to the software and material for the course on the Slack workspace the week before the course starts.

Detailed Syllabus

1. Introduction and manipulating text

In this session I introduce the students to Python and explain what we expect them to get out of it and how learning to program can benefit their research. I explain the format of the course and take care of any housekeeping details (like coffee breaks) and get everyone set up with the required software. We'll then run through some examples of tools for working with text and show how they work in the context of biological sequence manipulation. We also cover different types of errors and error messages, and learn how to go about fixing them methodically. Core concepts introduced: terminals, standard output, variables and naming, strings and characters, special characters, output formatting, statements, functions, methods, arguments, comments.
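
To give a flavour of the kind of code this session leads up to, here is a minimal sketch using a made-up DNA sequence. The sequence and variable names are invented for illustration, and the actual course examples may differ.

```python
# A made-up DNA sequence, stored in a variable as a string.
dna = "ATGCGTACGTTAGC"

# len() is a function; count() and replace() are methods of the string.
length = len(dna)
gc_count = dna.count("G") + dna.count("C")
gc_fraction = gc_count / length

# Transcribe the sequence by replacing every T with U.
rna = dna.replace("T", "U")

# Numbers must be converted to strings before being joined to other text.
print("Length: " + str(length))
print("GC fraction: " + str(gc_fraction))
print("RNA: " + rna)
```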

2. Working with files

I introduce this session by talking about the importance of files in bioinformatics pipelines and workflows, and we then explore the Python interfaces for reading from and writing to files. This involves introducing the idea of types and objects, and a bit of discussion about how Python interacts with the operating system. The practical session is spent combining the techniques from session 1 with the file IO tools to create basic file-processing scripts. Core concepts introduced: objects and classes, paths and folders, relationships between variables and values, text and binary files, newlines.
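
As a rough sketch of what a basic file-processing script might look like, the example below writes a sequence to a text file and reads it back. The file name and contents are invented, and a temporary folder is used only so that the example is safe to run anywhere.

```python
import os
import tempfile

# Build a path to a hypothetical file inside a fresh temporary folder.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "sequence.txt")

# Open the file for writing; the "with" block closes it automatically.
with open(path, "w") as output_file:
    output_file.write("ATGCGTACGTTAGC\n")

# Open the same file for reading and get its whole contents as a string.
with open(path) as input_file:
    contents = input_file.read()

# The contents include the newline at the end of the line, so remove it
# before treating the text as a sequence.
sequence = contents.rstrip("\n")

print("Characters in file: " + str(len(contents)))
print("Bases in sequence: " + str(len(sequence)))
```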

3. Lists and loops

A discussion of the limitations of the techniques learned in session 2 quickly reveals that flow control is required to write more sophisticated file-processing programs, and I introduce the concept of loops. We look at the way in which Python loops work, and how they can be used in a variety of contexts. We explore the use of loops and lists together to tackle some more difficult problems. Core concepts introduced: lists and arrays, blocks and indentation, variable scoping, iteration and the iteration interface, ranges.
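
A small sketch of loops and lists working together is shown below. The sequences are invented for illustration and are not taken from the course exercises.

```python
# A list of made-up sequences.
sequences = ["ATGC", "GGCCTA", "TTAAGGCC"]

# Loop over the list, building a second list of lengths as we go.
# The indented lines form the block that is repeated for each element.
lengths = []
for seq in sequences:
    lengths.append(len(seq))

# range() generates the positions 0, 1, 2 so that we can number the output.
for i in range(len(sequences)):
    print(str(i + 1) + ": " + sequences[i] + " has " + str(lengths[i]) + " bases")
```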

4. Conditions

I use the idea of decision-making as a way to introduce conditional tests, and outline the different building-blocks of conditions before showing how conditions can be combined in an expressive way. We look at the different ways that we can use conditions to control program flow, and how we can structure conditions to keep programs readable. Core concepts introduced: Truth and falsehood, Boolean logic, identity and equality, evaluation of statements, branching.
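
The sketch below shows one way conditions can be combined to filter a list. The accession-like names and the selection rule are invented purely for illustration.

```python
# Made-up accession-style identifiers.
accessions = ["xk12", "ab34", "ac78", "bd56", "ae90"]

selected = []
rejected = []
for acc in accessions:
    # Two tests combined with "and"; the second is negated with "not".
    if acc.startswith("a") and not acc.endswith("4"):
        selected.append(acc)
    else:
        rejected.append(acc)

print("Selected: " + str(selected))
print("Rejected: " + str(rejected))
```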

5. Writing functions

We discuss functions that we'd like to see in Python before considering how we can add to our computational toolbox by creating our own. We examine the nuts and bolts of writing functions before looking at best-practice ways of making them usable. We also look at a couple of advanced features of Python - named arguments and defaults. Core concepts introduced: argument passing, encapsulation, data flow through a program.
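
As an illustration of named arguments and defaults, here is a minimal sketch of a function that calculates what fraction of a sequence is made up of a chosen set of bases. The function name, its parameters and the test sequence are all hypothetical.

```python
def base_fraction(sequence, bases="GC", places=2):
    """Return the fraction of sequence made up of the given bases."""
    # Work in upper case so that lower-case input gives the same answer.
    sequence = sequence.upper()
    total = 0
    for base in bases:
        total = total + sequence.count(base)
    return round(total / len(sequence), places)

# Using the defaults: count G and C, round to two decimal places.
print(base_fraction("ATGCGCGGTA"))

# Overriding one default by name: count A and T instead.
print(base_fraction("atgcgcggta", bases="AT"))
```

Because the defaults cover the most common case, callers only need to name the arguments they want to change.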

6. Regular expressions

I show how a range of common problems in bioinformatics can be described in terms of pattern matching, and give an overview of Python's regex tools. We look at the building blocks of regular expressions themselves, and learn how they are a general solution to the problem of describing patterns in strings, before practising writing some specific examples of regular expressions. Core concepts introduced: domain-specific languages, modules and namespaces.
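
A short sketch of pattern matching with the standard library re module follows. The sequence and the motif (GG, then either A or T, then CC) are invented for illustration.

```python
import re

# A made-up sequence containing two copies of the motif.
dna = "ATCGGACCTTAGGTCCA"

# [AT] is a character class: it matches exactly one A or one T.
pattern = r"GG[AT]CC"

# finditer() yields one match object for each non-overlapping match.
starts = []
motifs = []
for match in re.finditer(pattern, dna):
    starts.append(match.start())
    motifs.append(match.group())

print("Motifs found: " + str(motifs))
print("Start positions: " + str(starts))
```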

7. Dictionaries

We discuss a few examples of key-value data and see how the problem of storing them is a common one across bioinformatics and programming in general. We learn about the syntax for dictionary creation and manipulation before talking about the situations in which dictionaries are a better fit than the data structures we have learned about thus far. Core concepts introduced: paired data types, hashing, key uniqueness, argument unpacking and tuples.
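
Counting bases is one simple example of key-value data. The sketch below uses an invented sequence and stores one count per base in a dictionary.

```python
dna = "ATGCGCGGTA"

# Each base is a key; its count is the value. Keys are unique, so
# looking up a base always gives a single count.
counts = {}
for base in dna:
    # get() returns 0 if the base has not been seen yet.
    counts[base] = counts.get(base, 0) + 1

# items() gives (key, value) tuples, which are unpacked into two variables.
for base, count in sorted(counts.items()):
    print(base + ": " + str(count))
```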

8. Working with the filesystem

We discuss the role of Python in the context of a bioinformatics workflow, and how it is often used as a language to "glue" various other components together. We then look at the Python tools for carrying out file and directory manipulation, and for running external programs - two tasks that are often necessary in order to integrate our own programs with existing ones. Core concepts introduced: processes and subprocesses, the shell and shell utilities, program return values.
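
The sketch below illustrates both tasks in miniature: listing files in a folder and running an external program. The file names are invented, and the "external program" is simply a second copy of the Python interpreter so that the example runs on any system; a real workflow would call a bioinformatics tool instead.

```python
import os
import subprocess
import sys
import tempfile

# Create a temporary folder containing a few made-up files.
folder = tempfile.mkdtemp()
for name in ["a.fasta", "b.fasta", "notes.txt"]:
    with open(os.path.join(folder, name), "w") as output_file:
        output_file.write(">seq\nATGC\n")

# List the folder and keep only the FASTA files.
fasta_files = []
for name in sorted(os.listdir(folder)):
    if name.endswith(".fasta"):
        fasta_files.append(name)
print("FASTA files: " + str(fasta_files))

# Run an external program and capture what it prints.
# A return code of 0 conventionally means the program succeeded.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a subprocess')"],
    capture_output=True,
    text=True,
)
print("Return code: " + str(result.returncode))
print("Output: " + result.stdout.strip())
```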

9 and 10.

The schedule for the final two sessions will be set based on the progress of the course and the interests of the students. We will have time set aside for attendees to finish exercises, work on their own data, or get one-on-one help with real-world problems arising from their research. We may also use some of the time to cover more advanced topics of interest to the attendees, including Biopython, data visualisation, packaging and distributing code, and using alternative interfaces such as IPython.

Local Organiser

Rachael Munro, Glasgow Polyomics

Registration and enquiries

To register, please contact us via our enquiry form and state the title of the course you would like to attend.