Python for beginners
The course will guide you through setting up a development environment using VS Code together with GitLab and creating a personal Git repository for the course. Afterwards, it will introduce core Python programming concepts, including variables, functions, loops, lists and dictionaries. Along the way we will show you tools that help keep code clean and enforce proper Python syntax and styling. Later, we will cover the basics of file handling in Python so that you can read and write files and manipulate data. Finally, we will visualize data with the matplotlib and seaborn packages, working through practical examples.
Complete cluster course - 250213
Welcome to the "Complete Cluster Course"!
This course will
consolidate material presented in the beginner cluster course and expand
on the concepts to be aware of when trying to optimize use of the
cluster. The main message of the course is that you should embrace the
parallelism available within the cluster, and that pipelines should be
made from lots of small independent pieces that are spread throughout
the cluster rather than large monolithic long jobs that run on a single
node. The course will show why this should be done and how to achieve
it. Topics that are going to be addressed:
Video tour of the data centre
What is a cluster
Logging in
Queuing / the scheduler
What resources are available at the CRG cluster
Simple batch scripts - directives
Troubleshooting - what happened to my jobs?
Interactive sessions
Supercomputers, Beowulf clusters, horizontal vs vertical scaling
Hardware considerations
Multithreaded jobs, parallelism, Amdahl's Law
Job arrays
Job dependencies
Building a pipeline
Storage issues, treemap
Job stats, resource estimation
Scaling analysis
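As a taste of the "Amdahl's Law" and "Scaling analysis" topics, the sketch below shows why spreading work across more cores only helps the part of a job that can actually run in parallel. It is an illustration only and not part of the course material; the function name `amdahl_speedup` and the example fractions are assumptions chosen for this sketch.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup from Amdahl's Law.

    parallel_fraction: share of the runtime that can be parallelised (0..1).
    workers: number of cores/nodes the parallel part is spread across.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unaffected; the parallel part shrinks with more workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical job where 90% of the runtime can be parallelised.
    for n in (1, 2, 8, 64, 1024):
        print(f"{n:>5} workers -> speedup {amdahl_speedup(0.9, n):.2f}x")
```

However many workers are added, the speedup of this hypothetical job stays below 10x because the 10% serial part always remains. This is one reason to prefer many small independent pieces over one long monolithic job.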
Cluster conversion course - 250221
This course gives a quick revision of the essential general concepts for using a cluster well, followed by specific examples that use Slurm to run jobs on the cluster.
Linux terminal for beginners 250210
This course provides an introduction to using the Linux terminal. It is suitable for complete beginners who have never used the command line before.
NGS courses - Babraham Institute
This package includes a set of short courses:
Quality control in Sequencing Experiments
Analysing Mapped Sequence Data with SeqMonk
RNA-Seq Analysis
10X Single Cell RNA-Seq Analysis
ChIP-Seq Analysis
Analysing bisulfite methylation sequence data
Complete R courses - Babraham Institute
Course 1. Introduction to Tidyverse
R is a
popular language and environment that allows powerful and fast manipulation of
data, offering many statistical and graphical options. This course aims to
introduce R as a tool for statistics and graphics, with the main aim being to
become comfortable with the R environment. As well as introducing core R
language concepts this course also provides the basics of using the Tidyverse
for data manipulation, and ggplot for plotting. It will focus on entering and
manipulating data in R and producing simple graphs. A few functions for basic
statistics will be briefly introduced, but statistical functions will not be
covered in detail.
Course 2. Advanced Tidyverse
The 'Tidyverse' is a set of add-in R packages for data loading, modelling, manipulation and plotting. It is an attempt to make data analysis and plotting cleaner, simpler and more consistent by addressing some poor design decisions in the original language. This course follows on from our Introduction to R with tidyverse and focusses on the manipulation and restructuring of data using the tidyverse packages. The course shows how to do complex transformations on large data structures and how to deal efficiently with data which is both large and sometimes not well behaved.
Course 3. Introduction to ggplot
This course is normally taught as part of the R with Tidyverse bootcamp. Ggplot is the most popular plotting extension to R and replicates many of the graph types found in the core plotting libraries. This course provides an introduction to the ggplot2 libraries and gives a practical guide for how to use these to create different types of graphs.
Course 4. Introduction to Core R
R is a popular language and environment that allows powerful and fast manipulation of data, offering many statistical and graphical options. This course aims to introduce R as a tool for statistics and graphics, with the main aim being to become comfortable with the R environment. It will focus on entering and manipulating data in R and producing simple graphs. A few functions for basic statistics will be briefly introduced, but statistical functions will not be covered in detail.
Course 5. Advanced Core R
This course follows on from the introductory course. It goes into more detail on practical guides to filtering and combining complex data sets. It also looks at other core R concepts such as looping with apply statements and using packages. Finally, it looks at how to document your R analyses and generate complete analysis reports.
Course 6. Plotting complex figures with Core R
This course is a comprehensive guide to the use of the built-in R plotting functionality to construct everything from customised simple plots to complex multi-layered figures. It follows on from the material in our introductory R course and participants are expected to have a basic understanding of R - enough to load and do basic manipulation of datasets.
Course 7. Introduction to Shiny
Shiny is an R package that enables interactive web applications to be built using R. They are a great way of allowing users to explore a dataset and make use of the graphical and statistical functionality of R without having to write any code.
Course 8. Using R Notebooks
This course is designed for people who are already familiar with R and are ready for a more integrated way to perform and report their analyses. It will show the use of R Notebooks for interactive analysis and then demonstrate how to apply this to the production of complete reports.
Course 9. Writing R Packages
R packages are the best way to create robust re-usable code, either for internal use or for sharing with the wider community. In this course we will look at how to write functions which are robust for use by others. We will then go through the process of authoring function based R packages with the help of the recommended development tools.
Course 10. Using git and GitHub with RStudio
RStudio has embedded tools to facilitate the use of git with RProjects. This short course explores this functionality.
Python, Perl, Unix, ML - Babraham Institute
Python, Perl, Unix and machine learning (ML) are considered core bioinformatics skills. Here we provide a package for learning these skills.
Course 1. Python.
Part 1. Introduction to Python. Python has
established itself as one of the most commonly used programming languages. It
is a very powerful language, which makes it relatively easy to write programs
ranging from simple automation scripts to more fully featured applications. In
bioinformatics, Python has become widely used both as a language for writing
scripts and applications and, via packages like pandas, numpy and seaborn,
as an environment for data analysis, competing with more focussed languages
such as R. In this course we focus on the use of Python to develop simple
scripts and larger applications. These can be used for simple data processing
and aggregation, for automating repeated tasks or for writing larger user-facing
command-line programs. We start from the ground up, and make no assumption of any
previous programming experience.
Part 2. Advanced Python. This course builds on the basic features of Python 3 introduced in the Introduction to Python course. At the end of this course you should be able to write moderately complicated programs, and be aware of additional resources and wider capabilities of the language to undertake more substantial projects. The course tries to provide a grounding in the basic theory you'll need to write programs in any language as well as an appreciation of the right way to do things in Python.
Part 3. Python: Object Oriented Programming. A strength of Python, and a feature that makes this language attractive to so many, is that it supports object-oriented programming (OOP). This is a short course that introduces the basic concepts of OOP. It then goes into more detail explaining how to build and manipulate objects. While this course does not provide an exhaustive discussion of OOP in Python, by the end of the course attendees should be able to build sophisticated objects to aid analysis and research.
Course 2. Introduction to Unix. An increasing amount of bioinformatics work is done in a command-line Unix environment. Most large-scale processing applications are written for Unix, and most large-scale compute environments are also based on it. This course provides an introduction to the concepts of Unix and a practical introduction to working in this environment. Internally we link this course to a more specific course illustrating the use of our internal cluster environment, and this part of the course could be adapted for other sites with different compute infrastructure.
Course 3. Learning to Program with Perl. For a long time, Perl has been a popular language among those starting out with programming. Although it is a powerful language, many of its features make it especially suited to first time programmers as it reduces the complexity found in many other languages. Perl is also one of the world's most popular languages which means there are a huge number of resources available to anyone setting out to learn it. This course aims to introduce the basic features of the Perl language. At the end you should have everything you need to write moderately complicated programs, and enough pointers to other resources to get you started on bigger projects. The course tries to provide a grounding in the basic theory you'll need to write programs in any language, as well as an appreciation for the right way to do things in Perl.
Course 4. Introduction to Machine Learning. This course provides a theoretical and practical introduction to the use of machine learning on biological datasets. For the final section of the course we will introduce the tidymodels framework for machine learning in R, so it will be helpful to have attended our introductory and advanced R courses, or to have had equivalent experience, although this is not a prerequisite to attend the course.
True Image Deconvolution, Restoration and Analysis Workshop 250313
If you are interested in producing high-quality microscopy images and
obtaining reliable analysis results, this workshop may be of interest
to you. Topics covered will include diffraction, acquisition pitfalls,
spherical aberration, photon noise, Point Spread Function,
Nyquist-Shannon Sampling Rate, Image Quality Control, crosstalk,
(colocalization) analysis, and deconvolution. In addition, the Huygens
Software will be demonstrated.
This course is valid from March 2025 until March 2026.
An Introduction to Proteomics - Babraham Institute
This course
provides an introduction to the methods, data and analysis of quantitative
proteomics data. It goes through the background of how the data is acquired and
quantitated and the process of searching the spectra against reference
databases to identify them at the spectrum, peptide and protein level. We look
at quality control of search results to identify problems.
Data
analysis is run using the MSstats package, both via the friendly Shiny
interface, and then in more detail using R. Whilst there are no strict
pre-requisites for this course, a familiarity with R and ggplot would be very
helpful.
An Introduction to Mathematical Modelling - Babraham Institute
This course
was developed in collaboration with the Le Novère lab at The Babraham
Institute. The course is not currently running and is not supported, but we are
leaving course materials here for reference.
It provides
an introduction to the concepts of modelling biological systems. It is intended
for biologists who have no experience in modelling but would like to know how
it might apply to their area of research. The course provides a complete
background to the history of modelling and the different approaches through
which a biological system can be approximated by mathematical methods. The
course also provides a practical introduction to the COPASI modelling environment.
An Introduction to Biological Big Data - Babraham Institute
This course
provides both a biological and technical introduction to Biological Big Data.
It is divided into three day-long sessions where participants learn about the
available big data resources, what they mean, and how to use them. There
are extensive practicals to give time for people to familiarise themselves with
the sites they are shown.
Learning Vim (CRG Staff only) 2025
This course introduces Vim and provides
resources for jump-starting your Vim journey, learning the motions and starting to
customise your environment.
Linux containers 2025
This course is designed to teach the basics of everyday Linux Containers
usage. Participants will learn what Linux Containers are and why they
are relevant to today's scientific practice. They will get hands-on
experience with Docker, the most popular container technology, and by the
end of the course, they should be able to build simple container images by
themselves. They will also be introduced to Singularity/Apptainer,
container software that is better suited to HPC environments.
Your first Nextflow pipeline 2025
Learn how to write a Nextflow pipeline from scratch
Join our beginner-friendly course on Nextflow, the powerful workflow
management system for scalable and reproducible data analysis. This
course covers the fundamentals of Nextflow, from writing your first
pipeline to running it efficiently. You’ll learn how to make your analyses
reproducible using containers, how to automate complex analyses, and how to
optimize workflows for high-performance computing (HPC).
Introduction to Nextflow 2025
The aim of this course is to give a general overview of Nextflow, focusing on the execution, configuration and deployment of local and publicly available pipelines.
Workflows for reproducible research 250317
This course is designed to teach the fundamental concepts and practical
guidelines for ensuring that everyday data generation and management
tasks fit into reproducible scientific workflows. The course emphasizes
the importance of open data formats, and recommends using Markdown for
documentation. Participants will also learn how to use GitLab and
GitHub, two data collaboration platforms, for tracking and managing data
and documentation across different interfaces such as command line,
IDEs and web browser. Git's underlying version control capabilities will
be covered in detail during the hands-on sessions.
Quick introduction to programming in R 250327
This course aims to provide basic notions of R programming to people who have never worked with R and who want to learn how to use it for data analysis and visualization.
The Introduction to R course starts from the very basics of the R language and works through creating scripts, reading and writing files, manipulating different data structures and plotting the results, so that by the end of the course you can do some basic analysis and visualization of your own data. We will combine explanations and examples with lots of hands-on exercises that will allow you to get familiar with basic programming concepts and explore the different possibilities that R offers.
Intermediate R course: Data handling and visualization with tidyverse and ggplot2 2025
This course aims to provide intermediate notions of R programming to
people who already have basic experience with R. We will learn how to
efficiently manipulate different data structures, compute basic summary
statistics and visualize the results. This will allow you to learn how
to do some basic analysis and visualization of your own data by the end
of the course.
In this course we will combine explanations and examples with lots of
hands-on exercises that will allow you to get familiar with basic programming
concepts and explore the different possibilities that R offers.
Introduction to Python 250310
The course will guide you through setting up a development environment using VS Code together with GitLab and creating a personal Git repository for the course. Afterwards, it will introduce core Python programming concepts, including variables, functions, loops, lists and dictionaries. Along the way we will show you tools that help keep code clean and enforce proper Python syntax and styling. Later, we will cover the basics of file handling in Python so that you can read and write files and manipulate data. Finally, we will visualize data with the matplotlib and seaborn packages, working through practical examples.
Basics of Biostatistics 2025
The course will
give an overview of important concepts and methods used to analyse “Biomedical
data”. The emphasis will be on the understanding of statistical concepts and
their interpretation in a research framework. After a general introduction to
probability theory and statistical inference, the focus will move to common
statistical methods. Particular cases will be used as illustrative examples.