Teaching, courses (and eventually some talks)

Important first note: All this material has a copyright holder (myself). Unless otherwise stated, all the material is under one of the Creative Commons licenses. These licenses give you a lot of freedom to use the material, modify it, etc. (read the details of the license for what exactly you can, and cannot, do). So feel free to use this material, but, please, give credit where credit is due. (To give you an example, most of the recent material is under an Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license that allows you to freely mix, modify, adapt, and even commercially use this material, but you should give proper attribution and share the modified material under the same license.)


This is a partial list of some classes I teach at UAM, with links to the PDFs or the original LaTeX (if the LaTeX files are not available here, you can ask me for them).

The PDFs (slides and/or class notes) are always available to our students from the institutional Moodle. But you can't access those unless you are from the Universidad Autonoma de Madrid (UAM). For many courses, I provide here the PDFs or, even better, the public repositories with the original LaTeX or Rnw sources.

Applied statistics, in the course Methodology in Molecular Biosciences Research

An intro course to statistics for master’s students of the Molecular Biosciences Master. We use R to teach some basic stats (two-sample comparisons and a little bit of linear models). This is the repository for the Rnw files (LaTeX with R) and all data and scripts to produce the PDF.

Herramientas de programación para bioquímica y biología molecular (Programming tools for biochemistry and molecular biology)

A course for biochemistry and biology bachelor’s degree students. We cover Python (taught by Luis del Peso) and R, and we used to cover a few things about the shell and essential utilities such as sed, grep, etc. This is the GitHub repo with the material (data sets and complete Rnw and scripts to generate the PDFs) for the R part.

Bioinformática y biología molecular de sistemas, BIBCM (Bioinformatics and molecular systems biology)

A course for biochemistry bachelor’s degree students. This is a course taught by several teachers, and I cover phylogenetic inference and statistics for omics. (The PDFs provided are a bundle of three or more files; use your PDF viewer's index toolbar or similar to see the full index.) The material for Statistics for omics is very similar to that in the repo for Stats-bioinfo-intro (full LaTeX/Rnw, etc., sources).

Advanced bioinformatics and systems biology

A master’s level course that touches on a variety of topics, from HMMs to statistics to phylogenetic inference. This is the repository for the original LaTeX files. Unfortunately, this course was last taught in 2014-15, and is unlikely to be taught again in the near future. It is being replaced by two courses (which I’ll be co-teaching, and which will get their entry here eventually).

Courses about the R statistical computing system

Some history

(This is mainly for nostalgic and historical reference.) I started using S-PLUS in 1996. Through the email list of the Stats department at the University of Wisconsin I got an email about a great offer to obtain a user license (and the floppies) for S-PLUS. I was planning a long field trip and wanted a reliable piece of stats software to analyze my lizard data. I went to the bookstore, looked through “Modern Applied Statistics with S-Plus”, the classic by Venables and Ripley, and after 30 minutes I was hooked (the example of the linear regression with male and female cats, and how easy the subsetting and using different symbols was, led to love after that 30-minute skimming session). I purchased the book and the software, and spent a year and a half in Brazil enjoying S-Plus and reading the manuals (and rejoicing in thoughts such as “how hard would this have been in SAS or SPSS?”), in addition to chasing lizards.

In the fall of 1998 I took a linear models class (Stats 849?) from Douglas Bates, an R-core member. He introduced us to R, then in version 0.62.3 (and also to ESS and Emacs). R being free software, and having been bitten by the free software virus shortly before that, I started using R as much as I could instead of S-Plus. There were things that were not available (like trellis), others that did not quite work well yet, lots for which I could easily rewrite my former S-Plus scripts, and a bunch for which I tried S-Plus, R, and sometimes SAS, such as mixed-effects models (I was spending a lot of time with those at that time). By mid-2000 I was using R almost exclusively, and I submitted my first package to CRAN, PHYLOGR, around that time.

I kept using R in the first jobs I had in Spain, often to the surprise of my colleagues (some wondered why I insisted on using that unknown system instead of things such as SPSS or SAS, both of which I had used quite heavily in the past and did not particularly want to use much again). When I got to CNIO in 2001, and I returned to doing science, I was pleasantly surprised to see that the Bioconductor project, which had started shortly before, was something that some of my colleagues in bioinformatics were positively curious about.

Since then, I’ve authored/co-authored several R and Bioconductor packages, some of which even got their academic papers (e.g., ADaCGH2, RJaCGH, varSelRF, etc.) and have used R as the basic computational engine behind several web-based bioinfo applications (some of which we also managed to publish, such as Pomelo II, now available from Pomelo’s IIB site, or SignS, now available from its IIB home, etc.).

I continue using R as the language in which I do a large chunk of my overall programming and all of my statistical analysis/programming. I will use other things too (mainly C++ when I need the speed), and have been tempted by other languages (right now, the inevitable Julia and LuaJIT), but I think that R is here to stay for quite some time and that the combination of R and C++ (via Rcpp) is a relatively good one.

I think I taught my first R course around 2003 at CNIO, and not long after that I taught another one at UAM (organized by the “Red temática del CSIC de Bioinformática”). I’ve covered everything from basic programming and basic stats to specific issues of parallelization or “omics” data analysis.

Current courses

Most of the R-related courses I’ve taught recently fall into two categories:

  • General intro R programming courses. For example, the material I use for the “Herramientas de programación” bachelor’s degree course, or the annual one-day course I teach for my friends at CNIO. This material is available from the R-bioinfo-intro GitHub repo.
  • Intro statistics with R. For example, the material for the “Applied statistics” course (available from the BM-1 GitHub repo), or the very similar (except it is all command-line oriented, so no Rcommander or other GUIs) material from the R-basic-stats GitHub repo.
  • A few other courses have been much more specific, such as on debugging and parallelization, or longer and covering wider territory, such as courses that covered everything from programming to generalized linear mixed-effects models.