Unless otherwise stated, all the code here is released under the GNU GPL (v.2 or v.3). Please note that THERE IS NO WARRANTY FOR ANY OF THE PROGRAMS. See section 11 of the GPL for further details.
Most of this code is now available from public repos at my github page.
A Bioconductor package, written in R and C++, for forward population genetic simulation in asexual populations, with special focus on cancer progression. It is described in OncoSimulR: genetic simulation with arbitrary epistasis and mutator genes in asexual populations, Bioinformatics, 33. The development is taking place in its github repo, and the package is also listed in the Genetic Simulation Resources catalogue.
An R package for variable selection using random forests, targeted towards gene expression data. All of the code is also available from varSelRF's github repo. Here is the supplementary material for the paper.
RJaCGH is an R package for the analysis of array CGH data with Hidden Markov Models; we use a full Bayesian approach using Reversible Jump MCMC. This is a package developed by Oscar Rueda and myself. The package is available from CRAN. The paper is available from PLoS Comp Biol. This package is part of Oscar Rueda's PhD thesis ("Statistical methods for the analysis of copy number alterations in the genome").
Pomelo II is a major rewrite of our popular Pomelo for finding differentially expressed genes. Pomelo II uses MPI (my original C++ code has been parallelized by Edward Morrissey) and provides clickable tables and heatmaps (using our IDClight tool) in a much nicer and more configurable interface written mainly by Edward Morrissey.
SignS is a web tool for gene selection and finding molecular signatures when we have patient survival data. We implement two very different methods, and provide additional gene information in clickable tables and dendrograms thanks to calling our IDClight application. SignS is a web interface made with Python that uses R underneath (to speed up the computations, we use MPI).
GeneSrF is a web tool for gene selection in classification problems that uses random forest. Two approaches for gene selection are used: one is targeted towards identifying small, non-redundant sets of genes that have good predictive performance. The second is a more heuristic graphical approach that can be used to identify large sets of genes (including redundant genes) related to the outcome of interest. This is a web interface (using Python) to my varSelRF package.
Tnasas: a predictor-building tool
Tnasas, which stands for "this is not a substitute for a statistician", is a tool for building predictors from microarray data. It is useful as a benchmark (it offers several well-tested methods) and as a pedagogical tool (against overoptimism when building predictors and ignoring several selection biases). Developed with Juanma Vaquerizas, formerly at CNIO, using R. The code (R with a tiny bit of C++) is available from its github repo.
ADaCGH2, the web tool
ADaCGH2 is a web tool for the analysis of array CGH to detect gains and losses in genomic DNA. We implement several very different approaches to display additional gene information. This is a web interface made with Python that uses R underneath and uses parallelization to speed up the computations, calling our ADaCGH2 Bioconductor package.
PHYLOGR is an R package for the manipulation and analysis of phylogenetically simulated data sets and phylogenetically based analyses using GLS. You can get it from CRAN, where you can download both the source and Windows binaries. Here is the github repo.
Bibliography management and Zotero-related stuff
The github repositories Adios_Mendeley and Zotero-to-Referey contain code I used to move from Mendeley to Zotero (Adios Mendeley) and code that I use now daily to sync my Zotero db and PDFs with my tablet (Zotero-to-Referey).
The rest of the applications listed below are either no longer working or not maintained anymore. They are left for historical reasons or as sources of the source code.
Asterias is (was) a set of web-based applications for the analysis of genomic and proteomic data. Asterias combines Python with R and C/C++, using MPI for parallelization.
geSignatures is an R package for finding molecular signatures from gene expression data, as described in the technical report Molecular signatures from gene expression data. The code is available from the geSignatures github repo.
Pomelo was a web-based tool for finding differentially expressed genes. It implemented statistical tests for two-group (via t-tests) and multigroup (via ANOVA) comparisons, regression analysis, survival data (gene-wise Cox model) and contingency tables (using Fisher's exact test). It allowed control of the Family Wise Error Rate (using the maxT approach) and the False Discovery Rate. It has been superseded by Pomelo II.
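As an illustration of what controlling the False Discovery Rate over many gene-wise tests involves, here is a minimal sketch in Python. It is not Pomelo's code. It assumes the Benjamini-Hochberg step-up procedure (the text above does not say which FDR procedure Pomelo used), and the function name `bh_adjust` is hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: BH is used here only as a common FDR procedure; the
    original tool may have used a different one.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up p-values for four hypothetical genes.
    print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

Genes whose adjusted p-value falls below the chosen FDR level (say 0.05) would be reported as differentially expressed.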
FatiGO was one of the earliest tools to examine whether groups of genes are enriched in certain Gene Ontology terms. We used Fisher's exact test for contingency tables with adjustments for multiple testing.
DNMAD was a tool for the diagnosis and normalization of microarray data. It was a web server for cDNA microarray normalization and diagnosis, developed together with Juanma Vaquerizas (jvaquerizas AT cnio DOT es).
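To make the enrichment test concrete, here is a small Python sketch of a one-sided Fisher's exact test for a single term, computed from the hypergeometric distribution. It is not FatiGO's code: the function name `fisher_enrichment_p` and the counts in the example are hypothetical, and the choice of a one-sided test is an assumption.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for enrichment.

    N: total number of genes considered
    K: genes annotated with the term
    n: genes in the list of interest
    k: genes in the list that carry the term
    Returns P(X >= k) for X hypergeometric(N, K, n).
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb(a, b) is 0 when b > a, so impossible tables add nothing.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Made-up counts: 40 of 1000 genes carry the term; 8 of 50 listed genes do.
    print(fisher_enrichment_p(8, 50, 40, 1000))
```

With many terms tested at once, the resulting p-values would then need an adjustment for multiple testing, as noted above.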
These are a set of programs in RPL (Reverse Polish Lisp) to use the HP 48 calculator as a handheld computer to record behavioral data and to help in the execution of a behavioral experiment. Included are some utility functions in C++ for the processing and cleaning of the output.
I used this code heavily for recording lizard behavior (for example, for the paper Diaz-Uriarte, R. 1999. Anti-predator behaviour changes following an aggressive encounter in the lizard Tropidurus hispidus. Proceedings of the Royal Society of London, Series B, 266: 2457-2464). Code, documentation, etc., in the BehHP48 github repository. This software is released under the GNU GPL.
Genetic algorithms, evolutionarily stable strategies, and the loser/winner effects
(A long time ago, around the year 2000) I spent some time working on the loser/winner effects. This is a problem that requires game theory (what is a good strategy depends on what your opponents do) but I could not find simple analytical solutions. So I used genetic algorithms, which seems natural enough here since we are dealing with the evolution of behavioral strategies.
There are several libraries for genetic algorithms. I started using galib, a very nice library. However, I found it hard to use for my problem, where fitness is the result of repeated interactions between the genotypes (and not something you evaluate in a sweep over the population at each generation); this is doable with galib, but I found it awkward. Thus, to learn more C++ and to have more control, I wrote a set of classes and methods for genetic algorithms (ga.cpp, ga.h), the code for the loser/winner part (fighting.cpp), plus a few helper functions, etc.
Please note that the documentation is non-existent (you'll need to read the comments), that there are a few comments in Spanish and Spanglish, and that indentation and line width are "peculiar" (set to fit my monitor and usage of XEmacs at that time). It's been a while since I worked on these issues. Still, if you use this code, I'd appreciate it if you let me know.
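For readers unfamiliar with the idea of fitness arising from repeated interactions between genotypes, the following Python sketch shows the general shape of such a genetic algorithm. It is not a port of ga.cpp or fighting.cpp, and it does not model loser/winner effects: as a stand-in, it assumes the textbook hawk-dove game, with each genotype being a probability of playing "hawk". All names and parameter values (`V`, `C`, `contest`, `evolve`) are hypothetical.

```python
import random

V, C = 2.0, 4.0  # hypothetical resource value and cost of an escalated fight


def contest(p_a, p_b, rng):
    """Play one hawk-dove contest and return the payoffs of a and b."""
    a_hawk = rng.random() < p_a
    b_hawk = rng.random() < p_b
    if a_hawk and b_hawk:
        return (V - C) / 2, (V - C) / 2
    if a_hawk:
        return V, 0.0
    if b_hawk:
        return 0.0, V
    return V / 2, V / 2


def evolve(pop_size=100, generations=200, contests=2000, mut_sd=0.02, seed=1):
    """Evolve probabilities of playing hawk; fitness comes from contests."""
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [0.0] * pop_size
        # Fitness accumulates over random pairwise encounters, not from
        # scoring each genotype in isolation.
        for _ in range(contests):
            i, j = rng.sample(range(pop_size), 2)
            pay_i, pay_j = contest(pop[i], pop[j], rng)
            fitness[i] += pay_i
            fitness[j] += pay_j
        # Shift payoffs so that all selection weights are positive.
        low = min(fitness)
        weights = [f - low + 1e-9 for f in fitness]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        # Gaussian mutation, clipped to the valid range [0, 1].
        pop = [min(1.0, max(0.0, p + rng.gauss(0.0, mut_sd))) for p in parents]
    return pop


if __name__ == "__main__":
    final = evolve()
    print(sum(final) / len(final))  # mean probability of playing hawk
```

The point of the design is the inner loop: a genotype's fitness depends on whom it happens to meet, so it cannot be computed by a per-individual objective function alone. For this stand-in game the textbook mixed evolutionarily stable strategy is to play hawk with probability V/C, but a short stochastic run like this one is not guaranteed to sit exactly there.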
Last modified: April 2019