Software

Unless otherwise stated, all the code here is released under the GNU GPL. Please note that THERE IS NO WARRANTY FOR ANY OF THE PROGRAMS. See section 11 of the GPL for further details.

Most of this code is now available from public repos at my github page.

OncoSimulR

A Bioconductor package for forward population genetic simulation in asexual populations, with special focus on cancer progression. Fitness can be an arbitrary function of genetic interactions between multiple genes or modules of genes, including epistasis, order restrictions in mutation accumulation (as specified by, say, Oncogenetic Trees or Conjunctive Bayesian Networks), and order effects. Also included are functions for plotting and sampling from single or multiple realizations of the simulations, including whole-tumor and single-cell sampling, as well as functions for plotting the true phylogenetic relationships of the clones. The simulation code (in previous incarnations) has been used, for instance, in the paper "Identifying restrictions in the order of accumulation of mutations during tumor progression: effects of passengers, evolutionary models, and sampling", BMC Bioinformatics, 2015, and in "Cancer Progression Models And Fitness Landscapes: A Many-To-Many Relationship", Bioinformatics, in press. It is described in "OncoSimulR: genetic simulation with arbitrary epistasis and mutator genes in asexual populations", Bioinformatics, 33. Development takes place in its github repo, and the package is also listed in the Genetic Simulation Resources catalogue.

ADaCGH2

A Bioconductor package for the analysis of big data from aCGH experiments using parallel computing and ff objects. A Bioinformatics paper describing it is available here (pubmed link).

varSelRF

An R package for variable selection using random forests, targeted towards gene expression data. All of the code is also available from varSelRF's github repo. Here is the supplementary material for the paper.

RJaCGH

RJaCGH is an R package for the analysis of array CGH data using Hidden Markov Models. We incorporate distance between genes (using a non-homogeneous HMM) and do not fix in advance the number of states, but rather use a full Bayesian approach using Reversible Jump MCMC. This is a package developed by Oscar Rueda and me. The package is available from CRAN. The paper is available from PLoS Comp Biol. (An older description is available from this COBRA preprint.) This package is part of Oscar Rueda's PhD thesis ("Statistical methods for the analysis of copy number alterations in the genome").

Pomelo II

Pomelo II is a major rewrite of our popular Pomelo for finding differentially expressed genes. Pomelo II uses MPI (my original C++ code has been parallelized by Edward Morrissey) and provides clickable tables and heatmaps (using our IDClight tool) in a much nicer and more configurable interface, written mainly by Edward Morrissey in Python and JavaScript using AJAX.

SignS

SignS is a web tool for gene selection and finding molecular signatures when we have patient survival data. We implement two very different methods, and provide additional gene information in clickable tables and dendrograms by calling our IDClight application. SignS is a web interface made with Python that uses R underneath. To greatly speed up the computations, we use MPI (which takes advantage of the 66 CPUs available on our servers).

GeneSrF

GeneSrF is a web tool for gene selection in classification problems that uses random forest. Two approaches for gene selection are used: one is targeted towards identifying small, non-redundant sets of genes that have good predictive performance. The second is a more heuristic graphical approach that can be used to identify large sets of genes (including redundant genes) related to the outcome of interest. This is a web interface (using Python) of my varSelRF package.

Tnasas: a predictor-building tool

Tnasas, which stands for "this is not a substitute for a statistician", is a tool for building predictors from microarray data. It is useful as a benchmark (it offers several well-tested methods) and as a pedagogical tool (against overoptimism when building predictors and ignoring several selection biases). Developed with Juanma Vaquerizas (jvaquerizas AT cnio DOT es) at CNIO, using R. The code (R with a tiny bit of C++) will soon be available under the GNU GPL.

ADaCGH2, the web tool

ADaCGH2 is a web tool for the analysis of array CGH to detect gains and losses in genomic DNA. We implement several very different approaches to display additional gene information. This is a web interface made with Python that uses R underneath and uses parallelization to speed up the computations, calling our ADaCGH2 Bioconductor package.

PHYLOGR

PHYLOGR is an R package for the manipulation and analysis of phylogenetically simulated data sets and phylogenetically based analyses using GLS. You can get it from CRAN, where you can download both the source and Windows binaries. Here is the github repo.

ape is another R package for phylogenetics and evolution, but there is little overlap between ape and PHYLOGR.

Bibliography management and Zotero-related stuff

The github repositories Adios_Mendeley and Zotero-to-Referey contain code I used to move from Mendeley to Zotero (Adios Mendeley) and code that I use now daily to sync my Zotero db and PDFs with my tablet (Zotero-to-Referey).




The rest of the applications listed below are either no longer working or no longer maintained. They are left here for historical reasons or as a source of code.


Asterias project

Asterias is (was) a set of web-based applications for the analysis of genomic and proteomic data. Currently, Asterias combines Python with R and C/C++, using MPI for parallelization, and aspires to become a standard for high-performance, distributed, web-based bioinformatics and biostatistics applications.

You can visit the former development pages at either the Bioinformatics org site or The Launchpad site.

geSignatures

geSignatures is an R package for finding molecular signatures from gene expression data, as described in the technical report "Molecular signatures from gene expression data". The code is available from the geSignatures github repo as both tar.gz and zip files.

Pomelo

Pomelo was a web-based tool for finding differentially expressed genes. It implemented statistical tests for two-group (via t-tests) and multigroup (via ANOVA) comparisons, regression analysis, survival data (gene-wise Cox model) and contingency tables (using Fisher's exact test). We allowed control of the Family Wise Error Rate (using the maxT approach) and the False Discovery Rate. Pomelo has been superseded by Pomelo II.

FatiGO

FatiGO was one of the earliest tools to examine whether groups of genes are enriched in certain Gene Ontology terms. We used Fisher's exact test for contingency tables, with adjustments for multiple testing.
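As a rough illustration of this kind of analysis (this is not FatiGO's code), the following Python sketch computes an enrichment p-value for one term from the 2x2 table via the hypergeometric distribution, and then adjusts a list of p-values. Two choices here are assumptions made for the example and are not stated above: the test is one-sided (over-representation only), and the multiple testing adjustment is Benjamini-Hochberg. The function names are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for over-representation.

    k: genes in the selected group annotated with the term
    n: size of the selected group
    K: genes in the whole reference set annotated with the term
    N: size of the whole reference set
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    (One-sided testing is an assumption of this sketch.)
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    (The choice of this particular adjustment is an assumption.)
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: all 3 selected genes carry a term that 4 of 10 genes have.
p = fisher_enrichment_p(3, 3, 4, 10)  # 4/120, about 0.033
```

In practice one p-value would be computed per Gene Ontology term and the whole list passed to the adjustment function.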

DNMAD

DNMAD was a tool for the diagnosis and normalization of microarray data. It was a web server for cDNA microarray normalization and diagnosis, developed together with Juanma Vaquerizas (jvaquerizas AT cnio DOT es).

BehHP48

This is a set of programs in RPL (Reverse Polish Lisp) to use the HP 48 calculator as a handheld computer to record behavioral data, and to help in the execution of a behavioral experiment. Included are some utility functions in C++ for the processing and cleaning of the output.

I used this code heavily for recording lizard behavior. You can see more details (all the code, documentation, etc.) in the BehHP48 github repository. This software is released under the GNU GPL.

Genetic algorithms, evolutionarily stable strategies, and the loser/winner effects

(A long time ago, around the year 2000) I spent some time working on the loser/winner effects. This is a problem that requires game theory (what is a good strategy depends on what your opponents do) but I could not find simple analytical solutions. So I used genetic algorithms, which seems natural enough here since we are dealing with the evolution of behavioral strategies.

There are several libraries for genetic algorithms. I started using galib, a very nice library. However, I found it hard to use for my problem, where fitness is the result of repeated interactions between the genotypes (and not something you evaluate in a sweep over the population at each generation); this is doable with galib, but I found it hard and awkward. Thus, to learn more C++ and to have more control, I wrote a set of classes and methods for genetic algorithms (ga.cpp, ga.h), as well as the code for the loser/winner part (fighting.cpp), plus a few helper functions, etc.
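To make the idea of interaction-based fitness concrete, here is a minimal Python sketch. It is not the code in ga.cpp or fighting.cpp, and it does not model the loser/winner effects: a generic hawk-dove game stands in for the contest, and every name, parameter and default value below is an assumption chosen for illustration. Each genotype is a single number (the probability of escalating), and its fitness in a generation is the payoff accumulated over several rounds of random pairwise encounters.

```python
import random


def play(a, b, value=2.0, cost=3.0):
    """Expected payoff to a player escalating with probability a
    against an opponent escalating with probability b (hawk-dove game).

    Both escalate: (value - cost) / 2; only a escalates: value;
    only b escalates: 0; neither escalates: value / 2.
    """
    return (a * b * (value - cost) / 2
            + a * (1 - b) * value
            + (1 - a) * (1 - b) * value / 2)


def evolve(pop_size=50, generations=200, rounds=20, mut_sd=0.05, seed=1):
    """Evolve escalation probabilities; fitness comes from pairwise contests."""
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [0.0] * pop_size
        for _ in range(rounds):
            # Pair individuals at random; both members of a pair get a payoff.
            order = list(range(pop_size))
            rng.shuffle(order)
            for i, j in zip(order[::2], order[1::2]):
                fitness[i] += play(pop[i], pop[j])
                fitness[j] += play(pop[j], pop[i])
        # Payoffs can be negative, so shift them to get positive weights.
        low = min(fitness)
        weights = [f - low + 1e-9 for f in fitness]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        # Gaussian mutation, clipped to the valid range [0, 1].
        pop = [min(1.0, max(0.0, p + rng.gauss(0.0, mut_sd))) for p in parents]
    return pop


final_pop = evolve(pop_size=20, generations=10, rounds=5)
```

The point of the sketch is only the structure of the loop: fitness cannot be computed genotype by genotype, since it depends on who each individual happens to meet, so the payoffs are accumulated over the encounters before selection takes place.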

Please note that the documentation is non-existent (you'll need to read the comments), that there are a few comments in Spanish and Spanglish, and that indentation and line width are "peculiar" (set to fit my monitor and usage of XEmacs). It's been a while since I worked on these issues. But I'd appreciate it if, should you use this code, you let me know.

To run it you will need to install Blitz++ and libRmath, the stand-alone math library from R. If you use Debian GNU/Linux, this is as easy as:

apt-get install blitz
apt-get install r-mathlib

I've also run it with other GNU/Linux distributions.

The code is available from gaLW's github repo. This software is released under the GNU GPL.


Last modified: April 2019