Zotero, Mendeley, a tablet, et al.

Note: As of July 2015 I am no longer using Mendeley, and some of this information no longer reflects my current workflow. I have moved to Zotero, and my workflow includes Zotero, bibtex for citations from Emacs using helm-bibtex and access in tablets, and also automated extraction of PDF comments and annotations. All of it is explained in https://github.com/rdiaz02/Adios_Mendeley, where I provide code and links for moving from Mendeley to Zotero while preserving annotations, comments, and folder/collection structure.

(Ramon Diaz-Uriarte, 2012-12-06. Minor updates on 2013-04-20. More updates (org-mode related) on 2014-08-11, 2014-09-09, and 2014-09-10. Another update about extracting annotations on 2015-02-02. Another on 2015-06-18: since 2015-03 I no longer use Dropbox and have switched to Syncthing.)

1 Introduction

I recently purchased a tablet for the specific purpose of reading and annotating papers and books for work. Here I explain, for posterity, what I did and why, so that others might benefit from it, and so that I can remember it later on.

The objective: to be able to search for, open, read and take notes in the tablet, and to have those notes in the PDF available in my other machines. Just that. It might be marginally neat to be able to add references I find while navigating (e.g., a URL from a PDF) but … not that crucial since I rarely have internet access when reading from the tablet (and I find browsing from the tablet painful, anyway).

(Please, beware: these are quick notes. They would benefit from editing, removing redundant and poorly written stuff, etc. I just don't have time now, and need to get back to reading all those papers on my tablet ;-) ).

Update (2013-04-20): last night I found this xcorr blog entry, by Patrick Mineault, which deals with basically the same issues. Please do take a look at it, for a different perspective.

2 Hardware

An iPad was not an option for me and neither was a Windows-based machine. I ended up purchasing an Asus Transformer Prime TF201, which is a 10" tablet. I got the model without keyboard, but with 64 GB of storage, which should be plenty for lots of references, and in fact possibly duplicated or triplicated copies of references for different programs.

Why 10"? For me, 7" is not large enough for reading papers comfortably, especially if they have figures, tables, and lots of equations. 9" would have been perfect, I think, but the available hardware did not convince me (e.g., limited memory and/or expandability, poor viewing in sunlight, etc). The new Amazon machines seem really nice, but I do not want to be tied by Amazon's crippled Android (which itself is a crippled OS).

Among the 10" tablets, this machine consistently had great reviews in terms of how visible the screen is under very sunny conditions. That is important when commuting if you take long train rides. It also has Corning Gorilla Glass, which is good. Since I want the device just to read papers I do not care about cameras, GPSs, etc, etc, and none of the rest of the specs of competing products were significantly better in a way that would offset the visibility under strong light. One thing, though, might be the wifi, for all the syncing of papers (the Amazon machines seem to have great wifi, but I do not think that true Androids, such as Samsung's, are significantly better here).

3 Why not "the usual e-reader"?

I found them tempting: easier on the eyes, longer lasting battery, probably lighter and more robust. But they fail for me on the following counts:

  • Notes taken on PDFs in the reader are not synchronizable to other machines. E.g: from

    https://bukvova.wordpress.com/2010/09/13/kindle-3-and-academic-reading/

    "The highlighted text and notes are not embedded into the PDF but stored in a separate TXT file."

    or

    "I haven’t found a good way to extract highlights so I actually stopped highlighting in the PDFs and started making notes about the PDF. "



These are another few links, recent as of 2012-11:

http://www.the-digital-reader.com/2012/02/24/new-9-7-ereader-launching-soon-in-europe/#.UK9ho3bf5w8

http://christopherteh.com/blog/2010/12/ebook-readers/

http://ask.metafilter.com/201184/Ebook-reader-with-good-PDF-support

4 Mendeley and Zotero

4.1 Different systems for reference management

The system chosen must work under GNU/Linux and with a tablet. That excludes a lot of the available options. Anyway, here are a few links with comparisons and overviews of some of the main reference management systems:

The Wikipedia comparison table.

http://www.techsupportalert.com/content/best-free-bibliographic-database-software-stub-only.htm

http://libguides.bodleian.ox.ac.uk/content.php?pid=294548&sid=2418329

http://works.bepress.com/anne_rauh/12

http://www.jbs.cam.ac.uk/infolib/2012/10/29/zotero-versus-mendeley-2/

(Note that, contrary to what some of the above links claim, both Mendeley and Zotero work just fine if you use LaTeX, since both produce BibTeX).

Among the ones that work under Linux, "I, Librarian" (http://www.bioinformatics.org/librarian) displays and annotates PDFs as images, so it will not work for me. Docear seems a very different thing with a different purpose (I am not even sure it can annotate PDFs) and I was not able to understand how it works (I find mind maps confusing, for instance). Colwiz, as of now, does not seem to be available as something an individual user can install (just try setting up an account, and you'll see what email you get back unless, I guess, your University has an account there or something).

There are several approaches for combining just plain bibtex with other methods, for instance org mode in Emacs (and I am using org mode and Emacs to write this):

http://thread.gmane.org/gmane.emacs.orgmode/90567

http://thread.gmane.org/gmane.emacs.orgmode/78983

http://thread.gmane.org/gmane.emacs.orgmode/57095

http://blog.nguyenvq.com/2011/07/24/research-paper-management-or-library-with-emacs/

http://tincman.wordpress.com/2011/01/04/research-paper-management-with-emacs-org-mode-and-reftex/

https://github.com/novoid/extract_pdf_annotations_to_orgmode

This is all very tempting. But they seem to require much more manual input than I want. In particular, I like the (semi)automated extraction of info from the PDF's metadata, as done by Mendeley and Zotero, and how things go into directories that are created automatically, to which I can then easily add stuff by hand (files with source code, compressed files stuffed with abominable Excel files with supplementary data, etc, etc).

In summary, for now I am limited to Mendeley and Zotero.

4.2 Mendeley and Zotero: summary

I've been using Mendeley for a few years now, but only because I could not get Zotero to work the way I wanted: mainly the naming of directories with random strings and the lack, at that time, of decent software, under Linux, to annotate PDFs (not anymore: see PDF annotations in GNU/Linux). In this answer to a question in the org mode mailing list, from March 2011, I describe my Mendeley configuration in some detail.

But I've always felt uncomfortable using Mendeley as it is not free software. I also dislike a few other things a lot. For instance, the directory naming of Mendeley is very inflexible; compare that to ZotFile (I said ZotFile, not Zotero itself, whose naming is to me even worse: of course you can use find and links to locate all of your PDFs, but that is often not much help). And the PDF editor/viewer of Mendeley, in its work as a viewer, is a lot worse than, say, Okular, or Evince, or MuPDF (compare, for instance, keyboard shortcut availability, or the option of opening an arbitrary number of PDFs at the same time).

So I had thought about moving to Zotero, as I had to implement changes to use the tablet and annotating PDFs in Linux is now doable (see PDF annotations in GNU/Linux). However, I will not do it right now, though it will likely happen soon. For me the main problem with Zotero right now is that Zandy, the "Zotero for android", will not allow me to directly access PDFs I've stored in the tablet (I do not use WebDAV nor do I plan on storing my PDFs in Zotero's servers). That is a show stopper for me. In contrast, Referey does work just fine getting my PDFs from the tablet. There are other pros and cons of each, covered below in detail. The following table lists some of my major issues:

Zotero (and Zandy)                  Mendeley (and Referey)
No way to open a PDF in tablet      Piece of cake to open PDFs in tablet
(unless WebDAV or Zotero storage)
Can not search notes                Searches notes
Can add notes                       Can not add notes
Easier org mode integration (?)
                                    Much easier file handling and
                                    syncing to tablet

5 Mendeley and tablets

5.1 Mendeley apps for Android

There are at least three Android apps to access and display Mendeley records. As is well known from the Mendeley forum (google for Mendeley Android), Mendeley apparently has no intention of developing an "official" Mendeley app for Android (which of course annoys many people, etc, etc).

  • Droideley: apparently not actively developed now. It seems PDFs can now (v. 0.45) only be downloaded. A show stopper for me.
  • Scholarley. It does not allow changes in the metadata entries to be synced back, but it might in the future (and it seems the future is now, as of April 2013). PDFs can only be accessed if downloaded from Mendeley's server (if you store them there, of course). The latter is also a show stopper for me, even if Scholarley is a very nice app with a very responsive and active developer. Changes in the PDF are, so far, not uploaded.
  • Referey. It does allow you to access PDFs stored in the tablet, and have those PDFs synced or transferred any way you want. This is what I use and for me it works just perfectly.

    Note that Referey's searches will include notes, and the annotations to the PDFs if those are in Mendeley's db (but, of course, it will not search in the annotations inside the PDFs themselves). For this, a hack can be used with some of the greps for Android, or using Emacs for Android, etc, if you also store the annotations inside the PDFs in a separate file, as explained in Extracting comments from PDFs.

    What is the only thing I do not like about Referey? That I cannot make changes to the entry (not the PDF) notes or comments, and have them synced back. (This is something that can be done with Zotero/Zandy).

5.2 Exporting all the annotations in PDFs from Mendeley

Mendeley does not store the annotations to a PDF inside the PDF itself. If you want to see them with another program (say, Okular, or one in your tablet) you must have exported those annotations before. Unfortunately, it is not possible to have Mendeley automatically do that for you (I could not do it, then I asked: http://support.mendeley.com/customer/portal/questions/638600-export-all-pdfs-that-have-annotations-in-some-folder and was told it was not possible).

Alternative: you do it manually, after locating which among your potentially thousands of PDFs are annotated. I got the sqlite database, opened it with sqlitebrowser, and exported two tables, FileNotes and Documents. I then ran a simple R program that lists the PDFs that have annotations.

## filenotes.txt is table FileNotes
## docu.txt is table Documents

fn <- read.csv(file = "filenotes.txt")
docu <- read.csv(file = "docu.txt")
## find out which PDFs I'd need to export with annotations
uniqueide <- unique(fn$documentId)
## then get title, etc. from docu$id
sort(docu$title[which(docu$id %in% uniqueide)])
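
The same list could probably be obtained without the export step, by querying the sqlite file directly. What follows is only a minimal sketch in Python (using its standard sqlite3 module). It assumes the table and column names are exactly the ones used in the R code above (FileNotes.documentId, Documents.id, Documents.title); the function name and the database path are hypothetical.

```python
import sqlite3


def annotated_titles(db_path):
    """Return the sorted titles of documents that have at least one PDF note.

    Assumes tables named FileNotes and Documents, as in the R code above.
    """
    query = """
        SELECT title
        FROM Documents
        WHERE id IN (SELECT documentId FROM FileNotes)
        ORDER BY title
    """
    con = sqlite3.connect(db_path)
    try:
        return [row[0] for row in con.execute(query)]
    finally:
        con.close()
```

Something like `annotated_titles("/path/to/your/mendeley.sqlite")` would then return the titles to go through. It is safer to run this on a copy of the database file, not on the one Mendeley is using.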

"All" that remains to be done is to open each one of those from Mendeley, export the PDF with annotations, and save it (most likely in the same directory where you have the original PDF.) You did not know what to do with that lazy afternoon, right? Now here is something to keep you busy.

5.3 Annotations and notes, from now on

So this is my new modus operandi:

  • PDF annotations in the PDF itself. In the computers, use either Okular (being careful to save things as PDF) or something like PDF-XChange Viewer (via wine) (see here). In the tablet, use EzPDF or alternatives (see below).
  • Reference notes are stored in Mendeley, as usual. They are easily searched from within Mendeley and from Referey, and are exported in the bibtex file that Mendeley generates (so getting them into Zotero is immediate).

5.4 Moving from Mendeley to Zotero

I asked precisely this question in the Zotero list. The main issue is getting the folder/Collection structure.

http://forums.zotero.org/discussion/26453/moving-from-mendeley-to-zotero/

One approach is getting the folder info from the sqlite file in Mendeley, adding that to the bibtex keywords entry with a recognizable tag (say "Col"), and using that in Zotero. All the info about the folders in the Mendeley records is easily accessible from the sqlite file (see comment 7 in the above thread). And if one is sqlite-illiterate (like myself) one can easily export that to a text file using, say, the sqlitebrowser program. So it is doable, but takes some time.

Why is that the only remaining serious issue? Because the "notes" in a Mendeley entry — I mean the notes of the entry, not the PDF annotations in individual PDF files— are exported in the bibtex file from Mendeley, so they will be part of the Zotero record too when you import. And if you use annotations within PDFs, those will of course be part of the PDF (and can be part of the Zotero record via ZotFile's annotation extraction). So these two are not real issues.
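
The keyword-tagging idea can be sketched in a few lines of Python. This is only an illustration of the string manipulation involved: the "Col:" prefix format, the function name, and the assumption that the bibtex keywords field is a comma-separated string are all made up for the example. Reading the folder names out of the sqlite file and rewriting the bibtex file are left out.

```python
def add_collection_tags(keywords, folders, tag="Col"):
    """Append one tagged keyword per folder to a comma-separated keyword string.

    keywords: existing content of the bibtex keywords field (may be empty).
    folders:  names of the Mendeley folders the entry belongs to.
    tag:      recognizable prefix; "Col" is just the example used in the text.
    """
    existing = [k.strip() for k in keywords.split(",") if k.strip()]
    tagged = [f"{tag}:{name}" for name in folders]
    # Skip tags that are already present, so running this twice is harmless.
    merged = existing + [t for t in tagged if t not in existing]
    return ", ".join(merged)
```

For an entry with keywords "genomics, cancer" that lives in the folders "Reviews" and "To read", this would give "genomics, cancer, Col:Reviews, Col:To read", and the tagged keywords can then be turned into collections on the Zotero side.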

6 Zotero, ZotFile, Zandy, tablets

Update (2013-04-20). I have not had a chance to give it a try, and so far I am reasonably happy with my Referey + Mendeley + Dropsync setup, but Patrick Mineault is developing a "Zotero Reader for tablet", which might make all this a lot easier. See his blog post.

These are some (almost random) notes about modes of operation. I found some things confusing (might be just me). These notes, of course, are likely to become obsolete.

(You might wonder: why did I try so many things with Zandy and ZotFile if I was going to use Referey? Great question! Actually, it took a while for me to realize Zandy would not work with PDFs on the tablet, so instead of really thinking things through, I was busy trying to figure out how to get ZotFile to work to my liking. Oh well, I think this is called tunnel vision or similar ;-). And of course, I really wanted to use Zotero.)

6.1 Zandy (and Zandy and ZotFile)

  • What if I use Zandy AND I also keep all of my PDFs on the tablet using Dropbox?

    For this to work nicely we would need two things at least:

    • The screen in the tablet should be able to show at least two different apps at once (something like splitting, as any reasonable tiling window manager would do). But that does not seem possible with Android for now.
    • The possibility of having links so that one can create virtual (or smart) folders. But creating links is not possible with Android (unless you root the device, and even then I am not even sure it would always work).

    If you have both, then you can always find your way around your refs, or you can search for a ref. via Zandy, and then open it in your (virtual) directory that you created with links. Since Android looks like a crippled and castrated Unix … we are out of luck. Regardless, this is still a lot less nice than just using Referey: click on the reference, and there you have the PDF.

  • Zandy allows one to add new notes (again, to the entry, not the PDF of course), and they will be synced back. This is nice.
  • Zandy and notes: editing pre-existing notes does not seem to sync them back correctly; they seem to get lost.
  • Zandy does not search inside notes (I do not mean the notes in the PDF, but the notes of the entry). Referey does.

6.2 ZotFile modes of operation

6.2.1 Two possible modes of operation using ZotFile

Given the above, I thought about this. I only tried them in small toy examples. They are here so that I know what to try in the future. None are optimal, though:

  • Use ZotFile by saving the files as linked from Zotero
    • (I can avoid renaming of all files —say so in advanced prefs., file types, and do not select "Automatically rename new attach").
    • CON: Files left behind when deleted. Known issue http://forums.zotero.org/discussion/20148/deleting-attachment-notes/
    • CON: No way to later make them synced files.
    • How to do it incrementally? Just whenever a file is added, "Rename attachment".
    • What if there are two or more files with the same name?
    • Syncing issues from multiple computers? Seems OK
  • Use ZotFile by using the file stored in Zotero
    • Need to explicitly do a get from tablet
    • Whenever I add a ref, export to tablet (same as below)
    • Whenever I get from tablet, re-export. Note that ZotFile can create the virtual folders "Tablet files" and "Tablet files (modified)", which do help.
    • What if there are two or more files with the same name?
    • Does it work if syncing several computers? Seems problematic

    But: the above are moot points if you plan on using Zandy. Zandy and ZotFile just do not cooperate (at least for now; look in the Zotero forums and Zandy tickets and you'll see comments about this).

7 Editing and viewing PDFs on the tablet

I tried EzPDF, Mantano, iAnnotate, Qiqqa, qPDF, RepliGO, and the Adobe reader that came preinstalled. The last one is not that good for flexible and fast note-taking; so forget it (though the newest versions, as of 2013-04, do offer some improvements).

RepliGO, Qiqqa, and iAnnotate are very slow with large files, or papers with figures and equations. Mantano is very fast, and with an intuitive interface, but annotations are not saved within the PDF (see also http://www.mobileread.com/forums/member.php?u=115636).

qPDF is not as fast as Mantano. It is about as fast as EzPDF, but I do not like the interface, and there is no indication of bookmarks. I cannot test note taking, as I'd have to pay, and so far nothing suggests this will be better than EzPDF, which has much better reviews. Foxit has terrible reviews in Google Play.

Updates (2013-04-20).

RepliGO and ezPDF are interesting cases. Patrick Mineault, in this blog entry, has exactly the opposite experience: for him, RepliGO is much faster than ezPDF. I checked again in 2013-04-20, and in my tablet ezPDF (v. 2.1.2.1) is a lot faster than RepliGO (v. 4.2.4) both on papers with equations and figures, and in books with many (say > 500) pages. In fact, there are books that I would not bother to open with RepliGO. However, in terms of general GUI feeling and the usage of annotations, I like RepliGO better.

If you look around, some people seem to report that RepliGO is a lot faster than ezPDF and some people seem to report the opposite. It might also turn out that this depends on the hardware; see, for instance, these comments on the Transformer Prime Forum (and my tablet is a TF201).

And a few further comments about usability. When reading papers with two or more columns of text, both ezPDF and RepliGO allow you, with a single tap, to zoom in so that a single column fits all of the screen (like trimming off all white space and extra columns). This tends to work reliably well in papers and books, and I use this feature a lot (I tend to read in portrait mode, and have a single column of text fill up all of the width of the screen). However, qPDF (v3.0), iAnnotate (v. 1.1.4), and Acrobat do not get this right. You can zoom, yes, but you need to then fiddle around and adjust the zoom manually. Moreover, at least qPDF will get it wrong, so that when you turn pages, some text of the column is trimmed off and you need to re-adjust; this forces you to constantly change the zooming and centering of the page. The lack of this feature is a show stopper for me too, since being able to immediately zoom and trim to a single column is something I found essential when reading (and annotating and underlining with precision) some two- or three-column papers in the tablet.

Summary: for now, I'll stick to EzPDF, occasionally trying RepliGO.

8 Synchronizing files to the tablet

I am using Referey, and I want to synchronize two different things:

  • the directory tree with all the PDFs
  • the database (only a single file in Mendeley)

I use Dropbox (why? Wuala works terribly from the University, Dropbox now gives me, as a member of UAM, 10 GB, and SpiderOak only gives 5 GB). It is important for me to have all of the PDFs really downloaded to the tablet (which is not the default mode in the tablet Dropbox).

Moreover, when syncing, you probably want to exclude some stuff. In particular, from the Mendeley directory where the database is stored, you only want the single file with the sqlite stuff. Not the rest (in particular, you want to exclude several huge directories). Thus, when syncing, you probably want to have something that allows for filtering.

Finally, at least with Referey, you do not want to sync back to the computer the (possibly) modified database file (Referey documentation says so). Therefore, you want something that will allow:

  • Two way syncing for the PDFs (so comments and annotations in the PDFs themselves get back to your computer)
  • Mirroring (one-way download, or whatever you call it) for DB files (with Referey)

Both Dropsync and FolderSync, available in Google Play, will do all of these (filter and true download, and allow for mirroring some dirs but two-way syncing for others). However, at least in my case, I've found Dropsync to be a lot faster (FolderSync can take up to 20 minutes to just verify that my 2000+ PDFs, over 4.2 GB in total, have not changed at all; Dropsync will do that in about 4 to 5 minutes).

Dropsync or FolderSync will also be handy for other stuff (as they allow you to easily decide what you sync to what and how and how often).
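
The requirements listed above (filtering, two-way syncing for the PDF tree, one-way mirroring for the database file) amount to a small decision rule per file. The Python sketch below only writes that rule down; it is not how Dropsync or FolderSync are configured (that is done in their own settings screens). The function name and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def sync_mode(path, pdf_root, db_file):
    """Return 'mirror', 'two-way', or 'skip' for a single file path."""
    p = PurePosixPath(path)
    if p == PurePosixPath(db_file):
        # The database only travels computer -> tablet (Referey must not
        # send a modified copy back).
        return "mirror"
    if PurePosixPath(pdf_root) in p.parents:
        # Anything in the PDF tree goes both ways, so annotations made
        # on the tablet get back to the computer.
        return "two-way"
    # Everything else (e.g. the other, huge, Mendeley directories) is excluded.
    return "skip"
```

For example, with pdf_root "/home/me/Mendeley-pdfs" and db_file "/home/me/mendeley-db/db.sqlite", a file under "/home/me/Mendeley-pdfs/" is synced two-way, the sqlite file is mirrored, and any other file in "/home/me/mendeley-db/" is skipped.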

A note on the filenames: very long file names that work without problem in my workstation and laptop gave me problems when syncing. The same for file names that ended with a period.
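
To find the offending names before syncing, something like the following Python sketch might help. The length limit is an assumption: I do not know the exact length at which problems start, so max_len is just a placeholder to adjust; the function name is hypothetical too.

```python
import os


def problematic_names(root, max_len=100):
    """List paths under root whose last component is very long or ends with a period.

    max_len is an arbitrary placeholder, not a documented limit.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Check directory names as well as file names.
        for name in dirnames + filenames:
            if len(name) > max_len or name.endswith("."):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Calling `problematic_names("/home/me/Mendeley-pdfs")` (path made up) returns the files and directories to rename.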

And a final note: Dropbox is case insensitive (https://www.dropbox.com/developers/reference/bestpractice). To me, this really, really, sucks. Get ready to run into directories and files with the "Case conflict" added.
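
Candidates for a "Case conflict" can be spotted beforehand by grouping paths that differ only in case. The Python sketch below does just that on a list of path strings. It uses Python's casefold() as a stand-in for whatever case folding Dropbox actually applies, which is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict


def case_conflicts(paths):
    """Return groups of paths that are identical except for letter case."""
    groups = defaultdict(set)
    for p in paths:
        # Paths that fold to the same string would collide on a
        # case-insensitive store.
        groups[p.casefold()].add(p)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

The list of paths can come from the output of find, for instance; include the directories as well as the files, since two directories that differ only in case collide too.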

Update from 2015-06. I am no longer using Dropbox for this (or for anything else, for that matter). For a variety of reasons (privacy, speed, ease of use, being open source, etc) I switched to Syncthing. Yes, I looked into the usual suspects: BTsync, which I quickly learned to deeply dislike; git-annex assistant, which I found cumbersome to set up and use, especially with mobile devices; Seafile, which I am using now for other things, but which did not quite work for this; and a bunch of others. I will eventually write about this, but Syncthing really rocks.

9 Extracting comments from PDFs

I want to be able to search for comments in the PDFs. I used to rely on recoll, a really nice and fantastic tool, to do full text search anywhere on my home directory and my machines at large. However, recoll depends on pdftotext (from poppler) and … that will not always extract comments reliably. A lot of the things you add from Okular (and I mean in the PDF itself) or with EzPDF or other tablet applications, will not be detected with pdftotext. It turns out that pdf.js, which is used by ZotFile, is a very reliable extractor, but I have no idea how to run it from the command line. However, instead of spending hours figuring out how to run a JS program as if it were a simple unix command …

I searched around, and Leela will extract comments. You can get it here: https://github.com/TrilbyWhite/Leela. The thread of how Leela got to exist is an interesting one (https://bbs.archlinux.org/viewtopic.php?id=142309).

So what I do now is to run an R program to create an org mode file (we've referred to org mode several times; it is time we gave a link: http://orgmode.org/) that contains, under the link to each PDF, all of the extracted annotations. I run this nightly, as a cron job. Please note that this R code is an ugly kludge, but it does its job. It is a simple R program (that spends most of its time calling leela or running greps ;-). The output is a file that:

  • Contains text that will be found by recoll
  • I can search from Emacs for a comment and instantly get access to the PDF via the link.

This is the code. You'll have to change the path(s) where your PDFs and your Leela executable live.

#################################
## way toooo slow
## list.of.pdfs <- list.files(path = "/home/ramon/Mendeley-pdfs",
##                            pattern = "*.pdf", recursive = TRUE,
##                            full.names = TRUE, all.files = TRUE)

setwd("~/tmp")

list.of.pdfs <- system("find /home/ramon/Mendeley-pdfs -name '*.pdf'",
                       intern = TRUE)

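## For each PDF: a blank line, an org heading that links to the file,
## and whatever annotations leela extracts from it.
anot0 <- sapply(list.of.pdfs, function(x) {
  a0 <- ""
  a <- paste("* [[", x, "]]", sep = "")
  ## shQuote protects paths with spaces, quotes, or $ from the shell
  b <- system(paste("~/Sources/Leela-master/leela annot", shQuote(x)),
              intern = TRUE)
  return(c(a0, a, b))
})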
write(file = "anot0.txt", unlist(anot0))

## remove more stuff, as I find it, or become annoyed by it
system("egrep -v '^<[0-9]+,[0-9]+:link>$' anot0.txt | egrep -v '^<[0-9]+,[0-9]+:highlight>$' | egrep -v '^<[0-9]+,[0-9]+:widget>Citation Link$' | egrep -v '^<[0-9]+,[0-9]+:underline>$' > ~/Mendeley-pdfs/annotations-in-PDFs-of-refs.org")

## It would be nice to remove those PDFs without any annotations. Some other time.
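
The pruning left for "some other time" could look roughly like the following Python sketch. It assumes that every entry starts with a heading line beginning with "* [[" (as produced by the paste call above) and that any non-blank line after a heading is an annotation. The function name is hypothetical, and reading and writing the org file is left to the caller.

```python
def drop_empty_entries(lines):
    """Keep only the org entries that have at least one annotation line."""
    entries = []  # each item is [heading, list_of_annotation_lines]
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("* [["):
            entries.append([line, []])
        elif entries and line.strip():
            # Non-blank line after a heading: counts as an annotation.
            entries[-1][1].append(line)
    out = []
    for heading, body in entries:
        if body:
            # Same layout as the original file: blank line, heading, annotations.
            out.extend(["", heading, *body])
    return out
```

It would be run on the lines of the filtered file (after the egrep step), so that headings whose only content was links or bare highlights are dropped as well.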

(Added 2014-08-11). In this entry https://github.com/novoid/extract_pdf_annotations_to_orgmode, Karl Voit shows how to extract RepliGO PDF annotations and nicely place them in org-mode files. Compared to the approach above, the nice thing is that he does not require extra software (like Leela). I am not switching to Karl's method, though, as I want something that reliably works with different PDF annotation software (Okular on my computers, EzPDF on my tablet), and that does the extraction automagically for me for my complete collection of PDFs. If you use RepliGO, however, this might be a much better way to do things than my scheme above.

(Added 2015-02-02). In this entry Trilby mentions a simple C program to extract annotations. I tried it, and it works just fine (I only needed to modify a couple of lines of the Makefile for it to work on my Debian system). I have not tried to see if it extracts highlights too, since Leela covers that. But Trilby's program gives more context (or context that is easier for humans to understand), such as the page number.

10 PDF annotations in GNU/Linux

Until recently annotating PDFs comfortably, with the annotations in the PDF itself, was hard. But now Okular does that; see

https://bugs.kde.org/show_bug.cgi?id=151614

http://docs.kde.org/stable/en/kdegraphics/okular/annotations.html.

(At this time, the newest version of Okular might not be available for your Linux distro; you'll have to download and compile it yourself).

And I've also found that running the free PDF-XChange Viewer from wine is now a reasonably painless experience. PDF-XChange offers many more (too many for my taste) options to annotate than Okular. However, PDF-XChange is not free software. Also, PDF-XChange is not as fast to open, and lots of screen real estate is consumed by stuff that is useless (for me), so viewing three PDFs at a time is hard. But you get a very nice view of all of your annotations (no need to click on each).

11 Syncing or connecting the tablet to Linux via USB

This can be a pain if you are using Android with version >= 4 as it uses MTP. Googling around will let you find a bunch of solutions. For instance:

http://www.youtube.com/watch?v=3ehnoJn6CEk

http://forum.xda-developers.com/showthread.php?t=1143044&page=5

http://forum.xda-developers.com/showpost.php?p=15232022&postcount=18

http://askubuntu.com/a/214083

But none of that really worked for me. The one that works for me is this one: https://github.com/hanwen/go-mtpfs

I tried using the native gvfs (http://intr.overt.org/blog/?p=153) but could not get the thing to compile and gave up (I had go-mtpfs working already).

Apparently, using go-mtpfs can be automated (http://bernaerts.dyndns.org/linux/247-ubuntu-automount-nexus7-mtp ) but I do not particularly care because …

… either with go-mtpfs or any of the other approaches I get problems from time to time which I find harder to understand and solve than plainly using Dropsync and looking at the errors. Of course, the second is slower (except maybe if you factor in the time to find the USB cable, mount the device, deal with errors, etc.)