LaTeX/Collaborative Writing of LaTeX Documents
Note: This Wikibook is based on the article Tools for Collaborative Writing of Scientific LaTeX Documents by Arne Henningsen that is published in The PracTeX Journal 2007, number 3 (http://www.tug.org/pracjourn/).
Collaborative writing of documents requires close coordination among authors. This Wikibook describes one possible way to organise the collaborative preparation of LaTeX documents. The solution presented here is based primarily on the version control system Subversion (http://subversion.apache.org/), used together with several other software tools and LaTeX packages.
- You can use one of the online solutions listed in the Installation chapter. Most of them have collaboration features.
- Another option for collaboration is Dropbox. It offers 2 GB of free storage and keeps earlier versions of files. It works somewhat like SVN but is more automated, which makes it especially useful for beginning LaTeX users. However, Dropbox is not a true version control system, so it does not give you the same control over rolling the article back to previous versions.
- You can use an online collaborative tool built on top of a version control system, such as Authorea or ShareLaTeX. Authorea performs most of the actions described in this document, but in the background (it is built on Git). It allows authors to enter LaTeX or Markdown via a GUI with mathematical notation, figures, d3.js plots, IPython notebooks, data, and tables. All content is rendered to HTML5. Authorea also features a commenting system and article-based chat to ease collaboration and review.
- As the LaTeX system uses plain text, you can use synchronous collaborative editors like Gobby. In Gobby you can write your documents in collaboration with anyone in real time. It is strongly recommended that you use UTF-8 encoding (especially if users on several operating systems are collaborating) and a stable network (typically a wired one).
- TitanPad (or other clones of EtherPad). To compile use the command:
wget -O filename.tex "http://titanpad.com/ep/pad/export/xxxx/latest?format=txt" && (latex filename.tex)
where 'xxxx' should be replaced by the pad number (something like 'z7rSrfrYcH').
- With a dedicated Linux box running LaTeX and Dropbox, it is possible to use Google Docs and some scripting to generate PDFs automatically on Dropbox whenever a document on Google Docs is updated.
- You can use a distributed version control system such as Fossil, Mercurial or Git. This is the definitive solution for users looking for control and advanced features like branch and merge. The learning curve will be steeper than that for a web-based solution.
The collaborative preparation of documents requires a considerable amount of coordination among the authors. This coordination can be organised in many different ways, where the best way depends on the specific circumstances.
In this Wikibook, I describe how the collaborative writing of LaTeX documents is organised at our department (Division of Agricultural Policy, Department of Agricultural Economics, University of Kiel, Germany). I present our software tools, and describe how we use them. Thus, this Wikibook provides some ideas and hints that will be useful for other LaTeX users who prepare documents together with their co-authors.
There are many ways to interchange documents among authors. One possibility is to compose documents by exchanging e-mail messages. This method has the advantage that authors generally do not have to install and learn any extra software, because virtually all authors have an e-mail account. Furthermore, the author who has modified the document can easily attach it and explain the changes in the same e-mail. Unfortunately, when two or more authors work on the same document at the same time, their modifications end up in separate copies that have to be merged by hand. So, how can authors synchronise these files?
A second possibility is to provide the document on a common file server, which is available in most departments. The risk of overwriting each other's modifications can be eliminated by locking files that are currently being edited. However, the file server can generally only be accessed from within the department. Hence, authors who are out of the building cannot use this method to update their files or pass on their changes, and they have to find another way. So, how can authors access these files?
A third possibility is to use a version control system. A comprehensive list of version control systems can be found at Wikipedia. Version control systems keep track of all changes in files in a project. If many authors modify a document at the same time, the version control system tries to merge all modifications automatically. However, if multiple authors have modified the same line, the modifications cannot be merged automatically, and the user has to resolve the conflict by deciding manually which of the changes should be kept. Authors can also comment their modifications so that the co-authors can easily understand the workflow of this file. As version control systems generally communicate over the internet (e.g. through TCP/IP connections), they can be used from different computers with internet connections. A restrictive firewall policy might prevent the version control system from connecting to the internet. In this case, the network administrator has to be asked to open the appropriate port. The internet is only used for synchronising the files. Hence, a permanent internet connection is not required. The only drawback of a version control system could be that it has to be installed and configured.
Moreover, a version control system is useful even if a single user is working on a project. First, the user can track (and possibly revert) all previous modifications. Second, it is a convenient way to keep a backup of the files on other computers (e.g. on the version control server). Third, it allows the user to switch easily between different computers (e.g. office, laptop, home).
The Version Control System Subversion
Subversion (SVN) comes as a successor to the popular version control system CVS. SVN operates on a client-server model in which a central server hosts a project repository that users copy and modify locally. A repository functions similarly to a library in that it permits users to check out the current project, make changes, and then check it back in. The server records all changes a user checks in (usually with a message summarizing what changes the user made) so that other users can easily apply those changes to their own local files.
Each user has a local working copy of a remote repository. For instance, users can update changes from the repository to their working copy, commit changes from their own working copy to the repository, or (re)view the differences between working copy and repository.
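The cycle described above can be sketched with the command-line client as follows. The repository URL and the directory name are hypothetical placeholders, and the steps are wrapped in small shell functions (an assumption made for illustration) so that nothing runs until a function is called. Only the standard subcommands checkout, update, status, diff, and commit are used.

```shell
# Hypothetical repository URL and working-copy directory; adjust to your server.
REPO_URL="https://svn.example.org/repos/paper/trunk"
WORK_DIR="paper"

# Create a local working copy of the remote repository (done once).
get_working_copy() {
    svn checkout "$REPO_URL" "$WORK_DIR"
}

# Fetch the co-authors' latest changes into the working copy.
fetch_changes() {
    svn update "$WORK_DIR"
}

# List the locally modified files and show their differences.
review_changes() {
    svn status "$WORK_DIR" && svn diff "$WORK_DIR"
}

# Send local changes to the repository with a descriptive log message.
send_changes() {
    svn commit -m "$1" "$WORK_DIR"
}
```

A typical session would call fetch_changes before starting to write, and review_changes followed by send_changes "Revise introduction" afterwards.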
To set up an SVN version control system, the SVN server software has to be installed on a (single) computer with permanent internet access. (If this computer has no static IP address, one can use a service like DynDNS to reach the server under a fixed hostname.) The server software runs on many Unix, modern MS Windows, and Mac OS X platforms.
Users do not have to install the SVN server software, only SVN client software. A client is the only way to access the repositories on the server. Besides the basic SVN command-line client, there are several graphical user interface (GUI) tools and plug-ins for accessing the SVN server (see http://subversion.tigris.org/links.html). Additionally, there are very good manuals about SVN freely available on the internet (e.g. http://svnbook.red-bean.com).
At our department, we run the SVN server on a GNU/Linux system, because most Linux distributions include it. This makes installing, configuring, and maintaining SVN a very simple task.
Most MS Windows users access the SVN server with the TortoiseSVN client, because it provides the most familiar interface for common users. Linux users usually use the SVN command-line utilities, or eSvn (a GUI frontend) together with KDiff3 for showing complex differences.
Hosting LaTeX files in Subversion
On our Subversion server, we have one repository for a common texmf tree. Its structure complies with the TeX Directory Structure guidelines (TDS, http://www.tug.org/tds/tds.html, see figure 1). This repository provides LaTeX classes, LaTeX styles, and BibTeX styles that are not available in the LaTeX distributions of the users, e.g. because they were bought or developed for the internal use at our department. All users have a working copy of this repository and have configured LaTeX to use this as their personal texmf tree. For instance, teTeX (http://www.tug.org/tetex/) users can edit their TeX configuration file (e.g. /etc/texmf/web2c/texmf.cnf) and set the variable TEXMFHOME to the path of the working copy of the common texmf tree (e.g. by TEXMFHOME = $HOME/texmf); MiKTeX (http://www.miktex.org/) users can add the path of the working copy of the common texmf tree in the 'Roots' tab of the MiKTeX Options.
If a new class or style file has been added (but not if these files have been modified), the users have to update their 'file name data base' (FNDB) before they can use these classes and styles. For instance, teTeX users have to execute texhash; MiKTeX users have to click on the button 'Refresh FNDB' in the 'General' tab of the MiKTeX Options.
Furthermore, the repository contains manuals explaining the specific LaTeX software solution at our department (e.g. this document).
The Subversion server hosts a separate repository for each project of our department. Although branching, merging, and tagging are less important for writing text documents than for writing source code for software, our repository layouts follow the recommendations of the 'Subversion book' (http://svnbook.red-bean.com). Accordingly, each repository has the three directories /trunk, /branches, and /tags.
The most important directory is /trunk. If a single text document belongs to the project, all files and subdirectories of this text document are in /trunk. If the project yields two or more different text documents, /trunk contains a subdirectory for each text document. A slightly different version (a branch) of a text document (e.g. for presentation at a conference) can be prepared either in an additional subdirectory of /trunk or in a new subdirectory of /branches. When a text document is submitted to a journal or a conference, we create a tag in the directory /tags so that it is easy to identify the submitted version of the document at a later date. This feature has proven very useful. When creating branches and tags, it is important always to use the Subversion client (and not the tools of the local file system), because this saves disk space on the server and preserves the information that these documents share a common history.
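As a minimal sketch, tags and branches can be created with the Subversion client as server-side copies of /trunk. The repository URL, the function names, and the tag and branch names are hypothetical; only the standard svn copy subcommand is used.

```shell
# Hypothetical repository root; replace with the URL of your project repository.
REPO_ROOT="https://svn.example.org/repos/paper"

# Tag the current state of /trunk, e.g.: tag_submission journal-2007-03
tag_submission() {
    svn copy "$REPO_ROOT/trunk" "$REPO_ROOT/tags/$1" \
        -m "Tag version submitted as $1"
}

# Create a branch in the same way, e.g.: start_branch conference-talk
start_branch() {
    svn copy "$REPO_ROOT/trunk" "$REPO_ROOT/branches/$1" \
        -m "Create branch $1"
}
```

Because the copy is made by Subversion itself, the tag or branch keeps its link to the history of /trunk.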
The question often arises which files should be put under version control. Generally, all files that are directly modified by the user and that are necessary for compiling the document should be included in the version control system. Typically, these are the LaTeX source code (*.tex) files (the main document and possibly some subdocuments) and all pictures that are inserted in the document (*.eps, *.jpg, *.png, and *.pdf files). All LaTeX classes (*.cls), LaTeX styles (*.sty), BibTeX data bases (*.bib), and BibTeX styles (*.bst) should generally be hosted in the repository of the common texmf tree, but they can be included in the respective project repository if some (external) co-authors do not have access to the common texmf tree. On the other hand, files that are automatically created or modified during the compilation process (e.g. *.aut, *.aux, *.bbl, *.bix, *.blg, *.dvi, *.glo, *.gls, *.idx, *.ilg, *.ind, *.ist, *.lof, *.log, *.lot, *.nav, *.nlo, *.out, *.ps, *.snm, and *.toc files, as well as the compiled *.pdf of the document itself) or by the (LaTeX or BibTeX) editor (e.g. *.bak, *.bib~, *.kilepr, *.prj, *.sav, *.tcp, *.tmp, *.tps, and *.tex~ files) should generally not be under version control, because they can be regenerated at any time and generally do not contain additional information. Furthermore, these files change with every compilation, so conflicts are very likely.
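One way to keep generated files out of status listings and accidental additions is Subversion's svn:ignore property, which the text above does not mention and which is shown here only as a possible sketch. The function names and the choice of patterns (a subset of the extensions listed above) are assumptions. The pattern *.pdf is deliberately left out because figures may be stored as PDF files.

```shell
# Write ignore patterns for generated files to the file given as $1,
# one pattern per line.
write_ignore_patterns() {
    printf '%s\n' '*.aux' '*.bbl' '*.blg' '*.dvi' '*.idx' '*.ilg' '*.ind' \
        '*.lof' '*.log' '*.lot' '*.out' '*.ps' '*.toc' '*.bak' \
        '*.tex~' '*.bib~' > "$1"
}

# Attach the patterns in file $1 to the working-copy directory $2.
# The property is a local modification until it is committed.
ignore_generated_files() {
    svn propset svn:ignore -F "$1" "$2"
}
```

For example, write_ignore_patterns ignore.txt followed by ignore_generated_files ignore.txt . sets the property on the current directory of a working copy.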
Subversion really makes the difference
A great feature of a version control system is that all authors can easily trace the workflow of a project by viewing the differences between arbitrary versions of the files. Authors are primarily interested in 'effective' modifications of the source code that change the compiled document, but not in 'ineffective' modifications that have no impact on the compiled document (e.g. the position of line breaks). Software tools for comparing text documents ('diff tools') generally cannot differentiate between 'effective' and 'ineffective' modifications; they highlight both types of modifications. This considerably increases the effort to find and review the 'effective' modifications. Therefore, 'ineffective' modifications should be avoided.
In this sense, it is very important not to change the positions of line breaks without cause. Hence, automatic line wrapping of the users' LaTeX editors should be turned off and line breaks should be added manually. Otherwise, if a single word in the beginning of a paragraph is added or removed, all line breaks of this paragraph might change so that most diff tools indicate the entire paragraph as modified, because they compare the files line by line. The diff tools wdiff (http://www.gnu.org/software/wdiff/) and dwdiff (http://os.ghalkes.nl/dwdiff.html) are not affected by the positions of line breaks, because they compare documents word by word. However, their output is less clear so that modifications are more difficult to track. Moreover, these tools cannot be used directly with the Subversion command-line switch --diff-cmd, but a small wrapper script has to be used (http://textsnippets.com/posts/show/1033).
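A wrapper of the kind mentioned above could look like the following sketch. The script name svn-wdiff.sh is hypothetical. The script assumes that Subversion calls the external command with GNU diff style arguments (options and labels first) and passes the two files to compare as the last two arguments.

```shell
# Create the wrapper script in the current directory.
cat > svn-wdiff.sh <<'EOF'
#!/bin/sh
# Drop all arguments except the last two (old file, new file).
while [ "$#" -gt 2 ]; do
    shift
done
# Compare the two files word by word.
wdiff "$1" "$2"
EOF
chmod +x svn-wdiff.sh
```

The wrapper can then be passed to Subversion, for instance as svn diff --diff-cmd /full/path/to/svn-wdiff.sh main.tex, where main.tex stands for any file under version control.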
A reasonable convention is to add a line break after each sentence and to start each new sentence on a new line. Note that this has an advantage beyond version control: if you want to find a sentence in your LaTeX code that you have seen in a compiled (DVI, PS, or PDF) file or on a printout, you can easily identify the first few words of this sentence and scan for these words along the left border of your editor window.
Furthermore, we split long sentences into several lines so that each line has about 80 characters, because it is rather inconvenient to search for (small) differences in long lines. (Note: For instance, the LaTeX editor Kile (http://kile.sourceforge.net/) can assist the user in this task when it is configured to add a vertical line that marks the 80th column.) We find it very useful to introduce the additional line breaks at logical breaks of the sentence, e.g. before a relative clause or a new part of the sentence starts. An example LaTeX code that is formatted according to these guidelines is the source code of the article Tools for Collaborative Writing of Scientific LaTeX Documents by Arne Henningsen that is published (including the source code) in The PracTeX Journal 2007, Number 3 (http://www.tug.org/pracjourn/2007-3/henningsen/).
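When an existing document is converted to the one-sentence-per-line convention, a first pass can be automated. The following is a naive sketch with a hypothetical function name: it breaks the text after '.', '!' or '?' when followed by spaces, so abbreviations such as "e.g." are split as well and the result has to be reviewed by hand.

```shell
# Read text from standard input and write it to standard output
# with a line break after each sentence-ending punctuation mark.
one_sentence_per_line() {
    sed -E 's/([.!?]) +/\1\
/g'
}
```

For example, one_sentence_per_line < draft.tex > draft-split.tex writes a converted copy of a (hypothetical) file draft.tex. Since such a conversion changes almost every line, it is best committed on its own, separate from changes to the content.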
If the authors work on different operating systems, their LaTeX editors will probably save the files with different newline (end-of-line) characters (http://en.wikipedia.org/wiki/Newline). To avoid this type of 'ineffective' modification, all users can agree on a specific newline character and configure their editors to use it. An alternative is to add the Subversion property 'svn:eol-style' to the file and set it to 'native'. In this case, Subversion automatically converts all newline characters of this file to the native newline character of the author's operating system (http://svnbook.red-bean.com/en/1.4/svn.advanced.props.file-portability.html#svn.advanced.props.special.eol-style).
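Setting the property from the command line could look like this sketch. The function name and the file names in the comment are hypothetical; svn propset is the standard subcommand for setting properties.

```shell
# Set svn:eol-style to 'native' on the given files, e.g.:
#   set_native_eol main.tex references.bib
# The property change reaches the repository with the next commit.
set_native_eol() {
    svn propset svn:eol-style native "$@"
}
```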
There is also another important reason for reducing the number of 'ineffective' modifications: if several authors work on the same file, the probability that the same line is modified by two or more authors at the same time increases with the number of modified lines. Hence, 'ineffective' modifications unnecessarily increase the risk of conflicts (see section Interchanging Documents).
Furthermore, version control systems allow a very effective quality assurance measure: all authors should critically review their own modifications before they commit them to the repository (see figure 2). The differences between the user's working copy and the repository can be easily inspected with a single Subversion command or with one or two clicks in a graphical Subversion client. Furthermore, authors should verify that their code can be compiled flawlessly before they commit their modifications to the repository. Otherwise, the co-authors have to pay for these mistakes when they want to compile the document. However, this directive is not only reasonable for version control systems but also for all other ways to interchange documents among authors.
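The two checks can be combined in a small helper, sketched here under the assumption that the document is compiled with pdflatex. The function name and file name are hypothetical. The review itself is still done by the author, for instance by reading the output of svn diff before calling the helper.

```shell
# Commit only if the document compiles without errors.
# Usage: commit_if_compiles main.tex "Log message"
commit_if_compiles() {
    if pdflatex -interaction=nonstopmode -halt-on-error "$1" > /dev/null; then
        svn commit -m "$2"
    else
        echo "Compilation of $1 failed; nothing committed." >&2
        return 1
    fi
}
```

A single pdflatex run is enough to detect errors in the source, even if the final document needs further runs for references and the bibliography.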
Subversion has a feature called 'Keyword Substitution' that inserts dynamic version information about a file (e.g. the revision number or the last author) into the contents of the file itself (see e.g. http://svnbook.red-bean.com, chapter 3). Sometimes it is useful to include this information not only as a comment in the LaTeX source code, but also in the (compiled) DVI, PS, or PDF document. This can be achieved with the LaTeX packages svn (http://www.ctan.org/tex-archive/macros/latex/contrib/svn/), svninfo (http://www.ctan.org/tex-archive/macros/latex/contrib/svninfo/), or (preferably) svn-multi (http://www.ctan.org/tex-archive/macros/latex/contrib/svn-multi/).
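Keyword substitution has to be switched on per file through the svn:keywords property. A minimal sketch, with a hypothetical function name and file name:

```shell
# Enable expansion of the $Id$ keyword for the given files, e.g.:
#   enable_id_keyword main.tex
# With svn-multi, the LaTeX source can then pass the keyword to the package
# with a line such as
#   \svnid{$Id$}
# placed after \usepackage{svn-multi}.
enable_id_keyword() {
    svn propset svn:keywords "Id" "$@"
}
```

After the next commit and update, Subversion expands the keyword in the working copy, and the package can typeset the revision information in the compiled document.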
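The most important directives for collaborative writing of LaTeX documents with version control systems, as described in this section, are:
- Avoid 'ineffective' modifications.
- Do not change line breaks without good reason, and turn off automatic line wrapping in your LaTeX editor.
- Start each new sentence on a new line, and split long sentences into several lines of about 80 characters.
- Put only those files under version control that are directly modified by the user.
- Critically review your own modifications with Subversion's diff feature before committing them.
- Verify that your code compiles flawlessly before committing your modifications.
- Add a descriptive comment when committing your modifications.
- Use the Subversion client (and not the tools of the local file system) to create branches and tags.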
If the users are willing to give up the built-in diff utility of SVN and use diff tools installed on their own workstations, they can choose tools that are better suited to text documents. The diff tool that comes with SVN was designed with source code in mind and is therefore most useful for files with short lines. Other tools, such as Compare It!, make it convenient to compare text files in which a line can span hundreds of characters (for example, when each line holds a whole paragraph). With a diff tool that displays long lines conveniently, the users can write their TeX files without a strict line-breaking policy.
Visualizing diffs in LaTeX: latexdiff and changebar
The tools latexdiff and changebar can visualize the differences between two LaTeX files inside a generated document. This makes it easier to see the impact of certain changes or to discuss changes with people who are not accustomed to LaTeX. Changebar comes with a script, chbar.sh, which inserts a bar in the margin indicating the parts that have changed. Latexdiff supports different styles of visualization. By default, discarded text is marked in red and added text is marked in blue. It also supports a mode similar to changebar that adds a bar in the margin. Latexdiff comes with a script, latexrevise, which can be used to accept or decline changes. It also has a wrapper script (latexdiff-vc) that supports version control systems such as Subversion.
An example of how to use latexdiff in a terminal:
latexdiff old.tex new.tex > diff.tex    # Compare old.tex and new.tex and write the file visualizing the changes to diff.tex
pdflatex diff.tex                       # Create a PDF showing the changes
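For files under Subversion, the wrapper script latexdiff-vc can fetch the older version itself. The following sketch compares the working copy of a file with an earlier revision. The function name, the revision number, and the file name are hypothetical, and the name of the generated file should be checked in the latexdiff-vc manual.

```shell
# Compare the working copy of a file with an earlier Subversion revision.
# Usage: diff_against_revision 42 main.tex
diff_against_revision() {
    latexdiff-vc --svn -r "$1" "$2"
}
```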
Managing collaborative bibliographies
Writing scientific articles, reports, and books requires the citation of all relevant sources. BibTeX is an excellent tool for citing references and creating bibliographies (Markey 2005, Fenn 2006). Many different BibTeX styles can be found on CTAN (http://www.ctan.org) and in the LaTeX Bibliography Styles Database (http://jo.irisson.free.fr/bstdatabase/). If no suitable BibTeX style can be found, most desired styles can be conveniently assembled with custom-bib/makebst (http://www.ctan.org/tex-archive/macros/latex/contrib/custom-bib/). Furthermore, BibTeX style files can be created or modified manually; however, this requires knowledge of the (unnamed) postfix stack language that is used in BibTeX style files (Patashnik 1988).
At our department, we have a common bibliographic data base in the BibTeX format (.bib file). It resides in our common texmf tree (see section 'Hosting LaTeX files in Subversion') in the subdirectory /bibtex/bib/ (see figure 1). Hence, all users can specify this bibliography by its file name alone (without the full path), no matter where the user's working copy of the common texmf tree is located.
All users edit our bibliographic data base with the graphical BibTeX editor JabRef (http://www.jabref.org). As JabRef is written in Java, it runs on all major operating systems. As different versions of JabRef generally save files in slightly different ways (e.g. by introducing line breaks at different positions), all users should use the same (e.g. the latest stable) version of JabRef. Otherwise, there would be many differences between versions of the .bib file that originate solely from the use of different versions of JabRef. It would then be hard to find the real differences between the compared documents. Furthermore, the probability of conflicts would be much higher (see section 'Subversion really makes the difference'). As JabRef saves the BibTeX data base with the native newline character of the author's operating system, it is recommended to add the Subversion property 'svn:eol-style' and set it to 'native' (see section 'Subversion really makes the difference').
JabRef is highly flexible and can be configured in many details. We make the following changes to the default configuration of JabRef to simplify our work. First, we specify the default pattern for BibTeX keys so that JabRef can automatically generate keys in our desired format. This can be done by selecting Options → Preferences → Key pattern and modifying the desired pattern in the field Default pattern. For instance, we use [auth:lower][shortyear] to get the last name of the first author in lower case and the last two digits of the year of the publication (see figure 3).
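To illustrate the format that this pattern produces, the following sketch mimics it in a hypothetical shell function. JabRef generates the real keys itself; the function only shows the result for a given last name and year.

```shell
# Mimic [auth:lower][shortyear]: last name of the first author in lower case,
# followed by the last two digits of the publication year.
bibtex_key() {
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf '%s%02d\n' "$name" "$(( $2 % 100 ))"
}

bibtex_key Henningsen 2007   # prints henningsen07
```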
Second, we add the BibTeX field location for information about where the publication is available as a hard copy (e.g. a book or a copy of an article). This field can contain the name of the user who has the hard copy and where they keep it, or the name of a library and the shelf-mark. This field can be added in JabRef by selecting Options → Set up general fields and adding the word location (using the semicolon (;) as delimiter) somewhere in the line that starts with General: (see figure 4).
Third, we put all PDF files of publications in a specific subdirectory on our file server, where we use the BibTeX key as the file name. We inform JabRef about this subdirectory by selecting Options → Preferences → External programs and adding the path of this subdirectory in the field Main PDF directory (see figure 5). If a PDF file of a publication is available, the user can push the Auto button to the left of JabRef's Pdf field to add the file name of the PDF file automatically. Now, all users who have access to the file server can open the PDF file of a publication by simply clicking on JabRef's PDF icon.
If we send the LaTeX source code of a project to a journal, publisher, or somebody else who has no access to our common texmf tree, we do not include our entire bibliographic data base, but extract the relevant entries with the Perl script aux2bib (http://www.ctan.org/tex-archive/biblio/bibtex/utils/bibtools/aux2bib).
This wikibook describes a possible way to efficiently organise the collaborative preparation of LaTeX documents. The presented solution is based on the Subversion version control system and several other software tools and LaTeX packages. However, there are still a few issues that can be improved.
First, we plan that all users install the same LaTeX distribution. As the TeX Live distribution (http://www.tug.org/texlive/) is available for both Unix and MS Windows operating systems, we might recommend that our users switch to this LaTeX distribution in the future. (Currently, our users have different LaTeX distributions that provide different selections of LaTeX packages and different versions of some packages. We solve this problem by providing some packages in our common texmf tree.)
Second, we are considering simplifying the solution for a common bibliographic data base. Currently it is based on the version control system Subversion, the graphical BibTeX editor JabRef, and a file server for the PDF files of publications in the data base. Using three different tools for one task is rather challenging for infrequent users and for users who are not familiar with these tools. Furthermore, the file server can only be accessed by local users. Therefore, we are considering an integrated server solution like WIKINDX (http://wikindx.sourceforge.net/), Aigaion (http://www.aigaion.nl/), or refBASE (http://refbase.sourceforge.net/). Using such a solution only requires a computer with internet access and a web browser, which makes our data base considerably easier to use for infrequent users. Moreover, the stored PDF files are available not only from within the department, but throughout the world. (Depending on the copyrights of the stored PDF files, access to the server, or at least to the PDF files, has to be restricted to members of the department.) Even non-LaTeX users of our department might benefit from a server-based solution: these servers provide the data not only in BibTeX format but also in other formats, so it should be easier to use the bibliographic data base in other word processing software.
All readers are encouraged to contribute to this wikibook by adding further hints or ideas or by providing further solutions to the problem of collaborative writing of LaTeX documents.
Arne Henningsen thanks Francisco Reinaldo and Géraldine Henningsen for comments and suggestions that helped him to improve and clarify this paper, Karsten Heymann for many hints and much advice regarding LaTeX, BibTeX, and Subversion, and Christian Henning as well as his colleagues for supporting his intention to establish LaTeX and Subversion at their department.
- Fenn, Jürgen (2006): Managing citations and your bibliography with BibTeX. The PracTeX Journal, 4. http://www.tug.org/pracjourn/2006-4/fenn/.
- Markey, Nicolas (2005): Tame the BeaST. The B to X of BibTeX. Version 1.3. http://www.ctan.org/tex-archive/info/bibtex/tamethebeast/ttb_en.pdf.
- Patashnik, Oren (1988): Designing BibTeX styles. http://www.ctan.org/tex-archive/info/biblio/bibtex/contrib/doc/btxhak.pdf.