Managing third-party source code with CVS

Luke Mewburn
Department of Computer Science, RMIT
<lukem@cs.rmit.edu.au>

Copyright © 1999 Luke Mewburn. All rights reserved.

Abstract

This paper describes techniques for using revision control and documentation to install and maintain third-party source code. The automated building and distribution of software is outside the scope of this paper.

Introduction

One of a system administrator's tasks is managing the installation and maintenance of third-party software, which is often compiled from source code.

Over the last ten years I have been in situations where I had to maintain third-party software that had insufficient documentation to rebuild or upgrade it if necessary. In many situations /usr/local/src consists of a haphazard collection of tar files and directories, possibly with local modifications, and generally only the original system administrator has any idea of how the product was installed.

I started using CVS [1] over five years ago for the management of my own source, and since that time I have also been using it for managing the third-party source that I was responsible for.

This paper provides an introduction to CVS, and recommendations (based on my experiences) on how to use it in conjunction with appropriate documentation to improve the installation and maintenance of third-party software.

What is CVS?

CVS (Concurrent Versions System) is a version control system. In general, version control allows you to track changes to files, and to see who made each change, when it was made, and what it changed.

System administrators who are familiar with RCS (Revision Control System) software should find using CVS fairly easy since CVS was based on RCS.

CVS...

(The following is derived from [2], and the limitations from [3].)

Who uses CVS?

CVS was primarily aimed at groups of programmers developing medium to large software projects. Most of the large open-source software efforts which have developers distributed globally use CVS, including: NetBSD, FreeBSD, Linux, XFree86, Apache, and Mozilla.

CVS has also proven useful for a wide range of other tasks, including the subject of this paper...

The CVS repository

CVS uses a central repository, which is a hierarchical directory structure where all of the files and their associated version history are stored.

Developers do not work directly on the files in the repository; instead they use private checked-out copies of the sections of the repository they are interested in.

As the repository is a directory-based tree, there is no reason that it cannot be used for a variety of different purposes, including third-party software, in-house projects, host configuration files, etc.

The repository can be accessed via a pathname (on either a local or an NFS filesystem), or via a variety of remote protocols, including rsh, ssh [6], and CVS's internal pserver protocol.

It is important to ensure that the repository is secure, especially if the source of security tools is to be kept in it.

Preparing for CVS

CVS uses a few environment variables. Some of the more commonly used ones are:

Set these in your shell's startup file for convenience.
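A minimal sketch for a Bourne-style shell (sh, ksh or bash) follows. CVSROOT, CVS_RSH and CVSEDITOR are variables that CVS reads, but the values shown are assumptions for illustration; the repository reuses wombat:/src/cvsroot from the aliases later in this paper. csh users would write setenv CVSROOT wombat:/src/cvsroot, and so on, in .cshrc instead.

```shell
# Sketch of CVS settings for ~/.profile (Bourne-style shells).
# The values are illustrative assumptions; adjust them for your site.

# Repository location: a local path, or host:/path for remote access.
CVSROOT=wombat:/src/cvsroot

# Program used to reach a host:/path repository (CVS defaults to rsh).
CVS_RSH=ssh

# Editor for commit messages; if unset, CVS falls back to $EDITOR.
CVSEDITOR=vi

export CVSROOT CVS_RSH CVSEDITOR
```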

Initialising the repository

Choose a machine with sufficient reliable disk capacity for the repository. The CVS info documentation [4] suggests allocating three times the size of the code to be stored in it. Our current CVS repository is 1250 MB in size, and contains over 220 packages (including large packages such as X11R6 and egcs).

To initialise the repository, on the server run:

	% cvs -d /path/to/repository init

Alternatively, set $CVSROOT appropriately and run:

	% cvs init

Some tweaking of the CVS configuration files may be required to customise CVS for the local environment. These files are kept in the CVSROOT module at the top level of the repository. For example, CVSROOT/loginfo controls where commit messages are sent (e.g., appended to a local file, emailed to interested users, etc.). Refer to the CVS documentation for more information on configuring these files.

Basic CVS commands

All CVS operations are performed with the cvs command. cvs takes a command word (such as checkout or commit) to describe the operation, with optional arguments to vary the operation.

The basic commands are:

As well as the examples below, [3] provides a sample session.

The CVS info documentation [4] and the CVS Reference Manual [5] provide further information on these commands.
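One combination of basic commands that is handy when maintaining packages is cvs -n -q update. The global -n option makes CVS report what update would do without changing any files, and -q suppresses the per-directory messages. Each output line starts with a status letter: U or P (a newer revision is in the repository), M (locally modified), A or R (scheduled for addition or removal), C (conflict), or ? (not under CVS control). The helper below is a sketch with a hypothetical name, local_changes; it filters that output down to files with local changes, and assumes file names contain no whitespace.

```shell
# local_changes (hypothetical helper): list files in the current working
# copy that differ locally from the repository:
#   M = modified, A = added, R = removed, C = conflict
# "cvs -n -q update" only reports; it does not touch the working copy.
# Assumes file names contain no whitespace.
local_changes() {
    cvs -n -q update "$@" 2>/dev/null |
        awk '$1 ~ /^[MARC]$/ { print $1, $2 }'
}
```

Running local_changes at the top of a package's working copy before an upgrade gives a quick list of local modifications that the merge will have to preserve.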

Required documentation

The use of CVS in itself is not a panacea for the problem of managing third-party source code. The use of accurate documentation (which some system administrators fear and loathe, probably because they do not know how to use it effectively) is mandatory. CVS is just a tool; it does not define policies.

The following documentation should be maintained:

Note that CVS and documentation should be used in conjunction with effective communication between the developers and system administrators, not as a replacement for communication!

Third-party database

For each separate software package imported into CVS, the following should be documented:

I maintain a file called 3RDPARTY, which is kept in a CVS module called docs. It consists of separate entries per package. An example entry is:

	Package:	python
	Version:	1.5.2
	Current:	1.5.2
	Maintainer:	
	URL:		http://www.python.org/
	Archive Site:	ftp://ftp.python.org/pub/python/src/py152.tgz
	Mailing List:	
	CVS location:	lang/python
	Vendor tag:	PYTHON_DIST
	Release tag:	python-1-5-2
	Responsible:	lukem
	Compiler:	/opt/SUNWspro/bin/cc
	Environment:	SunOS wombat.cs.rmit.edu.au 5.6 Generic_105181-12
			sun4u sparc SUNW,Ultra-2
	Date:		Fri Apr 16 16:18:37 EST 1999
	Depends upon:	blt 8.0-unoff
	Depends upon:	readline 4.0
	Depends upon:	tcl 8.0.5
	Depends upon:	tk 8.0.5
	Depends upon:	x11r6.3
	Depends upon:	zlib 1.1.3
	Notes:
	* created Modules/Setup.local to contain modules we use, based on
	  commented-out entries in Modules/Setup.in that we want.
	* configured, compiled and installed with
		env CC="cc -mt" ./configure --with-threads
		make
		make test
		make install

Currently this is just a flat text file, maintained by hand.
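Even a hand-maintained flat file can be queried with standard tools. The sketch below assumes that entries are separated by blank lines and that each entry begins with its Package: line (how entries are delimited is not stated above). It uses awk's paragraph mode (RS = "") so that each entry is a single record. Both function names are hypothetical.

```shell
# Query helpers for the 3RDPARTY file (hypothetical names).
# Assumption: entries are separated by one or more blank lines,
# and each entry starts with a "Package:" line.

# thirdparty_entry PACKAGE [FILE]: print the entry for PACKAGE.
thirdparty_entry() {
    awk -v pkg="$1" '
        BEGIN { RS = "" }    # paragraph mode: one entry per record
        $1 == "Package:" && $2 == pkg { print; print "" }
    ' "${2:-3RDPARTY}"
}

# thirdparty_dependents PACKAGE [FILE]: list the packages whose
# "Depends upon:" lines name PACKAGE, e.g. to see what may need
# rebuilding after PACKAGE is upgraded.
thirdparty_dependents() {
    awk -v dep="$1" '
        BEGIN { RS = "" }
        {
            n = split($0, line, "\n")
            for (i = 1; i <= n; i++) {
                l = line[i]
                # Strip the field name; the first word left is the package.
                if (sub(/^[ \t]*Depends upon:[ \t]*/, "", l)) {
                    split(l, f, " ")
                    if (f[1] == dep) { print $2; break }
                }
            }
        }
    ' "${2:-3RDPARTY}"
}
```

For example, thirdparty_dependents tcl run against the entry above would report python.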

Hints:

Possible future enhancements include:

Site standards

Whether your site installs programs in /usr/local, /opt/local, or /opt/gnu, there should be sufficient general documentation on where third-party software is to be installed to keep installations consistent, especially when multiple system administrators are responsible for software installation.

It is also sensible to document why a policy was contravened when it was necessary to do so (as it sometimes is).

Importing a new package

To import a new package, we require a fresh copy of the source. Suggested locations to look for software are described below in Finding third-party source.

In this example we will use the Internet Software Consortium's DHCP server - dhcp 2.0b1 pl18.
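The import itself is a single cvs import run from inside the unpacked, unmodified source tree. The sketch below wraps the unpack-and-import sequence in a shell function. The function name, the tarball file name and the tags are assumptions, chosen to follow the conventions of the 3RDPARTY example: a vendor tag such as PYTHON_DIST, and release tags such as python-1-5-2 (CVS tag names may not contain periods).

```shell
# import_release (hypothetical helper): unpack a pristine vendor tarball
# in a scratch directory and import it onto the vendor branch.
#   $1  absolute path of the .tar.gz distribution
#   $2  top-level directory that the tarball unpacks into
#   $3  repository path of the module, e.g. net/dhcp
#   $4  vendor tag (names the vendor branch), e.g. DHCP_DIST
#   $5  release tag for this version, e.g. dhcp-2-0b1-pl18
import_release() {
    scratch=$(mktemp -d) || return 1
    (
        cd "$scratch" &&
        gzip -dc "$1" | tar xf - &&
        cd "$2" &&
        cvs import -m "Import of $2" "$3" "$4" "$5"
    )
    status=$?
    rm -rf "$scratch"    # the unpacked copy is no longer needed
    return "$status"
}
```

From a Bourne-style shell, with hypothetical file and tag names, the dhcp import would then be:

	$ import_release /tmp/dhcp-2.0b1pl18.tar.gz dhcp-2.0b1pl18 net/dhcp DHCP_DIST dhcp-2-0b1-pl18

after which the package is checked out with cvs checkout net/dhcp, built, installed, and recorded in 3RDPARTY.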

Updating an existing package

It is rare for third-party software to remain unchanged forever; updates are released, sometimes on a regular basis.

An important part of managing third-party software is upgrading to a newer version in a sane way.

The process to import an updated version of a package is similar to importing the original package. However, because changes in the newer version may conflict with any local changes that you may have made to the product, there are a couple of steps that are different.

In this example, we'll be upgrading dhcp 2.0b1 from pl18 to pl27.
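In outline, the new release is imported with the same vendor tag but a new release tag. The vendor's changes between the two release tags are then merged into a fresh checkout with cvs checkout -j old -j new (discussed further under Tips and Tricks), conflicts are resolved by hand, and the result is committed. The helper below is a hypothetical sketch of the merge step; the release tags dhcp-2-0b1-pl18 and dhcp-2-0b1-pl27 are assumed names. CVS marks each conflicting region in a file with lines beginning <<<<<<<, ======= and >>>>>>>, so after merging the function lists the files that still contain such markers.

```shell
# merge_vendor_release (hypothetical helper): merge the vendor's changes
# between two imported releases into a fresh checkout of the module,
# then list the files that still contain CVS conflict markers.
#   $1  module, e.g. net/dhcp
#   $2  previous release tag, e.g. dhcp-2-0b1-pl18
#   $3  new release tag, e.g. dhcp-2-0b1-pl27
# Run it in an empty scratch directory.  Prints nothing if the merge
# was clean.  Afterwards: review, build and test, fix any conflicts,
# then "cvs commit" from inside the module.
merge_vendor_release() {
    cvs -q checkout -j "$2" -j "$3" "$1" || return 1
    # Conflicting regions start with a line beginning "<<<<<<< ".
    # CVS administrative directories are skipped.
    find "$1" -type f ! -path '*/CVS/*' -exec grep -l '^<<<<<<< ' {} + || true
}
```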

Advanced CVS operations

CVS has other operations which are useful to know. Whilst these may be more relevant to a software developer, they are still useful for our purposes:

Tips and Tricks

CVS is a powerful tool and it has useful functionality that is often only hinted at in the documentation. This section covers some of that functionality, and other tips and tricks that I felt might be relevant.

cvs checkout -j old-vendor-release -j new-vendor-release module

In my opinion, this is one of the most useful under-documented operations in CVS, especially for the purpose of maintaining third-party packages (or anything that uses vendor branches).

As shown in the example in Updating an existing package, this operation checks out a copy of module and merges any changes made by the vendor between release tags old-vendor-release and new-vendor-release. This includes files that were added or removed between releases.

Tagging the installed release

It may be useful to tag a package at a known working state after successful installation or prior to the import of a new version. This can simplify determining the local changes made to the previous vendor release.

The sequence of operations would be something like:
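As a sketch, assuming a hypothetical convention of naming the local tag after the release tag with an -installed suffix (for example python-1-5-2-installed): first tag the working copy that was built and installed, then later compare that tag against the vendor release tag with cvs rdiff, which reads the repository directly and needs no working copy. The function names are hypothetical.

```shell
# tag_installed TAG (hypothetical helper): run inside the working copy
# that was built and installed.  "tag -c" refuses to tag if any file
# has uncommitted changes, so the tag matches what was installed.
tag_installed() {
    cvs -q tag -c "$1"
}

# local_diffs MODULE RELEASE_TAG INSTALLED_TAG (hypothetical helper):
# show, as a unified diff, the local changes made on top of the
# vendor release.  "cvs rdiff" works directly on the repository.
local_diffs() {
    cvs -q rdiff -u -r "$2" -r "$3" "$1"
}
```

For the python entry in 3RDPARTY this would be:

	$ tag_installed python-1-5-2-installed
	$ local_diffs lang/python python-1-5-2 python-1-5-2-installed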

Minimising conflicts

Local modifications can be made in a way that minimises conflicts when the vendor later changes the same sections of the code. Rather than changing a line, provide an override line which has the same effect.
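For example, suppose a vendor's shell script sets an installation prefix. Editing that line in place means any later vendor change to it conflicts with the local edit. A sketch of the override approach follows; the variables and paths are hypothetical. The vendor's lines are left exactly as distributed, and a clearly marked local block, separated from them by an unchanged line and placed before the settings are used, reassigns the values. Keeping the override away from the vendor's lines matters, because a change immediately next to a line the vendor also changed can still be reported as a conflict.

```shell
# --- vendor's settings (left exactly as distributed) ---
PREFIX=/usr/local
BINDIR=$PREFIX/bin
MANDIR=$PREFIX/man
CC=gcc

# --- LOCAL: site overrides (hypothetical values) ---
# Later assignments win, so none of the vendor's lines above is edited.
PREFIX=/opt/local
BINDIR=$PREFIX/bin
MANDIR=$PREFIX/man
CC=cc

# --- vendor's code that uses the settings ---
echo "installing into $BINDIR and $MANDIR with $CC"
```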

Binary files

Certain packages are distributed with binary files that must be stored unmodified in the repository and be checked out unmodified upon a check-out.

The simplest way to support a package with some binary files is:
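One approach, described in the CVS manual's section on binary files, is to import the package as usual, then mark each binary file with cvs admin -kb and refresh it with cvs update -A so that the working copy is checked out again without keyword expansion. A sketch with a hypothetical helper name follows; note that update -A also clears any sticky tags, dates or options on those files.

```shell
# mark_binary FILE... (hypothetical helper): run in a working copy after
# importing a package.  Sets each file's default keyword substitution
# mode to binary (-kb), so CVS stores and checks it out unmodified.
# (Binary files added later can be flagged with "cvs add -kb FILE".)
mark_binary() {
    for f in "$@"; do
        cvs -q admin -kb "$f" || return 1
        # Refresh the working file; -A also clears sticky tags/options.
        cvs -q update -A "$f" || return 1
    done
}
```

From a Bourne-style shell, for example:

	$ mark_binary `find . -name '*.gif' ! -path '*/CVS/*'`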

Symbolic links

CVS ignores symbolic links upon import (the files have a status of "L").

A workaround is to add a local script to the module, which is run manually after checkout to regenerate the symlinks. I call this script fixlinks, and generate it with:

	% find . -type l -print | perl -e 'while (<>) { chomp ; \
	    print "ln -s ", readlink($_), " $_\n"; } ' > fixlinks

An example fixlinks (from our net/netatalk package) is:

	ln -s ../sys/netatalk ./include/netatalk
	ln -s ../codepage.h ./etc/afpd/nls/codepage.h
	ln -s kpatch-4.2 ./sys/ultrix/kpatch-4.4
	ln -s kpatch-4.2 ./sys/ultrix/kpatch-4.3
	ln -s ../../ultrix/sys/cdefs.h ./sys/solaris/sys/cdefs.h

Upon checkout, regenerate the links with:

	% sh ./fixlinks

Automatically generating modules file

The CVSROOT/modules file provides a mapping from a module alias to a directory or directories. I find it useful to have shortcuts to the packages in the repository. For example, this allows me to run cvs checkout dhcp instead of cvs checkout net/dhcp.

I wrote a script a few years ago called genmodules [7] which parses CVSROOT/commitlog and updates CVSROOT/modules as necessary.
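genmodules works from the commit log. Where the repository follows a strict category/package layout, such as the one shown under Organising the repository, a cruder alternative is to derive one entry per package directly from the directory tree. The sketch below (not genmodules; the function name is hypothetical) prints lines of the form "name directory", which define name as a module for that directory, so that cvs checkout dhcp checks out net/dhcp. It requires a local path to the repository, and duplicate package names in different categories would need resolving by hand.

```shell
# modules_from_tree [ROOT] (hypothetical helper): print a CVSROOT/modules
# entry "package  category/package" for every second-level directory
# of the repository ROOT (default $CVSROOT, which must be a local path).
modules_from_tree() {
    (
        cd "${1:-$CVSROOT}" || exit 1
        for dir in */*/; do
            [ -d "$dir" ] || continue        # the glob matched nothing
            dir=${dir%/}
            case $dir in
                # Skip CVS's own files and directories of removed files.
                CVSROOT/* | */Attic) continue ;;
            esac
            printf '%-15s %s\n' "${dir##*/}" "$dir"
        done
    )
}
```

The output would then be merged into a checked-out copy of CVSROOT/modules and committed, as with any other administrative file.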

Working with multiple repositories

If you are working with multiple repositories (e.g., a local repository and that of an open source software project), it may help to have shell aliases which do the right thing.

For example, in my .cshrc I have:

	alias   ncvs    'env CVSROOT=cvs.netbsd.org:/cvsroot cvs'
	alias	cscvs	'env CVSROOT=wombat:/src/cvsroot cvs'
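In sh, ksh or bash, shell functions are the natural equivalent of these csh aliases; the definitions below are a sketch using the same two repositories. They matter mainly for commands run outside an existing working copy, such as checkout: inside a checked-out tree, CVS takes the repository from the CVS/Root files and ignores $CVSROOT.

```shell
# Bourne-shell equivalents of the csh aliases above.  The variable
# assignment applies only to the cvs command it prefixes.
ncvs()  { CVSROOT=cvs.netbsd.org:/cvsroot cvs "$@"; }
cscvs() { CVSROOT=wombat:/src/cvsroot cvs "$@"; }
```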

Setting defaults for CVS commands

If you are always invoking a CVS command with the same set of options, you can simplify your typing by adding a relevant entry in $HOME/.cvsrc. For example, I have a line of the form:

	update -dP

which means that cvs update ... is run as cvs update -dP ... ("-d": create any directories that are missing from the working copy; "-P": prune empty directories).
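A slightly fuller ~/.cvsrc might look like the following sketch. Each line names a command in full, followed by its default options, and a line beginning with cvs supplies global options. Here -q quietens the per-directory messages, diff -u selects unified diffs, and checkout -P prunes empty directories just as update -P does. The defaults can be bypassed for a single invocation with cvs -f, which skips reading ~/.cvsrc.

```
cvs -q
checkout -P
diff -u
update -dP
```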

Organising the repository

It is highly recommended to have a sensible organisation for the third-party packages in the repository. For example, we have the following directories; it should be trivial to infer the contents of the directories:

	% ls $CVSROOT
	CVSROOT     devel      lang       rmit       www
	ai          docs       mail       security   x11
	archivers   file       net        text
	audio       graphics   news       utils

Finding third-party source

An element of managing third-party source is finding the source in the first place :-) A few good places to look include:

Summary

I believe that CVS is well suited to the task of managing third-party software. I have been using it in this role for over five years for three different employers. For over three years I have also been using CVS as one of the many distributed developers of the NetBSD project [8].

Before I started my current position at RMIT Computer Science nearly two years ago, /usr/local/src was a two gigabyte directory with a haphazard structure (I hesitate to call it "organisation"). In many cases there were multiple copies of the same product, without any obvious indication of which was the currently installed version (in the case of the Columbia Appletalk Package there were six different versions of the source code). As we rebuilt systems (including rebuilding /usr/local from scratch on new machines), we kept all new products in the CVS repository.

Hand in hand with CVS is the maintenance of relevant documentation. Without documentation a CVS tree is almost as bad as the proverbial unorganised /usr/local/src. NetBSD provided the inspiration for the 3RDPARTY file, although I have expanded upon it since then.

Acknowledgements

Thanks to Matt Green for teaching me the checkout -j old -j new module trick (amongst others); in my opinion it is one of the most useful commands to know when maintaining third-party source.

Giles Lean's assistance by reviewing the paper and providing a wealth of feedback was much appreciated.

References

[1] Cyclic CVS site,
http://www.cyclic.com/cvs/info.html
[2] CVS Overview,
http://www.cyclic.com/cyclic-pages/overview.html
[3] Introduction to CVS,
http://www.cyclic.com/cvs/doc-blandy-text.html
[4] The CVS info documentation.
This should be installed as part of the CVS installation.
[5] CVS Reference Manual,
http://www.loria.fr/~molli/cvs/doc/cvs_toc.html
[6] SSH secure shell,
http://www.ssh.fi/
[7] genmodules,
http://www.cs.rmit.edu.au/~lukem/src/genmodules
[8] The NetBSD Project (Australian mirror),
http://www.au.netbsd.org/