The Mirror Program Page
Overview
In order to have as much information as possible in the archive, we download data from other sites and incorporate it into our overall subject trees. This data is gathered using the mirror program, which updates the archive so that the files in it are up-to-date copies of files on remote machines all over the world. The mirror program can be obtained from any of these sites:
- ftp://src.doc.ic.ac.uk/computing/archiving/mirror.
- ftp://ftp.th-darmstadt.de/pub/networking/mirror.
- ftp://ftp.sun.ac.za/pub/packages/mirror.
How does mirroring work?
The mirror program is located in /u/coast2/ftp-admin/mirror/mirror. It is a Perl script, which is run by the nightly script. It compares the remote and local directory trees and updates files according to the following conditions:
- If the timestamp of a file on the remote site is newer than the timestamp of the same file in the local directories, the remote file is downloaded.
- If the remote site has files which do not exist in the local directories, they are downloaded.
- If a file exists both locally and remotely, and it has changed in size on the remote site, it is downloaded to the local directories.
- If a file exists locally, but has disappeared remotely, the file is deleted from the local directories.
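As a rough illustration, these conditions can be expressed as a comparison of two directory listings. This is a minimal Python sketch, not the Perl code of the mirror program itself; the FileInfo record and the plan_updates function are hypothetical names, and each listing is assumed to map a relative path to its size and modification time.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileInfo:
    size: int   # size in bytes
    mtime: int  # modification time, seconds since the epoch


def plan_updates(remote: Dict[str, FileInfo],
                 local: Dict[str, FileInfo]) -> Tuple[List[str], List[str]]:
    """Return (files to download, files to delete) following the rules above."""
    downloads = []
    for name, r in sorted(remote.items()):
        l = local.get(name)
        if l is None:
            downloads.append(name)  # remote file does not exist locally
        elif r.mtime > l.mtime:
            downloads.append(name)  # remote copy is newer
        elif r.size != l.size:
            downloads.append(name)  # size has changed on the remote site
    # Files that exist locally but have disappeared remotely are deleted.
    deletions = sorted(name for name in local if name not in remote)
    return downloads, deletions


remote = {"README": FileInfo(120, 200), "tool.tar.Z": FileInfo(5000, 100),
          "new.txt": FileInfo(7, 50)}
local = {"README": FileInfo(120, 100), "tool.tar.Z": FileInfo(5000, 100),
         "old.txt": FileInfo(1, 1)}
print(plan_updates(remote, local))  # (['README', 'new.txt'], ['old.txt'])
```

The download conditions overlap (a newer file often also differs in size), so the sketch schedules each file once if any of them applies.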
What is a package?
The key to understanding how to use mirror is the concept of a package. A package is simply a collection of files and directories, nothing more, nothing less. It can be a single file, or thousands of files and nested subdirectories. A package tells the mirror program exactly which collection of files to obtain from the remote site. Files can be specified using wildcards and pattern matching, so a simple package can obtain many files.
Why do we need this? Without packages, it would be a real pain to tell the mirror program to obtain every file related to a particular tool. Consider a typical tool on a remote ftp server. It probably consists of at least a README file and some archive files, generated by tar and probably compressed using gzip or compress. The author has probably written it for multiple platforms, so there could be a number of files with very similar names. Finally, the author may have left an older version of the code there too.
A tool on a remote site could look similar to this:
- mytool.README
- mytool.solaris-v1.0.tar.Z
- mytool.solaris-v1.5.tar.Z
- mytool.sunos-v1.0.tar.Z
- mytool.ultrix-v1.5.tar.Z
- READ.ME.TOO
So to present this tool usefully to our archive users, we must gather all these files. Notice how the most important ones all start with a common name: mytool. Let us assume that this is the name of the actual tool. That would also be a good name for the package we are going to tell the mirror program about. We will write a description telling the mirror program exactly where to locate these files, which ones to get, which ones to ignore and where to put them locally. We will then give all this information a name, and it becomes a package from our point of view. We no longer view these files in isolation; they are part of some larger structure.
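Selecting which files to get and which to ignore can be sketched with regular expressions. This Python sketch assumes that get_patt and exclude_patt (fields of the packages file) are regular expressions matched against each remote file name; mirror itself is written in Perl, so Python's re module stands in for Perl patterns here. The select_files helper and the extra othertool.tar.Z entry are hypothetical.

```python
import re
from typing import Iterable, List


def select_files(listing: Iterable[str], get_patt: str,
                 exclude_patt: str = "") -> List[str]:
    """Keep names matching get_patt, then drop any that match exclude_patt."""
    get_re = re.compile(get_patt)
    exclude_re = re.compile(exclude_patt) if exclude_patt else None
    chosen = []
    for name in listing:
        if not get_re.search(name):
            continue  # not part of this package
        if exclude_re is not None and exclude_re.search(name):
            continue  # part of the package, but deliberately ignored
        chosen.append(name)
    return chosen


listing = ["mytool.README", "mytool.solaris-v1.0.tar.Z",
           "mytool.solaris-v1.5.tar.Z", "mytool.sunos-v1.0.tar.Z",
           "mytool.ultrix-v1.5.tar.Z", "READ.ME.TOO", "othertool.tar.Z"]

# Get everything named mytool.* plus READ.ME.TOO, but skip the old v1.0 archives.
print(select_files(listing, r"^(mytool\.|READ\.ME\.TOO$)", r"-v1\.0\."))
# ['mytool.README', 'mytool.solaris-v1.5.tar.Z', 'mytool.ultrix-v1.5.tar.Z', 'READ.ME.TOO']
```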
Mirror Packages file
The mirror packages file specifies which packages to mirror. It is located in /u/coast2/ftp-admin/packages. This file consists of a series of records, each describing a specific item to mirror. The format of the entries is as follows (these are the more important fields):
package=
comment =--> <--
site =
remote_dir =
local_dir +
recurse =
get_patt =
exclude_patt =
As an example, here is a sample entry:
package=legal_bytes
comment =--> Newsletter of Emerging Legal Issues <--
site =ftp.eff.org
remote_dir =/pub/Publications/E-journals/Legal_Bytes
local_dir +mirrors/ftp.eff.org/Legal_Bytes/
Notice that this sample entry does not use all of the fields shown above; any field that is not given is filled in with a default value. The package field is the name of the package, and every package in the file must have a unique name.
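A minimal sketch of reading such records, assuming each line is a key=value assignment, that a key+value line appends its value to the default for that key (as local_dir does in the sample above), and that a package= line starts a new record. The parse_packages function and the defaults used below are hypothetical; the real defaults come from the mirror defaults file.

```python
import re
from typing import Dict, List, Optional

# One "key=value" or "key+value" assignment per line (assumed syntax).
LINE_RE = re.compile(r"^\s*([A-Za-z_]+)\s*([=+])(.*)$")


def parse_packages(text: str, defaults: Dict[str, str]) -> List[Dict[str, str]]:
    packages: List[Dict[str, str]] = []
    seen = set()
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        m = LINE_RE.match(raw)
        if m is None:
            raise ValueError(f"unparseable line: {raw!r}")
        key, op, value = m.group(1), m.group(2), m.group(3).strip()
        if key == "package":
            if value in seen:
                raise ValueError(f"duplicate package name: {value}")
            seen.add(value)
            current = dict(defaults)  # unspecified fields keep their defaults
            packages.append(current)
        elif current is None:
            raise ValueError("field given before any package= line")
        if op == "+":
            current[key] = current.get(key, "") + value  # append to the default
        else:
            current[key] = value
    return packages


sample = "\n".join([
    "package=legal_bytes",
    "comment =--> Newsletter of Emerging Legal Issues <--",
    "site =ftp.eff.org",
    "remote_dir =/pub/Publications/E-journals/Legal_Bytes",
    "local_dir +mirrors/ftp.eff.org/Legal_Bytes/",
])
# Hypothetical defaults; the trailing slash matters because "+" appends literally.
defaults = {"local_dir": "/u/coast3/ftp/pub/", "recurse": "true"}
pkg = parse_packages(sample, defaults)[0]
print(pkg["local_dir"])  # /u/coast3/ftp/pub/mirrors/ftp.eff.org/Legal_Bytes/
print(pkg["recurse"])    # true (filled in from the defaults)
```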
The site field specifies the address of the remote site. It should be stripped of any URL prefix (e.g. ftp://) and any trailing pathname.
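Reducing an address to the bare host name expected in the site field can be sketched with Python's standard urllib.parse.urlsplit. The site_field helper is hypothetical; note that urlsplit's hostname attribute also drops any port number or user name and lowercases the result.

```python
from urllib.parse import urlsplit


def site_field(address: str) -> str:
    """Strip the URL prefix and any trailing pathname, leaving the host name."""
    if "://" not in address:
        address = "//" + address  # make urlsplit treat the text as a network location
    host = urlsplit(address).hostname
    if not host:
        raise ValueError(f"no host name in {address!r}")
    return host


print(site_field("ftp://ftp.eff.org/pub/Publications/"))  # ftp.eff.org
print(site_field("ftp.sun.ac.za/pub/packages/mirror"))     # ftp.sun.ac.za
```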
The remote_dir field specifies where to locate the files on the remote server. If a single file or a group of files is to be downloaded, this is the directory containing them. If a directory hierarchy is to be mirrored, this is the root of that directory tree on the remote site.
The local_dir entry specifies where to place the files locally in the archive. In our archive, we have split the data into a subject tree and a mirrors tree. The data obtained by the mirror program is placed in the mirrors tree: the full path to the files is obtained by concatenating the default local path (from the defaults file) with the path specified in the package entry, which in the sample above begins with mirrors/.
For example, if the default local path were /u/coast3/ftp/pub, the sample package above would place any files and subdirectories it downloads into /u/coast3/ftp/pub/mirrors/ftp.eff.org/Legal_Bytes/.
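Combining remote_dir with the local path shows where an individual downloaded file ends up. In this hypothetical Python sketch, local_path_for removes the remote_dir prefix from a remote file's path and appends the remainder under the concatenated local path; the file issue1/lb.txt is an invented example. posixpath.join is used so that a missing slash between the parts is supplied automatically, though a local_dir beginning with / would replace the default path entirely.

```python
import posixpath


def local_path_for(remote_file: str, remote_dir: str,
                   default_local: str, local_dir: str) -> str:
    """Map a file under remote_dir to its place in the local mirrors tree."""
    rel = posixpath.relpath(remote_file, remote_dir)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{remote_file} is not under {remote_dir}")
    # Default local path + package local_dir, then the path below remote_dir.
    root = posixpath.join(default_local, local_dir)
    return posixpath.join(root, rel)


print(local_path_for("/pub/Publications/E-journals/Legal_Bytes/issue1/lb.txt",
                     "/pub/Publications/E-journals/Legal_Bytes",
                     "/u/coast3/ftp/pub", "mirrors/ftp.eff.org/Legal_Bytes/"))
# /u/coast3/ftp/pub/mirrors/ftp.eff.org/Legal_Bytes/issue1/lb.txt
```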
Mirror Defaults File
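The mirror defaults file controls settings that apply to the mirror program as a whole. Among other things, it supplies the default values for fields that a package entry omits, including the default local path used to build each package's local directory.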
How to run the mirror program
The mirror program takes a set of command-line arguments and a package file. The arguments control the behaviour of the program and override the options specified in the defaults file. The package file consists of at least one package entry.
To run the mirror program, add the directory /u/coast2/ftp-admin/mirror to your path, then execute:
mirror -d ../packages/security-archive.biweekly
The -d option causes the mirror program to report what it is doing on the standard output. This is used to generate the biweekly mail message giving the output of the mirror program.
To mirror a specific package rather than the whole packages file, name that package on the command line with the -p option. The name is whatever the package was called in the packages file. The mirror program then updates the archive to reflect any changes that have taken place in that package on the remote site:
mirror -d -p<package1> ../mirrors/security-archive.biweekly
If you don't want to actually do anything, but just want to see what would happen, use the -n option. For example:
mirror -n -d ../packages/security-archive.biweekly
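When the program is driven from another script, the option combinations above can be assembled programmatically. The mirror_command helper below is a hypothetical Python sketch that only builds the argument list from the -n, -d and -p options described on this page; it does not run anything itself.

```python
from typing import List, Optional


def mirror_command(package_file: str, package: Optional[str] = None,
                   dry_run: bool = False, report: bool = True) -> List[str]:
    """Build an argument list for the mirror program."""
    argv = ["mirror"]
    if dry_run:
        argv.append("-n")            # show what would happen, change nothing
    if report:
        argv.append("-d")            # report progress on standard output
    if package is not None:
        argv.append("-p" + package)  # mirror only this named package
    argv.append(package_file)
    return argv


print(mirror_command("../packages/security-archive.biweekly", dry_run=True))
# ['mirror', '-n', '-d', '../packages/security-archive.biweekly']
```

Passing the list to subprocess.run(argv, capture_output=True, text=True) would then run mirror and collect its -d report as text, assuming /u/coast2/ftp-admin/mirror is on the path.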
Built by Mark Crosbie and
Ivan Krsul.
Last Modified: 6 March, 1995.
security-archive@cerias.purdue.edu (COAST Security Archive)