[Thu May  3 09:41:54 PDT 2001]

This note documents an approach for detecting and creating changeset boundaries in an imported CVS tree. It actually has nothing to do with CVS; this would work for Teamware imports as well. The Teamware imports are harder because they have branches and merges, but they could be done. For now, let’s just look at CVS straightlines.

The method described here assumes that the repository history can fit completely in memory. Given that I can buy 1.5GB for $400 now, this is an OK assumption. If you think about it after reading the description, you could see how to do a multi pass version of this that worked in less memory. I wouldn’t bother.

Data structures

delta list - a linked list (through d->next) of all deltas over all files
sfile array - an array of sfile names
We reuse the d->merge number as an index into the sfile array; so that
sfiles[d->merge] is the file name associated with d.  This means we
need an assert that the number of sfiles is less than 64K.

Algorithm

   The sccslog.c code is quite close to what we want.  Start with that.
   Init each file, filling in the sfile array as we go.  Use an MDBM for
   the sfile array.
   sccs_close() the file after initing it - this will drop the mmap and
   we don't want to have all the files mapped.
   Make sure that the deltas all have fudges such that
d->date < d->kid->date
   in all cases.  I think rcs2sccs does this already, but make sure.
   If the datefudge is greater than an hour, then print a warning.
   Create a linked list of all deltas in all files (code exists in sccslog).
   Sort the list on d->date (code exists in sccslog).
   Create a predicate, sameCset(d1, d2).  As a starting point, we could
   just use "(d2->date - d1->date) > GAP".
   Walk the list, gathering up the range of deltas which meet the conditions
   from the first to the last delta.
   Walk the gathered up list, building up an mdbm of the _last_ delta for
   each file in the list (i.e., there may be more than one delta in a
   changeset for a particular file).
   get -eg ChangeSet file.
   For each delta in the cset MDBM {
   	sccs_sdelta() the key and print it into the changeset file
// this is slow but it will work
sccs_init the file
add the cset mark
sccs_newchksum() the file
sccs_free()
   }
   sccs_delta() the changeset file with a some sort of comment.
   We need delta the changeset file with a date that equals the last delta
   in the changeset.  The user should somebody in the list of deltas in
   the changset.
   The comment is an open issue - we could use the checkin comments
   on the files but that is going to make the changeset file huge.
   I'm open for suggestions here.

That’s it. Then bk -r check -ac the tree, it should be fine.