Document old clonemod and new clone -@url in nested

Clone Baseline

Old version

The original clonemod was a shell script and looked like this:

bk clone -lq BASE DEST
cd DEST
bk parent -s SOURCE
bk undo -fa`bk repogca`
# remove any local tags that the above undo missed
bk changes -qfkL > BitKeeper/etc/csets-in|| exit 1
if [ -s BitKeeper/etc/csets-in ]
then	bk unpull -sfq || exit 1
else	rm BitKeeper/etc/csets-in || exit 1
fi
bk pull -u

clone local baseline
rollback to GCA
pull from remote source

Notes:

This worked on nested of all sources used the same aliases. (like default)
that unpull is tricky side-effect of our tag graph
this form doesn’t handle proper undo of tags in components.

Current version

In the current code the above behavior is obtained with:

bk clone -@BASE SOURCE [DEST]

It does exactly the same steps with clone.c:clonemod_part1() switching the clone to use a local baseline. And then clone.c:clonemod_part2() does the rollback and pull steps to get the final version.

Unlike the shell version the C version handles all the other options to clone. -q -v -B<BAM url> -rREV --sccsdirs -sALIAS etc…

Nested Behavior

Nested is a bit tricky to get right. The BASE SOURCE and DEST repos can all have a different set of components populated. So the approach is to build the correct final product and then use nested_populate() to get the right components.

So the local BASE→DEST clone ignore -sALIAS and clone_default and just uses -sPRODUCT. Then we undo to the GCA of the product. Now the final HERE file is written before calling bk pull SOURCE.

The way pull_ensemble() works it will use nested_populate() to fetch any components in the final HERE alias that are not already populated. It doesn’t care or notice if these were missing because the remote side edited aliases or if the component was just missing.

The urllist will contain the urllist from BASE, plus all the URLs in the SOURCE urllist. So it will find the components if the clone without the baseline would have worked. The urllist will sort URLs from closest to furthest so all components that exist locally will be fetched from the local sources.

The downfall is if a large component as one cset that only exists at the remote location we will clone the entire component form that remote location. This is unfortunate, but we will fix that in the future. See blow.

Like the shell version, proper cloning of only the remote component tags isn’t done, as the clone -r used by populate brings as much of the component graph as it can. Note, that’s not a limitation of this clonemod, but a limitation of nested populate, so this shortcoming effects our current clone and pulls which clone components.

Future directions

That clone/undo/pull sequence is slow in general and so clonemod currently only makes sense when access to the BASE repo is much faster than the remote SOURCE repo. But we can learn from Rick’s fastpatch and partition code and just start with pull and make the clone/undo happen as a side-effect of calling takepatch.

Takepatch is already weaving the remote deltas into the local weave and so we have already calculated which deltas are local. Then fastpatch could very easily just drop local deltas while write the new sfile. And some fixup code could copy over any sfiles not modified by the pull. This way the clone with local baseline could be just about as fast as a straight localdisk clone and we can start using clonemod by default.

At this point nested_populate() can be extended to automatically use a local baseline for components that are being fetched from a remote machine. This fixes our nested clonemod issue above, but also all component clones.

In a world where the product is small, this gets us a lot savings. But the next step is to allow the original clone of the product to automatically select and use a local baseline. We just need some why to allow the user to record a baseline or make bk able to discover one.

Another thing that would help is an efficient bidirectional key sync. This means on a pull --identical that we would know which tags to strip, since the non-tag is single tipped, but the tag graph after an undo can be multi-tipped.