Source Control HOWTO: Repositories


text. As an ancillary benefit, the VCDiff algorithm compresses the data at the same time.

Binary deltas are a critical feature for some SCM tool users, especially in situations where the binary files are large. Consider the case where a user checks out a 10 MB file, changes a few bytes, and checks it back in. In CVS, the size of the repository will increase by 10 MB. In Subversion and Vault, the repository will only grow by a small amount.
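To make the storage argument concrete, here is a minimal sketch of a byte-level delta in Python. It is not VCDiff (RFC 3284), and it is not what Subversion or Vault actually store. It is a toy that assumes the edit is a single contiguous change. It keeps the common prefix and suffix of the old version as lengths and stores only the bytes in between. The names `make_delta`, `apply_delta`, and `Delta` are hypothetical.

```python
# A toy byte-level delta, NOT VCDiff. Assumes one contiguous edited region:
# the delta keeps the shared prefix and suffix as lengths and stores only
# the bytes in between. All names here are hypothetical.
from typing import NamedTuple


class Delta(NamedTuple):
    prefix_len: int   # bytes reused from the start of the old version
    inserted: bytes   # new bytes that replace the changed middle region
    suffix_len: int   # bytes reused from the end of the old version


def make_delta(old: bytes, new: bytes) -> Delta:
    """Describe `new` relative to `old` as copy-prefix, insert, copy-suffix."""
    limit = min(len(old), len(new))
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    s = 0
    # Stop before the suffix could overlap the prefix in either version.
    while s < limit - p and old[len(old) - 1 - s] == new[len(new) - 1 - s]:
        s += 1
    return Delta(p, new[p:len(new) - s], s)


def apply_delta(old: bytes, delta: Delta) -> bytes:
    """Rebuild the new version from the old version plus the delta."""
    return (old[:delta.prefix_len]
            + delta.inserted
            + old[len(old) - delta.suffix_len:])


# Usage: a 1 MB "binary file" with three bytes changed in the middle.
old = bytes(i % 251 for i in range(1_000_000))
new = old[:500_000] + b"\x00\x01\x02" + old[500_003:]
delta = make_delta(old, new)
assert apply_delta(old, delta) == new
print(len(old), len(delta.inserted))  # 1000000 3
```

A real delta algorithm copes with many scattered edits and moved blocks, but the point carries over: the stored delta grows with the size of the change, not with the size of the file.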

Deltas and Diffs are Different
Please note that I make a distinction between the terms "delta" and "diff":

  • A "delta" is the difference between two versions. If we have one full file and a delta, then we can construct the other full file. A delta is used primarily because it is smaller than the full file, not because it is useful for a human being to read. The purpose of a delta is efficiency. When deltas are done at the level of bytes instead of textual lines, that efficiency becomes available to all kinds of files, not just text files.
  • A "diff" is the human-readable difference between two versions of a text file. It is usually line-oriented, but really cool visual diff tools can also highlight the specific characters on a line which differ. The purpose of a diff is to show a developer exactly what has changed between two versions of a file. Diffs are really useful for text files, because human beings tend to read text files. Most human beings don't read binary files, and human-readable diffs of binary files are similarly uninteresting.
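For contrast, a diff is produced for a person to read. Python's standard `difflib.unified_diff` function generates the familiar line-oriented format. The helper name `human_diff`, the sample text, and the file name `notes.txt` below are made up for illustration.

```python
import difflib


def human_diff(old_text: str, new_text: str, name: str) -> str:
    """Return a line-oriented unified diff intended for a person to read."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )


old_text = "alpha\nbeta\ngamma\n"
new_text = "alpha\nBETA\ngamma\n"
print(human_diff(old_text, new_text, "notes.txt"), end="")
# --- a/notes.txt
# +++ b/notes.txt
# @@ -1,3 +1,3 @@
#  alpha
# -beta
# +BETA
#  gamma
```

The output includes unchanged context lines and markers that exist only to help a human reader, which is exactly the overhead a storage delta avoids.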

As mentioned above, some SCM tools use binary deltas for repository storage or to improve performance over slow network lines. However, those tools also support textual diffs. Deltas and diffs serve two distinct purposes, both of which are important. It is merely coincidence that some SCM tools use textual diffs as their repository deltas.
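The coincidence is easy to demonstrate. A line-oriented comparison can double as a delta if you keep only what is needed to rebuild the new version. The sketch below uses `difflib.SequenceMatcher.get_opcodes` to record "copy these old lines" and "insert these new lines" instructions. The instruction format and the function names `line_delta` and `apply_line_delta` are hypothetical. This is in the spirit of tools that store textual diffs, not the actual RCS file format.

```python
import difflib
from typing import List

# Hypothetical instruction format:
#   ("copy", i1, i2)  -> reuse lines old[i1:i2]
#   ("insert", lines) -> emit these new lines verbatim
# Deleted old lines are simply never copied.


def line_delta(old: List[str], new: List[str]) -> List[tuple]:
    """Build a line-oriented delta that turns `old` into `new`."""
    ops: List[tuple] = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # tag == "delete": nothing to emit
    return ops


def apply_line_delta(old: List[str], ops: List[tuple]) -> List[str]:
    """Rebuild the new version from the old lines plus the delta."""
    out: List[str] = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


old = ["#include <stdio.h>\n", "int main(void) {\n", "    return 0;\n", "}\n"]
new = ["#include <stdio.h>\n", "int main(void) {\n", '    puts("hi");\n',
       "    return 0;\n", "}\n"]
ops = line_delta(old, new)
print(ops)  # [('copy', 0, 2), ('insert', ['    puts("hi");\n']), ('copy', 2, 4)]
assert apply_line_delta(old, ops) == new
```

Because it works on whole lines, this kind of delta is efficient for source code but does poorly on binary files, where a tiny change can touch an arbitrarily long "line".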

The Evolution of Source Control Technology
At this point I should admit that I have presented a somewhat idealized view of the world. Not all SCM tools work the way I have described. In fact, I have presented things exactly backwards, discussing tree-wide deltas before file deltas. That is not the way the history of the world unfolded.

Prehistoric ancestors of modern programmers had to live with extremely primitive tools. Early version control systems like RCS only handled file deltas. There was no way for the system to remember folder-level operations such as adding, renaming, or deleting files.

Over time, the design of SCM tools matured. CVS is probably the most popular source control tool in the world today. It was originally developed as a set of wrappers around RCS which essentially provided support for some folder-level operations. Although CVS still has some important limitations, it was a big step forward.

Today, several modern source control systems are designed around the notion of tree-wide deltas. By accurately remembering every possible operation which can happen to a repository, these tools provide a truly complete history of a project.
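The difference between file deltas and tree-wide deltas shows up most clearly with a rename. The following sketch assumes a toy repository in which a changeset is a list of operations applied together. The `Repo` class and its operation names are hypothetical. Because a rename is recorded as its own operation, the file keeps its identity and its history follows it to the new name. A tool that tracks only per-file deltas would see a delete plus an add, and the history under the new name would start from scratch. For brevity, the sketch stores full contents rather than deltas, and it does not roll back a partially applied changeset, which a real tool would have to do.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical toy repository. A changeset is a list of operations:
#   ("add", path, data)   ("modify", path, data)
#   ("rename", old_path, new_path)   ("delete", path)


@dataclass
class Repo:
    tree: Dict[str, int] = field(default_factory=dict)             # path -> file id
    history: Dict[int, List[bytes]] = field(default_factory=dict)  # file id -> versions
    next_id: int = 1

    def commit(self, changeset: List[Tuple]) -> None:
        """Apply every operation in one changeset, in order (no rollback)."""
        for op in changeset:
            kind = op[0]
            if kind == "add":
                _, path, data = op
                self.tree[path] = self.next_id
                self.history[self.next_id] = [data]
                self.next_id += 1
            elif kind == "modify":
                _, path, data = op
                self.history[self.tree[path]].append(data)
            elif kind == "rename":
                _, old_path, new_path = op
                # Same file id under a new path, so the history moves with it.
                self.tree[new_path] = self.tree.pop(old_path)
            elif kind == "delete":
                _, path = op
                # Gone from the current tree, but its history stays stored.
                del self.tree[path]
            else:
                raise ValueError(f"unknown operation: {kind!r}")

    def log(self, path: str) -> List[bytes]:
        """Every stored version of the file currently at `path`."""
        return self.history[self.tree[path]]


repo = Repo()
repo.commit([("add", "src/main.c", b"v1")])
repo.commit([("modify", "src/main.c", b"v2"),
             ("rename", "src/main.c", "src/app.c")])
print(repo.log("src/app.c"))  # [b'v1', b'v2']
```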

What can be Stored in a Repository?
People sometimes ask us what kinds of things can be stored in a repository. In general, the answer is: "Any file." It is true that I am focusing on tools designed for software developers and web developers. However, those tools don't really care what kind of file you store inside them. Vault doesn't care. Perforce, Subversion and CVS don't care. Any of these tools will gratefully accept any file you want to store.

Best Practice: Check In All the Canonical Stuff, and Nothing Else
Although you

AgileConnection is a TechWell community.