Approaching Code Access for Distributed Development


There was a time when mainframe development was the norm and teams were in close physical contact, needing only to walk a few feet down the hall to interact with their colleagues. Times have changed, however, and access to code now has to be considered much more seriously. Some companies have had multiple sites participating in their development efforts for upwards of two decades, but most have only been at this for the last five years or are just undertaking the venture. The crux of distributed development is the ability to share code for development across sites, whether via the network or via tape or disk.


In the early 1990s I worked at the Open Software Foundation (OSF). We had development occurring in our Cambridge, Massachusetts, and Grenoble, France, development shops, as well as distributed development with HP, DEC, IBM, Bull, Siemens-Nixdorf, and others. We used a CM system called the OSF Development Environment (lovingly pronounced “ODE”), a client/server TCP/IP SCM system based on RCS that included a build component. ODE originated from the Carnegie Mellon University (CMU) b-tools via development by Glen Marcy, et al., and was extended by Damon Poole (co-founder of AccuRev, an innovative SCM tool released in 1999), who made it truly client/server, added merge tracking, increased its performance, and made it more transactional in nature.

Even in those early days, ODE had a partner distribution tool called the Software Update Protocol (lovingly pronounced “SUP”), developed by Steven Shafer of CMU. At OSF, we used SUP as a code synchronization tool that could replicate and distribute code to and from multiple sites. The SUP utility is still around today, included with NetBSD distributions.

Why Distributed Development
Beginning in the mid-eighties, companies have continually acquired or merged with other companies. This has left many companies with multiple sites housing expert talent. When development projects are initiated, that talent needs to work together in a distributed yet effective manner. In the early days of code sharing, this was often done via tapes and disks. Then the file transfer protocol (FTP) and similar protocols came into existence, allowing bundles of files to be sent or retrieved over the network. The advent of high-bandwidth networks made real-time sharing of files possible without transporting or sending them; personnel simply logged into a system from their remote site. However, while networks offer more bandwidth than before, serious throughput and performance issues continue to exist.

Around the year 2000, companies not only needed to make code accessible to their own talent, but there also began a huge push in the industry to utilize low-cost resources (e.g., developers and testers) from countries like India. This drive to lower costs has led to a rapid increase in distributed development. Based on a 2003 survey of IT executives, the number of companies outsourcing applications to offshore service providers was expected to grow by 50% in 2004, with 35% of all users expected to outsource some piece of IT to offshore resources. This expectation has become a reality, particularly with India.

Distributed development, as many of us know, is upon us. It is important for us to focus on how we can manage the distributed nature of projects. While there are many aspects to managing a distributed development project, this article will guide you through an approach for ensuring you have selected the appropriate distributed code access technology based on your distributed development needs.

Advice on Approaching Distributed Development
When considering your needs for code access in a distributed environment, it is important to perform an analysis of your situation. From that analysis, you can understand which distributed code access technology approaches exist and which may be best suited to you. From there, you can identify a specific technology within the chosen approach.

Distributed Analysis
The first step in approaching distributed development is performing a distributed analysis. This identifies the characteristics of each application or product that is being considered for distributed development. For each application:

·       Identify the number of developers that will participate in development activities from each site

·       Identify when in the lifecycle the additional sites will begin and end their work

·       Identify the complexity of the development technology used. Complexity may be judged by how much RAM the development technology requires to run and how network intensive it is. Complexity ranges from low to high. Low-complexity development technology has low RAM requirements, little network dependency (few network transactions), and ASCII text-based artifacts. High-complexity development technology has high RAM requirements, high network dependency (constant network transactions), and object-based artifacts. (A simple way to record these characteristics is sketched after this list.)
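
As a minimal illustration of capturing this analysis, the Python sketch below records the per-application characteristics and classifies the development technology as low or high complexity. The field names, example sites, and the 4 GB RAM threshold are hypothetical assumptions made for illustration; they are not drawn from this article or from any particular SCM tool.

from dataclasses import dataclass

# Hypothetical record of a distributed analysis for one application.
# Field names and the 4 GB RAM threshold are illustrative assumptions.
@dataclass
class DistributedAnalysis:
    application: str
    developers_per_site: dict    # e.g., {"Boston": 12, "Pune": 5}
    lifecycle_window: tuple      # when the additional sites begin and end their work
    ram_gb_required: float       # RAM the development technology needs to run
    network_transactions: str    # "few" or "constant"
    artifact_type: str           # "ascii-text" or "object-based"

    def complexity(self) -> str:
        """Classify the development technology as low or high complexity."""
        is_high = (
            self.ram_gb_required >= 4.0
            or self.network_transactions == "constant"
            or self.artifact_type == "object-based"
        )
        return "high" if is_high else "low"

analysis = DistributedAnalysis(
    application="billing-ui",
    developers_per_site={"Boston": 12, "Pune": 5},
    lifecycle_window=("construction", "system test"),
    ram_gb_required=2.0,
    network_transactions="few",
    artifact_type="ascii-text",
)
print(analysis.complexity())  # prints "low"

Recording the analysis this way, one record per application, makes the next step (choosing an approach) easier to reason about and to revisit as the project grows.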

Distributed Direction

Once you have completed your analysis, give consideration to the appropriate distributed code access technology approach. The application characteristics from the distributed analysis should drive the decision.

Distributed Code Access Technology Approach
Identify the technology approach by which code may be accessed from the remote sites for development. The approach you select becomes the set of requirements for a technology that will support it. There are two primary categories of distributed code access technology (a sketch that maps the analysis results onto these options follows the list below). They include:


·       Distributed site:  This is when the code physically resides in two or more locations. It is applicable for development technologies of any complexity (low to high) and recommended for medium- to high-complexity technologies. It can be implemented in two ways (though not limited to these):

1.     Remote client snapshot:  This is when the application code is populated (checked out or retrieved) directly from the local server to the client at the remote site. The initial population of the code baseline to the remote client may take time and varies according to the network performance across sites. However, once a snapshot of the full baseline is on the client, single or multiple checkouts and check-ins of code may be relatively quick (barring network performance issues). This method requires low setup effort, has low network dependency (except when interacting with the server at the local site for version control or retrieval operations), and has the client working on its own without continuous reliance on the WAN or LAN. It may be recommended for projects where a small to medium number of people from remote sites are working with the local site.

2.     Remote server repository:  This is when the application code is populated (replicated or retrieved) from the local server to a central server at the remote site, and the remote clients retrieve code from this central remote server. The initial population (or retrieval) of the code baseline to the remote server may take time and varies according to the network performance between sites. Once the full baseline is on the remote server, periodic changes from each site are replicated or retrieved relatively quickly, barring network performance issues. The clients at the remote site use the remote server to version control or retrieve the code baseline with no WAN dependency. This method requires medium setup effort, has low to medium network dependency (mainly while the code repositories synchronize with each other), and has the client working on its own without continuous reliance on the WAN. It may be recommended for projects where a medium to high number of people from remote sites are working with a local site.

·       Single site with distributed site access:  This is when the code resides at only one site and all remote sites use the resources of the local server and/or clients where the code resides. It is applicable for low- to medium-complexity development technologies and recommended for low-complexity development technologies, depending on network performance. It can be implemented in two ways (though not limited to these):

1.     Terminal emulation:  This is when remote personnel remotely log on to the local systems to perform development work. The remote personnel perform version control on the local server or client just as the local personnel do, working directly on the local system where the code resides. This method requires low setup effort and has high network (WAN) dependency. The number of personnel at remote sites who may work in this setup is directly proportional to the network bandwidth and performance to the local site, but is typically low. It is only recommended for low-complexity development technologies.

2.     Terminal services:  This is when remote personnel use a local terminal services server or client to host the development technology and application code, which come directly from the local server. The activity on the local server or client is viewed from the remote client with low network utilization, and the remote personnel perform version control on the local server or client as if it were their own. This technology minimizes network bandwidth challenges and allows personnel to remotely utilize local systems to access and work on the code. This method requires medium setup effort and has medium network (WAN) dependency. It may be applicable for a small to medium number of personnel at remote sites working with the local site and is only recommended for low- to medium-complexity development technologies.
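
To tie the analysis back to the four options above, here is a hypothetical decision helper in Python. The complexity levels and the headcount thresholds (5 and 15 remote developers) are illustrative assumptions meant to show the shape of the decision, not rules from this article or from any SCM tool.

# Hypothetical helper mapping analysis results to a code access option.
def recommend_access_option(complexity: str, remote_developers: int) -> str:
    """Suggest one of the four distributed code access options for an application."""
    if complexity in ("medium", "high"):
        # Medium or high complexity calls for a distributed-site approach.
        return ("remote server repository" if remote_developers > 15
                else "remote client snapshot")
    # Low complexity: single site with distributed site access is acceptable.
    return ("terminal emulation" if remote_developers <= 5
            else "terminal services")

print(recommend_access_option("low", 3))    # prints "terminal emulation"
print(recommend_access_option("high", 25))  # prints "remote server repository"

In practice, the network performance between sites and the setup effort you can afford would also feed into this choice, as noted in the descriptions above.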

Summary

It is important to understand that there are many ways for a distributed development team to share code. An analysis that identifies the project characteristics should feed the decision about which approach to take. On the one hand, selecting too simple a distributed code access technology approach may limit or constrain development due to poor performance. On the other hand, selecting too complex an approach may add administrative effort that can overwhelm a small team. Ensure you select the distributed code access technology approach that is right for your development team.

For more information on how to establish a more thorough global distributed development/SCM strategy, including working templates to help you through the distributed analysis and distributed direction process, consider reading section “5.3 Define a Global SCM/Development Strategy” in Chapter 4 (“Establish an SCM Infrastructure for an Application”) of the book Software Configuration Management Implementation Roadmap. In the same book, consider reading the section titled “7.5 Establish the Global SCM/Development Infrastructure” for guidance on implementing the distributed code access technology approach you selected.

References

1. Mario E. Moreira, Software Configuration Management Implementation Roadmap, John Wiley & Sons, Ltd., 2004. Section “5.3 Define a Global SCM/Development Strategy” in Chapter 4 (“Establish an SCM Infrastructure for an Application”), pp. 90–95.

2. Mario E. Moreira, Software Configuration Management Implementation Roadmap, John Wiley & Sons, Ltd., 2004. Section “7.5 Establish the Global SCM/Development Infrastructure” in Chapter 4 (“Establish an SCM Infrastructure for an Application”), pp. 128–130.

3. Randy Guck, “Managing Distributed Software Development,” StickyMinds (www.StickyMinds.com), November 2002.

4. Lance Travis, “AMR Research Survey Results: 35% of Users To Outsource IT Work to Offshore Companies,” October 2003.

5. Software Update Protocol (SUP)

