Estimating Software Effort
By Sydney du Plooy
Estimating the effort it takes to produce a software product is a difficult process, for reasons ranging from management politics to subjective guesses about how long programming tasks will take.
How then do you estimate the effort of a project with such uncertainties?
At the start of the project we should be able to get some idea of what the end product will look like. From that, we can start estimating how long the bits and pieces will take to develop. That sounds easy, but it is not as simple as it seems.
Over- and Under-estimating Effort
Let’s look at Parkinson’s Law, which says that “Work expands so as to fill the time available for its completion.” If we over-estimate and the task ends up being easy, we will simply waste the extra time and work less hard.
On the other hand, under-estimating effort will result in an unreliable and poor quality system. This is a manifestation of Weinberg’s Zeroth Law of Unreliability which says that “If a system doesn’t have to be reliable, it can meet any other objective.” Many people will make this sacrifice simply to complete the product before the deadline.
When a project starts falling behind, project managers will typically add more people to the project. Brooks’s Law explains that “Putting more people on a late job makes it later.” Why is that?
Let’s look at an example. In a team of three members we have 3 “communication channels”. Add two more people and we have a team of five. This means that we now have 10 “communication channels” between people. We calculate it using [n * (n – 1)] / 2 where n is the number of people. Frederick P. Brooks covers this in his book The Mythical Man-Month.
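The channel count from the example above can be sketched in a couple of lines:

```python
def channels(n):
    """Number of pairwise communication channels in a team of n people: n(n - 1) / 2."""
    return n * (n - 1) // 2

print(channels(3))  # 3
print(channels(5))  # 10
```

Adding two people to a three-person team more than triples the channels, which is the heart of Brooks’s argument.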
In his book Software Engineering Economics, Barry Boehm suggests a few ways in which effort estimates can be derived.
Bottom-up breaks the project into its components, those components into their components, and so on. Take each component, estimate its lines of code, and multiply by some factor that adds fat for complexity. Based on that number, calculate the number of days it will take using a ratio between lines of code and effort. [It is called bottom-up because the effort accumulates upward.]
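As a minimal sketch of the bottom-up arithmetic — the component sizes, complexity factor and lines-per-day ratio below are made-up illustrative numbers, not calibrated values:

```python
# Hypothetical components with estimated lines of code each
components = {"login": 400, "reports": 1200, "billing": 800}

COMPLEXITY_FACTOR = 1.3   # assumed "fat" added for complexity
LOC_PER_DAY = 50          # assumed ratio between lines of code and effort

def bottom_up_days(components):
    """Sum component LOC, pad for complexity, convert to days of effort."""
    total_loc = sum(components.values()) * COMPLEXITY_FACTOR
    return total_loc / LOC_PER_DAY

print(round(bottom_up_days(components), 1))  # 62.4
```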
Top-down is based on two parameters, size and productivity, and effort is calculated as effort = size * productivity, where size is an estimate of the number of lines of code and productivity is the time the developer spends per line of code. The productivity parameter is scaled according to the developer’s experience. There is a more advanced calculation that uses a least squares regression model: effort = constant1 + (size * constant2).
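Both top-down formulas can be written out directly. The sizes, productivity rate and regression constants below are hypothetical numbers chosen for illustration:

```python
def effort_simple(size_loc, productivity):
    """effort = size * productivity, with productivity in days per line of code."""
    return size_loc * productivity

def effort_regression(size_loc, constant1, constant2):
    """Least squares regression model: effort = constant1 + size * constant2."""
    return constant1 + size_loc * constant2

print(effort_simple(10_000, 0.02))           # 200.0 days
print(effort_regression(10_000, 40, 0.015))  # 190.0 days
```

The regression constants would in practice be fitted from data on past projects rather than picked by hand.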
Expert judgement relies on the knowledge and experience of someone who is already involved in the project. This estimation doesn’t rely only on that person’s opinion: it also takes into account similar projects and can be supplemented by the bottom-up approach.
Case-based reasoning finds the differences and similarities between completed projects (the source cases) and the new project (the target case). Take the similarities and differences and adjust the source cases so that you get an estimate for the target case. There’s a fancy way of doing this by using the Euclidean distance between the cases. This technique is also called estimation by analogy.
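The “fancy way” amounts to treating each project as a vector of attributes and picking the nearest neighbour. The attribute choice and numbers below are hypothetical:

```python
import math

# Hypothetical project attributes: (number of inputs, outputs, entity types)
target = (30, 12, 8)
source_cases = {
    "project_a": (28, 10, 9),
    "project_b": (60, 40, 20),
}

def distance(a, b):
    """Euclidean distance between two projects' attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# The closest source case is the best analogy for the target case
closest = min(source_cases, key=lambda k: distance(source_cases[k], target))
print(closest)  # project_a
```

The actual effort recorded for the closest source case would then be adjusted for the remaining differences to give the estimate.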
Function point analysis assigns a complexity weighting to each instance of each of the following component types; the weighted counts are then summed to get the function point processing size:
- External input types – inputs that change the internal data;
- External output types – outputs from the system, such as reports;
- External inquiry types – inputs that point the system to information without modifying it;
- Logical internal file types – the information system’s data store;
- External interface file types – input and output exchanged by the information system.
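The weighted sum over the five component types can be sketched as below. The weights are Albrecht’s published average-complexity weights; the counts are made-up numbers for a small system:

```python
# Average-complexity weights from Albrecht's function point analysis
WEIGHTS = {
    "external_input": 4,
    "external_output": 5,
    "external_inquiry": 4,
    "logical_internal_file": 10,
    "external_interface_file": 7,
}

# Hypothetical counts for a small information system
counts = {
    "external_input": 6,
    "external_output": 4,
    "external_inquiry": 3,
    "logical_internal_file": 2,
    "external_interface_file": 1,
}

size = sum(WEIGHTS[k] * counts[k] for k in WEIGHTS)
print(size)  # 83 unadjusted function points
```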
Function points Mark II is an improvement on Allan Albrecht’s original function point analysis technique. It uses three weightings, one for inputs (Wi=0.58), one for entity types (We=1.66) and one for outputs (Wo=0.26). Multiply each weighting by the number of elements of the corresponding kind and sum the results to get the size. [The values for Wi, We and Wo have been set based on industry averages.]
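The Mark II calculation is a single weighted sum; the element counts in the example are hypothetical:

```python
# Industry-average weightings from function points Mark II
WI, WE, WO = 0.58, 1.66, 0.26

def mk2_function_points(n_inputs, n_entities, n_outputs):
    """Mark II size = Wi * inputs + We * entity types + Wo * outputs."""
    return WI * n_inputs + WE * n_entities + WO * n_outputs

print(round(mk2_function_points(10, 3, 5), 2))  # 12.08
```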
COSMIC full function points is used for sizing real-time and embedded systems. Typically, these systems are made up of component layers which may communicate with each other. Assign a value to each data group and sum the counts to calculate the functional size units.
There are four data groups which are the inputs and outputs of these components:
- Entries – moves the data group into the component;
- Exits – moves the data group from the component;
- Reads – moves data from storage into the component;
- Writes – moves data to storage from the component.
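Since each data movement counts as one functional size unit, the COSMIC sizing itself is just a sum over the four movement types. The counts below are hypothetical:

```python
# Hypothetical data-movement counts for one component layer
movements = {"entries": 4, "exits": 3, "reads": 5, "writes": 2}

# Each data movement contributes one COSMIC functional size unit
cfp = sum(movements.values())
print(cfp)  # 14 functional size units
```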
COCOMO II is a constructive cost model in which effort is calculated in person-months (of 152 hours each) and size is measured in lines of code. Effort is calculated using the formula effort = c(size) ^ k, where the constants c and k depend on the nature of the product and the development environment:
- Organic mode (small system developed in-house) [c=2.4, k=1.05];
- Semi-detached mode (hybrid of organic and embedded) [c=3.0, k=1.12];
- Embedded mode (tight constraints, expensive to change) [c=3.6, k=1.20].
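The three modes can be wrapped in one small function using Boehm’s published constants; the 32 KDSI example is illustrative:

```python
# Boehm's constants (c, k) for the three development modes
MODES = {
    "organic": (2.4, 1.05),
    "semi_detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}

def cocomo_effort(kdsi, mode):
    """Effort in person-months: effort = c * size ** k, size in thousands of lines."""
    c, k = MODES[mode]
    return c * kdsi ** k

print(round(cocomo_effort(32, "organic"), 1))  # roughly 91 person-months
```

Note how the exponent k > 1 makes effort grow faster than size: doubling the lines of code more than doubles the person-months.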
The way in which we calculate the effort is dependent on where we are in the development process. During application composition (user interface design) we use the number of physical features of the product such as screens, reports and so on. This is known as object points.
At the early design stage (architecture) we use function points to estimate the size of the system. There is a neat little trick here to convert the function points to the equivalent number of lines of code. To do that, we multiply the function points by a factor for the programming language used.
After we have gathered all the data, we can calculate the effort in person-months using the formula pm = 2.94(size) ^ (sf) * (e1) *…* (en)†, where size is the number of lines of code in thousands (kdsi) and sf is the scale factor, calculated as sf = 0.91 + 0.01 * Σ(exponent driver ratings)†. Exponent driver ratings are there so that we can compensate for the loss of productivity on large projects.
Determine the scale factor (sf) by assigning points from the table below to each of the following exponent drivers:
- Precedentedness – how novel is the system? The more novel, the more uncertainty, the higher the exponent;
- Development flexibility – how easy is it to meet the requirements? If it’s tough assign a higher exponent value;
- Architecture/risk resolution – how likely are the requirements to change? Very likely, up the exponent;
- Team cohesion – are your team members friends? If not, up the exponent;
- Process maturity – do you know what you are doing? If you do, go low on the exponent.
Here’s a table to help you out on the exponent driver values:
†Pssst… I changed the formula a little. There are variables which have been set for many years and so the formulas should really be pm = A(size) ^ (sf) * (e1)*…*(en) and sf = B + 0.01 * Σ(exponent driver ratings).
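Putting the two formulas together as a sketch — the five exponent driver ratings and the project size below are made-up illustrative numbers, not values from the published rating table:

```python
def scale_factor(driver_ratings, b=0.91):
    """sf = B + 0.01 * sum(exponent driver ratings); B = 0.91 in COCOMO II."""
    return b + 0.01 * sum(driver_ratings)

def person_months(kloc, driver_ratings, effort_multipliers=(), a=2.94):
    """pm = A * size ** sf * e1 * ... * en; A = 2.94 in COCOMO II."""
    pm = a * kloc ** scale_factor(driver_ratings)
    for e in effort_multipliers:
        pm *= e
    return pm

# Hypothetical ratings for the five exponent drivers, in the order listed above
ratings = [4, 2, 3, 3, 2]
print(round(scale_factor(ratings), 2))   # 1.05
print(round(person_months(50, ratings)))  # effort for a 50 KLOC project
```

The effort multipliers e1…en (the cost drivers) default to an empty tuple here; each one scales the estimate up or down without touching the exponent.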
- Hughes, B. & Cotterell, M. 2009. Software Project Management, 5e. Berkshire: McGraw-Hill Education.
© 2011 Sydney du Plooy
Sydney du Plooy is currently working as a full time analyst developer for a financial institution, while studying part time. You can read more from Sydney on his blog, the third shelf.