How to Think Like a Programmer: Part 2

Part II: Building an Application from Scratch

While many business contracts involve supporting existing applications, you may also need to create entirely new software, particularly if you are writing it for yourself. In some cases an 'upgrade' may consist of starting again with the same requirements, particularly in languages where codebases tend to be small.

Getting a Spec

Before you start, it's important to have some idea of what the client expects the finished software to do. With experience you may be able to spot requested features that will be impossible or very difficult at this stage, and try to negotiate them away. Even without that experience, you should be able to identify right away the features that are the core of the problem and those that look hardest to implement.

At this point you do not need a formal specification document. (The client's bureaucracy may require one, but if so you should hide it under your desk.) The client will not know exactly what he wants until he sees what you are making, and a formal spec will probably miss some essential point that he remembers two weeks later. Writing software to a rigid spec causes a lot of problems in business, because the requirements almost always change once the client sees something working.

What you do want is a discussion with the client, preferably including the people who will actually be using the system. Make sure at this point that you understand what the system is supposed to do. If there is an existing system you are replacing or upgrading, get the client to show it to you so you can get a feel for the kinds of things your program will have to do (e.g., if you're building an algorithmic trading program, ask to see their trading screens, some sample trading rules that the traders use, etc.).

Even if you're writing the software for your own use, you should think about what it is you want it to do, and think your way through the problem. It may be helpful to make diagrams as if you were explaining the system to someone else, or talk about it with a business partner.

Split up the Problem

Any serious system is too big to think about as one problem. You need to identify separable parts as sub-problems that you can build and test individually. The correct size of a sub-problem is rather subjective. It should be small enough that you can understand it, but large enough that (a) each part is a coherent whole and a solution to a real problem, and (b) there are few enough parts that you can think about the interactions between them. More than about 10 parts suggests either that the problem is too complex for one person to deal with, or that you have divided it up too finely.

As an analogy, think about building a car. It's not helpful to take on the whole car at once, nor is it helpful to think of it in terms of fibres of rubber and slivers of metal. It is best to think of it in terms of wheels, driveshaft, engine and so on. Determining how 'big' a software component ought to be can be difficult, and the judgement will come with experience. A component should contain many links (function calls) between code within the component, and relatively few to other components. A problem may be best split up into multiple tiers of subcomponent. For example, a wheel would be one component of the car, but manufacturing a wheel means thinking about the axle, brake discs, hub and so on.
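The rule of thumb that a component should have many calls inside it and few to other components can be checked mechanically once you know which functions call which. Below is a minimal sketch. It assumes you already have a list of (caller, callee) pairs and a mapping from each function to its component. The function and component names are hypothetical and do not come from any real tool.

```python
from collections import Counter


def link_counts(calls, component_of):
    """Count call links that stay inside a component and links that cross components.

    calls        -- iterable of (caller, callee) function-name pairs
    component_of -- dict mapping each function name to its (hypothetical) component
    """
    internal = Counter()  # links whose caller and callee share a component
    external = Counter()  # links between a component and some other component
    for caller, callee in calls:
        src, dst = component_of[caller], component_of[callee]
        if src == dst:
            internal[src] += 1
        else:
            # A cross-component link counts against both ends.
            external[src] += 1
            external[dst] += 1
    return internal, external


if __name__ == "__main__":
    # Hypothetical call graph for a small trading application.
    calls = [
        ("read_feed", "parse_quote"),
        ("parse_quote", "validate_quote"),
        ("decide_trade", "moving_average"),
        ("moving_average", "mean"),
        ("decide_trade", "read_feed"),
    ]
    component_of = {
        "read_feed": "io", "parse_quote": "io", "validate_quote": "io",
        "moving_average": "stats", "mean": "stats",
        "decide_trade": "logic",
    }
    internal, external = link_counts(calls, component_of)
    for name in sorted(set(component_of.values())):
        print(f"{name}: {internal[name]} internal, {external[name]} external")
```

A component whose external links outnumber its internal ones hints that its boundary may be in the wrong place, but it does not prove it. A thin coordinating module, like the hypothetical `logic` here, will naturally look that way.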

Start designing from the fixed points: the things the program must do. Typically the user interface will be one of these, so you'll want a front end module, although for server applications this may not be the case. (Web sites are something of an in-between case. You are writing a user interface, the web site, but you are also writing code that will be executed on a server and not run directly by the user.) If you are accepting data from, or posting data to, an external source (like an exchange), you will need a data I/O module, although if you are writing for a web server much of this is done for you. If there are certain common operations which you know for sure will be needed, perhaps statistics or file handling, you may want a module dedicated to such utilities. Such a module is commonly called a library, and libraries for commonly needed operations are available for most languages. Finally, you will want one or more application logic modules, where the data is processed.
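To illustrate that split, here is a minimal single-file sketch in Python with one function standing in for each kind of module. Everything in it is hypothetical: the one-price-per-line input format, the three-price window, and the rule "buy if the last price is above the average" are assumptions made for the example, not a real trading rule. In a real project each section would be its own module.

```python
import io
import statistics


# --- Library module: general-purpose utilities, reusable elsewhere ---
def moving_average(prices, window):
    """Mean of the last `window` prices."""
    return statistics.mean(prices[-window:])


# --- Data I/O module: the only code that knows the input format ---
def read_prices(source):
    """Read one price per line from a file-like object, skipping blank lines."""
    return [float(line) for line in source if line.strip()]


# --- Application logic module: processes data, knows nothing about I/O or UI ---
def decide(prices, window=3):
    """Hypothetical rule: BUY if the latest price is above its moving average."""
    if len(prices) < window:
        return "HOLD"  # not enough data to decide
    return "BUY" if prices[-1] > moving_average(prices, window) else "HOLD"


# --- Front end module: presentation only ---
def show(decision):
    return f"Signal: {decision}"


if __name__ == "__main__":
    feed = io.StringIO("10\n11\n\n12\n")  # stands in for a file or exchange feed
    print(show(decide(read_prices(feed))))  # prints "Signal: BUY"
```

Because `decide` only receives a list of numbers, you can replace the feed or the front end without touching the application logic.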

Bear in mind that sometimes you may be asked to write only one component (a maths library, a link between applications, a new set of trading rules). In that case you won't have to worry about dividing up the whole system, but you will need to be able to recognise when this is the case.

A component in Java, C# or another modern object-oriented language will usually translate to a tightly coupled group of classes: a package (Java) or a namespace (C#). In C# you may sometimes want a component to live in a separate assembly, particularly if it is a library or other reusable code. In languages such as JavaScript, C or many minority languages, a component will usually translate to a single file: a script in JavaScript, or a source file with its header in C.

Order of Coding

Only when you have some idea of what the software should do should you start coding. Using the component-based plan you've drawn up, begin with the components that have the most fixed requirements. (Remember that at this point the client doesn't necessarily know exactly what he wants, so most of the requirements will be subject to change.) The ones to start with are:

Function libraries based on printed material, e.g. a collection of financial formulae. These functions have a fixed formal definition which will not change.
I/O, particularly if the data source is a publicly documented exchange. In this case the protocol and data format are very unlikely to change, although which data you are trying to capture may.
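A function library of this kind is also the easiest component to test, because you can check each function against the printed formula. Below is a minimal sketch using two standard textbook formulae, chosen only as examples. The first is compound growth, FV = PV × (1 + r)^n. The second is net present value, NPV = Σ CF_t / (1 + r)^t, taking the first cash flow at t = 0. The function names are illustrative.

```python
import math


def future_value(present_value, rate, periods):
    """Compound growth: FV = PV * (1 + r) ** n."""
    return present_value * (1 + rate) ** periods


def net_present_value(rate, cash_flows):
    """NPV = sum of CF_t / (1 + r) ** t, with the first cash flow at t = 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


if __name__ == "__main__":
    # Expected values worked out by hand from the formulae above.
    assert math.isclose(future_value(100, 0.10, 2), 121.0)
    assert math.isclose(net_present_value(0.10, [-100, 110]), 0.0, abs_tol=1e-9)
    print("formula checks passed")
```

Note that spreadsheet NPV functions often start discounting at t = 1 instead. Pinning down conventions like this against the printed source is part of what makes such a library "fixed".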

You may also want to create a UI shell at this point. It won't yet have the functionality it will eventually need, but it lets you discover things that were forgotten when the client asks "Where's the button to do X?". Whether this is worthwhile depends on the project (sometimes there will be no UI in the final design), the client's way of working and your own preferences.

When you have a component provisionally completed, test it. This is the single most important point in the whole process! If each (relatively simple) component is tested and certified to individually work correctly, there is a far better chance that the whole application will work correctly. Work out what data will be passed in and out of the component, and create simple data-driven tests to find out if it is working correctly. This can be as simple as:

some_data: some test data
expected_answer: what the data should turn into, calculated by hand or through a different program
test_succeeded: Component.DoSomething(some_data) == expected_answer
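The pseudocode above translates directly into a small table-driven test. In this sketch `percent_change` is a hypothetical stand-in for `Component.DoSomething`, and the expected answers were worked out by hand. Floating-point results are compared with a tolerance rather than exact equality.

```python
import math


def percent_change(old, new):
    """Hypothetical component under test: percentage change from old to new."""
    return (new - old) / old * 100


# Each case is (some_data, expected_answer); answers were calculated by hand.
CASES = [
    ((100.0, 110.0), 10.0),
    ((50.0, 25.0), -50.0),
    ((80.0, 80.0), 0.0),
]


def run_tests():
    """Return a list of failing cases; an empty list means every test succeeded."""
    failures = []
    for some_data, expected_answer in CASES:
        answer = percent_change(*some_data)
        # Compare floats with a tolerance rather than exact equality.
        if not math.isclose(answer, expected_answer, abs_tol=1e-9):
            failures.append((some_data, expected_answer, answer))
    return failures


if __name__ == "__main__":
    failures = run_tests()
    print("all tests passed" if not failures else failures)
```

To add a new test, add another row to `CASES`; the checking code stays the same.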

It's better if the test data and expected answers can be generated algorithmically, so you're not always testing the same numbers. This is possible, for example, if there is a public website that will run the same algorithm you are trying to test, or if you are optimising a well-known algorithm whose straightforward version can serve as a reference. However, for many applications there is no 'gold standard' to test against, so you should use some representative but constant data and answers. At least that will tell you if you have a major problem.
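When a simple, obviously correct version of an algorithm exists, it can act as the reference for an optimised one, and random inputs stop you from always testing the same numbers. Below is a minimal sketch. It assumes the component under test is a hypothetical running-total implementation of a moving sum. Integers are used so the two results can be compared exactly, and a fixed seed makes any failure reproducible.

```python
import random


def moving_sums_reference(values, window):
    """Obviously correct reference: sum each window from scratch (O(n * window))."""
    return [sum(values[i - window + 1:i + 1]) for i in range(window - 1, len(values))]


def moving_sums_fast(values, window):
    """Optimised version under test: keep a running total (O(n))."""
    if window > len(values):
        return []
    total = sum(values[:window])
    result = [total]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # add newest value, drop oldest
        result.append(total)
    return result


def find_mismatch(trials=200, seed=1):
    """Compare both versions on random inputs; return the first failing case, or None."""
    rng = random.Random(seed)  # fixed seed, so any failure can be reproduced
    for _ in range(trials):
        values = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
        window = rng.randint(1, 10)
        expected = moving_sums_reference(values, window)
        actual = moving_sums_fast(values, window)
        if actual != expected:
            return values, window, expected, actual
    return None


if __name__ == "__main__":
    print(find_mismatch() or "no mismatches found")
```

Returning the failing input, rather than just a pass/fail flag, gives you a concrete case to debug by hand.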

Once you have built (or found) and tested the components whose requirements are fixed, it is time to start on the application logic. Ideally you will want to do this with the client observing, so you can find out immediately where the original design wasn't quite right. At the very least, you'll want to have your work checked frequently.

Coding Application Logic

For how to actually convert ideas into code, refer to the 'How to code' resources for the language of your project. However, there are a few useful hints I can give you at this point:

Code defensively. Defensive coding means writing code so that bad input, or an unexpected message, won't crash your application or (possibly worse) produce inconsistent or damaging output (like a financial system making a huge and unwanted trade). In some cases speed requirements will mean that some of these checks are removed later, but you should always assume that anything outside your component could be broken. This is particularly important when communicating with external systems: a database, an exchange, the Internet or (particularly) the user.
Avoid code duplication. Duplicated code is a real problem when the requirements change, because it's very easy to end up with two functions that you think do the same thing but actually no longer do. Move common code into a function that is called from several places, instead of copying the code into each of them. In the case of classes in Java or C#, if you have two classes that do similar things, use inheritance to put the common code in a base class. This can sometimes take a little time, but it will save you (or whoever inherits the system) far more time in the future.
Think about what your code should do, not how it does it, and comment it accordingly. In most programming languages, the structure and limitations of the language mean that the purpose of code is not instantly recognisable except to a very experienced reader. For example, performing a task on every item of a list requires a loop, which is purely a matter of the language's grammar and has nothing to do with the problem at hand. Comments help you, and others, to understand the purpose of the code.
Separate UI and data processing. The classes that are visual components should not store application data; they should be a view on data that is stored elsewhere. They can keep a copy of the data, and often will, to make it easier to display, but the primary data store should be elsewhere. (In the case of a web application, the data may well be in a database, i.e. outside the application entirely.) If the UI can change data as well as view it, it should make those changes by asking the data store to make them. This point is very important if the data can, now or in the future, be viewed in different ways (e.g. through your application, through another visual application, via an HTTP interface, or at the console).
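The four hints above can be seen together in one small sketch of a hypothetical order-entry application. `OrderStore` is the primary data store and checks its input defensively; the 1000-unit limit is an arbitrary assumption. `PositionView` is a base class holding the code every view shares. The two concrete views keep only a display copy and ask the store to make changes. Comments describe what each part is for rather than how it works.

```python
class OrderStore:
    """Primary data store: the only place order data is held and changed."""

    MAX_QUANTITY = 1000  # assumed sanity limit for a single order

    def __init__(self):
        self._positions = {}  # symbol -> total quantity ordered
        self._views = []

    def subscribe(self, view):
        self._views.append(view)
        view.refresh(dict(self._positions))

    def place_order(self, symbol, quantity):
        """Record an order after checking the input; raise ValueError if it is bad."""
        # Defensive checks: never trust data coming from outside the component.
        if not isinstance(symbol, str) or not symbol.strip():
            raise ValueError("symbol must be a non-empty string")
        if isinstance(quantity, bool) or not isinstance(quantity, int):
            raise ValueError("quantity must be an integer")
        if not 0 < quantity <= self.MAX_QUANTITY:
            raise ValueError(f"quantity must be between 1 and {self.MAX_QUANTITY}")
        self._positions[symbol] = self._positions.get(symbol, 0) + quantity
        # Tell every view that the data changed; each gets its own copy.
        for view in self._views:
            view.refresh(dict(self._positions))


class PositionView:
    """Code shared by every view lives here once, instead of being copied."""

    def __init__(self, store):
        self._store = store
        self.snapshot = {}  # display copy only; the store holds the real data
        store.subscribe(self)

    def refresh(self, positions):
        self.snapshot = positions

    def request_order(self, symbol, quantity):
        """Ask the store to make a change, reporting bad input instead of crashing."""
        try:
            self._store.place_order(symbol, quantity)
            return "ok"
        except ValueError as err:
            return f"rejected: {err}"

    def render(self):
        raise NotImplementedError


class ConsoleView(PositionView):
    def render(self):
        return "\n".join(f"{sym}: {qty}" for sym, qty in sorted(self.snapshot.items()))


class CsvView(PositionView):
    def render(self):
        return "\n".join(f"{sym},{qty}" for sym, qty in sorted(self.snapshot.items()))


if __name__ == "__main__":
    store = OrderStore()
    console, csv_view = ConsoleView(store), CsvView(store)
    print(console.request_order("ABC", 10))     # ok
    print(console.request_order("ABC", 10**6))  # rejected: quantity must be between 1 and 1000
    print(csv_view.render())                    # ABC,10
```

Because the views only hold a copy and route changes through the store, you could add another way of viewing the data, such as an HTTP interface, without changing `OrderStore`.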