COBOL has been in the news quite a lot recently and I have been reading that there are still huge amounts of COBOL code running and being written. This led me to wonder why this language was still being used. I therefore decided to look at a few sites about COBOL and see what they said was good about the language. The main benefits appeared to be that it is portable and self-documenting. Indeed, I often read about how COBOL programmers say that they can go to code written 10-15 years ago and still easily understand what is happening.
In 1994, while at college, I did a year of COBOL. I haven’t touched it since that time, and have barely even thought about it. I therefore thought that this would be a good test. I admit that the premise above, about the ease of understanding code written a long time ago, refers to people with more COBOL experience than I. However, I was curious to see how much I would understand.
The code I have chosen was originally written to run under MS-DOS, but unfortunately I can’t remember which compiler was used. It probably isn’t the best COBOL code ever written, but I hope that it will help me to explore how easy it is to return to old code. At times throughout this article I may well refer to things by the wrong name; please bear in mind that the purpose of the article is to test the premise “Is COBOL really understandable after 14 years?”, not to teach COBOL.
I have put the full code, associated data file, example output and instructions on how to compile and run the program at the bottom of this article.
Analysis of the code
After taking a quick look at the code I remember that COBOL is split into DIVISIONs, SECTIONs and paragraphs. I will go through each DIVISION in turn and try to explain what is happening. I have purposely not looked up anything while doing this to try and test the premise fully.
The IDENTIFICATION DIVISION
This is fairly unremarkable and is just used to identify what the code is and who wrote it.
The ENVIRONMENT DIVISION
This DIVISION is used to specify the environment in which the code will run. The power of this DIVISION is that you can easily change the environment in which the program is running just by making alterations here.
Within the above DIVISION we can see the INPUT-OUTPUT SECTION and within that we can see the FILE-CONTROL paragraph. This tells the program what names to use to refer to the files and what those files are called. I can see from this that we have a file called TAG2.DAT and another called PRN. Under MS-DOS, PRN referred to the printer. However, when this is run under Linux it will just create a file called PRN.
The DATA DIVISION
This DIVISION specifies how data is stored and structured.
The first SECTION here is the FILE SECTION, which is specifying how the files assigned in the ENVIRONMENT DIVISION to TAG2-FILE and REP-FILE are structured.
If we take TAG2-FILE we can see that its biggest structure is TAG2-RCD, which represents a record. This record is split into 7 fields. The PIC clause after the name of each field specifies the format of the field. For TAG2-ORD-NBR this is 9(4), representing 4 numeric characters. For TAG2-ACC-NBR this is A9(7)A, representing an alphabetical character, then 7 numeric characters, then an alphabetical character. Finally, TAG2-DAT has X(6), representing 6 characters of any type.
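For readers more at home in another language, a PIC string can be read as a compact pattern. The following is a minimal Python sketch, not part of the original program: the helper names pic_to_regex and fits are made up for illustration, and only the three symbols mentioned above (9, A and X, each with an optional repeat count) are handled. Any other character in the PIC string is silently ignored.

```python
import re

# Hypothetical helper, not part of the original program. It covers only
# the symbols 9, A and X, each with an optional repeat count like 9(4).
SYMBOL_PATTERNS = {
    "9": "[0-9]",       # numeric character
    "A": "[A-Za-z ]",   # alphabetic character (COBOL also allows a space)
    "X": ".",           # any character
}

def pic_to_regex(pic):
    """Build a compiled regex that matches values fitting the PIC string."""
    parts = []
    for symbol, count in re.findall(r"([9AX])(?:\((\d+)\))?", pic):
        repeat = int(count) if count else 1
        parts.append(f"{SYMBOL_PATTERNS[symbol]}{{{repeat}}}")
    return re.compile("".join(parts), re.DOTALL)

def fits(pic, value):
    """Return True if the whole value matches the PIC string."""
    return pic_to_regex(pic).fullmatch(value) is not None

print(fits("9(4)", "0042"))         # True
print(fits("A9(7)A", "a0000001a"))  # True
print(fits("A9(7)A", "00000001a"))  # False: first character is a digit
```

The account number a0000001a used here is taken from the example report at the end of the article.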
The next section is the WORKING-STORAGE SECTION. This is where the variables are specified. The variables are set out in the same way as the files. If we look at RESULT-CALC we can see that it is made up of 6 fields, each with an initial value of 0. The V used in the last 3 PIC clauses represents an implied decimal point: it marks where the decimal point falls but takes up no space in the stored data.
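The effect of V can be shown with a small Python sketch. This is an illustration only: the function name decode_implied_decimal and the example picture 9(4)V99 are assumptions, since the exact PIC strings of the cost fields are not quoted in this article.

```python
from decimal import Decimal

def decode_implied_decimal(stored_digits, decimal_places):
    """Turn stored digits into a number with an implied decimal point.

    The V in a PIC string occupies no storage; it only marks where the
    decimal point is assumed to be.
    """
    return Decimal(int(stored_digits)).scaleb(-decimal_places)

# Hypothetical field declared as PIC 9(4)V99: six stored digits,
# the last two of which are decimal places.
print(decode_implied_decimal("001050", 2))  # 10.50
```

Decimal is used rather than float so that money amounts keep their exact two decimal places, which is closer to how COBOL treats such fields.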
The variables can be referred to by any of the names specified above, and a name includes each of its subdivisions. So you can refer to RESULT-CALC and it will automatically refer to each of its fields as well, or you can just refer to the fields directly, e.g. WS1-BLU-ADD. There is no need to refer to the fields by way of the record, which does mean that you have to be careful not to reuse names.
The PROCEDURE DIVISION
This is the DIVISION where all the processing gets done. It can be split into SECTIONs so that you can have subroutines. However, this piece of code doesn’t have any SECTIONs within the PROCEDURE DIVISION.
This is opening the files mentioned earlier for INPUT and OUTPUT. Then today’s date is put into WS1-DATE. After that, the fields in WS1-DATE are specified directly, e.g. WS1-YR. These fields are moved into fields from WS1-HDR-AREA, e.g. WS1-YER. Note the Y2K problem here: the date is only 6 bytes long, so the year is held as just two digits.
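The date handling can be sketched in Python as follows. The sketch assumes the 6-byte date arrives in YYMMDD order, which is what a standard COBOL ACCEPT ... FROM DATE returns; whether the original program obtains the date that way is an assumption, and all function names are made up for illustration. The DD/MM/YY output order matches the header of the example report.

```python
def split_yymmdd(date_text):
    """Split a six-character YYMMDD string into (year, month, day) parts."""
    return date_text[0:2], date_text[2:4], date_text[4:6]

def report_date(date_text):
    """Rearrange YYMMDD into the DD/MM/YY form used in the report header."""
    year, month, day = split_yymmdd(date_text)
    return f"{day}/{month}/{year}"

def candidate_years(two_digit_year):
    """List the full years a two-digit year could stand for (1900s or 2000s)."""
    return [1900 + int(two_digit_year), 2000 + int(two_digit_year)]

print(report_date("080826"))   # 26/08/08
print(candidate_years("08"))   # [1908, 2008]
```

The second function shows the Y2K ambiguity directly: nothing in the stored value says which century is meant.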
A01-OP-LINE is a paragraph label that is used to form a loop. The next line reads a record from TAG2-FILE and if the end of file is reached it jumps to the paragraph A90-END.
This puts the colour of the tag record read in from the TAG2-FILE into SUB-COLOUR which is a field of WS1-SUB-HDR.
This prints a header for the page if more than 57 lines have been printed and also prints a heading for the current colour table. WS1-LCNT is initialised with the value of 70 in the WORKING-STORAGE SECTION, so a header is printed on the first page. The code is fairly self-explanatory. The MOVE statements copy the literal or variable being referred to into the variable specified. SPACES represents a full field of space characters. ZERO is the numeric literal 0. WRITE REP-RCD… writes the record specified by FROM to the file REP-FILE, which is associated with REP-RCD. The THEN branch executes everything up to the period; the statements up to a period are called a sentence.
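The line-count trick is easier to see in isolation. Below is a minimal Python sketch of it, using the limit of 57 and the starting value of 70 described above. It assumes the counter is reset to zero after each header and that only detail lines are counted; the original may count differently, and the function name paginate and the "PAGE n" header text are placeholders.

```python
PAGE_LIMIT = 57          # a header is printed once the count exceeds this
INITIAL_LINE_COUNT = 70  # starts above the limit, forcing a first header

def paginate(detail_lines):
    """Return the detail lines with a page header inserted where needed."""
    output = []
    line_count = INITIAL_LINE_COUNT
    page_number = 0
    for line in detail_lines:
        if line_count > PAGE_LIMIT:
            page_number += 1
            output.append(f"PAGE {page_number}")
            line_count = 0  # assumed reset, as with MOVE ZERO
        output.append(line)
        line_count += 1
    return output

print(paginate(["first row"]))  # ['PAGE 1', 'first row']
```

Starting the counter above the limit means the first page needs no special case: the same test that handles a full page also handles the empty one.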
This checks if the record just read from TAG2-FILE is a different colour from the current table colour. If the colour is the same then it skips to the NEXT SENTENCE, i.e. after the period. Otherwise it prints a heading for the table. The AFTER ADVANCING 3 LINES outputs 3 newlines to the file before writing the record.
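This pattern, where a heading is printed whenever a key field changes between consecutive records, is often called a control break. A minimal Python sketch of the idea follows. It assumes the input is already grouped by colour, as the example report suggests, and the function name and the tuple layout of the records are made up for illustration.

```python
def with_colour_headings(records):
    """Insert a heading line each time the colour differs from the last one.

    `records` is a list of (colour, name) tuples, assumed to be grouped
    by colour already.
    """
    output = []
    current_colour = None
    for colour, name in records:
        if colour != current_colour:
            output.append(f"COLOUR - {colour}")
            current_colour = colour
        output.append(name)
    return output

rows = [("RED", "Jj"), ("RED", "D.Brown"), ("BLUE", "R.J")]
print(with_colour_headings(rows))
# ['COLOUR - RED', 'Jj', 'D.Brown', 'COLOUR - BLUE', 'R.J']
```

If the input were not grouped, the same colour heading would simply be printed again each time that colour reappeared.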
This copies some of the fields of the record TAG2-RCD read in from TAG2-FILE to some of the fields of the record REP-RCD. The REP-RCD record is then written to the associated REP-FILE. If you look at REP-RCD in the FILE SECTION you will see that each field is separated by FILLER spaces. This is so that a neat table can be built for output.
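The same fixed-width layout can be expressed with a format string. The Python sketch below is only an approximation of the idea: the column widths and the two-space gaps standing in for the FILLER items are invented, not taken from REP-RCD, and the function name is hypothetical.

```python
def format_detail_line(account, name, tag_type, quantity):
    """Lay out one report row in fixed-width columns.

    Widths are hypothetical; the two-space gaps play the role of the
    FILLER items between fields.
    """
    return f"{account:<9}  {name:<12}  {tag_type:<2}  {quantity:>3}"

line = format_detail_line("a0000001a", "Jj", "SC", 24)
print(line)
print(len(line))  # 32
```

Every row comes out at the same length regardless of how long the name is, which is what makes the columns line up.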
This determines which colour the current record is from TAG2-COLOUR, then adds the quantity of the current record to the relevant variable. For example, when TAG2-COLOUR = ‘BLU’ the quantity TAG2-QTY is added to the variable WS1-BLU-ADD.
This determines the cost to add to each colour’s total. There is a choice of two quantities that can be ordered: either 24 or 36. Here the code should really be rewritten so that the cost for each quantity is stored as a variable for each colour.
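To show what storing the costs as data might look like, here is a minimal Python sketch that keeps running quantity and cost totals per colour. The prices are invented for illustration, since the real ones are not quoted in this article. Of the colour codes, only BLU appears above; RED and BLK are guesses. The function name is hypothetical.

```python
from decimal import Decimal

# Invented prices, for illustration only. Keyed by colour code, then by
# the quantity ordered. Only the code "BLU" is known from the article.
PRICES = {
    "RED": {24: Decimal("2.00"), 36: Decimal("2.75")},
    "BLU": {24: Decimal("2.00"), 36: Decimal("2.75")},
    "BLK": {24: Decimal("2.00"), 36: Decimal("2.75")},
}

def accumulate(records, prices=PRICES):
    """Sum quantity and cost per colour for (colour, quantity) records."""
    totals = {}
    for colour, quantity in records:
        running = totals.setdefault(
            colour, {"quantity": 0, "cost": Decimal("0.00")}
        )
        running["quantity"] += quantity
        running["cost"] += prices[colour][quantity]
    return totals

totals = accumulate([("BLU", 24), ("BLU", 36), ("BLU", 24), ("BLU", 36)])
print(totals["BLU"]["quantity"])  # 120
```

With the prices in a table, changing a price or adding a third quantity becomes a data change rather than a change to the branching logic.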
This adds 1 to the line count (WS1-LCNT), then jumps back to the paragraph label A01-OP-LINE. A90-END is the paragraph label that the code near the start jumps to when the end of the file TAG2-FILE is reached.
The code finishes by copying the running cost and quantity totals to the fields in the records WS1-RESULT1, WS1-RESULT2 and WS1-RESULT3. It then writes a header to REP-FILE using WS-CONTROL-HDR, followed by the quantity and cost for each colour. The files are then closed and the program is stopped.
Summary of the program
The program reads in name tag records from TAG2.DAT and outputs a report to PRN.
Once I had remembered how COBOL is set out, I found this program easy to understand and believe that it would be easy to maintain and expand. I admit that this is only a trivial example, but then I haven’t used COBOL for a long time, so I think it is a fair investigation. I can see from the code why COBOL is so good at processing transactions and in particular batch processing. A similar program written in Java, the new standard in the business world, would be considerably more complex and difficult to understand. That said, I do recognise that while most of the added complexity would be with the code to read in the structured input file, the output report writing would be considerably easier by just using formatted strings.
After writing this article I decided to look at COBOL in more depth. My main conclusion is that COBOL has advanced quite considerably since this code was written. In fact I think that the style of COBOL we were taught in 1994 was already quite dated. I have enjoyed this trip down memory lane, but now want to see if COBOL can offer anything for the present and into the future. The most recent advance in COBOL appears to be an object-oriented standard, which offers some considerable improvements while maintaining backwards compatibility. Unfortunately the uptake of this seems to be slow. COBOL does have a bad press but, after looking at modern COBOL, I found that most of the complaints refer to problems that were fixed over 30 years ago. If COBOL is to halt its decline it needs more vocal advocates to show why it is so good and to help explain the COBOL mindset. One further problem COBOL has long had is the lack of free compilers. There are a couple of projects out there but they need support. The most advanced project for Linux appears to be OpenCOBOL. Another project which, though incomplete, should offer a great compiler in the long term is Cobol for GCC. I wish these projects well, and hope that COBOL can regain the respect it deserves.
The full code (ASSIGN1.CBL)
Uuencoded data file (TAG2.DAT)
Compiling and running the program
To compile this code I am using OpenCOBOL under Linux. I compile it to an executable using: cobc -x ASSIGN1.CBL
To create the TAG2.DAT file that is needed, copy the uuencoded data above to a file, say tag2.uue. Then run: uudecode tag2.uue
To execute the program, make sure that TAG2.DAT is in the same directory as the executable ASSIGN1. Then run ./ASSIGN1
After running the program you should find a file called PRN has been created which contains the report.
Example report (PRN)
26/08/08 NAME TAG REPORT - 2 PAGE 1
COLOUR - RED
Acct.Nbr.  Name on Tag  Type  Qty.
a0000001a  Jj           SC    24
a0000002a  D.Brown      SC    36
a0000003a  T.Jones      ST    24
a0000004a  R.Talbot     ST    36
COLOUR - BLUE
Acct.Nbr.  Name on Tag  Type  Qty.
a0000005a  R.J          SC    24
a0000006a  H.L          SC    36
a0000007a  R.P          ST    24
a0000008a  J.K          ST    36
COLOUR - BLACK
Acct.Nbr.  Name on Tag  Type  Qty.
a0000009a  R.P          SC    24
a0000010a  P.P          SC    36
a0000011a  Q.Q          ST    24
a0000012a  O.O          ST    36
RED.........QUANTITY - 120 COST $10.50
BLUE........QUANTITY - 120 COST $10.50
BLACK.......QUANTITY - 120 COST $10.50