Introduction to File Processing
The information and exercises in this chapter will introduce you to the next four chapters, all of which discuss file processing, a basic requirement of data processing. When you finish with this chapter you should be able to:
- Understand the process and purpose of accessing a file.
- Name three different types of file (internal, external and display), and the characteristics and purposes of each.
- Understand the purpose of the OPEN and CLOSE statements.
- Explain three ways that you can use data files (input, output and outin).
- List and describe the three different methods of access (sequential, relative and keyed).
- Explain the difference between a READ data statement and a READ file# statement.
- Briefly describe the uses of the six I/O statements.
13.1 The DATA file
If you think back to the chapter on memory structure, you might remember that programs are only one of many types of files that you can save in a computer’s permanent memory. Another kind of file that you can use is a data file. As the name implies, a data file is used to store data. It holds whatever information you tell it to hold, and it lets you retrieve or change that information later. Data files can be created through BR itself, a text editor, or any third-party program editor.
What kinds of information does a data file normally hold? Often it contains statistical information - about organizations, customers, cities, countries, sales records, or just about anything else you can think of. It might contain names and addresses for a mailing list, for example. Perhaps it holds a whole year’s worth of miles per gallon information for a particular car or set of cars. Statistical information, while very common, is not the only information that a data file can hold. Data files can hold messages, paragraphs of text, or just about anything else you would want them to hold.
The programming statements and techniques that allow you to create data files and add to, change, or access them are referred to as the file processing part of Business Rules!. File processing will be the topic of this and the next four chapters.
Types of DATA files
This tutorial will focus on two different types of data files: internal and display. (There is also a third type of data file, external, but it is beyond the scope of this tutorial.)
The main difference that you will notice between internal and display files is that Business Rules! will not allow you to directly look at or print out an entire internal file (even though you can access individual records and fields within the file). This is because internal files are designed to be maximally efficient; they utilize the extended character set and a format that is meaningful to the system but that makes no sense to most people.
Business Rules! designs display files so that they both store information and are easy for people to read. Once you have created a display file, you are able to print the entire file with the Business Rules! TYPE command.
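To see why an internal file is efficient but unreadable while a display file is easy to read, it can help to compare the two kinds of storage side by side. The following Python sketch is an analogy only: Business Rules! does not use Python's `struct` module, and its real internal format is not shown here. The record values and the `<I20sH` layout are assumptions chosen for illustration.

```python
import struct

# One hypothetical record: date finished (MMDDYY as a number), title, page count.
date, title, pages = 51887, "Moby Dick", 569

# "Display" style: plain readable text, one record per line.
display_line = f"{date:06d} {title:<20} {pages:5d}\n"

# "Internal" style (analogy only): fixed-width binary packing.
# '<I20sH' = little-endian 4-byte unsigned int, 20-byte string, 2-byte unsigned short.
internal_bytes = struct.pack("<I20sH", date, title.encode("ascii"), pages)

print(display_line, end="")   # readable by people
print(internal_bytes)         # bytes that mostly make sense only to a program

# A program that knows the layout can still recover every value.
d, t, p = struct.unpack("<I20sH", internal_bytes)
print(d, t.rstrip(b"\x00").decode("ascii"), p)
```

The binary form is compact and has a fixed size per record, which is the kind of property that makes internal files efficient for a program, at the cost of being meaningless when viewed directly.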
The TYPE command
The Business Rules! TYPE command tells the system to display the entire contents of a file on the screen or send it to the printer. If you wish to try this command right now, access your supplemental programs and enter the following command, which instructs the system to display the file DISFILE on your screen:
TYPE C:\DISFILE
If you wish to send the same file to the printer, you can add the PRINT parameter to the end of the command:
TYPE C:\DISFILE PRINT
The syntax for the TYPE command is as follows:
TYPE drive:\filename.ext PRINT (include .ext only if the file name has an extension, and PRINT only if you want the output sent to the printer)
Internal files have their own benefits: Business Rules! operates much more efficiently with them than with display files, and it provides you with several extra methods of access for them. (The term access describes the process involved in passing information into or out of a data file.)
How do programs use data files?
While the fact that this tutorial devotes more than four chapters to the topic of file processing may make it sound complicated, the concept behind it really is not all that difficult to understand.
When a program requires the use of a data file, it uses an OPEN statement to “open” the file for access. When the program is done with the file, it uses the CLOSE statement to “close” the file. A single program can work with up to 1,000 open files at a time. (OPEN and CLOSE statements will be discussed in the next chapter.)
A program opens a data file for one of only three uses: input, output, or outin, which is a combination of both.
You must always identify one of these three uses in the OPEN statement when you open the file, so it is important for you to understand which is which.
When a program opens a file to get information that it needs, the file is open for input (because the file will input information into the program). When a program opens a file to send it new information, the file is open for output (because the program will output information to the file). When a program opens an internal file to change information that is already there, the file is open for outin (because the current information will be input into the program, and the changed information will be output back to the data file). Internal files can be opened for outin, whereas display files cannot. (The information in display files can still be changed, but through a technique you will learn about in the display files chapter.)
It is important for you to be clear on the meanings of the words input and output; beginning (and advanced!) programmers frequently get them mixed up. It may help you to think of the program on the screen as the central processing point: information can be input into this program, or it can be output from this program.
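The rules above can be summarized as a small lookup. This Python sketch is a conceptual model, not Business Rules! syntax; the names `FLOW` and `can_open` are hypothetical. It records which way information moves in each mode, seen from the program, and encodes the one restriction the text states: display files cannot be opened for outin.

```python
# Direction of information flow for each OPEN mode, seen from the program.
FLOW = {
    "input":  "file -> program",
    "output": "program -> file",
    "outin":  "file -> program, then program -> file",
}

def can_open(file_type: str, mode: str) -> bool:
    """Return True if a file of this type may be opened in this mode.

    Internal files allow all three modes; display files cannot be
    opened for outin.
    """
    if mode not in FLOW:
        raise ValueError(f"unknown mode: {mode}")
    if file_type == "internal":
        return True
    if file_type == "display":
        return mode != "outin"
    raise ValueError(f"unknown file type: {file_type}")

for ftype in ("internal", "display"):
    for mode in FLOW:
        print(f"{ftype:8} {mode:6} allowed={can_open(ftype, mode)}  ({FLOW[mode]})")
```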
The exercises in the Quick Quiz should help you differentiate between all three types of file use.
Quick Quiz 13.1
1. You wish to send information from a program to a data file. How will you be using the data file?
a) For input.
b) For output.
c) For outin.
2. Your program needs information that is contained in a data file. How will you be using the data file?
a) For input.
b) For output.
c) For outin.
3. You wish to change the information that is already in an internal data file. How will you be using the data file?
a) For input.
b) For output.
c) For outin.
4. A data file receives information from a program. What does it consider this information to be?
5. Which of the following types of file use is valid for internal files but not display files?
6. A data file will never receive which of the following?
7. A program will never receive which of the following?
8. True or False: It is impossible to change the information in a display file.
13.2 Structure of a data file
Earlier we told you that you cannot look at an internal file directly, because Business Rules! stores its contents in a format that people cannot understand. This is true, but it makes it hard for you to picture how an internal file is organized. So we’re going to ask you to pretend that the structure of most data files resembles the one in the following example (which is how internal files would look if they were designed to be easy for people to read):
|051887||Moby Dick||Melville Herman||569|
|070487||Love Story||Erich Segal||234|
|071787||The Third Wave||Toffler Alvin||689|
Let’s imagine that the above file contains information about all the books that Murray Kressel has read since April, 1987. Each of the lines across the page in the file contains information about one book that Murray has read; this line is called a record. How many records are in the above example?
Each record in the above example is divided into four fields of information. A field is a portion of a data record that is reserved to store a particular type of data (such as a name, a social security number or a date).
The programmer who defines a data file is free to determine the length, placement and content of all fields in a record. In the above example, the date that the book was finished is held in the first field. The name of the book, the name of the author, and the number of pages in the book are held in the second, third and fourth fields. There is no limit to the number of fields that a record can hold. Programmers generally use a record layout sheet to keep track of the types and lengths of fields in a record; you will see an example of such a record layout in an upcoming lesson.
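A record layout sheet is essentially a table of field names, starting positions, lengths and types. As an illustration, the following Python sketch applies a made-up layout for the book file to a fixed-width record. The positions, lengths and the helper names `parse_record` and `make_record` are hypothetical; Business Rules! has its own way of describing a layout, which this sketch does not attempt to reproduce.

```python
# Hypothetical record layout sheet for the book file:
# (field name, starting position (1-based), length, type: N = numeric, C = character)
LAYOUT = [
    ("date",    1,  6, "N"),
    ("title",   7, 20, "C"),
    ("author", 27, 20, "C"),
    ("pages",  47,  4, "N"),
]

def parse_record(line: str) -> dict:
    """Split one fixed-width record into named fields using LAYOUT."""
    record = {}
    for name, start, length, ftype in LAYOUT:
        raw = line[start - 1:start - 1 + length]
        record[name] = int(raw) if ftype == "N" else raw.rstrip()
    return record

def make_record(date: int, title: str, author: str, pages: int) -> str:
    """Build a 50-character record that matches LAYOUT (long text is truncated)."""
    return f"{date:06d}{title:<20.20}{author:<20.20}{pages:4d}"

line = make_record(51887, "Moby Dick", "Melville Herman", 569)
print(repr(line))
print(parse_record(line))
```

Because every program that uses the file reads the same layout, changing a field's length or position means updating the layout everywhere, which is why programmers keep the layout sheet.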
You may be wondering why the date field (for example, 051887) contains no slashes or dashes between the month, day and year, as you are probably used to seeing. This is because the data was output to the file as a single numeric field (rather than a character field), and hyphens or slashes are not allowed in numeric fields.
Although the above sample file holds only three records, most data files consist of many more.
An important thing to understand is that several programs can be designed to use the information in the same data file. You could, for example, write one program to add up the total number of pages that Murray has read. You could write another program to list just the names of the books Murray has read, and you could write a third program to add up how many books Murray read in July. Each of these programs would need information from different fields in the sample file.
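As a rough illustration, here are the three example programs written as three small Python functions that all work from the same records. The function names are hypothetical, and each date is held as a plain number, mirroring the numeric date field in the sample file.

```python
# The three sample records: (date finished as an MMDDYY number, title, pages).
# As a number, 051887 is stored as 51887; the leading zero is not kept.
BOOKS = [
    (51887, "Moby Dick", 569),
    (70487, "Love Story", 234),
    (71787, "Third Wave", 689),
]

def total_pages(books):
    """Program 1: add up the pages field of every record."""
    return sum(pages for _, _, pages in books)

def titles(books):
    """Program 2: list just the title field of every record."""
    return [title for _, title, _ in books]

def books_read_in_month(books, month):
    """Program 3: count records whose date field falls in the given month."""
    # For an MMDDYY number, integer division by 10000 leaves the month.
    return sum(1 for date, _, _ in books if date // 10000 == month)

print(total_pages(BOOKS))             # 1492
print(titles(BOOKS))                  # ['Moby Dick', 'Love Story', 'Third Wave']
print(books_read_in_month(BOOKS, 7))  # 2
```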
The way that the program gets the information it needs is by accessing the data file. The next lesson will provide details on the three methods of access (sequential, relative and keyed) that Business Rules! allows you to use with data files. All three of these methods are available for internal data files, but only the sequential access method is allowed for display data files.
Quick Quiz 13.2
1) True or false: the programmer who defines a data file is free to determine the length, placement and content of all fields in a record.
2) What will programmers generally use to keep track of the types and lengths of fields in a record?
a) A data folder.
b) A folder layout sheet.
c) A record layout sheet.
d) An input record sheet.
3) In the data file DISFILE, how many records are there?
13.3 The three methods of access
Let’s consider a data file called MILES.DAT. Imagine that the file contains 100 records of information concerning the mileage performance of a certain car, and you decide that you would like to tally up the car’s overall fuel efficiency.
First you should try to imagine what the MILES.DAT file might look like. Let’s assume that it contains 100 records, each with four fields: date (sans slashes), total miles traveled since last gas fill, total gallons used since last fill, and total miles per gallon since last fill. The following example (showing just five of the one hundred records) should help you visualize the file:
Now imagine that each open file has a file pointer associated with it. The file pointer is a Business Rules! internal variable that starts out at the beginning of the file when the file is opened and then moves through the file as records are read, always keeping track of which record is to be accessed next.
To find out the overall fuel efficiency, you need to write a program that totals the miles and gallons and then divides total miles by total gallons to compute the overall miles per gallon. To do this, the program must access the second and third fields (miles and gallons) of every record within the MILES.DAT data file.
The best way to do this is with the sequential method of access. This method allows you to access every single record in a data file in the same order that the records were entered. In other words, the file pointer starts at the beginning of the file and moves down one record at a time as each record is read (or accessed), until it reaches the end of the file. The first record that was entered is the first record accessed, and the last record entered is the last record accessed.
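The following Python sketch models sequential access with an explicit file pointer and uses it to compute overall fuel efficiency. It is a conceptual model, not Business Rules! code: the `SequentialFile` class and the three MILES.DAT records are made up for illustration.

```python
class SequentialFile:
    """Toy model of a file opened for sequential input.

    `pointer` plays the role of the file pointer: it starts at the first
    record and advances by one record on every read.
    """

    def __init__(self, records):
        self.records = list(records)
        self.pointer = 0  # 0-based index of the next record to read

    def read(self):
        """Return the next record and advance the pointer, or None at end of file."""
        if self.pointer >= len(self.records):
            return None
        record = self.records[self.pointer]
        self.pointer += 1
        return record


# Hypothetical MILES.DAT sample: (date MMDDYY, miles, gallons, mpg for that fill)
miles_dat = SequentialFile([
    ("060196", 300.0, 10.0, 30.0),
    ("060896", 250.0, 10.0, 25.0),
    ("061596", 280.0, 8.0, 35.0),
])

total_miles = total_gallons = 0.0
while (record := miles_dat.read()) is not None:
    _, miles, gallons, _ = record   # only the second and third fields are needed
    total_miles += miles
    total_gallons += gallons

print(f"overall mpg: {total_miles / total_gallons:.2f}")  # overall mpg: 29.64
```

With these sample values the result is 830 miles / 28 gallons, about 29.64. Simply averaging the per-fill mpg values (30, 25 and 35) would give 30.00 instead, which is why the program totals miles and gallons first.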
Now let’s imagine that you used a high-octane fuel in your car for the last thirty tank fills, and you would like to see how the average mileage for just these thirty fills compares to the overall average. You could use the sequential method of access to get at these figures, but it would mean reading all 70 of the earlier records first, which would be a big waste of time.
The relative method of access would be a better choice. This method allows you to indicate which record you would like to access according to its numeric position within the file. In the MILES.DAT data file, for instance, information about the 71st tank of gas would be held in the 71st record in the file. With the relative access method, the program can move the file pointer directly to the 71st record and begin reading there.
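Here is the same idea in Python, again as a conceptual model rather than Business Rules! syntax. The hypothetical `RelativeFile.position` method moves the file pointer straight to a record number, so the program reads only records 71 through 100. The mileage values are invented so that the arithmetic is easy to check.

```python
class RelativeFile:
    """Toy model of relative access: records are addressed by position (1-based)."""

    def __init__(self, records):
        self.records = list(records)
        self.pointer = 0  # 0-based index of the next record to read

    def position(self, rec_no):
        """Move the file pointer straight to record number rec_no."""
        if not 1 <= rec_no <= len(self.records):
            raise IndexError(f"no record number {rec_no}")
        self.pointer = rec_no - 1

    def read(self):
        """Return the next record and advance the pointer, or None at end of file."""
        if self.pointer >= len(self.records):
            return None
        record = self.records[self.pointer]
        self.pointer += 1
        return record


# Hypothetical MILES.DAT with 100 records: (fill number, miles, gallons).
miles_dat = RelativeFile((n, 280.0 + (n % 5) * 10, 10.0) for n in range(1, 101))

miles_dat.position(71)          # skip records 1-70 without reading them
fills = []
while (record := miles_dat.read()) is not None:
    fills.append(record)        # records 71-100, the high-octane fills

total_miles = sum(miles for _, miles, _ in fills)
total_gallons = sum(gallons for _, _, gallons in fills)
print(len(fills), fills[0][0], total_miles / total_gallons)  # 30 71 30.0
```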
Now let’s consider another situation. You started to notice sometime in early July that you were able to travel fewer miles than usual on a single tank of gas. You suspect that this trend continued until the end of the hot season in late August, and you wish to find out if your suspicion is true. The only problem is that you can’t remember which tankful of gas you were on (the 18th? the 25th?) when you started noticing the difference.
There is one thing you do know, however: you would like to find out the average mileage for all tanks of gas used between July 1 and August 31.
The sequential method of access would again be a substantial waste of time for this programming problem. And the relative method cannot be used because you do not remember the exact position of the record you want to access.
The keyed method of access can help you find the right records. With this method, you tell the system to move the file pointer to the record whose date field contains 070196 (July 1, 1996), and the program can then read forward from there until it passes August 31. If your guess about the date was wrong and the system finds that no record contains exactly that date, this method even lets you access the first record whose date is higher than the one you specified.
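A keyed file can be modeled in Python as records kept in key order, with a binary search (`bisect.bisect_left`) to find the first record whose key is equal to or higher than the one requested. This is a sketch under stated assumptions, not how Business Rules! actually implements keys: the key is assumed to be the MMDDYY date string, which sorts correctly only while all records fall within the same year, and the `KeyedFile` class and its records are hypothetical.

```python
from bisect import bisect_left


class KeyedFile:
    """Toy keyed file: records kept in key order so a key can be found quickly.

    Assumption: the key is the MMDDYY date string, which sorts correctly
    only while every record falls in the same year.
    """

    def __init__(self, records):
        self.records = sorted(records, key=lambda r: r[0])
        self.keys = [r[0] for r in self.records]
        self.pointer = 0

    def position_at_or_after(self, key):
        """Point at the record with this key, or the first record with a higher key."""
        self.pointer = bisect_left(self.keys, key)

    def read(self):
        """Return the next record in key order, or None at end of file."""
        if self.pointer >= len(self.records):
            return None
        record = self.records[self.pointer]
        self.pointer += 1
        return record


# Hypothetical 1996 fills: (date MMDDYY, miles, gallons)
miles_dat = KeyedFile([
    ("062496", 300.0, 10.0),
    ("070396", 270.0, 10.0),  # no fill on 070196, so the search lands here
    ("071896", 260.0, 10.0),
    ("080996", 250.0, 10.0),
    ("090496", 310.0, 10.0),
])

miles_dat.position_at_or_after("070196")
summer = []
while (record := miles_dat.read()) is not None and record[0] <= "083196":
    summer.append(record)

total_miles = sum(m for _, m, _ in summer)
total_gallons = sum(g for _, _, g in summer)
print([d for d, _, _ in summer], total_miles / total_gallons)
# ['070396', '071896', '080996'] 26.0
```

The search skips the June record without reading it, and the loop stops at the first record past August 31.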
Quick Quiz 13.3
1) Sequential Access allows you to:
a) Access every single record in a data file in the same order that the records were entered.
b) Use the file pointer to locate the record you want.
c) Indicate which record you would like to access according to its numeric position within the file.
d) List all records in order beginning with the first record entered and ending with the last record entered.
2) If you know which record of information you would like to use, which method would be the most efficient to use?
a) the keyed method.
b) the sequential method.
c) the cursor method.
d) the relative method.
13.4 Manipulating the data file
You have already learned that data files are opened for input, for output or for outin depending on whether a program needs to get information from, send information to, or change the contents of the data file. Opening the data file is the first step a program must take before accessing the file. The next step is to execute the statement that does the actual getting, sending or changing of information.
Business Rules! provides you with six input/output (I/O) statements for the processing of internal files: READ, REREAD, WRITE, REWRITE, RESTORE and DELETE.
- READ and REREAD are used to get information from a data file (the file must be opened for input or outin).
- WRITE is used to send information to a data file (the file must be opened for output or outin).
- RESTORE is used to reset the position of the file pointer. RESTORE can be used when a file is opened for output.
- REWRITE and DELETE (along with all the other I/O statements) are used to change information in a data file (the file must be opened for outin).
The processing of display files involves the use of the PRINT, LINPUT and INPUT statements.
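The mode rules for the six I/O statements can be expressed as a lookup table. This Python sketch is a model of the rules listed above, not Business Rules! code; `ALLOWED` and `check` are hypothetical names. One assumption is flagged in the comments: the text names output and outin for RESTORE, and the sketch also allows it for input.

```python
# Which I/O statements may be used on an internal file opened in each mode.
# ASSUMPTION: RESTORE is also allowed for input; the text only names output and outin.
ALLOWED = {
    "READ":    {"input", "outin"},
    "REREAD":  {"input", "outin"},
    "WRITE":   {"output", "outin"},
    "RESTORE": {"input", "output", "outin"},
    "REWRITE": {"outin"},
    "DELETE":  {"outin"},
}

def check(statement: str, mode: str) -> None:
    """Raise PermissionError if `statement` is not usable on a file opened in `mode`."""
    if mode not in ALLOWED[statement]:
        raise PermissionError(f"{statement} not allowed on a file opened for {mode}")

check("READ", "input")      # fine
check("REWRITE", "outin")   # fine
try:
    check("WRITE", "input")
except PermissionError as err:
    print(err)              # WRITE not allowed on a file opened for input
```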
READ data vs. READ #filenum
Previously we told you that the READ statement can be used to obtain the information in a data file. This type of READ statement (sometimes called READ #filenum) is very different from the READ statement that you have already learned about (the one that works with DATA).
This type of READ works only with file processing; the other type works only with DATA statements. The big difference is that the data file READ statement always has a number sign (#) following the space after the READ keyword. The # indicates that a file number follows, and this must be the same number that was used in the corresponding OPEN statement. For example:
READ #348
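The difference between the two kinds of READ can be modeled in Python: DATA-style values live inside the program itself, while file reads go through a file number that must match the one used when the file was opened. The helper names `open_files`, `open_file` and `read_file` are hypothetical, and the sketch is not Business Rules! syntax.

```python
# Toy model of file numbers: "OPEN #n" associates a number with a file,
# and every later "READ #n" must use that same number.
open_files = {}

def open_file(number: int, records: list) -> None:
    """Register an open file under its file number."""
    open_files[number] = iter(records)

def read_file(number: int):
    """Return the next record of file #number, or None at end of file."""
    if number not in open_files:
        raise KeyError(f"file #{number} is not open")
    return next(open_files[number], None)

# DATA-style reading: the values are part of the program itself.
data_values = iter([10, 20, 30])
print(next(data_values))  # 10

# File-style reading: the values come from the file opened as #348.
open_file(348, ["first record", "second record"])
print(read_file(348))     # first record
try:
    read_file(349)        # wrong number: nothing was opened as #349
except KeyError as err:
    print(err)
```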
Quick Quiz 13.4
1. What is each of the following statements used for? Match each statement with its use.
|1. READ/REREAD||a) resetting the position of the file pointer.|
|2. WRITE||b) getting information from a data file.|
|3. RESTORE||c) changing information in a data file (the file must be opened for outin).|
|4. REWRITE/DELETE||d) sending information to a data file.|
2. What must the READ statement (sometimes called READ #filenum) always have following the space after the READ keyword?
a) a comma (,).
b) an asterisk (*).
c) a number sign (#).
d) a dollar sign ($).