Index Facility: Difference between revisions
Line 47: | Line 47: | ||
===BADKEYS=== | ===BADKEYS=== | ||
{{:Badkeys}} | |||
==BTree Facility== | ==BTree Facility== |
Revision as of 20:35, 29 May 2013
Sometimes called key file processing, index file processing, or ISAM (Indexed Sequential Access Method), Business Rules! index facility is a powerful, yet easy-to-use collection of features. The major concept underlying this index facility is building a key file based on a particular field or set of fields in the master file. This key file allows you to selectively access particular records in the master file by knowing only the value of the key field -or sometimes just the first part of the key field.
This section presents an introductory overview of Business Rules index facility. For more detailed information, you should also see the DELETE, OPEN internal, READ, RESTORE, REWRITE and WRITE statements; the Index command and the KLN, KPS and KREC functions.
Why use indexing?
Business Rules index is a powerful tool that can help you accomplish the following tasks quickly and efficiently:
- 1.) On-line maintenance.
- 2.) Sequential reports by key field.
- 3.) On-line queries.
- 4.) File organization.
Terminology
The following terminology is used in discussing the index facility:
Duplicate keys
A duplicate key is a key field that is exactly the same in two or more records in the master file. The Index command's DUPKEYS parameter allows you to choose whether or not to receive a warning message if duplicate keys are found in the index file. Including duplicate keys does not cause duplicate entries in the index file, as the relative numbers for each entry are different.
With or without the warning message, the Index command always creates an index file which always includes the duplicate keys. The Index command's LISTDUPKEYS parameter can be used to list all duplicate keys. This optional listing is sent to the screen by default, but can also be sent to a printer or display file.
Index command
The Index command is the instruction that tells Business Rules to begin the indexing process. This command is used both to create a new index file and to reorganize existing files. See Index for the complete description of its syntax. Business Rules will execute the Index command when it is issued in any one of three ways: from READY mode, from a procedure file, or with the EXECUTE statement.
INDEX file
An index file is created when an Index command is executed or (less frequently) when an OPEN internal statement creates a new master file. Also called a key file, the index file helps a program quickly access the information in a master file.
An index file contains a key field and an address (or relative record number) for each record in the master file, similar to the way in which a telephone directory contains the name and address of each resident in a city.
Key field
A key field is the section (or sections) of the master file record that identifies that record. As an example, if you wish to access a particular record by specifying a customer name, the customer name would be the key field; it would exist both in the master file and in the index file.
Business Rules allows you to join up to six sections of a record (in a combined total of not more than 128 bytes) to form the key field. When more than one section of a record is used to form the key field, it is frequently called a split key.
Key file
Another name for an index file.
Master file
The master file is the data file for which the index is created. It contains a large number of unordered records that must be accessed quickly by a program.
Multiple Index Files
A master file can have more than one index file, each with a different key field. Using multiple index files allows you to access the master records using different key fields at different times.
Primary and Overflow Areas
Business Rules always searches for a key field first in the primary area and then, if needed, in the overflow area.
Relative Record Number
The index file generally consists of two parts: a primary (or sorted) area and an overflow (or unsorted) area. A relative record number is the numeric position of a record in the master file. The first record in a master file has a relative record number of 1; the 20th record has a relative record number of 20. This number is stored in an index file along with the key field.
Split Key
When more than one section of a record is used to form the key field (up to six sections).
BADKEYS
The INDEX command may now specify BADKEYS or LISTBADKEYS to effect the same processing as DUPKEYS and LISTDUPKEYS performs. If both LISTBADKEYS and LISTDUPKEYS are specified, then all bad keys are listed ahead of dupkeys. They can be distinguished by whether or not they are duplicate. Badkeys are keys that specify string Y2K (baseyear) key values where the master record has non-numeric data in the field.
BTree Facility
The BTREE_VERIFY statement is now operational. Note that the keyword previously was BTREE2VERIFY.
To activate the BTREE2 facility, specify OPTION 22 in the BRConfig.sys file. For more details, see New Btree Facility under 3.90 Initial Changes below.
The BTree index facility is included. It is referred to as BTree2. BTree2 provides significantly faster (25 - 35%) performance in shared file processing. We have had a few reliability problems with the present BTree in situations where many concurrent updates are performed. Additionally, our users have experienced other media, network, and disk controller weaknesses that have corrupted indexes. So we have added diagnostic capabilities to BTree2.
A new BRConfig.sys statement is supported:
BTREE_VERIFY (master-filename) [ OFF ]
This statement causes BR to separately audit the structure of the current index branch after each operation that alters an index relating to the named file. Obviously some performance penalties are in effect when processing files in diagnostic mode. However, it can pinpoint what update operations are failing when failures occur.
Also, the INDEX command has a new keyword, VERIFY, that can be used in place of REPLACE or REORG. If used with a BTREE2 file, the index will be audited instead of being rebuilt. Furthermore, REORG, in relation to BTREE2 INDEX commands, means "audit and if needed rebuild."
To activate this feature, set OPTION 22 in your BRConfig.sys file. The system will automatically create new indexes in BTREE2 format and will keep track of which files utilize BTREE2 indexes. Note that BR does not permit mixing Btree index types within related files (multiple opens of the same master file with different indexes).
Indexing and sorting were significantly improved with respect to speed.
The INDEX command may now specify BADKEYS or LISTBADKEYS to effect the same processing as DUPKEYS and LISTDUPKEYS performs. Bad keys are keys with invalid Y2K data in them. If both LISTBADKEYS and LISTDUPKEYS are specified, then all bad keys are listed ahead of dupkeys. They can be distinguished by whether or not they are duplicate.
The creation of indexes can be interrupted with Ctrl-A, and can then be resumed with GO.
Structure
Index files are special internal files that can be read, written to, and processed as normal Business Rules internal files for special processing needs (although this is rarely done).
An index file generally consists of two parts: a primary (or sorted) area and an overflow (or unsorted) area. Immediately after the index file is created or reorganized, all the records in the master file are represented by sorted entries in the primary area of the index file. Whenever a record is added to the master file, or a key field is changed, an entry is also sent to the overflow area of the index file. These entries are not in sorted order. This means that processing time can be improved by periodically reorganizing the index file so that all records are in the sorted portion of the index file (see especially the REORG parameter on the Index_command).
An index file record is composed of the master record's key plus the master record's relative record number, in binary (B4) format. The following statement reads the key file of a master file with a 10-character key:
80 READ #1, USING "FORM C 10,B 4": KEY$,RECNBR
Although the above information can be helpful in unusual cases, most programmers do not need to know details about the record layout of an index file because the built-in features of the index facility are quite sufficient.
Sample Illustration
The following table illustrates the set-up of an index file as compared to a master file's relative record number and key field. The key fields from the first seven records of the master file, as demonstrated by the relative record number (first column), are shown. The key field of each record is a two-character state name (second column). When the index procedure has been performed on the master file, an index file is created which contains the state names in alphabetical order, followed by their relative position in the master file (third column).
If you had the master file shown in the above table, but not the index file, the retrieval of a particular record would have to be done sequentially. For example, if you wrote a program that needed to access the record for New York, the program would have to read each record sequentially, starting at the beginning of the master file, until it found the record for New York. In this example, that would require the program to read only five records to locate the proper one, but in long master files, the process can be very time-consuming and cumbersome.
If, on the other hand, the master file has an index file, retrieval can be done using the key-indexed access method. To find the record for New York using this method, the program goes to the index file and searches it for New York. This search does not take long, because the keys are arranged in ascending (alphabetical) order, and a binary search is used. The program then reads the corresponding relative record number for New York, which is 5. This number tells it to go directly to the fifth record in the master file; it does not have to search the master file from the beginning.
Creating With Index
There are two ways to create index files: with the Index_command and with the OPEN internal statement. This section addresses the use of INDEX. See the OPEN internal statement (particularly the KPS and KLN parameters) for more information about the second method. Before you run INDEX, you must know the four required parameters: the name of the master file, the name of the index file, the starting position(s) and the length(s) of the key section(s) of the key field you wish to use. See the Index_command for complete details of all the required and optional parameters.
Of the optional parameters, the most frequently used is REPLACE, which indicates the old index file should be deleted and replaced by this new file.
The REORG parameter greatly speeds up the Index_command because it reads only the index file (not the master file) and does not resort the portion of the index file which is already sorted. In the special case where an index file is already completely sorted, using the REORG parameter means INDEX will run very fast because it quits as soon as it determines that there is nothing to sort.
Other optional parameters specify where workspace should be found (usually defaulted to the current directory), whether a warning message should be provided for duplicate keys (DUPKEYS), whether duplicate keys should be listed (LISTDUPKEYS), whether the screen or something else should be used for LISTDUPKEYS and whether the alternate collating sequence should be overridden (NATIVE).
When using the Index_command from within a procedure file, Business Rules also accepts System/23 format statements that control the indexing process. We do not recommend, however, that you use System/23 format when writing new programs.
Once the Index_command is entered -or once the procedure containing the Index_command has begun - Business Rules performs a CLEAR operation to free up memory and then creates or recreates the index file as specified. (In the special case where INDEX is issued with the EXECUTE statement, no CLEAR operation is performed.)
The creation of indexes can be interrupted with Ctrl-A, and can then be resumed with GO.
Finding records with the KEY and SEARCH parameters
- KEY and SEARCH parameters
When an internal file is opened with the KEYED parameter, DELETE, READ, RESTORE file and REWRITE statements will accept an optional clause for using the associated index file. In the technical discussions for these statements in the Statements chapter, this optional clause is referred to as the "KEY=" clause. In the READ and RESTORE statements, the "KEY=" clause can have any of the following four forms: KEY=n$, KEY>=n$, SEARCH=n$, and SEARCH>=n$. Only KEY= can be used with DELETE statements.
All forms of the KEY= clause will return the first match, which is the record with the lowest record number when there are duplicate keys. As used in this section, n$ can be any string expression. String constants, simple variables, subscripted variables, and string functions can be used. These elements may be combined with one another and used in concatenation and substring operations.
Line 300 illustrates the use of string functions to take the numeric value in the variable ACCT and make it a 5-character string by padding it with blanks on the left:
00300 RESTORE #1, KEY=LPAD$(STR$(ACCT),5):
The four forms of "KEY=" are constructed from the two keywords KEY and SEARCH, and the two operators = and >=. These keywords and operators are explained below.
KEY | indicates that the length of string n$ must be identical to the length of the key field (i.e., KLN(file-num)). |
SEARCH | indicates that the length of n$ can be less than or equal to the length of the key field; the search process looks only at enough characters to match the length of n$. |
= | indicates n$ must be an exact match. |
>= | indicates that the next record in the key sequence is to be used when an exact match cannot be made. |
Consider the following four examples for an index file. The date is a 6-character key field and contains the year in the first two positions:
00400 READ #1,KEY="880101": NOKEY 920 00410 READ #1,KEY>="880101": NOKEY 920 00420 READ #1,SEARCH="88": NOKEY 920 00430 READ #1,SEARCH>="88": NOKEY 920
Line 400 reads the first record with a key field of 880101; if no exact match exists, the NOKEY error occurs. Line 410 reads the first record with a key field of 880101; if no exact match exists, this statement reads the record with the next highest value in the key field, e.g., 880103. The NOKEY error occurs in this case only when no record in the entire file has a key field with a value greater than or equal to 880101.
Line 420 reads the first record with 88 in the first 2 positions of the key field. If there is no match for this criterion, the NOKEY error occurs -even if records exist with 87 or 89 in these positions. Line 430 also reads the first record with 88 in the first two positions of the key field; if there is no match for this criterion, the statement goes on the read the record with the next highest value in the first two positions of the key field. The NOKEY error occurs only when no record in the entire file has a key field with a value greater than or equal to 88 in the first two positions.
Techniques for using multiple index files
Several key files may be opened and automatically maintained with each master file. The total number of key files is limited only by the operating system limit for the total number of open files.
The use of multiple index files allows processing records in a master file by different key fields, such as customer number, last name, zip code, date, etc. If it is not possible or convenient to locate a record by one key, the same program can try to retrieve it by another key.
Each index file requires a separate OPEN statement. When two or more OPEN statements specify the same master file for KEYED access, Business Rules automatically links each index file together so that when a record is added, deleted, or changed in the master file, every file in the group of index files is updated simultaneously. If there are any other index files that are not open in programs that could change the master file, you should run INDEX on them occasionally. At a minimum, you should update all index files every time you back up the master file. You also must replace (not reorganize with REORG) all index files after you copy the master file and remove deleted records (see the COPY command).
To run Business Rules programs on a multi-user system, OPEN statements should include a share parameter to specify what is to occur when a program at another workstation attempts to read or write to the same file. A share parameter is required (at least for the first OPEN statement) when several key files "share" the same master file. NOSHR may be used to exclude other workstations, but still share a master file with a group of index files at the same workstation. For a complete description of the four share parameters, see Share Specs in the Definitions section; see also Multi-User Programming.
Lines 2100 to 2400 will open four key files for a file named FILE:
02100 OPEN #1: "name=FILE,kfname=KEY1,shr", INTERNAL,OUTIN,KEYED 02200 OPEN #2: "name=FILE,kfname=KEY2", INTERNAL,OUTIN,KEYED 02300 OPEN #3: "name=FILE,kfname=KEY3", INTERNAL,OUTIN,KEYED 02400 OPEN #4: "name=FILE,kfname=KEY4", INTERNAL,OUTIN,KEYED
In the group of related OPEN statements in lines 2100 to 2400, adding a record to the master file with a WRITE #1 statement adds a record not only to KEY1, but also adds a record to KEY2, KEY3 and KEY4 because these four key files are linked and because output is enabled by specifying OUTIN for all four. Similarly, rewriting a record with a REWRITE #3 statement updates records in KEY1, KEY2, KEY3 and KEY4 if the value in the key fields is changed.
Open Statements
For each index file, there must be a separate OPEN statement.
- The rules for setting up multiple index files are
- 1.) The master file name (specified after NAME=) must be the same for all OPEN statements.
- 2.) All files must have different file numbers (selecting consecutive numbers is recommended, but not required).
- 3.) All share parameters must be consistent with the share parameter from the first OPEN statement encountered.
The best and easiest way to achieve this is to omit share parameters after the first OPEN statement; in this case, Business Rules uses the share parameter from the first keyed OPEN statement for all subsequent keyed OPEN statements using the same master file. Share parameters may be coded explicitly after the first OPEN statement, but some combinations of share parameters across OPEN statements may cause error code 4148 at execution time (see the table of OPEN file restrictions in Multi-User Programming for more information). For example, line 290 will generate error code 4148 because the SHRI parameter in line 280 conflicts with the SHR parameter in line 290.
00280 OPEN #1: "name=FILE,kfname=KEY1,shri", INTERNAL,OUTIN,KEYED 00290 OPEN #2: "name=FILE,kfname=KEY2,shr", INTERNAL,OUTIN,KEYED
- 4.) The entire group of files may be opened NOSHR. This means no other workstation can access the master file or key files, but the master file is shared among the key files at this workstation. If NOSHR is desired for the group of files, you must code NOSHR on the first OPEN statement, then you must omit the share parameter for subsequent OPEN statements. For example,
00040 OPEN #1:"name=FILE,kfname=KEY1,NOSHR", INTERNAL,OUTIN,KEYED 00050 OPEN #2:"name=FILE,kfname=KEY2", INTERNAL,OUTIN,KEYED
- 5.) The group of OPEN statements for keyed processing with a single master file can mix INPUT, OUTIN or OUTPUT processing with one exception. The first OPEN must be OUTPUT or OUTIN if subsequent OPEN statements use OUTPUT or OUTIN. Error code 0608 occurs when a conflicting secondary OPEN statement is executed. Keys are only updated for files opened OUTPUT or OUTIN. This feature can save system overhead on files opened with INPUT, but should be used with caution because this method can lead to index files that do not match the recently changed master file. If you use this method, index files that are not updated should be replaced (not used with REORG) as soon as possible.
In the example in lines 1100 to 1400, KEY1 and KEY2 will be automatically updated because they are opened OUTIN, but KEY3 and KEY4 will not be changed in this program.
01100 OPEN #1: "name=FILE,kfname=KEY1,shr", INTERNAL,OUTIN,KEYED 01200 OPEN #2: "name=FILE,kfname=KEY2", INTERNAL,OUTIN,KEYED 01300 OPEN #3: "name=FILE,kfname=KEY3", INTERNAL,INPUT,KEYED 01400 OPEN #4: "name=FILE,kfname=KEY4", INTERNAL,INPUT,KEYED
REWRITE #1 could change all four key fields in the master file, but KEY3 and KEY4 will not be updated because they are only opened for input.
- 6.) New index files can be created at any time with the Index command. New index files can also be created with the OPEN statement, but only when creating a new (and empty) master file. To create a new master file and multiple new key files, you must code these four parameters: NEW (or USE or REPLACE), RECL=, KPS= and KLN= on all OPEN statements. For example:
05100 OPEN #1:"name=MASTER,new,recl=31,kfname=KEY1,KPS=1,KLN=4,shr",INTERNAL,OUTIN,KEYED 00520 OPEN #2: "name=MASTER,new,recl=31,kfname=KEY2,KPS=9,KLN=5,shr",INTERNAL,OUTIN,KEYED
Business Rules will remember when line 5200 is executed that the file was just created. It will create the new index file, and it will link file numbers 1 and 2 for multiple key processing.
Limit on Number of Open Files
The number of key files that can be opened in Business Rules is restricted by the operating system maximum of open files for any single program.
Each unique master file, index file or procedure file counts as one file against this open file limit. However, opening the same file under two or more different file numbers counts only once. Also, opened window files do not count against the operating system limit on open files.
File Processing
Rules for file processing with multiple key files are the same as rules for file processing with only one key file. Also, all aspects of file sharing, record locking and file name locking apply unchanged to multiple key files. The major addition is the use of different file numbers to provide access by different key fields.
READ, WRITE, REWRITE, RESTORE and DELETE statements for a given file number use the key file specified in the OPEN statement for that file. When WRITE or DELETE statements make changes to any one of the group of key fields, corresponding changes are also made in all key files if they are opened OUTIN or OUTPUT. To improve performance speed, before updating any index files, REWRITE statements first compare the changed record to a memory-resident copy of the original record so that no index file changes are made when no key fields are rewritten.
If a program opened four different key files under four different file numbers, then KEY=, KEY>=, SEARCH= and SEARCH>= could be used with any of the file numbers by following the same rules used with only one key file.
There is a file pointer to keep track of the position in each index file; thus, reading sequentially by key is simply a matter of opening the file for keyed access, then using READ statements without any KEY or SEARCH parameter. By switching between READ statements with different the file numbers, a program could read sequentially by one key, then sequentially by a second key from a second key file, then come back to the first key file and pick up where the pointer was before the switch.
The KPS(n) and KLN(n) functions return key starting position and key length for the key file given in the OPEN statement for file n.
Technique for Using Split Key Fields
It is not necessary for all characters in a key field to be next to each other (sometimes called adjacent or contiguous). A key field may be split into as many as six fields with a maximum combined length of 128 bytes. When split keys with up to six sections are combined with multiple index files for a single master file, the indexing capabilities are quite powerful and flexible.
For details of how to create an index file with split keys see the Index command or see the OPEN internal statement.
Also, to retrieve information about the individual sections of a key field, see the KPS and KLN functions.
To improve execution speed on sorted reports, some programmers have chosen to add another index file with a split key. For example, if a report is supposed to be sorted on three fields, you could maintain an index file with a split key consisting of those three fields in the desired order. When a report needs to be quickly generated on demand, a program can generate it immediately by reading sequentially by key using this existing index file. A general rule is that when you are frequently generating reports on small sections of the file, use indexing. When you are generating infrequent reports on major portions of files, sorting is more efficient.
Baseyear dependent indexes
INDEX key position or key length values may now specify a trailing "Y" character to indicate that the first two digits of the field are BASEYEAR dependent. For example:
INDEX masterfile keyfile 10/23Y/55 8/6/30 REPLACE DUPKEYS
Indicates that a date field begins in position 23 with A TWO DIGIT YEAR that is BASEYEAR dependent.
If the position or length of a numerical (BH or PD) field is followed by Y, then it should also specify a B or P to indicate the storage format. In this case it will be processed as a number with baseyear applied in accordance with the length as follows:
FIELD TYPE & LENGTH | ASSUMED DATA FORMAT |
PD 1 | Not Significant |
PD 2 or BH 1 | YY |
PD 3 or BH 2 | YYMM |
PD 4, BH 3, BH 4, DT 3, | |
DT 4, DL 4 or DH 4 | YYMMDD |
Note that the specification for PD is P and the specification for BH is B in the INDEX statement, for example:
INDEX masterfile keyfile 10/31 2BY/10 REPLACE
This creates 'keyfile' applying BASEYEAR to the first two digits of each key. In this example, the first part of the key is assumed to be binary in YYMM format.