Str2Mat: Difference between revisions
| Gordon.dye (talk | contribs)  | |||
| (16 intermediate revisions by 3 users not shown) | |||
| Line 1: | Line 1: | ||
| The '''Str2Mat'''  [[Internal Functions|Internal Function]] will split a string variable based on a [[delimiter]] and place the resulting strings into an array which STR2MAT dynamically re-dimensions. | The '''Str2Mat'''  [[Internal Functions|Internal Function]] will split a string variable based on a [[delimiter]] and place the resulting strings into an array which STR2MAT dynamically re-dimensions. The string to mat and mat to string functions have been extended to ease parsing of CSV and XML data (as of 4.3).   | ||
|   STR2MAT(<string variable>, MAT <array name>, <delimiter$>) |   STR2MAT(<string variable>, MAT <array name>, [MAT] <delimiter$>, [<quote-type:trim>]) | ||
| [[image:str2mat.png|800px]] | |||
| ===Parameters=== | |||
| "String Variable" is the variable that contains the data to be converted into an array. | |||
| "MAT array-name" is the name of the array into which the variable will be placed. | |||
| "Delimiter$" is a string containing the character in the string variable which will be used to separate it into items to be placed in the array. For example, a comma ",". In 4.3 Delimiter can be an array. | |||
| "Quotes:Trim" is an optional parameter which handles quotes within the string variable. Quotation marks can suppress the recognition of separators so that any delimiter (such as a comma) that occurs between the specified quotes will not split the data into separate array elements. Quote-type can be Q, QUOTES, ('), or ("), and is case insensitive. Q and QUOTES means that BR will recognize from the data which type of quotes (double or single) will be recognized by examining the first nonblank character. Q and QUOTES also means the enveloping quote characters will be stripped off of the data placed into the receiving array. The trim flags can be :LTRM , :TRIM or :RTRM , and denote post processing of extracted elements. In essence these indicate that leading and/or trailing blanks should be stripped from each resultant array element. The leading colon is only present when quote-type is specified (as of 4.3).  | |||
| When examining str$ left to right, the first character (and the first character after each separator) is checked to see if is either (') or ("). If the first character is a quote, then it suppresses the recognition of separators until quotation processing is deactivated by another occurrence of the leading quote. The string is copied until it ends or until an odd number of successive occurrences of the governing quote type is encountered. During this processing, two adjacent occurrences of the governing quote character denote a single embedded occurrence of the quote character and is disregarded as a quote deactivator. | |||
| ===Defaults=== | |||
| Actually, the delimiter parameter is also optional in a sense. But practically speaking it is mandatory.  | |||
| The default delimiter searches for the following combinations of [[line feed]] and [[carriage return]] characters: | The default delimiter searches for the following combinations of [[line feed]] and [[carriage return]] characters: | ||
| Line 10: | Line 26: | ||
| *[[carriage return]] | *[[carriage return]] | ||
| This enables multiple CSV rows to be contained in a single string. | |||
|   00010 str2mat( | ===Further Explanation=== | ||
| 1. When more than one occurrence of the same delimiters are used next to each other, BR honors all of them making an empty string element in the resulting array for all but the first occurrence of the delimiter. Consider the following example: | |||
|   00010 let namelist$="Mary,John,,Salomi,Thomas,,,David,Sonia" | |||
|  00020 str2mat(namelist$,mat customer$,",") | |||
|   00030 print mat customer$ | |||
| Output: | Output: | ||
|   Mary | |||
|  John | |||
|  Salomi | |||
|  Thomas | |||
|  David | |||
|  Sonia | |||
| Customer$(3), Customer$(6), and Customer$(7) would have a value of "". | |||
| 2. If the delimiter is "", every character will be put in a separate element of the array. | |||
| 3. Str2Mat performs the opposite action of [[Mat2Str]] | |||
| 4. Str2Mat dynamically redimensions the array (mat customer$ in the above example) as needed to include all of the items from the source string variable. It returns the number in the final array. | |||
| 5. When the delimiter is an array, both will signify the start of a new element in the final array. But when two consecutive delimiters are different, they will not create a blank element in the array. For example: | |||
|  dim namelist$*256,customer$(7),delim$(2) | |||
|  let namelist$="Mary,Jo.hn,,.Salomi.Thomas,,,David,,.Sonia" | |||
|  let delim$(1)="," | |||
|  let delim$(2)="." | |||
|   str2mat(namelist$,mat customer$,Mat delim$) | |||
|   print mat customer$ | |||
| will return: | |||
| [[image:readc2.jpg]] | |||
| To restate this: when elements of a delimiter array occur adjacent to each other within the source string, they are grouped as one separator substring. When the same occur consecutively, it creates a null element in the final array output. | |||
| ===CSV Parsing (4.3)=== | ===CSV Parsing Example (4.3)=== | ||
| The following code spinet demonstrates how to open a CSV/Tab File, read in the fields from the header, and then loop through  the records. | |||
|   01000    dim CSV_LINE$*999,CSV_FILE$*256, CSV_DELIM$*1,CSV_HEADER$*999,CSV_FIELDS$(1)*40,CSV_DATA$(1)*60 |   01000    dim CSV_LINE$*999,CSV_FILE$*256, CSV_DELIM$*1,CSV_HEADER$*999,CSV_FIELDS$(1)*40,CSV_DATA$(1)*60 | ||
| Line 81: | Line 98: | ||
| STR2MAT may also be used to Parse XML data. | STR2MAT may also be used to Parse XML data. | ||
| This is a bit more complex than parsing CSV files, but  | This powerful tool is a bit more complex than parsing CSV files, but useful nonetheless. | ||
| The following example will parse XML$ into "MAT XML_LINE$" | The following example will parse XML$ into "MAT XML_LINE$" | ||
| Line 91: | Line 108: | ||
| This makes the parsing of XML a bit more convenient. The following XML sample shows how the function will parse the data | This makes the parsing of XML a bit more convenient. The following XML sample shows how the function will parse the data | ||
| Data: | |||
|    <XML> |    <XML> | ||
|     <NODE> |     <NODE> | ||
| Line 97: | Line 115: | ||
|    </XML> |    </XML> | ||
| Results: | |||
|    <XML |    <XML | ||
|    <NODE |    <NODE | ||
| Line 104: | Line 123: | ||
|    </XML |    </XML | ||
| If the node names are known, a more complete and useful technique can be performed. You may use an array of Delimiter$ values to parse the data.  Take the following example: | |||
|   100    dim XML$*999999,XML_LINE$(1)*32000,DELIM$(4)*32 | |||
|   100    dim XML$*999999,XML_LINE$(1)*32000, | |||
|   110    let XML$="<XML><NODE><ITEM>ITEM VALUE</ITEM><ITEM2>ITEM2 VALUE</ITEM2></NODE></XML>" |   110    let XML$="<XML><NODE><ITEM>ITEM VALUE</ITEM><ITEM2>ITEM2 VALUE</ITEM2></NODE></XML>" | ||
|   120    read MAT SEP$ |   120    read MAT SEP$ | ||
|   130    data </XML>,</NODE>,</ITEM>,</ITEM2> |   130    data </XML>,</NODE>,</ITEM>,</ITEM2> | ||
|   140    let STR2MAT(XML$,MAT XML_LINE$,MAT  |   140    let STR2MAT(XML$,MAT XML_LINE$,MAT DELIM$,"TRIM") | ||
|   150    print MAT XML_LINE$ |   150    print MAT XML_LINE$ | ||
| Line 121: | Line 139: | ||
| Notice that "Nested Nodes" are listed before the initial data, this may be used to identify the node. | Notice that "Nested Nodes" are listed before the initial data, this may be used to identify the node. | ||
| ===Quote Processing=== | ===Quote Processing Examples=== | ||
| The following chart demonstrates how data in the source string is handled if * and , are delimiters: | |||
| {|border="2" style="border-collapse:collapse;" | |||
| |-style="background:#a5c7ff;" | |||
| !String | |||
| !Final Array Element | |||
| !Explanation | |||
| |- | |||
| |*"abc,def" | |||
| |abc,def | |||
| |the comma is not recognized as a separator and is part of the data since it is within quotes. | |||
| |- | |||
| |*abc"def | |||
| |abc"def | |||
| |embedded quotes may occur anywhere within a string after the first character | |||
| |- | |||
| |*"abc"def" | |||
| |abcdef" | |||
| |quotation processing is deactivated by the center quote mark | |||
| |- | |||
| |*"abcdef" | |||
| |abcdef | |||
| |normal data | |||
| |- | |||
| |*"abc'def" | |||
| |abc'def | |||
| |the single quote is treated like any other character while double quotes govern | |||
| |- | |||
| |*'abc"def' | |||
| |abc"def | |||
| |double quotes are treated like any other character while single quotes govern | |||
| |- | |||
| |*"abc""def" | |||
| |abc"def | |||
| |pairs of governing quotes denote a single embedded quote | |||
| |- | |||
| |*"abc"""def" | |||
| |abc"def" | |||
| |the third successive occurrence deactivates quote processing | |||
| |} | |||
| ===Reading a CSV file=== | |||
| [[image:CSVexample.jpg]] | |||
| The following program takes this CSV file and uses STR2MAT to store the information in two separate arrays.  | |||
|  ! This Sample Program Reads A Csv and puts it into two arrays, which could easily be written into a BR data file. | |||
|     dim Wholepiece$*10000, codelist$(1)*256, description$(1)*256, code$(1), nextline$*256 | |||
| * |     open #1: "name=hcodeexport.csv, recl=500",display,input | ||
|     mat Code$(0) | |||
|     mat Description$(0) | |||
|     let delim$="," | |||
|     do while file(1)=0 | |||
|        linput #1: nextline$ eof ignore | |||
|        if file(1)=0 then | |||
|           ! at this point we have 1 line of the CSV file in NextLine$ | |||
|           let str2mat(nextline$,mat codelist$,delim$,"Q:trim") | |||
|           ! at this point we have mat CodeList$ sized 2 with each element in it | |||
|           ! lets do something useful, lets build them together into 2 arrays | |||
|           index=udim(mat code$) +1 ! find new position in the arrays | |||
|           mat Code$(index)         ! Make the arrays bigger | |||
|           mat Description$(index) | |||
|           ! Put the stuff we found into the new arrays | |||
|           let code$(index)=codelist$(1) | |||
|           let description$(index)=codelist$(2) | |||
|        end if | |||
|      loop | |||
|   ! print mat code$ | |||
|   ! print mat description$ | |||
|    for i=1 to 15 | |||
|       print code$(i)&" "&description$(i) | |||
|    next i | |||
|    close #1: | |||
| ====Sample Program==== | ====Final Sample Program==== | ||
| The following program demonstrates quote processing and trimming: | The following program demonstrates quote processing and trimming, using both STR2MAT and MAT2STR: | ||
|   00010 ! Rep Str2mat |   00010 ! Rep Str2mat | ||
|   00020    dim LINE$*400,DESC$(5)*30,SEP$(1)*20,QTYPE$(2)*20 |   00020    dim LINE$*400,DESC$(5)*30,SEP$(1)*20,QTYPE$(2)*20 | ||
| Line 223: | Line 259: | ||
| <noinclude> | <noinclude> | ||
| See also [[Mat2Str]] | |||
| [[Category:Internal Functions]] | [[Category:Internal Functions]] | ||
| </noinclude> | </noinclude> | ||
Latest revision as of 13:02, 13 April 2022
The Str2Mat Internal Function will split a string variable based on a delimiter and place the resulting strings into an array which STR2MAT dynamically re-dimensions. The string to mat and mat to string functions have been extended to ease parsing of CSV and XML data (as of 4.3).
STR2MAT(<string variable>, MAT <array name>, [MAT] <delimiter$>, [<quote-type:trim>])
Parameters
"String Variable" is the variable that contains the data to be converted into an array.
"MAT array-name" is the name of the array into which the variable will be placed.
"Delimiter$" is a string containing the character in the string variable which will be used to separate it into items to be placed in the array. For example, a comma ",". In 4.3 Delimiter can be an array.
"Quotes:Trim" is an optional parameter which handles quotes within the string variable. Quotation marks can suppress the recognition of separators so that any delimiter (such as a comma) that occurs between the specified quotes will not split the data into separate array elements. Quote-type can be Q, QUOTES, ('), or ("), and is case insensitive. Q and QUOTES means that BR will recognize from the data which type of quotes (double or single) will be recognized by examining the first nonblank character. Q and QUOTES also means the enveloping quote characters will be stripped off of the data placed into the receiving array. The trim flags can be :LTRM , :TRIM or :RTRM , and denote post processing of extracted elements. In essence these indicate that leading and/or trailing blanks should be stripped from each resultant array element. The leading colon is only present when quote-type is specified (as of 4.3).
When examining str$ left to right, the first character (and the first character after each separator) is checked to see if is either (') or ("). If the first character is a quote, then it suppresses the recognition of separators until quotation processing is deactivated by another occurrence of the leading quote. The string is copied until it ends or until an odd number of successive occurrences of the governing quote type is encountered. During this processing, two adjacent occurrences of the governing quote character denote a single embedded occurrence of the quote character and is disregarded as a quote deactivator.
Defaults
Actually, the delimiter parameter is also optional in a sense. But practically speaking it is mandatory.
The default delimiter searches for the following combinations of line feed and carriage return characters:
This enables multiple CSV rows to be contained in a single string.
Further Explanation
1. When more than one occurrence of the same delimiters are used next to each other, BR honors all of them making an empty string element in the resulting array for all but the first occurrence of the delimiter. Consider the following example:
00010 let namelist$="Mary,John,,Salomi,Thomas,,,David,Sonia" 00020 str2mat(namelist$,mat customer$,",") 00030 print mat customer$
Output:
Mary John Salomi Thomas David Sonia
Customer$(3), Customer$(6), and Customer$(7) would have a value of "".
2. If the delimiter is "", every character will be put in a separate element of the array.
3. Str2Mat performs the opposite action of Mat2Str
4. Str2Mat dynamically redimensions the array (mat customer$ in the above example) as needed to include all of the items from the source string variable. It returns the number in the final array.
5. When the delimiter is an array, both will signify the start of a new element in the final array. But when two consecutive delimiters are different, they will not create a blank element in the array. For example:
dim namelist$*256,customer$(7),delim$(2) let namelist$="Mary,Jo.hn,,.Salomi.Thomas,,,David,,.Sonia" let delim$(1)="," let delim$(2)="." str2mat(namelist$,mat customer$,Mat delim$) print mat customer$
will return:
To restate this: when elements of a delimiter array occur adjacent to each other within the source string, they are grouped as one separator substring. When the same occur consecutively, it creates a null element in the final array output.
CSV Parsing Example (4.3)
The following code spinet demonstrates how to open a CSV/Tab File, read in the fields from the header, and then loop through the records.
01000 dim CSV_LINE$*999,CSV_FILE$*256, CSV_DELIM$*1,CSV_HEADER$*999,CSV_FIELDS$(1)*40,CSV_DATA$(1)*60 01020 form C," " 01040 let CSV_FILE$="Sample_File.tab" : let TAB$=CHR$(9) 01060 open #(CSV_HANDLE:=10): "name="&CSV_FILE$&",shr",display,input 01080 linput #CSV_HANDLE: CSV_HEADER$ 01100 let CSV_DELIM$=TAB$ 01120 if POS(CSV_HEADER$,TAB$) <= 0 then 01140 let CSV_DELIM$="," 01160 end if 01180 let STR2MAT(CSV_HEADER$,MAT CSV_FIELDS$,CSV_DELIM$,"QUOTES:TRIM") 01200 print using 1020: MAT CSV_FIELDS$ 01220 do 01240 linput #CSV_HANDLE: CSV_LINE$ eof Exit_Csv 01260 let STR2MAT(CSV_LINE$,MAT CSV_DATA$,CSV_DELIM$,"Q:trim") 01280 print using 1020: MAT CSV_DATA$ 01300 loop 01320 Exit_Csv: !
You might wish to copy any CSV file to Sample_File.tab and run this program to view the content.
XML Parsing Enhancements
STR2MAT may also be used to Parse XML data.
This powerful tool is a bit more complex than parsing CSV files, but useful nonetheless.
The following example will parse XML$ into "MAT XML_LINE$"
10 DIM XML$*999999,XML_LINE$(1)*32000 20 XML$="<XML><NODE><ITEM>ITEM VALUE</ITEM></NODE></XML>" 100 LET Str2mat(XML$,Mat XML_LINE$,">","TRIM")
This makes the parsing of XML a bit more convenient. The following XML sample shows how the function will parse the data
Data:
 <XML>
  <NODE>
    <ITEM>ITEM VALUE</ITEM>
  </NODE>
 </XML>
Results:
<XML <NODE <ITEM ITEM VALUE</ITEM </NODE </XML
If the node names are known, a more complete and useful technique can be performed. You may use an array of Delimiter$ values to parse the data. Take the following example:
100 dim XML$*999999,XML_LINE$(1)*32000,DELIM$(4)*32 110 let XML$="<XML><NODE><ITEM>ITEM VALUE</ITEM><ITEM2>ITEM2 VALUE</ITEM2></NODE></XML>" 120 read MAT SEP$ 130 data </XML>,</NODE>,</ITEM>,</ITEM2> 140 let STR2MAT(XML$,MAT XML_LINE$,MAT DELIM$,"TRIM") 150 print MAT XML_LINE$
This program would return the following results:
<XML><NODE><ITEM>ITEM VALUE <ITEM2>ITEM2 VALUE
Notice that "Nested Nodes" are listed before the initial data, this may be used to identify the node.
Quote Processing Examples
The following chart demonstrates how data in the source string is handled if * and , are delimiters:
| String | Final Array Element | Explanation | 
|---|---|---|
| *"abc,def" | abc,def | the comma is not recognized as a separator and is part of the data since it is within quotes. | 
| *abc"def | abc"def | embedded quotes may occur anywhere within a string after the first character | 
| *"abc"def" | abcdef" | quotation processing is deactivated by the center quote mark | 
| *"abcdef" | abcdef | normal data | 
| *"abc'def" | abc'def | the single quote is treated like any other character while double quotes govern | 
| *'abc"def' | abc"def | double quotes are treated like any other character while single quotes govern | 
| *"abc""def" | abc"def | pairs of governing quotes denote a single embedded quote | 
| *"abc"""def" | abc"def" | the third successive occurrence deactivates quote processing | 
Reading a CSV file
The following program takes this CSV file and uses STR2MAT to store the information in two separate arrays.
! This Sample Program Reads A Csv and puts it into two arrays, which could easily be written into a BR data file.
   dim Wholepiece$*10000, codelist$(1)*256, description$(1)*256, code$(1), nextline$*256
   open #1: "name=hcodeexport.csv, recl=500",display,input
   mat Code$(0)
   mat Description$(0)
   let delim$=","
   do while file(1)=0
      linput #1: nextline$ eof ignore
      if file(1)=0 then
         ! at this point we have 1 line of the CSV file in NextLine$
         let str2mat(nextline$,mat codelist$,delim$,"Q:trim")
 
         ! at this point we have mat CodeList$ sized 2 with each element in it
 
         ! lets do something useful, lets build them together into 2 arrays
         index=udim(mat code$) +1 ! find new position in the arrays
         mat Code$(index)         ! Make the arrays bigger
         mat Description$(index)
 
         ! Put the stuff we found into the new arrays
         let code$(index)=codelist$(1)
         let description$(index)=codelist$(2)
      end if
   loop
 
 ! print mat code$
 ! print mat description$
  for i=1 to 15
     print code$(i)&" "&description$(i)
  next i
   
  close #1:
Final Sample Program
The following program demonstrates quote processing and trimming, using both STR2MAT and MAT2STR:
00010 ! Rep Str2mat
00020    dim LINE$*400,DESC$(5)*30,SEP$(1)*20,QTYPE$(2)*20
00030 ! 
00040    print NEWPAGE
00050    let LINE$='"  TEST1",, " TEST,,3","TEST4" ,,"TE""S""T6 "'
00060    print LINE$;TAB(1);"no augmentation or quote recognition"
00070    print "note column 3 gets split up and quotes are data"
00080    let STR2MAT(LINE$,MAT DESC$,',')
00090    let MAT2STR(MAT DESC$,LINE$,',')
00100    for X = 1 to UDIM(DESC$) !:
            print DESC$(X), LEN(DESC$(X)) !:
         next X !:
         print LINE$
00110    linput Z$
00120 ! 
00130    print LINE$;TAB(1);"strip quotes and trim outside the quotes"
00140    print "convert commas to tildes"
00150    let STR2MAT(LINE$,MAT DESC$,',',"Q:TRIM")
00160    let MAT2STR(MAT DESC$,LINE$,'~')
00170    for X = 1 to UDIM(DESC$) !:
            print DESC$(X), LEN(DESC$(X)) !:
         next X !:
         print LINE$
00180    linput Z$
00190 ! 
00200    let LINE$='"  TEST1",, " TEST,,3","TEST4" ,,"TE""S""T6 "'
00210    print LINE$;TAB(1);"strip quotes and notrim"
00220    print "convert commas to tildes"
00230    print "column 3 is broken up because the leading quote is embedded without trim"
00240    let STR2MAT(LINE$,MAT DESC$,',',"Q")
00250    let MAT2STR(MAT DESC$,LINE$,'~')
00260    for X = 1 to UDIM(DESC$) !:
            print DESC$(X), LEN(DESC$(X)) !:
         next X !:
         print LINE$
See also Mat2Str


