STR2MAT - Quote Processing
Moderators: Susan Smith, admin, Gabriel
-
- Posts: 717
- Joined: Sun Aug 10, 2008 4:24 am
- Location: Southern California
STR2MAT - Quote Processing
Can I have embedded quotes in strings where I use STR2MAT to parse? I do not understand the information on the wiki for MAT2STR Quote Processing. I am using BR 4.2 and I am parsing CSV file records into an array. One of the fields in the CSV file is a name with an embedded comma - in the format of LASTNAME, FIRSTNAME
In order to keep that embedded comma, I have encapsulated (surrounded) the name with double quotes as you can see directly below in the sample csv file. STR2MAT is ignoring the quotes and separating the line at the comma.
CSVFILE contents (the columns are account number, name, zipcode):
12345,"Anderson, Chloe",91105
12346,"Brown, Tom",91104
00010 DIM PART$(3)*100,QLINE$*200
00012 open #1,"Name=csvfile.csv",display,input
00020 LINPUT #CSVFILE: QLINE$ eof endfile
00030 LET STR2MAT(QLINE$,mat part$,",")
00040 pause
00050 goto 20
00060 endfile: pause
print part$(2)
"Anderson
print part$(3)
Chloe"
And there is a part$(4) - even though I dimmed it at (3) - which contains the zipcode 91105
Why wouldn't part$(2) be
Anderson, Chloe
and part$(3) be
91105
Perhaps I don't understand the documentation in terms of quote processing. Or is it that I cannot deal with quote processing in 4.2? If the latter is the case, I will use David Blankenship's FNPARSETEXT utility to parse them instead.
-- Susan
In order to keep that embedded comma, I have encapsulated (surrounded) the name with double quotes as you can see directly below in the sample csv file. STR2MAT is ignoring the quotes and separating the line at the comma.
CSVFILE contents (the columns are account number, name, zipcode):
12345,"Anderson, Chloe",91105
12346,"Brown, Tom",91104
00010 DIM PART$(3)*100,QLINE$*200
00012 open #1,"Name=csvfile.csv",display,input
00020 LINPUT #CSVFILE: QLINE$ eof endfile
00030 LET STR2MAT(QLINE$,mat part$,",")
00040 pause
00050 goto 20
00060 endfile: pause
print part$(2)
"Anderson
print part$(3)
Chloe"
And there is a part$(4) - even though I dimmed it at (3) - which contains the zipcode 91105
Why wouldn't part$(2) be
Anderson, Chloe
and part$(3) be
91105
Perhaps I don't understand the documentation in terms of quote processing. Or is it that I cannot deal with quote processing in 4.2? If the latter is the case, I will use David Blankenship's FNPARSETEXT utility to parse them instead.
-- Susan
Re: STR2MAT - Quote Processing
You want to use the flag$ parameter to tell BR to handle the quotes for you.
http://brwiki2.brulescorp.com/index.php?title=STR2MAT
Try this:
or
However, it does look like that ability is available in BR 4.3 only.
Gabriel
http://brwiki2.brulescorp.com/index.php?title=STR2MAT
Try this:
Code: Select all
00030 LET STR2MAT(QLINE$,mat part$,",","Q")
Code: Select all
00030 LET STR2MAT(QLINE$,mat part$,",","Q:TRIM")
Gabriel
Re: STR2MAT - Quote Processing
You will still have problems with embedded Carriage Returns though, because LINPUT will only read up to the first Carriage Return.
If you're trying to read a CSV file, you can use this function if you like. Add line numbers using Lexi.
If people wanted, I was thinking of including this functionality into FileIO or something like it. So you can just "open" a CSV file using a function and it automatically reads the first line and uses that to make the subscripts. Then you can call "fnReadCSVFile" and it will automatically read a line at a time of a CSV file, including embedded CR's and handling Quotes properly, and you get back an Array which you can then use the CSV file subscripts to access.
If you're trying to read a CSV file, you can use this function if you like. Add line numbers using Lexi.
If people wanted, I was thinking of including this functionality into FileIO or something like it. So you can just "open" a CSV file using a function and it automatically reads the first line and uses that to make the subscripts. Then you can call "fnReadCSVFile" and it will automatically read a line at a time of a CSV file, including embedded CR's and handling Quotes properly, and you get back an Array which you can then use the CSV file subscripts to access.
Code: Select all
dim CSVImportString$*30000
dim CSV_TempImport$(1)*10000
dim CSV_Temp$(1)*10000
def library fnReadCSVFile(CsvFile,mat CsvFile$;Delimiter$,Quote$,___,Index,Length,TopIndex,EndOfFile)
if Delimiter$="" then let Delimiter$=","
if Quote$="" then let Quote$=""""
mat CsvFile$=("")
linput #CsvFile: CSVImportString$ eof Ignore
! if CSVImportString$(1:3)="463" then pause
if file(CsvFile)=0 then
mat Csv_TempImport$=("")
if CSVImportString$="" then let CSVImportString$=" " ! Need at least one space.
str2mat(CSVImportString$,mat Csv_TempImport$,Delimiter$)
! go through all of them, strip the quotes. If one has a front quote but no end quote, we need to
! put it together with the next one. If there is no next one, read another line and put it together with the next one.
let Index=0
do while Index<udim(mat Csv_TempImport$)
let Index+=1
let Csv_TempImport$(Index)=trim$(Csv_TempImport$(Index))
if Csv_TempImport$(Index)(1:1)=Quote$ then
let Length=len(Csv_TempImport$(Index))
if Csv_TempImport$(Index)(Length:Length)=Quote$ and Length>1 then
! Strip the Quote$ off, and we're good.
let Csv_TempImport$(Index)=Csv_TempImport$(Index)(1+len(Quote$):Length-len(Quote$))
else
! We need to read the next one.
if Index<udim(mat Csv_TempImport$) then
! There's a next one, slide it into this one and keep going
let Csv_TempImport$(Index)=Csv_TempImport$(Index)&","&Csv_TempImport$(Index+1)
! Go through the rest of them and slide them up one
let fnMoveUpOne(Index+1,mat Csv_TempImport$)
let Index-=1 ! Slide back one to retest now that they're put together
else
! We need to read the next line and put it together and keep going.
linput #CsvFile: CSVImportString$ eof Ignore
if file(CsvFile)=0 then
mat CSV_Temp$=("")
if CSVImportString$="" then let CSVImportString$=" " ! Need at least one space.
str2mat(CSVImportString$,mat CSV_Temp$,Delimiter$)
let TopIndex=udim(mat Csv_TempImport$)
mat Csv_TempImport$(TopIndex+udim(mat Csv_Temp$))
mat Csv_TempImport$(TopIndex+1 : udim(mat Csv_TempImport$))=Csv_Temp$
let Csv_TempImport$(Index)=Csv_TempImport$(Index)&hex$("0A0D")&Csv_TempImport$(Index+1)
let fnMoveUpOne(Index+1,mat Csv_TempImport$)
let Index-=1 ! Slide back one to retest now that they're put together
else
! We hit the end of the file, and we're still in open quote territory .. what do we do here?
let EndOfFile=1
end if
end if
end if
else
! We have no quotes, not a quoted string, keep going.
end if
loop until EndOfFile
if ~EndOfFile then
mat CsvFile$(udim(mat Csv_TempImport$))=Csv_TempImport$
let fnReadCSVFile=1
end if
end if
fnend
def fnMoveUpOne(Index,Mat Array$;___,Jndex)
for Jndex=Index to udim(mat Array$)-1
let Array$(Jndex)=Array$(Jndex+1)
next Jndex
mat Array$(udim(mat Array$)-1)
fnend
-
- Posts: 717
- Joined: Sun Aug 10, 2008 4:24 am
- Location: Southern California
Re: STR2MAT - Quote Processing
Thanks Gabe. Before I saw your reply, I changed my program to use David's FNPARSETEXT routine. It handles all of the quotes beautifully so I just had to drop (one line) it in where I had previously had the STR2MAT command. It wasn't apparent to me, from the wiki description, exactly WHAT was in BR 4.3 and what was in BR 4.2 in terms of this command. So I couldn't tell if it didn't work because I didn't have the FLAG$ parameter set up, or I made some other mistake, or it wasn't in my BR version at all. Since I don't know what my data could like (there could be an embedded comma in a CSV file), it seems that it's probably not a good idea to try to use STR2MAT in BR 4.2 at all then...just in case it stumbles over an unexpected comma.
-- Susan
-- Susan
Re: STR2MAT - Quote Processing
To elaborate a tag on Gabriel's comments.
I find it highly unusual to have newlines or CRLF combinations embedded in valid CSV data. However, the operative word is there is "valid". If you think your users will have CRLF characters in their data and you don't want such characters to denote end-of-input-line, then using Gabriel's function may be your best bet. It is possible to still use STR2MAT for such data, but it requires a bit of special processing and why bother if there is a tool for it.
Otherwise, for the normal case, the 4.3 "Q" flag is sufficient to handle things like embedded quotes and other embedded delimiters such as commas.
I find it highly unusual to have newlines or CRLF combinations embedded in valid CSV data. However, the operative word is there is "valid". If you think your users will have CRLF characters in their data and you don't want such characters to denote end-of-input-line, then using Gabriel's function may be your best bet. It is possible to still use STR2MAT for such data, but it requires a bit of special processing and why bother if there is a tool for it.
Otherwise, for the normal case, the 4.3 "Q" flag is sufficient to handle things like embedded quotes and other embedded delimiters such as commas.
-
- Posts: 717
- Joined: Sun Aug 10, 2008 4:24 am
- Location: Southern California
Re: STR2MAT - Quote Processing
Gordon,
In this particular case, I won't have to worry about embedded CR or LF because it's an export from another software package. But what I DO have to deal with is that I am using BR 4.2, so the "Q" flag is not available to me. Luckily, my alternate solution is working beautifully now, so I'm good. But for those who ARE using BR 4.3, they would have the ability to handle the parsing with STR2MAT and the "Q" flag.
-- Susan
In this particular case, I won't have to worry about embedded CR or LF because it's an export from another software package. But what I DO have to deal with is that I am using BR 4.2, so the "Q" flag is not available to me. Luckily, my alternate solution is working beautifully now, so I'm good. But for those who ARE using BR 4.3, they would have the ability to handle the parsing with STR2MAT and the "Q" flag.
-- Susan
Re: STR2MAT - Quote Processing
CRLF in CSV files are used when a "Memo or Note Record" is included in the data. It does add an extra complication for processing CSV files.
BR 4.3, really does make it much easier to parse CSV data.
http://brwiki2.brulescorp.com/index.php ... _.284.3.29
http://brwiki2.brulescorp.com/index.php ... Processing
My recommendation is to create a function in BR4.2 that uses similar processing, so that one day, you can just "Update the function" to leverage the BR 4.3 functionality.
BR 4.3, really does make it much easier to parse CSV data.
http://brwiki2.brulescorp.com/index.php ... _.284.3.29
http://brwiki2.brulescorp.com/index.php ... Processing
My recommendation is to create a function in BR4.2 that uses similar processing, so that one day, you can just "Update the function" to leverage the BR 4.3 functionality.
Re: STR2MAT - Quote Processing
Memo fields are common in exports from other programs, including Access and Excel (the two most common programs people export from in CSV format.)
In order to support CRLF you have to do complicated logic during which its very easy to also support Quotes.
The problem with CRLF is that it requires complicated logic around your read statement. Gordon could perhaps address this by updating LINPUT to intelligently check quotes but he probably has more important things to fix first, such as Local Variables, graphical glitches, unhandled exceptions, and many others.
So once you write your function to handle CRLF, you might as well just use that function everywhere.
The function I posted in this thread yesterday supports CRLF, supports Quotes and Embedded Commas, even under BR 4.2.
But its not a simple replacement for Str2mat because it can't be, because embedded CRLF requires special logic around the read statement.
Gabriel
In order to support CRLF you have to do complicated logic during which its very easy to also support Quotes.
The problem with CRLF is that it requires complicated logic around your read statement. Gordon could perhaps address this by updating LINPUT to intelligently check quotes but he probably has more important things to fix first, such as Local Variables, graphical glitches, unhandled exceptions, and many others.
So once you write your function to handle CRLF, you might as well just use that function everywhere.
The function I posted in this thread yesterday supports CRLF, supports Quotes and Embedded Commas, even under BR 4.2.
But its not a simple replacement for Str2mat because it can't be, because embedded CRLF requires special logic around the read statement.
Gabriel
Re: STR2MAT - Quote Processing
I like the suggestion of changing LINPUT (with a retro OPTION of course) to read and append the next "line" if quotes are unbalanced.
That would make STR2MAT sufficient for all cases that I'm aware of.
That would make STR2MAT sufficient for all cases that I'm aware of.
Re: STR2MAT - Quote Processing
There probably needs to be some type of option in the open statement, and of course the crlf needs to be retained in the line that is being read.