CURL
Latest revision as of 03:08, 19 June 2013
Please note that Business Rules! 4.20 provides native HTTP support.
cURL is a command-line tool for transferring files with URL syntax, supporting FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TFTP, TELNET, DICT, LDAP, LDAPS and FILE. cURL supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form-based upload, proxies, cookies, user+password authentication (Basic, Digest, NTLM, Negotiate, Kerberos...), file transfer resume, proxy tunneling and a busload of other useful tricks.
cURL is free and open source software that compiles and runs under a wide variety of operating systems. cURL exists thanks to the efforts of many contributors. See the cURL Home Page.
See also: Wikipedia:Web scraping
The basic idea is that you call cURL with a system call, giving it a URL and a file name to save the page under. cURL requests the URL, just as if you had entered it in Internet Explorer, and saves the resulting page as an HTML file.
In this example we wanted to write a function to perform a reverse phone number look-up. I can't share the whole source code because it is proprietary, but I don't think a little excerpt will hurt. We call cURL and give it a URL at whitepages.com. We figured out that if you go to the URL [1], it displays an HTML page showing who the phone number 817 274 5220 belongs to (which happens to be Nizza Pizza).
00100 REV_LOOK: ! Reverse Lookup By Phone Number
00120   def library Fnrev_Look(Number$,Mat Result$)
00140     let Fnrev_Look=1 ! Assume Success !:
          mat Web_Page$(0)
00150     if Exists("address.html") then execute "*free address.html"
00160     execute 'sy -M curl http://www.whitepages.com/15055/search/ReversePhone?phone=' & Number$ & ' -A "Mozilla/4.0" -o address.html -s'
00200     if Fnread_Page("address.html",Mat Web_Page$) then
00270       let Response_Type=Fnget_Type(Mat Web_Page$)
00272       if Response_Type then ! If Response Type Found !:
              let Fnrev_Look=Fnparse_Page(Response_Type, Mat Web_Page$, Mat Result$) ! Parse It !:
            else !:
              let Fnrev_Look=0 ! Failed To Get Parse Response Type
00280     else
00285       let Fnrev_Look=0 ! Failed To Read Page, Check Internet
00290     end if
00340 _END_REV_LOOK: fnend
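This function takes the phone number to look up, builds the URL, and passes it to cURL on line 160. Line 160 tells cURL to perform the look-up, identify itself with the user agent "Mozilla/4.0" (-A), run silently (-s), and save the resulting page as address.html in the current directory (-o). Line 150 deletes any address.html left over from a previous call, so a failed download cannot be mistaken for a fresh result.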
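The command built on line 160 can be sketched outside of Business Rules! as well. The following is a minimal Python illustration, not part of the original program: the helper names build_curl_command and fetch_page are hypothetical, the base URL is copied from the excerpt and may no longer be served by the site, and the URL-encoding of the number is an addition that the original line does not perform.

```python
import subprocess
from urllib.parse import quote

# Base URL copied from line 160 of the excerpt; the site may no longer serve it.
BASE_URL = "http://www.whitepages.com/15055/search/ReversePhone?phone="


def build_curl_command(number, output_path="address.html", user_agent="Mozilla/4.0"):
    """Return the curl argument list equivalent to line 160 (hypothetical helper)."""
    # Encoding the number is an assumption; the original concatenates it as-is.
    url = BASE_URL + quote(number, safe="")
    # -A sets the user agent, -o names the output file, -s silences progress output.
    return ["curl", url, "-A", user_agent, "-o", output_path, "-s"]


def fetch_page(number, output_path="address.html"):
    """Run curl and report whether it exited with status 0."""
    completed = subprocess.run(build_curl_command(number, output_path))
    return completed.returncode == 0
```

Passing the arguments as a list avoids shell quoting problems around the user agent string, which the Business Rules! version handles by nesting double quotes inside single quotes.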
Next, on line 200, we call a function that reads the saved page into a matrix. We then call various functions to parse through the matrix, looking for the address information.
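The source of Fnread_Page is not shown, so the following is only a guess at its behavior, sketched in Python: read the saved file line by line into a list and report success or failure, so the caller can tell "page not downloaded" apart from "page downloaded but empty". The name read_page and the choice of UTF-8 decoding are assumptions.

```python
def read_page(path):
    """Return (ok, lines); ok is False when the file cannot be opened or read."""
    try:
        # errors="replace" keeps going if the page contains bytes that are not valid UTF-8.
        with open(path, encoding="utf-8", errors="replace") as handle:
            return True, [line.rstrip("\r\n") for line in handle]
    except OSError:
        # A missing file usually means the download failed.
        return False, []
```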
During our investigation of the web site we discovered that whitepages.com returns a different page layout depending on whether the search finds zero, one, or many results. Our parser functions look at the format of the address.html file that cURL saved to determine which type it is. Then, based on that type, they parse the page and build an address array, Mat Result$, which is returned to the caller.
If you call this function with 817 274 5220 and pass an array named Results$, you end up with contents similar to:
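The detect-then-dispatch step can be sketched as follows. This is a Python illustration under stated assumptions, not the proprietary parser: the marker strings are placeholders (the real text that distinguishes the page layouts is not given here), and classify_page and parse_page are hypothetical names. As in the excerpt, a type of 0 means the layout was not recognized.

```python
# Placeholder markers only; the real distinguishing text must be found by inspecting the pages.
HYPOTHETICAL_MARKERS = {
    1: "no-results-marker",
    2: "single-result-marker",
    3: "result-list-marker",
}


def classify_page(lines, markers=HYPOTHETICAL_MARKERS):
    """Return the first response type whose marker appears in the page, else 0."""
    text = "\n".join(lines)
    for response_type, marker in markers.items():
        if marker in text:
            return response_type
    return 0


def parse_page(response_type, lines, parsers):
    """Call the parser registered for this response type; return [] if there is none."""
    parser = parsers.get(response_type)
    return parser(lines) if parser else []
```

Keeping one parser per layout means a change to one page format on the site only requires updating the matching marker and parser.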
- Results$(1)="Nizza Pizza & Pasta"
- Results$(2)="1430 S Cooper St"
- Results$(3)="Arlington"
- Results$(4)="TX"
- Results$(5)="76013"
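A caller will usually want to refer to these five positions by name. The following Python sketch shows one way to do that with the values listed above; the field names are assumptions based on the sample data, since the original does not name the array slots.

```python
from typing import NamedTuple


class Address(NamedTuple):
    # Field names are assumed; the order follows Results$(1) through Results$(5).
    name: str
    street: str
    city: str
    state: str
    postal_code: str


results = ["Nizza Pizza & Pasta", "1430 S Cooper St", "Arlington", "TX", "76013"]
address = Address(*results)
print(f"{address.name}, {address.street}, {address.city}, {address.state} {address.postal_code}")
# prints "Nizza Pizza & Pasta, 1430 S Cooper St, Arlington, TX 76013"
```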