REBOL Programming/Language Features/Recursion/wikichanges

From Wikibooks, open books for an open world

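Here is an example of how to spider (crawl) a wikibook using a recursive function. Starting from the book's home page, get-link reads each page, records its last-modified stamp, and calls itself on every internal link it has not visited yet. The collected stamps are then printed in reverse date order.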

REBOL [
   file: %wikichanges.r
   author: "Graham"
   date: 18-Sep-2005
   rights: 'BSD
   purpose: {
       To display the changes in a wikibook in reverse date order using a recursive function
   }
   
]

home: http://en.wikibooks.org/wiki/REBOL_Programming
relative-root: "/wiki/REBOL_Programming/"
root: http://en.wikibooks.org/

pages: []
updates: []

rebuild-date: func [ 
   { builds a date value from the update stamp on a wikipage }
   date [string!] 
   /local d
][
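   ; load turns the stamp into a block of values: time, day, month name, year
   d: load date
   ; reassemble them as a REBOL date such as 17-Sep-2005/23:13
   to-date rejoin [ d/2 "-" copy/part form d/3 3 "-" d/4 "/" d/1 ]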
]

get-link: func [ 
   { recursive function to get the links from a wikipage }
   page [url!] 
   /local tags internal-link content internal updated stamp
][
   wait .5 ; don't overload the wikibook server with too many requests at once
   print [ "loading ... " page ]    
   content: read page
   ; find/last/tail returns none if the footer text is missing, so check it before parsing
   stamp: find/last/tail content "This page was last modified"
   if all [ stamp parse stamp [ copy updated to "." to end ] ][
       print [ "Updated: " updated ]
       repend updates [ rebuild-date updated page ]
   ]
   tags: load/markup content
   foreach tag tags [
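       ; match links under relative-root; skip drops the leading "/" because root already ends in one
       if parse tag  [ to relative-root skip copy internal to {"} to "title=" to end ][
           ; record the page before recursing so pages that link to each other are not crawled forever
           if not find pages internal-link: join root trim internal [
               append pages internal-link
               get-link internal-link
           ]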
       ]
   ]    
]

append pages home

; grab all the links
get-link home

; updates holds [date page] pairs, so sort them two at a time, newest first
updates: sort/skip/reverse updates 2

; print them out
foreach [ date page ] updates [ print [ date page ]]

Running the script gives this type of output:

17-Sep-2005/23:13 http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/VID
17-Sep-2005/5:18 http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Control
16-Sep-2005/18:41 http://en.wikibooks.org/wiki/REBOL_Programming/Third_Party
...
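The recursion can be tried without touching the network. The following is a minimal offline sketch, not part of the original script: the site block is a hypothetical stand-in for the wikibook (each page name followed by an invented last-modified date and the pages it links to), and crawl is a hypothetical, simplified version of get-link. It keeps the same shape: mark a page as seen before recursing, collect [date page] pairs, then order them with sort/skip/reverse.

```rebol
REBOL [
    title: "Offline sketch of the recursive crawl"
]

; hypothetical site map (not real wikibook data): page name, last-modified date, links
site: [
    "Home"        15-Sep-2005/9:00  ["Control" "VID"]
    "Control"     17-Sep-2005/5:18  ["Home" "VID"]
    "VID"         17-Sep-2005/23:13 ["Control" "Third_Party"]
    "Third_Party" 16-Sep-2005/18:41 []
]

pages: copy []      ; pages already seen, like pages in the original script
updates: copy []    ; flat block of [date page] pairs

crawl: func [
    {record a page's date, then recurse into each link not yet seen}
    page [string!]
    /local entry
][
    entry: find site page           ; entry/2 is the date, entry/3 the links
    repend updates [entry/2 page]
    foreach link entry/3 [
        if not find pages link [
            append pages link       ; mark before recursing, as get-link does
            crawl link
        ]
    ]
]

append pages "Home"
crawl "Home"

; sort the pairs two at a time on the date, newest first
sort/skip/reverse updates 2
foreach [date page] updates [print [date page]]
```

Running it prints the four sample pages newest first. Because each page is appended to pages before crawl calls itself, the cycle in the sample (Home and Control link to each other) does not cause endless recursion.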