XQuery/Freebase

Motivation

You want to access a REST web service that takes a JSON document as a query parameter and returns a JSON document as its result. This example uses Freebase, a popular open database that was purchased by Google.

Basic Example

We will create a Freebase query that returns the names of the first three albums by the artist with the ID '/en/bob_dylan'.

We will use the EXPath HTTP Client library to send an HTTP GET request to the Freebase server, with our query encoded in the URI.

Sample Code

xquery version "3.0";

import module namespace http="http://expath.org/ns/http-client";

let $freebase:="https://www.googleapis.com/freebase/v1/mqlread?query="

let $queries:= 
   (
     '[{
        "type": "/music/album",
        "name":null,
        "artist":{"id":"/en/bob_dylan"},
        "limit":3
      }]','cursor'
   )
 
let $responses := http:send-request(<http:request
     href="{$freebase || string-join(for $q in $queries return encode-for-uri($q),'&amp;')}"
     method="get"/>)
                           
return
<results>
  {if ($responses[1]/@status ne '200')
     then
         <failure>{$responses[1]}</failure>
     else
       <success>
         {util:base64-decode($responses[2])}
         {'' (: todo - use string to JSON serializer lib here :) }
       </success>
  }
</results>

Sample Results

The result is returned as a string in JSON format. Whitespace has been added here to show the structure of the result.

<results>
   <success>
     {"cursor": "eNp9js1qxDAMhB-ml5hiVrKUWBJl6XuYHNz80EBJQtIuyz59nW7p3joXgUajb7qvbV82ix2wpcWoTbMhREIAZWrz3NsT-LIBFmkILVha99txuFwM2mqcrkNv9-GnPRtJYCmJhjU2wsE-Xnx1mVZTFWTS8PgWbHvWACFLwQOUCI4Cv0LMg_SDPzdYK2oQl96n-bMwnasO0E-toygz_UFQaqbA9PD4P8hdnT8fOaiFnEvrthTICZGCgp5e45tGZB6ZQKJmBCSAbyilTh0=",
   "result": 
   [
      {
         "artist": {"id": "/en/bob_dylan"}, 
         "name": "Blood on the Tracks", 
         "type": "/music/album"
       },
       {
         "artist": {"id": "/en/bob_dylan"},
         "name": "Love and Theft",
         "type": "/music/album"
       },
       {
           "artist": {"id": "/en/bob_dylan"}, 
           "name": "Highway 61 Revisited",
           "type": "/music/album"
       }
   ]}
   </success>
</results>
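
The todo in the sample code leaves the JSON as a plain string. One way to fill it in, sketched here on the assumption that the xqjson package (the one imported later on this page) is installed and that it produces json, pair and item elements, is to parse the string and pick values out with ordinary path expressions. The shortened JSON literal and the albums and album element names are made up for illustration.

```xquery
xquery version "3.0";
import module namespace xqjson="http://xqilla.sourceforge.net/lib/xqjson";

(: A shortened stand-in for the decoded response body. :)
let $json := '{"cursor": "abc", "result": [{"name": "Blood on the Tracks"}]}'

(: Assumption: xqjson wraps objects in pair elements and arrays in item elements. :)
let $xml := xqjson:parse-json($json)

return
  <albums cursor="{$xml/pair[@name = 'cursor']}">
    {
      for $album in $xml/pair[@name = 'result']/item
      return <album>{string($album/pair[@name = 'name'])}</album>
    }
  </albums>
```

In the sample code, the same parse-json call would be applied to the result of util:base64-decode($responses[2]).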

Freebase API Key

The following blog post describes the process of obtaining a Freebase API key:

http://anchetawern.github.io/blog/2013/02/11/getting-started-with-freebase-api/

Using Cursors to Get Additional Data

By default, a Freebase query limits the number of results returned. Freebase provides database cursors so that you can retrieve the entire set of query results. You ask for a cursor to be returned with your query results (see the example API call below for the form of the initial request), and the cursor acts as a link to the next set of results.

https://developers.google.com/freebase/v1/mql-overview#querying-with-cursor-paging-results

The next set of results is obtained by supplying the cursor value returned by the previous call. Along with that next set you get another cursor that points to the set after that. When the final set of results is retrieved, the cursor is set to the string 'false'.
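
As a rough sketch of how the request URI changes from one page to the next, the function below builds the mqlread URI for a given cursor. The function name local:page-url, the sample query and the cursor value 'eNp9js1q' are made up for illustration, and the API key parameter is left out for brevity.

```xquery
xquery version "3.0";

declare variable $endpoint := 'https://www.googleapis.com/freebase/v1/mqlread';

(: Hypothetical helper: builds the request URI for one page of results. :)
declare function local:page-url($query as xs:string, $cursor as xs:string) as xs:string {
  $endpoint || '?' || string-join(
    ('query=' || encode-for-uri($query),
     'cursor=' || encode-for-uri($cursor)),
    '&amp;')
};

let $query := '[{"type": "/music/album", "name": null, "limit": 3}]'
return (
  (: First request: an empty cursor asks the server to start paging. :)
  local:page-url($query, ''),
  (: Later request: pass back the cursor from the previous response. :)
  local:page-url($query, 'eNp9js1q')
)
```

Only the cursor parameter differs between the two URIs; the query itself is sent unchanged on every call.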

The Freebase query overview page includes sample Python code that calls libraries to take care of all the cursor handling for you.

https://developers.google.com/freebase/v1/mql-overview#looping-through-cursor-results

However, the same thing can easily be achieved in XQuery with a little tail recursion. As an example, we will use the following MQL query, which returns all films with their netflix_id values.

[{
  "type": "/film/film",
  "name": null,
  "netflix_id": []
}]

A few brief comments about MQL. You ask for a field by giving its name with a value of null; in the response, null is replaced by the actual value. However, if the field can hold multiple values, MQL returns an array and your null query fails with an error. This can happen even when you expect a single value, so you can avoid the problem by using an empty array ([]) instead of null, as netflix_id does in the query above.

You can paste the query above into http://www.freebase.com/query to see the results (we will take care of the cursor in the code example). Now to the code, which assumes XQuery 3.0:

xquery version "3.0";
import module namespace http="http://expath.org/ns/http-client";
import module namespace xqjson="http://xqilla.sourceforge.net/lib/xqjson";

Freebase returns JSON, so we use the xqjson package to convert it to XML. In eXist you can install the package by clicking on it in the Package Manager, which you can reach from the eXist Dashboard.

We declare variables for our query, the Freebase endpoint and the API key.

declare variable $mqlQuery := '[{
  "type": "/film/film",
  "name": null,
  "netflix_id": []
}]';

declare variable $freebase := 'https://www.googleapis.com/freebase/v1/mqlread';
(: Placeholder: obtain an API key from Freebase and put its value here. :)
declare variable $key := 'your API key';

Since we are going to use tail recursion, we need to put the API call in a function that makes the call and stores the results in the database.

  declare function local:freebaseCall($cursor as xs:string,$i as xs:integer)

This function will take two parameters. The first is the cursor and the second an integer that provisions an auto-incremented unique file name.
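
Before the full function, the shape of the recursion can be seen in isolation. The sketch below replaces the network call with a made-up stand-in, local:next-cursor, that hands out three pages and then returns 'false'. Both function names and the cursor values are hypothetical.

```xquery
xquery version "3.0";

(: Hypothetical stand-in for the API: maps each cursor to the next one. :)
declare function local:next-cursor($cursor as xs:string) as xs:string {
  switch ($cursor)
    case ''      return 'page2'
    case 'page2' return 'page3'
    default      return 'false'
};

(: Same parameters as the real function: the cursor and a page counter. :)
declare function local:walk($cursor as xs:string, $i as xs:integer) as xs:string {
  if ($cursor eq 'false')
  (: $i names the next page, so $i - 1 pages were visited. :)
  then ($i - 1) || ' pages loaded'
  (: The recursive call is the last thing evaluated, hence tail recursion. :)
  else local:walk(local:next-cursor($cursor), $i + 1)
};

local:walk('', 1)
```

Starting from the empty cursor, this visits three pages and returns the string '3 pages loaded'. The real function follows the same pattern, with the HTTP GET and PUT taking the place of local:next-cursor.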

declare function local:freebaseCall($cursor as xs:string, $i as xs:integer)
{
  if ($cursor eq 'false')
  (: Termination condition. $i names the next page to store, so $i - 1
     pages have been loaded. :)
  then ($i - 1) || ' pages loaded'
  else

    (: URI-encode the parameters of the API call. We have three: the
       MQL query, the API key and the cursor. :)
    let $params := ('query=' || encode-for-uri($mqlQuery),
                    'key=' || $key,
                    'cursor=' || encode-for-uri($cursor))

    (: Construct the API call. Again thanks to Michael Westbay for
       showing the correct way to do this: string-join the parameters
       with a separator of &amp;. :)
    let $href := $freebase || '?' || string-join($params, '&amp;')

    (: Make the API call. :)
    let $responses :=
      http:send-request(<http:request href="{$href}" method="get"/>)

    return
      (: Standard EXPath HTTP error checking. :)
      if ($responses[1]/@status ne '200')
      then
        <failure href="{xmldb:decode-uri(xs:anyURI($href))}">{$responses[1]}</failure>
      else

        (: Don't forget to base64-decode the body of the response. :)
        let $jsonResponse := util:base64-decode($responses[2])

        (: Convert the returned JSON to XML because we are going to
           construct an HTTP PUT to store it in our XML database. :)
        let $freebaseXML := xqjson:parse-json($jsonResponse)

        (: Standard EXPath PUT request. The quoted path, username and
           password are placeholders for your own values. The body wraps
           the returned XML in an element that carries the value of the
           cursor that was used to obtain the page. identity.xsl is the
           standard XSLT identity transform; you can use it as a
           placeholder for your own custom transform. :)
        let $movieData := http:send-request(
          <http:request
              href="{concat('path to store the data in your repository', $i, '.xml')}"
              username="username"
              password="password"
              auth-method="basic"
              send-authorization="true"
              method="put">
            <http:header name="Connection" value="close"/>
            <http:body media-type="application/xml"/>
          </http:request>,
          (),
          <batch cursor="{$cursor}">{
            transform:transform($freebaseXML, doc('identity.xsl'), ())
          }</batch>)

        (: Finally the tail-recursive call. We extract the cursor from
           the returned JSON for parameter 1 and increment $i to give us
           a unique document name for the next page to store. :)
        return local:freebaseCall($freebaseXML//data(pair[@name="cursor"]), $i + 1)
};

To kick it all off, pass the empty string as the initial cursor value and initialise your counter:

   local:freebaseCall('',1)

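Alternatively, you can prime the function call with a saved cursor and page number to restart retrieval from where you left off. Here $savedCursor and $nextPage are placeholder names for values you supply, for example the cursor attribute of a stored batch element and the number of the file it was stored in:

   local:freebaseCall($savedCursor, $nextPage)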

Acknowledgements

Original example provided by Ihe Onwuka and Michael Westbay on the eXist list in March 2014.

References