Pages

Monday, April 2, 2012

COUNTER SUSHI harvester, pt 4: Using Sushi Py

My SUSHI client for harvesting COUNTER reports is in a relatively stable condition at the moment, so now it's time to share.

How to Use sushi_py

  1. Make sure you've read my previous post on setting up your environment
  2. Download sushi_py.zip or sushi_py.tar.gz and unzip to a location of your choice.
  3. Set up your CSV file and/or MySQL tables. Fill them with parameters for all the SUSHI reports you want to regularly schedule. You can find more about how each vendor handles the parameters at the SUSHI server registry.
    • For CSV: Open sushi.csv and fill the rows with parameters for your SUSHI requests. Note that sushi_py assumes you have column headers in the first row and will ignore any data you enter in it.
    • For MySQL: Download and run this SQL script that will automatically generate the tables used by sushi_py. If you are using MySQL Workbench, open your database in query mode and go to File > Open SQL Script. Fill the table counter_sushi with the parameters for your SUSHI requests.
  4. The file counter_harvest.py is an example of how to use sushi_py. The easiest way to use it is to put it in the same directory as sushi_py.py (unless you want to fiddle with paths and modules). Open counter_harvest.py in a text editor. Edit the MySQL database variables and the report locations. By default, the script is set up to use MySQL. To read SUSHI parameters from a CSV file, uncomment the line with "sushi.GetParamsCSV" and make sure it's pointing to the file you use for your SUSHI parameters. Then, comment out the "sushi.GetParamsSQL" line. To write to CSV, uncomment the line with "sushi.WriteResponseCSV", and comment out the line with "sushi.WriteResponseSQL".
  5. Run counter_harvest.py. I recommend you run it in a command line so that you can view the printout of which reports worked and which failed. To enable debug mode, uncomment or add the line "sushi.DebugMode = True". Warning: debug mode will crank out _a lot_ of text. Consider running just one report at a time for debug mode. You can run one report at a time easily by modifying the for loop in counter_harvest.py so that it only runs CallServer and WriteResponse if i == x, where x is the row in your parameter list containing the report you want to test.
  6. Once you are comfortable that everything is working, set up counter_harvest.py as a Linux cron job or a Windows scheduled task.
For now, sushi_py works for JR1, DB1, DB2, and DB3 COUNTER reports. It can be easily expanded to work with other reports, but I don't have access to any SUSHI services that allow other reports to test them.

In developing Sushi Py, I've ran into a handful of problems with individual vendor implementations. At the time of writing, Sushi Py has been confirmed to work with the following vendors:

  • American Chemical Society
  • EBSCO
  • Ovid
It has been confirmed _not_ to work with the following:
  • Cambridge
  • OCLC
  • ProQuest
For the three that are not working, I am in contact with technical support to determine the cause. I am fairly sure that it is a problem with their SUSHI implementation because I am not even able to make a successful request using SoapUI. Hopefully, they will be working shortly.

For the most part, Sushi Py does what I need it to do for now, and I will mostly be updating the code on an as-needed basis. Feel free to contact me with questions.


COUNTER SUSHI harvester, pt 3: Setting up your environment

Last time, I explained why I am building a SUSHI client in Python and how I did it. I'll be sharing my code shortly! But it won't do you any good unless you have a working Python environment running. Here are the prerequisite software components:
  • Python 2.7.x (version 3.x will not work!)
  • Python SUDS package
  • Python MySQLdb package (if you plan on using MySQL)
  • a MySQL database
  • GUI software for managing the MySQL database
In my case, I wanted to install it first to a development server on my local (Windows) computers, and then I wanted to be able to run it on a production server (Ubuntu Linux). Here are the Windows instructions:

  1. Install Python
    • Download the free version of ActivePython
    • Run the Python Package Manager
    • In PyPM type "pypm install suds"
    • In PyPM type "pypm install mysql-python"
  2. Install MySQL
    • Download XAMPP for Windows. This comes with a MySQL server. Set it up during installation. Note the username and password you choose.
  3. Connect with MySQL GUI client
    • Install MySQL Workbench on your office computer
    • Set up a new connection to your MySQL database at "localhost" and connect with the credentials you supplied when you installed MySQL.
    • Create a database. Create the following tables (the names are flexible):
    • (insert table create statements/scripts)
    • Using the Server Administration module, add a new user with SELECT and INSERT privileges on your database that will be used to allow the script to connect.


For Linux, you will have to do some tinkering of your own. Hopefully, if you even care about how to do this on Linux, you have a good amount of experience using it in the first place and probably know better than I do. Here are the main steps I discovered as a Linux n00b. I am running Ubuntu 11.10:
  1. Install Python
    • sudo apt-get install python-dev
    • sudo apt-get install python-suds
    • sudo apt-get install python-mysqldb
  2. Install MySQL
    • sudo apt-get install mysql
    • create a strong password for the root user
    • modify my.conf to allow connections from ALL hosts
    • create a user with all privileges and a strong password
  3. Connect with MySQL GUI client
    • sudo ufw allow 3306 (3306 is default)
    • install MySQL Workbench on your office computer
    • set up a new connection to your MySQL database
    • create a database with tables as described above

COUNTER SUSHI harvester, pt 2: A summary of how I made it

After a month of on-and-off coding and head-banging, my Python SUSHI harvester is in a working condition. Hello, Sushi Py!

How It Works

Here are the steps for making a SUSHI client:

  1. Input the parameters for a SUSHI request (ie Requestor ID, Customer ID, date range, etc). In Sushi Py, instead of manually running the script for each SUSHI request, it reads a list of them either from a CSV file or a MySQL table and runs requests for all of them.
  2. Create a SOAP envelope. You can do this manually (not recommended) by creating an XML file from scratch, or you can utilize a SOAP service's WSDL file. Sushi Py uses the latter route with a Python library called SUDS. SUDS takes the URL of a WSDL file as input and allows you to easily create a SOAP envelope from it.
  3. Fill the SOAP envelope with data from step 1. Fairly straightforward.
  4. Make the SOAP request. If you used a WSDL file, a smart SOAP library like SUDS will automatically know the URL of the SOAP service. If all goes well, you will be returned a response from the SUSHI service at the other end.
  5. Parse the response. You will want to have an example of what the response looks like. Options for this include the free tool SoapUI or using lots of print/debug commands in whatever SOAP library you are using. I used both. Each response is parsed to a Python dictionary (associative array), and each dictionary is added to a list (array) of all the responses. The shape and values of the dictionary varies based on which COUNTER report is being requested.
  6. Record the response. Sushi Py loops through each dictionary in the list of responses and writes it to either a CSV file or a MySQL table. The output location also varies based on which COUNTER report is being used. For example, JR1 reports go into one table/file, DB1 reports into another, and so on.
Sushi Py currently works for JR1, DB1, DB2, and DB3 reports. Sushi Py is hard-coded to deal with each report and will not work "out of the box" with new report types. In the future, I will hopefully modify it so that report types are not hard-coded. It's pretty easy to add support for new report types, and if you look into the source code, you can probably figure it out fairly quickly just by looking at how other report types are defined.