OpenClinica User Manual/SAS

From Wikibooks, open books for an open world
Jump to: navigation, search

Using CDISC ODM 1.3 with OpenClinica Extensions to export to SAS[edit]

SAS can import data in a number of formats. For OpenClinica version 3.x the obvious choice, due to its portability and SAS's import facilities, is CDISC ODM 1.3 XML.

The CDISC ODM 1.3 XML with OpenClinica extensions format includes two types of data:

  1. CDISC ODM 1.3 specification data (which SAS's libname knows how to import)
  2. OpenClinica extensions data (which SAS can be told how to pick up with a mapping file)

Most of the useful data is CDISC ODM 1.3, but some useful data is 'hidden' in OpenClinica extensions sections (e.g. DateOfBirth, Sex and Event Start Dates).

SAS doesn't import OpenClinica Extensions by default[edit]

OpenClinica extensions aren't part of the CDISC ODM 1.3 specification, and need to be imported separately. The code below uses a mapping file (extensions.xml) to import some, but not all, of the extensions. Other OpenClinica extensions can be picked up from the data file by altering the mapping file appropriately - OpenClinica extensions are tags and attributes in the data file that have the Openclinica prefix, some of which are important. The only place to obtain the Study Subject ID is by using OpenClinica:StudySubjectId - the SubjectKey refers to the internal OID, which doesn't change even if the Study Subject ID does (the OID remains SS_TEST, even if the Study Subject ID is changed to Test 2). The CDISC ODM 1.3 schema generated by OpenClinica is also documented at Openclinica.com/tools/odm-doc.

How to import CDISC ODM 1.3 XML with OpenClinica extensions into SAS[edit]

Download the mapping file.

Our init.sas file has pointers to directories containing:

  1. The data to be imported
  2. The mapping file to import OpenClinica Extension data
%let orig=C:\Users\<user>\Documents\<study>\Original Data\;
%let maps=C:\Users\<user>\Documents\<study>\Map\;

We import using the CDISC ODM 1.3 XML using the libname command (proc cdisc is deprecated as of SAS 9.2). The import.sas file first imports the CDISC ODM file, then imports the OpenClinica Extensions.

* Import Screening XML file;
 
filename odm0 "&orig.<filename>.xml";
libname odm0 xml xmltype=CDISCODM FormatActive=YES FormatNoReplace=NO FormatLibrary="WORK"; 
 
* Create ODM dataset;
 
data odm0blood;
	set odm0.Blood;
run;
 
* Import OpenClinica extensions values from Screening XML file;
 
filename ex0 "&orig.<filename>.xml";
filename sx0emap "&maps.extensions.xml";
libname ex0 xml92 xmlmap=sx0emap access=READONLY;
 
* Create OpenClinica extensions dataset;
 
data studyeventdata0;
	length __SUBJECTKEY $100.;
	set ex0.Studyeventdata;
run;

Issues with importing CDISC ODM 1.3[edit]

SAS doesn't like Partial Dates[edit]

CDISC ODM 1.3 includes a specification for a data type called partial date, which SAS can't import (it throws an error when you try to read the file). You may choose to not have partial dates in your study and use a text field instead, validated by a regular expression. Alternatively you can modify the XML export before it is imported to change partial dates to text fields (see below).

SAS doesn't import labels[edit]

SAS will use a shortened version (truncated to 8 chars) of the ITEM_NAME as the field name, e.g. DMHEIGHT (the SASFieldName attribute of the ItemDef tag in the XML). It will use the full ITEM_NAME itself as the field label (the Name attribute of the ItemDef tag in the XML). In some cases it would be good to pick up the DESCRIPTION_LABEL as the field label (the Comment attribute of the ItemDef tag in the XML). The code below includes a section (commentable) that switched the Name and Comment attributes of ItemDef tags around, so that DESCRIPTION_LABELs are imported as the label.

Field names too long for SAS[edit]

SAS doesn't like field names that are longer than 32 characters including the OpenClinica prefix (e.g. I_HTBAS_BLD_1STDOSESTUDYMEDSGIVEN). You may want to use shorter names (e.g. 8 character long) for new studies, but if you have an old study you may need to replace long field names with shortened versions.

OpenClinica (3.0.3 at least) exports badly encoded entities[edit]

OpenClinica can export entities that aren't defined in the DTD (see this issuetracker bug - OpenClinica login needed). The script correctly escapes the leading ampersand.

Script to clean up Partial Dates, Labels and Entities[edit]

The script below modifies the CDISC ODM XML directly. For CTIMPs (Clinical Trials Involving Medicinal Products) this alters the valuable data you have been collecting, and should be done carefully, with the changes validated (how exactly is up to you).

Notes[edit]

  • The code below is specific to Windows, and has been tested in Windows 7
  • Windows doesn’t run unsigned PowerShell scripts, so until a signed version of the script below can be created, you have to run the commands manually.
  • Don't be concerned if you notice the XML files double in size, this is because PowerShell's output files are in Unicode (two bytes for every original character) to cope with international characters. The original is most likely to have been in single byte encoding (ASCII or similar)
  • The regexps could be reimplemented in Perl under Linux

How to run the code[edit]

  1. Copy the code below into Notepad (or a similar text editor)
  2. Alter the $extractdirectory value to point to your data extracts (paths can be obtained from the Explorertoolbar). This script creates files ending in _new.xml from existing .xml files (and ignores existing files ending in _new.xml).
  3. Type cmd into the Windows Start Orb search box (which opens a terminal)
  4. Then copy and paste the code into the terminal (right-clicking on the terminal will open context menu from whence you can paste.)
powershell
 
#########################
#PowerShell script begin#
#########################
 
#Debug on:
$DebugPreference = "Continue" 
 
#Filename - in single quotes, the inputfile and outputfile
#The outputfile will be overwritten.  The inputfile will not be modified.
$extractdirectory = 'C:\Users\<user>\Documents\<Study>\Original Data\'
 
Get-Item ($extractdirectory + '*.xml') | ForEach-Object {
 
	if (!($_.FullName).EndsWith("_new.xml")) {
		#Get the file
		$text = [System.IO.File]::ReadAllText($_.FullName)
 
		#Remove datestamps from the filenames so that the import script doesn't change every export
		$modifiedfilename = [regex]::Replace($_.FullName, "(^.+?)D\d+\+\d+.xml",'$1_new.xml')
		Write-Host "Fullname: ($_.FullName); modifiedfilename:$modifiedfilename"
 
		#Replaces partialDate with a text type of length 10
		$text= [regex]::Replace($text, "(<ItemDef [^>]*?DataType="")partialDate("" SASFieldName[^>]*?>)",'$1text" Length="10$2')
 
		#Switch Comment for Name 
		$text= [regex]::Replace($text, "(<ItemDef [^>]*? )Name(=""[^>]*?"" )Comment(=""[^>]*?"">)",'$1Comment$2Name$3')
 
		#Replace badly encoded entities, e.g. &#8805; (OC sometimes exports entities not in the DTD)
		$text= [regex]::Replace($text, "&#(\d+);",'&amp;#$1;')
 
		#Optional: SAS likes names to be 32 characters or less.
                #You can manually replace too long variable names with shorter versions:
 		#$text = $text -replace "I_HTBAS_BLD_1STDOSESTUDYMEDSGIVEN", "I_HTBAS_BLD_1STDOSESTUDYMEDSGIVE"
 
		#Write new file
		$text | Out-File $modifiedfilename
	}
}
 
#########################
#PowerShell script end  #
#########################
exit

Further information[edit]

This OpenClinica forum thread further discussed details of how to import data into SAS.

OpenClinica SPSS to Stata[edit]

On a similar note, for those interested in Stata, there's a tool that converts SPSS to Stata here: http://openclinica.mcri.edu.au/DMTools/OpenClinica/SPSStoStata.aspx


Errors when importing into SPSS[edit]

When a dataextract is exported in SPSS-format the zip contains 2 files:

  • the data, in tab-delimited format, named .dat
  • an SPSS-syntax file, which can be run to add vaue and variable labels to the data, named .sps

Most of the time the syntax file must be edited, because it assumes that the dat-file is in the default SPSS directory. If this is not the case,

GET DATA /TYPE = TXT/FILE = 'SPSS_DAT2013-04-19-200700043.dat'

must be changed into something like:

GET DATA /TYPE = TXT/FILE = 'c:\temp\SPSS_DAT2013-04-19-200700043.dat'

Also error messages appear referring to dates. The reason why dates are not imported correctly is in the different date formats in the .dat file and the .sps file: The .dat file contains dates in searchable format whereas the .sps file expects American date format. If you change all ADATE to SDATE in the the spss syntax (search&replace) then it should work.

SAS, R and CSV xsl Transformations[edit]

Some excellent xsl transform files were made available by Linas Silva on this developers list thread (also available here).

These transformations can be run on full ODM 1.3 xml extracts, with the accompanying powershell scripts. The CSV powershell script is probably the most important as it chops up the output csv into into a series of csvs.

Alternatively, the transforms can be run without powershell, instead using the copy of the saxon library that comes with OpenClinica. Example syntax (all on one line):

"path_to_java.exe" -cp "path_to_tomcat\webapps\OpenClinica3141\WEB-INF\lib\saxon-8.7.jar" 
net.sf.saxon.Transform -o "output_file_path.ext" "input_xml_file.xml" "path_to_xsl.xsl"

Each of the xsl files refer to renaming map stylesheet called 'xml_convert_dynamic_lookup' which lists the CRF and Item Group name combinations and what they should be renamed to in the output. If this is not used, the dataset names default to the Item Group OID.


applying the script for SAS[edit]

For SAS transfomations, use the script titled powershell_perform_SAS_xsl_transforms.ps1. This is a powershell-script, so you should open a command-prompt as administrator. Change to the directory where you unzipped the script plus the *.xsl-files and start powershell.

cd Yourdrive:\YourPath\ExportODM
powershell

Now you have a Powershell-prompt and first you must set the Execution policy:

Set-ExecutionPolicy Unrestricted -Scope Process

Running the script is done with one big line, invoking the script with 7 parameters:

.\powershell_perform_SAS_xsl_transforms.ps1 "RDP001_ODM_20140311.xml" "xml_convert_sas_map.xsl" "sasmap_rdp001_20140310.map" "xml_convert_sas_data.xsl" "data_rdp1_20140311.xml" "xml_convert_sas_format.xsl" "formatout_20140310.sas"

The parameters are:

  • "RDP001_ODM_20140311.xml" and this must be the name of your dataextract in ODM1.3-format
  • "xml_convert_sas_map.xsl" is one of the *.xsl-files of the zip, so do not change this one
  • "sasmap_rdp001_20140310.map" is the mapping-file that is created and you can give this any name you want
  • "xml_convert_sas_data.xsl" is one of the *.xsl-files of the zip, so again: do not change this one
  • "data_rdp1_20140311.xml" is the resulting sas-data-file
  • "xml_convert_sas_format.xsl" is one of the *.xsl-files of the zip: no changing this one
  • "formatout_20140310.sas" is a SAS-file with info about radio- and singleselect-options