Query and Download API

For this example, assume you have created a top-level Work directory. Under the Work directory, create a notebooks directory and an output directory. Then unzip the ImmPort Download Tool zip package and place the bin and aspera directories it contains under the Work directory. Change to the notebooks directory, then run the command jupyter notebook.

The immport_download.py file contains convenience functions to obtain ImmPort and Aspera tokens, and a download_file function that simplifies downloading a file.

In [1]:
import sys
import os
import pandas as pd
import urllib

# Set the Python path to the location of the directory containing "immport_download.py"
immport_download_code = "../bin/"
sys.path.insert(0,immport_download_code)
os.chdir(immport_download_code)

import immport_download

Example configuration properties - Replace with real user_name and password

In [2]:
user_name = "REPLACE"
password = "REPLACE"
output_directory = "../output"
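
Hard-coding credentials in a notebook makes them easy to share by accident. As a minimal sketch, they could instead be read from environment variables. The variable names IMMPORT_USERNAME and IMMPORT_PASSWORD and the helper load_credentials are hypothetical and chosen for this illustration only; they are not part of the ImmPort Download Tool.

```python
import os


def load_credentials(env=None):
    """Return (user_name, password) read from environment variables.

    IMMPORT_USERNAME and IMMPORT_PASSWORD are hypothetical names used only
    in this sketch; the ImmPort tooling does not define them.
    """
    env = os.environ if env is None else env
    try:
        return env["IMMPORT_USERNAME"], env["IMMPORT_PASSWORD"]
    except KeyError as missing:
        # Report which variable is absent without echoing any secret value
        name = missing.args[0]
        raise RuntimeError(f"Environment variable {name} is not set") from None
```

With both variables exported in the shell that starts Jupyter, the cell above could then read user_name, password = load_credentials().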

Download the HAI results for SDY212 using the API

In [3]:
# Request a token, make the API call, then load the response into a pandas DataFrame
token = immport_download.request_immport_token(user_name, password)
r = immport_download.api("https://api.immport.org/data/query/result/hai?studyAccession=SDY212",token)
df = pd.read_json(r)
In [4]:
# Print out a sample of the information in the DataFrame
print(df.columns)
print(df.measurementTechnique.unique())
print(df.armName.unique())
Index(['ageEvent', 'ageEventSpecify', 'ageUnit', 'ancestralPopulation',
       'armAccession', 'armName', 'biosampleAccession', 'biosampleSubtype',
       'biosampleType', 'clinical', 'comments', 'ethnicity',
       'experimentAccession', 'expsampleAccession', 'gender', 'maxSubjectAge',
       'measurementTechnique', 'minSubjectAge', 'plannedVisitAccession',
       'race', 'raceSpecify', 'repositoryAccession', 'repositoryName',
       'resultId', 'species', 'strain', 'studyAccession', 'studyTimeCollected',
       'studyTimeCollectedUnit', 'studyTimeT0Event', 'studyTimeT0EventSpecify',
       'studyTitle', 'subjectAccession', 'subjectPhenotype',
       'treatmentAccession', 'unitPreferred', 'unitReported', 'valuePreferred',
       'valueReported', 'virusStrainPreferred', 'virusStrainReported'],
      dtype='object')
['Hemagglutination Inhibition']
['Cohort_2' 'Cohort_1']
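
The return type of immport_download.api is not shown in this notebook. Assuming it returns the response body as JSON text, recent pandas releases (2.1 and later) deprecate passing a literal JSON string to pd.read_json, so wrapping the text in io.StringIO is a safer way to load it. The helper json_text_to_frame below is a hypothetical name, and the two records reuse a few values that appear elsewhere in this notebook.

```python
import io

import pandas as pd


def json_text_to_frame(text):
    """Parse a JSON array of records (as text) into a DataFrame."""
    # StringIO makes the text look like a file, which read_json accepts
    # in both older and newer pandas versions
    return pd.read_json(io.StringIO(text))


# Two reduced records, using values that appear in this notebook
payload = (
    '[{"subjectAccession": "SUB134240", "armAccession": "ARM895",'
    ' "valuePreferred": 20, "virusStrainReported": "B"},'
    ' {"subjectAccession": "SUB134251", "armAccession": "ARM894",'
    ' "valuePreferred": 40, "virusStrainReported": "H3N2"}]'
)
frame = json_text_to_frame(payload)
print(frame.columns.tolist())
```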

Rename the armName values to more descriptive values. Reviewing the ARM information shows that Cohort_1 represents the young participants and Cohort_2 represents the older participants.

In [5]:
# Copy the selected columns so the assignments below modify an independent
# DataFrame rather than a slice of df (this avoids SettingWithCopyWarning)
df_clean = df[['subjectAccession','armAccession','armName','gender','race','minSubjectAge','studyTimeCollected',
               'valuePreferred','virusStrainReported']].copy()
df_clean.loc[df_clean['armName'] == 'Cohort_1','armName'] = "Young"
df_clean.loc[df_clean['armName'] == 'Cohort_2','armName'] = "Old"
In [6]:
df_clean.head()
Out[6]:
subjectAccession armAccession armName gender race minSubjectAge studyTimeCollected valuePreferred virusStrainReported
0 SUB134240 ARM895 Old Female White 62.86 0 20 B
1 SUB134251 ARM894 Young Female White 29.23 0 40 H3N2
2 SUB134258 ARM895 Old Female White 85.41 0 10 H1N1
3 SUB134264 ARM895 Old Female White 68.08 0 40 B
4 SUB134271 ARM895 Old Female White 86.61 0 640 H3N2
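
The relabeling can also be expressed as a single dictionary lookup applied to a copy of the data, which leaves the original DataFrame untouched. This is a sketch of one alternative, not the notebook's own method. The names ARM_LABELS and relabel_arms are hypothetical, and the two sample rows reuse subject accessions shown in the output above.

```python
import pandas as pd

# Mapping described in the text: Cohort_1 is the young arm, Cohort_2 the older arm
ARM_LABELS = {"Cohort_1": "Young", "Cohort_2": "Old"}


def relabel_arms(frame, labels=ARM_LABELS):
    """Return a new DataFrame whose armName values are replaced per `labels`."""
    out = frame.copy()  # work on a copy so the caller's frame is not modified
    out["armName"] = out["armName"].replace(labels)
    return out


sample = pd.DataFrame({
    "subjectAccession": ["SUB134240", "SUB134251"],
    "armName": ["Cohort_2", "Cohort_1"],
})
relabeled = relabel_arms(sample)
print(relabeled)
```

Values not listed in the mapping are left as they are, so an unexpected arm name would stay visible instead of silently becoming missing.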

Simple descriptive statistics

In [7]:
df_clean[['armName','subjectAccession']].drop_duplicates().groupby('armName').count()
Out[7]:
subjectAccession
armName
Old 60
Young 29
In [8]:
df_clean[['gender','subjectAccession']].drop_duplicates().groupby('gender').count()
Out[8]:
subjectAccession
gender
Female 54
Male 35
In [9]:
df_clean[['race','subjectAccession']].drop_duplicates().groupby('race').count()
Out[9]:
subjectAccession
race
American Indian or Alaska Native 1
Asian 8
Other 7
White 73
In [10]:
set(df_clean.studyTimeCollected)
Out[10]:
{0, 28}
In [11]:
set(df_clean.virusStrainReported)
Out[11]:
{'B', 'H1N1', 'H3N2'}
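
The counts above rely on drop_duplicates followed by groupby and count, because each subject appears once per time point and virus strain. An equivalent and slightly more direct way to count distinct subjects per group is nunique. The sketch below uses toy rows with made-up subject identifiers (S1 to S3), not real ImmPort records, and the helper name subjects_per_group is hypothetical.

```python
import pandas as pd


def subjects_per_group(frame, column):
    """Count distinct subjectAccession values within each value of `column`."""
    return frame.groupby(column)["subjectAccession"].nunique()


# Toy data: S1 and S3 each appear twice, as a subject would with repeated measurements
toy = pd.DataFrame({
    "subjectAccession": ["S1", "S1", "S2", "S3", "S3"],
    "armName": ["Old", "Old", "Old", "Young", "Young"],
})
counts = subjects_per_group(toy, "armName")

# Same result as the drop_duplicates / groupby / count pattern used above
legacy = (toy[["armName", "subjectAccession"]]
          .drop_duplicates()
          .groupby("armName")
          .count()["subjectAccession"])
print(counts.to_dict(), legacy.to_dict())
```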

Retrieve FCS files for Cohort_1

In this section we use the Query API to list the FCS files available for this study. Then we use the Download API to retrieve 5 of the files.

In [12]:
token = immport_download.request_immport_token(user_name, password)
r = immport_download.api("https://api.immport.org/data/query/result/filePath?studyAccession=SDY212&armName=Cohort_1&measurementTechnique=Flow%20cytometry",token)
df = pd.read_json(r)
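
The query URL above is written by hand, including the %20 escape for the space in "Flow cytometry". As a sketch, the standard library can assemble and escape the query string instead. The helper build_query_url and the constant API_BASE are hypothetical names; the base URL, endpoint names, and parameters are taken from the calls in this notebook.

```python
from urllib.parse import quote, urlencode

# Base of the query endpoints used in this notebook
API_BASE = "https://api.immport.org/data/query/result"


def build_query_url(endpoint, **params):
    """Build a query URL, percent-encoding parameter values (space -> %20)."""
    # quote_via=quote encodes spaces as %20 instead of the default "+"
    query = urlencode(params, quote_via=quote)
    return f"{API_BASE}/{endpoint}?{query}"


fcs_url = build_query_url(
    "filePath",
    studyAccession="SDY212",
    armName="Cohort_1",
    measurementTechnique="Flow cytometry",
)
print(fcs_url)
```

The resulting string can be passed to immport_download.api in place of the literal URL.
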
In [13]:
df = df[df['fileDetail'] == "Flow cytometry result"]
df_fcs = df[['subjectAccession','armAccession','armName','gender','race','minSubjectAge','studyTimeCollected', \
               'fileDetail','filePath']]
In [14]:
df_fcs.head()
Out[14]:
subjectAccession armAccession armName gender race minSubjectAge studyTimeCollected fileDetail filePath
0 SUB134242 ARM894 Cohort_1 Male White 26.86 0 Flow cytometry result /SDY212/ResultFiles/Flow_cytometry_result/pFlo...
1 SUB134249 ARM894 Cohort_1 Female Other 26.01 0 Flow cytometry result /SDY212/ResultFiles/Flow_cytometry_result/pFlo...
24 SUB134242 ARM894 Cohort_1 Male White 26.86 0 Flow cytometry result /SDY212/ResultFiles/Flow_cytometry_result/PHOS...
25 SUB134242 ARM894 Cohort_1 Male White 26.86 0 Flow cytometry result /SDY212/ResultFiles/Flow_cytometry_result/PHOS...
26 SUB134249 ARM894 Cohort_1 Female Other 26.01 0 Flow cytometry result /SDY212/ResultFiles/Flow_cytometry_result/PHOS...
In [15]:
unique_file_paths = df_fcs.filePath.unique()
unique_fcs_paths = [path for path in unique_file_paths if path.endswith(".fcs")]
len(unique_fcs_paths)
Out[15]:
387
In [16]:
for i in range(0,5):
    print("Downloading: ",unique_fcs_paths[i])
    immport_download.download_file(user_name,password,
                                    unique_fcs_paths[i],output_directory)
Downloading:  /SDY212/ResultFiles/Flow_cytometry_result/PHOSPHOFLOW SPECIMEN_FLU 010V1.390100.fcs
Downloading:  /SDY212/ResultFiles/Flow_cytometry_result/PHOSPHOFLOW SPECIMEN_FLU 018V1.390104.fcs
Downloading:  /SDY212/ResultFiles/Flow_cytometry_result/PHOSPHOFLOW SPECIMEN_022V1.390144.fcs
Downloading:  /SDY212/ResultFiles/Flow_cytometry_result/PHOSPHOFLOW SPECIMEN_047V1.390160.fcs
Downloading:  /SDY212/ResultFiles/Flow_cytometry_result/PHOSPHOFLOW SPECIMEN_039V1.391052.fcs
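
For longer runs it can help to keep going when one file fails and to record which paths succeeded. The sketch below wraps the loop in a hypothetical helper, download_first, that takes the download function as an argument. It assumes the download function signals failure by raising an exception, which this notebook does not confirm for immport_download.download_file.

```python
def download_first(paths, limit, download_fn):
    """Call download_fn on the first `limit` paths.

    Returns (succeeded, failed), where failed holds (path, error) pairs.
    Assumes download_fn raises an exception when a download fails.
    """
    succeeded, failed = [], []
    for path in paths[:limit]:
        print("Downloading:", path)
        try:
            download_fn(path)
        except Exception as error:
            # Record the failure and continue with the remaining files
            failed.append((path, error))
        else:
            succeeded.append(path)
    return succeeded, failed


# Intended use in this notebook (not executed here):
# ok, bad = download_first(
#     unique_fcs_paths, 5,
#     lambda p: immport_download.download_file(user_name, password, p, output_directory))
```

Passing the download function in keeps the helper independent of credentials and makes it easy to try with a stand-in function before contacting the server.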