Query and Download API

For this example, assume you have created a top-level Work directory. Under Work, create a notebooks directory and an output directory. Then unzip the ImmPort Download Tool zip package and place the bin and aspera directories it contains under Work. Change to the notebooks directory and run the command jupyter notebook.

The immport_download.py file contains convenience functions to obtain ImmPort and Aspera tokens, and a download_file function that simplifies downloading a file.

import sys
import os
import pandas as pd
import urllib

# Set the Python path to the location of the directory containing "immport_download.py"
immport_download_code = "../bin/"
sys.path.insert(0,immport_download_code)
os.chdir(immport_download_code)

import immport_download

Example configuration properties - Replace with real user_name and password

user_name = "REPLACE"
password = "REPLACE"
output_directory = "../output"
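
Hard-coding credentials in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming two hypothetical environment variables (IMMPORT_USER and IMMPORT_PASSWORD, names chosen here for illustration and not defined by the ImmPort tools), reads the values from the environment and makes sure the output directory exists:

```python
import os
from pathlib import Path


def load_config(env=None, output_directory="../output"):
    """Return (user_name, password, output_path).

    IMMPORT_USER and IMMPORT_PASSWORD are hypothetical variable names
    used only for this sketch; "REPLACE" is the fallback placeholder.
    """
    env = os.environ if env is None else env
    user_name = env.get("IMMPORT_USER", "REPLACE")
    password = env.get("IMMPORT_PASSWORD", "REPLACE")
    # Create the output directory if it is missing.
    output_path = Path(output_directory)
    output_path.mkdir(parents=True, exist_ok=True)
    return user_name, password, output_path
```

Because the notebook changes its working directory to ../bin, the relative path "../output" resolves to the output directory under Work.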

Download the HAI results for SDY212 using the API

# Request a token, make the API call, then load the result into a pandas DataFrame
token = immport_download.request_immport_token(user_name, password)
r = immport_download.api("https://api.immport.org/data/query/result/hai?studyAccession=SDY212",token)
df = pd.read_json(r)

# Print out a sample of the information in the DataFrame
print(df.columns)
print(df.measurementTechnique.unique())
print(df.armName.unique())

Index(['ageEvent', 'ageEventSpecify', 'ageUnit', 'ancestralPopulation',
       'armAccession', 'armName', 'biosampleAccession', 'biosampleSubtype',
       'biosampleType', 'clinical', 'comments', 'ethnicity',
       'experimentAccession', 'expsampleAccession', 'gender', 'maxSubjectAge',
       'measurementTechnique', 'minSubjectAge', 'plannedVisitAccession',
       'race', 'raceSpecify', 'repositoryAccession', 'repositoryName',
       'resultId', 'species', 'strain', 'studyAccession', 'studyTimeCollected',
       'studyTimeCollectedUnit', 'studyTimeT0Event', 'studyTimeT0EventSpecify',
       'studyTitle', 'subjectAccession', 'subjectPhenotype',
       'treatmentAccession', 'unitPreferred', 'unitReported', 'valuePreferred',
       'valueReported', 'virusStrainPreferred', 'virusStrainReported'],
      dtype='object')
['Hemagglutination Inhibition']
['Cohort_2' 'Cohort_1']
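
Recent pandas releases warn when a literal JSON string is passed directly to pd.read_json. Assuming immport_download.api returns the response body as a JSON string (an assumption, since the helper's return type is not shown here), wrapping it in io.StringIO sidesteps that warning. The payload below is a made-up two-record sample that only mimics the shape of the HAI result:

```python
import io

import pandas as pd

# Made-up sample standing in for the string returned by the API call.
sample_response = (
    '[{"subjectAccession": "SUB1", "armName": "Cohort_1", "valuePreferred": 40},'
    ' {"subjectAccession": "SUB2", "armName": "Cohort_2", "valuePreferred": 20}]'
)

# StringIO presents the string to pandas as a file-like object.
df_sample = pd.read_json(io.StringIO(sample_response))
print(df_sample.columns.tolist())
print(df_sample.armName.unique())
```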

Rename the armName values to more descriptive values. On reviewing the arm information, we determined that Cohort_1 represents the young participants and Cohort_2 represents the older participants.

df_clean = df[['subjectAccession','armAccession','armName','gender','race','minSubjectAge','studyTimeCollected', \
               'valuePreferred','virusStrainReported']]
df_clean.loc[df_clean['armName'] == 'Cohort_1','armName'] = "Young"
df_clean.loc[df_clean['armName'] == 'Cohort_2','armName'] = "Old"

/home/jcampbell/opt/python/anaconda3-4.3.0/lib/python3.6/site-packages/pandas/core/indexing.py:477: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s
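
The SettingWithCopyWarning above appears because df_clean is a column subset of df, and pandas cannot tell whether the assignment is meant to affect df as well. One common way to avoid it, sketched here on a small made-up DataFrame rather than the real query result, is to take an explicit copy before modifying it and to rename the values with replace:

```python
import pandas as pd

# Made-up rows; only the column names follow the notebook.
df_demo = pd.DataFrame({
    "subjectAccession": ["SUB1", "SUB2", "SUB3"],
    "armName": ["Cohort_1", "Cohort_2", "Cohort_1"],
    "comments": ["a", "b", "c"],
})

# .copy() makes the subset an independent DataFrame, so later
# assignments are unambiguous and do not touch df_demo.
df_demo_clean = df_demo[["subjectAccession", "armName"]].copy()
df_demo_clean["armName"] = df_demo_clean["armName"].replace(
    {"Cohort_1": "Young", "Cohort_2": "Old"}
)
print(df_demo_clean)
```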

df_clean.head()

Simple descriptive statistics

df_clean[['armName','subjectAccession']].drop_duplicates().groupby('armName').count()

df_clean[['gender','subjectAccession']].drop_duplicates().groupby('gender').count()

df_clean[['race','subjectAccession']].drop_duplicates().groupby('race').count()

set(df_clean.studyTimeCollected)

{0, 28}

set(df_clean.virusStrainReported)

{'B', 'H1N1', 'H3N2'}
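
Each count above de-duplicates (group, subject) pairs before counting, because a result table like this can hold several rows per subject (for example one per time point and virus strain). An equivalent and slightly more direct formulation, shown here on made-up data, is groupby(...).nunique():

```python
import pandas as pd

# Made-up rows: SUB1 and SUB3 each appear twice.
df_demo = pd.DataFrame({
    "subjectAccession": ["SUB1", "SUB1", "SUB2", "SUB3", "SUB3"],
    "gender": ["Female", "Female", "Male", "Female", "Female"],
})

# Approach used in the notebook: drop duplicate pairs, then count rows.
by_dedup = (
    df_demo[["gender", "subjectAccession"]]
    .drop_duplicates()
    .groupby("gender")
    .count()
)

# Alternative: count distinct subjects per group directly.
by_nunique = df_demo.groupby("gender")["subjectAccession"].nunique()

print(by_dedup)
print(by_nunique)
```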

Retrieve FCS files for Cohort_1

In this section we use the Query API to retrieve a sample of the FCS files available for this study. Then we use the Download API to retrieve 5 of the files.

token = immport_download.request_immport_token(user_name, password)
r = immport_download.api("https://api.immport.org/data/query/result/filePath?studyAccession=SDY212&armName=Cohort_1&measurementTechnique=Flow%20cytometry",token)
df = pd.read_json(r)
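
The query URL above is written out by hand, including the %20 escape for the space in "Flow cytometry". A small sketch of building the same URL from keyword arguments follows; build_query_url is a hypothetical helper written for this example and is not part of immport_download:

```python
from urllib.parse import quote, urlencode

# Base path taken from the query URLs used in this notebook.
API_BASE = "https://api.immport.org/data/query/result"


def build_query_url(endpoint, **params):
    """Build a Query API URL (hypothetical helper for illustration)."""
    # quote_via=quote encodes spaces as %20 instead of '+'.
    query = urlencode(params, quote_via=quote)
    return f"{API_BASE}/{endpoint}?{query}"


url = build_query_url(
    "filePath",
    studyAccession="SDY212",
    armName="Cohort_1",
    measurementTechnique="Flow cytometry",
)
print(url)
```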

df = df[df['fileDetail'] == "Flow cytometry result"]
df_fcs = df[['subjectAccession','armAccession','armName','gender','race','minSubjectAge','studyTimeCollected', \
               'fileDetail','filePath']]

df_fcs.head()

unique_file_paths = df_fcs.filePath.unique()
unique_fcs_paths = [path for path in unique_file_paths if path.endswith(".fcs")]
len(unique_fcs_paths)

387

# Download the first five FCS files
for path in unique_fcs_paths[:5]:
    print("Downloading: ", path)
    immport_download.download_file(user_name, password,
                                   path, output_directory)

Downloading:  /SDY212/ResultFiles/Flow_cytometry_result/PHOSPHOFLOW SPECIMEN_FLU 010V1.390100.fcs
Downloading:  /SDY212/ResultFiles/Flow_cytometry_result/PHOSPHOFLOW SPECIMEN_FLU 018V1.390104.fcs
Downloading:  /SDY212/ResultFiles/Flow_cytometry_result/PHOSPHOFLOW SPECIMEN_022V1.390144.fcs
Downloading:  /SDY212/ResultFiles/Flow_cytometry_result/PHOSPHOFLOW SPECIMEN_047V1.390160.fcs
Downloading:  /SDY212/ResultFiles/Flow_cytometry_result/PHOSPHOFLOW SPECIMEN_039V1.391052.fcs
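
When downloading a batch, a single failed transfer would otherwise stop the loop. The sketch below keeps going and records failures. It takes the download step as a callable so it can be tried without network access; fake_download is a stub standing in for immport_download.download_file, and download_first_n is a hypothetical helper, not part of the ImmPort tools:

```python
def download_first_n(paths, n, download):
    """Call download(path) for the first n paths; return (succeeded, failed)."""
    succeeded, failed = [], []
    # Slicing is safe even when fewer than n paths are available.
    for path in paths[:n]:
        try:
            download(path)
            succeeded.append(path)
        except Exception as exc:
            # Record the failure and continue with the next file.
            failed.append((path, exc))
    return succeeded, failed


def fake_download(path):
    """Stub for this sketch: fails for any path containing 'bad'."""
    if "bad" in path:
        raise IOError("simulated failure")


ok, bad = download_first_n(["/a.fcs", "/bad.fcs", "/c.fcs", "/d.fcs"], 3, fake_download)
print(len(ok), len(bad))
```

In the notebook, the callable could be lambda p: immport_download.download_file(user_name, password, p, output_directory).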

Output of df_clean.head():
	subjectAccession	armAccession	armName	gender	race	minSubjectAge	valuePreferred	virusStrainReported
0	SUB134240	ARM895	Old	Female	White	62.86	20	B
1	SUB134251	ARM894	Young	Female	White	29.23	40	H3N2
2	SUB134258	ARM895	Old	Female	White	85.41	10	H1N1
3	SUB134264	ARM895	Old	Female	White	68.08	40	B
4	SUB134271	ARM895	Old	Female	White	86.61	640	H3N2

Output of the count of subjects by gender:
	subjectAccession
gender
Female	54
Male	35

Output of the count of subjects by race:
	subjectAccession
race
American Indian or Alaska Native	1
Asian	8
Other	7
White	73

Output of df_fcs.head():
	subjectAccession	armAccession	armName	gender	race	minSubjectAge	fileDetail	filePath
0	SUB134242	ARM894	Cohort_1	Male	White	26.86	Flow cytometry result	/SDY212/ResultFiles/Flow_cytometry_result/pFlo...
1	SUB134249	ARM894	Cohort_1	Female	Other	26.01	Flow cytometry result	/SDY212/ResultFiles/Flow_cytometry_result/pFlo...
24	SUB134242	ARM894	Cohort_1	Male	White	26.86	Flow cytometry result	/SDY212/ResultFiles/Flow_cytometry_result/PHOS...
25	SUB134242	ARM894	Cohort_1	Male	White	26.86	Flow cytometry result	/SDY212/ResultFiles/Flow_cytometry_result/PHOS...
26	SUB134249	ARM894	Cohort_1	Female	Other	26.01	Flow cytometry result	/SDY212/ResultFiles/Flow_cytometry_result/PHOS...