For this example, assume you have created a top-level Work directory. Under the Work directory, create a notebooks directory and an output directory. Then unzip the ImmPort Download Tool zip package and place the bin and aspera directories it contains under the Work directory. Change to the notebooks directory, then run the command jupyter notebook.
The immport_download.py file contains convenience functions to obtain ImmPort and Aspera tokens, and a download_file method that simplifies downloading a file.
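The directory layout described above can also be created from Python. This is only a minimal sketch: the `make_work_layout` helper is hypothetical and is not part of the ImmPort Download Tool, and it creates only the notebooks and output directories. The bin and aspera directories still have to be copied from the zip package by hand. The demonstration runs in a temporary directory so it does not touch your real Work directory.

```python
import tempfile
from pathlib import Path


def make_work_layout(root):
    """Create the notebooks and output directories under a top-level work directory.

    Hypothetical helper for illustration; not part of immport_download.py.
    """
    root = Path(root)
    for name in ("notebooks", "output"):
        # exist_ok=True makes the call safe to repeat
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


# Demonstrate in a throwaway location rather than the real Work directory
with tempfile.TemporaryDirectory() as tmp:
    work = make_work_layout(Path(tmp) / "Work")
    created = sorted(p.name for p in work.iterdir())
    print(created)
```

To use it for real, call `make_work_layout("Work")` from the directory where the Work directory should live.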
import sys
import os
import pandas as pd
import urllib
# Set the Python path to the location of the directory containing "immport_download.py"
immport_download_code = "../bin/"
sys.path.insert(0,immport_download_code)
os.chdir(immport_download_code)
import immport_download
user_name = "REPLACE"
password = "REPLACE"
output_directory = "../output"
# Request a token, make the API call, then load the result into a pandas DataFrame
token = immport_download.request_immport_token(user_name, password)
r = immport_download.api("https://api.immport.org/data/query/result/hai?studyAccession=SDY212",token)
df = pd.read_json(r)
# Print out a sample of the information in the DataFrame
print(df.columns)
print(df.measurementTechnique.unique())
print(df.armName.unique())
Rename the armName values to more descriptive values. When reviewing the ARM information, we determined that Cohort_1 represents the Young participants and Cohort_2 represents the Older participants.
# Take a copy of the selected columns so the .loc assignments below modify an independent DataFrame
df_clean = df[['subjectAccession','armAccession','armName','gender','race','minSubjectAge','studyTimeCollected',
               'valuePreferred','virusStrainReported']].copy()
df_clean.loc[df_clean['armName'] == 'Cohort_1','armName'] = "Young"
df_clean.loc[df_clean['armName'] == 'Cohort_2','armName'] = "Old"
df_clean.head()
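The renaming step above can also be written with a single lookup table. The sketch below uses a small made-up DataFrame (the subject identifiers are placeholders, not real ImmPort accessions) and pandas' `Series.replace` with a dictionary, which returns a new column and leaves the original DataFrame unchanged.

```python
import pandas as pd

# Toy data for illustration only; identifiers are placeholders
toy = pd.DataFrame({
    "subjectAccession": ["SUB_A", "SUB_B", "SUB_C"],
    "armName": ["Cohort_1", "Cohort_2", "Cohort_1"],
})

# Map each original arm name to its descriptive label
arm_labels = {"Cohort_1": "Young", "Cohort_2": "Old"}

# assign() returns a new DataFrame; toy itself is not modified
renamed = toy.assign(armName=toy["armName"].replace(arm_labels))
print(renamed)
```

Keeping the labels in one dictionary makes it easier to add further cohorts later without repeating the `.loc` pattern for each one.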
df_clean[['armName','subjectAccession']].drop_duplicates().groupby('armName').count()
df_clean[['gender','subjectAccession']].drop_duplicates().groupby('gender').count()
df_clean[['race','subjectAccession']].drop_duplicates().groupby('race').count()
set(df_clean.studyTimeCollected)
set(df_clean.virusStrainReported)
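The subject counts above rely on dropping duplicate rows before grouping, because each subject appears on many result rows. An equivalent way to count distinct subjects per group is `groupby(...).nunique()`. The sketch below shows both approaches on a small made-up DataFrame (placeholder identifiers, not study data).

```python
import pandas as pd

# Toy data: each subject appears on more than one row, as in the HAI results
toy = pd.DataFrame({
    "subjectAccession": ["SUB_A", "SUB_A", "SUB_B", "SUB_C", "SUB_C"],
    "armName": ["Young", "Young", "Young", "Old", "Old"],
})

# Approach used in this notebook: de-duplicate, then count rows per group
via_dedup = (toy[["armName", "subjectAccession"]]
             .drop_duplicates()
             .groupby("armName")["subjectAccession"]
             .count())

# Alternative: count distinct subjects per group directly
via_nunique = toy.groupby("armName")["subjectAccession"].nunique()

print(via_nunique)
```

Both Series hold the same counts; `nunique` simply skips the intermediate de-duplicated table.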
For this section we use the Query API to find a sample of the FCS files available for this study. Then we use the Download API to retrieve 5 of the files.
token = immport_download.request_immport_token(user_name, password)
r = immport_download.api("https://api.immport.org/data/query/result/filePath?studyAccession=SDY212&armName=Cohort_1&measurementTechnique=Flow%20cytometry",token)
df = pd.read_json(r)
df = df[df['fileDetail'] == "Flow cytometry result"]
df_fcs = df[['subjectAccession','armAccession','armName','gender','race','minSubjectAge','studyTimeCollected', \
'fileDetail','filePath']]
df_fcs.head()
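The query URL above is written out by hand, including the `%20` escape for the space in "Flow cytometry". As a sketch, the same URL can be assembled with the standard library's `urllib.parse.urlencode`, which handles the escaping. The `base_url` and `params` names are illustrative; the endpoint and parameter values are the ones already used in this notebook.

```python
from urllib.parse import quote, urlencode

base_url = "https://api.immport.org/data/query/result/filePath"

# Query parameters exactly as used in the hand-written URL
params = {
    "studyAccession": "SDY212",
    "armName": "Cohort_1",
    "measurementTechnique": "Flow cytometry",
}

# quote_via=quote encodes spaces as %20 (the default, quote_plus, would use "+")
query_url = base_url + "?" + urlencode(params, quote_via=quote)
print(query_url)
```

The resulting string can be passed to `immport_download.api` in place of the literal URL.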
unique_file_paths = df_fcs.filePath.unique()
unique_fcs_paths = [path for path in unique_file_paths if path.endswith(".fcs")]
len(unique_fcs_paths)
# Download the first five FCS files (fewer if the query returned fewer than five)
for fcs_path in unique_fcs_paths[:5]:
    print("Downloading: ", fcs_path)
    immport_download.download_file(user_name, password,
                                   fcs_path, output_directory)
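After the downloads finish, it can be useful to confirm what actually arrived. The sketch below lists any `.fcs` files found under the output directory. The `list_fcs_files` helper is hypothetical and not part of immport_download.py, and it makes no assumption about how `download_file` names or nests the files: it simply searches the directory tree.

```python
from pathlib import Path


def list_fcs_files(directory):
    """Return a sorted list of .fcs files found anywhere under directory.

    Hypothetical helper for illustration; returns an empty list if the
    directory does not exist.
    """
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.fcs") if p.is_file())


# "../output" matches the output_directory value used in this notebook
for path in list_fcs_files("../output"):
    print(path.name, path.stat().st_size, "bytes")
```

An empty listing after the loop above suggests the downloads did not complete or were written somewhere other than the output directory.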