Files in Synapse¶
Synapse files can be created by uploading content from your local computer or linking to digital files on the web.
Files in Synapse always have a “parent”, which could be a project or a folder. You can organize collections of files into folders and sub-folders, just as you would on your local computer.
Note: You may optionally follow the Uploading data in bulk tutorial instead. The bulk tutorial may fit your needs better as it limits the amount of code that you are required to write and maintain.
This tutorial follows a Flattened Data Layout, like the example below:
.
├── biospecimen_experiment_1
│   ├── fileA.txt
│   └── fileB.txt
├── biospecimen_experiment_2
│   ├── fileC.txt
│   └── fileD.txt
├── single_cell_RNAseq_batch_1
│   ├── SRR12345678_R1.fastq.gz
│   └── SRR12345678_R2.fastq.gz
└── single_cell_RNAseq_batch_2
    ├── SRR12345678_R1.fastq.gz
    └── SRR12345678_R2.fastq.gz
Tutorial Purpose¶
In this tutorial you will:
- Upload several files to Synapse
- Print stored attributes about your files
- List all Folders and Files within my project
Prerequisites¶
- Make sure that you have completed the Folder tutorial.
- The tutorial assumes you have a number of files ready to upload. If you do not, create test or dummy files. You may also use the dummy files that were created for these tutorials. These are text files with example file extensions that a researcher might use.
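If you need placeholder files, a minimal sketch like the one below writes the flattened layout shown above with tiny text contents. The helper name `create_dummy_layout` and the file contents are hypothetical; passing `~/my_ad_project` as the root matches the paths used later in this tutorial.

```python
import os

# Folder name -> file names, mirroring the flattened layout shown above
LAYOUT = {
    "biospecimen_experiment_1": ["fileA.txt", "fileB.txt"],
    "biospecimen_experiment_2": ["fileC.txt", "fileD.txt"],
    "single_cell_RNAseq_batch_1": ["SRR12345678_R1.fastq.gz", "SRR12345678_R2.fastq.gz"],
    "single_cell_RNAseq_batch_2": ["SRR12345678_R1.fastq.gz", "SRR12345678_R2.fastq.gz"],
}


def create_dummy_layout(root: str) -> list[str]:
    """Hypothetical helper: write small placeholder files under `root`.

    Returns the path of every file written. Existing files are overwritten.
    The .fastq.gz files are plain text, not real gzip archives.
    """
    created = []
    for folder_name, file_names in LAYOUT.items():
        folder_path = os.path.join(root, folder_name)
        os.makedirs(folder_path, exist_ok=True)
        for file_name in file_names:
            file_path = os.path.join(folder_path, file_name)
            with open(file_path, "w") as handle:
                handle.write("dummy content\n")
            created.append(file_path)
    return created
```

For example, `create_dummy_layout(os.path.expanduser("~/my_ad_project"))` creates the eight files this tutorial uploads.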
1. Upload several files to Synapse¶
Uploading Large Files
If you are uploading very large files (>100 GB each), consider uploading them sequentially with the async API instead.
For a reference implementation, see the execute_walk_file_sequential() function in uploadBenchmark.py. It calls asyncio.run(file.store_async()) using the newer async API, which has been optimized for very large files. In benchmarks, this pattern uploaded 45 files of 100 GB each (4.5 TB total) in approximately 20.6 hours.
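The sequential pattern described above might be sketched roughly as follows. This is not the `execute_walk_file_sequential()` function from `uploadBenchmark.py`; it is a minimal illustration that assumes each top-level subdirectory of a local root corresponds to an existing Synapse folder whose ID you supply in `folder_ids`. The helper names `discover_files` and `upload_sequentially` are hypothetical, and `synapseclient` is imported inside the coroutine so the file-discovery helper can run on its own.

```python
import os


def discover_files(root: str) -> list[tuple[str, str]]:
    """Return (folder_name, file_path) pairs for files one level below `root`.

    Files sitting directly in `root` are skipped. Results are sorted so the
    upload order is predictable.
    """
    pairs = []
    for folder_name in sorted(os.listdir(root)):
        folder_path = os.path.join(root, folder_name)
        if not os.path.isdir(folder_path):
            continue
        for file_name in sorted(os.listdir(folder_path)):
            file_path = os.path.join(folder_path, file_name)
            if os.path.isfile(file_path):
                pairs.append((folder_name, file_path))
    return pairs


async def upload_sequentially(root: str, folder_ids: dict[str, str]) -> list[str]:
    """Upload files one at a time, awaiting each upload before starting the next.

    Assumes synapseclient.login() has already been called. Raises KeyError if
    a local subdirectory has no entry in `folder_ids`.
    """
    from synapseclient.models import File

    uploaded_ids = []
    for folder_name, file_path in discover_files(root):
        synapse_file = File(path=file_path, parent_id=folder_ids[folder_name])
        synapse_file = await synapse_file.store_async()
        uploaded_ids.append(synapse_file.id)
    return uploaded_ids
```

You would run it with `asyncio.run(upload_sequentially(root, folder_ids))`, where `folder_ids` maps names such as `"single_cell_RNAseq_batch_1"` to the IDs of the folders retrieved below. Awaiting each `store_async()` call in turn keeps only one large transfer in flight at a time.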
First let's retrieve all of the Synapse IDs we are going to use¶
# Step 1: Upload several files to Synapse
import os
import synapseclient
from synapseclient.models import File, Folder, Project
syn = synapseclient.login()
# Retrieve the project ID
my_project = Project(name="My uniquely named project about Alzheimer's Disease").get()
# Retrieve the IDs of the folders I want to upload to
batch_1_folder = Folder(
parent_id=my_project.id, name="single_cell_RNAseq_batch_1"
).get()
batch_2_folder = Folder(
parent_id=my_project.id, name="single_cell_RNAseq_batch_2"
).get()
biospecimen_experiment_1_folder = Folder(
parent_id=my_project.id, name="biospecimen_experiment_1"
).get()
biospecimen_experiment_2_folder = Folder(
parent_id=my_project.id, name="biospecimen_experiment_2"
).get()
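The four nearly identical lookups above can also be written as a loop. A minimal sketch, assuming the same folder names; `get_folders_by_name` is a hypothetical helper, and its `folder_cls` parameter is a hypothetical hook added only so the logic can be exercised without a Synapse connection. It defaults to `synapseclient.models.Folder`.

```python
FOLDER_NAMES = (
    "biospecimen_experiment_1",
    "biospecimen_experiment_2",
    "single_cell_RNAseq_batch_1",
    "single_cell_RNAseq_batch_2",
)


def get_folders_by_name(project_id, names=FOLDER_NAMES, folder_cls=None):
    """Look up each named folder under `project_id` and return {name: folder}."""
    if folder_cls is None:
        # Imported here so the function can be tested with a stand-in class
        from synapseclient.models import Folder as folder_cls
    return {
        name: folder_cls(parent_id=project_id, name=name).get() for name in names
    }
```

With a logged-in client, `folders = get_folders_by_name(my_project.id)` followed by `folders["single_cell_RNAseq_batch_1"].id` gives the same ID as `batch_1_folder.id`.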
Next let's create all of the File objects to upload content¶
biospecimen_experiment_1_a_2022 = File(
path=os.path.expanduser("~/my_ad_project/biospecimen_experiment_1/fileA.txt"),
parent_id=biospecimen_experiment_1_folder.id,
)
biospecimen_experiment_1_b_2022 = File(
path=os.path.expanduser("~/my_ad_project/biospecimen_experiment_1/fileB.txt"),
parent_id=biospecimen_experiment_1_folder.id,
)
biospecimen_experiment_2_c_2023 = File(
path=os.path.expanduser("~/my_ad_project/biospecimen_experiment_2/fileC.txt"),
parent_id=biospecimen_experiment_2_folder.id,
)
biospecimen_experiment_2_d_2023 = File(
path=os.path.expanduser("~/my_ad_project/biospecimen_experiment_2/fileD.txt"),
parent_id=biospecimen_experiment_2_folder.id,
)
batch_1_scrnaseq_file_1 = File(
path=os.path.expanduser(
"~/my_ad_project/single_cell_RNAseq_batch_1/SRR12345678_R1.fastq.gz"
),
parent_id=batch_1_folder.id,
)
batch_1_scrnaseq_file_2 = File(
path=os.path.expanduser(
"~/my_ad_project/single_cell_RNAseq_batch_1/SRR12345678_R2.fastq.gz"
),
parent_id=batch_1_folder.id,
)
batch_2_scrnaseq_file_1 = File(
path=os.path.expanduser(
"~/my_ad_project/single_cell_RNAseq_batch_2/SRR12345678_R1.fastq.gz"
),
parent_id=batch_2_folder.id,
)
batch_2_scrnaseq_file_2 = File(
path=os.path.expanduser(
"~/my_ad_project/single_cell_RNAseq_batch_2/SRR12345678_R2.fastq.gz"
),
parent_id=batch_2_folder.id,
)
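Rather than writing one `File(...)` call per file, you could build the objects from the local directory. A sketch under these assumptions: the local root is `~/my_ad_project`, each subdirectory name matches a Synapse folder name, and `folders` is a `{name: folder}` dictionary built from the folders retrieved earlier (for example `{"biospecimen_experiment_1": biospecimen_experiment_1_folder, ...}`). `build_file_objects` and its `file_cls` hook are hypothetical.

```python
import os


def build_file_objects(root, folders, file_cls=None):
    """Create one unsaved File object per local file under root/<folder_name>/."""
    if file_cls is None:
        # Imported here so the function can be tested with a stand-in class
        from synapseclient.models import File as file_cls
    files = []
    for folder_name, folder in folders.items():
        folder_path = os.path.join(root, folder_name)
        # Sort so the files are created (and later uploaded) in a stable order
        for file_name in sorted(os.listdir(folder_path)):
            file_path = os.path.join(folder_path, file_name)
            if os.path.isfile(file_path):
                files.append(file_cls(path=file_path, parent_id=folder.id))
    return files
```

Nothing is uploaded at this point; like the hand-written objects above, each one still needs `store()`.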
Finally we'll store the files in Synapse¶
biospecimen_experiment_1_a_2022.store()
biospecimen_experiment_1_b_2022.store()
biospecimen_experiment_2_c_2023.store()
biospecimen_experiment_2_d_2023.store()
batch_1_scrnaseq_file_1.store()
batch_1_scrnaseq_file_2.store()
batch_2_scrnaseq_file_1.store()
batch_2_scrnaseq_file_2.store()
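The eight `store()` calls can likewise be collapsed into a loop. This sketch relies only on what the next step of the tutorial also relies on: after `store()` runs, the object's `id` is filled in. `store_all` is a hypothetical helper name.

```python
def store_all(files):
    """Store each File in turn and return {local path: Synapse ID}."""
    ids = {}
    for file in files:
        file.store()  # uploads the content and populates file.id
        ids[file.path] = file.id
    return ids
```

Keying by the full local path keeps the two `SRR12345678_R1.fastq.gz` files apart, since they live in different directories. For example: `store_all([batch_1_scrnaseq_file_1, batch_2_scrnaseq_file_1])`.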
Each file being uploaded has an upload progress bar:
##################################################
Uploading file to Synapse storage
##################################################
Uploading [####################]100.00% 2.0bytes/2.0bytes (1.8bytes/s) SRR12345678_R1.fastq.gz Done...
2. Print stored attributes about your files¶
batch_1_scrnaseq_file_1_id = batch_1_scrnaseq_file_1.id
print(f"My file ID is: {batch_1_scrnaseq_file_1_id}")
print(f"The parent ID of my file is: {batch_1_scrnaseq_file_1.parent_id}")
print(f"I created my file on: {batch_1_scrnaseq_file_1.created_on}")
print(
f"The ID of the user that created my file is: {batch_1_scrnaseq_file_1.created_by}"
)
print(f"My file was last modified on: {batch_1_scrnaseq_file_1.modified_on}")
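If you want the same attributes for every stored file rather than just one, a small helper can gather them into a dictionary. A sketch; `describe_file` is a hypothetical name, and it reads only the attributes printed above.

```python
def describe_file(file):
    """Collect the attributes printed above into a dictionary."""
    return {
        "id": file.id,
        "parent_id": file.parent_id,
        "created_on": file.created_on,
        "created_by": file.created_by,
        "modified_on": file.modified_on,
    }
```

For example, `for stored_file in [batch_1_scrnaseq_file_1, batch_1_scrnaseq_file_2]: print(describe_file(stored_file))`.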
The output should look similar to the following (your IDs and timestamps will differ):
My file ID is: syn53205687
The parent ID of my file is: syn53205629
I created my file on: 2023-12-28T21:55:17.971Z
The ID of the user that created my file is: 3481671
My file was last modified on: 2023-12-28T21:55:17.971Z
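The `created_on` and `modified_on` values shown above are ISO 8601 strings in UTC (note the trailing `Z`). To compare or reformat them, a standard-library sketch can convert them into timezone-aware `datetime` objects; `parse_synapse_timestamp` is a hypothetical name. The `Z` is replaced with `+00:00` because `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward.

```python
from datetime import datetime


def parse_synapse_timestamp(value: str) -> datetime:
    """Convert a string such as '2023-12-28T21:55:17.971Z' into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


created = parse_synapse_timestamp("2023-12-28T21:55:17.971Z")
print(created.strftime("%Y-%m-%d %H:%M UTC"))
```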
3. List all Folders and Files within my project¶
Now that your project has a number of Folders and Files, let's explore how we can traverse the content stored within the Project.
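# Load the project's folder and file hierarchy without downloading file contents
my_project.sync_from_synapse(download_file=False)
# Map each directory path (rooted at "./") to the File entities it contains
dir_mapping = my_project.map_directory_to_all_contained_files("./")
# Print every directory followed by the files inside it
for directory_name, file_entities in dir_mapping.items():
    print(f"Directory: {directory_name}")
    for file_entity in file_entities:
        print(f"\tFile: {file_entity.name}, ID: {file_entity.id}")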
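Because the loop above treats `dir_mapping` as a dictionary of directory paths to lists of `File` objects, you can post-process it with plain Python, for instance to save a local manifest. A sketch; `mapping_to_rows` and `write_manifest` are hypothetical helpers that read only the `name` and `id` attributes used above.

```python
import csv


def mapping_to_rows(dir_mapping):
    """Flatten {directory: [File, ...]} into sorted (directory, name, id) rows."""
    rows = []
    for directory_name, file_entities in dir_mapping.items():
        for file_entity in file_entities:
            rows.append((directory_name, file_entity.name, file_entity.id))
    return sorted(rows)


def write_manifest(rows, path):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["directory", "name", "id"])
        writer.writerows(rows)
```

For example, `write_manifest(mapping_to_rows(dir_mapping), "manifest.csv")`.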
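The result of walking your project structure should look something like the following. Your Synapse IDs will differ, and the exact layout depends on how you format each printed entry: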
Directory (syn60109540): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_1
Directory (syn60109543): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_2
Directory (syn60109534): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_1
Directory (syn60109537): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_2
File (syn60115444): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_1/fileA.txt
File (syn60115457): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_1/fileB.txt
File (syn60115472): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_2/fileC.txt
File (syn60115485): My uniquely named project about Alzheimer's Disease/biospecimen_experiment_2/fileD.txt
File (syn60115498): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_1/SRR12345678_R1.fastq.gz
File (syn60115513): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_1/SRR12345678_R2.fastq.gz
File (syn60115526): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_2/SRR12345678_R1.fastq.gz
File (syn60115539): My uniquely named project about Alzheimer's Disease/single_cell_RNAseq_batch_2/SRR12345678_R2.fastq.gz
Results¶
Now that you have created your files, you can inspect them on the Files tab of your project in the Synapse web UI. It should look similar to:

Source code for this tutorial¶
"""
Here is where you'll find the code for the File tutorial.
"""
# Step 1: Upload several files to Synapse
import os
import synapseclient
from synapseclient.models import File, Folder, Project
syn = synapseclient.login()
# Retrieve the project ID
my_project = Project(name="My uniquely named project about Alzheimer's Disease").get()
# Retrieve the IDs of the folders I want to upload to
batch_1_folder = Folder(
parent_id=my_project.id, name="single_cell_RNAseq_batch_1"
).get()
batch_2_folder = Folder(
parent_id=my_project.id, name="single_cell_RNAseq_batch_2"
).get()
biospecimen_experiment_1_folder = Folder(
parent_id=my_project.id, name="biospecimen_experiment_1"
).get()
biospecimen_experiment_2_folder = Folder(
parent_id=my_project.id, name="biospecimen_experiment_2"
).get()
# Create a File object for each file I want to upload
biospecimen_experiment_1_a_2022 = File(
path=os.path.expanduser("~/my_ad_project/biospecimen_experiment_1/fileA.txt"),
parent_id=biospecimen_experiment_1_folder.id,
)
biospecimen_experiment_1_b_2022 = File(
path=os.path.expanduser("~/my_ad_project/biospecimen_experiment_1/fileB.txt"),
parent_id=biospecimen_experiment_1_folder.id,
)
biospecimen_experiment_2_c_2023 = File(
path=os.path.expanduser("~/my_ad_project/biospecimen_experiment_2/fileC.txt"),
parent_id=biospecimen_experiment_2_folder.id,
)
biospecimen_experiment_2_d_2023 = File(
path=os.path.expanduser("~/my_ad_project/biospecimen_experiment_2/fileD.txt"),
parent_id=biospecimen_experiment_2_folder.id,
)
batch_1_scrnaseq_file_1 = File(
path=os.path.expanduser(
"~/my_ad_project/single_cell_RNAseq_batch_1/SRR12345678_R1.fastq.gz"
),
parent_id=batch_1_folder.id,
)
batch_1_scrnaseq_file_2 = File(
path=os.path.expanduser(
"~/my_ad_project/single_cell_RNAseq_batch_1/SRR12345678_R2.fastq.gz"
),
parent_id=batch_1_folder.id,
)
batch_2_scrnaseq_file_1 = File(
path=os.path.expanduser(
"~/my_ad_project/single_cell_RNAseq_batch_2/SRR12345678_R1.fastq.gz"
),
parent_id=batch_2_folder.id,
)
batch_2_scrnaseq_file_2 = File(
path=os.path.expanduser(
"~/my_ad_project/single_cell_RNAseq_batch_2/SRR12345678_R2.fastq.gz"
),
parent_id=batch_2_folder.id,
)
# Upload each file to Synapse
biospecimen_experiment_1_a_2022.store()
biospecimen_experiment_1_b_2022.store()
biospecimen_experiment_2_c_2023.store()
biospecimen_experiment_2_d_2023.store()
batch_1_scrnaseq_file_1.store()
batch_1_scrnaseq_file_2.store()
batch_2_scrnaseq_file_1.store()
batch_2_scrnaseq_file_2.store()
# Step 2: Print stored attributes about your file
batch_1_scrnaseq_file_1_id = batch_1_scrnaseq_file_1.id
print(f"My file ID is: {batch_1_scrnaseq_file_1_id}")
print(f"The parent ID of my file is: {batch_1_scrnaseq_file_1.parent_id}")
print(f"I created my file on: {batch_1_scrnaseq_file_1.created_on}")
print(
f"The ID of the user that created my file is: {batch_1_scrnaseq_file_1.created_by}"
)
print(f"My file was last modified on: {batch_1_scrnaseq_file_1.modified_on}")
# Step 3: List all Folders and Files within my project
my_project.sync_from_synapse(download_file=False)
dir_mapping = my_project.map_directory_to_all_contained_files("./")
for directory_name, file_entities in dir_mapping.items():
    print(f"Directory: {directory_name}")
    for file_entity in file_entities:
        print(f"\tFile: {file_entity.name}, ID: {file_entity.id}")