Download Datasets
With this Jupyter Notebook, you can use the DownloadDatasets script to download all datasets. The downloaded datasets are organized as follows:
Claimed/Unclaimed
  Projects
    Samples
      Datasets
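The nesting above can be pictured as one directory path per dataset. Below is a minimal sketch, assuming each level becomes one directory named after the entity. The folder names `claimed`/`unclaimed` and the helper `dataset_dir` are hypothetical and are not taken from the DownloadDatasets script.

```python
from pathlib import Path


def dataset_dir(target_path: str, claimed: bool, project: str,
                sample: str, dataset: str) -> Path:
    """Hypothetical: build the nested folder for one dataset.

    Assumes one directory per level: Claimed/Unclaimed > Project > Sample > Dataset.
    The folder names "claimed"/"unclaimed" are an assumption.
    """
    top = "claimed" if claimed else "unclaimed"
    return Path(target_path) / top / project / sample / dataset


# pathlib drops the leading "./" when joining the parts.
print(dataset_dir("./saved_datasets", True, "ProjectA", "Sample1", "Dataset42").as_posix())
# prints saved_datasets/claimed/ProjectA/Sample1/Dataset42
```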
Optionally, you can specify a symlink path to additionally sort the datasets by format. The script then creates symbolic links that point to the original downloaded datasets.
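Sorting by format with symbolic links means the data is stored only once and each format folder merely points to it. The following is a minimal sketch of that idea using the standard `os.symlink`; the helper `link_by_format`, the `<symlink_path>/<format>/<dataset name>` layout, and the format name `FormatX` are assumptions for illustration, not the script's actual behaviour.

```python
import os
import tempfile
from pathlib import Path


def link_by_format(dataset_path: Path, dataset_format: str, symlink_path: Path) -> Path:
    """Hypothetical: create symlink_path/<format>/<dataset name> pointing to dataset_path."""
    format_dir = symlink_path / dataset_format
    format_dir.mkdir(parents=True, exist_ok=True)
    link = format_dir / dataset_path.name
    # Skip if a link (even a broken one) or a file with that name already exists.
    if not link.is_symlink() and not link.exists():
        # target_is_directory is only relevant on Windows; Unix ignores it.
        os.symlink(dataset_path.resolve(), link, target_is_directory=dataset_path.is_dir())
    return link


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "data" / "Dataset42"
    original.mkdir(parents=True)
    link = link_by_format(original, "FormatX", Path(tmp) / "symlinks")
    print(link.is_symlink(), link.resolve() == original.resolve())  # True True
```

On Windows, the `os.symlink` call is exactly the step that needs administrative privileges or developer mode.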
You can set the following arguments:
target_path: The target path for the downloaded datasets.
Default: ./saved_datasets
project_ids: The IDs of the projects whose datasets should be downloaded. Must be integers separated by commas without spaces (e.g. 1,2,3). For datasets without a project, use 0 as the ID. If not set, all projects are downloaded.
Default: '' (empty)
include_metadata: Set to True if the metadata of datasets, projects, and samples should be saved.
Default: False
duplicate_handling: How a dataset whose name already exists should be handled. 1: rename (the ID is appended), 2: overwrite, 3: take first.
Default: 1
include_sample_projects: Set to False if projects should be taken from datasets only.
Default: True
symlink_path: Specify a path if the datasets should additionally be sorted by format using symbolic links to the original datasets. Warning: On a non-Unix-based system (such as Windows), creating symbolic links requires administrative privileges or enabled developer mode.
Default: None
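The rules for project_ids and the three duplicate_handling modes can be illustrated with a short sketch. It only mirrors the rules described above; the helpers `parse_project_ids` and `resolve_name` are hypothetical and are not part of LOGS_solutions. In particular, the exact rename pattern (here `<name>_<id>`) is an assumption.

```python
from typing import List, Optional, Set


def parse_project_ids(value: str) -> Optional[List[int]]:
    """Hypothetical: '' -> None (all projects), '1,2,3' -> [1, 2, 3].

    ID 0 stands for datasets without a project.
    Spaces are rejected because the argument must not contain them.
    """
    if value == "":
        return None  # not set: download all projects
    if " " in value:
        raise ValueError("project_ids must not contain spaces")
    return [int(part) for part in value.split(",")]


def resolve_name(name: str, dataset_id: int, existing: Set[str], mode: int) -> Optional[str]:
    """Hypothetical: choose the name for a dataset whose name may already exist.

    mode 1: rename (append the ID; the "_<id>" pattern is an assumption)
    mode 2: overwrite (reuse the existing name)
    mode 3: take first (skip this dataset, return None)
    """
    if name not in existing:
        return name
    if mode == 1:
        return f"{name}_{dataset_id}"
    if mode == 2:
        return name
    if mode == 3:
        return None
    raise ValueError(f"unknown duplicate_handling mode: {mode}")


print(parse_project_ids("1,2,3"))                    # [1, 2, 3]
print(resolve_name("Dataset", 17, {"Dataset"}, 1))  # Dataset_17
```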
To avoid re-downloading all datasets after an interruption, the script saves the progress in a file called last_state.json. If you want to download all datasets again, you need to delete the last_state.json file.
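The resume mechanism follows the usual checkpoint pattern: finished items are recorded, and a restarted run skips them. The sketch below assumes `last_state.json` simply stores a list of finished dataset IDs. The real file's structure is not documented here, so the schema, `load_done`, `download_all`, and the `fetch` callback are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Set


def load_done(state_file: Path) -> Set[int]:
    """Return the IDs recorded as finished, or an empty set if there is no state file."""
    if not state_file.exists():
        return set()
    return set(json.loads(state_file.read_text())["done"])


def download_all(ids: Iterable[int], fetch: Callable[[int], None], state_file: Path) -> None:
    """Fetch every ID not yet recorded in the (hypothetical) state file."""
    done = load_done(state_file)
    for dataset_id in ids:
        if dataset_id in done:
            continue  # finished in an earlier run
        fetch(dataset_id)
        done.add(dataset_id)
        # Persist after every dataset so an interruption loses at most one item.
        state_file.write_text(json.dumps({"done": sorted(done)}))


# Demo: an interrupted run followed by a resumed one.
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "last_state.json"
    fetched = []
    download_all([1, 2], fetched.append, state)        # first run stops after two items
    download_all([1, 2, 3, 4], fetched.append, state)  # resumed run fetches only 3 and 4
    print(fetched)  # [1, 2, 3, 4]
    state.unlink()  # deleting the state file forces a full re-download next time
```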
To run this notebook, you need a Jupyter kernel. A kernel is a Python environment that executes the code in your notebook. If no Jupyter kernel is set up yet, you can register one by running the following command in a terminal (replace env_name with the kernel's name and YourName with the name shown in Jupyter):
python -m ipykernel install --user --name env_name --display-name "YourName"
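import sys

from LOGS_solutions.DownloadDatasets.main import main, parse_args

# parse_args() is called without arguments, so set sys.argv as if the
# script had been started from the command line.
sys.argv = [
    "main.py",
    "--target_path", "./data",
    "--project_ids", "1,2,3",
    "--include_metadata", "false",
    "--duplicate_handling", "1",
    "--include_sample_projects", "true",
    # Uncomment to additionally sort the datasets by format via symbolic links:
    # "--symlink_path", "./symlinks",
]

args = parse_args()
main(args)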
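Assigning to sys.argv directly, as above, leaves the modified list in place for the rest of the kernel session. A minimal sketch of a context manager that restores the original value afterwards is shown below. It is demonstrated with a small stand-in argparse parser (`demo_parse_args`, hypothetical) because the actual options of `LOGS_solutions.DownloadDatasets.main.parse_args` are not shown here; under that assumption, the same `with` block could wrap the `parse_args()` call in the notebook.

```python
import argparse
import sys
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def patched_argv(argv: List[str]) -> Iterator[None]:
    """Temporarily replace sys.argv and restore it, even if parsing fails."""
    saved = sys.argv
    sys.argv = argv
    try:
        yield
    finally:
        sys.argv = saved


def demo_parse_args() -> argparse.Namespace:
    """Hypothetical stand-in for parse_args(); argparse reads sys.argv[1:] by default."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--target_path", default="./saved_datasets")
    parser.add_argument("--project_ids", default="")
    return parser.parse_args()


with patched_argv(["main.py", "--target_path", "./data", "--project_ids", "1,2,3"]):
    args = demo_parse_args()
print(args.target_path, args.project_ids)  # ./data 1,2,3
```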