Hi everyone!
I am looking for all kinds of suggestions for sharing raw genetic data from one institution to another. The DUAs are in order, but the rest is a grey area!
There are no specific rules on how to do it, so I am looking for suggestions on how to share a large amount of data (around 900 GB) in a safe (and preferably cheap) way.
Any prior experience with protecting downloads? Would you suggest Amazon Cloud or other servers? Any advice is welcome!
Thanks
I would start by checking with the IT department at the host and recipient institutions to see if either has a data sharing plan in place - they may already have a server or some other process for this. Sometimes it is part of the DUA. My institution has an Aspera server set up for this purpose, for example (though there are other options as well).
Aside from that, Google Cloud (GCS) has a whitepaper on the topic from a data security standpoint.
I'm not sure whether the data analysis will also be in the cloud or you just want to download. Egress fees shouldn’t be too bad for downloading 900 GB from the cloud, maybe around $100.
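For anyone who wants to sanity-check that ballpark, here is a minimal sketch of the arithmetic. The flat rate of $0.12 per GB is an assumption for illustration only; real egress pricing depends on the provider, region, and volume tier, so plug in the number from your provider's current price list. The function name is made up for this example.

```python
def estimate_egress_cost(size_gb: float, usd_per_gb: float) -> float:
    """Return the estimated egress cost in USD, assuming a flat per-GB rate."""
    if size_gb < 0 or usd_per_gb < 0:
        raise ValueError("size and rate must be non-negative")
    return size_gb * usd_per_gb


DATASET_GB = 900           # dataset size from the original question
ASSUMED_USD_PER_GB = 0.12  # ASSUMPTION: illustrative flat rate, not a quoted price

cost = estimate_egress_cost(DATASET_GB, ASSUMED_USD_PER_GB)
print(f"Estimated egress cost: ${cost:.2f}")
```

Keep in mind that each additional full download of the dataset would incur the same charge again.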
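@paularp, your IT administrator should be able to set up an FTP (or scp, or sftp) transfer between the host/recipient servers. For sensitive data, scp or sftp would be the safer pick, since plain FTP does not encrypt the transfer.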
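Whichever transfer tool you end up with, it may be worth confirming that each file arrived intact. Below is a minimal sketch, assuming the sending institution publishes a SHA-256 digest for every file; the function names `sha256_of_file` and `verify_transfer` are hypothetical and only for illustration.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks
    so that very large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(path: str, expected_hex: str) -> bool:
    """Return True if the local file matches the digest supplied by the sender."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

The recipient would call `verify_transfer` on each downloaded file with the digest from the sender and re-download anything that returns `False`.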
Thank you! Sadly, no plan was available. We wrote the DUA ourselves and had to keep it very vague for that reason. Only the download will be cloud based. Thank you for the link! I think it will be very useful.
I realize this post is a few years old, but for the sake of anyone else who comes across it with the same need, I figured I’d mention another option worth considering. For folks using cloud data platforms (Snowflake, BigQuery, Databricks, etc.), each platform has native data sharing capabilities that require zero data transfer (i.e. no egress/ingress fees).
And for a more agnostic solution that works across platforms, Bobsled is supposed to be a good product (I’ve never used it)
Honestly, this still comes in handy! Thanks so much!