Looking into GitHub's policies, we should be able to upload the stripped-down dataset, and maybe the full-size one as well, as it would not breach GitHub policy.
See https://help.github.com/articles/working-with-large-files/ and https://help.github.com/articles/conditions-for-large-files/
This would potentially allow us to version the md5/QC files associated with the results alongside the data.
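
As a rough sketch of how the md5 side could work, the script below walks a data directory, computes an MD5 checksum for each file, and flags anything that would run into GitHub's per-file limits. The thresholds (warning above 50 MiB, push rejected above 100 MiB) are my reading of the pages linked above and should be double-checked there; the function names and the manifest layout are hypothetical, not something we already have.

```python
import hashlib
from pathlib import Path

# Assumed per-file thresholds, taken from my reading of GitHub's large-file docs.
WARN_BYTES = 50 * 1024 * 1024    # above this, a push produces a warning
BLOCK_BYTES = 100 * 1024 * 1024  # above this, a push is rejected


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(size):
    """Label a file size against the assumed GitHub thresholds."""
    if size > BLOCK_BYTES:
        return "blocked"
    if size > WARN_BYTES:
        return "warning"
    return "ok"


def build_manifest(root):
    """Return (md5, size, status, relative_path) for every file under root."""
    root = Path(root)
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        size = path.stat().st_size
        rel = path.relative_to(root).as_posix()
        rows.append((md5sum(path), size, classify(size), rel))
    return rows


def write_manifest(rows, out_path):
    """Write one line per file so the manifest diffs cleanly under version control."""
    with open(out_path, "w", encoding="utf-8") as out:
        for digest, size, status, rel in rows:
            out.write(f"{digest}  {size}  {status}  {rel}\n")
```

Calling `write_manifest(build_manifest("data"), "MANIFEST.md5")` (both names are placeholders) would give a text file that can be committed next to the results, so any change to the data shows up as a diff in the manifest. The manifest should be written outside the scanned directory, otherwise it will list itself on the next run.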