In general:
There was discussion on the nipy mailing list about making installable Python dataset packages. That makes sense if users will want most of the available data, or if the packages don't get too large, but less so if we only want a few datasets, as in rdatasets.
I didn't pay a lot of attention to the details of dataset packages and meta information. For now the rdataset pattern plus our datasets inside statsmodels seems to be enough.
It's possible to rethink this in the future if someone is interested. I saw that there are also related dataset packages for Julia (one of them a translation of Vincent's rdatasets), which will have similar installation and license/copyright questions as we do.
On the specific question:
Is the Hanes .gz file an archive with a single csv file or does it have a collection of csv files?
What's the advantage of using an archive instead of a plain csv file?
I'm fine either way, but AFAIK we would have to write py2/py3-compatible helper functions to get the data out of an archive file. (The statespace notebooks do that, and it is what prompted me to look into creating smdatasets.)
There's just a single file in there, in csv format. It's only compressed to save space/bandwidth.
I don't feel strongly about this.
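For reference, a minimal sketch of what such a helper could look like for a gzip file that holds a single csv, as described above. The function name `read_gzipped_csv` and the default encoding are assumptions for illustration, not existing statsmodels code. It is written against Python 3; making the csv parsing work with non-ASCII text on py2 would need extra care.

```python
import csv
import gzip
import io


def read_gzipped_csv(path, encoding="utf-8"):
    """Return (header, rows) from a gzip file that holds a single csv.

    Hypothetical helper for illustration only.
    """
    # Read the decompressed payload as bytes, then decode it explicitly.
    with gzip.open(path, "rb") as fh:
        raw = fh.read()
    text = raw.decode(encoding)

    # newline="" leaves line endings untouched so the csv module handles them.
    reader = csv.reader(io.StringIO(text, newline=""))
    rows = list(reader)
    return rows[0], rows[1:]
```

If pandas is available, `pandas.read_csv` can also read a `.gz` path directly, since its `compression` argument defaults to inferring the format from the file extension, so a custom helper may not be needed at all.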
I'm not sure what you have in mind in terms of data format, metadata, etc. Let me know and I will revise the PR as needed.