Draft
Reformatted and added a method to split data and directly create a vertical federated dataset
Contributor
Author
To resolve #81.
Updated split_data_create_vertical_dataset to match the current dataset classes (i.e. samplesetwithlabels).
TTitcombe
requested changes
Jan 9, 2021
| import datasets | ||
| """I think this is not needed anymore""" |
Member
Do you mean we don't need the partitioned dataloader?
Contributor
Author
No, I mean that the default PyTorch dataloader works, so we do not need a custom one (at least for how things are done now). See the notebook for an example.
| self.values = torch.Tensor(values) if is_labels else torch.stack(values) | ||
| self.worker_id = None | ||
| if worker_id != None: |
Member
That can simplify to `if worker_id:`
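One caveat worth checking before applying this: a truthiness test and a comparison against `None` are not equivalent when a falsy ID such as `0` or an empty string is a valid worker ID. A minimal sketch of the difference, using two hypothetical helper functions that are not part of this PR:

```python
def passes_truthiness_check(worker_id=None):
    """Return True when the body of `if worker_id:` would run."""
    return bool(worker_id)


def passes_none_check(worker_id=None):
    """Return True when the body of `if worker_id is not None:` would run."""
    return worker_id is not None


# Compare both checks on a few candidate IDs (the values are illustrative only).
for candidate in (None, 0, "", "alice"):
    print(repr(candidate), passes_truthiness_check(candidate), passes_none_check(candidate))
```

If worker IDs are always non-empty strings, both forms behave the same. Otherwise `is not None` is the closer match to the original `!= None` check.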
| fmt_str = "FederatedDataset\n" | ||
| fmt_str += f" Distributed accross: {', '.join(str(x) for x in self.workers)}\n" | ||
| fmt_str += f" Number of datapoints: {self.__len__()}\n" | ||
| return fmt_str No newline at end of file |
Member
Add a newline at the end of the file.
| self.dataset = dataset #It can also be None, and then it would be only computational | ||
| self.model = model | ||
| self.level = level if level >= 0 else 0 #it should start from zero, otherwise throw error #TODO: implement error throwing |
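The TODO on `level` could be resolved by validating the argument instead of silently clamping it to zero. A minimal sketch, assuming a negative level should raise `ValueError`; the helper name `validate_level` is hypothetical and not part of this PR:

```python
def validate_level(level):
    """Hypothetical helper: reject negative levels rather than clamping them to 0."""
    if level < 0:
        raise ValueError(f"level must be >= 0, got {level}")
    return level


# Assumed usage inside the constructor: self.level = validate_level(level)
print(validate_level(2))
```

Raising early makes a wrong `level` visible at construction time, whereas clamping would hide the mistake from the caller.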
| This code is meant to be used with dual-headed Neural Networks, where there are a bunch of different workers, | ||
| which agrees on the labels, and there is a server with the labels only. | ||
| Code built upon: | ||
| - Abbas Ismail's (@abbas5253) work on dual-headed NN. In particular, check Configuration 1: |
Member
Does this PR require Abbas' PR to be merged first?
| the third the index, which is to keep track of the same data point. | ||
| """ | ||
| if worker_list == None: |
Member
`if not worker_list:` or `if worker_list is None:`
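For an optional list argument, the truthiness check and the `is None` check differ when the caller passes an empty list on purpose. A minimal sketch of the `is None` fallback pattern; the function name `resolve_workers` and the default worker names are assumptions for illustration, not taken from this PR:

```python
def resolve_workers(worker_list=None, default_workers=("alice", "bob")):
    """Hypothetical helper: use the defaults only when no list was passed at all."""
    if worker_list is None:
        return list(default_workers)
    # An explicitly passed list, even an empty one, is returned as a copy.
    return list(worker_list)


print(resolve_workers())
print(resolve_workers([]))
print(resolve_workers(["carol"]))
```

With `if not worker_list:` the empty list would also be replaced by the defaults, which may or may not be the intended behaviour here.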
Description
Work in progress pull request for dataloading utils, dataloaders and datasets.
Affected Dependencies
Currently uses PySyft 2.0. This will change to either not using PySyft at all or, eventually, PySyft 3.0.
How has this been tested?
Manually. Unit and integration tests still need to be added properly.
Checklist