Forward arbitrary kwargs to remote blocks #467
Open
justheuristic wants to merge 40 commits into main from
Conversation
added 6 commits on August 17, 2023 at 01:59
justheuristic (Collaborator, Author):
note to self: the old client runs backward with inputs that do not require_grad; we must support that! (see the sketch below)
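A minimal sketch of what supporting that might look like, assuming the server re-enables grad on arriving tensors; the helper name `backward_step` and its body are my illustration, not code from this PR:

```python
import torch

def backward_step(block: torch.nn.Module, inputs: torch.Tensor, grad_outputs: torch.Tensor) -> torch.Tensor:
    # old clients may send inputs with requires_grad=False;
    # re-enable grad locally so autograd can still compute grad_inputs
    inputs = inputs.detach().requires_grad_(True)
    outputs = block(inputs)
    outputs.backward(grad_outputs)
    return inputs.grad

# e.g.: backward_step(torch.nn.Linear(4, 4), torch.randn(2, 4), torch.randn(2, 4))
```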
justheuristic (Collaborator, Author):
note to self: on wake up, do
justheuristic commented on Sep 6, 2023
  if attempt_no >= 1:
      _, backup_inputs, backup_sequences = await sequential_forward(
-         inputs, prompts, sequence_manager, start_index=span.start, end_index=span.end
+         sequence_manager, inputs, prompts, start_index=span.start, end_index=span.end
Subjective matter: sequence_manager is the first parameter to most internal functions; I can roll back if the reviewer disagrees.
justheuristic commented on Sep 6, 2023
  value = value[:, offset : offset + max_chunk_length]
  kwargs_chunk[key] = value
  return kwargs_chunk
Note: this is a potential problem; not all tensors where shape[-2] == seq_len can be time-sliced. Counter-example: a LoRA adapter might accidentally have its rank equal to the sequence length (see the sketch below).
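A minimal sketch of the heuristic under discussion and its failure mode; the helper name `split_kwargs_chunk` and the exact dimension checks are my assumptions:

```python
import torch

def split_kwargs_chunk(kwargs: dict, offset: int, max_chunk_length: int, seq_len: int) -> dict:
    # heuristic: slice any 3-D tensor whose shape[-2] equals seq_len along the time axis
    kwargs_chunk = {}
    for key, value in kwargs.items():
        if isinstance(value, torch.Tensor) and value.ndim == 3 and value.shape[-2] == seq_len:
            # failure mode: a LoRA matrix whose rank happens to equal seq_len
            # also satisfies this check and would be sliced by mistake
            value = value[:, offset : offset + max_chunk_length]
        kwargs_chunk[key] = value
    return kwargs_chunk
```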
justheuristic commented on Sep 6, 2023
  @staticmethod
- def forward(ctx, inputs: torch.Tensor, prompts: torch.Tensor, sequence_manager: RemoteSequenceManager):
+ def forward(ctx, sequence_manager: RemoteSequenceManager, inputs: torch.Tensor, prompts: torch.Tensor):
      # TODO add kwargs here; figure out a way to split kwargs across servers
Problem: how do we split args/kwargs into sub-batches? (one possible approach is sketched below)
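One possible approach, sketched here rather than taken from this PR: split any kwarg tensor whose leading dimension equals the batch size, and replicate everything else. The helper name `split_batch_kwargs` is hypothetical:

```python
import torch

def split_batch_kwargs(kwargs: dict, batch_size: int, num_chunks: int) -> list:
    # split tensors whose leading dim equals batch_size; replicate the rest
    chunks = [{} for _ in range(num_chunks)]
    for key, value in kwargs.items():
        if isinstance(value, torch.Tensor) and value.ndim >= 1 and value.shape[0] == batch_size:
            for chunk, piece in zip(chunks, value.tensor_split(num_chunks, dim=0)):
                chunk[key] = piece
        else:
            for chunk in chunks:
                chunk[key] = value  # non-batched kwargs are shared across sub-batches
    return chunks
```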
# Conflicts:
#   src/petals/__init__.py
#   src/petals/client/inference_session.py
Collaborator:
@justheuristic solemnly swears to
NB: this pull request makes several drastic changes to the backend, block_functions, and pools. It might be better if I walk you through before the review. On a related note, if it interferes with long-term plans for the codebase, please raise a concern; I'm happy to roll back any detrimental changes.
Why this exists:
So that a user can forward arbitrary kwargs to remote blocks and expect that the outputs are the same as with a local model:

output_with_lora = internal_model_interface.forward(inputs, **lora_adapters)
output = internal_model_interface.forward(inputs, layer_past=make_method_dependent_tensors())
output_with_ia3 = internal_model_interface.forward(inputs, **ia3_state_dict)

What does this PR contain
New functionality
Internal codebase changes:
RemoteSequenceManager.get_request_metadata now always accepts (server_id, protocol, block_uids, args, kwargs), in that order (see the sketch after this list)
client-side code: packing args/kwargs and forming metadata was moved from sequential_autograd to remote_forward_backward
Task size is now specified explicitly in block_functions
Task and PrioritizedTaskPool support kwargs and, therefore, this pull request does not make server-side batching any more complicated than it already is
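A hedged sketch of the new signature described in the first item above; only the (server_id, protocol, block_uids, args, kwargs) ordering comes from this PR, while the parameter types, comments, and body are my illustration:

```python
from typing import Any, Dict, Optional, Sequence

class RemoteSequenceManager:
    def get_request_metadata(
        self,
        server_id: str,             # peer ID of the server handling this span
        protocol: str,              # RPC protocol name, e.g. "rpc_forward" (assumed value)
        block_uids: Sequence[str],  # UIDs of the remote blocks in this span
        *args: Any,
        **kwargs: Any,
    ) -> Optional[Dict[str, Any]]:
        # illustration only: a real implementation would pack protocol-specific
        # metadata (points, active adapters, packed kwargs, ...) here
        return {"protocol": protocol, "block_uids": list(block_uids)}
```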
Notable missing functionality
(implementation issue) _RemoteSequentialAutogradFunction can't split sub-batches with kwargs
(implementation issue) InferenceSession only accepts kwargs during its creation (see the usage sketch below)
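A usage sketch of that limitation, modeled on the Petals inference_session API; the `extra_kwargs` name and the exact call shape are my assumptions:

```python
# kwargs are fixed when the session is created...
with model.transformer.h.inference_session(max_length=128, **extra_kwargs) as session:
    outputs = session.step(hidden_states)  # ...and cannot be changed per step
```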
Tests & sanity checks
Sanity checks:
CI tests