Forward arbitrary kwargs to remote blocks #467
Open
justheuristic wants to merge 40 commits into main from
Conversation
added 6 commits on August 17, 2023 at 01:59
justheuristic (Collaborator, Author):
note to self: the old client runs backward with inputs that do not require_grad; we must support that! (see the sketch below)
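A minimal sketch of what supporting that might look like, assuming the server re-enables grad on arriving tensors; the helper name `backward_step` and its body are my illustration, not code from this PR:

```python
import torch

def backward_step(block: torch.nn.Module, inputs: torch.Tensor, grad_outputs: torch.Tensor) -> torch.Tensor:
    # old clients may send inputs with requires_grad=False;
    # re-enable grad locally so autograd can still compute grad_inputs
    inputs = inputs.detach().requires_grad_(True)
    outputs = block(inputs)
    outputs.backward(grad_outputs)
    return inputs.grad

# e.g.: backward_step(torch.nn.Linear(4, 4), torch.randn(2, 4), torch.randn(2, 4))
```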
justheuristic (Collaborator, Author):
note to self: on wake up, do
justheuristic commented on Sep 6, 2023
  if attempt_no >= 1:
      _, backup_inputs, backup_sequences = await sequential_forward(
-         inputs, prompts, sequence_manager, start_index=span.start, end_index=span.end
+         sequence_manager, inputs, prompts, start_index=span.start, end_index=span.end
Subjective matter: sequence_manager is the first parameter to most internal functions; I can roll back if the reviewer disagrees.
justheuristic commented on Sep 6, 2023
  value = value[:, offset : offset + max_chunk_length]
  kwargs_chunk[key] = value
  return kwargs_chunk
Note: this is a potential problem; not all tensors where shape[-2] == seq_len can be time-sliced. Counter-example: a LoRA adapter might accidentally have its rank equal to the sequence length (see the sketch below).
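A minimal sketch of the heuristic under discussion and its failure mode; the helper name `split_kwargs_chunk` and the exact dimension checks are my assumptions:

```python
import torch

def split_kwargs_chunk(kwargs: dict, offset: int, max_chunk_length: int, seq_len: int) -> dict:
    # heuristic: slice any 3-D tensor whose shape[-2] equals seq_len along the time axis
    kwargs_chunk = {}
    for key, value in kwargs.items():
        if isinstance(value, torch.Tensor) and value.ndim == 3 and value.shape[-2] == seq_len:
            # failure mode: a LoRA matrix whose rank happens to equal seq_len
            # also satisfies this check and would be sliced by mistake
            value = value[:, offset : offset + max_chunk_length]
        kwargs_chunk[key] = value
    return kwargs_chunk
```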
justheuristic commented on Sep 6, 2023
  @staticmethod
- def forward(ctx, inputs: torch.Tensor, prompts: torch.Tensor, sequence_manager: RemoteSequenceManager):
+ def forward(ctx, sequence_manager: RemoteSequenceManager, inputs: torch.Tensor, prompts: torch.Tensor):
      # TODO add kwargs here; figure out a way to split kwargs across servers
Problem: how do we split args/kwargs into sub-batches? (one possible approach is sketched below)
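One possible approach, sketched here rather than taken from this PR: split any kwarg tensor whose leading dimension equals the batch size, and replicate everything else. The helper name `split_batch_kwargs` is hypothetical:

```python
import torch

def split_batch_kwargs(kwargs: dict, batch_size: int, num_chunks: int) -> list:
    # split tensors whose leading dim equals batch_size; replicate the rest
    chunks = [{} for _ in range(num_chunks)]
    for key, value in kwargs.items():
        if isinstance(value, torch.Tensor) and value.ndim >= 1 and value.shape[0] == batch_size:
            for chunk, piece in zip(chunks, value.tensor_split(num_chunks, dim=0)):
                chunk[key] = piece
        else:
            for chunk in chunks:
                chunk[key] = value  # non-batched kwargs are shared across sub-batches
    return chunks
```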
# Conflicts:
#   src/petals/__init__.py
#   src/petals/client/inference_session.py
Collaborator:
@justheuristic solemnly swears to
NB: this pull request makes several drastic changes to the backend, block_functions, and pools. It might be better if I walk you through before the review. On a related note, if it interferes with long-term plans for the codebase, please raise a concern; I'm happy to roll back any detrimental changes.
Why this exists:
So that a user can forward arbitrary kwargs to remote blocks and expect that the outputs are the same as with a local model:

output_with_lora = internal_model_interface.forward(inputs, **lora_adapters)
output = internal_model_interface.forward(inputs, layer_past=make_method_dependent_tensors())
output_with_ia3 = internal_model_interface.forward(inputs, **ia3_state_dict)

What does this PR contain
New functionality
Internal codebase changes:
RemoteSequenceManager.get_request_metadata now always accepts (server_id, protocol, block_uids, args, kwargs), in that order (see the sketch after this list)
client-side code: packing args/kwargs and forming metadata was moved from sequential_autograd to remote_forward_backward
Task size is now specified explicitly in block_functions
Task and PrioritizedTaskPool support kwargs and, therefore, this pull request does not make server-side batching any more complicated than it already is
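A hedged sketch of the new signature described in the first item above; only the (server_id, protocol, block_uids, args, kwargs) ordering comes from this PR, while the parameter types, comments, and body are my illustration:

```python
from typing import Any, Dict, Optional, Sequence

class RemoteSequenceManager:
    def get_request_metadata(
        self,
        server_id: str,             # peer ID of the server handling this span
        protocol: str,              # RPC protocol name, e.g. "rpc_forward" (assumed value)
        block_uids: Sequence[str],  # UIDs of the remote blocks in this span
        *args: Any,
        **kwargs: Any,
    ) -> Optional[Dict[str, Any]]:
        # illustration only: a real implementation would pack protocol-specific
        # metadata (points, active adapters, packed kwargs, ...) here
        return {"protocol": protocol, "block_uids": list(block_uids)}
```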
Notable missing functionality
(implementation issue) _RemoteSequentialAutogradFunction can't split sub-batches with kwargs
(implementation issue) InferenceSession only accepts kwargs during its creation (see the usage sketch below)
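A usage sketch of that limitation, modeled on the Petals inference_session API; the `extra_kwargs` name and the exact call shape are my assumptions:

```python
# kwargs are fixed when the session is created...
with model.transformer.h.inference_session(max_length=128, **extra_kwargs) as session:
    outputs = session.step(hidden_states)  # ...and cannot be changed per step
```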
Tests & sanity checks
Sanity checks:
CI tests