Implement GELU Activation Function (Approximation)#2177

Closed
MaximilianSchreff wants to merge 7 commits into apache:main from MaximilianSchreff:gelu
Conversation

@MaximilianSchreff
Contributor

This PR introduces the Gaussian Error Linear Unit (GELU) activation function to SystemDS as a built-in operation. The implementation uses the widely adopted approximate formulation (https://arxiv.org/abs/1606.08415).

This PR is part of a series of PRs to support popular Transformer architectures in SystemDS. GELU is one of the most commonly used activation functions in models like BERT and GPT.

Includes

  • Forward pass
  • Backward pass
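To illustrate the math behind the two passes, here is a minimal Python sketch of the tanh-based GELU approximation from the linked paper. This is an assumption-laden illustration, not the SystemDS DML built-in itself; the function names are hypothetical.

```python
import math

# Constants of the tanh approximation from https://arxiv.org/abs/1606.08415:
# GELU(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
K = math.sqrt(2.0 / math.pi)
C = 0.044715

def gelu_forward(x: float) -> float:
    """Forward pass of the approximate GELU for a single scalar."""
    return 0.5 * x * (1.0 + math.tanh(K * (x + C * x ** 3)))

def gelu_backward(x: float) -> float:
    """Derivative of the approximation; in a backward pass this factor
    scales the upstream gradient element-wise."""
    t = math.tanh(K * (x + C * x ** 3))
    # Product rule: 0.5*(1 + tanh(u)) + 0.5*x*(1 - tanh(u)^2) * du/dx,
    # where du/dx = sqrt(2/pi) * (1 + 3*0.044715*x^2).
    return 0.5 * (1.0 + t) + 0.5 * x * (1.0 - t * t) * K * (1.0 + 3.0 * C * x * x)
```

As quick sanity checks: gelu_forward(0) is exactly 0, the derivative at 0 is 0.5, and for large positive x the function approaches the identity.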

Testing:

Added two test cases validating correctness against PyTorch's implementation:

  • Forward pass against PyTorch's torch.nn.functional.gelu.
  • Backward pass against PyTorch's torch.autograd.grad.

@phaniarnab
Contributor

Thanks @MaximilianSchreff. I approved the tests to run.

@phaniarnab
Contributor

Thanks for the patch @MaximilianSchreff. I will merge it in the next days.

@phaniarnab
Contributor

@MaximilianSchreff, this PR is merged now.
