- This implementation is the same as Transformers.Bert, with a tiny embeddings tweak.
- RoBERTa has the same architecture as BERT, but uses a byte-level BPE (implemented in BPE.jl) as its tokenizer (same as GPT-2) and uses a different pre-training scheme. A minimal sketch of the byte-to-character mapping behind byte-level BPE appears after this list.
- RoBERTa doesn’t have token_type_ids, so you don’t need to indicate which token belongs to which segment. Just separate your segments with the separation token (or </s>); see the pair-formatting sketch after this list.
- We can also wrap CamemBERT (the French version of BERT) around RoBERTa.
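
The "byte-level" part of the tokenizer works by first mapping every raw byte to a printable character, so that BPE merges can run on ordinary strings and no input ever produces an unknown symbol. The sketch below is a minimal, illustrative version of that mapping in plain Julia; the names `bytes_to_unicode` and `byte2char` are made up for this example and are not part of BPE.jl or Transformers.jl.

```julia
# Illustrative byte-to-character table in the style of GPT-2's byte-level BPE.
# Printable Latin-1 bytes map to themselves; every other byte is shifted past
# 255 so it still has a visible stand-in character.
function bytes_to_unicode()
    keep = vcat(Int('!'):Int('~'), Int('¡'):Int('¬'), Int('®'):Int('ÿ'))
    table = Dict{UInt8,Char}(UInt8(b) => Char(b) for b in keep)
    n = 0
    for b in 0x00:0xff
        if !haskey(table, b)
            table[b] = Char(256 + n)  # shifted stand-in for a non-printable byte
            n += 1
        end
    end
    return table
end

byte2char = bytes_to_unicode()
# Every byte of the UTF-8 encoding becomes one visible character, so even
# emoji and accented letters never fall outside the tokenizer's alphabet.
visible = join(byte2char[b] for b in codeunits("héllo 🤗"))
```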
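
Because there are no token_type_ids, a sequence pair is presented to RoBERTa purely through separator tokens. This is a minimal sketch of that formatting with plain strings; real inputs would of course go through the tokenizer, and the variable names here are only for illustration.

```julia
# BERT:    [CLS] A [SEP] B [SEP]   plus token_type_ids (0s for A, 1s for B)
# RoBERTa: <s> A </s></s> B </s>   with no segment ids at all
bos, eos = "<s>", "</s>"
segment_a = "The quick brown fox"
segment_b = "jumps over the lazy dog"

# Single segment: <s> A </s>
single_input = string(bos, " ", segment_a, " ", eos)

# Segment pair: the two </s> in the middle are all that marks the boundary.
pair_input = string(bos, " ", segment_a, " ", eos, eos, " ", segment_b, " ", eos)
# "<s> The quick brown fox </s></s> jumps over the lazy dog </s>"
```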