New Transformer (General-purpose, Quantization resilient / Enables 1.58-bit ternary training without STE / Long-term memory / Stable convergence), New Optimizer (General-purpose), and New Scheduler (General-purpose) #1964

muooon · 2026-06-05T15:03:37Z

Hello everyone,

I am sharing this post to contribute back to the open-source community.
I have immense respect and gratitude for the pioneers behind bitsandbytes and for what they have achieved.
Although I am only an amateur, I have released three general-purpose tools. Each one is fully independent of the others and is not tuned for any specific task.

1. New Transformer architecture: D-RNA
2. New optimizer family: emo series (5 variants)
3. New scheduler: emoPulse (derived from the emo optimizer)

All three work with ternary (1.58-bit) quantization and adapt well to other quantization methods.
I am not certain whether this directly serves the bitsandbytes ecosystem, but I would appreciate it if you could take a look at these three projects when you have a moment.
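
For readers unfamiliar with the term, "1.58-bit" refers to weights restricted to the three values -1, 0, and +1, since log2(3) is about 1.58 bits per weight. The sketch below shows one common way to produce such weights, absmean scaling followed by rounding and clamping. It is a generic illustration under that assumption and is not taken from the D-RNA code; the function names `ternary_quantize` and `dequantize` are hypothetical.

```python
import math


def ternary_quantize(weights, eps=1e-8):
    """Map float weights to codes in {-1, 0, 1} plus one shared scale.

    Assumption: absmean scaling (scale = mean of |w|). This is a generic
    scheme for illustration, not necessarily what D-RNA uses.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round to the nearest integer, then clamp into the ternary range.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from ternary codes."""
    return [c * scale for c in codes]


weights = [0.9, -0.05, -1.2, 0.4, 0.0, -0.6]
codes, scale = ternary_quantize(weights)
approx = dequantize(codes, scale)

# Three states per weight carry log2(3) bits of information.
bits_per_weight = math.log2(3)
```

Small weights collapse to 0 and large ones saturate at plus or minus one scale unit, which is why training methods for this regime have to tolerate a very coarse weight grid.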

D-RNA : https://github.com/muooon/DRNA
1.58-bit sample : https://github.com/muooon/DRNA/tree/drna/158b_train_sample
emo optimizer : https://github.com/muooon/EmoSens
emo scheduler : https://github.com/muooon/EmoSens/tree/v3.9.0_ecc/scheduler

Key features of D-RNA :
It can be used to build security-adaptive models, such as public-key and private-key models. The architecture lets you freeze the base model and extend it with LoRA adapters in a Mixture-of-Experts (MoE) fashion.
D-RNA is versatile and can be used in many different ways.
It stays compatible with standard Transformers, so existing weights can be ported over easily.
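
To make the "frozen base plus LoRA experts" idea concrete, here is a minimal sketch of the general pattern: a frozen weight matrix whose output is adjusted by a gate-weighted sum of low-rank updates. It is an assumption about what such a layer can look like in general, not the D-RNA implementation; `LoRAExpert`, `moe_lora_forward`, and the softmax router are all hypothetical names and choices.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class LoRAExpert:
    """Low-rank update B @ A; only A and B would be trained."""

    def __init__(self, A, B):
        self.A = A  # shape: rank x in_features
        self.B = B  # shape: out_features x rank

    def delta(self, x):
        return matvec(self.B, matvec(self.A, x))


def moe_lora_forward(W_frozen, experts, router, x):
    """y = W_frozen @ x + sum_i gate_i * (B_i @ A_i @ x)."""
    out = matvec(W_frozen, x)            # frozen base path, never updated
    gates = softmax(matvec(router, x))   # one gate per expert
    for gate, expert in zip(gates, experts):
        out = [o + gate * d for o, d in zip(out, expert.delta(x))]
    return out, gates


W = [[1.0, 0.0], [0.0, 1.0]]             # stands in for a frozen base weight
experts = [
    LoRAExpert(A=[[1.0, 0.0]], B=[[1.0], [0.0]]),
    LoRAExpert(A=[[0.0, 1.0]], B=[[0.0], [1.0]]),
]
router = [[0.0, 0.0], [0.0, 0.0]]        # all-zero router gives uniform gates
y, gates = moe_lora_forward(W, experts, router, [2.0, 4.0])
```

Because the base matrix is only read, new experts can be added or removed without touching the original weights, which is the property the post describes.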

Key features of the emo optimizers and the emo scheduler :
Both the emo optimizers and the scheduler provide Auto-LR (automatic learning rate) capabilities.
This mechanism derives the learning rate directly from the loss rather than from gradient measurements, which is intended to keep convergence stable even under quantization constraints.
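
As a rough illustration of what "deriving the learning rate from the loss" can mean, the sketch below scales a base learning rate by the ratio of a slow and a fast moving average of the loss. This is a hypothetical toy rule, not the emo or emoPulse algorithm; the class name `LossDrivenLR` and all of its parameters are assumptions made for the example, and it expects positive loss values.

```python
class LossDrivenLR:
    """Toy loss-driven learning-rate rule (illustrative only).

    Tracks a fast and a slow exponential moving average (EMA) of the loss.
    When the recent loss is below the long-term trend, the step size grows;
    when it is above, the step size shrinks. No gradient statistics are used.
    """

    def __init__(self, base_lr=1e-3, fast=0.5, slow=0.05,
                 min_scale=0.1, max_scale=2.0):
        self.base_lr = base_lr
        self.fast = fast
        self.slow = slow
        self.min_scale = min_scale
        self.max_scale = max_scale
        self.fast_ema = None
        self.slow_ema = None

    def step(self, loss):
        """Feed the latest loss value and get the learning rate to use next."""
        if self.fast_ema is None:
            self.fast_ema = self.slow_ema = loss
        else:
            self.fast_ema += self.fast * (loss - self.fast_ema)
            self.slow_ema += self.slow * (loss - self.slow_ema)
        ratio = self.slow_ema / max(self.fast_ema, 1e-12)
        scale = min(self.max_scale, max(self.min_scale, ratio))
        return self.base_lr * scale


improving = LossDrivenLR(base_lr=0.01)
lr_start = improving.step(1.0)   # first call: both averages equal the loss
lr_down = improving.step(0.5)    # loss fell, so the rate is raised

worsening = LossDrivenLR(base_lr=0.01)
worsening.step(1.0)
lr_up = worsening.step(2.0)      # loss rose, so the rate is lowered
```

A rule of this kind only needs the scalar loss, so it behaves the same whether the gradients come from full-precision or quantized weights.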
