Skip to content

feat: add base64 support#1171

Open
RatanKokal wants to merge 7 commits intofortran-lang:masterfrom
RatanKokal:wip_base64
Open

feat: add base64 support#1171
RatanKokal wants to merge 7 commits intofortran-lang:masterfrom
RatanKokal:wip_base64

Conversation

@RatanKokal
Copy link
Copy Markdown

@RatanKokal RatanKokal commented Mar 31, 2026

Adds support for base64

Solves issue: #916

This PR provides a portable, branchless scalar implementation. A follow-up PR is in development to add an AVX2-accelerated backend for supported x86_64 architectures.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces Base64 encoding/decoding support to stdlib, integrating it into the strings component and adding a dedicated test suite (per issue #916).

Changes:

  • Added new stdlib_base64 module implementing base64_encode (generic over numeric/logical types) and base64_decode.
  • Re-exported base64_encode/base64_decode via stdlib_strings.
  • Added Base64 unit tests and wired them into the string test CMake configuration.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/strings/stdlib_base64.fypp New Base64 implementation (encode/decode).
src/strings/stdlib_strings.fypp Re-exports Base64 APIs from stdlib_base64.
src/strings/CMakeLists.txt Adds stdlib_base64.fypp to strings build sources.
test/string/test_base64.f90 Adds unit tests for known vectors, whitespace handling, invalid input, and roundtrips.
test/string/CMakeLists.txt Registers the new base64 test target.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/strings/stdlib_base64.fypp Outdated
Comment thread src/strings/stdlib_base64.fypp Outdated
Comment thread src/strings/stdlib_base64.fypp
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/strings/stdlib_base64_encode.fypp Outdated
Comment thread src/strings/stdlib_base64_decode.fypp Outdated
Comment thread src/strings/stdlib_base64_decode.fypp
Comment thread src/strings/stdlib_base64_encode.fypp Outdated
Comment thread src/strings/stdlib_base64_encode.fypp Outdated
Comment thread src/strings/stdlib_base64_decode.fypp Outdated
Comment thread test/string/test_base64.f90 Outdated
Comment thread test/string/test_base64.f90
@RatanKokal
Copy link
Copy Markdown
Author

@jvdp1 Could you please have a look? I have tackled all possible issues.

Working on a performance improvement using avx2 specific instructions which was the ultimate goal of the resources shared in the issue. Upon completion, would increase speeds drastically.

Since fortran is depndent on autovectorizers, it may not always lead to the optimall register management as can be controlled using C intrinsics.

Will still have a architecture fallback. Architecture specific kernels can be added to increase performance in future.

Copy link
Copy Markdown
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @RatanKokal for this PR. Overall it looks already very good to me.

Maybe some general comments:

  • I suggest to use state_type (provided by stdlib_error to handle errors
  • I think that some procedures could be declared pure
  • Is there any reason to prefer module instead of using submodule?

Comment thread src/strings/stdlib_base64.fypp Outdated
Comment thread src/strings/stdlib_base64_decode.fypp Outdated
Comment thread src/strings/stdlib_base64_decode.fypp Outdated
Comment thread src/strings/stdlib_base64_decode.fypp Outdated
Comment thread src/strings/stdlib_base64_encode.fypp Outdated
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 5, 2026

Codecov Report

❌ Patch coverage is 97.87234% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.86%. Comparing base (dae9d4b) to head (2e7fad9).
⚠️ Report is 52 commits behind head on master.

Files with missing lines Patch % Lines
test/string/test_base64.f90 97.87% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1171      +/-   ##
==========================================
+ Coverage   68.04%   68.86%   +0.82%     
==========================================
  Files         404      409       +5     
  Lines       12948    13713     +765     
  Branches     1395     1551     +156     
==========================================
+ Hits         8810     9443     +633     
- Misses       4138     4270     +132     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: RatanKokal <ratanskokal@gmail.com>
…ulator

Replaced O(N) string searches with an instant Look-Up Table and implemented a 32-bit accumulator to process data in a single pass. Eliminated multi-pass string packing, removed helper function overhead (dead code elimination), and pre-allocated memory to prevent expensive reallocations.

Signed-off-by: RatanKokal <ratanskokal@gmail.com>
This commit addresses reviewer feedback regarding memory safety, type portability, and documentation for the pure Fortran implementation.

Key changes:
- API Alignment: Updated base64_encode_into to mirror the decoder API by returning encoded_len and error_flag.
- Memory Safety: Added explicit blank-filling and strict bounds checking to the encoder to prevent silent buffer overflows.
- Branchless Bounds Checking: Implemented an iand(..., 255_int32) mask in the decoder to safely handle non-ASCII codepoints.
- Strict Portability: Enforced int32 type consistency across all bitwise intrinsics and literal masks.
- Documentation: Added FORD headers to the umbrella module.

Signed-off-by: RatanKokal <ratanskokal@gmail.com>
This commit hardens the base64 implementation by addressing standard compliance issues and optimizing the hot paths for the zero-copy API.

Key changes:
* Compliance: Replaced non-conforming 'c_loc' and 'c_f_pointer' usage with 'transfer' and 'select rank' blocks to ensure compatibility with non-C-interoperable types.
* Correctness: Fixed a Unicode aliasing bug in the decoder by implementing a bitwise-OR 'unicode_accum' check to reject non-ASCII characters (>255).
* Performance: Optimized 'base64_encode_into' by moving string blank-filling ('str = ""') off the hot path, preventing unnecessary memory operations during high-throughput encoding.
* API Refinement: Restricted the 'base64_encode_into' power-user API to accept only 'integer(int8)' arrays to guarantee zero-copy performance while remaining standard-compliant.
* Ergonomics: Added an optional 'error_flag' to 'base64_decode' to improve error handling for the allocating API.
* Testing: Expanded unit tests to include coverage for the new subroutine signatures and error states.

Signed-off-by: RatanKokal <ratanskokal@gmail.com>
This commit addresses the architectural and stylistic feedback for the Base64 implementation:

* Architecture: Consolidated interfaces into a single stdlib_base64 parent module and moved the high-performance logic into encode and decode submodules to prevent compilation cascades.
* Standards: Replaced the logical error flag with type(state_type) from stdlib_error for consistent library-wide error handling.
* Optimization: Marked the core _into subroutines as pure to guarantee no side effects and allow for aggressive compiler optimizations.
* Dependency: Imported base64_alphabet directly from stdlib_ascii to reduce redundancy.
* Documentation: Added explanatory comments for the branchless DT and IS_VAL lookup tables.
* Formatting: Expanded semicolon-separated multi-statement lines into single lines to improve debuggability and prevent CI truncation errors.

Signed-off-by: RatanKokal <ratanskokal@gmail.com>
@RatanKokal
Copy link
Copy Markdown
Author

Hey @jvdp1, could you please have a look. I have encorporated all the feedback.

Copy link
Copy Markdown
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @RatanKokal . this PR is taking a good shape. Below are a few comments. Could you also add the specs, please?

Comment thread src/core/stdlib_ascii.fypp Outdated
Comment thread src/strings/stdlib_base64.fypp Outdated
Comment thread src/strings/stdlib_base64_decode.fypp
Comment thread src/strings/stdlib_base64_decode.fypp
Comment thread src/strings/stdlib_base64_decode.fypp Outdated

#:for k1, t1, _, _ in REAL_KINDS_TYPES
module function base64_encode_real_${k1}$(data) result(str)
${t1}$, intent(in), target, contiguous :: data(..)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is target mentioned?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, target is not needed here, please remove.
The target attribute in fortran has a very specific use and it is to signal that a variable will be pointed to.


#:for k1, t1, _ in CMPLX_KINDS_TYPES
module function base64_encode_cmplx_${k1}$(data) result(str)
${t1}$, intent(in), target, contiguous :: data(..)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
${t1}$, intent(in), target, contiguous :: data(..)
${t1}$, intent(in), contiguous :: data(..)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

target is a compiler hint, ennsuring maximum memory efficiency by forcing pass-by-reference.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where in the fortran standard does it specify such conclusion?
target has a very specific purpose: signaling the compiler that a variable can be "pointed to" by a pointer variable. Telling the compiler that a variable is pontetially a target, can prevent agressive optimization.

Also, in Fortran, arguments are passed by reference.

Please remove this attribute.

module function base64_encode_bytes(bytes) result(str)
integer(int8), intent(in), target, contiguous :: bytes(:)
character(len=:), allocatable, target :: str
integer(c_size_t) :: nbytes
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment to explain why nbytes must be of type c_size_t?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this simply requires integer(int64), there is no need to invoke iso_c_binding here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jalvesz : use integer(int64) instead of c_size_t

Comment thread test/string/test_base64.f90 Outdated
Comment thread test/string/test_base64.f90 Outdated
- use MAXRANK-based rank generation (including rank-0 scalar support)
- align c_bool import with stdlib_kinds and keep iso_c_binding for c_size_t
- derive base64 alphabet from letters and digits constants
- simplify decode despace loop by skipping filtered-out bytes directly
- tidy roundtrip tests: parameterized constants and 132-char-safe line split

Signed-off-by: RatanKokal <ratanskokal@gmail.com>
@RatanKokal RatanKokal requested a review from jvdp1 April 13, 2026 08:58
Signed-off-by: RatanKokal <ratanskokal@gmail.com>
@RatanKokal
Copy link
Copy Markdown
Author

Hi @jvdp1,
I have implemented most of the changes as per the comments.
Also updated the specs.


! Branchless RFC 4648 decode map: byte -> 6-bit value, invalid -> -1.
! The main loop OR-reduces values and checks once at the end.
integer(int8), parameter :: DT(0:255) = int( [ &
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please provide a more explicit name to this string list. base64_dictionary or something of the sorts.

!> encoded = base64_encode(bytes) ! "TWFu"
!>```
interface base64_encode
#:for k1, t1, _, _ in REAL_KINDS_TYPES
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please align fypp macros to the inner scope where they take effect.


#:for k1, t1, _ in CMPLX_KINDS_TYPES
module function base64_encode_cmplx_${k1}$(data) result(str)
${t1}$, intent(in), target, contiguous :: data(..)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where in the fortran standard does it specify such conclusion?
target has a very specific purpose: signaling the compiler that a variable can be "pointed to" by a pointer variable. Telling the compiler that a variable is pontetially a target, can prevent agressive optimization.

Also, in Fortran, arguments are passed by reference.

Please remove this attribute.

module function base64_encode_bytes(bytes) result(str)
integer(int8), intent(in), target, contiguous :: bytes(:)
character(len=:), allocatable, target :: str
integer(c_size_t) :: nbytes
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this simply requires integer(int64), there is no need to invoke iso_c_binding here.

! Fortran Scalar Decoder Loop
do i = 1, slen - 7, 4
if (clean_mode) then
raw_c = int(iachar(str(i:i)), int32)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of comments:

  • int(iachar(str(i:i)), int32) is not required => iachar(str(i:i), int32). iachar takes a kind optional argument for that exact purpose.
  • All these expressions could be perfectly vectorized and make the whole code more compact:
            if (clean_mode) then
                raw_c(1:4) = iachar(str(i:i+3), int32)
                v(1:4) = int(DT(iand(raw_c, 255_int32)), int32)
            else
                v(1:4) = int(DT(iand(int(filtered(i), int32), 255_int32)), int32))
            end if
            ... etc

You could also parametrize the vector register size with :

integer, parameter :: chunk_size = 4
integer(int32) :: v(chunk_size)
... etc

#:endfor

module function base64_encode_bytes(bytes) result(str)
integer(int8), intent(in), target, contiguous :: bytes(:)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
integer(int8), intent(in), target, contiguous :: bytes(:)
integer(int8), intent(in), contiguous :: bytes(:)

end function base64_encode_bytes

pure module subroutine base64_encode_into(bytes, str, encoded_len, err_state)
integer(int8), intent(in), target, contiguous :: bytes(:)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
integer(int8), intent(in), target, contiguous :: bytes(:)
integer(int8), intent(in), contiguous :: bytes(:)


pure module subroutine base64_encode_into(bytes, str, encoded_len, err_state)
integer(int8), intent(in), target, contiguous :: bytes(:)
character(len=*), intent(out), target :: str
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
character(len=*), intent(out), target :: str
character(len=*), intent(out) :: str


#:for k1, t1 in LOG_KINDS_TYPES
module function base64_encode_logical_${k1}$(data) result(str)
${t1}$, intent(in), target, contiguous :: data(..)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
${t1}$, intent(in), target, contiguous :: data(..)
${t1}$, intent(in), contiguous :: data(..)


#### Example

```fortran
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make examples executable programs and include them within the documentation by importing the file.
You can check other .md files to see how it is done.

if (nbytes == 0) return

j = 1
do i = 1, int(nbytes) - 2, 3
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usinint(nbytes) (that is converting nbytes to implicitely int32) could generate overflows

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants