Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces Base64 encoding/decoding support to stdlib, integrating it into the strings component and adding a dedicated test suite (per issue #916).
Changes:
- Added new
stdlib_base64module implementingbase64_encode(generic over numeric/logical types) andbase64_decode. - Re-exported
base64_encode/base64_decodeviastdlib_strings. - Added Base64 unit tests and wired them into the string test CMake configuration.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/strings/stdlib_base64.fypp |
New Base64 implementation (encode/decode). |
src/strings/stdlib_strings.fypp |
Re-exports Base64 APIs from stdlib_base64. |
src/strings/CMakeLists.txt |
Adds stdlib_base64.fypp to strings build sources. |
test/string/test_base64.f90 |
Adds unit tests for known vectors, whitespace handling, invalid input, and roundtrips. |
test/string/CMakeLists.txt |
Registers the new base64 test target. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@jvdp1 Could you please have a look? I have tackled all possible issues. Working on a performance improvement using avx2 specific instructions which was the ultimate goal of the resources shared in the issue. Upon completion, would increase speeds drastically. Since fortran is depndent on autovectorizers, it may not always lead to the optimall register management as can be controlled using C intrinsics. Will still have a architecture fallback. Architecture specific kernels can be added to increase performance in future. |
jvdp1
left a comment
There was a problem hiding this comment.
thank you @RatanKokal for this PR. Overall it looks already very good to me.
Maybe some general comments:
- I suggest to use
state_type(provided bystdlib_errorto handle errors - I think that some procedures could be declared
pure - Is there any reason to prefer
moduleinstead of usingsubmodule?
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1171 +/- ##
==========================================
+ Coverage 68.04% 68.86% +0.82%
==========================================
Files 404 409 +5
Lines 12948 13713 +765
Branches 1395 1551 +156
==========================================
+ Hits 8810 9443 +633
- Misses 4138 4270 +132 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: RatanKokal <ratanskokal@gmail.com>
…ulator Replaced O(N) string searches with an instant Look-Up Table and implemented a 32-bit accumulator to process data in a single pass. Eliminated multi-pass string packing, removed helper function overhead (dead code elimination), and pre-allocated memory to prevent expensive reallocations. Signed-off-by: RatanKokal <ratanskokal@gmail.com>
This commit addresses reviewer feedback regarding memory safety, type portability, and documentation for the pure Fortran implementation. Key changes: - API Alignment: Updated base64_encode_into to mirror the decoder API by returning encoded_len and error_flag. - Memory Safety: Added explicit blank-filling and strict bounds checking to the encoder to prevent silent buffer overflows. - Branchless Bounds Checking: Implemented an iand(..., 255_int32) mask in the decoder to safely handle non-ASCII codepoints. - Strict Portability: Enforced int32 type consistency across all bitwise intrinsics and literal masks. - Documentation: Added FORD headers to the umbrella module. Signed-off-by: RatanKokal <ratanskokal@gmail.com>
This commit hardens the base64 implementation by addressing standard compliance issues and optimizing the hot paths for the zero-copy API.
Key changes:
* Compliance: Replaced non-conforming 'c_loc' and 'c_f_pointer' usage with 'transfer' and 'select rank' blocks to ensure compatibility with non-C-interoperable types.
* Correctness: Fixed a Unicode aliasing bug in the decoder by implementing a bitwise-OR 'unicode_accum' check to reject non-ASCII characters (>255).
* Performance: Optimized 'base64_encode_into' by moving string blank-filling ('str = ""') off the hot path, preventing unnecessary memory operations during high-throughput encoding.
* API Refinement: Restricted the 'base64_encode_into' power-user API to accept only 'integer(int8)' arrays to guarantee zero-copy performance while remaining standard-compliant.
* Ergonomics: Added an optional 'error_flag' to 'base64_decode' to improve error handling for the allocating API.
* Testing: Expanded unit tests to include coverage for the new subroutine signatures and error states.
Signed-off-by: RatanKokal <ratanskokal@gmail.com>
This commit addresses the architectural and stylistic feedback for the Base64 implementation: * Architecture: Consolidated interfaces into a single stdlib_base64 parent module and moved the high-performance logic into encode and decode submodules to prevent compilation cascades. * Standards: Replaced the logical error flag with type(state_type) from stdlib_error for consistent library-wide error handling. * Optimization: Marked the core _into subroutines as pure to guarantee no side effects and allow for aggressive compiler optimizations. * Dependency: Imported base64_alphabet directly from stdlib_ascii to reduce redundancy. * Documentation: Added explanatory comments for the branchless DT and IS_VAL lookup tables. * Formatting: Expanded semicolon-separated multi-statement lines into single lines to improve debuggability and prevent CI truncation errors. Signed-off-by: RatanKokal <ratanskokal@gmail.com>
|
Hey @jvdp1, could you please have a look. I have encorporated all the feedback. |
jvdp1
left a comment
There was a problem hiding this comment.
Thank you @RatanKokal . this PR is taking a good shape. Below are a few comments. Could you also add the specs, please?
|
|
||
| #:for k1, t1, _, _ in REAL_KINDS_TYPES | ||
| module function base64_encode_real_${k1}$(data) result(str) | ||
| ${t1}$, intent(in), target, contiguous :: data(..) |
There was a problem hiding this comment.
I agree, target is not needed here, please remove.
The target attribute in fortran has a very specific use and it is to signal that a variable will be pointed to.
|
|
||
| #:for k1, t1, _ in CMPLX_KINDS_TYPES | ||
| module function base64_encode_cmplx_${k1}$(data) result(str) | ||
| ${t1}$, intent(in), target, contiguous :: data(..) |
There was a problem hiding this comment.
| ${t1}$, intent(in), target, contiguous :: data(..) | |
| ${t1}$, intent(in), contiguous :: data(..) |
There was a problem hiding this comment.
target is a compiler hint, ennsuring maximum memory efficiency by forcing pass-by-reference.
There was a problem hiding this comment.
Where in the fortran standard does it specify such conclusion?
target has a very specific purpose: signaling the compiler that a variable can be "pointed to" by a pointer variable. Telling the compiler that a variable is pontetially a target, can prevent agressive optimization.
Also, in Fortran, arguments are passed by reference.
Please remove this attribute.
| module function base64_encode_bytes(bytes) result(str) | ||
| integer(int8), intent(in), target, contiguous :: bytes(:) | ||
| character(len=:), allocatable, target :: str | ||
| integer(c_size_t) :: nbytes |
There was a problem hiding this comment.
Could you add a comment to explain why nbytes must be of type c_size_t?
There was a problem hiding this comment.
I think this simply requires integer(int64), there is no need to invoke iso_c_binding here.
There was a problem hiding this comment.
I agree with @jalvesz : use integer(int64) instead of c_size_t
- use MAXRANK-based rank generation (including rank-0 scalar support) - align c_bool import with stdlib_kinds and keep iso_c_binding for c_size_t - derive base64 alphabet from letters and digits constants - simplify decode despace loop by skipping filtered-out bytes directly - tidy roundtrip tests: parameterized constants and 132-char-safe line split Signed-off-by: RatanKokal <ratanskokal@gmail.com>
Signed-off-by: RatanKokal <ratanskokal@gmail.com>
|
Hi @jvdp1, |
|
|
||
| ! Branchless RFC 4648 decode map: byte -> 6-bit value, invalid -> -1. | ||
| ! The main loop OR-reduces values and checks once at the end. | ||
| integer(int8), parameter :: DT(0:255) = int( [ & |
There was a problem hiding this comment.
Can you please provide a more explicit name to this string list. base64_dictionary or something of the sorts.
| !> encoded = base64_encode(bytes) ! "TWFu" | ||
| !>``` | ||
| interface base64_encode | ||
| #:for k1, t1, _, _ in REAL_KINDS_TYPES |
There was a problem hiding this comment.
Please align fypp macros to the inner scope where they take effect.
|
|
||
| #:for k1, t1, _ in CMPLX_KINDS_TYPES | ||
| module function base64_encode_cmplx_${k1}$(data) result(str) | ||
| ${t1}$, intent(in), target, contiguous :: data(..) |
There was a problem hiding this comment.
Where in the fortran standard does it specify such conclusion?
target has a very specific purpose: signaling the compiler that a variable can be "pointed to" by a pointer variable. Telling the compiler that a variable is pontetially a target, can prevent agressive optimization.
Also, in Fortran, arguments are passed by reference.
Please remove this attribute.
| module function base64_encode_bytes(bytes) result(str) | ||
| integer(int8), intent(in), target, contiguous :: bytes(:) | ||
| character(len=:), allocatable, target :: str | ||
| integer(c_size_t) :: nbytes |
There was a problem hiding this comment.
I think this simply requires integer(int64), there is no need to invoke iso_c_binding here.
| ! Fortran Scalar Decoder Loop | ||
| do i = 1, slen - 7, 4 | ||
| if (clean_mode) then | ||
| raw_c = int(iachar(str(i:i)), int32) |
There was a problem hiding this comment.
Couple of comments:
int(iachar(str(i:i)), int32)is not required =>iachar(str(i:i), int32). iachar takes a kind optional argument for that exact purpose.- All these expressions could be perfectly vectorized and make the whole code more compact:
if (clean_mode) then
raw_c(1:4) = iachar(str(i:i+3), int32)
v(1:4) = int(DT(iand(raw_c, 255_int32)), int32)
else
v(1:4) = int(DT(iand(int(filtered(i), int32), 255_int32)), int32))
end if
... etcYou could also parametrize the vector register size with :
integer, parameter :: chunk_size = 4
integer(int32) :: v(chunk_size)
... etc| #:endfor | ||
|
|
||
| module function base64_encode_bytes(bytes) result(str) | ||
| integer(int8), intent(in), target, contiguous :: bytes(:) |
There was a problem hiding this comment.
| integer(int8), intent(in), target, contiguous :: bytes(:) | |
| integer(int8), intent(in), contiguous :: bytes(:) |
| end function base64_encode_bytes | ||
|
|
||
| pure module subroutine base64_encode_into(bytes, str, encoded_len, err_state) | ||
| integer(int8), intent(in), target, contiguous :: bytes(:) |
There was a problem hiding this comment.
| integer(int8), intent(in), target, contiguous :: bytes(:) | |
| integer(int8), intent(in), contiguous :: bytes(:) |
|
|
||
| pure module subroutine base64_encode_into(bytes, str, encoded_len, err_state) | ||
| integer(int8), intent(in), target, contiguous :: bytes(:) | ||
| character(len=*), intent(out), target :: str |
There was a problem hiding this comment.
| character(len=*), intent(out), target :: str | |
| character(len=*), intent(out) :: str |
|
|
||
| #:for k1, t1 in LOG_KINDS_TYPES | ||
| module function base64_encode_logical_${k1}$(data) result(str) | ||
| ${t1}$, intent(in), target, contiguous :: data(..) |
There was a problem hiding this comment.
| ${t1}$, intent(in), target, contiguous :: data(..) | |
| ${t1}$, intent(in), contiguous :: data(..) |
|
|
||
| #### Example | ||
|
|
||
| ```fortran |
There was a problem hiding this comment.
Please make examples executable programs and include them within the documentation by importing the file.
You can check other .md files to see how it is done.
| if (nbytes == 0) return | ||
|
|
||
| j = 1 | ||
| do i = 1, int(nbytes) - 2, 3 |
There was a problem hiding this comment.
usinint(nbytes) (that is converting nbytes to implicitely int32) could generate overflows
Adds support for base64
Solves issue: #916
This PR provides a portable, branchless scalar implementation. A follow-up PR is in development to add an AVX2-accelerated backend for supported x86_64 architectures.