Skip to content

Conversation

@steven-aerts
Copy link
Contributor

What is the purpose of the change

When avro-c tries to encode an empty byte array it will avro_malloc(0)
which on some architectures will return NULL. Make sure this is not
interpreted as an error or or dereferenced causing a segfault.

Verifying this change

Extended unit tests to validate this case

Documentation

  • Does this pull request introduce a new feature? no

@github-actions github-actions bot added the C label Oct 24, 2025
@martin-g martin-g changed the title Avro 4193 Avro-4193: [c] segfault cannot handle empty byte array Oct 24, 2025
@martin-g
Copy link
Member

Please rebase!

When avro-c tries to encode an empty byte array it will avro_malloc(0)
which on some architectures will return NULL.  Make sure this is not
interpreted as an error or or dereferenced causing a segfault.
@steven-aerts
Copy link
Contributor Author

@martin-g sorry for that, rebased on main.

@martin-g martin-g requested a review from Copilot October 27, 2025 09:25
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a segmentation fault that occurs when avro-c attempts to encode empty byte arrays. The issue arises because avro_malloc(0) can return NULL on some architectures, which was incorrectly treated as an allocation error and caused null pointer dereferences.

Key changes:

  • Updated NULL checks to distinguish between allocation failures and valid zero-size allocations
  • Added null pointer guard in the test allocator's deallocation path
  • Extended test coverage to validate empty byte array handling

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
lang/c/src/datum.c Modified NULL checks in avro_bytes() and avro_bytes_set() to only treat NULL as an error when size is non-zero
lang/c/src/value-json.c Updated NULL checks in encode_utf8_bytes() and avro_value_to_json_t() to handle zero-length allocations correctly
lang/c/tests/test_avro_data.c Added null pointer guard in test allocator and introduced test_empty_bytes() to verify empty byte array handling

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


static int test_empty_bytes(void)
{
char bytes[] = { };
Copy link

Copilot AI Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty array initializer { } is a GNU extension and not standard C. Consider using char *bytes = NULL; or char bytes[1]; for better portability across compilers.

Suggested change
char bytes[] = { };
char *bytes = NULL;

Copilot uses AI. Check for mistakes.
avro_datum_decref(datum);
avro_datum_decref(expected_datum);

avro_schema_decref(writer_schema);
Copy link

Copilot AI Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double free: writer_schema is already decremented at line 230, so this second decrement at line 243 will cause a use-after-free error.

Suggested change
avro_schema_decref(writer_schema);

Copilot uses AI. Check for mistakes.
{
char *bytes_copy = (char *) avro_malloc(size);
if (!bytes_copy) {
if (!bytes_copy && size) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen now if size is 0?
It looks to me that no matter what is the value of bytes_copy it will never enter the if and it will call memcpy(bytes_copy, bytes, 0)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the there must be a check for size == 0 as a first step in this function that returns early and avoids any mallocs and memcpys.

}

json_t *result = json_stringn_nocheck((const char *) encoded, encoded_size);
json_t *result = json_stringn_nocheck((const char *) encoded ? encoded : "", encoded_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
json_t *result = json_stringn_nocheck((const char *) encoded ? encoded : "", encoded_size);
json_t *result = json_stringn_nocheck((const char *) encoded ? encoded : "", encoded ? encoded_size : 0);

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a similar code for FIXED at https://github.com/steven-aerts/avro/blob/89fb9db03478b793be7188c5683c6560c2eff115/lang/c/src/value-json.c#L244
Does it need the same improvements ?

{
char *bytes_copy = (char *) avro_malloc(size);
if (!bytes_copy) {
if (!bytes_copy && size) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the there must be a check for size == 0 as a first step in this function that returns early and avoids any mallocs and memcpys.

int rval;
char *bytes_copy = (char *) avro_malloc(size);
if (!bytes_copy) {
if (!bytes_copy && size) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants