Skip to content
/ server Public

Comments

MDEV-31414: Implement optional lengths for string types#4551

Open
OsamaNabih wants to merge 1 commit intoMariaDB:mainfrom
OsamaNabih:main
Open

MDEV-31414: Implement optional lengths for string types#4551
OsamaNabih wants to merge 1 commit intoMariaDB:mainfrom
OsamaNabih:main

Conversation

@OsamaNabih
Copy link

@OsamaNabih OsamaNabih commented Jan 17, 2026

Jira: https://jira.mariadb.org/browse/MDEV-31414
Adding support for SQL standard 2023's T081: Optional string types maximum length

This PR allows the user to not specify a length for a VARCHAR field. We default to making the field a TEXT type.

@CLAassistant
Copy link

CLAassistant commented Jan 17, 2026

CLA assistant check
All committers have signed the CLA.

@FooBarrior
Copy link
Contributor

Hi,

Is 16383 reported as the maximum due to collation choice?

That is a nice guess!

Our documentation claims it's 21,844 characters, though I think it's a bit outdated, since we support 4-byte encodings for some time. So yes, it's got to be UINT16_MAX/4 - header_length, where header_length is something like 3.

Copy link
Contributor

@FooBarrior FooBarrior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We follow the TDD paradigm - every feature should be covered with tests (and every bugfix, if possible). Include both .test and .result files in the commit.

I'm fine with a hard constant of 16383 in scope of this small work, we can always extend it and make a function of charset any time we want, since 16383 is expected to be the minimum of the maximums.

@FooBarrior
Copy link
Contributor

FooBarrior commented Jan 17, 2026

You can try to make tha max value variable if you want. Though it may require some research.

The problem is that we have (at least) three layers of charset settings:

  • system-wide
  • per-table
  • per-field

One overrides another. At some point, Filed would get a deduced value of charset, and perhaps Create_field would get it, too. Though it's hard to say, when exactly. Create_field has charset field in its parent Column_definition_attributes, you may check if it's set to the right encoding at different points in time.

You'd have to test all three cases though:

  • When a system-wide encoding is set
  • When it's overriden with table
  • When the latter is overriden with field charset.

Again, I'm NOT putting this as a requirement for this submission!

@github-actions github-actions bot added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Jan 18, 2026
@OsamaNabih
Copy link
Author

OsamaNabih commented Jan 20, 2026

You can try to make tha max value variable if you want. Though it may require some research.

The problem is that we have (at least) three layers of charset settings:

* system-wide

* per-table

* per-field

One overrides another. At some point, Filed would get a deduced value of charset, and perhaps Create_field would get it, too. Though it's hard to say, when exactly. Create_field has charset field in its parent Column_definition_attributes, you may check if it's set to the right encoding at different points in time.

You'd have to test all three cases though:

* When a system-wide encoding is set

* When it's overriden with table

* When the latter is overriden with field charset.

Again, I'm NOT putting this as a requirement for this submission!

That is very informative.
I was mainly focusing on the CREATE TABLE scenario so I only paid attention to the system-wide encoding, but your comment made me realize there are other scenarios where T081 applies as well.
For example ALTER, CAST, CREATE TYPE...etc
For some of these scenarios, we need to handle the situation where the table/field were already created with a different charset than the system-wide one.

To verify my understanding, we are okay with limiting the scope of this PR to the CREATE TABLE scenario?

@OsamaNabih OsamaNabih force-pushed the main branch 2 times, most recently from 17e0e30 to c6e985c Compare January 20, 2026 13:59
@OsamaNabih OsamaNabih closed this Jan 20, 2026
@OsamaNabih OsamaNabih reopened this Jan 20, 2026
@OsamaNabih OsamaNabih force-pushed the main branch 2 times, most recently from fe334ea to 7adad2c Compare January 20, 2026 14:08
@OsamaNabih OsamaNabih force-pushed the main branch 4 times, most recently from b0e32ff to 03b43df Compare February 4, 2026 20:02
@gkodinov gkodinov changed the title [MDEV-31414] Implement optional lengths for string types MDEV-31414: Implement optional lengths for string types Feb 19, 2026
Copy link
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. This is a preliminary review.

Please fix some sanity checks:

  • Make sure this is a single commit and it has a proper commit message, according to CODING_STANDARDS.md.
  • Make sure there's no failing tests as a result of your change. I can see a couple in the buildbot results.

SHOW CREATE TABLE t1;
Table Create Table
t1 CREATE TABLE `t1` (
`a` text DEFAULT NULL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my own 2 cents here: it would look odd to people saying "a VARCHAR" and getting "a TEXT" back as a result of SHOW CREATE.

I'd recognize this in SHOW CREATE and print back VARCHAR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been discussing these approaches with @vuvova - making it substituted in show create means saving this information in frm somehow, which seems an overkill for such a slight improvement.

That is, we first agreed that this should be an auto-calculated varchar with a max possible size. But then, turns out we have to mind the field encoding to calculate the max possible field size, and moreover it won't be possible to add more than one such column.

So Serg said, ok let it be TEXT under the hood.

I haven't formed any strong opinion against being it text. It confuses me with imperfection, but other than that I'm fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not advocate one way or another. Just making some usability remarks.
But, no matter what we settle on, we at least need a very good documentation (preferably in the jira itself) on what gets stored where in what case.
I.e. what length goes into the .frm for "a VARCHAR"?
Note that this seems to be allowed syntax by the parser even today. I can e.g. write:

MariaDB [test]> create table t1 (a char);
Query OK, 0 rows affected (0.035 sec)

MariaDB [test]> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `a` char(1) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_uca1400_ai_ci

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read in https://mariadb.com/docs/server/reference/sql-statements/data-definition/create/silent-column-changes

If strict SQL mode is not enabled (by default, it is), a VARCHAR column longer than 65535 become TEXT, and a VARBINARY columns longer than 65535 becomes a BLOB.

That's why I though as it's already converts VARCHAR→TEXT, let's use that. At least it'll be consistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Serg,

So you're thinking that:

CREATE TABLE t1 (a VARCHAR)

should be equivalent to

CREATE TABLE t1 (a TEXT)

?

As in, you treat a missing length as passing down a maximum length of some sort, right?

Note that right now, without any change:

CREATE TABLE t1 (a CHAR)

is equivalent to

CREATE TABLE t1 (a CHAR(1))

So the above change would not be consistent with that. And it won't be consistent with postgres too: according to the jira text, pgsql treats VARCHAR the same way as CHAR when length is missing.

This is why it seemed more logical to me to not go to TEXT.

But, again, I'm just trying to think out-loud here.

Most importantly: whatever you decide on, let's please make sure is well documented in the MDEV and into the commit itself.

Copy link
Member

@vuvova vuvova Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange. I was sure that behavior was somewhere in the standard and I took it from there. But now I'm seeing

  1. If <length> is omitted, then a <length> of 1 (one) is implicit.

This is from Syntax Rules for 6.1 <data type>, Part 2: Foundation, SQL16.

Does not seem to be "implementation defined", just 1 (one). Makes everything simper, doesn't it?

Copy link
Author

@OsamaNabih OsamaNabih Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall reading that this is implementation dependent. It can be as low as 1, but I believe Postgres goes the maximum size way.

According to the docs here

https://www.postgresql.org/docs/current/datatype-character.html

If character varying (or varchar) is used without length specifier, the type accepts strings of any length

For char, they default to char(1).

If character (or char) lacks a specifier, it is equivalent to character(1).

Not saying we should do it the same way.

Allow VARCHAR, NVARCHAR, VARBINARY without length
Default to TEXT type

Signed-off-by: Osama Nabih <nabih_the_4th@yahoo.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

5 participants