MDEV-31414: Implement optional lengths for string types#4551
MDEV-31414: Implement optional lengths for string types#4551OsamaNabih wants to merge 1 commit intoMariaDB:mainfrom
Conversation
|
Hi,
That is a nice guess! Our documentation claims it's 21,844 characters, though I think it's a bit outdated, since we support 4-byte encodings for some time. So yes, it's got to be UINT16_MAX/4 - header_length, where header_length is something like 3. |
FooBarrior
left a comment
There was a problem hiding this comment.
We follow the TDD paradigm - every feature should be covered with tests (and every bugfix, if possible). Include both .test and .result files in the commit.
I'm fine with a hard constant of 16383 in scope of this small work, we can always extend it and make a function of charset any time we want, since 16383 is expected to be the minimum of the maximums.
|
You can try to make tha max value variable if you want. Though it may require some research. The problem is that we have (at least) three layers of charset settings:
One overrides another. At some point, Filed would get a deduced value of charset, and perhaps Create_field would get it, too. Though it's hard to say, when exactly. You'd have to test all three cases though:
Again, I'm NOT putting this as a requirement for this submission! |
That is very informative. To verify my understanding, we are okay with limiting the scope of this PR to the |
17e0e30 to
c6e985c
Compare
fe334ea to
7adad2c
Compare
b0e32ff to
03b43df
Compare
gkodinov
left a comment
There was a problem hiding this comment.
Thanks for working on this. This is a preliminary review.
Please fix some sanity checks:
- Make sure this is a single commit and it has a proper commit message, according to CODING_STANDARDS.md.
- Make sure there's no failing tests as a result of your change. I can see a couple in the buildbot results.
| SHOW CREATE TABLE t1; | ||
| Table Create Table | ||
| t1 CREATE TABLE `t1` ( | ||
| `a` text DEFAULT NULL |
There was a problem hiding this comment.
my own 2 cents here: it would look odd to people saying "a VARCHAR" and getting "a TEXT" back as a result of SHOW CREATE.
I'd recognize this in SHOW CREATE and print back VARCHAR.
There was a problem hiding this comment.
We've been discussing these approaches with @vuvova - making it substituted in show create means saving this information in frm somehow, which seems an overkill for such a slight improvement.
That is, we first agreed that this should be an auto-calculated varchar with a max possible size. But then, turns out we have to mind the field encoding to calculate the max possible field size, and moreover it won't be possible to add more than one such column.
So Serg said, ok let it be TEXT under the hood.
I haven't formed any strong opinion against being it text. It confuses me with imperfection, but other than that I'm fine.
There was a problem hiding this comment.
I do not advocate one way or another. Just making some usability remarks.
But, no matter what we settle on, we at least need a very good documentation (preferably in the jira itself) on what gets stored where in what case.
I.e. what length goes into the .frm for "a VARCHAR"?
Note that this seems to be allowed syntax by the parser even today. I can e.g. write:
MariaDB [test]> create table t1 (a char);
Query OK, 0 rows affected (0.035 sec)
MariaDB [test]> show create table t1\G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`a` char(1) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_uca1400_ai_ci
There was a problem hiding this comment.
If strict SQL mode is not enabled (by default, it is), a VARCHAR column longer than 65535 become TEXT, and a VARBINARY columns longer than 65535 becomes a BLOB.
That's why I though as it's already converts VARCHAR→TEXT, let's use that. At least it'll be consistent.
There was a problem hiding this comment.
Serg,
So you're thinking that:
CREATE TABLE t1 (a VARCHAR)
should be equivalent to
CREATE TABLE t1 (a TEXT)
?
As in, you treat a missing length as passing down a maximum length of some sort, right?
Note that right now, without any change:
CREATE TABLE t1 (a CHAR)
is equivalent to
CREATE TABLE t1 (a CHAR(1))
So the above change would not be consistent with that. And it won't be consistent with postgres too: according to the jira text, pgsql treats VARCHAR the same way as CHAR when length is missing.
This is why it seemed more logical to me to not go to TEXT.
But, again, I'm just trying to think out-loud here.
Most importantly: whatever you decide on, let's please make sure is well documented in the MDEV and into the commit itself.
There was a problem hiding this comment.
Strange. I was sure that behavior was somewhere in the standard and I took it from there. But now I'm seeing
- If <length> is omitted, then a <length> of 1 (one) is implicit.
This is from Syntax Rules for 6.1 <data type>, Part 2: Foundation, SQL16.
Does not seem to be "implementation defined", just 1 (one). Makes everything simper, doesn't it?
There was a problem hiding this comment.
I recall reading that this is implementation dependent. It can be as low as 1, but I believe Postgres goes the maximum size way.
According to the docs here
https://www.postgresql.org/docs/current/datatype-character.html
If character varying (or varchar) is used without length specifier, the type accepts strings of any length
For char, they default to char(1).
If character (or char) lacks a specifier, it is equivalent to character(1).
Not saying we should do it the same way.
Allow VARCHAR, NVARCHAR, VARBINARY without length Default to TEXT type Signed-off-by: Osama Nabih <nabih_the_4th@yahoo.com>
Jira: https://jira.mariadb.org/browse/MDEV-31414
Adding support for SQL standard 2023's T081:
Optional string types maximum lengthThis PR allows the user to not specify a length for a
VARCHARfield. We default to making the field aTEXTtype.