A Better Query Language: BQL Language Specification
For some reason or another, it seems to me that SQL has always taken a back seat in the language economy. I have used SQL for years, and very little has changed.
After working on a couple of relatively large projects which use SQL quite a bit, I have slipped into a bit of a love-hate relationship with SQL.
SQL has lot's of beautiful features. SQL is already a fairly high-order language; a tremendous amount of implementation detail is left invisible to the programmer. In terms of spewing data out of a data store, it can be impressively efficient.
But somehow it just still isn't quite there. Right?
Most developers I know seem to avoid SQL like the plague... and that's too bad.
One day I started thinking about some of the specific hardships I've had with maintaining a large SQL codebase, and something dawned on me...
Why hasn't anyone created more convenient languages that transpile into SQL?
There is CoffeeScript, Dart, Typescript, etc for JavaScript.
There is LESS and SCSS for CSS.
Why not something for SQL?
So I played the devil's advocate and said: "What if SQL was a compile target?"
In doing so, I found myself wanting a language that:
- Strict superset of SQL. This is important because it promotes a clean transition. One can move migrate their SQL codebase incrementally since all valid SQL is also valid in this language.
- Promotes good and efficient SQL practices
- Promotes keeping SQL code DRY
- Transpiles into easily readable SQL
For now I will call "BQL" for "A Better Query Language". I have taken a stab at outlining below a rough specification for what I think would be some positive features to add, while adhering to the constraints above.
To summarize what I see as "benificial" to a language such as this, I would put as the following:
Key Language Benefits
- notion of "projections" to keep SQL efficient, and keep code DRY
- inner joins are inferred with simplified syntax
- reordered select/from clauses to make autocomplete context more helpful
- queries can be stored syntactically and referenced in other statements to help readability, logical separation, and DRYness
- strict superset, so all SQL is already valid
- syntactic niceties with variable declaration / boolean logic / functions / etc.
Language Specification
The select clause can be anywhere in a statement
In TSQL, the select clause of a select statement is done first, which makes it logically difficult to provide meaningful autocomplete suggestions for field names currently available in the context, since that is defined afterwards in the from and join clauses.
In BQL, the select is valid at both the beginning and the end of the query.
For example, the following BQL statements are equivalent:
from student
where status = 'enrolled'
select id, name, grade
select id, name, grade
from student
where status = 'enrolled'both of which will compile into the following SQL:
SELECT
id,
name,
grade
FROM student
WHERE status = 'enrolled'With the presence of foreign keys, join conditions are implied
Provided there is a single foreign key relationship defined unambiguously between two tables, the foreign key relationship is chosen implicitly as the default join condition.
For instance, let's say we have the following simplified tables:
DECLARE TABLE post (
id INT, -- primary key
/* ... */
)
DECLARE TABLE comment (
id INT, -- primary key
postId INT, -- foreign key to post.id
/* ... */
)The following BQL statement:
from post
inner join comment
select *compiles to
SELECT *
FROM post p
INNER JOIN comment c
ON c.postId = p.idWhen multiple tables are joined in a single statement, the table with which you want to join may become ambiguous, and an optional with clause can be added to the join:
For instance, we may have the following tables:
DECLARE TABLE user (
id INT, -- primary key
/* ... */
)
DECLARE TABLE post (
id INT, -- primary key
authorId INT, -- foreign key
/* ... */
)
DECLARE TABLE comment (
id INT, -- primary key
postId INT, -- foreign key to post.id
authorId INT, -- foreign key to user.id
/* ... */
)then the following BQL statement
from post
inner join user u1 with post
inner join comment
inner join user u2 with comment
select *compiles to:
SELECT *
FROM post p
INNER JOIN user u1
ON post.authorId = user.id
INNER JOIN comment c
ON comment.postId = post.id
INNER JOIN user u2
ON comment.authorId = user.idFurthermore, in some cases there are multiple foreign keys from one table to a single table. In this case, the join condition is ambiguous until you clarify which foreign key relationship you would like to use. As a result, one can use the on clause, however, unlike it's SQL counterpart, only the column in the "foreign" table is needed.
To demonstrate, provided we have the following tables:
DECLARE TABLE user (
id INT, -- primary key
/* ... */
)
DECLARE TABLE message (
id INT, -- primary key
senderId INT, -- foreign key with user.id
recipientid INT, -- foreign key with user.id
/* ... */
)Thus, the following BQL query:
from message m
inner join user s on senderId
inner join user r on recipientId
select *compiles into
SELECT *
FROM message m
INNER JOIN user s ON m.senderId = s.id
INNER JOIN user r ON m.recipientId = r.idShort-cutting Join Conditions in where clauses...
Joins are commonly used simply to perform a lookup and return the corresponding rows of one table. Provided there is a unique foreign key relationship between two tables, the lookup can be performed by a simple condition in the where clause like so:
from user
where user:country.name = 'Canada'
select idwhich compiles into
SELECT u.id
INNER JOIN country c
ON c.id = u.country_id
WHERE c.name = 'Canada'Here the {fk table name}:{pk table name} syntax is used to identify the join while making the condition more easily legible.
Table Projections
Table projections are a construct in BQL which is not present in standard SQL. A table projection is simply a named collection of field names which can be used (and thus re-used) in the select clause of any query.
For instance, one can define a projection as follows:
projection [contact_info] {
firstName,
lastName,
phoneNumber,
email
}and thus, the BQL query
from dbo.users
select <contact_info>compiles into the following SQL:
SELECT
firstName,
lastName,
phoneNumber,
email
FROM dbo.usersFurther, I can mix selecting Projections and field names if needed:
from dbo.users
select id, <contact_info>compiles into:
SELECT
id,
firstname,
lastName,
phoneNumber,
email
FROM dbo.usersThe projection can be prefixed by the table name or table alias if you are projecting a specific table in a join...
for instance, provided we have the following projections:
projection abc { a, b, c }
projection def { d, e, f }this BQL statement
from tblX x
inner join tblY y
on x.id = y.xid
select x.<abc>, y.<def>compiles into
SELECT
x.a,
x.b,
x.c,
y.d,
y.e,
y.f
FROM tblX X
INNER JOIN tblY y
ON x.id = y.idThe special <$all>, <$keys> Projection
Every table by default has an "include all" projection. This does not need to be defined anywhere, and is pragmatically equivalent to the "*" operator in normal SQL, except for the fact that the compiled SQL will have all of the table's fields enumerated explicitly in the compilation step.
Given we have the following table definitions:
declare table abc (
id int, -- (primary key)
f1 int,
f2 varchar(20),
f3 datetime
)
declare table def (
id int PRIMARY KEY,
abcId int, -- (foreign key to tbl1.id)
f4 varchar(20),
f5 datetime
)The following projections are implicitly created for these tables:
-- for table "abc"
projection $all { id, f1, f2, f3 }
projection $keys { id }
-- for table "def"
projection $all { id, abcId, f4, f5 }
projection $keys { id, abcId }and thus, the following BQL statements:
from abc select <$all>
from abc
inner join def
on abc.id = def.abcId
select abc.<$keys>, def.<$keys>compile into
SELECT
id,
f1,
f2,
f3
FROM abc
SELECT
abc.id,
def.id,
def.abcId
FROM abc
INNER JOIN def
ON abc.id = def.abcIdSimilar to select lists, you can also specify computed columns as well as field name aliases
projection foo {
id as userId,
'{{first}} {{last}}' as displayName
}where the projection takes:
from users
select <foo>and compiles it into
SELECT
id as userId,
first + ' ' + last as displayName
FROM usersProjection definitions can also reference other projections:
projection fullName {
'{{first}} {{last}}' as fullName
}
projection user {
id,
email,
<fullName>
}and thus:
from users
select <user>compiles into
SELECT
id,
email,
first + ' ' + last as fullName
FROM usersNotes about projections:
- they can be thought of as ad-hoc interfaces. They can be applied to any table, but will be a SQL compile error if the proper field names are not present on the table
- one can select multiple projections, comma delimited. in this case, common field names between the projections will be repeated... (or should they just be indicated once???)
- if projecting a table alias
a, it would beselect a.<contact_info> - if a delimited name is needed, it would be
select <[projection name]> - the above example uses another feature of BQL discussed below, string interpolation.
Boolean values first class citizens
BQL has true and false as new keywords which translate into a SQL bit data type.
For instance, the following BQL:
select trueCompiles into the following SQL:
DECLARE @true BIT = 1;
SELECT @trueIf any statement contains usage of the boolean values true or false, it will be transpiled into a local variable reference which is initialized at 1 or 0 accordingly. This prevents some common ambiguities found between the BIT value of 1 and the INT value of 1.
Further, many expressions are automatically converted into boolean expressions if evaluated in an if structure or a where clause.
Note: more needs to be discussed on first-class suport of booleans in BQL.
Setting variables
no need for SET... ie, the line @a = 1 => SET @a = 1
Control-flow Logic
BQL introduces the usage of curly brackets as natural scope boundaries, similar to (but not equivalent to) SQL's BEGIN ... END syntax.
Brackets as scope boundaries
instead of BEGIN ... END we would have brackets. ie,
if ( /* boolean expression */ ) {
/* expression */
} else {
/* expression
}Variable declaration simplified...
-- implicit type
var @a = 'abc',
@b = 123
-- explicit type
varchar(20) @a;
int @b;
datetime @c;Instead of:
DECLARE @a VARCHAR(MAX) = 'abc'
DECLARE @b INT = 123
DECLARE @a VARCHAR(20)
DECLARE @b INT
DECLARE @c DATETIMEBetter implicit type coercion...
print 'abc' + 123 // prints 'abc123'compiles to
PRINT 'abc' + CAST(123 AS VARCHAR)Note: this should be thought about a little more. I like this, but it could actually cause existing (valid) SQL to break where the original intention different.
C-style for loops
for(int @i = 0; @i < @length; @i++) {
/* expressions */
}compiles into
DECLARE @i INT = 0
WHILE @i < @length BEGIN
/* expressions */
SET @i = @i + 1
ENDC# style foreach
foreach(var @row in users) {
if( @row.id % 2 ) {
print '{{row.first}} {{row.last}}'
}
/* expressions */
}compiles into
DECLARE @id INT,
@first VARCHAR(50),
@last VARCHAR(50)
DECLARE cursor_1 CURSOR FOR
SELECT
id,
first,
last
FROM users
OPEN cursor_1;
FETCH NEXT FROM cursor_1
INTO @id, @first, @last;
WHILE @@FETCH_STATUS = 0
BEGIN
IF NOT( @id % 2 = 0 ) BEGIN
PRINT @first + ' ' + @last
END
/* expressions */
FETCH NEXT FROM cursor_1
INTO @id, @first, @last;
END
CLOSE cursor_1;
DEALLOCATE cursor_1;Note: might want to talk about how this promotes precedural flow in SQL which isn't necessarily a good thing...
String interpolation
var @name = 'john doe'
-- prints 'my name is john doe'
print 'my name is: {{name}}'Note: we could adopt something different than this sort of "mustache-style" here. We might want to think more about this. This is also potentially the most obvious breach of being a "strict superset" of SQL.
Local scoping rules
var @i = 1;
if (true) { // new variable scope...
var @i = 2;
print @i // prints 2
}
print @i // prints 1this would compile into:
DECLARE @true BIT = 1,
@i INT = 1,
@i_two INT;
IF (@true = @true) BEGIN
SET @i_two = 1;
PRINT @i_two;
END
PRINT @iyou can also declare local scope by just writing code inside brackets:
{
-- in this scope, @i is an int
var @i = 123;
}
{
-- in this scope, @i is of type varchar
var @i = 'abc'
}Ternary operator
select @a > @b ? '123' : '456'compiles into:
SELECT CASE WHEN @a > @b THEN '123' ELSE '456' ENDNull coalescing operator
from students
select name ?? 'N/A'compiles into:
SELECT ISNULL(name, 'N/A')
FROM studentsBoolean operators
if ( (@a || @b) && !@c) {
/* expressions */
}compiles into
DECLARE @true BIT = 1,
@false BIT = 0;
IF ( (@a = @true OR @b = @true) AND @c = @false ) BEGIN
/* expressions */
ENDMethod notation on various datatypes
various datatypes would have various "methods" defined for them which simply compile into calls for the global method...
for instance, everything would have a .toString() method...
var @i = 1234;
var @s = ' AbcdEf ';
datetime @d = GETUTCDATE();Methods for char, varchar, nchar, and nvarchar
- trim()
- ltrim()
- rtrim()
- toLowerCase()
- toUpperCase()
- toTitleCase() - not sure this is possible/practical
- padLeft()
- padRight()
- pad()
- contains(string s)
- endsWith(string s)
- startsWith(string s)
- substring
- split(string seperator) - this would return a table... could be interesting
- toCharSet() - not sure if this is possible/practical, but could be a cool feature as well.
- indexOf()
calling methods on variables would work pretty much how you would expect:
@s.trim()compiles into
LTRIM(RTRIM(@s))and so on...
Methods for datetime and date
- addDays(numerical d)
- addHours(numerical d)
- addMinutes(numerical d)
- addMilliseconds(numerical d)
- addMonths(numerical d)
- addSeconds(numerical d)
- addTicks(numerical d)
- addYears(numerical d)
- toOADate()
- toString(string format)
- subtract(datetime d) - need to think about this more... how is a timespan represented in SQL?
- add(datetime d)
- dayOfMonth()
- dayOfYear()
- dayOfWeek()
- hour()
- minute()
- millisecond()
- ticks()
- second()
- year()
- date()
note: i'd like to think more about functionality that can be had around UTC time / local time conversion and such. Could be a big win for this language if implemented in a way that prevented people from doing bad practices...
Methods on int, bigint, tinyint, decimal, float
- toString(string format)
Datetime declaration
a convention for declaring date times will exist, but i'm not sure on the syntax.
could be: var @dt = #1/12/2014# (this is kind of similar to an MS Access convention).
could also be var @dt = &'1/12/2014'
Regular expressions
var @a = /(s|o)me?reg[ular]ex[pression]/.replace('abcd','');
var @islower = !/[A-Z]/.match(@test);Note: I'd like to do more thinking here on how we could bring more native support to regular expressions to SQL. This will largely depend on the target (ie, T-SQL, PL/SQL, etc.)
Explicit type casting
select (varchar)123compiles into
select CAST(123 as VARCHAR)+= and -= operators
var @i = 1,
@s = 'abc';
@i += 1
@s += 'def'compiles to
DECLARE @i INT = 1,
@s varchar = 'abc';
SET @i = @i + 1;
SET @s = @s + 'def';Paging syntax
The offset and limit keywords will be used for paging, and will
satisfy all SQL targets appropriately.
For instance:
from tbl
order by a
select a, b, c
offset 100
limit 20for SQL Server 2008 would compile into
;WITH tbl_paged AS
(
SELECT
a,
b,
c,
ROW_NUMBER() OVER (ORDER BY a ASC) as __row_number
FROM tbl
)
SELECT
a,
b,
c
FROM tbl_paged
WHERE __row_number BETWEEN 100 AND 100 + 20Versus MySQL where it would simply compile into:
SELECT
a,
b,
c
FROM tbl
OFFSET 100
LIMIT 20Of course, we are also able to use variables or scalar expressions here
from tbl
order by a
select a, b, c
offset (@page - 1) * @size
limit @sizecompiles into
;WITH tbl_paged AS
(
SELECT
a,
b,
c,
ROW_NUMBER() OVER (ORDER BY a ASC) as __row_number
FROM tbl
)
SELECT
a,
b,
c
FROM tbl_paged
WHERE __row_number BETWEEN (@page - 1) * @size + 1 AND @page * @sizeNOTE: the compilation of these keywords would be different depending on the target... different SQL engines have different support for paging already.
Implicit "result sets" as local variables
Result set variables can be declared as you would any other query select statement
var @results = from tbl1 select a, b, c where a < 20;Result set variables have a couple of unique properties that temp tables and table variables do not have...
For example, you can...
Coerce a resultset variable into a boolean as a shortcut for EXISTS
var @results = from tbl1 select a, b, c where a < 20;
if (@results) {
/* expressions */
}this will compile into:
DECLARE @results TABLE (
a INT,
b VARCHAR(20),
c VARCHAR(255)
)
INSERT INTO @results ( a, b, c )
SELECT
a,
b,
c
FROM tbl1
WHERE a < 20
IF EXISTS(SELECT TOP 1 * FROM @results)
BEGIN
/* expressions */
ENDGet a resultset variable's number of rows
calling a .length method on a resultset variable is shorthand for getting the total count
var @results = from tbl1 select a, b, c where a < 20;
var @count = @results.lengthcompiles into
DECLARE @results TABLE (
a INT,
b VARCHAR(20),
c VARCHAR(255)
)
DECLARE @count INT
INSERT INTO @results ( a, b, c )
SELECT
a,
b,
c
FROM tbl1
WHERE a < 20
SELECT @count = COUNT(*)
FROM @resultsin certain cases where a resultset variable can be optimized into a CTE, it will do so. This is the case when the query is referenced in only one statement.
var @results = from tbl1 select a, b, c where a < 20;
-- here we only use results in *one* statement
from @results
where a > 10
select a, b, ccompiles into
;WITH results AS
(
SELECT
a,
b,
c
FROM tbl1
WHERE a < 20
)
SELECT
a,
b,
c
FROM results
WHERE a > 10note: some of the above examples would also allow and utilize this optimization, but the compiled output shown is that of the more general case (table variables) simply for clarity of what would normally happen.
In other cases, the resultset variable will be created as a table variable
var @results = from tbl1 select a, b, c where a < 20;
var @count = @results.lengthResultsets as partial queries
Resultset variables can be useful in taking complex queries and compartmentalizing them into digestable parts without sacrificing performance by storing the results into temp tables, or going through the unnecessary overhead of creating a view to be used just once:
-- notice how no "select" clause is needed here...
var @unpaid_cust = from customers where status = 'unpaid'
from invoices i
inner join @unpaid_cust
where i.date_sent != null
select invoices.<$all>compiles into
SELECT
i.id,
i.field1,
/* ... */
FROM invoices i
INNER JOIN customers c
ON i.customer_id = c.id
WHERE
c.status = 'unpaid' AND
i.date_sent IS NOT NULLAlthough this is a smple example, it should be clear how you can take a very complex query with several joins and partialize it into logically separate pieces, letting BQL compile it into one final efficient query.
Mixins: Inline Functions
simple inline functions:
var @postUrl = mixin(@id, @slug) {
return 'http://my-blog.com/post/{{@id}}/{{slug}}'
}
from post
select id, @postUrl(id,slug) as urlcompiles into
SELECT
id,
'http://my-blog.com/post/' + CAST(id as varchar) + '/' + slug
FROM postsor...
var @isEmail = mixin(@email){
return @email.contains('@')
}
from users
where @isEmail(email)
select id, emailcompiles to
SELECT id, email
FROM users
WHERE CHARINDEX('@',email) > 0Schema scope declaration
you can use a schema scope declaration in order to default to certain databases and schema
schema abc {
from tbl1
select a, b, c
}compiles to:
SELECT
a,
b,
c
FROM abc.tbl1if no schema scope is used, the schema will instead be the database default schema. This is essentially like locally "overriding" the database default schema...
Some ideas I haven't thought through yet...
-
support for more complicated aggregate functions? For example a
JOINorCSVaggregate function based on the XML expand functionality. -
what about boolean logic w/ nulls? could we allow
if (field_name = null)in replacement ofIF (field_name IS NULL)? -
We could have some javascript-like type coersion? For example, could
!field_namebe converted tofield_name is null OR field_name = ''and similar tests?
Next steps...
Languages are complicated things, and I think in order to be successful, must be the product of many minds. I'd like for any and all feedback (or help) on producing this language spec, as well as a parser/compiler for it. If you stumbles upon this article because you have similar wishes for such a language to exist, please consider contributing.
I've opened a GitHub repo where I've started work on the parser and grammar. Right now I'm trying out Irony to build an AST, as it makes building into the Visual Studio toolchain very easy, but the Irony grammar is very easy to translate into BNF if we want to try other AST Parsers as well.
At the very least, share your opinions below.
Thank you.
Leland Richardson
Personal blog of Leland Richardson.
Member of Technical Staff at Anthropic. Previously Google, Airbnb.
I like learning, discussing, and diving into challenges.
