Mini SQL 2.0

Beta 1

Language Specification

Introduction

The mSQL language offers a significant subset of the features provided by ANSI SQL. It allows a program or user to store, manipulate and retrieve data in table structures. It does not support some relational capabilities such as views and nested queries. Although it does not support all the relational operations defined in the ANSI specification, it does provide the capability of "joins" between multiple tables.
The definitions and examples below depict mSQL key words in upper case, but no such restriction is placed on the actual queries.

The Create Clause

The create clause as supported by mSQL 2 can be used to create tables, indices, and sequences. It cannot be used to create other definitions such as views. The three valid constructs of the create clause are shown below:

CREATE TABLE table_name (
col_name col_type [ not null ]
[ , col_name col_type [ not null ] ]**
)

CREATE [ UNIQUE ] INDEX index_name ON table_name (
field_name
[ , field_name ] **
)

CREATE SEQUENCE ON table_name [ STEP step_val ] [ VALUE initial_val ]

An example of the creation of a table is show below:

CREATE TABLE emp_details (
first_name char(15) not null,

last_name char(15) not null,

comment text(50),
dept char(20),
emp_id int
)

The available types are:-

char (len) String of characters (or other 8 bit data)

text (len) Variable length string of chracters (or other 8 bit data) The defined length is used to indicate the expected average length of the data. Any data longer than the specified length will be split between the data table and external overflow buffers.
Note : text fields are slower to access than char fields and cannot be used in an index nor in LIKE tests.

int Signed integer values

real Decimal or Scientific Notation real values

The table structure shown in the example would benefit greatly from the creation of some indices. It is assumed that the emp_id field would be a unique value that is used to identify an employee. Such a field would normally be defined as the primary key. mSQL 2.0 has removed support for the primary key construct within the table creation syntax although the same result can be achieved with an index. Similarly, a common query may be to access an employee based on the combination of the first and last names. A compound index (i.e. constructed from more than 1 field) would improve performance. We could construct these indices using :

CREATE UNIQUE INDEX idx1 ON emp_details (emp_id)
CREATE INDEX idx2 ON emp_details (first_name, last_name)

These indices will be used automatically whenever a query is sent to the database engine that uses those fields in its WHERE clause. The user is not required to specify any special values in the query to ensure the indices are used to increase performance.

Sequences provide a mechanism via which a sequence value can be maintained by the mSQL server. This allows for atomic operations (such as getting the next sequence value) and removes the concerns associated with performing these operations in client applications. A sequence is associated with a table and a table may contain at most one sequence.
Once a sequence has been created it can be accessed by SELECTing the _seq system variable from the table in which the sequence is defined. For example

CREATE SEQUENCE ON TABLE test STEP 1 VALUE 5
SELECT _seq FROM test

The above CREATE operation would define a sequence on the table called test that had an initial value of 5 and would be incremented each time it is accessed (i.e. have a step of 1). The SELECT statement above would return the value 5. If the SELECT was issued again, a value of 6 would be returned. Each time the _seq field is selected from test the current value is returned to the caller and the sequence value itself is incremented.
Using the STEP and VALUE options a sequence can be created that starts at any specified number and is incremented or decremented by any specified value. The value of a sequence would decrease by 5 each time it was accessed if it was defined with a step of -5.

The Drop Clause

The Drop clause is used to remove a definition from the database. It is most commonly used to remove a table from a database but can also be used for removing several other constructs. In 2.0 it can be used to remove the definition of an index, a sequence, or a table. It should be noted that dropping a table or an index removes the data associated with that object as well as the definition.
The syntax of the drop clause as well as examples of its use are given below.

DROP TABLE table_name
DROP INDEX index_name FROM table_name
DROP SEQUENCE FROM table_name

for example

DROP TABLE emp_details
DROP INDEX idx1 FROM emp_details
DROP SEQUENCE FROM emp_details

The Insert Clause

Unlike ANSI SQL, you cannot nest a select within an insert (i.e. you cannot insert the data returned by a select). If you do not specify the field names they will be used in the order they were defined - you must specify a value for every field if you do this.

INSERT INTO table_name [ ( column [ , column ]** ) ]
VALUES (value [, value]** )

for example

INSERT INTO emp_details
(first_name, last_name, dept, salary)
VALUES (`David', `Hughes', `Development','12345')

INSERT INTO emp_details
VALUES (`David', `Hughes', `Development','12345')

The number of values supplied must match the number of columns.

The Select Clause

The SELECT offered by mSQL lacks some of the features provided by the standard SQL specification. Development of mSQL 2 is continuing and some of this missing functionality will be made available in the next beta release. At this point in time, mSQL's select does not provide

Nested selects
Implicit functions (e.g. count(), avg() )

It does however support:

Joins - including table aliases
DISTINCT row selection
ORDER BY clauses
Regular expression matching
Column to Column comparisons in WHERE clauses
Complex conditions

The formal definition of the syntax for mSQL's select clause is

SELECT [table.]column [ , [table.]column ]**
FROM table [ = alias] [ , table [ = alias] ]**
[ WHERE [table.] column OPERATOR VALUE
[ AND | OR [table.]column OPERATOR VALUE]** ]

[ ORDER BY [table.]column [DESC] [, [table.]column [DESC] ]

OPERATOR can be <, >, =, <=, >=, <>, LIKE, RLIKE or CLIKE
VALUE can be a literal value or a column name
Where clauses may contain '(' ')' to nest conditions e.g. "where (age < 20 or age > 30) and sex = 'male'" .
A simple select may be

SELECT first_name, last_name FROM emp_details
WHERE dept = `finance'

To sort the returned data in ascending order by last_name and descending order by first_name the query would look like this

SELECT first_name, last_name FROM emp_details
WHERE dept = `finance'
ORDER BY last_name, first_name DESC

And to remove any duplicate rows from the result of the select, the DISTINCT operator could be used:

SELECT DISTINCT first_name, last_name FROM emp_details
WHERE dept = `finance'
ORDER BY last_name, first_name DESC

mSQL provides three regular expression operators for use in where comparisons. The standard SQL syntax provides a very simplistic regular expression capability that does not provide the power nor the flexibility UNIX programmers or users will be accustomed to. mSQL supports the "standard" SQL regular expression syntax, via the LIKE operator, but also provide further functionality if it is required. The available regular expression operators are:

LIKE - the standard SQL regular expression operator.
CLIKE - a standard LIKE operator that ignores case.
RLIKE - a complete UNIX regular expression operator.

Note : CLIKE and RLIKE are not standard SQL and may not be available in other implementations of the language if you decide to port your application. They are however very convenient and powerful features of mSQL.

The regular expression syntax supported by the LIKE and CLIKE operators is that of standard SQL and is outlined below

`_' matches any single character

`%' matches 0 or more characters of any value
`\' escapes special characters (e.g. `\%' matches % and `\\' matches \ )

all other characters match themselves

As an example of the LIKE operator, it is possible to search for anyone in the finance department who's last name consists of any letter followed by `ughes', such as Hughes. The query to perform this operation could look like

SELECT first_name, last_name FROM emp_details
WHERE dept = `finance' and last_name like `_ughes'

The RLIKE operator provides access to the power of the UNIX standard regular expression syntax. The UNIX regular expression syntax provides far greater functionality than SQL's LIKE syntax. The UNIX regex syntax does not use the '_' or '%' characters in the way SQL's regex does (as outlined above). The syntax available in the RLIKE operator is

'.' matches any single character

'^' When used as the first charactr in a regex, the caret character forces the match to start at the first character of the string

'$' When used as the last charactr in a regex, the dollar sign forces the match to end at the last character of the string

'[ ]' By enclosing a group of single characters withing square brackets, the regex will match a single character from the group of characters. If the ']' character is one of the characters you wish to match you may specifiy it as the first character in the group without closing the group (e.g. '[]abc]' would match any single character that was either ']', 'a', 'b', or 'c'). Ranges of characters can be specified within the group using the 'first-last' syntax (e.g. '[a-z0-9]' would match any lower case letter or a digit). If the first charactr of the group is the '^' character the regex will match any single character that is not contained within the group.

'*' If any regex element is followed by a '*' it will match zero or more instances of the regular expression.

The power of a relational query language starts to become apparent when you join tables together during a select operation. Lets say you had two tables defined, one containing staff details and another listing the projects being worked on by each staff member, and each staff member has been assigned an employee number that is unique to that person. You could generate a sorted list of who was working on what project with a query like:

SELECT emp_details.first_name, emp_details.last_name, project_details.project
FROM emp_details, project_details
WHERE emp_details.emp_id = project_details.emp_id
ORDER BY emp_details.last_name, emp_details.first_name

mSQL places no restriction on the number of tables "joined" during a query so if there were 15 tables all containing information related to an employee ID in some manner, data from each of those tables could be extracted, by a single query. One key point to note regarding joins is that you must qualify all column names with a table name. mSQL does not support the concept of uniquely named columns spanning multiple tables so you are forced to qualify every column name as soon as you access more than one table in a single select.
mSQL also supports table aliases so that you can perform a join of a table onto itself. This may appear to be an unusual thing to do but it is a very powerful feature if there are rows within a single table relate to each other in some way. An example of such a table could be a list of people including the names of their parents. In such a table there would be multiple rows with a parent/child relationship. Using a table alias you could find out any grandparents contained in the table using something like

SELECT t1.parent, t2.child from parent_data=t1, parent_data=t2
where t1.child = t2.parent

The table aliases t1 and t2 both point to the same table (parent_data in this case) and are treated as two different tables that just happen to contain exactly the same data.

The Delete Clause

The SQL DELETE construct is used to remove one or more entries from a database table. The selection of rows to be removed from the table is based on the same where construct as used by the SELECT clause. The syntax for mSQL's delete clause is

DELETE FROM table_name
WHERE column OPERATOR value
[ AND | OR column OPERATOR value ]**

OPERATOR can be <, >, =, <=, >=, <>, LIKE, RLIKE, or CLIKE
for example
DELETE FROM emp_details WHERE emp_id = 12345

The Update Clause

The SQL update clause is used to modify data that is already in the database. The operation is carried out on one or more rows as specified by the where construct. The value of any number of fields on the rows matching the where construct can be updated. mSQL places a limitation on the operation of the update clause in that it cannot use a column name as an update value (i.e. you cannot set the value of one field to the current value of another field). Only literal values may by used as an update value. The syntax supported by mSQL is

UPDATE table_name SET column=value [ , column=value ]**
WHERE column OPERATOR value
[ AND | OR column OPERATOR value ]**

OPERATOR can be <, >, =, <=, >=, <>, LIKE, RLIKE or CLIKE
for example
UPDATE emp_details SET salary=30000 WHERE emp_id = 1234

char (len)	String of characters (or other 8 bit data)
text (len)	Variable length string of chracters (or other 8 bit data) The defined length is used to indicate the expected average length of the data. Any data longer than the specified length will be split between the data table and external overflow buffers. Note : text fields are slower to access than char fields and cannot be used in an index nor in LIKE tests.
int	Signed integer values
real	Decimal or Scientific Notation real values

`_'	matches any single character
`%'	matches 0 or more characters of any value
`\'	escapes special characters (e.g. `\%' matches % and `\\' matches \ )
	all other characters match themselves

'.'	matches any single character
'^'	When used as the first charactr in a regex, the caret character forces the match to start at the first character of the string
'$'	When used as the last charactr in a regex, the dollar sign forces the match to end at the last character of the string
'[ ]'	By enclosing a group of single characters withing square brackets, the regex will match a single character from the group of characters. If the ']' character is one of the characters you wish to match you may specifiy it as the first character in the group without closing the group (e.g. '[]abc]' would match any single character that was either ']', 'a', 'b', or 'c'). Ranges of characters can be specified within the group using the 'first-last' syntax (e.g. '[a-z0-9]' would match any lower case letter or a digit). If the first charactr of the group is the '^' character the regex will match any single character that is not contained within the group.
'*'	If any regex element is followed by a '' it will match zero or more* instances of the regular expression.