Chapter 6: Advanced Tutorial (*)
This chapter is similar to the tutorial in Chapter 3, but it is more direct and explains all the real complexities of the language.
Don't miss it!
STxT reminder
What is STxT?
An STxT document is made up of a set of hierarchical nodes, and the structure of each node is defined in its corresponding namespace. The hierarchy is expressed through indentation (tabs or spaces), which gives the document a visual form that humans and machines alike can recognize.
Example:
Booking (miempresa.example.demo/reserva.stxt):
    Reference: 093a2da1-q345-739r-ba5d-pqff98fe8j7d
    Date and time: 2001-11-29 13:20:00
    Passenger:
        Name: John
        First last name: Smith
        Second last name: Adams
    Route:
        Outbound:
            Departure: New York
            Arrival: Los Angeles
            Departure date: 2001-12-14
            Departure time: Late in the afternoon
            Seating preference: aisle
        Return:
            Departure: Los Angeles
            Arrival: New York
            Departure Date: 2001-12-20
            Departure time: mid-morning
        Accommodation:
            Preference: none
        Remarks:
            This passenger has contracted the special
            privacy services, so in
            case of availability it is recommended
            to allow their access to the VIP area.
An STxT document can only have one main node, so this makes the indentation of its direct child nodes unnecessary. In addition, it is mandatory to specify the namespace of this main node, as the other nodes will be perfectly defined from it.
Together, the namespaces define the whole grammar and structure of the document, which a document must comply with in order to be considered correct.
Grammars
The grammar of a document defines what the different nodes in the document are like, what namespace they belong to, and what their children (or subnodes) are like.
Namespaces are themselves defined in STxT documents, which must have the following structure:
Namespace Definition(www.semantictext.info/namespace.stxt):
    Node Definition:
        Name:
        Alias:
        Type:
        Description:
        Child:
            Node:
            Namespace:
            Num:
- Name specifies the node name (in that namespace). It is also known as the canonical name.
- Alias lists synonyms of the name, which can be used interchangeably in its place.
- Type is the node type, and can have the following values:
    - NODE: Container node for other nodes
    - TEXT: Node with text content
    - URL: Node with an absolute URL
    - NATURAL: Node with a natural number
    - INTEGER: Node with an integer
    - RATIONAL: Node with a rational number
    - NUMBER: Numeric node
    - BINARY: Binary node
    - HEXADECIMAL: Hexadecimal content node
    - BASE64: Base64 content node
    - BOOLEAN: Boolean content node
- Description holds a description of the node.
- Child: a node of the NODE type can have one or more children (subnodes), and each of them must be specified by a Child entry:
    - Node: Name of the child node.
    - Namespace: If the child belongs to a namespace other than the one being defined, it must be specified here.
    - Num: Specifies how many elements of this node may appear.
        - *: There can be an indeterminate number of these children
        - ?: There may be 1 or 0 of these children
        - +: There must be at least 1
        - number: There must be exactly that number of these children
Name and alias cannot be repeated throughout the namespace. The child nodes shall also be perfectly defined, without allowing ambiguity about the namespace to which they belong.
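As an illustration of the Num values listed above, here is a minimal sketch in Python of how a validator might check the number of children found against a Num value. The function name num_allows is hypothetical, and the sketch assumes that * also accepts zero children, which the definition does not state explicitly.

```python
def num_allows(num: str, count: int) -> bool:
    """Return True if `count` children satisfy the Num value `num`."""
    num = num.strip()
    if num == "*":
        return True               # assumption: any number, including zero
    if num == "?":
        return count in (0, 1)    # there may be 1 or 0
    if num == "+":
        return count >= 1         # there must be at least 1
    if num.isdecimal():
        return count == int(num)  # there must be exactly this many
    raise ValueError(f"unknown Num value: {num!r}")
```

For instance, num_allows("?", 2) is False, while num_allows("2", 2) is True.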
Compaction
An STxT document can be compacted to show the levels explicitly and be faster for machines to parse. Compaction also saves space, as the initial tabs and spaces are replaced by numbers.
In addition, a compacted document must use canonical names instead of aliases. If it does not use the canonical names, we speak of a semicompacted document.
In our previous example, the compacted document would have the following form:
Booking (miempresa.example.demo/reserva.stxt):
1:Reference: 093a2da1-q345-739r-ba5d-pqff98fe8j7d
1:Date and time: 2001-11-29 13:20:00
1:Passenger:
2:Name: John
2:First last name: Smith
2:Second last name: Adams
1:Route:
2:Outbound:
3:Departure: New York
3:Arrival: Los Angeles
3:Departure date: 2001-12-14
3:Departure time: Late in the afternoon
3:Seating preference: aisle
2:Return:
3:Departure: Los Angeles
3:Arrival: New York
3:Departure Date: 2001-12-20
3:Departure time: mid-morning
2:Accommodation:
3:Preference: none
2:Remarks:
3:This passenger has contracted the special
3:privacy services, so in
3:case of availability it is recommended
3:to allow their access to the VIP area.
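The idea of replacing indentation with level numbers can be sketched in a few lines of Python. This is only an illustration under several assumptions: the function name compact is hypothetical, indentation is made of tabs only, blank lines and comments are not handled, and names are copied as written (so if the input uses aliases, the result is only semicompacted).

```python
def compact(document: str) -> str:
    """Replace leading tabs with an explicit 'level:' prefix on each line."""
    out = []
    for line in document.splitlines():
        rest = line.lstrip("\t")
        if not rest.strip():
            continue                   # simplification: blank lines are dropped
        level = len(line) - len(rest)  # number of leading tabs
        # The main node (level 0) carries no prefix, as in the example above.
        out.append(rest if level == 0 else f"{level}:{rest}")
    return "\n".join(out)
```

A real implementation would also need the grammar, both to translate aliases into canonical names and to know which lines belong to text nodes.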
Language subtleties
Restrictions to node names
Node names can be any way we want... or almost! There are only a few small restrictions:
- They cannot contain any of the following 3 characters:
    - colon character: ':'
    - open parenthesis character: '('
    - close parenthesis character: ')'
- A name cannot be purely numeric, because it could be confused with the level prefix of a compacted line.
By the way:
Spaces are permitted in names.
Why not? Do we want to look like computer techs?
Times are changing :-D
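The restrictions above are simple enough to check mechanically. A minimal sketch in Python follows; the function name is_valid_node_name is hypothetical, and rejecting empty names is an assumption that the text does not state.

```python
import re

FORBIDDEN_CHARS = set(":()")


def is_valid_node_name(name: str) -> bool:
    """Check a node name against the restrictions described above."""
    name = name.strip()
    if not name:
        return False                    # assumption: empty names are rejected
    if any(ch in FORBIDDEN_CHARS for ch in name):
        return False                    # ':', '(' and ')' are not allowed
    if re.fullmatch(r"[0-9]+", name):
        return False                    # purely numeric names are not allowed
    return True                         # spaces are fine
```

So "First last name" and "123abc" pass, while "a:b" and "123" do not.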
The first line will be of the type:
name of the field (name_namespace) + ':'
All other lines will be of the type:
tabs or spaces + name of the field + ':'
[+ field content if it is basic]
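The two line shapes can be expressed as regular expressions. The following Python sketch is one possible reading of them, not a reference parser; the names ROOT_LINE, NODE_LINE, parse_root and parse_node are hypothetical, and whitespace around the name and the namespace is assumed to be insignificant.

```python
import re

# First line: name of the field (namespace) + ':'
ROOT_LINE = re.compile(r"^\s*([^:()]+?)\s*\(\s*([^()]+?)\s*\)\s*:\s*(.*)$")
# Other lines: tabs or spaces + name of the field + ':' [+ content]
NODE_LINE = re.compile(r"^([ \t]*)([^:()]+?)\s*:\s*(.*)$")


def parse_root(line: str) -> tuple[str, str]:
    """Return (name, namespace) for the first line of a document."""
    m = ROOT_LINE.match(line)
    if m is None:
        raise ValueError(f"not a main node line: {line!r}")
    return m.group(1), m.group(2)


def parse_node(line: str) -> tuple[str, str, str]:
    """Return (indentation, name, content) for any other node line."""
    m = NODE_LINE.match(line)
    if m is None:
        raise ValueError(f"not a node line: {line!r}")
    return m.group(1), m.group(2), m.group(3)
```

Note that the content is split off at the first colon, so a value such as "2001-11-29 13:20:00" keeps its own colons.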
Case insensitive
The identifiers are CASE-INSENSITIVE!
We are going to state a fact that it seems no one dares to say:
In all the languages that are case-sensitive, no one in the world (in their right mind) has ever made a document or program in which two identifiers are differentiated only by upper or lower case. Why? Because it is absurd. Let's admit it. Case sensitivity has only been a source of errors, and it has no practical effect on readability. What's more, sometimes it would be great to allow case insensitivity to increase clarity.
It is possible that in other languages case sensitivity is justified, but in STxT just the opposite seems true. We always think about semantics, and clearly names have the same meaning in uppercase as in lowercase.
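In practice, a program can treat identifiers as case-insensitive by normalizing every name before comparing it. A small Python sketch follows; the function names and the alias "Timestamp" are hypothetical, and the use of str.casefold() (rather than lower()) is an assumption, since the text only says that case is ignored.

```python
def build_index(definitions: dict[str, list[str]]) -> dict[str, str]:
    """Map every name and alias, case-folded, to its canonical name."""
    index = {}
    for canonical, aliases in definitions.items():
        for name in (canonical, *aliases):
            index[name.strip().casefold()] = canonical
    return index


def canonical_name(index: dict[str, str], written: str) -> str:
    """Look up a name exactly as written in a document, ignoring case."""
    return index[written.strip().casefold()]
```

With index = build_index({"Date and time": ["Timestamp"]}), both canonical_name(index, "DATE AND TIME") and canonical_name(index, "timestamp") resolve to "Date and time".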
UTF-8 Encoding
Character encoding has been a problem intrinsic to computers for many years. As far as I can recall it has always been there, and it has always been a source of trouble. I think that STxT should have a single encoding, to avoid having to state in the document which one is used. In addition, it should be easily understood in the world of the Internet, so I think there is no doubt.
The documents will be encoded in UTF-8
It is a good rule and I am very happy with it. Nowadays it is the most accepted, most universal, and most widely implemented encoding. I don't think this will change in the coming years.
Tabs or spaces
Here is a suggestion for writing documents in STxT.
It is not a rule (we'll explain why in more detail later on), but I strongly suggest that you follow it.
A document's nodes should be indented using tabs. Spaces are permitted but not recommended, and mixing the two is strongly discouraged.
This is only a recommendation, and I hope that the programs and text editors used to create documents follow it, but we cannot always count on this help. For this reason, we will explain how levels are counted when tabs and spaces appear.
The rule of thumb to remember is:
"4 Spaces" = "1 tab"
Thus, when levels are counted, 1 tab adds a level, and so do 4 consecutive spaces. But if fewer than 4 spaces have accumulated when another character appears, those spaces do not add a level; a tab that follows them still counts as a single level. This lets us perceive visually the correct number of levels (in most editors).
Let’s give examples of counting levels:
s: space
t: tab

t t t t XXXX: Element XXXX, Level 4
ssss ssss ssss ssss YYYY: Element YYYY, Level 4
ssst sst st t ZZZZ: Element ZZZZ, Level 4
We see that this is consistent with most text editors, as long as they are configured with the option "1 tab = 4 spaces".
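The counting rule can be written as a short loop. This Python sketch is one way to implement it (the function name count_level is hypothetical): every tab adds a level and discards any pending spaces, and every run of 4 spaces adds a level.

```python
def count_level(line: str) -> int:
    """Count the indentation level of a line: 1 tab = 4 spaces = 1 level."""
    level = 0
    spaces = 0                # spaces seen since the last completed level
    for ch in line:
        if ch == "\t":
            level += 1
            spaces = 0        # fewer than 4 pending spaces are absorbed by the tab
        elif ch == " ":
            spaces += 1
            if spaces == 4:
                level += 1
                spaces = 0
        else:
            break             # first character of the name: stop counting
    return level
```

The three example lines above all give level 4 with this function, and a line that starts with only 3 spaces gives level 0.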
Text (**)
Writing text in STxT seems easy... and it actually is! :-)
But (there's always a but) sometimes we want to understand exactly how it works and see the more special cases and how they are interpreted. If this is your case, read on. You are in the right place ;-)
These rules are above all focused on parsing text and computer interpretation. When we write STxT, we just have to take into account the indentation and the levels, and follow our intuition.
All about indentation
Text indentation has some subtleties that we are going to show you, and we will do so with examples.
If the line where the node begins is empty, the line break should not be considered.
This is reasonable, since it allows us to make beautiful texts, all aligned. For example, the following nodes are equivalent:
Text Node: Lorem ipsum dolor sit amet, sicing elit, sed do
    eiusmod tempor incididunt ut dolore magna aliqua.
    Ut enim ad minim veniam, quis nostrud exercitation
    ullamco laboris nisi ut aliquip ex ea commodo consequat.
    Duis aute irur in reprehenderit in voluptate velit
    esse cillum dolore eu fugiat nulla pariatur.
    Excepteur sint occaecat cupidatat non proident, sunt in
    culpa qui officia deserunt mollit anim id est laborum.

Text Node:
    Lorem ipsum dolor sit amet, sicing elit, sed do
    eiusmod tempor incididunt ut dolore magna aliqua.
    Ut enim ad minim veniam, quis nostrud exercitation
    ullamco laboris nisi ut aliquip ex ea commodo consequat.
    Duis aute irur in reprehenderit in voluptate velit
    esse cillum dolore eu fugiat nulla pariatur.
    Excepteur sint occaecat cupidatat non proident, sunt in
    culpa qui officia deserunt mollit anim id est laborum.
We see that the second node looks better, and has the same content as the first. If we actually wanted the 1st line to be blank, we should do the following:
Text Node:

    Lorem ipsum dolor sit amet, sicing elit, sed do
    eiusmod tempor incididunt ut dolore magna aliqua.
    Ut enim ad minim veniam, quis nostrud exercitation
    ullamco laboris nisi ut aliquip ex ea commodo consequat.
    Duis aute irur in reprehenderit in voluptate velit
    esse cillum dolore eu fugiat nulla pariatur.
    Excepteur sint occaecat cupidatat non proident, sunt in
    culpa qui officia deserunt mollit anim id est laborum.
Another important rule is that blank intermediate lines do not have to reach the text level, and will still count as blank lines.
This is also an aesthetic and intuitive function, and we have taken it into account. Thus, the following nodes are also equivalent:
t t Text Node:
t t t xxxxxxx
t t t
t t t zz zz

t t Text Node:
t t t xxxxxxx
t t
t t t zz zz

t t Text Node:
t t t xxxxxxx
t
t t t zz zz

t t Text Node:
t t t xxxxxxx

t t t zz zz
They all produce a line with xxxxxxx, followed by a blank line and another line with zz zz.
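Both rules (an empty node line adds no line break, and blank lines count whatever their indentation) can be sketched in Python. This is an illustration only: the function name text_node_content is hypothetical, indentation is assumed to be tabs only, and every non-blank body line is assumed to reach the text level.

```python
def text_node_content(lines: list[str], node_level: int) -> str:
    """Extract the text of a text node from its raw lines.

    lines[0] is the node line; the remaining lines are its body.
    """
    text_level = node_level + 1
    first = lines[0].split(":", 1)[1].strip()
    out = [first] if first else []     # an empty node line adds no line break
    for line in lines[1:]:
        if not line.strip():
            out.append("")             # blank line, whatever its indentation
        else:
            out.append(line[text_level:])  # drop exactly text_level tabs
    return "\n".join(out)
```

Applied to the four equivalent nodes above (with node_level=2), the function returns the same three-line text every time.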
All about comments
Comments are very useful, and we already know their operation:
Comments are lines that start with #, even if they have spaces or tabs in front of them.
But let's discuss something important. This rule is not entirely complete. It needs clarifying:
Comments are lines that start with #, even if they have spaces or tabs in front of them, provided that they are not inside a text node at an indentation that has reached the level of its text.
Better with an example:
    Text Node:
        aaaa
        bbbb
        cccc
# A comment is not included in the text (level 0)
    # A comment is not included in the text (level 1)
        # THIS LINE IS INCLUDED!!
        # Already inside the text (level 2!).
        # It is not a comment any more.
            # This is not a comment either :-D
        dddd
        eeee
We see that out of the 6 lines that look like comments, only the first 2 are, because their level is less than or equal to that of the text node.
Outside a text node, however, the indentation of a comment does not matter, although misaligned comments are inadvisable:
Normal Node:
        # Comment
    # It is also a comment, although it does NOT look good :-(
                # Better to avoid these comments that are so unaligned
    Another node:
    Another node:
Text in the text
Well, this is one of the things that I like most about STxT: it lets us include text written in other languages without having to parse anything! Just keep in mind the indentation rules and that's it! We can write whatever we want. Let's see how many languages can say the same thing :-D
Let's make examples, since there is nothing more to add ;-)
Node with XML:
    <tag1>
        <tag2>Content!!!!</tag2>
        <tag2>Other content!!!!</tag2>
    </tag1>

Node with Wiki Text:
    This is a list:
    * Elem 1
    * Elem 2
    ** Elem 2.1
    ** Elem 2.2

Node with Latex:
    \begin{equation}
    y_{i+1} = x_{i}^{2n} - \sqrt{5}x_{i-1}^{n} + \sqrt{x_{i-2}^7} -1
    \end{equation}
Namespaces, nodes, and more nodes (**)
I'd like to give some more thought to nodes, namespace inference, and documents whose nodes come from different namespaces. I suppose that at this point the topic is already clear, but in order to clarify concepts I will repeat myself a little. I hope not to confuse you; everything is actually easy and simple. But I'm not at ease if I don't do it ;-)
What is a namespace?
A namespace is a group of node definitions. The description of the namespace is always available on the Internet as an STxT document, through an access URL.
For example, the "www.gym.demo/client.stxt" namespace can define 4 node types:
- Customer
- Employee
- Account number
- Name
That's it. We have a relation that links the namespace www.gym.demo/client.stxt with its nodes: Customer, Employee, Account number, and Name.
And now comes the fun part. The namespace also says what children each node can have, and the children can be from any namespace; they do not have to be in the same one.
So, we could say that Customer has the following children:
- Name (of the same namespace, www.gym.demo/client.stxt)
- Training (of another namespace with the same URL base, www.gym.demo/gymdata.stxt)
- Confidential (of another namespace with another url, www.security.demo/credentials.stxt)
We could do the same thing with all the other nodes.
What is an STxT document?
An STxT document is a set of nested nodes. The first node is the one containing the rest, and there can only be one main node. This main node specifies its namespace, and this fact makes all the other nodes have their namespace specified automatically, through all subsequent definitions.
But also, as we have seen before, each node can belong to a different namespace.
Thus, in the example we were making before, we can build the Customer document as:
Customer (www.gym.demo/client.stxt):
    # Of the same namespace
    Name:
    # Of another namespace, without needing to specify,
    # the grammar is inferred
    Training:
    # Of another namespace, also without specifying
    Confidential:
The important thing is that the document is simple, but thanks to the grammars and definitions we know exactly what each element is, in a very simple way. And we just had to say which is the main element.
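To make the inference concrete, here is a minimal Python sketch. The table GRAMMAR is a hypothetical in-memory stand-in for the namespace definitions that would normally be downloaded from their URLs; it only covers the Customer node of the example, and the function name infer_namespace is also an assumption.

```python
# (namespace, node name) -> {child name: namespace of the child}
GRAMMAR = {
    ("www.gym.demo/client.stxt", "customer"): {
        "name": "www.gym.demo/client.stxt",
        "training": "www.gym.demo/gymdata.stxt",
        "confidential": "www.security.demo/credentials.stxt",
    },
}


def infer_namespace(parent_ns: str, parent_name: str, child_name: str) -> str:
    """Return the namespace of a child from its parent's definition."""
    children = GRAMMAR[(parent_ns, parent_name.casefold())]
    return children[child_name.casefold()]  # names are case-insensitive
```

Only the main node states its namespace in the document; for every other node, a lookup like this one supplies it.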
Restrictions on the child nodes
Due to the automatic namespace inference, the names (and aliases) of a node's children cannot collide with each other. If they did collide, the namespace they belong to could not be properly inferred.
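A grammar checker could detect such collisions before any document is parsed. The following Python sketch shows one way to do it; the function name find_collisions and the sample namespaces in the usage note are hypothetical, and names are compared case-insensitively, in line with the rule that identifiers are case-insensitive.

```python
def find_collisions(children: list[tuple[str, str, list[str]]]) -> set[str]:
    """Find names or aliases shared by two different children of a node.

    Each child is given as (namespace, canonical name, aliases).
    """
    seen = {}              # case-folded label -> (namespace, canonical name)
    collisions = set()
    for namespace, name, aliases in children:
        owner = (namespace, name)
        for label in (name, *aliases):
            key = label.strip().casefold()
            if key in seen and seen[key] != owner:
                collisions.add(key)
            seen.setdefault(key, owner)
    return collisions
```

For example, a child Name from a.demo/x.stxt and a child Title with the alias name from b.demo/y.stxt collide on "name", so a parser could not tell which namespace a line "Name:" refers to.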