Information Technology Environment — Ember

Introduction

This section documents the Ember computing environment: a centralized append-only information store, a computer operating system integrated with that information store, and related specifications. This is a work-in-progress draft: everything here is subject to change and is not yet suitable for implementation.

Overview

The computing environment will consist of the following components:

Development principles

Prerequisites for code to be added to the repository

How issues should be prioritised

Ordered from highest priority to lowest priority

  1. Security vulnerabilities
  2. Functional regressions
  3. Incorrect results
  4. Crashes and similar critical usability issues
  5. Slow code with a significant impact on usability
  6. Aesthetic regressions
  7. Minor usability issues
  8. Slow code with a moderate impact on usability
  9. Missing features
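
As an illustration only, the ordering above could be encoded so that an issue list can be sorted mechanically. This is a minimal sketch: the names `IssuePriority` and `triage` are hypothetical and not part of this specification.

```python
from enum import IntEnum


class IssuePriority(IntEnum):
    """Hypothetical encoding of the priority list; lower value = more urgent."""
    SECURITY_VULNERABILITY = 1
    FUNCTIONAL_REGRESSION = 2
    INCORRECT_RESULT = 3
    CRASH_OR_CRITICAL_USABILITY = 4
    SLOW_SIGNIFICANT_IMPACT = 5
    AESTHETIC_REGRESSION = 6
    MINOR_USABILITY = 7
    SLOW_MODERATE_IMPACT = 8
    MISSING_FEATURE = 9


def triage(issues):
    """Return (title, priority) pairs ordered from highest to lowest priority.

    sorted() is stable, so issues of equal priority keep their input order.
    """
    return sorted(issues, key=lambda issue: issue[1])
```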

Data formats

Ember Language

Objective

Develop a machine-readable language that can be source-to-source translated into other languages. Possible target languages to investigate include NQP, C--, C, Qt, and JavaScript.

Language profiles

Ember Language programs may optionally declare a non-default language profile: Core, Basic, or Dangerous (the default profile is "Standard"). Core and Basic both restrict the program to a subset of the language. The Basic language interpreter is written using the Core subset of the language, and provides useful shortcuts for developing the interpreter for the Standard profile. The Standard language interpreter is written using the Basic subset of the language. The Dangerous profile allows language features that are probably a bad idea to use, but may be needed in some cases.

Dcs

The core unit of the Ember Language is the Dc (Document Character). The defined Dcs are listed in DcData.csv.

Reading DcData.csv

DcData.csv contains nine columns, each of which gives some information about a given Dc.

From left to right, the columns are: ID, Name, Combining class, Bidirectional class, Simple case mapping, Type, Script, Details, and Description.
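
A minimal sketch of reading the file with Python's standard `csv` module follows. It assumes that the file has no header row and uses ordinary CSV quoting (so a Details value containing commas is quoted); neither is confirmed by this draft. The names `DcRecord` and `read_dc_data` are hypothetical. The `deprecated` property reflects the "!" name prefix described for the Name column.

```python
import csv
import io
from dataclasses import dataclass

EXPECTED_COLUMNS = 9  # ID through Description, left to right


@dataclass
class DcRecord:
    """One row of DcData.csv (hypothetical in-memory representation)."""
    id: int
    name: str
    combining_class: str
    bidi_class: str
    simple_case_mapping: str
    dc_type: str
    script: str
    details: str
    description: str

    @property
    def deprecated(self):
        # A name prefixed with "!" marks the Dc as deprecated.
        return self.name.startswith("!")


def read_dc_data(text):
    """Parse CSV text into a dict keyed by Dc ID.

    Assumption: no header row; blank lines are skipped.
    """
    records = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        if len(row) != EXPECTED_COLUMNS:
            raise ValueError(f"expected {EXPECTED_COLUMNS} columns, got {len(row)}")
        record = DcRecord(int(row[0]), *row[1:])
        records[record.id] = record
    return records
```

Records are keyed by ID rather than by name because names may change between versions, while IDs are stable once released.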

The "ID" column specifies the number used to refer to a given Dc. Once an ID has been specified in a stable version, its meaning will not change in future versions.

The "Name" column specifies an informative name for the Dc. The names may change in future versions if the current names seem suboptimal. They should not be relied on as unique identifiers. If a name is prefixed with "!", then that Dc is deprecated.

"Combining class" column: See below.

"Bidirectional class" column: See below.

"Simple case mapping" column: This column contains the ID of the uppercase form of characters with the "Ll" type, and the ID of the lowercase form of characters with the "Lu" type.

"Type" column: See below.

The "Script" column indicates the script or other set to which the character belongs. Values needing further explanation include "Semantic", "DCE", "DCE sheets", "Noncharacters", "DCE versions", "Encapsulation", "EL Syntax", "EL Routines", and "EL Types".

The "Details" column contains various additional information about characters, as a comma-separated list. List entries beginning with ">" are cross-references to related Dcs. List entries beginning with "<" are decompositions. List entries beginning with "(" indicate the syntax (parameter type signatures) for Ember Language routines. List entries beginning with ":" indicate the required syntax for the given Dc, using a form similar to regular expressions: a bracketed list of Dcs ("[]") indicates a set of possible Dcs; "+" indicates one or more of the preceding item; a bracketed list of Dcs with "^" at the beginning indicates the inversion of the set; a Dc ID in brackets with a colon before the closing bracket indicates any syntactically correct sequence of Dcs beginning with the enclosed Dc ID; and "~" represents the Dc whose syntax is being defined. The remaining list entries are aliases (alternate names for the characters, for ease of look-up).
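
The prefix rules above can be sketched as a small classifier. This sketch assumes that individual list entries never contain commas themselves, which this draft does not guarantee (a routine signature with several parameters might); the function names and the sample entries are hypothetical.

```python
def classify_detail(entry):
    """Classify one Details list entry by its leading character."""
    entry = entry.strip()
    if entry.startswith(">"):
        return ("cross_reference", entry[1:])
    if entry.startswith("<"):
        return ("decomposition", entry[1:])
    if entry.startswith("("):
        # Keep the whole entry: the parenthesis is part of the signature.
        return ("routine_signature", entry)
    if entry.startswith(":"):
        return ("syntax", entry[1:])
    return ("alias", entry)


def parse_details(details):
    """Split a Details cell on commas and classify each non-empty entry.

    Assumption: entries do not contain embedded commas.
    """
    return [classify_detail(part) for part in details.split(",") if part.strip()]
```

A real parser would need the actual quoting or nesting rules for entries that contain commas, which are not specified here.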

The "Description" column contains additional comments regarding the Dc.

Three columns' contents are directly inherited from the Unicode Standard: Combining class (inherits Unicode's "Canonical_Combining_Class" property), Bidirectional class (inherits Unicode's "Bidi_Class" property), and Type (inherits Unicode's "General_Category" property). The "Simple case mapping" and "Script" columns should also be inherited from Unicode in some manner, but are not at present. For characters not included in Unicode, a reasonable value is chosen in the pattern of the values used by Unicode. If there are discrepancies between this value and Unicode's value for a given character that is in both sets, this should be reported as an error in the Ember Language standard. Unicode's values should take precedence.

"Type" column values also extend the Unicode Standard's possible values with the "!Cx" category, denoting characters that do not fit neatly into Unicode's existing categories.

Notes on specific Dcs

Dcs 241–245: Mode indicators

Inclusion of the mode indicators in documents is optional. The selected mode expresses information about the document's expected execution environment. These modes are shortcuts that set up the environment in advance so that the document does not need to contain specific code to set up these contexts. This makes the resulting documents more concise and readable.

Dcs 246–255: Source formatting control

Dcs 246 through 255 control the formatting of the EM format version of a document.

Document formats

There are six file formats defined by this specification. Four of them (EMD, EM, EMS, and DEMS) are general-use formats, while the fifth (EMR) is a special-purpose subset of the EMB format. EMS and DEMS are intended as intermediate, more-readable formats between EMD and EMB, and are not intended for information interchange (in general, they are much larger than the other formats for a given document).

To allow backward compatibility, once a completed version of this standard has been released, the meaning of any given Dc will not change. That will ensure that existing documents retain their meaning.

There is a one-to-one correspondence between EMD, EM, EMS, and DEMS files (for any given document in one of those formats, there is only one way to represent it in the other formats), but not for EMR files (because EMR files can only represent a subset of Ember Language documents). That means that documents can be losslessly round-trip-converted between those four formats.

EMD, EM, and EMS files are subsets of ASCII text files, with lines delimited by 0x0A (line feed). Bytes 0x00 through 0x09, 0x0B through 0x1F, and 0x7F through 0xFF (all ranges inclusive) are disallowed. Files must end with 0x0A. This may later be changed to use UTF-8.
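
The byte restrictions above amount to allowing only 0x0A and the range 0x20 through 0x7E. A minimal validation sketch, with the hypothetical name `check_ascii_subset`, could look like this:

```python
def check_ascii_subset(data):
    """Return a list of problems found in the raw bytes of a file.

    Allowed bytes are 0x0A (line feed) and 0x20 through 0x7E; everything
    else falls in one of the disallowed ranges. The file must end with 0x0A.
    """
    problems = []
    for offset, byte in enumerate(data):
        if byte != 0x0A and (byte < 0x20 or byte >= 0x7F):
            problems.append(f"disallowed byte 0x{byte:02X} at offset {offset}")
    if not data.endswith(b"\n"):
        problems.append("file does not end with 0x0A")
    return problems
```

An empty list means the bytes satisfy these rules. The sketch checks only the byte-level constraints, not whether the content is a well-formed document in any of the formats.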

At the end of each format's summary (except for EMR), a simple "Hello, World!" document is given in the format.

Ember Language documents (EMD), .emd

Ember Language documents are a list of Dcs. The Dcs mappable to the permitted ASCII characters are represented by those ASCII characters, with the exception of 0x40 "@" (Dc 1). All other Dcs are represented by "@b@" followed by the integer Dc ID followed by "@e@", such that, for instance, "@" would be represented as "@b@1@e@".
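
The escaping rule can be sketched as an encoder and decoder over a list of Dc IDs. The mapping table here is deliberately partial: it contains only the Dc-to-ASCII pairs that can be read off the "Hello, World!" examples in this section, whereas the complete mapping lives in DcData.csv. The function names are hypothetical, and line terminators are not handled.

```python
import re

# Partial table, taken only from the "Hello, World!" examples in this section.
# "@" (Dc 1) is intentionally absent so that it is always escaped.
DC_TO_ASCII = {
    18: " ", 19: "!", 30: ",", 57: "H", 72: "W",
    85: "d", 86: "e", 93: "l", 96: "o", 99: "r",
}
ASCII_TO_DC = {char: dc for dc, char in DC_TO_ASCII.items()}

ESCAPE = re.compile(r"@b@(\d+)@e@")


def encode_emd(dcs):
    """Encode Dc IDs as EMD text, escaping Dcs with no direct ASCII form."""
    parts = []
    for dc in dcs:
        char = DC_TO_ASCII.get(dc)
        parts.append(char if char is not None else f"@b@{dc}@e@")
    return "".join(parts)


def decode_emd(text):
    """Decode EMD text (without line terminators) back into Dc IDs."""
    dcs = []
    pos = 0
    while pos < len(text):
        match = ESCAPE.match(text, pos)
        if match:
            dcs.append(int(match.group(1)))
            pos = match.end()
        elif text[pos] in ASCII_TO_DC:
            dcs.append(ASCII_TO_DC[text[pos]])
            pos += 1
        else:
            raise ValueError(f"no known Dc for {text[pos]!r} at position {pos}")
    return dcs
```

Because a literal "@" never appears outside an escape, the decoder can treat every "@b@...@e@" run as a Dc ID without ambiguity.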

Hello, World!

Ember Language source files (EM), .em

Ember Language source files are a programming language–inspired representation of Ember Language documents. This format is the most readable of the formats, but also the most technically complex.

dc:
    Hello, World!

or more idiomatically (but not the exact equivalent of the others in terms of the Dcs used),

print 'Hello, World!'

which would be

256 258 260 262 # . . . .
264 263 57 86 # . . H e
93 93 96 30 # l l o ,
18 72 96 99 # . W o r
93 85 19 261 # l d ! .
259 # .

in Dcs.

Ember Language sequence files (EMS), .ems

A list of Dc numbers. Four Dcs are given per line, separated by spaces.
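
A minimal sketch of writing and reading this layout follows; the function names `format_ems` and `parse_ems` are hypothetical. Each line, including the last, is terminated with a line feed, in keeping with the requirement that files end with 0x0A.

```python
def format_ems(dcs, per_line=4):
    """Render Dc IDs as EMS text: four space-separated IDs per line."""
    lines = []
    for start in range(0, len(dcs), per_line):
        chunk = dcs[start:start + per_line]
        lines.append(" ".join(str(dc) for dc in chunk))
    return "".join(line + "\n" for line in lines)


def parse_ems(text):
    """Read EMS text back into a flat list of Dc IDs."""
    return [int(token) for line in text.splitlines() for token in line.split()]
```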

57 86 93 93
96 30 18 72
96 99 93 85
19

Documented Ember Language sequence files (DEMS), .dems

A variant of the EMS format for easier reading: after each line, the printable ASCII equivalent of each Dc is given following the bytes 0x20 0x23 0x20 (" # "), each separated from the next by a space. If there is no printable ASCII equivalent, or the character is a space, "." is used instead.
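
The annotation rule can be sketched as follows. As a simplification, the lookup table holds only the Dc-to-ASCII pairs visible in this section's "Hello, World!" examples (the full mapping is in DcData.csv), so any Dc outside that table is rendered as "." here. The function names are hypothetical.

```python
# Partial table, taken only from the examples in this section.
DC_TO_ASCII = {
    18: " ", 19: "!", 30: ",", 57: "H", 72: "W",
    85: "d", 86: "e", 93: "l", 96: "o", 99: "r",
}


def dems_glyph(dc):
    """Return the printable stand-in for a Dc: "." for a space or unknown Dc."""
    char = DC_TO_ASCII.get(dc)
    if char is None or char == " ":
        return "."
    return char


def format_dems(dcs, per_line=4):
    """Render Dc IDs as DEMS text: EMS lines followed by " # " and glyphs."""
    lines = []
    for start in range(0, len(dcs), per_line):
        chunk = dcs[start:start + per_line]
        numbers = " ".join(str(dc) for dc in chunk)
        glyphs = " ".join(dems_glyph(dc) for dc in chunk)
        lines.append(f"{numbers} # {glyphs}\n")
    return "".join(lines)
```

Dropping everything from " # " to the end of each line yields the corresponding EMS text.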

57 86 93 93 # H e l l
96 30 18 72 # o , . W
96 99 93 85 # o r l d
19 # !

Ember Record Documents (EMR), .emr

This is a special format in the "Structured" mode used for structured record storage in the Ember cloud. It is not yet defined, but will most likely be a subset of one of the other formats.

Structures in the Ember Language

The Ember Language uses the following main types of entity to represent information:

Project
A Project is a single document, and if relevant, any other documents maintained as part of that document.
Module
A Module is one or more Library-mode documents that have a package name for addressing the things they provide.
Routine
A Routine is a set of instructions for a computer to follow as part of the process of interpreting a document. Similar concepts are known as functions or subroutines in most programming languages, and as methods when used within objects.
Operator
An Operator is a short notation or syntax pattern for some common Routines (e.g., Number a + Number b in place of add(Number a, Number b), or if true; then print 'Hello, World!'; else die in place of if(true, {print 'Hello, World!'}, {die})).
Identifier
An Identifier is a name for an entity.
Structure
A Structure defines the shape that an entity can have, similar to type definitions or type signatures in some programming languages.
Statement
A Statement is a logical line of a document. It can be an invocation of a Routine, or a Declaration of an entity's Structure or value.
Type
Types are templates describing the structure of Objects. They are known as classes in most programming languages.
Object
An Object is an entity that conforms to a given Type (an instance of that Type).