# Structure of the project

Under 100 files (54). Under 10k lines of code (7k). No comments (0). The code for the language server relies on the Xtext framework. It consists of two projects, m and m.ide. The project m holds everything needed to run the compiler in headless mode: the grammar specifying the lexer and parser rules, the validator, the formatter, the linker and the code generators. In fact, m consists of 7 languages and translations, one for each kind of supported file type: JSON, XML, YAML, C#, and of course M and all its variants in different natural languages. Google's Guice dependency injection framework wires every component of the language. The IDE components, such as code completion, syntax highlighting, labels, quick fixes and the outline view, live in the m.ide project, independent of the execution platform. It's useful to have an m.ui project to test the functionality in an Eclipse instance with a debugger attached.

Single sourcing: If the structure of one piece of code logically depends on another, it should be generated automatically. Example: Blockly block definitions depend on the grammar rules, so the block definition files are generated from the grammars.

Parser: Terminals are single characters and pairwise disjoint. Don't use hidden terminals. Don't use until tokens. Don't use syntactic predicates. Don't use backtracking. Data type rules combine terminals with the least information needed. Expressions follow the Xtext guidelines for associativity and precedence. The base grammar fully supports C# expressions. The M grammar can override a rule to exclude certain operators from the language. The unary minus operator acts on floats to make them negative (otherwise they are always positive). Numbers in components can be negative because they use a different data type rule that can include a minus sign (NUMBER vs FLOAT).


Generate.java: This main Java file generates all the Xtext infrastructure from the grammar files. To generate all the project files, execute this main file first. It generates files under m/src, m/src-gen, m.ide/src and m.ide/src-gen. It also generates an m.ui project to test the language server faster inside an Eclipse application. The easiest way to use Eclipse with this project is to first create an Xtext project named m with the generic IDE and Eclipse plugin options selected, then download the project files and let them override the existing ones. Execute Generate.java as a Java application and your Eclipse project will be ready.

xRuntimeModule: Dependency injection wiring via Google Guice modules. If a binding is already wired in AbstractLanguageRuntimeModule, override it; if it is not, declare it with def. Xtext offers syntactic sugar of the form def bindIService() { ClassName }. OutputConfiguration is an example.

x.xtext: Grammar definitions. All independent. Keep terminals to a minimum, then data type rules, and lastly parser rules. No backtracking, no semantic helpers.

converter/MyTerminalConverter: Used when terminal rules return something other than an EString. Defines how the conversion works in both directions.
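A minimal sketch of what converting in both directions means, in plain Java with no Xtext dependency. The `TwoWayConverter` interface and the `FloatConverter` class are hypothetical names used only for illustration, and the choice of `Float` as the model type is an assumption. In Xtext the corresponding contract is `IValueConverter`, with `toValue` for parsing and `toString` for serializing.

```java
/** Hypothetical two-way converter, loosely modelled on a toValue/toString pair. */
interface TwoWayConverter<T> {
    T toValue(String text);   // parser direction: terminal text -> model value
    String toText(T value);   // serializer direction: model value -> terminal text
}

/** Assumed example: a numeric terminal that is stored as a Float in the model. */
final class FloatConverter implements TwoWayConverter<Float> {
    @Override
    public Float toValue(String text) {
        if (text == null || text.isEmpty()) {
            throw new IllegalArgumentException("Expected a number but found nothing");
        }
        try {
            return Float.valueOf(text);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("'" + text + "' is not a number", e);
        }
    }

    @Override
    public String toText(Float value) {
        if (value == null) {
            throw new IllegalArgumentException("Cannot serialize a missing number");
        }
        return Float.toString(value);
    }

    public static void main(String[] args) {
        FloatConverter converter = new FloatConverter();
        Float parsed = converter.toValue("1.5");
        System.out.println(converter.toText(parsed));
    }
}
```

Passing a value through `toText` and then `toValue` should return the same value. That round trip is what keeps the parser and the serializer in agreement.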

errorHandling/XSyntaxErrorMessageProvider: Used when the parser throws an error to improve the error message considering the context.

xFormatter: Dispatched functions that add white space when serializing a model. Always try to use prepend to avoid conflicts, and use grammar access to reach non-keywords. Use literal keywords like ',' instead of grammarAccess ones for better maintainability.


Main.java: For headless execution. Depends on a Java JRE and the built JAR. Give it a file and it will parse it and generate the output for you.

transformation: Each file is responsible for generating all files of a specific kind. The type inference runs first; then, with the model and the inferred types, the generator can produce all the artifacts. BuiltInLibrary translates names of built-in components for generating the same text in other languages. Some are legacy code. The ones that use TextGenerator matter.

No hybrid or pure distinction, use Hybrid.Rendering, Unity.Physics etc when they are ready and do the rest in Hybrid, as any professional would do nowadays.

UnityGenerator UnityParser BlockGenerator BlockParser

xValidator: Rules to make extra assumptions true in the model. For example unique names.

Workflow.mwe2: Important. Generates all the artifacts. It defines the project structure and the fragments that each language cares about.


MyContentAssist extends IdeContentProposalProvider: For platform agnostic definition of content assist proposals.

MyLabelProvider extends SimpleNameLabelProvider: For platform agnostic definition of label names.

build.gradle: Gradle script to build the language server. It needs the reference to the mwe workflow file and it passes it the rootPath variable. It runs the workflow and generates a language server fat jar file in build folder.

# Generating Xtext artifacts

The first step to build the project is to generate all the Xtext artifacts. The RunFragments.java file creates an Xtext generator workflow and adds a standard language to it for each grammar file located in the m/src/m folder.

The standard languages include built-in fragments to generate the grammar access, parser, EMF model, generators, serializer etc. To all this we add a custom fragment called ContextualParserMessagesFragment, which wires ContextualParserMessages as the class responsible for producing error messages when the parser finds an error.

When we run the workflow, the generated dependency injection injectors AbstractXRuntimeModule.java wire all the fragments together. In particular, they wire the ISyntaxErrorMessageProvider interface to ContextualParserMessages.

# Using gradle

The build.gradle file defines two gradle tasks:

  • runFragments: Runs the m.main.RunFragments.java file with the correct dependencies to generate the xtext artifacts.
  • generateLanguageServer: Packages all the generated .class files and the dependencies in a fat jar file.

# Folder structure

The project has three root folders:

  • .github: Contains all the automation tasks and their configuration files.

  • Code: Contains all the java files and xtext grammars used to describe the language server and client.

  • Documentation: Contains all the markdown files that explain the reasoning behind the code.

# Validation

There are four main types of checks:

  • Uniqueness checks: Makes sure that there is a single element in the container with that specific name. Examples: unique component, unique entity, unique system. Included checks: Unique component name in entity, Unique entity name in module, Unique system name in module, Unique procedure name in module, Unique tag name in loop tags, Unique argument name in procedure.

  • Existence checks: Makes sure that the specified cross reference exists. Examples: function call name exists, base of an entity exists. Included checks: The base of an entity exists in the module, The entity accessed in a variable exists in a previous assignment or loop statement.

  • Acyclic checks: Makes sure that the dependencies are not cyclic. That means, if A depends on B, B cannot depend on A. Examples: if entity A has base B, entity B cannot have base A. The same holds for 3-cycles and n-cycles: if A depends on B and B depends on C, C cannot depend on A. Included checks: The base of an entity is acyclic.

  • Typing checks: Makes sure that the variables have conformable types. Examples: Engine components have the correct type (Position must be a Float3 etc), Custom vector components have up to four entries, the left and right expressions in an addition have the same type.
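The uniqueness, existence and acyclic checks can be sketched in plain Java over names alone. This is a simplified illustration and not the project's validator: `ValidationSketch` and its methods are hypothetical, and it assumes each entity has at most one base, stored in a map from entity name to base name.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ValidationSketch {

    /** Uniqueness check: returns the names that occur more than once in a container. */
    static Set<String> duplicateNames(List<String> names) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String name : names) {
            if (!seen.add(name)) {   // add returns false when the name was already present
                duplicates.add(name);
            }
        }
        return duplicates;
    }

    /** Existence check: the referenced base must be declared in the module. */
    static boolean baseExists(String base, Set<String> declaredEntities) {
        return declaredEntities.contains(base);
    }

    /**
     * Acyclic check: follows the chain of bases starting at the given entity.
     * Returns true if the chain ever revisits an entity, which covers
     * 2-cycles, 3-cycles and n-cycles alike.
     */
    static boolean hasCyclicBase(String entity, Map<String, String> baseOf) {
        Set<String> visited = new HashSet<>();
        String current = entity;
        while (current != null) {
            if (!visited.add(current)) {
                return true;
            }
            current = baseOf.get(current);   // null when the entity has no base
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(duplicateNames(List.of("mass", "position", "mass")));
        System.out.println(hasCyclicBase("A", Map.of("A", "B", "B", "C", "C", "A")));
    }
}
```

Because every entity has a single base under this assumption, following the chain is enough; a graph with several dependencies per node would need a full depth-first search instead.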

# Parsing error messages

The ContextualParserMessagesFragment deduces parser error messages from the grammar definition. When there is an error, it walks the grammar structure to find the feature that is missing, considering all the possible alternatives and creates a one line message for each alternative.

For example, under the input

entity : ?

It gives the error message

Parser error at ?
Write WORD to set the base of the Entity
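As an illustration of the last step only, the sketch below turns a list of expected alternatives into one line per alternative. It does not walk a real grammar: the `Expected` class and the `describe` method are hypothetical, and the sentence template is generalized from the single example message above, which is an assumption.

```java
import java.util.List;

final class ParserMessageSketch {

    /** Hypothetical description of one alternative the parser could accept next. */
    static final class Expected {
        final String token;    // what the user should write, for example WORD
        final String feature;  // the feature that the token would set, for example base
        final String rule;     // the rule that owns the feature, for example Entity

        Expected(String token, String feature, String rule) {
            this.token = token;
            this.feature = feature;
            this.rule = rule;
        }
    }

    /** Builds a header line followed by one line for each alternative. */
    static String describe(String offendingText, List<Expected> alternatives) {
        StringBuilder message = new StringBuilder("Parser error at ").append(offendingText);
        for (Expected expected : alternatives) {
            message.append('\n')
                   .append("Write ").append(expected.token)
                   .append(" to set the ").append(expected.feature)
                   .append(" of the ").append(expected.rule);
        }
        return message.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("?", List.of(new Expected("WORD", "base", "Entity"))));
    }
}
```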

# Unity components

Four kinds of components represent the data:

  • Value components: They are IComponentData with automatically generated authoring components. Example: mass.

  • List components: They are IBufferElementData with custom authoring component that converts game object references to entities.

  • Reference components: They are class IComponentData with custom authoring component that adds a componentObject to the entity.

  • Engine components: They are UnityEngine classes that get added to entities by the EngineComponentConversion system.

# Workflow

M operates in four stages: Parsing, validating, generating and formatting.

The parser processes the input given by the user and checks for syntactical errors. The result is an abstract syntax tree containing all the essential information of the file, all the data without white spaces.

The validator checks the information of the abstract tree for semantic validity. The result is an Eclipse Modeling Framework model of the abstract representation of the interactive simulation.

The generator represents the model in all the requested platforms, including textual representations in different languages, visual representations in different languages and project representations in different game engines. The result is a folder with all the required files for representing that simulation in each platform.

The formatter mixes white space with the generated text so that humans can understand it.

# For each view

M has three main views: Textual, Visual and Project. These might appear translated to different natural languages. For instance, English, Spanish and Basque. The total number of combinations is the product of views by translations.
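A small worked example of that product, using the three views and the three natural languages named above. The `ViewCombinations` class and the label format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ViewCombinations {

    /** Pairs every view with every natural language. */
    static List<String> combine(List<String> views, List<String> languages) {
        List<String> combinations = new ArrayList<>();
        for (String view : views) {
            for (String language : languages) {
                combinations.add(view + " (" + language + ")");
            }
        }
        return combinations;
    }

    public static void main(String[] args) {
        List<String> all = combine(
                List.of("Textual", "Visual", "Project"),
                List.of("English", "Spanish", "Basque"));
        // 3 views times 3 languages gives 9 combinations
        System.out.println(all.size());
        all.forEach(System.out::println);
    }
}
```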

# Language syntax

Allow constructions that are not recommended, like single line comments, multi line comments, number and string literals in systems... but flag them as deprecated. This way, coders are not confused but directed in the "right" direction, and after adoption, the language will enforce stronger rules.

M has a C-style syntax, like C++, Java or C#. It builds on the same ideas of expressions and control flow, such as iteration and branching blocks, but differs in the higher-level abstractions. M does not contain any object-oriented concepts like classes or visibility modifiers. Nor can you define new data structures, enumerations or similar constructs as in C. Instead it provides two simple and powerful abstractions: Entities and systems.

Two root level concepts in M: Entities and systems. In fact, every program written in M consists solely of entities and systems. The components of entities hold all the data of the program while systems hold the interactions.

Entities have a name and a list of components. Each component holds a single value which might be a vector or an asset. Vectors store numeric values such as mass, position or rotation. They can have up to four dimensions. Assets are a sequence of words of arbitrary length. The type inference system will decide which kind of asset - mesh, image, audio clip, font... - based on the usage of the component.

Systems consist of a unique name and a list of sequential commands. These commands can be blocks such as loops or branches that might in turn contain more commands themselves.

Five kinds of commands. Loops iterate over entities that satisfy certain constraints. Branch commands take different paths depending on a condition. Repeat commands repeat their body a set number of times. Assignments assign a value to a variable or a component. Call commands call a function that is either built in or defined by the user.

The expressions of the language consist of boolean and arithmetic expressions. Arithmetic expressions can use operators like +,-,*,/,% and the bitwise &,|,<<,>>,^,~ to alter numerical values. The types of the operands must be compatible with the operation. The bitwise operators work over single dimension vectors (scalars). The multiplicative operations *,/,% can take any vector as the first argument but require a scalar as the second argument. The additive operations + and - accept any two vectors as long as their dimensions are equal.
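The compatibility rules for binary operators can be sketched as a function over vector dimensions, where 1 stands for a scalar and 4 is the largest vector. This is an illustration, not the project's type inference: `OperatorTyping` is a hypothetical name, and returning the dimension of the first operand for the multiplicative operations is an assumption, since the text only states which operands are accepted. The unary ~ is left out.

```java
final class OperatorTyping {

    /**
     * Returns the dimension of the result of a binary operation, or throws
     * IllegalArgumentException when the operand dimensions are incompatible.
     */
    static int resultDimension(String operator, int left, int right) {
        checkDimension(left);
        checkDimension(right);
        switch (operator) {
            case "+":
            case "-":
                // additive: both vectors must have the same dimension
                if (left != right) {
                    throw new IllegalArgumentException(
                            operator + " needs equal dimensions, got " + left + " and " + right);
                }
                return left;
            case "*":
            case "/":
            case "%":
                // multiplicative: any vector first, a scalar second
                if (right != 1) {
                    throw new IllegalArgumentException(
                            operator + " needs a scalar as second operand, got dimension " + right);
                }
                return left; // assumption: the result keeps the first operand's dimension
            case "&":
            case "|":
            case "<<":
            case ">>":
            case "^":
                // bitwise: scalars only
                if (left != 1 || right != 1) {
                    throw new IllegalArgumentException(operator + " only works on scalars");
                }
                return 1;
            default:
                throw new IllegalArgumentException("Unknown binary operator: " + operator);
        }
    }

    private static void checkDimension(int dimension) {
        if (dimension < 1 || dimension > 4) {
            throw new IllegalArgumentException("Vectors have one to four dimensions, got " + dimension);
        }
    }

    public static void main(String[] args) {
        System.out.println(resultDimension("+", 3, 3));
        System.out.println(resultDimension("*", 3, 1));
    }
}
```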

And that's it. You can build more complex abstractions by combining these basic concepts. You can watch some of these in action in the examples documentation.

# Keywords

The grammar reserves some words that can't be used as identifiers. The list of all reserved words is:

foreach, with, if, else

You can use any other word for your identifiers as long as it matches the rule below: an optional underscore followed by one or more letters or digits.

Identifier: '_'? ('a'..'z'|'A'..'Z'|'0'..'9')+

White space. The parser of M ignores all white space. You can structure your code as you want using white space. We recommend using the built-in formatter for consistency across team members and easier version control.

Whitespace: '\n'|'\r'|' '|'\t'

Sometimes, the next token must be a number.

Number: '-'? DIGIT* ('.' DIGIT+ ('e' '-'? DIGIT+)?)?
DIGIT: '0'..'9'
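The Identifier and Number rules above can be transcribed literally into Java regular expressions to try out which inputs they accept. `TerminalPatterns` is a hypothetical helper and not the generated lexer; it also excludes the reserved words listed earlier. Read literally, the Number rule makes every part optional, so the transcription accepts the empty string as well. Whether the real lexer behaves that way is not something this sketch can confirm.

```java
import java.util.Set;
import java.util.regex.Pattern;

final class TerminalPatterns {

    // Identifier: '_'? ('a'..'z'|'A'..'Z'|'0'..'9')+
    static final Pattern IDENTIFIER = Pattern.compile("_?[a-zA-Z0-9]+");

    // Number: '-'? DIGIT* ('.' DIGIT+ ('e' '-'? DIGIT+)?)?
    static final Pattern NUMBER = Pattern.compile("-?[0-9]*(\\.[0-9]+(e-?[0-9]+)?)?");

    // Reserved words taken from the keyword list in this section
    static final Set<String> KEYWORDS = Set.of("foreach", "with", "if", "else");

    /** True when the whole word matches the Identifier rule and is not reserved. */
    static boolean isIdentifier(String word) {
        return IDENTIFIER.matcher(word).matches() && !KEYWORDS.contains(word);
    }

    /** True when the whole text matches the Number rule as transcribed above. */
    static boolean isNumber(String text) {
        return NUMBER.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("_speed1"));
        System.out.println(isIdentifier("foreach"));
        System.out.println(isNumber("-1.5e-3"));
        System.out.println(isNumber("1."));
    }
}
```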

# Data oriented design

Everything is data. Introducing the data model:

The data matrix and the variable heap hold all the data of a frame of a game.

Our job as programmers is to define the laws that transform this data every frame.

The possible ways to alter the data are:

  • Create a new row in the matrix (Entity creation)

  • Delete a row in the matrix (Entity deletion)

  • Activate a cell in the matrix (Component addition)

  • Deactivate a cell in the matrix (Component removal)

We can use the variable heap to store temporary values. The system automatically deletes these values after every frame.

  • Create an entry in the heap (Variable declaration)

Now we need a way to gather values to write to the matrix cells and the heap.

  • Read a value stored in a cell (Component access)

  • Read a value stored in the heap (Variable access)

These are the atoms of data. We can transform the data by using transformations:

  • Operators: Arithmetic, Comparison, Logic, Set operators
  • Grouping: Join, Set creation, Brackets...
  • Functions: Math functions.

Control Flow.

We may repeat this process per matching row (Forall), while a condition is true (Iteration) or if a condition is true (Selection).
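The data model described in this section can be sketched as a small Java class: rows are entities, cells are components, and a Forall is a filter over the rows whose required cells are active. This is an illustration only. `DataMatrix` and its method names are hypothetical, and cells hold a single number here for brevity rather than a vector or an asset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DataMatrix {
    // One row per entity. A component name missing from a row is an inactive cell.
    private final Map<Integer, Map<String, Double>> rows = new LinkedHashMap<>();
    private int nextId = 0;

    int createEntity() {                       // create a new row
        rows.put(nextId, new HashMap<>());
        return nextId++;
    }

    void deleteEntity(int id) {                // delete a row
        rows.remove(id);
    }

    void addComponent(int id, String component, double value) {   // activate or overwrite a cell
        rows.get(id).put(component, value);
    }

    void removeComponent(int id, String component) {              // deactivate a cell
        rows.get(id).remove(component);
    }

    double read(int id, String component) {    // component access, assumes the cell is active
        return rows.get(id).get(component);
    }

    /** Forall: ids of the rows in which every listed cell is active. */
    List<Integer> matching(String... components) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Map<String, Double>> row : rows.entrySet()) {
            if (row.getValue().keySet().containsAll(List.of(components))) {
                result.add(row.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DataMatrix matrix = new DataMatrix();
        int ball = matrix.createEntity();
        matrix.addComponent(ball, "position", 1.0);
        matrix.addComponent(ball, "velocity", 2.0);
        // One frame of a system: for every row with position and velocity, move it
        for (int id : matrix.matching("position", "velocity")) {
            double step = matrix.read(id, "velocity");          // variable declaration
            matrix.addComponent(id, "position", matrix.read(id, "position") + step);
        }
        System.out.println(matrix.read(ball, "position"));
    }
}
```

The local variable `step` plays the role of the variable heap: it exists only while the frame is being processed.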

And that is the whole grammar of M.

# Documentation

Treat documentation as code. Use linters for syntactic and semantic correctness (correct grammar, vocabulary, links) and test the documentation (do the examples actually work, do the code snippets have correct syntax). Documentation is as important as source code. Under 50 files (25), under 10k lines (1.5k). Similar lines of code and documentation. 1 gradle, 1 workflow, 1 gitIgnore.

1 Terminal converter, 2 parser, 7 formatter, 7 generator, 7 validator, 7 grammar, 7 dependency injection modules, 7 ide modules.

1 type inference 1 translator 1 factory 1 serializer

4 XToGame (Blocks, text, testua, unity) 4 GameToX

5 visual editor, 3 for online and standalone editors.

Don't assume experience. Avoid words like easy, trivial. Avoid acronyms or introduce them.

Test it. write-good for grammar. markdown-link-check for links. Format with scc for not too long, vscode md lint for heading structure and whitespace. Lists and bulleted lists? Quotes checked manually.

# Project structure

The project consists of four root folders: Code, Documentation, Tests and .github.

# The root folders

Code contains the source code of the program.

Documentation contains explanations of the whys and hows of the rest of the project.

Tests contains the source files that ensure that the code works as expected.

The .github folder contains automation code and social elements. The automation consists of DevOps actions and configuration files to build, release and deploy. The social files are templates for issues and pull requests and redirection for funding options.

# Root files

The root of the project contains three files. The license file specifies the license for the project. Notice.md is the copyright notice which includes the author mention for the project this one depends on. The readme file gives an overview of the project.