# Design decisions

# Motivation. Why does M exist

Problem I am trying to solve: I know how to say it, I have said it, I know someone else can say it.

Why my problem is important to solve: Who does it benefit? How much is it worth solving?

Plan B: What other solution do I have for my problem? Is plan B implemented?

Steps: I can articulate the steps to solve my problem I can point risks and unknowns in those steps Examples: I have used the project to solve actual problems Completion: I know what means to my project being complete Statistics: I know how to falsify my hypothesis. Test with users my project and disprove my assumptions.

Efficiency: I know how long it takes to run the project. I know the minimal specs for my project to run I know the dependencies of my project

Profile the generation I know how to optimize the generation

Use case: I know the most common use cases of the project I know how programs written in my project look lke in real life I know the acceptable range of programs that programmers can write in my project I know what happens when someone goes out of that range Watch users use the project What is the part that they require most time for? I know to debug when something fails, even without source code I can apply data oriented: I know what data I operate on I know how I act with cache This project gives back to the community.

# Properties

M is a DSL Compiled External C-like concrete syntax textual Scratch like visual syntax Abstract syntax is a EMF model Strongly typed with implicit types Type inferred and validated No null reference, no type conversion error, no infinite loop Execution semantics: Game engine projects with assets Useful for non programmers Turing complete Efficient compilation Efficient generated code, native multi threaded Clean: Native support for DOD Full IDE support:

  • Code completion
  • Code generation
  • Syntax Highlight
  • Error markers
  • Quick fixes
  • Formatting
  • Refactoring
  • Labels
  • Outline view
  • Find uses
  • Find definition
  • Debugging

Small syntax Concise User defined abstractions: Files as libraries, functions for reusability, prefabs possibly nested Fast and volatile development of language No collections, all entities Big language scope: Most games M improves:

  • Productivity: Less code to write and read
  • Quality: No runtime errors, halts, multi threaded
  • Communication and thinking: Familiar domain concepts, personalized views
  • Platform support: Any platform with JVM any platform with Unity support

# Lexer

The lexer is derived from the Unicode 13 standard. It accepts all character code points in the Base Multilangual Plane: U+0000 to U+FFFF.

It classifies the characters in character classes given by the unicode consortium and then merges together in the major character groups:

  • Letter
  • Number
  • Punctuation
  • Symbol
  • Blank
  • Control

For the case of M, three terminals are interesting:

  • Identifiers are a nonempty string of letters
  • Operators are a nonempty string of symbols
  • Blanks are a nonempty string of blanks
  • All the rest are unrecognized characters

# Syntax

To distinguish parser rules from each other, some keywords are necessary. In M, the keywords are taken from the punctuations class. This is because it is not involved in the interesting terminals and keeps the syntax natural language agnostic.

The syntax of M is inspired in the family of C-like programming languages.

The keywords in M are:

  • Brackets for grouping: (), [], {}
  • Commas for indicating sequences: ,
  • Equality for assignment: =
  • Dot for cell access: .

# Parser

The parser of M is inspired in maths. In particular, minimal Turing complete languages with infix operator support and a special expression in cells from ECS paradigms.

A program is a list of functions. A function has a name, a list of parameters and a list of statements. A statement is either a block statement, an assignment or a delegation. A block statement has a name, a expression and a list of statements. An assignment assigns an expression to a value or a cell. A delegation delegates work to another function and receives the output. An expression can be a binary operator, a unary operator, a function application, a value or a cell.

# Type system

System F with bounds, possible bounded type variables.

Is a lambda calculus where the set of base types is {Bool,Number,String} and there are no constants. The only definable types are the base types and function types t -> t' where t and t' have been previously defined. It is implicitly typed but fully inferred.

The system is inspired in the theory of algebraic types. M supports

  • sum types: a + b: either type a or b
  • product types: a x b: a tuple of (type a, type b)
  • exponent types: a -> b: a function from type a to b
  • polymorphic types: f<a>: functions on types, for example List<Number>.

Expressions get typed for two reasons:

  • Symbol typed in standard library
  • Entity of a cell

Typing errors occur when either:

  • Not enough information to infer the type of an expression
  • The inferred information is not coherent or compatible

# Type inference

Hindley-Milner style fully implicitly typed inference. Similar to Algorithm W.

# Standard library

The standard library defines the types, symbols and errors messages that are available to all files written in M. Symbols are names with an associated predefined type. For example, The symbol mass will always have the type number. However, its name will depend on the specific instance of standard library that is in use. In fact, two instances of the standard library are provided, one using English as the natural language and the other using Euskara.

The collection of symbols is inspired in game development, mostly in the standard components of the Unity game engine. Commonly used properties and functions are included. In particular, from the seven main areas of game development:

  • Rendering
  • Audio
  • Networking
  • Input
  • Physics
  • AI
  • Animation

Functions are inspired in math libraries and the matrix model of the ECS paradigm (CRUD over the matrix):

  • All the data is stored in a conceptual matrix
  • The rows represent entities and the columns components
  • All the simulation in the game happens by modifying the matrix
  • By removing and adding rows we create and destroy entities
  • Columns can not be altered at runtime (no dynamic components)
  • By setting or nullifying cells we add and remove components
  • By querying for column sets we iterate over entities with specific sets of components
  • By looking wheter a row has a column active we determine whether an entity has a component or not

It also provides messages to form the error messages. Five kinds of errors can happen in M:

  • Syntax error: Didn't follow the grammar rules
  • Symbol undefined: Used a symbol that has not been defined
  • Symbol redefined: Defined a symbol that was already defined
  • Undecidable type: Not enough information to infer type of an expression
  • Incompatible types: Inferred multiple types for an expression that are not compatible between them. For example, number and numeric or sum types are compatible.

# Validation algorithm

Custom inference algorithm determines the types of all expressions involved in a program written in M and reports errors if the process goes badly.

# The IDE project

To showcase the capabilities of the language server M, a custom IDE is built using Theia. It is a node application with a extension. The theia blocks are defined in the root package.json and an extension called m is included with backend and frontend contributions for the IDE.

# The syntax

The goal is to maximize the familiarity of programmers. According to the Stack Overflow 2020 interview, the top used and loved programming languages use curly braces or indentation to group statements in blocks. M can support both styles independently so that programmers can feel at home and adapt the syntax to their needs.

# Debug and production modes in the generator

By default, code is generated in "debug" mode. It is aimed at providing maximal flexibility while developing the game. When required, a production build can be made to maximize the efficiency of the generated code, assuming properties of the worlds of the project.

# Parser and syntax

Based on structured+procedural+modular programming. All code blocks are enclosed by curly braces or indentation. Programs are divided in files that each contain functions that can call each other. The return statement is a delegation to the return function which can have a different name in each natural language.

A program is divided in files. A file is a collection of functions. A function has a name, a list of parameters and a block of statements.

A statement is a controlled block of statements or assignment or an application.

Assignment is an statement not an operator or expression in M.

M is a expression oriented language.

# Licensing

The project is licensed under AGPL v3 to enforce that it will always remain open sourced.

The NOTICE.md file in the root folder contains the attribution to the open source frameworks that get bundled when the desktop applications are built.

It has two parts: First explains that the source code in the repository is licensed under AGPLv3 and Martin Azpillaga Aldalur is the copyright holder.

Second, it lists the bundled frameworks with their author names and a link to the open source initiave webpage of the corresponding license as well as a link to the source code repository.

# Classification. What is M technically

M is a Imperative Structured programming Procedural programming Modular programming Data oriented programming DSL Compiled External C-like concrete syntax textual Scratch like visual syntax Abstract syntax is a EMF model Strongly typed with implicit types Type inferred and validated No null reference, no type conversion error, no infinite loop Execution semantics: Game engine projects with assets Useful for non programmers Turing complete Efficient compilation Efficient generated code, native multi threaded Clean: Native support for DOD Full IDE support:

  • Code completion
  • Code generation
  • Syntax Highlight
  • Error markers
  • Quick fixes
  • Formatting
  • Refactoring
  • Labels
  • Outline view
  • Find uses
  • Find definition
  • Debugging

Small syntax Concise User defined abstractions: Files as libraries, functions for reusability, prefabs possibly nested Fast and volatile development of language No collections, all entities Big language scope: Most games M improves:

  • Productivity: Less code to write and read
  • Quality: No runtime errors, halts, multi threaded
  • Communication and thinking: Familiar domain concepts, different interfaces
  • Platform support: Any platform with JVM any platform with Unity support

# Turing complete

M is turing complete. Indeed, given a Turing machine with a table of rules, one can simulate it in M as follows:

The tape is a list of entities, each having previous, next and value. Previous and next refer to the previous and next entity cells and value stores the value of the cell.

Create an entity global with state, haltState and initialCell. State determines the internal state of the Turing machine and the haltState is the state where the computation stops. Place the head of the machine in the entity referenced by initialCell.

Create an entity entry with components state, value, writeState, writeValue, continueRight. The combination of state and value determine when to use this instruction. In that case, replace the global state with writeState, replace the cell value with writeValue and proceed right if continueRight else proceed left.

The algorithm follows by storing the currentCell in a variable of type Entity:

currentCell = global.initialCell

while global.state != global.haltState
{
    for each a with a = currentCell
    {
        for each entry
        {
            if global.state = entry.state and a.value = entry.value
            {
                a.value = entry.writeValue
                global.state = entry.writeState

                if (entry.continueRight)
                {
                    currentCell = a.next
                }
                else
                {
                    currentCell = a.previous
                }
            }
        }
    }
}