What's a good resource for starting to write a programming language, that's not context free? [closed]

I'm looking to write a programming language for fun, however most of the resource I have seen are for writing a context free language, however I wish to write a language that, like python, uses indentation, which to my understanding means it can't be context free.

Asked by: Daryl870 | Posted: 28-01-2022

Answer 1

A context-free grammar is, simply, one that doesn't require a symbol table in order to correctly parse the code. A context-sensitive grammar does.

The D programming language is an example of a context free grammar. C++ is a context sensitive one. (For example, is T*x declaring x to be pointer to T, or is it multiplying T by x ? We can only tell by looking up T in the symbol table to see if it is a type or a variable.)

Whitespace has nothing to do with it.

D uses a context free grammar in order to greatly simplify parsing it, and so that simple tools can parse it (such as syntax highlighting editors).

Answered by: Julian442 | Posted: 01-03-2022

Answer 2

You might want to read this rather well written essay on parsing Python, Python: Myths about Indentation.

While I haven't tried to write a context free parser using something like yacc, I think it may be possible using a conditional lexer to return the indentation change tokens as described in the url.

By the way, here is the official python grammar from python.org: http://www.python.org/doc/current/ref/grammar.txt

Answered by: Stuart794 | Posted: 01-03-2022

Answer 3

I would familiarize myself with the problem first by reading up on some of the literature that's available on the subject. The classic Compilers book by Aho et. al. may be heavy on the math and comp sci, but a much more aproachable text is the Let's Build a Compiler articles by Jack Crenshaw. This is a series of articles that Mr. Crenshaw wrote back in the late 80's and it's the most under-appreciated text on compilers ever written. The approach is simple and to the point: Mr. Crenshaw shows "A" approach that works. You can easily go through the content in the span of a few evenings and have a much better understanding of what a compiler is all about. A couple of caveats are that the examples in the text are written in Turbo Pascal and the compilers emit 68K assembler. The examples are easy enough to port to a more current programming language and I recomment Python for that. But if you want to follow along as the examples are presented you will at least need Turbo Pascal 5.5 and a 68K assembler and emulator. The text is still relevant today and using these old technologies is really fun. I highly recommend it as anyone's first text on compilers. The great news is that languages like Python and Ruby are open sourced and you can download and study the C source code in order to better understand how it's done.

Answered by: Cherry620 | Posted: 01-03-2022

Answer 4

"Context-free" is a relative term. Most context-free parsers actually parse a superset of the language which is context-free and then check the resulting parse tree to see if it is valid. For example, the following two C programs are valid according to the context-free grammar of C, but one quickly fails during context-checking:

int main()
    int i;
    i = 1;
    return 0;

int main()
    int i;
    i = "Hello, world";
    return 0;

Free of context, i = "Hello, world"; is a perfectly valid assignment, but in context you can see that the types are all wrong. If the context were char* i; it would be okay. So the context-free parser will see nothing wrong with that assignment. It's not until the compiler starts checking types (which are context dependent) that it will catch the error.

Anything that can be produced with a keyboard can be parsed as context-free; at the very least you can check that all the characters used are valid (the set of all strings containing only displayable Unicode Characters is a context-free grammar). The only limitation is how useful your grammar is and how much context-sensitive checking you have to do on your resulting parse tree.

Whitespace-dependent languages like Python make your context-free grammar less useful and therefore require more context-sensitive checking later on (much of this is done at runtime in Python through dynamic typing). But there is still plenty that a context-free parser can do before context-sensitive checking is needed.

Answered by: Kimberly509 | Posted: 01-03-2022

Answer 5

I don't know of any tutorials/guides, but you could try looking at the source for tinypy, it's a very small implementation of a python like language.

Answered by: Walter469 | Posted: 01-03-2022

Answer 6

Using indentation in a language doesn't necessarily mean that the language's grammar can not be context free. I.e. the indentation will determine in which scope a statement exists. A statement will still be a statement no matter which scope it is defined within (scope can often be handled by a different part of the compiler/interpreter, generally during a semantic parse).

That said a good resource is the antlr tool (http://www.antlr.org). The author of the tool has also produced a book on creating parsers for languages using antlr (http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference). There is pretty good documentation and lots of example grammars.

Answered by: Alina715 | Posted: 01-03-2022

Answer 7

If you're really going to take a whack at language design and implementation, you might want to add the following to your bookshelf:

  • Programming Language Pragmatics, Scott et al.
  • Design Concepts in Programming Languages, Turbak et al.
  • Modern Compiler Design, Grune et al. (I sacrilegiously prefer this to "The Dragon Book" by Aho et al.)

Gentler introductions such as:

  • Crenshaw's tutorial (as suggested by @'Jonas Gorauskas' here)
  • The Definitive ANTLR Reference by Parr
  • Martin Fowler's recent work on DSLs

You should also consider your implementation language. This is one of those areas where different languages vastly differ in what they facilitate. You should consider languages such as LISP, F# / OCaml, and Gilad Bracha's new language Newspeak.

Answered by: John523 | Posted: 01-03-2022

Answer 8

I would recommend that you write your parser by hand, in which case having significant whitespace should not present any real problems.

The main problem with using a parser generator is that it is difficult to get good error recovery in the parser. If you plan on implementing an IDE for your language, then having good error recovery is important for getting things like Intellisence to work. Intellisence always works on incomplete syntactic constructs, and the better the parser is at figuring out what construct the user is trying to type, the better an intellisence experience you can deliver.

If you write a hand-written top-down parser, you can pretty much implement what ever rules you want, where ever you want to. This is what makes it easy to provide error recovery. It will also make it trivial for you to implement significant whitespace. You can simply store what the current indentation level is in a variable inside your parser class, and can stop parsing blocks when you encounter a token on a new line that has a column position that is less than the current indentation level. Also, chances are that you are going to run into ambiguities in your grammar. Most “production” languages in wide use have syntactic ambiguities. A good example is generics in C# (there are ambiguities around "<" in an expression context, it can be either a "less-than" operator, or the start of a "generic argument list"). In a hand-written parser solving ambiguities like that are trivial. You can just add a little bit of non-determinism where you need it with relatively little impact on the rest of the parser,

Furthermore, because you are designing the language yourself, you should assume it's design is going to evolve rapidly (for some languages with standards committees, like C++ this is not the case). Making changes to automatically generated parsers to either handle ambiguities, or evolve the language, may require you to do significant refactoring of the grammar, which can be both irritating and time consuming. Changes to hand written parsers, particularly for top-down parsers, are usually pretty localized.

I would say that parser generators are only a good choice if:

  1. You never plan on writing an IDE ever,
  2. The language has really simple syntax, or
  3. You need a parser extremely quickly, and are ok with a bad user experience

Answered by: Owen370 | Posted: 01-03-2022

Answer 9

Have you read Aho, Sethi, Ullman: "Compilers: Principles, Techniques, and Tools"? It is a classical language reference book.


Answered by: Elise413 | Posted: 01-03-2022

Answer 10

If you've never written a parser before, start with something simple. Parsers are surprisingly subtle, and you can get into all sorts of trouble writing them if you've never studied the structure of programming languages.

Reading Aho, Sethi, and Ullman (it's known as "The Dragon Book") is a good plan. Contrary to other contributors, I say you should play with simpler parser generators like Yacc and Bison first, and only when you get burned because you can't do something with that tool should you go on to try to build something with an LL(*) parser like Antlr.

Answered by: Rubie233 | Posted: 01-03-2022

Answer 11

Just because a language uses significant indentation doesn't mean that it is inherently context-sensitive. As an example, Haskell makes use of significant indentation, and (to my knowledge) its grammar is context-free.

An example of source requiring a context-sensitive grammar could be this snippet from Ruby:

my_essay = << END_STR
This is within the string

<< self
  def other_method

Another example would be Scala's XML mode:

def doSomething() = {
  val xml = <code>def val <tag/> class</code>

As a general rule, context-sensitive languages are slightly harder to imagine in any precise sense and thus far less common. Even Ruby and Scala don't really count since their context sensitive features encompass only a minor sub-set of the language. If I were you, I would formulate my grammar as inspiration dictates and then worry about parsing methodologies at a later date. I think you'll find that whatever you come up with will be naturally context-free, or very close to it.

As a final note, if you really need context-sensitive parsing tools, you might try some of the less rigidly formal techniques. Parser combinators are used in Scala's parsing. They have some annoying limitations (no lexing), but they aren't a bad tool. LL(*) tools like ANTLR also seem to be more adept at expressing such "ad hoc" parsing escapes. Don't try to use Yacc or Bison with a context-sensitive grammar, they are far to strict to express such concepts easily.

Answered by: Sydney765 | Posted: 01-03-2022

Answer 12

A context-sensitive language? This one's non-indented: Protium (http://www.protiumble.com)

Answered by: Miller794 | Posted: 01-03-2022

Similar questions

java - Programming language decision for C++ legacy project workflow

Closed. This question is opinion-based. It is not c...

python - Which programming language or a library can process Infinite Series?

Which programming language or a library is able to process infinite series (like geometric or harmonic)? It perhaps must have a database of some well-known series and automatically give proper values in case of convergence, and maybe generate an exception in case of divergence. For example, in Python it could look like: sum = 0 sign = -1.0 for i in range(1,Infinity,2): sign = -sign sum +=...

python - R as a general purpose programming language

Closed. This question is opinion-based. It is not c...

c++ - Which programming language to choose?

Closed. This question is off-topic. It is not curre...

java - System side programming - Which language?

I have few questions over System side programming. Can Python be used for both Web and System like perl. Which language would you prefer me. I have a little knowledge on JavaScript and Java. If i want to develop a compiler what should i know and where should i start.

java - Detect Programming Language from code snippet

This question already has answers here:

idl programming language - reading a binary file in python

I have to read a binary file in python. This is first written by a Fortran 90 program in this way: open(unit=10,file=filename,form='unformatted') write(10)table%n1,table%n2 write(10)table%nH write(10)table%T2 write(10)table%cool write(10)table%heat write(10)table%cool_com write(10)table%heat_com write(10)table%metal write(10)table%cool_prime write(10)table%heat_prime write(10)table%cool_com_prime write(10)t...

idl programming language - Calling an IDL script in Python

I would like to run an IDL script in a python code, since I need to analyse the results of IDL code later in the python script but I have no idea how it works. I want to call this IDL script for example in a python code: pro plotgaussian, center, sigma, X=x, Y=y x = findgen(1000) / 999; numbers running 0 to 1 in steps of 0.001 x = x * 6 * sigma - 3 * sigma; widen x to range over 6 sigma x = x + center; cen...

Python programming language login error

I'm an absolute beginner to Python, but I have used visual basic.net a little. I'm currently working on a login using a pair of lists as a form of database but I'm receiving an error when running the module. Like I said I'm an absolute beginner so it is probably something stupid, but any help resolving this error would be really appreciated, thank you! Here is the *.py file: username = ["nathan","ge...

python - UDP sockets in D Programming Language

I'm attempting to convert a python program to D; the program is for sending Art-Net DMX packets. Python: import sys, socket, math, time from ctypes import * class ArtNetDMXOut(LittleEndianStructure): PORT = 0x1936 _fields_ = [("id", c_char * 8), ("opcode", c_ushort), ("protverh", c_ubyte), ...

python - OPENGL User Interface Programming

Closed. This question does not meet Stack Overflow guid...

python - Which is the most useful Mercurial hook for programming in a loosely connected team?

I recently discovered the notify extension in Mercurial which allows me quickly send out emails whenever I push changes, but I'm pretty sure I'm still missing out on a lot of functionality which could make my life a lot easier. notify-extension: https://www.mercurial-scm.org/wiki/NotifyExtension Which Me...

Socket programming for mobile phones in Python

I've written code for communication between my phone and comp thru TCP sockets. When I type out the code line by line in the interactive console it works fine. However, when i try running the script directly through filebrowser.py it just wont work. I'm using Nokia N95. Is there anyway I can run this script directly without using filebrowser.py?

functional programming - Pattern matching of lists in Python

I want to do some pattern matching on lists in Python. For example, in Haskell, I can do something like the following: fun (head : rest) = ... So when I pass in a list, head will be the first element, and rest will be the trailing elements. Likewise, in Python, I can automatically unpack tuples: (var1, var2) = func_that_returns_a_tuple()

python - Convert CVS/SVN to a Programming Snippets Site

I use cvs to maintain all my python snippets, notes, c, c++ code. As the hosting provider provides a public web- server also, I was thinking that I should convert the cvs automatically to a programming snippets website. cvsweb is not what I mean. doxygen ...

What are the benefits of using Python for web programming?

What makes Python stand out for use in web development? What are some examples of highly successful uses of Python on the web?

functional programming - Modify bound variables of a closure in Python

Is there any way to modify the bound value of one of the variables inside a closure? Look at the example to understand it better. def foo(): var_a = 2 var_b = 3 def _closure(x): return var_a + var_b + x return _closure localClosure = foo() # Local closure is now "return 2 + 3 + x" a = localClosure(1) # 2 + 3 + 1 == 6 # DO SOME MAGIC HERE TO TURN "var_a" of the closure into 0 # ...

Interested in Collective Programming for the web -- Ruby or Python or PHP?

Closed. This question is opinion-based. It is not c...

network programming - How do you determine if an IP address is private, in Python?

In Python, what is the best way to determine if an IP address (e.g., '' or '') is on a private network? The code does not sound difficult to write. But there may be more edge cases than are immediately apparent, and there's IPv6 support to consider, etc. Is there an existing library that does it? ...

c# - ORM (object relational manager) solution with multiple programming language support

Is there a good ORM (object relational manager) solution that can use the same database from C++, C#, Python? It could also be multiple solutions, e.g. one per language, as long as they can can access the same database and use the same schema. Multi platform support is also needed. Clarification: The idea is to have one database and access this from software written in several different pr...

Still can't find your answer? Check out these communities...

PySlackers | Full Stack Python | NHS Python | Pythonist Cafe | Hacker Earth | Discord Python