Showing posts with the label metacompilers.

Wednesday, September 15, 2021

Oh Yes You Can Use Regexes to Parse HTML!

This is Perl, regexes, and parsing all at once, so if you enjoy that kind of thing, you'll love the comment on this HN thread that points to this insane "oh yes, you can use regexes to parse HTML".


Wow. We've all seen the "you can't parse HTML with regexes" claims, and if you were into Perl and knew about its superpowered regexes, you knew it was possible.

And you might even remember Regexp::Grammars, that amazing Damian Conway thing that twists regexes to their limits.

Or, my Meta-II compiler implemented in a Perl regex...

 

But it's great to see common tools twisted in all these ways.

Thursday, January 23, 2020

More META-II

So there's a new article on the internet about META-II! And not just that, but it also talks about Forth! And not just that, but it also talks about Raku!


That alone is reason enough to browse the whole blog in detail.

Sunday, January 17, 2016

Bootstrapped metacompiler using Perl5 and lua

I wrote my own implementation of Schorre's metaII using Perl regexes.

All the code that runs is just a recursive regexp match against a string (/$bootstrap/ =~ /$program/), which makes it even more mindfucked than usual. It's a simple way to build a recursive descent parser using nothing but regexes and Perl extended patterns. The string being matched is a metaII description of the very same syntax that string is written in. Yes. :-)

I'm taking advantage of the Perl5 extended pattern '(?{})', which runs Perl code whenever the regex engine reaches that point. The idea is pretty similar to how metaII's own output directives work, even syntax-wise, so I thought it was a nice way to implement it: the implementation uses the same idea that metaII itself is going to use once bootstrapped. (Sorry if this post is difficult to read, but I can't find an easy way to write about this that isn't chained, recursive and self-referential.)

To be able to run recursive regexes, we need what MJD calls a proxy parser, which is just a delayed 'thunk' that is only evaluated at runtime. In the regex world we can achieve that with (??{}).
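
To make the proxy parser idea a bit more concrete, here is a minimal sketch in Python rather than Perl regexes. It is not the code from the repo: the helper names (lit, seq, alt, proxy) are made up for illustration, and proxy() only plays the role that (??{}) plays in the regex, deferring the lookup of a rule until match time so a rule can mention itself.

```python
# Each parser is a function: (text, pos) -> new position, or None on failure.
# All names here are hypothetical; this only illustrates the delayed 'thunk'.

def lit(s):
    """Match the literal string s."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    """Match every parser in order."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def alt(*parsers):
    """Return the result of the first alternative that matches."""
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

def proxy(thunk):
    """Delay looking up a parser until match time."""
    def parse(text, pos):
        return thunk()(text, pos)
    return parse

def empty(text, pos):
    """Match nothing, always succeeding."""
    return pos

# parens = '(' parens ')' parens | empty
# 'parens' is used before it is defined, so it has to go through proxy().
parens = alt(
    seq(lit("("), proxy(lambda: parens), lit(")"), proxy(lambda: parens)),
    empty,
)

def balanced(text):
    """True if the whole text is a string of balanced parentheses."""
    return parens(text, 0) == len(text)
```

Without proxy(), the definition of parens would fail with a NameError, because the name does not exist yet while its own right-hand side is being built.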

If you're not familiar with metacompilers, my advice is to google a bit and read up on them. It's an amazing piece of technology. Basically, you can get a compiler to build itself in very few lines of code, and then augment it step by step by modifying the rules it consumes, each time creating a slightly more evolved copy of itself that you can use as a stepping stone to create more advanced compilers.

I added a makefile that shows the process of compiling a compiler using itself and a description of itself.

Here's the repo; the readme file has more insights. Also, check my other posts on metacompilers.


Friday, January 8, 2016

dabbling with metacompilers

Lately, I've been reading about metacompilers, and I have to say it's a really impressive piece of technology. A metacompiler is a compiler that reads high-level descriptions of grammars to generate other compilers, and it can also generate itself. Once you have a description of the compiler written in its own language, you can keep tweaking both syntax and semantics using a two-step compilation.

It all started on this page, when I saw what seemed like a fine tutorial with some Lua code as an example. I read the code a few times, and the more I understood what it was all about, the bigger the amusement. There are very few resources on this technique on the web, so chances were high that I'd have to figure everything out by myself. Btw, the original paper from Schorre is here.

Big plans

While practicing with it I tried to write a parser for Lua because, as you know, Lua syntax is quite simple. The first problem was that most syntax descriptions out there are in EBNF... So I thought I should use metaII to write a stepping-stone compiler that would understand EBNF, and then feed it the Lua syntax. And then I would be happy and have my utterly useless Lua parser.

Problems 

MetaII has its own problems, like no backtracking and really poor error handling, so your parse either fails or succeeds, but you don't get much info about where or why...

The lack of backtracking is a big one, as EBNF syntax is difficult to convert to the deterministic, dfa-like grammar that metaII needs. I kept falling into infinite left recursion and dying with 'stack limit reached'.
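
The left-recursion trap is easy to reproduce outside metaII. Here's a small Python sketch (hypothetical function names, a toy subtraction grammar, not the Lua code from the tutorial): the first function transcribes a left-recursive rule literally and calls itself before consuming any input, while the second uses the usual rewrite into a repetition, which is what metaII's '$' operator expresses.

```python
import re

def tokenize(src):
    """Split the input into integer and '-' tokens; anything else is skipped."""
    return re.findall(r"\d+|-", src)

def expr_left_recursive(tokens, pos=0):
    """expr = expr '-' NUMBER | NUMBER, transcribed literally."""
    # The rule starts by calling itself at the same position, so no token is
    # ever consumed and the recursion only stops when the stack runs out.
    left, pos = expr_left_recursive(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "-":
        return left - int(tokens[pos + 1]), pos + 2
    return left, pos

def expr_iterative(tokens, pos=0):
    """expr = NUMBER $( '-' NUMBER ), the same language without left recursion."""
    value = int(tokens[pos])
    pos += 1
    # The loop plays the role of '$' (zero or more repetitions) and keeps
    # subtraction left-associative.
    while pos < len(tokens) and tokens[pos] == "-":
        value -= int(tokens[pos + 1])
        pos += 2
    return value, pos
```

Calling expr_left_recursive(tokenize("10 - 3")) ends in Python's RecursionError, the local cousin of 'stack limit reached', while expr_iterative handles the same input without trouble.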

Slow Start

So I wiped everything, went back to basics, and started by making really stupid changes to the metaII syntax. For now I've added comments to it. I still have the EBNF branch 'alive', so we'll see if I can manage to do something with it.


It's nice that I only had to add support for .line, add a comment rule, and accept a comment in place of a rule. It's worth noting that the syntax-driven approach of the compiler makes it easy to have comments in place of full rules, but it would be more complex to have 'line oriented' parsing rules instead of syntax ones. Btw, the only change in the runtime is to add:
 local function parseCMT() return read(match("^[^\n]*")) end
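
For readers who don't speak Lua patterns, this is roughly what that one-liner does, sketched in Python. The read and match helpers belong to the tutorial's runtime and are not reproduced here; parse_comment and its (text, pos) interface are assumptions made up for this example, on the guess that match tries the pattern at the current input position.

```python
import re

# Same pattern as the Lua "^[^\n]*": everything up to the next newline.
# Pattern.match(text, pos) already anchors at pos, so no '^' is needed.
REST_OF_LINE = re.compile(r"[^\n]*")

def parse_comment(text, pos):
    """Consume the rest of the current line, starting at pos.

    Returns (consumed_text, new_pos). The pattern can match the empty
    string, so this never fails; the newline itself is left unread.
    """
    m = REST_OF_LINE.match(text, pos)
    return m.group(0), m.end()
```

Since the pattern also matches an empty string, deciding when a comment actually starts has to happen in the grammar rule, not in this helper.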

More reading

Since I started this adventure, I've read Alessandro Warth's PhD thesis about OMeta (I'd heard about it many, many times, but now I finally understand it), and I've been reading this metacompilers tutorial on and off. So even without much success so far, the learning is there :)

The end?

Probably not, but I just wanted to write down some of the progress in case anyone wants to join me in the quest.