Saturday, October 24, 2009

What are BIFs?

There has always been much confusion about what a BIF really is and how BIFs relate to the Erlang language. An example of this was a discussion earlier this year when someone wanted to have some more functions in lists coded in C and so become part of Erlang. Saying that they are functions coded in C just tells us how they are implemented, not what they *are*.

The source of this confusion is very old and exists in the earliest documentation, where a BIF was described as a function "which could not be written in Erlang or written in C for efficiency". BIFs were in some sense implicitly part of Erlang and considered to be in the module 'erlang'. Some could be used without specifying the module, but there was no clear reason why some needed the module and others did not.
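The split between qualified and unqualified calls can still be observed. The following is a minimal sketch, assuming a current Erlang/OTP release; the module name bif_import_demo is hypothetical. It uses erl_internal:bif/2, which reports whether a given name and arity is auto-imported, that is, callable without the erlang: prefix.

```erlang
-module(bif_import_demo).
-export([demo/0]).

%% Hypothetical demo module, not part of OTP.
demo() ->
    %% self/0 is auto-imported: both spellings call the same function.
    true = (self() =:= erlang:self()),
    %% erl_internal:bif/2 tells us whether Name/Arity is auto-imported.
    true = erl_internal:bif(self, 0),
    %% system_info/1 lives in the same module 'erlang' but is not
    %% auto-imported, so it must be written with the module name.
    false = erl_internal:bif(system_info, 1),
    _ = erlang:system_info(process_count),
    ok.
```

Nothing in the call syntax tells the reader which of the two groups a function belongs to.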

A first proper attempt to define what a BIF is came when Jonas Barklund and I wrote the Erlang specification. There we proposed that a BIF was a part of the Erlang language that did not have a special syntax but looked like a normal function call. Spawn, link and trapexit are just as much part of Erlang as ! and receive, even though they use normal function syntax and semantics. We also decided that it was irrelevant in what language the function was written; what was important was the function itself. This is analogous to processes, where the semantics are what is important. I think this definition is correct but calling them BIFs was wrong.
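A small sketch may make the point concrete. It assumes that "trapexit" refers to what is written process_flag(trap_exit, true) today, and the module name bif_syntax_demo is hypothetical. The process primitives appear as ordinary calls, side by side with the dedicated syntax for send and receive.

```erlang
-module(bif_syntax_demo).
-export([demo/0]).

%% Hypothetical demo module, not part of OTP.
demo() ->
    %% process_flag/2, spawn/1 and link/1 look like ordinary calls ...
    Old = process_flag(trap_exit, true),
    Parent = self(),
    Pid = spawn(fun() -> receive stop -> Parent ! stopped end end),
    link(Pid),
    %% ... while send (!) and receive have their own syntax.
    Pid ! stop,
    receive stopped -> ok end,
    %% Because exits are trapped, the linked process's normal
    %% termination arrives as an ordinary message.
    receive {'EXIT', Pid, normal} -> ok end,
    %% Restore the caller's previous trap_exit setting.
    process_flag(trap_exit, Old),
    ok.
```

Removing any of the three calls changes the behaviour of the program as fundamentally as removing the ! or the receive would.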

Our BIF proposals were put on ice together with the Erlang specification.

The problem still remains and has not become better - there is still much confusion as to what a BIF is, whether being coded in C has anything to do with it, and whether being coded in C means that a function automatically becomes part of the language. BEAM and the compiler also handle BIFs in many different ways, which does not always help.

Currently BIF is used to describe at least three different things:

- Functions which are part of Erlang; spawn and link are examples of this.

- Functions which are not part of Erlang but which are part of a basic runtime system; the trace BIFs are an example of this.

- Functions which have been coded in C for efficiency, for example lists:reverse.

(Some might not agree with separating the first two.)
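The three uses can be told apart, roughly, from a running system. This is only a sketch under assumptions: it treats auto-import as a stand-in for "part of Erlang", which is a simplification made for this example and not the definition argued for here, and the module name bif_kinds_demo is hypothetical. erlang:is_builtin/3 reports whether a function is implemented in C in the runtime system.

```erlang
-module(bif_kinds_demo).
-export([demo/0]).

%% Hypothetical demo module, not part of OTP.
demo() ->
    %% 1. Part of the language: spawn/1 is auto-imported.
    true = erl_internal:bif(spawn, 1),
    %% 2. Part of the basic runtime system: erlang:trace/3 is built in,
    %%    but it is not auto-imported.
    true = erlang:is_builtin(erlang, trace, 3),
    false = erl_internal:bif(trace, 3),
    %% 3. Coded in C for efficiency: lists:reverse/2 is built in even
    %%    though it belongs to an ordinary library module.
    true = erlang:is_builtin(lists, reverse, 2),
    ok.
```

Note that is_builtin/3 answers only the implementation question, which is exactly why it cannot say whether a function belongs to the language.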

Moving the module 'erlang' from kernel to erts is a step in the right direction, as is putting the new unicode BIFs (?) in a separate module, because they are not part of the language, just a library where some functions are coded in C.

OK, so why worry about it? Does it really matter? I think it does (or else I wouldn't be writing this mail) for the following reasons:

- We need a proper language specification and this would be a part of it. Some day I plan to resurrect the old Erlang spec and get it up to date, and we have to start somewhere.

- It would clarify what is part of the language and what is not. This would put various discussions about adding things to the implementation/language in their proper context. Coding a function in lists in C would then just be an implementation detail, where the issue is whether it is worth the extra effort, rather than a misplaced discussion about adding to the language.

- I think it would allow you to do more with the FFI than is proposed today and still be consistent with the language.

- And it would help people realise and understand what is what - lack of clarity and confusion are never good.