Actions

Difference between revisions of "Internal Documentation"

From Gambit wiki

(Linker)
Line 65: Line 65:
  
 
(From Gambit ML 2009-03-22)
 
(From Gambit ML 2009-03-22)
 +
 +
=== Program startup ===
 +
 +
* The entry point function (either main(), winmain()...) will be generated (in linker file) to call either <code>___main()</code>, <code>___main_UCS_2</code> or <code>___winmain</code>.
 +
* These functions do very basic initialization (setup <code>___base_mod</code> and <code>___program_startup_info</code>) then passes to <code>___main()</code> in main.c.
 +
* This function in turn calls <code>___setup()</code>, which does
 +
** sets up the VM
 +
** Call linker
 +
** Initialize tables (symbol, keyword, global variables, primitives)
 +
** Kick off the kernel
  
 
== Compiler ==
 
== Compiler ==

Revision as of 14:12, 16 November 2009

People who want to contribute to Gambit development will need to learn something about how the Gambit-C runtime and compiler are organized. While we intend that source code documentation be included in the source itself (currently there is very little documentation), we intend that descriptions of program design or algorithms used in the runtime and compiler could be included here.

Namespace handling

See Namespaces.

Runtime Library

Memory Management

General notes on internal object storage and memory consumption is on the Debugging page. Also see Notes on Memory Management.

Thread System

I/O System

Arithmetic implementation

Eval

Continuation manipulation

The manual lists continuation-graft, continuation-capture, and continuation-return but doesn't describe them. The REPL debugger, and possibly other things, use them. See Marc Feeley's paper A Better API for First-Class Continuations.

REPL

The REPL has some fairly interesting functions and variables, especially for hackers.

Variables

##repl-location-relative
Should the REPL give relative or absolute pathnames. Note: When using emacs with gambit, it is useful to set it to #f, especially if you change the current-directory.

Functions

##cmd-x
where x is a REPL command letter (typed after a comma from the REPL). Executes that command as if it was executed inside of the REPL. For instance ##cmd-b displays a backtrace.

Record system

That is, define-type. Based on SRFI-9, but extensions not documented. This email provides the best explanation [1]

Introspection

Symbol introspection

To get list of interned symbols:

(define (symbol-table->list st)

  (define (symbol-chain s syms)
    (let loop ((s s) (syms syms))
      (if (symbol? s)
          (loop (##vector-ref s 2) (cons s syms))
          syms)))

  (let loop ((lst (vector->list st)) (syms '()))
    (if (pair? lst)
        (loop (cdr lst) (symbol-chain (car lst) syms))
        (reverse syms))))

(define (interned-symbols)
  (symbol-table->list (##symbol-table)))

(pp (length (interned-symbols)))

(From Gambit ML 2009-03-22)

Program startup

  • The entry point function (either main(), winmain()...) will be generated (in linker file) to call either ___main(), ___main_UCS_2 or ___winmain.
  • These functions do very basic initialization (setup ___base_mod and ___program_startup_info) then passes to ___main() in main.c.
  • This function in turn calls ___setup(), which does
    • sets up the VM
    • Call linker
    • Initialize tables (symbol, keyword, global variables, primitives)
    • Kick off the kernel

Compiler

Script igsc.scm inside gsc directory can be used to get REPL of compiler so you can inspect details.

Frontend

The frontend entry point is cf, main function to do compilation is compile-parsed-program, which generates GVM instructions. Some optimization is done by frontend via function normalize-program.

TODO: Optimizations, program tree representation.

Intermediate representation

The closet document to describe Gambit Virtual Machine is probably A Parallel Virtual Machine for Efficient Scheme Compilation.

Operands

There are 6 types of operands, described in _gvmadt.scm. All operands are encoded to a number. The following list is extracted from _gvmadt.scm:

 reg(n)       n*8 + 0
 stk(n)       n*8 + 1
 lbl(n)       n*8 + 2
 glo(name)    index_in_operand_table*8 + 3
 clo(opnd,n)  index_in_operand_table*8 + 4
 obj(x)       index_in_operand_table*8 + 5

Global variables (glo), closed variables (clo) and objects are saved in *opnd-table*. Reg, stk, lbl are respectively abbreviations of register, stack and label. All these operands can be created by make-X, where X is the abbreviation.

In .gvm output, registers are prefixed by "+", stack by "-", objects by a single quote, labels by "#". Global variables are displayed by variable name, closed variables are enclosed in brackets, objects

Instructions

GVM instructions include apply, copy, close, ifjump, switch, jump, comment and label, in _gvm.scm, "Virtual machine instruction representation" section.

Instructions in .gvm output are represented by their name. If it's a poll jump, the instruction will be followed by a star. If it's a safe jump, it is followed by a dollar.

Optimization

After GVM generation, dead code is removed by bbs-purify!.

Backend

Backend is selected by target-select!. All backend functions start with target.. The only supported backend is C, which explains the "C" part in "Gambit-C", reside in _t-c-[1-3].scm.

The initial state of a module includes global variables, symbols, keywords, subtypes, number of used labels and procedures. All are maintained during compilation (see "Object management" in _t-c-1.scm) and dumped out as C declaration. The entry point that does this is targ-heap-dump. C output is written in C macros only. gambit.h will actually produce C code suitable for each architecture.

Linking

Because each module contains each own global variables, symbols and keywords. All must be combined to produce single tables for those. Linking works by reading linker info in generated C files. Linker info is actually sexp embedded in C files (and be never read by C compiler as it is protected by #ifdef). Structure of linker info can be found in targ-read-linker-info, which is a list of

  1. Compiler version
  2. Module name
  3. Modules
  4. List of symbols
  5. List of keywords
  6. List of supplied and demanded globals
  7. List of supplied and not demanded globals
  8. List of not supplied globals
  9. Script line

Linking is started with targ-linker. When incremental link is demanded, INCREMENTAL_LINKFILE will be defined in generated C file. gambit.h will handle the rest.