-*- org -*-

Information I've found out about Chicken internals.

define-record will create a structure (a named collection of slots) with
##sys#make-structure, and its accessors will call ##sys#check-structure
before using ##sys#slot.  (Note that data slots start at slot 1, since
slot 0 holds the structure name.)  Slot 0 can in fact be anything that
can be compared with eq?, not just a symbol.

  (define rtd:name '(name x y z))    ; record type descriptor
  (define n (##sys#make-structure rtd:name 1 2 3))
  ,d n                               ; => structure of type `(name x y z)': ...
  (##sys#structure? n rtd:name)      ; => #t

* chicken runtime
** Function name meanings

On 5/20/05, Carlos Pita wrote:
> One point I would like you to clarify to me is the difference between:
> C_XXX
> C_a_i_XXX
> C_h_XXX
> C_i_XXX

"C_..." names C-API functions and macros for the Chicken runtime system.

"C_i_..." names inline (non-CPS) functions that are used by code
generated by the compiler.

"C_a_i_..." are allocating inline/non-CPS functions.

"C_h_..." are allocation functions that usually allocate on the heap
(this can be important in certain situations, for example when building
the "literal-frame", the symbol table used by a compilation unit).

The "CHICKEN_..." names are for embedding; they are a sort of high-level API.

*** C_u_* -- potentially unsafe versions

E.g. C_i_exactp vs. C_u_i_exactp.

Both:
  (exact? 3)    => #t
  (exact? 1.0)  => #f

C_i_exactp:
  (exact? 'f)   => Error: (exact?) bad argument type: f

C_u_i_exactp:
  (exact? 'f)   => #f

* functions

keyword->string == symbol->string, but it checks for a keyword argument.
Same output otherwise.
* lolevel

To check whether a c-pointer argument is working:

; (define buf-to-int (foreign-lambda* integer (((pointer "int") obj)) "return(*obj);"))
; (define v (byte-vector #xDE #xAD #xBE #xEF))
; (define p (make-locative v 0))
; (buf-to-int p)   ; => the 4 bytes read as one native int: bit pattern
;                  ;    #xdeadbeef on a big-endian host, #xefbeadde on a
;                  ;    little-endian one such as x86; `integer' is signed,
;                  ;    so csi prints either one as a negative number

* macroexpansion

To see a macroexpanded version of your program, create a file called
exp.scm containing

  (user-pass (lambda (exp) (pp exp)))

and then call csc with -X exp.scm.  Better, use `csc -debug 2`.

* FFI
** locations

[this is on the wiki now at ]

Say you have a C function called {{profile}} which allocates and returns a
string, expecting you to free it.  The main return value is an {{int}};
the string pointer is returned by reference in a {{char**}} argument.  You
may use {{let-location}} to handle the returned string easily, if you
follow the recipe below.

  ; int profile(const char *in, char **out);
  (define profile
    (foreign-lambda int "profile" (const c-string) (pointer c-string)))

  (let-location ((out c-string*))
    (let ((rv (profile "LAMBDA-CALCULUS.ORG" (location out))))
      (let ((out out))          ; dereference out exactly once: copy + free
        (if (eqv? rv 0)
            (print out)
            (error "profile error" rv)))))

It's fine to use {{let-location}} with a {{c-string}} type.  Internally,
every time you dereference this type, Chicken calls
{{##sys#peek-c-string}}, which creates a new Scheme string and copies the
C string into it with {{strcpy()}}.

However, is it safe to use a {{c-string*}} type to get its automatic
free() behavior?  Yes, if you're careful.  Chicken will call
{{##sys#peek-and-free-c-string}} every time your location ({{out}}) is
dereferenced.  This does a {{strcpy()}} ''plus'' a {{free()}}.  So it is
not safe to dereference {{out}} more than once along any path of control
flow: the second dereference reads memory that has already been freed and
then frees it again.  This also implies that if you do not refer to
{{out}} at all, its memory will leak.
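The ownership rule behind {{c-string*}} can be mirrored in plain C.  This is a hypothetical sketch, not Chicken code: `profile` is a toy implementation of the C signature from the recipe (it just returns a malloc'd copy of its input), and `take_string` is an invented helper that does what the notes say ##sys#peek-and-free-c-string does on each dereference (copy, then free).  Unlike a Chicken location, it also clears the pointer, so a second call returns NULL instead of freeing the same memory twice.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the C function from the recipe:
   int profile(const char *in, char **out);
   Returns 0 and hands back a malloc'd copy of `in` through *out;
   the caller owns that string and must free it exactly once. */
int profile(const char *in, char **out)
{
    size_t n = strlen(in) + 1;

    *out = malloc(n);
    if (*out == NULL)
        return -1;
    memcpy(*out, in, n);
    return 0;
}

/* Invented helper modelling one dereference of a c-string* location:
   copy the string, then free the original.  A NULL location yields NULL
   and frees nothing, like the #f case described in the notes.
   Clearing *loc is the C-side safeguard that a Chicken location lacks. */
char *take_string(char **loc)
{
    char *s = *loc;
    char *copy;

    if (s == NULL)
        return NULL;
    copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    free(s);
    *loc = NULL;
    return copy;
}
```

Binding the location once, as the `(let ((out out)) ...)` line in the recipe does, plays the same role as calling `take_string` exactly once.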
The safest way to avoid both problems is to bind every location variable
in a surrounding let form, as in the recipe above, which guarantees the
free happens exactly once.  It is safe to dereference {{c-string*}}
locations even if they are NULL: this just returns {{#f}} without calling
{{free()}}.

Of course, you may instead structure your code so you're sure the
location is always dereferenced exactly once when not NULL.  This happens
to be true of the simple example above, in fact.  On the other hand, it
is easy to violate this requirement by accident, and the special behavior
of {{out}} may not be apparent to readers of your code.

This technique is also applicable to regular {{c-string}} locations that
you use more than once, to avoid redundant strcpy() calls.

--[[http://3e8.org/zb|zbigniew]]

* egg repository

List of all mainline eggs, last-updated date, and dependencies:
http://www.call-with-current-continuation.org/eggs/repository

* Chicken symbol representation (hash function improvement, not in core)
  CLOSED: [2007-06-15 Fri 00:51]

Symbol hash function improvement is http://trac.callcc.org/ticket/245

I'm curious whether the symbol table is heavily unbalanced (due to only
the last 3 characters being significant).  If so, would a better hash
function help, or would the hash computation outweigh any gain?

Note that compute_symbol_table_load calculates an average bucket length,
but can't tell whether you have 1 bucket with 1 symbol and another with
29 (the average would be 15).

symbol_table->name:        symbol table namespace
symbol_table->size:        number of buckets
symbol_table->table[size]: a C_word array of lists (buckets), empty on initialization

The main symbol table is called "." and contains DEFAULT_SYMBOL_TABLE_SIZE
buckets (override with the -:tSIZE runtime option); the global variable
symbol_table points to it.
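To check the "which characters matter" question directly, here is a C transcription of the hash_string pseudocode in these notes, plus a hypothetical helper that reports the longest bucket next to the average.  Both are sketches for experimentation, not code from Chicken's runtime.  They assume a 32-bit unsigned int, and the one deliberate change is reading bytes as unsigned char so that bytes above 0x7F are not sign-extended.  Because each step shifts the key left by 4 bits, any character before the last 8 is shifted out completely (the 8th from last keeps only its low 4 bits).  A power-of-two modulus such as 4096 keeps only the low 12 bits, and those depend only on the last 3 characters.

```c
#include <stddef.h>
#include <string.h>

/* Transcribed from the hash_string pseudocode in these notes (not copied
   from Chicken's runtime.c).  Assumes a 32-bit unsigned int. */
int hash_string(int len, const char *str, unsigned int modulus)
{
    const unsigned char *s = (const unsigned char *)str;
    unsigned int key = 0;

    while (len--)
        key = (key << 4) + *s++;
    return (int)(key % modulus);
}

/* Hypothetical summary of a table's shape: the average alone cannot tell
   one bucket of 1 plus one of 29 apart from two buckets of 15. */
typedef struct {
    unsigned used;        /* non-empty buckets */
    unsigned max_len;     /* longest bucket */
    double avg_nonempty;  /* symbols per non-empty bucket */
} bucket_stats;

/* Hash n symbols into `size` buckets; `counts` must hold `size` entries. */
bucket_stats table_stats(const char *const *syms, size_t n,
                         unsigned *counts, unsigned size)
{
    bucket_stats st = {0, 0, 0.0};
    size_t i;
    unsigned b;

    memset(counts, 0, size * sizeof *counts);
    for (i = 0; i < n; i++)
        counts[hash_string((int)strlen(syms[i]), syms[i], size)]++;
    for (b = 0; b < size; b++) {
        if (counts[b] == 0)
            continue;
        st.used++;
        if (counts[b] > st.max_len)
            st.max_len = counts[b];
    }
    if (st.used > 0)
        st.avg_nonempty = (double)n / st.used;
    return st;
}
```

With this transcription, "foo_12345678" and "bar_12345678" land in the same bucket for any table size, since they differ only before the last 8 characters.  "abcXYZ" and "defXYZ" collide in a 4096-bucket table but not in a 2999-bucket one.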
C_new_symbol_table(name, size)
  create a new symbol table and add it to the global C linked list of
  symbol tables, symbol_table_list

C_find_symbol("hello", table)       // returns a Scheme object
  len = length("hello")
  key = hash_string(len, "hello", table->size)
  lookup(key, len, "hello", table)

C_intern_in(ptr, len, str, table)
  key = hash_string(len, str, table->size)
  s = lookup(key, len, str, table)
  return s if s
  s = C_string(ptr, len, str)       -- make a new string object from str (backing store ptr)
  add_symbol(ptr, key, s, table)    -- add symbol to hash table, with its string form in a slot

C_lookup_symbol(C_word sym)

hash_string(len, str, modulus)
  unsigned int key = 0;
  while(len--) key = (key << 4) + *(str++);
  return (int)(key % modulus);

(This appears to be a fairly weak hash function.  Only the last 8
characters of any symbol can affect the hashed value, since each step
shifts the 32-bit key left by 4 bits and earlier characters are shifted
out.  The default symbol table size is 2999, so I first assumed only the
low 12 bits (2^12 = 4096) could be significant, meaning that in practice
only the last 3 characters would make a difference.)

(Testing shows it is the last 8 that are significant.  This is because
reducing modulo a prime involves all 32 bits of the key, whereas reducing
modulo a power of 2 simply masks off the high bits.)  eternallyconfuzzled.com
says "poor hash functions require an extra mixing step of division by a
prime to resemble a uniform distribution".

(There are various hash-map implementations at http://code.google.com/p/google-sparsehash/)
(There is some analysis and a recommended hash function at http://www.burtleburtle.net/bob/hash/doobs.html)
(More detailed analysis at http://bretm.home.comcast.net/hash/)
(C hash library at http://uthash.sourceforge.net/ -- also see tpl.sf.net, a serialization library by the same author)

[Bernstein's hash seems to be relatively fast and needs no modulo, for the US-ASCII charset ...
need to actually test Chicken's current symbol table with hashq
(implement it in a shared library), then change the symbol hash function
and retest.]

[Possible showstopper: we can't choose at compile time between % and &=
because the symbol table size is user-controllable.]

add_symbol(ptr, key, string, table)
  // individual list elements are considered "buckets"
  // each bucket is a cons, as this is a Scheme list; however, its header bits are set to C_BUCKET_TYPE
  // this function creates a new symbol object, with the string name in slot 1,
  // then does (set! table[key] (bcons sym table[key]))   ; bcons is cons with a BUCKET_TYPE header

// table load: csi -> ,r -> ##sys#symbol-table-info -> C_get_symbol_table_info -> compute_symbol_table_load

Loaded: (use sxml-tools sxml-transforms sxml-tools-extra vector-lib char-set easyffi format-modular fp2scheme silex)

| Configuration        | Symbol-table load | Avg bucket length | Total symbols |
|----------------------+-------------------+-------------------+---------------|
| default table 2999   |              1.05 |              1.94 |          3174 |
| table 4093           |              0.77 |              1.8  |          3174 |
| table 2048           |              1.54 |              4.8  |          3174 |
|----------------------+-------------------+-------------------+---------------|
| darcs 2.621 2999     |              1.07 |              1.96 |          3222 |
| 4093                 |              0.78 |              1.83 |          3222 |
| 33*hash 2048 startup |              0.69 |              1.41 |          1423 |
| 33*hash 2048         |              1.57 |              1.96 |          3222 |
| 33*hash 2999         |              1.07 |              1.64 |          3222 |
| 33*hash 4096         |              0.78 |              1.45 |          3222 |
| 37*hash 2048         |              1.57 |              1.98 |          3222 |
| 37*hash 2999         |              1.07 |              1.63 |          3222 |
| 37*hash 4096         |              0.78 |              1.46 |          3229 |
| 65*hash 2048         |              1.57 |              1.97 |          3222 |
| 65*hash 2999         |              1.07 |              1.65 |          3222 |
| 65*hash 4096         |              0.78 |              1.46 |          3222 |

** Check symbol table load in practice

See ticket 245, below.

* fixing ##sys#error-hook problem on intel

- 5751 fails [bootstrapped with old Chicken: Version 2 Build 3]
- 5361 ok [from scratch, bootstrapped with same old Chicken 2.3; installed this]
- 5500 ok [from 5631, ran ./configure and make]
- 5600 [from 5500, same]
- 5700 ok [from 5600, same]
- 5751 ok [this must be a flawed test]
- 5751 ok [used make clean; ./configure; make]

Clean checkouts:

  sh autogen.sh
  ./configure CFLAGS="-O2 -fomit-frame-pointer -fno-strict-aliasing"
  make BOOTSTRAP_PATH=/usr/local/bin    # BOOTSTRAP_PATH points to the installed chicken r5361

- 5700 [from scratch -- ok!]
- 5751 [from scratch -- ok!!]

Can't reproduce the problem!  This could be the result of bootstrapping
off the wrong compiler.  Perhaps 2 steps are required: 1) compile r5361
or thereabouts; 2) compile HEAD.

Mario's error showed up between r5505 (ok) and r5530 (fail), and he's
using what appears to be Chicken 2.613.  There is also an anecdotal
report of 2.631 (Jul 28) failing to work as a bootstrap compiler.

Only r5526 (felix's apply-hack fix for OS X) and r5529 (blob changes)
fall within the r5505:5530 range, both on 2007-08-19.  The original blob
additions happened on 2007-05-23, in 2.618.  One guess is that compilers
before the initial blob change in 2.618 cannot directly bootstrap
compilers after the r5529 change.

r5361 is 2007-08-09.

Installing Chicken 2.600 as bootstrap.  r5751 fails!  So the failure is
reproduced.  r5528 succeeds.  r5529 fails.  Clearly it is the blob change.

Can't easily see the problematic C output from 5528->5529 using Chicken
2.6, so trying Chicken 2.6 on 5528 vs. Chicken 2.636 on 5529.  Ideally, I
would find exactly which compiler breaks, but we don't have version
history anymore.
  Thu Jun  7 08:08:56 CEST 2007  felix@call-with-current-continuation.org
    * - renamed ":optional" to "optional" (":optional" is deprecated)

[this was during 2.621 -- so 2.622 has this change]

Fixed by changing the recently renamed uses (":optional" -> "optional")
to #!optional.

* fixing egg segfault issue on 2.709 intel mac

We know r5757 works (2.637) -- with the autotools build.

Built r6060 with the new build process (make PLATFORM=macosx).  This
fails.  Try (use hostinfo).  Note: the test was done with the existing
hostinfo egg, but according to the mailing list this will happen even if
we regenerate the egg.  The test was done without installing, i.e.
"./csi -R hostinfo".

Actually, ptables is disabled in this build: there was an error in
defaults.make.  Re-enabled and retrying.  The ptables bug is fixed, but it
doesn't affect this.

Try: r5853 (first revision with the new build system).  Crashes.

Try: r5852 (using autotools):

  ./configure CFLAGS="-Os -fomit-frame-pointer -fno-strict-aliasing"   # matching newest optimization options
  make -j2 BOOTSTRAP_PATH=/usr/local/bin

This works.

Differences in make output: 5852 uses '-DPIC', 5853 uses '-fPIC -DPIC'.

Note: Chicken now uses /usr/local/lib/libchicken.dylib, but
/usr/local/lib/libchicken.0.dylib is embedded in hostinfo.so and will be
loaded.  libchicken.0.dylib is a symlink to
/usr/local/lib/libchicken.0.0.0.dylib, which is the old library.

Recompiling the eggs works (with 2.701, at least).  Using
install_name_tool to modify the library path allows e.g. vector-lib to
load, but it craps out with an unbound symbol (signifying an internal
error).

5852 has -install_name usr/local/lib/libchicken.0.dylib
-compatibility_version 1 -current_version 1.0.  Added this compatibility
/ current version information to libchicken.dylib itself.  This works.

Notes:

Unbound variable |crap| will occur if the target library is not found.
This is the case if the library is missing (for example, the
libchicken.0.dylib link is gone) or the compatibility version is wrong.
Segfault will occur if you are pointing at the old Chicken.  This is the
case if libchicken.0.dylib points to the old libchicken.0.0.0.dylib and
you are using existing eggs.

----------
Recommendation: in the install script, make libchicken.0.dylib a symlink
to libchicken.dylib so that older eggs will continue to work.
Optionally, delete libchicken.0.0.0.dylib as well (not required).

  ln -sf libchicken.dylib libchicken.0.dylib
-----------

Note: libchicken.dylib in the uninstalled (staging) directory uses the
CWD path, not /usr/local/lib, and new eggs are built with the CWD path
for libchicken.dylib.  So old eggs (with the absolute path) will crash
when tested from the staging directory, as libchicken.dylib will be
loaded twice.  Indeed, if you run install_name_tool on the old eggs and
change them to relative paths, they will work.

When installed, the makefile changes the path of the binaries to be
absolute.  However, it seems that eggs are still generated with relative
paths, and libchicken.dylib refers to itself (!) with a relative path.

* GC

Mutation.-- C_mutate pushes |slot| onto the mutation stack without
checking whether slot is in the nursery or in fromspace, or where it
points.  The mutation stack is only marked during a GC_MINOR.

Why is it necessary to track a nursery->fromspace (or nursery->nursery)
pointer?  Why not just track pointers from fromspace->nursery?

This is described on page 12 of Appel's Simple GC paper: mutation is
recorded for all assignments, leaving it to the GC to sort out, and the
mutation stack is discarded after a minor GC.  Part of the appeal seems
to be a simple implementation, and it relies on fast mutation code (which
C_mutate is not, particularly).  Nevertheless, only tracking pointers
from fromspace->nursery appears to be a valid algorithm.
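The alternative raised above is the classic remembered-set write barrier.  The toy model below is not Chicken's heap layout: each generation is just an address range, and all type and function names are invented for illustration.  `store_unfiltered` mimics the behaviour described for C_mutate (record every assigned slot), while `store_filtered` records only a slot in the old generation (fromspace) that ends up pointing into the nursery.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model, not Chicken's heap: a generation is an address range. */
typedef struct { uintptr_t lo, hi; } region;

int in_region(region r, const void *p)
{
    uintptr_t a = (uintptr_t)p;
    return a >= r.lo && a < r.hi;
}

#define MAX_REMEMBERED 64

/* Hypothetical stand-in for the mutation stack. */
typedef struct {
    void **slots[MAX_REMEMBERED];
    size_t count;
} remembered_set;

/* Record a slot; returns 0 when full (a real collector would GC here). */
static int remember(remembered_set *rs, void **slot)
{
    if (rs->count == MAX_REMEMBERED)
        return 0;
    rs->slots[rs->count++] = slot;
    return 1;
}

/* As described for C_mutate: record every store, sort it out at GC time. */
void store_unfiltered(remembered_set *rs, void **slot, void *val)
{
    remember(rs, slot);
    *slot = val;
}

/* Only an old (fromspace) slot that now points into the nursery has to be
   remembered for the next minor GC. */
void store_filtered(remembered_set *rs, region nursery, region heap,
                    void **slot, void *val)
{
    if (in_region(heap, slot) && in_region(nursery, val))
        remember(rs, slot);
    *slot = val;
}
```

Of three stores (nursery->fromspace, nursery->nursery, fromspace->nursery), the unfiltered barrier records all three and the filtered one records only the last.  The filtered version pays for two range checks on every store in exchange for a smaller set to scan at the next minor GC.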
* inlining

As of 4.2.10, automatic inlining is NOT done on

  (define byte-string-length string-length)

but it is done on

  (define (byte-string-length x) (string-length x))

* signal handling
** SIGTERM not handled immediately

http://3e8.org/blog/2011/08/02/take-off-every-sigterm/

While working on the zguide for Chicken, I encountered a problem with
terminating a process in a timely manner when it has a SIGTERM handler
installed via the posix unit.

  (use posix)
  (on-exit (lambda () (print "exiting")))
  (set-signal-handler! signal/term
                       (lambda (s) (print "terminating") (exit)))
  (read)
  (print "finished")

If you compile and run this program, and kill -TERM it from another
window, nothing happens right away.  You have to hit enter afterwards,
and then it will print "terminating" and "exiting".

The reason for this is that the posix unit installs signals with
signal(2), which sets the sigaction flag SA_RESTART under the hood.
"Slow" system calls (read(), zmq_recv()) which have received no input yet
will automatically restart when the signal handler returns.  Chicken has
a single global signal handler that just sets a flag indicating which
signal was received, schedules the interrupt handler to run in a moment,
and returns immediately.  The syscall is then restarted, preventing the
interrupt from being handled.

To hack around this behavior you can add

  (foreign-code "siginterrupt(SIGTERM, 1);")

to the top of your file, so slow syscalls will immediately exit with
EINTR even when no input is available, and your handler will be invoked.

  (foreign-code "siginterrupt(SIGTERM, 1);")
  (use posix)
  (on-exit (lambda () (print "exiting...")))
  (set-signal-handler! signal/term
                       (lambda (s) (print "terminating...") (exit)))
  (read)
  (print "finished")

If you do not explicitly hook SIGTERM, a TERM signal will interrupt
(read) immediately and terminate.  This is true on Linux and Mac OS X at
least, but it's not clear to me whether this can be relied upon.

You can see this behavior as well with SIGINT.
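The restart behaviour can be reproduced outside Chicken with a small POSIX C sketch; the function names are invented for illustration, and SIGALRM stands in for SIGTERM so the process can signal itself.  It installs a handler through sigaction without SA_RESTART, which is the state siginterrupt(sig, 1) produces (POSIX defines siginterrupt in terms of clearing SA_RESTART), then blocks in read() on an empty pipe.  When the alarm fires one second later, read() fails with EINTR instead of being restarted.  Passing restart = 1 would leave the read blocked, which is the stuck (read) symptom described above, so only the interrupting case is exercised.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Like Chicken's global handler as described above, this only records
   which signal arrived and returns. */
static volatile sig_atomic_t last_signal = 0;

static void note_signal(int sig)
{
    last_signal = sig;
}

/* restart != 0 sets SA_RESTART (what the notes say signal(2) does);
   restart == 0 leaves it clear, as siginterrupt(sig, 1) would. */
int install_handler(int sig, int restart)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(sig, &sa, NULL);
}

/* Block in read() on an empty pipe until SIGALRM fires after one second.
   Returns errno from the failed read (EINTR expected), 0 if read()
   returned normally, or -1 if setup failed. */
int blocked_read_errno(void)
{
    int fds[2], err;
    char c;
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;
    if (install_handler(SIGALRM, 0) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    alarm(1);
    n = read(fds[0], &c, 1);
    err = (n < 0) ? errno : 0;
    close(fds[0]);
    close(fds[1]);
    return err;
}
```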
By default, Ctrl-C terminates a (read) immediately.  However, when the
posix unit is loaded, it hooks Ctrl-C with signal(2), so Ctrl-C will not
immediately terminate a (read):

  (use posix)
  (read)

Instead, you have to press ENTER afterward (assuming line buffering is
active).  To terminate immediately you can do:

  (use posix)
  (foreign-code "siginterrupt(SIGINT, 1);")
  (read)

This situation might need to be addressed with an egg that provides more
nuanced signal handling than the posix unit.

Final note: it's not possible to restore the default signal handler
(SIG_DFL) with the posix unit, either.  Passing #f to
set-signal-handler! will ignore the signal (SIG_IGN).

* floating point precision

To ensure read/write invariance in all cases, the print precision should
be 17 significant digits for doubles and 9 for single-precision (32-bit)
floats (Kahan, Lecture notes on IEEE Standard 754, 1995).  By default it
is 15 & 7.  Use (flonum-print-precision 17) to fix this.

  ceil(1 + N log_10(2)) = ceil(1 + 24 * .301) = 9     # 24-bit mantissa (float)
  ceil(1 + N log_10(2)) = ceil(1 + 53 * .301) = 17    # 53-bit mantissa (double)
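The digit counts can be checked with a short C sketch that uses only the standard library.  `digits_needed` evaluates ceil(1 + N log_10(2)) for an N-bit mantissa (the ceiling is done by hand to avoid linking libm), and `roundtrips` prints a double with a given number of significant digits via `%.*g` and reads it back with strtod.  Both names are invented for this example, and printf-style `%g` formatting stands in for Chicken's flonum printer, which may not behave identically.

```c
#include <float.h>   /* DBL_MANT_DIG (53) and FLT_MANT_DIG (24) for callers */
#include <stdio.h>
#include <stdlib.h>

/* ceil(1 + mant_bits * log10(2)), with the ceiling done by hand. */
int digits_needed(int mant_bits)
{
    double x = 1.0 + mant_bits * 0.30102999566398120;  /* log10(2) */
    int d = (int)x;
    return (d < x) ? d + 1 : d;
}

/* Print x with `digits` significant digits, parse it back, and report
   whether exactly the same double comes out. */
int roundtrips(double x, int digits)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", digits, x);
    return strtod(buf, NULL) == x;
}

/* Same check for single precision, parsed back with strtof. */
int roundtrips_float(float x, int digits)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", digits, (double)x);
    return strtof(buf, NULL) == x;
}
```

For example, with 15 digits the double 0.1 + 0.2 prints as 0.3 and reads back as a different double; with 17 digits (0.30000000000000004) it survives the round trip.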