Date: Sat, 30 Mar 2002 18:20:06 -0500 (EST)
From: Stan Brown
Subject: Re: PCRE question: word boundaries and word characters

>On Sat, 23 Mar 2002, Stan Brown wrote:
>> Is there any way to alter the meaning of a "word" character,

>Yes, but only provided you are prepared to read some of the code and
>understand what's going on.

>1. Read the documentation about the pcre_maketables() function.
...
>4. Write code to modify the tables that it creates, before passing them to
>pcre_compile().

>If you do it that way, you can do different things in different
>circumstances, and you have not modified the code of PCRE itself, so you
>don't have to maintain a patch for different releases.

That's the approach I have taken. I append the code. It's not
terribly exciting, but reusing it might be marginally easier than
writing fresh code. :-)

Please feel free to include it, omit it, or include an altered
version with your next PCRE or in any other way you see fit.

-- 
Regards,
Stan Brown, Oak Road Systems, Cortland County, NY, USA
http://oakroadsystems.com
mailto:stan@oakroadsystems.com

Source file worddefine.c follows:

/*******************************************************************************
pcre_worddefine( ): redefine "word" characters for use with PCRE

usage: pcre_worddefine(tables, charblock)

    tables is the return from pcre_maketables( ).
        Caution: that is a pointer to const, so you will have to cast it
        to call pcre_worddefine

    charblock is an unsigned char array (effectively 256 bits) with each
        bit set, or not, according to whether the corresponding character
        is a "word" character

    Example:
        unsigned char block[256/8];
        memset(block, 0, sizeof block);
        for (i=0; i<256; ++i)
            if (isgraph(i))
                block[ i/8 ] |= 1 << (i&7);
        pcre_worddefine((unsigned char *)tables, block);

2002-03-26  new program; author: Stan Brown, Oak Road Systems

Copyright 2002 Stan Brown, Oak Road Systems  http://oakroadsystems.com

PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language. It
was written by Philip Hazel.

Permission is granted to anyone to use this software for any purpose on any
computer system, and to redistribute it freely, subject to the following
restrictions:

1. This software is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

2. The origin of this software must not be misrepresented, either by
   explicit claim or by omission.

3. Altered versions must be plainly marked as such, and must not be
   misrepresented as being the original software.

4. If PCRE is embedded in any software that is released under the GNU
   General Purpose Licence (GPL), then the terms of that licence shall
   supersede any condition above with which it is incompatible.
*******************************************************************************/

#include "internal.h"

void pcre_worddefine(
    unsigned char *tables,
    const unsigned char *charblock)
{
    int i;
    unsigned char *p;

    /* 1. Copy 'charblock' to the table of "word" characters. */
    memcpy(tables+cbits_offset+cbit_word, charblock, 256/8);

    /* 2. Update "word"-character bits in the character type table.
    */
    p = tables + ctypes_offset;
    for (i = 0; i < 256; i++) {
        if ( charblock[i/8] & (1 << (i&7)) )
            *p++ |= ctype_word;
        else
            *p++ &= (~ctype_word);
    }
}

/* end of worddefine.c */
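
For anyone reusing this, here is a minimal sketch of one way to build the
256-bit block, using the same bit layout as the example in the header comment
(byte i/8, bit i&7). The names word_block_set, word_block_test and
word_block_build are hypothetical helpers made up for illustration; they are
not part of PCRE or of worddefine.c. The choice of "alphanumerics, underscore,
plus some extra characters" is likewise only an assumption about what a caller
might want.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define WORD_BLOCK_BYTES (256 / 8)   /* one bit per character code */

/* Hypothetical helper: mark character code c as a "word" character. */
static void word_block_set(unsigned char *block, int c)
{
    assert(block != NULL && c >= 0 && c < 256);
    block[c / 8] |= (unsigned char)(1u << (c & 7));
}

/* Hypothetical helper: return 1 if character code c is marked, else 0. */
static int word_block_test(const unsigned char *block, int c)
{
    assert(block != NULL && c >= 0 && c < 256);
    return (block[c / 8] >> (c & 7)) & 1;
}

/* Hypothetical helper: clear the block, then mark every alphanumeric
   character, the underscore, and each character in 'extra' (may be NULL). */
static void word_block_build(unsigned char *block, const char *extra)
{
    int i;

    memset(block, 0, WORD_BLOCK_BYTES);
    for (i = 0; i < 256; i++)
        if (isalnum(i) || i == '_')
            word_block_set(block, i);
    for (; extra != NULL && *extra != '\0'; extra++)
        word_block_set(block, (unsigned char)*extra);
}
```

A caller might then write word_block_build(block, "-'") to treat hyphens and
apostrophes as word characters, and hand the block to
pcre_worddefine((unsigned char *)tables, block) before calling pcre_compile()
with those tables.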