@johnraff: well, i am not a native speaker so for words i am not sure about i look up their syllable-count in an online dictionary. that is also what told me 'fire' has only 1 syllable, as i would count 2.
this stuff is open to debate indeed, and i constantly try to just go with the easiest solution. there are real problem-words here though, where the word is pronounced a certain way, while a different word with almost exactly the same letters is pronounced in another way again, making it really hard to write a rule for them ('interesting'... is that 'intresting' or 'interesting'? and if it is the first, how to separate it from something like 'interelectrode'). so that is where i either say 'interesting' will be 4 syllables for my purposes, or the exceptions come in. indeed, not simple at all 
@machinebacon: i have been thinking along these lines as well. something like 'when a certain stem is inside the word-list, separate it'. but when you really go and look into this you'll find lots of situations where a stem is just a stem in one word, but only a partial stem in another. a quick made-up example would be 'rat' and 'brat'. the latter contains the stem 'rat', but that is actually part of the stem 'brat', so it should not be separated (i am aware this example does not use any relevant vowels, but it just serves as an example and nothing more). there are myriad examples of this sort of thing.
well, still thinking about how to do this, may take another stab at it tonight. i figure the only way to handle this stuff is by detecting a pattern in composite words. something like 'VC(C)VCV' (V=Vowel, C=Consonant). so 'horsepower' has 'orsepo', 'forehead' has 'orehe', 'facepiece' has 'acepi' etc, which should tell me that the middle 'e' is something special, marking where the word should be split. or something. not 100% sure about this and really no idea of the number of exceptions to this thing, but it is all i have come up with so far.
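to make the pattern idea concrete, here is a rough sketch in python (not the script from this thread, just an illustration). the names `CANDIDATE` and `split_candidates` are made up for this example, and it assumes plain lowercase words where only a, e, i, o, u count as vowels ('y' is treated as a consonant). it looks for vowel, one or two consonants, a middle 'e', consonant, vowel, and reports where the cut would land.

```python
import re

# assumption: only a, e, i, o, u are vowels; everything else counts as a consonant
# the lookahead lets overlapping windows be found as well
CANDIDATE = re.compile(r"(?=([aeiou][^aeiou]{1,2}e[^aeiou][aeiou]))")

def split_candidates(word):
    """Return (matched_letters, split_index) for each V C (C) e C V window.

    split_index points just after the middle 'e', i.e. where the
    composite word would be cut.
    """
    found = []
    for m in CANDIDATE.finditer(word.lower()):
        window = m.group(1)
        # the window ends in 'e', consonant, vowel: cut before the last two letters
        found.append((window, m.start() + len(window) - 2))
    return found

for w in ("horsepower", "forehead", "facepiece", "interesting"):
    for window, cut in split_candidates(w):
        print(w, window, w[:cut] + "-" + w[cut:])
```

the three composite words get cut after the middle 'e' as hoped, but 'interesting' matches too (on 'intere'), so the pattern alone cannot tell a composite word from an ordinary one. it would still need a word-list check or an exception list on top.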
bonus fun: i thought i was so smart separating the 'ing' suffix from words ('ageing' > 'age-ing', so 'age' can be handled separately). then along came the word 'evening', getting split up into 'even-ing', and the silent 'e' from 'eve' not being removed because it wasn't detected by my rules, because of the 'n' at the end of it. dammit. another exception.
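the 'ing' problem can be sketched the same way. again this is only an illustration in python, not the actual script: `split_ing` and `ING_EXCEPTIONS` are hypothetical names, and the rule that the leftover stem must contain a vowel is an extra assumption added here so that words like 'ring' are left alone.

```python
# hypothetical exception list: words ending in 'ing' that should not be split
ING_EXCEPTIONS = {"evening"}

def split_ing(word, exceptions=ING_EXCEPTIONS):
    """Split a trailing 'ing' off a word, returning (stem, suffix).

    The word is returned whole, with an empty suffix, if it is a listed
    exception, does not end in 'ing', or would leave a stem without a vowel.
    """
    if word in exceptions or not word.endswith("ing"):
        return (word, "")
    stem = word[:-3]
    # assumption: a usable stem needs at least one vowel ('y' included here)
    if not any(ch in "aeiouy" for ch in stem):
        return (word, "")
    return (stem, "ing")

print(split_ing("ageing"))                    # stem 'age' can be handled on its own
print(split_ing("evening", exceptions=set())) # without the exception: 'even' + 'ing'
print(split_ing("evening"))                   # with the exception: left whole
```

this only moves the problem into the exception list, of course: every word like 'evening' has to be found and added by hand.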
Last edited by rhowaldt (2011-12-10 19:14:38)