Re: detect syllables (english)

rhowaldt wrote:

i removed some over-complicated ideas i implemented when i last worked on the script, because they were over-complicated big_smile

BLOAT? big_smile

rhowaldt wrote:

so what is on the agenda now is to figure this problem out. how to determine where to split a word? i think this might prove pretty difficult.

Yes, though there is a set of prefixes and suffixes which are (quite) regular, for example: in-, en-, re-, im-, ex-, un-, pre-, pro- (funnily, when these are used in verbs, the verbs are mostly regular), and some are two-syllable (super-, hyper-, hypo-, endo-, extra-, ...)

Probably a kind of database would cover quite a lot of "loanwords" (I mean those coming from Latin/Greek)
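
A rough sketch of what such a prefix "database" could look like in Python. The table name `PREFIX_SYLLABLES` and the function `split_prefix` are made up for illustration, and the table only holds the prefixes listed above. It does no checking of the remainder at all, so it will happily split words that merely start with the same letters.

```python
# Hypothetical prefix table: prefix -> number of syllables it contributes.
PREFIX_SYLLABLES = {
    "in": 1, "en": 1, "re": 1, "im": 1, "ex": 1, "un": 1, "pre": 1, "pro": 1,
    "super": 2, "hyper": 2, "hypo": 2, "endo": 2, "extra": 2,
}


def split_prefix(word):
    """Return (prefix, rest, prefix_syllables), or ('', word, 0) if no prefix matches."""
    word = word.lower()
    # Try longer prefixes first so that 'extra' wins over 'ex'.
    for prefix in sorted(PREFIX_SYLLABLES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) > len(prefix):
            return prefix, word[len(prefix):], PREFIX_SYLLABLES[prefix]
    return "", word, 0


print(split_prefix("hyperactive"))
print(split_prefix("extraordinary"))
print(split_prefix("cat"))
```

The remainder would still have to go through whatever vowel-counting rules the script already has.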

This is the counterpart to your suffix-idea smile

Nothing right in the left brain. Nothing left in the right brain.

Re: detect syllables (english)

^ that is true, and i might eventually go with at least some type of database-method. however, there is a problem with a prefix such as 're', for example. when you separate it, the script will make something like 're-enactment', and all is fine and dandy, because that would translate into 4 syllables. however, feed the word 'real' to the script, and it will make it 're-al', and count 2 syllables, which is WRONG!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! big_smile

you see now a little bit of what i am struggling with. i am still wondering why i ever started this, but on the other hand still convinced i can make this work. i will probably have fame and glory after completing this script, and the whole world will cheer and give me the Nobel Prize for something and people will finally call me a genius, as they should've been doing all along. smile

Re: detect syllables (english)

rhowaldt wrote:

feed the word 'real' to the script, and it will make it 're-al', and count 2 syllables, which is WRONG!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! big_smile

That is actually open to debate. Some people would pronounce "real" with two syllables, along with "iron", "fire" and many other words. This really isn't simple at all...

John
------------------------
( a boring Japan blog , and idle twitterings )
“Good morning sir, which way up would you like your reality today?”  "As it comes, Jeeves, as it comes..."

Re: detect syllables (english)

rhowaldt wrote:

're-al', and count 2 syllables, which is WRONG!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! big_smile

That's sad, but re-al-i-ty smile You could have taken the 'reel', too. But, if 're-' is a prefix, there is still a stem, for example: re-act, there's act, and the pySyllable script would know: act is an existing word (a stem) so it can add re- to it, while 'al' like in real is not a stem, so it cannot add re- big_smile
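
Roughly, the stem check could look like this. It is only a sketch: `KNOWN_STEMS` is a tiny stand-in for a real word list, and `split_re_prefix` is a hypothetical helper, not something taken from the actual script.

```python
# Hypothetical stand-in for a real word list of known stems.
KNOWN_STEMS = {"act", "enactment", "play", "do"}


def split_re_prefix(word, stems=KNOWN_STEMS):
    """Split off 're' only if what remains is a known stem."""
    word = word.lower()
    if word.startswith("re"):
        # Drop an optional hyphen after the prefix ('re-enactment').
        rest = word[2:].lstrip("-")
        if rest in stems:
            return ["re", rest]
    # No known stem after 're': leave the word alone.
    return [word]


print(split_re_prefix("react"))         # 'act' is a stem, so 're' is split off
print(split_re_prefix("re-enactment"))  # 'enactment' is a stem
print(split_re_prefix("real"))          # 'al' is not a stem, word stays whole
```

How well this works depends entirely on how complete the stem list is.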

Ok, I understand the problem. Hm hm....


Re: detect syllables (english)

@johnraff: well, i am not a native speaker so for words i am not sure about i look up their syllable-count in an online dictionary. that is also what told me 'fire' has only 1 syllable, as i would count 2.
this stuff is open to debate indeed, and i constantly try to just go with the easiest solution. there are real problem-words here though, where the word is pronounced a certain way, while a different word with almost exactly the same letters is pronounced in another way again, making it really hard to write a rule for them ('interesting'... is that 'intresting' or 'interesting'? and if it is the first, how to separate it from something like 'interelectrode'). so that is where i either say 'interesting' will be 4 syllables for my purposes, or the exceptions come in. indeed, not simple at all smile

@machinebacon: i have been thinking along these lines as well. something like 'when a certain stem is inside the word-list, separate it'. but when you really go and look into this you'll find lots of situations where a stem is just a stem in one word, but only a partial stem in another. a quick made-up example would be 'rat' and 'brat'. the latter contains the stem 'rat' but that is actually part of the stem 'brat', so should not be separated (i am aware this example does not use any relevant vowels but it just serves as an example and nothing more). there are myriad examples of this sort of thing.

well, still thinking about how to do this, may take another stab at this tonight. i figured the only way to handle this stuff is by detecting a pattern in composite words. something like 'VC(C)VCV' (V=Vowel, C=Consonant). so 'horsepower' has 'orsepo', 'forehead' has 'orehe', 'facepiece' has 'acepi' etc, which should tell me that the middle 'e' is something special where the word should be split. or something. not 100% sure about this and really no idea of the amount of exceptions to this thing, but it is all i have come up with so far.
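
a minimal sketch of that pattern idea in Python, using a regular expression. assumptions: only a, e, i, o, u count as vowels, the input is a plain lowercase word, and the middle vowel is restricted to 'e' as in the examples. `find_split` is just a made-up name.

```python
import re

# V C (C) e C V, with the middle vowel fixed to 'e' (an assumption based on
# the three examples). Group 2 captures that middle 'e'.
PATTERN = re.compile(r"([aeiou][^aeiou]{1,2})(e)([^aeiou][aeiou])")


def find_split(word):
    """Return (left, right) split after the middle 'e', or None if no match."""
    word = word.lower()
    match = PATTERN.search(word)
    if match is None:
        return None
    cut = match.end(2)  # position just after the middle 'e'
    return word[:cut], word[cut:]


for w in ("horsepower", "forehead", "facepiece", "general"):
    print(w, find_split(w))
```

this splits the three composite words where expected, but it also fires on 'general' (giving 'gene' + 'ral'), so the pattern on its own is clearly not enough and exceptions are still needed.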

bonus fun: i thought i was so smart separating the 'ing' suffix from words ('ageing' > 'age-ing', so 'age' can be handled separately). then along came the word 'evening', getting split up to 'even-ing', and the silent 'e' from 'eve' not being removed because it wasn't detected by my rules, because of the 'n' at the end of it. damnit. another exception. smile
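
for illustration, the 'ing' rule plus an exception list could be sketched like this. `ING_EXCEPTIONS` and `strip_ing` are hypothetical names, and the "stem must contain a vowel" check is an extra assumption to keep words like 'sing' intact.

```python
# Hypothetical list of words that end in 'ing' but must not be split.
ING_EXCEPTIONS = {"evening"}


def strip_ing(word):
    """Return (stem, True) if 'ing' was stripped, else (word, False)."""
    word = word.lower()
    if word in ING_EXCEPTIONS or not word.endswith("ing"):
        return word, False
    stem = word[:-3]
    # Assumption: a real stem needs at least one vowel ('sing' -> 's' is rejected).
    if not any(ch in "aeiouy" for ch in stem):
        return word, False
    return stem, True


print(strip_ing("ageing"))   # stem 'age' can be handled separately
print(strip_ing("evening"))  # left whole because of the exception list
print(strip_ing("sing"))     # left whole because 's' has no vowel
```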

Last edited by rhowaldt (2011-12-10 19:14:38)

Re: detect syllables (english)

^ the b in brat is not a prefix wink


Re: detect syllables (english)

even-ing is also a word btw: the process of making something even.
I suppose that would have 3 syllables where evening =early night would have 2? roll

rhowaldt, I'm beginning to think it might be an impossible task, though of course there are web sites that do it, more-or-less.

John

Re: detect syllables (english)

^ wow, you are right! hadn't even thought of that ambiguity with the word 'evening'. i do think this is an exceptional case though. i'm sure there aren't that many words like that.

about this being an impossible undertaking: it might be. but i think i'll just end up with a pretty okay script, and a list of exceptions. we'll see. i'm not giving up just yet smile
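
to make the 'pretty okay script plus exceptions' idea concrete, here is a rough sketch. it is not the actual script: the vowel-group counting is a common crude heuristic swapped in for illustration, and `EXCEPTIONS` only contains the counts settled on in this thread.

```python
import re

# Counts taken from the discussion above; everything else uses the heuristic.
EXCEPTIONS = {"real": 1, "fire": 1, "evening": 2, "interesting": 4}


def count_syllables(word):
    """Crude syllable count: exception list first, then vowel groups."""
    word = word.lower()
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    # Drop a trailing 'e' as silent, except in '-le' endings like 'table'.
    if word.endswith("e") and len(word) > 2 and not word.endswith("le"):
        word = word[:-1]
    # Each run of consecutive vowels (including 'y') counts as one syllable.
    groups = re.findall(r"[aeiouy]+", word)
    return max(1, len(groups))


for w in ("real", "interesting", "horse", "database", "table", "reality"):
    print(w, count_syllables(w))
```

the heuristic gets 'horse', 'database' and 'table' right, but counts 'reality' as 3 because 'ea' is treated as one vowel group, which is exactly the kind of word that would have to go into the exception list.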

Re: detect syllables (english)

Hi rhowaldt! 

I've just stumbled upon this awe-some project you are/have put(ting) yourself through.  Where are you with this?  As an SLP-in-training, and a veteran English Literature / Linguistics major (graduate) I am overly-intrigued.  In all your "rule-making" I'm sure you searched several sites like this: http://www.brendenisteaching.com/gen/wordlist/. But if not, it is useful for word patterns, endings, and much more.  I've used it for several different and not-so-obvious purposes in language disorder therapy. 

Anyway, again, what is the status of the script?  Is the first post the most functional version?  I'd like to put this to use on my machine and toy around with it a bit.  Looks very cool. 
Thanks!

**AKA tiresias on IRC**      shantih   shantih   shantih     
      Contribute however you can!