Java, Speech, and inspiration…

I’ve been getting poked and prodded to keep this site somewhat updated, so I thought I’d share my experience over Christmas and New Year’s…and prove beyond a doubt that I truly need to get out more.

First, the Java part of things.  This was my very first programming language–even before HTML, oddly enough.  I needed to turn an EE class into an honors credit (back when I was still in the Schreyer Honors College, mind you) and the professor agreed on the condition that I work within a group to write a Java applet.  Somehow or other, I ended up doing the coding (despite the fact that my two cohorts were Computer Science or Computer Engineering majors at the time).  Mike Mansell rewrote my text into something vaguely resembling humorous English and Emrys Smith ended up doing the PowerPoint.  Care to see it?

http://www.personal.psu.edu/ked2/CSE/Conversion.html

Pretty boring, right?  As I said, that was my first program and it taught me quite a bit.  But that was the last time I used Java…

…until just a few months ago.  I got it into my head that I wanted to do something involving speech control.  And I wanted something that I could toss onto any system from XP to Ubuntu (to be fair, those are the only two OSes I use on a regular basis).  Also, I didn’t want to have to upgrade to Vista just to get a taste of SAPI 5.3.  I was well aware of CMU’s Sphinx and its numerous variants, so I started out with that.  To put it mildly, it isn’t really for a novice in Java programming.  To put it realistically, I spent a week trying to get a GUI to work with it before abandoning it in favour of a third-party interface that utilized the Sphinx speech recognition engine.  Then I tried both Julius and Simon, but they just weren’t what I was looking for.  Then I found Voce.  Using the sample code, I was able to get a custom GUI up and running in about ten days, with it understanding a dozen words clearly.  In fact, the moment of my first breakthrough happened at about 1am on New Year’s Day.  Note to self: try to get out of the house next year.  Then, within three days, I had it outputting through a serial port for real-world applications.
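For anyone curious what the "recognized word in, serial byte out" step might look like, here is a minimal sketch. It is not my actual project code: the class name `SpeechCommandRelay`, the command words, and the byte values are all made up for illustration. The recognizer itself is left out, and a plain `OutputStream` stands in for the serial port, so the string passed to `relay` is assumed to be whatever the speech engine hands back.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps recognized words to single command bytes
// and writes them to an output stream standing in for a serial port.
public class SpeechCommandRelay {
    private final Map<String, Integer> commands = new HashMap<>();
    private final OutputStream port;

    public SpeechCommandRelay(OutputStream port) {
        this.port = port;
    }

    // Associate a spoken word with the byte sent when it is heard.
    public void register(String word, int commandByte) {
        commands.put(normalize(word), commandByte);
    }

    // Send the byte for a recognized word; returns false (and sends
    // nothing) when the word is not in the vocabulary.
    public boolean relay(String recognized) throws IOException {
        Integer commandByte = commands.get(normalize(recognized));
        if (commandByte == null) {
            return false;
        }
        port.write(commandByte);
        port.flush();
        return true;
    }

    // Recognizers differ in casing and padding, so compare on a
    // trimmed, lower-cased form.
    private static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream replaces the real serial port (assumption).
        ByteArrayOutputStream fakePort = new ByteArrayOutputStream();
        SpeechCommandRelay relay = new SpeechCommandRelay(fakePort);
        relay.register("left", 'L');
        relay.register("down", 'D');
        relay.register("cheese", 'C');

        relay.relay("Left");
        relay.relay("down");
        relay.relay("banana"); // not in the vocabulary, ignored
        relay.relay("cheese");

        System.out.println(fakePort.toString("US-ASCII")); // prints "LDC"
    }
}
```

In a real setup, the `OutputStream` would come from whichever serial library is in use, and the strings would be fed in from the recognizer’s results loop. Keeping the vocabulary in a map makes it easy to stay at a dozen or so words, which is about where recognition stayed reliable for me.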

This is where things could get interesting–or fail disastrously.  My original notion was to create a speech-controlled robotic arm with a camera end effector to document my projects.  “Left, Down, Cheese, *Click*.”  You get the picture.  But as it turns out, robot arms don’t grow on trees, and good robot arms are even more difficult to come by.  And I’ve got to say, the whole “Lights on, Lights off” bit gets rather boring, if not outright cumbersome.  “Sarah, beer (coffee) me” (a la Eureka) has its appeal, but even that would get a touch dull.  Without a clear idea of what to do now that I’ve gotten so far along on the project, I’ve just been waiting these past three months for a moment of inspiration.  But it appears that my muse is off on hiatus.  So I’m reaching out to anyone reading this entry to toss your suggestions into the comments section.  It’s got to have some form of physical payoff; opening programs on the computer is just too easy.  And it should be rather nifty, or why do it at all?

Related links:

http://cmusphinx.sourceforge.net/html/cmusphinx.php

http://sourceforge.net/projects/speech2text/

http://julius.sourceforge.jp/en_index.php

http://voce.sourceforge.net/


7 Responses to Java, Speech, and inspiration…

  1. Pingback: Jacqueli

  2. A.M. says:

    Shouting out commands does get boring after a while and in turn can make us a lazy society. At least the technological do-gooder is not human. Maybe a plus in eliminating human suggestion?? Is it possible that speech recognition transcription could be used in real time for electronic voice phenomena? I see possibilities for a tool in controlled experiments or even for aiding in the analysis of data.

    • lightdesigns says:

      Eventually I could see utilizing speech recognition for just such an application–especially in regards to alleviating the human propensity towards suggestion and pareidolia. Currently the best ASR of the lot is Dragon NaturallySpeaking; however, at its present level of recognition, I worry about the likelihood that it would introduce false positives. I am hopeful that the technology will improve to the point that this is possible within my lifetime.

      • A.M. says:

        Has a “true” speech audio anomaly ever been analyzed using speech recognition software? I’m curious to see how its accuracy compares to human analysis. If a device like the Ovilus can be conjured up with relatable programmed words, surely a recording device that incorporates speech recognition can’t be that far off. If it records and translates to text, it can, at the very least, be used to compare data, thus cutting down the possibility of pareidolia. Audio anomalies of bangs and scuffs are one thing, but can’t speech anomalies be confirmed through frequency analysis and selective frequency filtering? The potential for false positives will never go away and is apparent even when filtering. At least with the frequency analysis, speech can be confirmed. As long as the movements and audio of researchers are documented, why can’t it work?

        Pfft… J.L. is highly capable of conjuring up such technology!

        • lightdesigns says:

          To my knowledge, no, it hasn’t. But this is a result of the level of accuracy more than a result of the anomaly. To compound that, most speech recognition needs to be trained specifically to the speaker in question for the inflections and intonation to be spot on. While we explored the Ovilus with an open mind, we ultimately concluded that it was not only inefficacious, it proved a hindrance to the investigation. It’s rather difficult to capture an EVP over an electronic voice. Moving over to the frequency analysis, that’s effectively what is occurring during speech recognition, and Dragon, Sphinx and the like have more collective experience than I could ever hope to achieve! And I’ve seen all too often what over-filtering does–suddenly anything can become a voice. I’d much rather under-process and get nothing than over-process and find everything but the kitchen sink in every known language.

          haha, hardly! I’m but a lowly EE with a bit of programming under my belt!

          • A.M. says:

            I’m sorry to be so vague here, but “any” anomaly identified as an EVP is essentially a false positive. Yes? When will the level of accuracy ever be enough? There are times when it is ok to search in the box instead of looking outside the box. Booyah!

            HA! Hardly?! Light isn’t your last name for nothing now. The writing flowed well btw, so don’t forget my chai.

            • lightdesigns says:

              Hmmmm…perhaps. I was more referring to something that is decidedly not a word or phrase yet contains a speech-like cadence. I’d say 99% accurate with an untrained voice via a portable digital recorder. That should certainly suffice. It’s certainly easier this way!

              Hey, thanks. And we’ll see what Hood River has to offer in terms of chai. All I know is I see Powell’s in my future…and mayhap a trip to Voodoo Donuts. Oh yes.
