HATSUNE MIKU: The First SCIENCE from the Future…



  • Crypton Future Media, the creators of the MIKU Vocaloid and "Character Voice" series, has unveiled, for an upcoming exposition in Japan, the science behind the dancing technology used in the live performances: the "real-time 3DCG control system (R3)". Now, even if you couldn't give a **** about MIKU, this is important, because the tech is basically the underlying future of motion-capture video games.

    A GOOGLE-Translated version of the site is available here. If it doesn't work, simply take the above link and go to translate.google.com. It works pretty well, but the YOUTUBE video they have demoing the system is region-locked outside of Japan.

    Basically, what it does is take motion-capture data, record it as MIDI, and dynamically sync it to music; that is, live music. That doesn't sound impressive at first, but the biggest challenge with live performance is tempo, as you can see in one of my YOUTUBE videos:

    http://youtu.be/j4bfwci26qQ

    It took forever (like 50% of the production time was spent on syncing) for me to get the tempo close to the prerecorded track, and it still doesn't work well. As you can hear if you listen closely (especially in the last part of the song), the tempo goes up and down within a single measure, or drifts over time. Now think about it: if you were trying to sync to a live concert, where a band or symphony changes tempo (because they are not robots like MIKU, of course, and don't keep perfect time), that would be almost impossible.

    With what they have, the dancing animation and her voice speed up or slow down based upon "markers" that the system pays attention to, so the motion stays in line with the song, even a live song. Now, how does this affect anything else of "importance", you ask? When you want to talk to your "computer-AI Anime Girlfriend", that's when.
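    To make the "markers" idea concrete, here is a minimal sketch (all names hypothetical; this is NOT Crypton's actual R3 code) of the basic trick: store the choreography against beat positions, like MIDI ticks, and map those beats onto whatever times the live beats actually arrive at:

```python
# Hypothetical sketch: warp prerecorded motion to live beat markers.
# Keyframes are stored against BEAT positions (like MIDI ticks), not
# wall-clock time, so playback stretches to the tempo the band really plays.

def beats_to_seconds(beat, live_marker_times):
    """Map a beat position to wall-clock time using detected live beats.

    live_marker_times[i] is the time (s) at which live beat i was detected.
    Fractional beats are interpolated linearly between neighbouring markers.
    """
    i = int(beat)
    frac = beat - i
    if i + 1 < len(live_marker_times):
        return live_marker_times[i] + frac * (
            live_marker_times[i + 1] - live_marker_times[i])
    # Past the last detected beat: extrapolate from the most recent interval.
    last_interval = live_marker_times[-1] - live_marker_times[-2]
    return live_marker_times[-1] + (
        beat - (len(live_marker_times) - 1)) * last_interval

# The band slows from 120 BPM (0.5 s/beat) to 100 BPM (0.6 s/beat) mid-song:
markers = [0.0, 0.5, 1.0, 1.6, 2.2]
print(beats_to_seconds(2.5, markers))  # half a beat past the slowdown
```

    The point is that the animation never stores "play this pose at 1.25 seconds"; it stores "play this pose at beat 2.5", and the live markers decide what wall-clock time that turns out to be.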

    With the ability to adjust the dynamics of the motion, the animations of a 3D projection (whether on a flat screen or in 3D) can be altered on the fly. So, if you are talking to your GF, or if she is talking to you, she can respond visually with appropriate body language, as opposed to the old-school style where random "idle" animations have to be programmed with triggers. Now, if she is talking to you, her arms can "move around" based upon the speed of the voice.
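    As a rough sketch of that idea (hypothetical names, not any real animation API), the gesture's playback rate can simply be scaled by how fast the voice is going, instead of firing a fixed-speed canned "idle" animation:

```python
# Hypothetical sketch: scale a talking gesture's playback rate by speech
# rate, so fast talk gets fast arms and slow talk gets slow arms.

def gesture_playback_rate(syllables_per_sec, baseline=4.0,
                          min_rate=0.5, max_rate=2.0):
    """Return an animation speed multiplier relative to a baseline
    speaking rate, clamped so the motion never looks absurd."""
    rate = syllables_per_sec / baseline
    return max(min_rate, min(max_rate, rate))

print(gesture_playback_rate(4.0))  # normal speech -> 1.0x
print(gesture_playback_rate(8.0))  # rapid speech  -> 2.0x (clamped)
print(gesture_playback_rate(1.0))  # a slow drawl  -> 0.5x (clamped)
```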

    ...for a moment, think about how Bernie Sanders waves his hands around when railing about "the 1%". Or Trump: whenever he says "Huuuuuuge", his hands come out. To take a text-to-speech program (like Siri/Cortana), add a 3D "character" projection, and have the arm motion follow the speech, you would normally have to program everything manually. Now, at least in theory, it can happen in real time.

    In other words, if the AI says "HUUUUUGE" real slow, the arm motion will come out slowly; if it says it real fast, the motion will come out fast.

    Now, the same is true if the AI is monitoring your voice (or, in theory, even the motion of other MIDI-recorded objects). An AI girlfriend with this sync ability could "get a smiley face" if it detects you getting excited (like a set-up for a joke), or detect sadness. For instance, instead of just grinning widely for whatever reason, the AI could "smile softly" (a half-smile) based upon voice analysis, and widen to a full smile if it detects the momentum picking up. In other words, think about how a 16-year-old girl acts when she talks to other 16-year-old girls. You'd think they were bipolar or something.
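    As a toy illustration (again, hypothetical names, not a real implementation), that "half-smile" behaviour could be a single blendshape weight eased toward a voice-arousal score, so the face slides open instead of snapping straight to a full grin:

```python
# Hypothetical sketch: drive a smile blendshape from a voice-arousal score
# (0.0 = flat affect, 1.0 = very excited). The weight eases a fraction of
# the way toward the target each frame, so the smile builds gradually.

def update_smile(current_weight, arousal, smoothing=0.2):
    """Move the smile blendshape weight part-way toward the arousal target."""
    target = min(1.0, max(0.0, arousal))  # clamp to valid blendshape range
    return current_weight + smoothing * (target - current_weight)

weight = 0.0
for arousal in [0.5, 0.5, 0.9, 0.9]:  # speaker warms up mid-conversation
    weight = update_smile(weight, arousal)
    print(round(weight, 3))           # weight climbs toward the full smile
```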

    Because the motion data inputs are recorded as MIDI, in theory, if the system is "listening" to the "music of human speech" as an input, it can adjust accordingly.

    So there, that's why you should care. Now, IDK if Crypton has realized this full potential yet (perhaps I'll let them know myself at the end of AUG), but it is there. For now, the current deployment goes only as far as syncing motion data (dancing; VOCALOID-based singing) to music, so if you sped up the music, she would dance faster. But the real-world applications go well beyond just "MIKU MIKU-ness", it is so...



  • Interesting, I thought it was just done with motion capture, like they do for video games.

    It's really awesome that it's able to sync with the live playing of the band, though if I'm not mistaken, the drummer could also listen to a metronome (and other cues) through his headphones and play at a constant rhythm that would fit a prerecorded animation.



  • @the_krock:

    It's really awesome that it's able to sync with the live playing of the band, though if I'm not mistaken, the drummer could also listen to a metronome (and other cues) through his headphones and play at a constant rhythm that would fit a prerecorded animation.

    That's easier said than done. As you can tell if you study some pieces closely (like mine above), a difference of 1 beat per minute (so the difference between a tempo of 160 and 159) will completely mess up the synchronization. And humans simply are not good enough to keep it that close, it is so...
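    The arithmetic backs this up; the mismatch compounds on every single beat:

```python
# A 1 BPM mismatch (160 vs 159) compounds: the beat periods are 60/160 s
# and 60/159 s, and the gap grows a little on every beat.

song_minutes = 4                       # length of a typical song
beats = 159 * song_minutes             # beats the human drummer plays
drift = beats * (60 / 159 - 60 / 160)  # seconds the animation falls behind
print(round(drift, 2))                 # -> 1.5, i.e. four full beats off at 160 BPM
```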



  • @thegrandalliance:

    That's easier said than done. As you can tell if you study some pieces closely (like mine above), a difference of 1 beat per minute (so the difference between a tempo of 160 and 159) will completely mess up the synchronization. And humans simply are not good enough to keep it that close, it is so...

    Yeah, thinking about it, you're right: it has to be precisely synced, or it can really mess up the song if some movements or lyrics come at the wrong moment.
    Just as in an all-human band everyone adjusts their playing to a tempo change, it's logical that "recorded" voice/dance moves should have the ability to do the same, considering that it's achievable with current technology.



  • @the_krock:

    Just as in an all-human band everyone adjusts their playing to a tempo change, it's logical that "recorded" voice/dance moves should have the ability to do the same, considering that it's achievable with current technology.

    Indeed, as that is the point of the post above: CFM is finally showcasing that tech at an expo in Japan this weekend. Of course, many people might find it boring; nevertheless, they will still use the technology. As far as they care, it goes something like this:

    @HUMANS:

