Monday, January 20, 2020

The Incredible Shrinking Camino!

When creating the 3DS and Switch versions of OutRun, developer M2 added a number of new music tracks. Thankfully, most adhered to the original Music Macro Language (MML) format used by the original game. Once extracted from the 3DS, the music data plays on original hardware with only minimal modifications to the Z80 program code. Delightful dedication on the part of the composers.

Unfortunately, as mentioned in the previous blog post, the file size of this new music is substantially larger than the original music. As a comparison, the new tracks weigh in at around 3-4 times the size.


This isn’t a result of additional length or musical complexity. File size is simply not a concern on modern hardware. However, my dream is to add multiple music tracks to a future release of OutRun Enhanced Edition. This relies on the hope that they could be reduced in size and programmed back to the original arcade PCB, without the need for additional hardware. As such, I’ve spent my evenings studying and optimizing the first OutRun 3DS tune: Camino a Mi Amor.




The original music was composed by Hiro on a Roland MC-500 keyboard and transcribed as sheet music, before being hand translated to MML. This ensured the original MML was well structured and highly optimized. After coding an MML decompiler, I could study the new music and determine why there is a size disparity, and more importantly recompile any optimizations back into the OutRun audio engine.  

It was clear that the new MML data was auto-generated by some kind of tooling. I believe the new audio was composed in a modern Digital Audio Workstation (DAW) package and then run through a conversion process, for reasons I'll outline below.

1/ The music is incoherently structured. 
One of the powers of Sega’s Music Macro Language is the ability to use nested loops and subroutines. When used wisely, these radically reduce data duplication and therefore save a lot of space. Music is inherently repetitive, especially when divided into individual channels of audio. When studying Camino, it’s apparent the subroutines have been automatically generated, rather than created by hand. Rather than containing a musical pattern that makes sense in isolation, subroutines frequently start and end at illogical points from a composition perspective. Many of the subroutines are called just twice, whereas you’d expect a much greater degree of reuse, especially with repetitive channels containing drum patterns.

M2 had the right idea in writing tooling to identify repeated sections of data. Theoretically, it’s a smarter, faster approach than attempting to optimize by hand. However, it’s a difficult problem to solve well and the results are only as good as the algorithm. In this case, the results are mediocre. Aside from badly structured subroutines, the tooling created subroutines that are called just once, rendering them pointless. Furthermore, they’d overlooked a separate problem further up their toolchain...

2/ The music does not adhere to the inherent timings of the audio engine. 
As Cmonkey explained in his documentation: the overall tempo of the tune is controlled by timer A on the Yamaha YM2151 sound chip. This timer is loaded with a value of 524 during initialisation of the audio engine. The calculation used for the timer A period (in ms) is:

   tA = 64 * (1024 - tAvalue) / input clock (KHz)

The sound chip has an input clock of 4 MHz (4000 KHz).  So this means the timer A period is calculated as:

   64 * (1024 - 524) / 4000 = 8ms

So, to play a note for 1 second, you'd pass a value of 125 as the duration (125 * 8ms = 1000ms = 1s).
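Under these assumptions (a 4 MHz input clock and a timer A value of 524), the conversion is easy to sketch in Python. The function names here are my own, purely for illustration:

```python
def timer_a_period_ms(ta_value, clock_khz=4000):
    """YM2151 timer A period in ms: 64 * (1024 - value) / input clock (kHz)."""
    return 64 * (1024 - ta_value) / clock_khz

def note_duration_ms(units, ta_value=524):
    """Convert an MML duration (in timer A ticks) to milliseconds."""
    return units * timer_a_period_ms(ta_value)

print(timer_a_period_ms(524))   # 8.0 -> each tick is 8ms
print(note_duration_ms(125))    # 1000.0 -> a 1-second note
```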

Now, where Camino a Mi Amor falls foul of this system is that its core timing is not divisible by units of 8ms. As such the music is quantized to fit the audio engine’s timing, ensuring the notes align to 8ms boundaries. Let’s look at a typical sequence of notes to clarify this point. This series of commands simply plays the note D at octave 4 a number of times with a few rests thrown in for good measure. 


   Command   Time V1   Time V2
   D4        57        58
   D4        28        27
   D4        14        14
   D4        14        14
   REST      15        15
   D4        14        14
   REST      14        14
   D4        14        15
   D4        29        28

   Length    199       199


So far so good. However, the second time this sequence is exported from the DAW to MML, there are subtle timing differences:

  1. The first version of this sequence plays D4 for a length of 57, which is 456ms (57 * 8).
  2. However, the second version of this sequence plays D4 for a length of 58, which is 464ms (58 * 8).
  3. This disparity is offset by the second use of D4 in the sequence, where the timing difference runs the other way (28 vs. 27). 

Both versions of the sequence last the same total duration, but the notes are aligned differently. Imagine the original composer setting a chosen tempo in his audio software. When the exporter reached the second version of the sequence, it was quantized to the closest possible duration, in order to work with the default audio engine timing. The second time round, the quantization was applied to different notes in the sequence.

The difference is inaudible and, in fact, an artefact of M2’s tooling, rather than a deliberate artistic choice. The timing differences also affect the drum patterns, which you would expect to be rigid, rather than variable. It should also be noted that the timing differences are only ever +/- 1; any greater difference would be an artistic choice. To compound the problem, the tooling inserts additional ‘REST 1’ commands in various sequences to compensate for the timing differences, which wastes further space.

For file size, this is a critical problem. Each version of the sequence is now treated as a separate block of data, rather than a shared subroutine. It’s effectively different from a data perspective, despite sounding identical. This is part of the reason the previously described subroutine automation is a failure. It is fed imperfect data to process and cannot identify sections of the audio that should be identical. Whilst my example shows just two versions of the same sequence, in reality there are often many more. This is incredibly wasteful, as well as making the resultant MML unwieldy.

We’re faced with bulky MML, littered with illogical subroutines, that needs a major restructure to wrestle it into shape. I tackled the problem by listening to each channel of audio in isolation and capturing it to a waveform. This helped build a mental and visual image of the structure of the music channel. The next part of the process wasn’t an exact science, but I started visually identifying chunks of MML data that looked similar, unrolling subroutines where necessary. I built a Google Sheet that would help me do this.


I could simply copy and paste two giant blocks of MML data that I suspected were identical into the sheet. The sheet formulae would verify the list of commands were in fact identical, verify the timing of each command didn’t differ by more than +/-1 timing unit and finally sanity check that the overall timing of the block was identical. Once happy, I could return to the MML, remove the obfuscated original data and subroutines and move them to a shiny clean subroutine. 
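The sheet’s checks translate naturally into code. This is a hypothetical Python equivalent of those formulae, assuming each block has been decoded to a list of (command, duration) pairs:

```python
def blocks_equivalent(block_a, block_b, tolerance=1):
    """Check two MML blocks are musically identical: same command list,
    per-command timings within +/- tolerance, and identical total length."""
    if len(block_a) != len(block_b):
        return False
    for (cmd_a, t_a), (cmd_b, t_b) in zip(block_a, block_b):
        if cmd_a != cmd_b or abs(t_a - t_b) > tolerance:
            return False
    return sum(t for _, t in block_a) == sum(t for _, t in block_b)

# The two variants from the earlier table: same notes, +/-1 jitter, equal length.
v1 = [("D4", 57), ("D4", 28), ("D4", 14), ("D4", 14), ("REST", 15),
      ("D4", 14), ("REST", 14), ("D4", 14), ("D4", 29)]
v2 = [("D4", 58), ("D4", 27), ("D4", 14), ("D4", 14), ("REST", 15),
      ("D4", 14), ("REST", 14), ("D4", 15), ("D4", 28)]
print(blocks_equivalent(v1, v2))  # True
```

Both variants pass all three checks, confirming they can be collapsed into a single shared subroutine.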

Effectively, I was consolidating all of the data variations back into a coherent section of music. Some of these could be reused multiple times, which was a huge optimization. If you think back to the previous example with two versions of the same sequence, there is no reason both of these sequences shouldn’t be identical, as long as the overall timing of the block is the same. The upside is that we’re returning to the vintage 1986 Hiro approach of hand-crafting MML. We can make genuine use of powerful loops and smartly organised subroutines. 

I’ve made this process sound a breeze, but in reality it was time consuming and error prone.  With over 10,000 lines of MML data to work with for the first track alone, one small error could throw the timing of the entire tune, especially if the error was contained in a loop that was iterated over multiple times. I found I could manage 2 to 3 hours of this work at a time, before needing to call it quits. Despite that, it is an incredibly fun puzzle to crack. My score was the byte count. Every time I hit recompile, my savings were output to the console. The lower the score the better I felt. Sometimes I needed to increase the overall size in the short-term as a strategy to reduce it considerably in the long-term. I’m unsure whether this process could be automated to produce MML that was as clean. Maybe it could and I just don’t want to admit it. Certainly, it would be easier to improve M2’s tooling to create better MML in the first place, if I had access to it. 

3/ Cross channel optimizations are missed. 
Wait, we’re not done yet! There were other easy trends to spot. For example, FM channel 0 and FM channel 1 shared a bunch of note data. I suspect the conversion tooling did not work across channels. It was trivial to move this into its own subroutine. 

4/ The REST command is everywhere! 
For FM channels, the REST command largely serves a purpose. It’s akin to depressing the note you’re playing. However, for percussion channels it serves less of a purpose. Consider the following sequence of commands:

    KICK_DRUM 42
    REST 10
    KICK_DRUM 14

This translates to: 
  • Play the kick drum sample. Wait 336ms (42*8).
  • Rest for 80ms (10*8).
  • Play the next kick drum in the sequence. 

The technicality to note is that once a sample is initiated, it can only be interrupted by another sample. The REST command adds little value, beyond inserting a delay before the next command. Therefore, the above block can be optimized to:

    KICK_DRUM 52
    KICK_DRUM 14

We’ve shaved 2 bytes from this 6 byte sequence! This might not sound much in isolation, but when the command is littered across all percussion channels, you can claw back a considerable number of bytes and save Z80 cycles in the process.
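This transformation can be sketched as a single pass over the command stream. The representation below (a list of (command, duration) pairs) and the function name are mine, not the audio engine’s:

```python
def fold_rests(commands, sample_names):
    """Merge REST durations into the preceding percussion sample.
    A sample plays until the next sample triggers, so the REST only
    served as a delay, which the sample's duration can absorb."""
    out = []
    for name, duration in commands:
        if name == "REST" and out and out[-1][0] in sample_names:
            prev_name, prev_dur = out[-1]
            out[-1] = (prev_name, prev_dur + duration)
        else:
            out.append((name, duration))
    return out

seq = [("KICK_DRUM", 42), ("REST", 10), ("KICK_DRUM", 14)]
print(fold_rests(seq, {"KICK_DRUM"}))
# [('KICK_DRUM', 52), ('KICK_DRUM', 14)]
```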

Generally speaking, for FM channels the REST command should be left well alone. Removing REST commands would change the way notes sustain and decay. However, there are exceptions to this rule. Earlier, I mentioned M2’s tooling had inserted ‘REST 1’ commands in the FM channels to compensate for audio timing differences. This was particularly obvious when comparing two identical blocks. One might look like this:

      F4       14
      REST     1
      F4       28
      REST     14

The second might look like this:

      F4       15
      F4       28
      REST     14

The first block adds an 8ms rest between the two F4 notes. The second block adds the delay to the duration of the first note and does away with the REST command entirely. Therefore, both versions can be condensed into the succinct second form. This optimization might appear to be a leap of faith, but after analyzing 10,000 lines of MML by hand, it becomes apparent these stray rests are tooling artefacts rather than intent. I studied both variations in a wave editor and found no visual difference, let alone an audible one. In reality, of course, the blocks are longer than this example.
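Once you trust that analysis, the normalization is mechanical. A hypothetical Python sketch, again treating a block as (command, duration) pairs, folds a tooling-inserted ‘REST 1’ into the preceding note so both variants reduce to identical data:

```python
def fold_rest_ones(commands):
    """Collapse a 'REST 1' into the preceding note's duration.
    Only safe where analysis confirms the rest is a quantization
    artefact of the tooling, not a musical choice."""
    out = []
    for name, duration in commands:
        if name == "REST" and duration == 1 and out and out[-1][0] != "REST":
            prev_name, prev_dur = out[-1]
            out[-1] = (prev_name, prev_dur + 1)
        else:
            out.append((name, duration))
    return out

variant_1 = [("F4", 14), ("REST", 1), ("F4", 28), ("REST", 14)]
variant_2 = [("F4", 15), ("F4", 28), ("REST", 14)]
print(fold_rest_ones(variant_1) == variant_2)  # True
```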

5/ Patches and erroneous commands. 

My MML decompiler performs other handy analysis. For example, it denotes which FM patches (or FM sounds, if you like) are in use by the track. The unused patches can quickly be removed by hand. Furthermore, the MML contained junk data. For example, you’d see command sequences as follows:

  LOAD_PATCH 6
  REST 10
  LOAD_PATCH 16
  C4 10

Clearly the initial LOAD_PATCH command is trumped by the second and can be removed. There's no need to load the first sound patch, as no notes are played! There were other examples of redundant commands that also provided a small but welcome saving. 
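This clean-up is also mechanical. In a hypothetical sketch, treating the stream as (command, argument) pairs, a LOAD_PATCH that is superseded before any note plays can simply be dropped:

```python
def strip_redundant_patches(commands, note_names):
    """Remove LOAD_PATCH commands that are superseded before any note
    plays; the first patch is loaded but never heard."""
    out = []
    pending_patch = None
    for name, arg in commands:
        if name == "LOAD_PATCH":
            # A later LOAD_PATCH before any note supersedes this one.
            pending_patch = (name, arg)
        else:
            if pending_patch is not None and name in note_names:
                out.append(pending_patch)
                pending_patch = None
            out.append((name, arg))
    # A pending patch with no subsequent notes is unused and dropped.
    return out

seq = [("LOAD_PATCH", 6), ("REST", 10), ("LOAD_PATCH", 16), ("C4", 10)]
print(strip_redundant_patches(seq, {"C4"}))
# [('REST', 10), ('LOAD_PATCH', 16), ('C4', 10)]
```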


In Conclusion

In total, the above methods sliced a whopping 10k from the original track - a saving of 46% - with hopefully no loss of musical integrity! But this hard work is only just the beginning, and soon I'll need to tackle the next track - Cruising Line.




Parting Words

I’m frequently asked if a feature or idea is possible. Can something be done? Couldn’t you just…? And the answer is often theoretically yes. Yes, if you’re prepared to pour time and energy into seeing a hare-brained scheme through to fruition. Of course, you could, and maybe should, view this entire process as complete madness. All this effort to trim mere bytes from a binary file: reversing the MML format, cmonkey’s robust tooling to create and compile MML files, the decompiler to reverse 3DS binaries back into an editable format, countless evenings spent manually manipulating data with a hodgepodge of makeshift tools. And we’re nowhere near done yet, but let’s keep going, because no one else will!

Thursday, November 28, 2019

OutRun's Music Macro Language & Tooling

My recent post, which covered Sega's audio creation process in the 1980s, inspired fellow coder cmonkey to create a nifty tool to facilitate composing new music for OutRun.

The music in OutRun is, at the most basic level, a simple stream of MIDI note values and durations. The tunes were originally transcribed from sheet music into MML (Music Macro Language). Presumably, after transcription, Sega had tooling to compile the MML data into the format used by the game's audio engine. Cmonkey's tool, SiMMpLified, essentially recreates this tooling.

The format used by OutRun is compact and efficient. Tunes comprise up to 8 FM tracks, played by the Yamaha YM2151 sound chip, and up to 5 PCM tracks, played by the custom Sega 315-5218 PCM sample playback chip. Technically, it's possible to have a tune that's purely 16 channels of PCM samples, but practically sample space is limited. In addition, some PCM channels are reserved for the game's sound effects, including engine revs, tire screeches and voice samples. In some respects, the format can be considered a more advanced version of the MOD format, popularised by the Amiga home computers.

The power of Sega's MML format is a result of a relatively advanced looping and nested subroutine structure. This enables the data for a tune like Passing Breeze to be squeezed into under 5K of EPROM space, excluding the percussion samples! If you're interested in the technical structure and command format of MML, it's worth reading the SiMMpLified documentation.

The SiMMpLified tooling inspired me to consider what could be achieved with OutRun's audio in the forthcoming Enhanced Edition, currently slated for release in 2020. Ideally, I'd love to include additional music, as per AfterBurner: Enhanced Edition. There are already candidates for the tracks. M2's 3DS version of OutRun introduced two new tunes, Camino and Cruising Line. The later Switch version introduced Step on Beat, which first premiered on the Megadrive, and Hiro's Radiation. The rest of the music in the Switch version is streamed audio and doesn't use the MML format.


However, whilst extracting the new music reveals it to be in the correct format, it's bulky in terms of file size. Each new song is as large as all four of the original OutRun compositions combined! There was an impressive project to get the music running on original hardware, but it required a custom manufactured PCB, which allowed the Z80 audio processor to access additional address space.


I'd like to achieve this without the need for custom hardware. The existing OutRun audio EPROM is 32K. The maximum size supported by the PCB is 64K, with a jumper swap. However, part of the 64K memory map is mapped to RAM. Therefore, there's only 60K of accessible EPROM space. Any new music would need to fit into ~28K of free space.

At this point I realised the new music would require analysis and optimization. Reversing the music by hand would be error-prone and time consuming. Instead, I coded a tool that would read a binary and output an MML file, compatible with the SiMMpLified tooling. The end result is that we can instantly produce a formatted MML file containing the notes and commands for any OutRun tune. This file can be recompiled with a Z80 assembler and inserted back into the audio EPROM.

For the sake of completeness, you can think of the flow through these tools as follows, although this example represents the extreme case:

OutRun Z80 EPROM -> MML Reassembler -> ASM file + siMMpLified Libs -> Z80 Assembler -> MML Injector -> OutRun Z80 EPROM

The MML Reassembler's output suggests that the original music was hand-crafted, whereas the new tracks appear to have been produced using some kind of tooling developed by M2. M2 did not need to be concerned about file size, and as such the output is bulky and contains repeated data.

I started work hand-optimizing Camino last night and shaved 2.5K off the filesize within a few hours (or closer to 9K if you take into account the unused data which my MML Reassembler automatically strips out). I don't know whether this project will be possible, but it's certainly worth a serious attempt!

Why not write a tool to optimize the tracks automatically? At this stage, I don't think it would achieve the results of a manual analysis where every single byte counts. Why not ZLIB compress the tunes in EPROM? The Z80 only has access to 2K of Work RAM, much of which is already in use by the program code.

Bear in mind that this is a work in progress, and I'm unsure whether this will be of use to anyone right now. Nevertheless, the tooling can be found here.

Tuesday, October 01, 2019

Raiding Hiro's Sega Audio Archives

In recent years, Sega consolidated its scattered Tokyo offices into a centralised building in Osaki. One office that closed at the start of 2019 was the Otorii office. This office had been the key development hub since September 1985, shortly after Hang-On development concluded. As such, intriguing development materials were unearthed in the process.

Sega Otorii Office Building

Hiro posted a vast collection of audio planning documents and media from 1985 through to the early 1990s. When Hiro joined Sega in 1984, music was composed on hand-written sheets.


Space Harrier Sheet Music. Note that the game was called 'Heli' at this stage, as the fantasy theme had not yet been adopted.

Synthesizers used to compose the melodies included the Yamaha DX-7 for Space Harrier and the Yamaha PSR-70 for AfterBurner. The sheet music was manually transcribed into Music Macro Language for the audio engine to process. This took the format of the note, followed by its length (e.g. C-4, L4, A#4, L8). Once assembled, the actual audio could be tested on hardware. Needless to say, this would have been a time consuming process.

Hiro's ROM cartridges for the Yamaha DX-7 Synthesizer

For the arcade games of this era, sound comprised lo-fi 8-bit samples used for voices, drums and sound effects. The samples were paired with a YM sound source (typically a Yamaha YM2203 or YM2151 chip), mostly used for melodies. YM sounds could be created and edited with an audio editor, which ensured each game had its own style, despite sharing audio hardware. For example, AfterBurner's 'Final Take Off' uses the YM to drive the melodies, but the overlaid guitars are in fact samples.

Data was saved to 8 inch floppy disks.

The following labels read 'OutRun 2' in reference to the revision of the music, as opposed to being anything to do with OutRun's sequel. I wonder if the earlier revisions of the OutRun music still exist?



OutRun's 'Passing Breeze' was initially named 'Passing Wind', until someone pointed out the flatulence reference. So the disk below must date from the end of the development process!


AfterBurner was aptly referred to as 'Top Gun' during development, and the final audio program code appears to have been named TG.HEX. Some of the disks are dated 9th September 1987. 





It appears that data was transferred over the years between 8 inch floppy, through to 5.25 and 3.5 inch floppy for preservation. 




Here we can see the list of commands Power Drift's main 68k program code needed to send to the Z80 Sub program in order to trigger the relevant sound. 





The final Power Drift master. Later on Digital Audio Tapes (DAT) replaced reel-to-reel recordings. 




Friday, July 12, 2019

AfterBurner City Cabinet

This popped up on Yahoo auctions and I thought it was worth preserving here. In Japan, Sega appear to have released an official conversion kit to turn a generic Sega City cabinet into AfterBurner.

City cabinets are relatively small (580 x 715 x 1000 mm) and, at only 60kg, less than half the weight of a normal upright AfterBurner.








Sunday, February 10, 2019

Sega Game Cards

Western arcade gamers were accustomed to overflowing pockets of loose change, but Japanese arcade centres had a more elegant solution: Game Cards. These were magnetic cards, pre-loaded with credits and read by a card reader attached to the arcade cabinet. As the card was used, the reader punched holes to denote the number of credits used.


Two types of card were common: ¥500 cards provided 12 credits and ¥1000 cards 24 credits. Most games were set to 2 credits per play, although this was variable. For ¥500 you therefore gained credits and a collectable card to keep.

Cards were branded by game, but could be used with any compatible machine. It was common for game centres to add their personal branding to the cards and many variants exist. The system was reportedly not successful in the long-term (source: Sega Arcade History).

The Sega cards were numbered as follows. I'll complete missing entries as I find out more information.

1 SPACE HARRIER (Number not shown on card)
2 FANTASY ZONE (Number not shown on card)
3 OUTRUN
4 ALEX KIDD
5 SUPER HANG-ON (1000 version)
6 DUNK SHOT
7 SUPER HANG-ON (500 version)
8 AFTERBURNER
9 SUPER LEAGUE
10 HEAVYWEIGHT CHAMP
11 THUNDER BLADE
12 HOT ROD
13 GALAXY FORCE II
14 POWER DRIFT
15 UNKNOWN
16 UNKNOWN
17 TURBO OUTRUN
18 SUPER MONACO GP
19 G-LOC (1000 version)
20 G-LOC (500 version)
21 R-360
Cards were also available exclusively at the AM and AOU trade shows from Sega booths. Some examples follow.

24th AM SHOW (OUTRUN & HOTROD)


25TH AM SHOW (HEAVYWEIGHT CHAMP & THUNDER BLADE)

87 AOU SHOW (SUPER HANG-ON, then HANG-ON II)

88 HAPPY NEW HARRIER
SEGA ATTRACTIONS

OTHER GAMES



Special thanks to Sean Tagg for helping me with images and information for this post. Don't let this man spend any more money on game cards. Or at least donate him some for free!