TECHNICAL DOCUMENTATION
How SHE works.
A technical overview of the engine architecture, input parameters, output formats, and generation pipeline. For teams evaluating SHE for integration.
HOW IT WORKS
Four layers, from physics to export.
SHE is not a neural network. It is a rule-based generation engine with four distinct layers of responsibility. Each layer has a single job and a clear boundary.
Core — Music Theory Primitives
The mathematical foundation. Intervals, timing grids, harmony math, and meter models that remain true regardless of genre or style. This is the universal physics of music.
Library — Validated Catalogs
Structured collections of chord progressions, rhythmic patterns, genre behaviors, mood profiles, and instrument definitions. Every entry is schema-validated and indexed. Labels like "Soul" or "Uplifting" map to specific, tested generation behaviors — not prompt interpretations.
Services — The Composers
Where musical decisions are made. Services choose progressions based on mood, generate basslines from harmonic context, plan section structure, and shape arrangement energy. Five domains: composition, selection, arrangement, rendering, and analysis.
Execution — Orchestration & Export
The conductor layer. Takes your parameters, routes them through the services, and exports the final package: audio files, MIDI, structured metadata, and machine-readable JSON/XML — all simultaneously.
INPUT PARAMETERS
What you control.
Every generation starts with a set of parameters. Each one maps to specific musical logic in the engine — not a text prompt interpretation.
Genre
Selects from validated genre catalogs that shape rhythm, instrument choice, and harmonic style.
House, Jazz, Electronic, Soul, Ambient, Hip-Hop
Mood
Mood profiles influence harmonic color, rhythmic intensity, and arrangement density. Multiple moods can be combined.
Energetic, Melancholic, Uplifting, Dark, Peaceful
Key
Sets the tonal center for the entire piece. All harmonic decisions resolve relative to this key.
C, D, F#, Bb — any of the 12 chromatic pitches
Mode
Determines the scale and harmonic character. Modes shape whether music feels bright, dark, suspended, or resolved.
Ionian (major), Dorian, Mixolydian, Aeolian (minor), Phrygian
Chord Progression
Choose from a validated catalog of progressions, or let the engine select based on genre and mood context. Each progression carries metadata: formula, cadence type, and harmonic signature.
I–V–vi–IV, ii–V–I, vi–IV–I–V
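A catalog entry of this shape can be sketched with a stdlib dataclass. The field names below are illustrative, not the engine's actual schema (SHE's real catalogs are Pydantic-validated):

```python
from dataclasses import dataclass, field

@dataclass
class ProgressionEntry:
    # Illustrative fields; the real catalog schema may differ.
    formula: str               # e.g. "I-V-vi-IV" in Roman-numeral notation
    cadence_type: str          # e.g. "authentic", "plagal", "deceptive"
    harmonic_signature: str    # compact fingerprint used for indexing
    genres: list = field(default_factory=list)
    moods: list = field(default_factory=list)

entry = ProgressionEntry(
    formula="I-V-vi-IV",
    cadence_type="authentic",
    harmonic_signature="maj:1-5-6m-4",
    genres=["house", "electronic"],
    moods=["uplifting", "energetic"],
)
```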
Tempo
Beats per minute. Affects rhythmic grid density, swing behavior, and section energy calculations.
60–200 BPM (default: 120)
Song Form
Defines the structural blueprint: which sections appear, in what order, and how many bars each contains.
Full (intro–verse–chorus–bridge–outro), Loop, Verse-Chorus
Complexity
Controls harmonic and rhythmic density across the piece. Higher complexity increases chord-per-bar rates, syncopation, and voice movement.
Low, Medium, High
Resolution Style
Shapes how phrases and sections resolve harmonically — whether cadences feel final, suspended, or open-ended.
Resolved, Suspended, Open
OUTPUT FORMATS
What you get.
Every generation produces four synchronized outputs. No post-processing required.
Audio
- Full mix and individual stems (chords, bass, melody, drums, pad)
- WAV format, 44.1kHz sample rate
- Ready for playback, production, or direct integration
MIDI
- 480 ticks per beat resolution
- Separate tracks per instrument role
- Every note carries velocity, timing, and duration data
- Compatible with all major music production software
Metadata
- Key, tempo, mode, and time signature
- Chord progression with per-section harmonic rhythm
- Section markers with bar counts and energy density
- Instrument roles and per-section muting/velocity profiles
- Genre, mood, and style tags — embedded at generation, not inferred
JSON/XML
- Complete harmonic and formal description of every piece
- Schema-validated Pydantic models
- Progression signature, cadence type, and formula
- Paired with audio for supervised ML training workflows
- Generation manifest with full parameter provenance
Under the hood
Below are some of the musical models that drive generation decisions. They illustrate the level of detail at which SHE operates, far below what a prompt-based system can reach.
Beat Strength Model
SHE uses a hierarchical emphasis model for every time signature. In 4/4, beat 1 carries the most weight, beat 3 is secondary, and off-beat subdivisions carry the least. This model drives how loud notes are played, where rests fall, and which chord tones are chosen on strong versus weak beats. Compound meters like 6/8 and 12/8 have their own models.
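A 4/4 emphasis table of this kind can be sketched as a weight lookup that scales note velocity. The specific weight values below are assumptions for illustration, not the engine's actual numbers:

```python
# Illustrative beat-strength weights for 4/4. Positions are in beats
# from the bar start; x.5 entries are off-beat eighth subdivisions.
BEAT_STRENGTH_4_4 = {
    0.0: 1.00,  # beat 1: strongest downbeat
    0.5: 0.30,
    1.0: 0.60,  # beat 2
    1.5: 0.30,
    2.0: 0.80,  # beat 3: secondary accent
    2.5: 0.30,
    3.0: 0.60,  # beat 4
    3.5: 0.30,
}

def velocity_for(position_in_bar: float, base_velocity: int = 96) -> int:
    """Scale a MIDI velocity by the metric weight of the note's position."""
    weight = BEAT_STRENGTH_4_4.get(position_in_bar % 4.0, 0.25)
    return max(1, min(127, round(base_velocity * (0.6 + 0.4 * weight))))
```

The same lookup can steer rests and chord-tone choice: positions with low weight are the first candidates for rests or passing tones.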
Chord Tone Selection
When bass and harmonic roles choose notes within a chord, they follow a weighted palette. The root is chosen most often, followed by the fifth and third. Octaves, chromatic approaches, and stepwise walks toward the next chord are used less frequently. These weights ensure basslines and harmony feel musically grounded rather than random.
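A weighted palette like this maps naturally onto weighted random choice. The weights below are assumed for the sketch, not SHE's actual values; only the ordering (root first, then fifth and third, then rarer colors) follows the text:

```python
import random

# Assumed weights: root favoured, then fifth and third,
# then the rarer colour choices.
CHORD_TONE_WEIGHTS = {
    "root": 0.40,
    "fifth": 0.25,
    "third": 0.20,
    "octave": 0.08,
    "chromatic_approach": 0.04,
    "stepwise_walk": 0.03,
}

def pick_chord_tone(rng: random.Random) -> str:
    """Draw one chord-tone role according to the weighted palette."""
    roles, weights = zip(*CHORD_TONE_WEIGHTS.items())
    return rng.choices(roles, weights=weights, k=1)[0]

# Sample the palette to see the distribution emerge.
rng = random.Random(0)
counts: dict[str, int] = {}
for _ in range(10_000):
    tone = pick_chord_tone(rng)
    counts[tone] = counts.get(tone, 0) + 1
```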
Section Energy
Each section type has a baseline energy level that determines how busy it sounds. Intros and outros are sparse. Verses sit at moderate activity. Choruses and drops are dense and full. Bridges pull back for contrast. Builds ramp tension gradually. These baselines shape everything from note density to instrument layering.
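These baselines can be sketched as a table feeding a density budget. The numeric levels are illustrative assumptions; only their relative ordering comes from the text:

```python
# Illustrative baseline energy per section type
# (0.0 = silent, 1.0 = maximum density).
SECTION_ENERGY = {
    "intro": 0.25,
    "verse": 0.50,
    "chorus": 0.85,
    "drop": 0.95,
    "bridge": 0.40,
    "build": 0.60,   # starting point; ramps upward across the section
    "outro": 0.20,
}

def note_density(section: str, notes_per_bar_max: int = 16) -> int:
    """Map a section's baseline energy to a notes-per-bar budget."""
    return round(SECTION_ENERGY.get(section, 0.5) * notes_per_bar_max)
```

The same baseline can gate instrument layering: roles whose activation threshold sits above the section's energy stay muted for that section.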
Swing & Feel
Rhythmic feel is controlled per generation. Swing ranges from straight (no swing) to heavy shuffle, with off-beat notes delayed according to genre conventions. A jazz output swings naturally. An electronic output locks to the grid. The engine detects which notes sit on off-beats and adjusts their timing accordingly.
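The off-beat delay can be sketched as a timing transform. The mapping below (swing=1.0 pushes off-beat eighths to the triplet position) is an assumed curve, not necessarily the engine's:

```python
def apply_swing(position_in_beats: float, swing: float) -> float:
    """Delay off-beat eighth notes toward a shuffle feel.

    swing=0.0 is straight time; swing=1.0 places the off-beat
    at 2/3 of the beat (full triplet shuffle). Assumed mapping
    for this sketch.
    """
    is_offbeat_eighth = abs((position_in_beats % 1.0) - 0.5) < 1e-9
    if is_offbeat_eighth:
        # Interpolate between the straight position (0.5 into the
        # beat) and the triplet position (2/3 into the beat).
        return position_in_beats + swing * (2.0 / 3.0 - 0.5)
    return position_in_beats
```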
GENERATION PIPELINE
From parameters to finished music.
A single generation runs through five stages. Each stage is logged and traceable.
Define
Set genre, mood, key, mode, tempo, form, and complexity.
Select
Engine selects progression, instruments, and rhythmic patterns from validated catalogs.
Compose
Services generate bass, melody, chords, drums, and pads — each role aware of the others.
Arrange
Section structure is planned: energy arcs, transitions, muting, and density per section.
Export
Audio, MIDI, metadata, and JSON/XML are produced simultaneously. Every file is labeled and traceable.
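The five stages can be sketched as a logged chain of transforms. Every function name and intermediate shape here is illustrative, not SHE's actual API; the Define stage is simply the caller's parameter dict:

```python
# Hypothetical orchestration sketch of the pipeline. Each stage is
# recorded in a log so the run stays traceable, mirroring the text.

def generate(params: dict) -> dict:
    log = []  # one entry per completed stage

    def stage(name, fn, value):
        result = fn(value)
        log.append(name)
        return result

    selected = stage("select",
                     lambda p: {**p, "progression": "I-V-vi-IV"}, params)
    composed = stage("compose",
                     lambda s: {**s, "roles": ["bass", "melody", "chords",
                                               "drums", "pads"]}, selected)
    arranged = stage("arrange",
                     lambda c: {**c, "sections": ["intro", "verse",
                                                  "chorus", "outro"]}, composed)
    package = stage("export",
                    lambda a: {"audio": [], "midi": [], "metadata": a,
                               "manifest": {"stages": log}}, arranged)
    return package

out = generate({"genre": "electronic", "mood": "energetic", "tempo": 128})
```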
System requirements: SHE runs locally — no GPU required. The engine operates on standard hardware, including low-power devices. No cloud dependency for generation.
API ACCESS
Integrate SHE into your product.
The SHE API accepts a set of musical parameters and returns a complete generation package: audio files, MIDI, structured metadata, and machine-readable JSON/XML — all in a single response.
Request
Send your parameters — genre, mood, key, mode, tempo, form, complexity — and the engine generates a complete musical output.
// Example parameters
genre: "electronic"
mood: "energetic"
key: "C", mode: "dorian"
tempo: 128
form: "full"
Response
Receive a complete package: rendered audio, MIDI tracks, embedded metadata, and a structured manifest describing every musical decision.
// Response includes
audio: [full_mix.wav, stems/...]
midi: [chords.mid, bass.mid, ...]
metadata: {key, tempo, mode, ...}
manifest: {progression, form, ...}
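A request body matching the example above could be assembled like this. The endpoint path and auth header in the comment are placeholders, not the documented API surface; consult your partner credentials for the real base URL:

```python
import json

# Hypothetical payload mirroring the example parameters above.
payload = {
    "genre": "electronic",
    "mood": "energetic",
    "key": "C",
    "mode": "dorian",
    "tempo": 128,
    "form": "full",
}
body = json.dumps(payload)
# e.g. POST {base_url}/generate with an Authorization header,
# using your organization's credentials. Endpoint name is assumed.
```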
Authentication
API access is granted per-organization. Each partner receives dedicated credentials and rate limits tailored to their use case.
Output Formats
WAV audio (44.1kHz), standard MIDI files, JSON metadata, and XML export. All formats are delivered simultaneously per generation.
Local Deployment
SHE can also run entirely on-premise. No GPU required. Standard hardware, including low-power devices, is sufficient for generation.