Stereo Rendering Environments - An inquiry into the application of virtual acoustics encoders in music post-production for stereo formatting.
Abstract
Within the field of immersive audio, a unified consensus on adoption and application has proved elusive. This research inquiry addresses a gap in applied technological context, which has prompted an alternative method of rendering panoramic sound fields. Through the implementation of an audio encoding framework, virtually generated acoustic enhancement forms the basis of a proposed post-production approach to immersive audio within the two-channel capacity of standardised, accessible stereo audio.
The presented encoding framework and rendering method allows for multi-channel immersive audio that is stereo compatible. The fusion of ray-casting algorithms, virtual ambisonic convolution recording, and binaural head-related transfer function processing culminates in the generation of virtual acoustic space rendering and recording.
Through a methodology of iterative action research, findings indicate that the proposed approach is potentially both practically and creatively significant. Outputs demonstrate a variety of application methods, which have lent themselves to exploratory experimentation whilst echoing the workflow of analogue processing. The practice-based approach has been conducted within an interpretivist paradigm, oriented towards practising engineers in the field who may find insight in this alternative method of immersive audio processing.
An overarching tenet of liminal conjunction has guided this development of LR stereo fusion, with the methods of implementation being applied to the post-production of an EP, produced through the included VA Studio.
In addition to this document, further documentation is presented in the following appendix items, which have been produced as key outputs of the research inquiry:
Appendix L - Primary Action Research Iteration Sessions & Demonstrations
Within this appendix is the iterative session documentation that has been produced alongside the primary practice-based research inquiry into practical and creative application of the encoding method itself. It includes individual production tracks, numerous practical demonstrations, and a supplementary VA IR pack.
Appendix R - Research Notes and Observations Journal
Within this appendix is additional information and notation regarding topics discussed within the major research project’s integral document. These notes have been organised as an accompanying journal to convey additional research-based inquiry. Included is a linked supplementary VA AV portfolio.
1. Introduction
This practice-based research project inquired into the current landscape of surround-sound audio encoding. Aiming at the synthesis of a software encoding framework for implementation in audio mixing, development of virtual acoustic space rendering was undertaken through an iterative action research model. Informed by a review of background research context and involving creative experimentation, development of the framework and practice was used to document an account of practical utilisation from the perspective of a participatory practising audio mixing engineer.
Rationale for the project stemmed from the stereo incompatibility of surround-sound formatted audio, taking an alternative stance to industry norms by staking technological inclusion on a compatibility requirement: accessibility for any existing stereo format usage. Focus for solving the two-channel limitation centred on the fusion of stereo audio signals in the post-production stage. A benefit of this solution, which simulates the illusionary displacement of the listener through externalised exocentric placement of sound, is that accessibility is as inclusive as existing stereo playback mediums.
Alongside the encoding framework implementation that has been developed throughout this research, the provided demo material and music productions demonstrate its creative potential and practical versatility.
1.a Statement of Research Focus
Objectives that guided the project consist of:
- Thorough understanding, through review of technical background research context and discourse, to inform implementation approach, application method, and configuration development.
- Establishment of an encoding framework for development and application into a post-production audio mixing context. Documented to produce an account of iterative research sessions into its implementations.
- Provide creative and practical applications and outputs, culminating in the post-production of an EP.
The goal of this research has been to investigate whether PCM stereo has the further capacity to carry in its waveform additionally encoded binaural signal information which emulates multi-channel surround systems. The aim has been to take implementation further with additional configuration that virtually simulates acoustic characteristics around the listener perspective, to impart an illusionary auditory displacement. The practice-based research has culminated in the output of a documented account of development and implementation in relation to the production of an EP with tracks that contain encoded stems, finally delivered via stereo cassette tape.
2. Literature review & Background Research Context
Within the field of audio production over the last several decades there have been emerging technological advancements in rendering sound which carries encoded psychoacoustic spatial information. This has come about, however, with a decentralised approach across wider audio culture with regard to utilisation, application and implementation. Although there has been extensive documentation written about spatial attributes in audio production, there is no singular unified approach (Denis Smalley, 2007).
Creative use of spatial concepts can be found in acousmatic music consisting of an aesthetically created ‘environment’ which structures transmodal perceptual contingencies through source-bondings and spectromorphological relations (Denis Smalley, 2007). This approach exemplifies the inherent spatial properties of sound in relation to music and scene, in a way that considers its interconnected form of being in scene as inseparable from the source image itself as well as the musical qualities of timbre, rhythm and melody. Smalley goes on to state that “sounds in general, and source-bonded sounds in particular, therefore carry their space with them – they are space-bearers”.
A creative approach such as this carries significant creative and philosophical weight, and is a clear indicator towards creative use within the conceptualisation of spatial sound, taken further within Marije A.J. Baalman’s (2010) framework of spatial composition techniques. This also corresponds to similar artistic philosophies on the integration of space and its element of musicality when considered holistically in compositional context. This is mirrored both by academic Ulf A. S. Holbrook (2019), who claims that the space in which music is experienced is as much a part of the music as the timbral material itself, and by impressionistic composition such as the work of Claude Debussy, who famously stated “Music is the space between the notes” (Jonathan G. Koomey, 2001).
This viewpoint has a counterpart within the technical side of applications, which uses these same principles but within a practical scientific context framed with variations of corresponding terminology. As J. L. Gonzalez-Mora (2006) sets out in their research project documentation on audio processing of virtual acoustic space [VAS] with HRTFs, the same source-bonding principles that Smalley applies to the creative soundscaping of natural ambience are applied technically (see Appendix R - A:1): by specifying acoustically proximate space from the listener’s vantage point, listening can be enhanced in replacement of sight in a manner consistent with reality (Dmitry N. Zotkin et al, 2004).
These more rigorously academic and scientifically oriented approaches of application are prevalent within the field, demonstrating the breadth of technical application along with research output from neuroscience field labs such as from the City University of Hong Kong (2023a), and Chiba Institute of Technology Japan (Spatial Hearing Lab, 2022).
However, the absence of a unified approach across the wider field has led to the use of differing terminology, which has caused discrepancies between interpretations in the discourse of the field. As a result, the dissemination of ungrounded and misconstrued understanding around binaural audio is common in discourse on the topic.
Studies into such areas as binaural-beat brain entrainment have been deemed inconclusive at best (Ruth Maria Ingendoh, 2023), with results that corroborate the impression of an overall inconsistency of empirical outcomes (see Appendix R - A:2). This demonstrates both misunderstanding of what constitutes the generation and playback of encoded audio and the lack of a unified approach to adoption, both technically and creatively, yet it still highlights an academic and cultural interest in the field of binaural audio.
2.a Virtual Rendering Inquiry
Within the context of this project, care will be taken to describe concise and specific functions with use of relevant and applicable terminology. As the nuanced topic of virtual acoustic rendering, approached from a technical and computational perspective within the wider field of spatial sound, is tighter in its descriptors, I will use this phrasing and terminology to accurately convey these specifics.
The depth of technical background research context is essential for an informed understanding of the topic at hand, to prevent applications such as those studied by Ruth Maria Ingendoh (2023). Ensuring a foundation of robust academic rigour to substantiate informed technical comprehension aims to prevent misconstrued interpretations and applications which may otherwise lead to misinformed adoption of unsubstantiated use. To ensure a robust academic approach in discourse around the topic, applied knowledge of all fundamental processes is essential for both practical and theoretical understanding (see Appendix R - B:1 - Figure 1).
For clarity, the differentiation from standard virtual rendering models lies in the virtual acoustic space [VAS] encoding involved, in comparison to other areas of spatial technologies which otherwise typically integrate only HRTFs, with no active reflecting or occluding virtual geometry. The presence of this interactivity is what additionally encodes acoustic information, simulated as virtual acoustics [VA] within a software development environment such as Unity Engine.
2.a.i Ray-tracing Audio and Virtual Acoustic Space
Ray-tracing makes up the foundation of the encoding taking place with regard to the acoustic properties of geometry within a virtual rendering environment. Here, image sources and HRTF receivers take the place of sound sources and microphones, and the additionally generated incoming cues that cast from the image source as reflections in the virtual space are ‘rays’ which emulate the propagation of sound (see Appendix R - C:1).
The ray-tracing implementation in use for this project is responsible for two spectral elements of the processed sound: reflections from, and occlusion of, sources. When rendering reflection cues in Steam Audio (Lakulish et al, 2021a), ambisonic IRs emanate from the listener position to obtain a spherical soundfield image from the HRTF position, localising incoming reflection cues similarly to recording impulse responses for binaural convolution reverb (see Appendix R - C:2). For occlusion, single or volumetric raycasting is utilised to determine whether, and to what degree, sound is obscured by present geometry.
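The reflection side of this process can be illustrated with the classic image-source technique, which mirrors a source across each boundary to locate first-order reflection paths. The Python sketch below is purely illustrative and is not Steam Audio’s implementation: the 2D shoebox room, the positions, the 1/r gain model and the speed-of-sound constant are all simplifying assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value at roughly 20 degrees C

def first_order_image_sources(src, room):
    """Mirror a 2D source across each wall of an axis-aligned room
    spanning (0,0) to (room[0], room[1]), giving first-order images."""
    x, y = src
    w, h = room
    return [(-x, y), (2 * w - x, y), (x, -y), (x, 2 * h - y)]

def reflection_cues(src, listener, room):
    """Return (delay_s, gain) for each first-order reflection,
    using simple 1/r distance attenuation."""
    cues = []
    for img in first_order_image_sources(src, room):
        r = math.dist(img, listener)
        cues.append((r / SPEED_OF_SOUND, 1.0 / max(r, 1e-9)))
    return cues

# A hypothetical 6 m x 4 m room: source near one corner, listener near the other.
cues = reflection_cues((1.0, 1.0), (5.0, 3.0), (6.0, 4.0))
```

Each (delay, gain) pair corresponds to one reflected ray arriving at the listener; a full acoustic renderer would additionally apply material absorption and higher-order reflections.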
One application example of this is the spatial audio and virtual acoustics research project from the University of York (2023). This project works within the developing field of heritage science to explore ways of accurately recreating the sound of no-longer-existing structures; in describing their implementation method they state their project includes: virtual acoustic modelling, auralisation and spatial psychoacoustics, with particular focus on virtual room acoustics simulation, spatial encoding of virtual environments, and room acoustics / impulse response measurement and rendering.
To reiterate the differentiation here, these virtual acoustic modelling methods and processes stand apart from solely HRTF / binaural processing by imparting these additional acoustic encodings of virtual environment simulations. Software packages such as Audio Futures (2023) “360 WalkMix Creator” or Applied Psychoacoustics Lab (2023) “Virtuoso”, which only utilise HRTF / binaural processing, lack these additional virtually rendered acoustic signals in comparison to those such as Odeon Room Acoustics Software (2023) or “Steam Audio” (Lakulish et al, 2023b).
This sets the boundaries of what constitutes virtual acoustic rendering for this project within the wider context of the field of spatial audio. The total encapsulation of encoding will consist of: a virtual rendering environment generating virtual acoustic space populated with raycast cues, rendering cues between the HRTF receiver and image sources with interactions from reflecting and occluding geometry, and creating the perception of exocentric source-bonded sound with these virtual acoustic properties centred on the listener perspective. Undertaken within the context of music production, specifically the post-production mixing stage, this will combine both the creative and technical aspects of application.
2.b Stereo Audio Encoding Inquiry
With encoding being a core aspect of the proposed rendering method, understanding its function is fundamental to comprehending what takes place during signal reconstruction within VAS generation and recording. Within the realm of data processing, encoding is the function whereby information is reconstructed to be represented as numerical data (Cloud Developers, 2022). This is a fundamental aspect of digital signal processing, including the recording and representation (including playback) of digital audio. One simple example of encoding utilises binary, where a data packet (N) is reconstructed (encoded) depending on the specified system interpretation (decoding) (RapidTables, ND). For another example, see Appendix R - D:1 - D:3.
When regarding raw audio encoding, there are a number of codecs which handle this reconstruction differently depending on required outcomes of either quality or data (file) size.
An example of this in digital audio processing is Pulse Code Modulation (PCM) encoding, which takes input from an analogue receiver (measuring incoming audible waveforms) and reconstructs (encodes) it into a data format compatible with digital signal processing (see Appendix R - D:2).
The example given here is of a PCM encoded waveform reconstructing an incoming signal’s amplitude into decimal and (16-bit) binary representation at points across a waveform (Jorge Juan Chico, 2022).
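The per-sample step of such 16-bit PCM reconstruction can be sketched concretely as quantisation of floating-point amplitudes to signed 16-bit integers. The Python fragment below is an illustrative sketch only; the 1 kHz tone sampled at 48 kHz is an assumed test signal, not material from the project.

```python
import math

def pcm_encode_16bit(samples):
    """Quantise floating-point samples in [-1.0, 1.0] to signed
    16-bit integers, as in 16-bit PCM encoding."""
    out = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip to full scale
        out.append(int(round(s * 32767)))   # scale to the 16-bit range
    return out

# One cycle of a 1 kHz sine wave sampled at 48 kHz.
sine = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(48)]
codes = pcm_encode_16bit(sine)
```

Decoding reverses the scaling, dividing each integer by 32767 to recover an approximation of the original amplitude; the small rounding error introduced here is the quantisation noise inherent to PCM.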
The relevance here, with regard to this project’s aim of rendering virtual acoustics, is that the process of rendering encapsulates a number of reconstruction functions within the capacity of stereo formatting. In the context of HRTFs, as the function involved will calculate minute details imparted by positioning relating to the signal’s ILD / ITD (elaborated later), these then need to be encoded into the PCM data for accurate reconstruction on playback.
This specifically relates to PCM waveforms, as this is the data that is directly altered by these calculations via adaptations of a given signal's frequency response and timings. Through a series of encoding processes, PCM stereo audio signals can be adapted to emulate multi-output surround-sound, and additionally include encoded virtual acoustic raycast cue signals.
Encoding directly onto the digital stereo waveform in this way also determines that the audio format is within the capacity of stereo and therefore compatible with existing stereo playback mediums. This ease of use and compatibility for both further audio processing or listener consumption separates it from the requirements of typical multi-channel surround-sound audio codecs and playback mediums. This is crucial to this project of creating audio tracks with VA processing, as it stakes technological inclusion on a compatibility requirement for the audio to be accessible to any existing stereo format usage.
2.b.i Encoding Head Related Transfer Functions
As in natural environment acoustics, the direct and reflected propagation of sound from source to listener imparts timing and frequency response alterations which make up the perceived acoustic characteristics of the present space (see Appendix R - E:1). As stated by Xiao-li Zhong et al (2014), these head-related transfer function characteristics have been identified within psychoacoustics as:
“The interaural time difference (ITD), i.e., the arrival time difference between the sound waves at left and right ears, is the dominant directional localization cue for frequencies approximately below 1.5 kHz. The interaural level difference (ILD), i.e., the pressure level difference between left and right ears caused by scattering and diffraction of head etc., is the important directional localization cue for frequencies approximately above 1.5 kHz.”
The third key characteristic, specified by Dmitry N. Zotkin et al (2004), is the additional monaural and binaural cues resulting from sound scattering off the listener’s body and outer ear. The encoding process of receiving and rendering a simulation of these characteristics is regarded as the Head Related Transfer Function [HRTF], otherwise described as a frequency-response function.
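The ITD component of these cues can be approximated analytically. The sketch below uses Woodworth’s well-known spherical-head formula, ITD = (a/c)(θ + sin θ); the head radius is an assumed average rather than an individualised measurement, and the formula ignores elevation and near-field effects.

```python
import math

HEAD_RADIUS = 0.0875    # m, assumed average head radius
SPEED_OF_SOUND = 343.0  # m/s

def woodworth_itd(azimuth_deg):
    """Interaural time difference for a distant source, via Woodworth's
    spherical-head approximation: ITD = a/c * (theta + sin theta)."""
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS / SPEED_OF_SOUND * (theta + math.sin(theta))

itd_front = woodworth_itd(0)   # source straight ahead: no delay
itd_side = woodworth_itd(90)   # source fully to one side: maximal delay
```

A source at 90° yields roughly 0.65 ms of interaural delay, consistent with the sub-millisecond timing differences the auditory system uses for localisation below approximately 1.5 kHz.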
The rendering of virtually simulated ITD / ILD and scattering cues, along with additional air transmission and absorption level rolloff from sources, is the overarching audio encoding function intended for this project, enhanced with additionally rendered raycast reflection cues that are reactive to present acoustic geometry. All these simulated physical characteristics will encode unique additional sound signals onto the PCM waveform, which is regarded as a synthesis of Virtual Acoustic Space [VAS], and if done correctly will enable the listener to perceive sound in externalised exocentric space as if in the room with the sound sources. Similarly to the properties of live sound, exocentric sound is the perception of audio sources from outside of oneself.
One model of encoding HRTFs incorporates parametric peak-notch frequency cut and boost filters to simulate the positioning of audio sources around the listener position. These are adapted in custom HRTFs which individualise the function to the listener’s specific head dimensions to gain better results (head-related impulse response, or HRIR; Isaac Engel et al, 2021a). The peak-notch frequency processing is also a key aspect of rendering zenith positioning above the listener position (Spatial Hearing Lab, 2022).
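A minimal sketch of such a parametric peak-notch filter can be expressed as biquad coefficients following the widely used RBJ Audio EQ Cookbook peaking-EQ formulas. The notch frequency, gain and Q chosen here are illustrative stand-ins for a pinna-related spectral cue, not values from any published HRTF model.

```python
import math

def peaking_eq_coeffs(fs, f0, gain_db, q):
    """Biquad (b, a) coefficients for a parametric peak/notch filter,
    per the RBJ Audio EQ Cookbook peaking-EQ formulas."""
    a_lin = 10 ** (gain_db / 40)           # amplitude factor
    w0 = 2 * math.pi * f0 / fs             # centre frequency in rad/sample
    alpha = math.sin(w0) / (2 * q)         # bandwidth term
    b0, b1, b2 = 1 + alpha * a_lin, -2 * math.cos(w0), 1 - alpha * a_lin
    a0, a1, a2 = 1 + alpha / a_lin, -2 * math.cos(w0), 1 - alpha / a_lin
    # Normalise so the leading denominator coefficient is 1.
    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]

# A hypothetical -12 dB notch at 8 kHz (48 kHz sample rate), a crude
# stand-in for an elevation-related pinna notch.
b, a = peaking_eq_coeffs(48000, 8000, -12.0, 2.0)
```

Cascading several such filters at different centre frequencies, with gains derived from a listener’s measured HRIR, is one way individualised peak-notch processing can be realised.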
Two examples of HRTFs in operation can be found in the resources below, the first being an implementation using Steam Audio and the second a custom program:
- Wildcat (2019), “HRTF Demo - Steam Audio”
- Katsuhiro Chiba (2016), “Synthetic HRTF 3D Audio Test 2 (for Headphones)”
2.b.ii Encoding Binaural and Ray-tracing Reflections
The generated reflection cues specifically, which contain ITD / ILD derived from the original sound source positioning, are defined by localisation and room model parameters that are encoded at pickup by ambisonic impulse responses and rendered by the HRTF receiver. These cues are the binaural encoded signals brought about in the transfer of audio through virtual acoustic space, which work in tandem with the HRTF.
“The generation of a VAS is based on the reproduction of binaural acoustic cues related to the relative sound source location such as timing, intensity and spectral features.” - Camille Bordeau et al (2023).
Binaural cues are synthesised sound sources positioned at reflection points which impart acoustic properties, with ray-tracing governing their placement and positioning. Through ray-tracing, acoustic patterns are generated which mimic the propagation of sound (or light) in a virtual space. These calculate the time and level differences of reflections from all surfaces within range of the source, including simulated listener geometry, and propagate until they reach the HRTF receiver.
“The static cues are both the binaural difference-based cues, and the monaural and binaural cues that arise from the scattering process from the user’s body, head and ears” - Dmitry N. Zotkin et al (2004).
Included is a raycast diagram made using Amray (ND), a 2D ray-tracing sketchpad, which illustrates the ray-tracing process of localising binaural cues onto surfaces for reflections, similar to real-world acoustics. The reflecting walls are the custom-designed model of the 3D rendering environment, with each reflection being a cue derived from the original point source (see Appendix R - F:1). A HRTF receiver can then be positioned, and to the listener it should sound as if they are in the VAS, with audio perceived as exocentric, incoming from a surrounding diffuse pattern (see Appendix R - F:2).
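How such cues might be assembled into a left/right response pair can be sketched as follows. This toy model is an assumption-laden illustration: the cue tuples of (delay, gain, ITD) are hypothetical, each cue becomes a bare impulse tap, and the frequency-dependent filtering a real HRTF applies is omitted entirely.

```python
def binaural_impulse_pair(cues, fs=48000, length=4800):
    """Build toy left/right impulse responses from (delay_s, gain, itd_s)
    cue tuples: each cue becomes a delayed, attenuated tap, with the
    ITD offsetting arrival at the farther ear."""
    left = [0.0] * length
    right = [0.0] * length
    for delay, gain, itd in cues:
        li = int((delay + max(itd, 0.0)) * fs)    # positive ITD delays the left ear
        ri = int((delay + max(-itd, 0.0)) * fs)   # negative ITD delays the right ear
        if li < length:
            left[li] += gain
        if ri < length:
            right[ri] += gain
    return left, right

# Two hypothetical reflection cues, one arriving from each side of the listener.
L, R = binaural_impulse_pair([(0.010, 0.5, 0.0004), (0.015, 0.3, -0.0002)])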
2.b.iii Encoding Ambisonics
Ambisonic processing was first developed with a focus on its first-order formats: a framework utilising microphone placement at the recording stage to capture 360° sound fields (Blue Ripple Sound, 2023), otherwise denoted as a 360° sonic sphere by M.A. Gerzon (1980). These 360° placements are similar in pickup pattern to omnidirectional microphones, however with the addition of multiple stacked figure-of-eight (bidirectional) capsules which capture the ITD / ILD from sources positioned around the array, imparting directionality to the playback (see Appendix R - G:1).
To do this, first-order ambisonics typically requires four channels to capture a full periphonic (3D) sound image (Blue Ripple Sound, 2023). Now, with the advancement of surround-sound playback systems such as 5.1 / 7.1, Higher Order Ambisonics takes the framework further into higher channel counts (Jörn Nettingsmeier, 2010), with fifth-order virtual ambisonics utilising 36 channels in an implementation with the Rapture3D game engine, and up to 16-channel third-order pickup configurations as utilised in Steam Audio (Lakulish et al, 2023a).
A physical microphone setup of this kind is denoted as A-format (see Appendix R - G:2), which is generally taken to mean the direct signal set from a tetrahedral microphone configuration (Paul Hodges, 2011).
B-format is another configuration; this format consists of the spherical harmonics of the sound field up to the order being considered (Paul Hodges, 2011). The soundfield itself is described by the combination of four audio streams representing 3D space around the listener (Apple Inc, 2023) -
- W: sound pressure
- X: front-to-back
- Y: left-to-right
- Z: top-to-bottom
The specific difference between the A / B formats is that whilst A-format regards the sound pickup at the recording stage, B-format refers specifically to the stage where the audio signal is encoded and stored. This encoding is done in such a way that the information does not directly correspond to speakers as LR signals; rather, the four channels contain the encoded soundfield, which allows for the manipulation required to generate speaker signals through numerous methods such as UHJ Stereo (see Appendix R - G:3) or HRTFs, to rotate the sound field and perform various other transformations (Paul Hodges, 2011), including binaural playback.
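The W / X / Y / Z streams above can be generated from a mono source by the standard first-order encoding equations. The sketch below uses the traditional convention in which W is scaled by 1/√2; other conventions (such as SN3D / AmbiX) normalise and order channels differently, so this is one illustrative choice rather than a universal definition.

```python
import math

def encode_first_order_bformat(sample, azimuth_deg, elevation_deg):
    """Encode a mono sample into first-order B-format (W, X, Y, Z),
    with W carrying pressure (scaled by 1/sqrt 2 in the traditional
    convention) and X/Y/Z carrying the directional components."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2)                  # omnidirectional pressure
    x = sample * math.cos(az) * math.cos(el)   # front-to-back
    y = sample * math.sin(az) * math.cos(el)   # left-to-right
    z = sample * math.sin(el)                  # top-to-bottom
    return w, x, y, z

# A unit-amplitude source directly in front of the listener at ear height.
w, x, y, z = encode_first_order_bformat(1.0, 0.0, 0.0)
```

Because the direction is stored in these channel weights rather than in speaker feeds, rotating the soundfield reduces to a matrix operation on (X, Y, Z), which is what makes the later decoding to UHJ stereo or binaural playback possible.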
3. Methodology
3.a Participatory Practitioner Action Research
Going into this project, the approach will sit within an interpretivist and subjective paradigm assuming a relativist ontology of intangible reality (Charles Kivunja et al, 2017), to orient the participatory research appropriately to the nature of subjective observations of individualised psychoacoustic sound. This interpretivist approach is due to the subjective nature of the output of the technologies which will be implemented to render virtual acoustic space in stereo waveforms. Operating as the participatory practitioner (Rachel Pain et al, 2012) undertaking the project, an informed action research method will be carried out by following iterative cycles of action and reflection to address practical, technical and creative concerns (Lisa M. Vaughn, 2020). In taking a critical perspective on the implementation and refinement of the included processing and its practical application, the relativist orientation of ontology will still convey relevant observations of output.
Although the subjective aspect of this technology can be distinct to each listener, due to the listener’s perception and head geometry, there are still transferable sonic perceptions of this audio between all listeners (City University of Hong Kong, 2023c). It will therefore still be a responsibility to conduct this research with active acknowledgement of subjective bias, mitigated by employing a fair and informed approach of critical listening in practice to produce good quality results for a distinct listening experience which is apparent to any listener. The critical listening approach is discussed further in section 3.b.i.
The practice-based inquiry incorporates action research practices of development through practical application and implementation of a rendering method, via experimentation with the various encoder technologies available. With regard to the action research method, there are four basic elements evident in the definitions and descriptions of action research (W.J. Cilliers, 2007). Provided are Zuber-Skerrit’s (1991) four processes, which describe a spiral of cycles consisting of four major stages, to which have been added the corresponding stages set out in this project’s proposal methodology:
1. Planning / Development
Planning and development will be undertaken to conceptualise an application of encoding. This will work to a specified aim, with the focus shifting dependent on the specific area of implementation.
2. Acting / Action
Action will consist of undertaking test iterations of the specific process in question; adaptations to numerous relevant properties may be carried out based on previous reflection, with the specific area of this shifting dependent on the stage of implementation.
3. Observing / Analysis
Results will be observed, analysed and scrutinised by undertaking critical listening and qualitative assessment, to inform following iterations or practical applications. The aim is to reach an application of encoding which is both practically worthwhile and creatively effective.
4. Reflecting / Conclusion
Reflection will follow, with conclusions drawn from the results; considerations will be noted and will then inform the next sequence of iteration.
This will closely align with the experimental method of participatory technical development, also provided by W.J. Cilliers (2007), whereby participation in the utilisation / iteration and reflection of said technologies brings about results to further develop whilst aiming to improve the technological context.
To identify both optimal and creative configurations to utilise in the render iterations, these cycles of action and research will hone in on what is both practically and creatively worthwhile in application to production tracks. The action research approach will inform and guide the experimental process of implementation, for both the integration of encoding processes within the Unity Engine virtual environment and the integration into mixing for creative application. This process has been selected, beyond its fit with the practice-based research methodology, because the practical outputs produced should evidence the rigour of a cyclical reflective approach to implementation.
A cyclical sequence of iterations, similar to those illustrated (see Appendix R - F:1 - Figure 4), should result in a large pool of demo material resources which inform, illustrate and support the research, as well as demonstrate the potential of creative use and practical application. This developmental approach is similar to computer programming methods such as debugging and software improvement models, but being in the area of audio, with the additional creative aspect, it relates more closely to post-production mixing projects and scenarios.
The research aims of this project are to gain a detailed technical understanding of the audio reconstruction necessary for VAS generation, so that a thoroughly informed and documented account of inquiry into the process and use of the applied encoding technologies can be produced. This will encapsulate the accumulation of research topics within this document, accrued alongside the practical action of application within an audio post-production practitioner’s context.
The practical output and application of development and implementation will be documented in the appendix document - Primary Action Research Iteration Sessions & Demonstrations - and will orientate around implementation through in-depth iterative research sessions of encoding for music productions, following that set out in methods section 3.b.iii - Implementation and Development. The cyclical framework of action research with reflection will develop the iteration sessions themselves and the approach to implementation, as well as the specific development of demo material. The aim of these iteration sessions is to research the process of practical application, and to understand and illustrate creative use through the production of an EP which incorporates VAS encoding into its mixes.
3.b Methods and Approaches
3.b.i Critical Listening
Within audio production, critical listening makes up a key part of analysing and determining the intrinsic properties of any given sound. Critical listening as a practice is otherwise known as timbre solfege or timbre solfeggio, and its focal properties constitute the timbre of any sound. The properties of timbre under critical analysis consist of (Dr. Jason Corey, 2017):
- Timbre differences
- Cause and character of differences
- Frequency power spectrum (spectra)
- Character of initial transients
- Attack / decay synchrony of higher overtones
With identification of these properties of sound, analysis becomes more accessible, with issues or areas of development being labelled and contextualised to better support a qualitative analysis of the audio itself. This project context specifically relates to technical ear training, which differs from musical ear training in its focus on technical elements of sound spectra (see Appendix R - H:1 - Figure 7) such as:
- Tonal balance
- Level / amplitude envelope
- Mix balance
- Signal processes
- Frequency Responses
Within audio production more broadly, balancing spectra is a key aspect of mixing, constituting the analysis undertaken of instrumentation within shared frequency bands across multi-tracks (Music Production Glossary, 2023). The overlapping of sounds across already-occupied frequency bands is denoted as clashing, and inversely, space is attributed to bands of low frequency power or level (see Appendix R - H:2). To visualise these frequency bands, spectrum analysers are often implemented to analyse the balance of power across frequencies to an objective precision which our ears cannot replicate.
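The analysis a spectrum analyser performs can be sketched as a discrete Fourier transform magnitude computation. The direct O(N²) DFT below is written for clarity rather than the FFT a real analyser would use, and the 64-sample test tone is an assumed illustration.

```python
import math

def magnitude_spectrum(samples):
    """Magnitude of each DFT bin up to the Nyquist bin - the data a
    spectrum analyser displays as power balance across frequency bands."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        mags.append(math.hypot(re, im))
    return mags

# A pure tone occupying exactly one analysis bin of a 64-sample window.
tone = [math.sin(2 * math.pi * 4 * i / 64) for i in range(64)]
mags = magnitude_spectrum(tone)
```

For this tone all the energy concentrates in a single bin, which is the visual cue an engineer reads off an analyser when judging whether two instruments clash within the same frequency band.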
Approaching the audio production output of this project in the context of a post-production mixing practitioner, the subjective creative aims for specific stems of encoded audio will also steer the progression of individual demo outputs and implementation into track production. However, to ensure that development of the encoding framework itself, as an overarching processing unit, is consistent in generating the intended spectral and timbral effects, critical listening observations of the imparted and additionally rendered signal are important, ensuring each encoding function is optimal in both performance quality and perception.
3.b.ii Creative Method of Application, and Approach to Stereo Virtual Acoustics
3.b.ii.1 Stereo & Transferability of Encoded Audio
Binaural rendering allows us to present auditory scenes through headphones while preserving spatial cues, so the listener perceives the simulated sound sources at precise locations outside their head (Frederic L et al, 1989). Cues are preserved when rendering VAS / Ambisonics into binaural via HRTFs, convolving an anechoic audio signal with a head-related impulse response (HRIR); headphones place the inward-facing stereo output at the position of the HRTF's LR input, allowing accurate recreation of all the ITD / ILD / scattered cues / spherical harmonics present in the encoded audio. This headphone-exclusive use is a well-known trait of binaural sound, which is intended specifically for headphone playback. Without the inward-facing LR playback at the HRTF's LR position, the reproduced soundfield does not accurately reconstruct the encoded model, and thus the effect is decoupled from accurate spatial directionality (see Appendix R - I:1).
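At its core, the HRIR convolution described above can be sketched in a few lines. This is an illustrative outline assuming NumPy; the HRIR values below are hypothetical placeholders, not measured responses, chosen only so that a source on the listener's left arrives earlier and louder at the left ear.

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Convolve an anechoic mono signal with a left/right HRIR pair,
    producing the 2-channel inward-facing output for headphone playback."""
    n = len(mono) + max(len(hrir_left), len(hrir_right)) - 1
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    left = np.pad(left, (0, n - len(left)))      # equalise channel lengths
    right = np.pad(right, (0, n - len(right)))
    return np.stack([left, right])               # shape (2, n)

# Hypothetical toy HRIRs (not measured data): the right-ear response is
# delayed (ITD) and attenuated (ILD) relative to the left.
hrir_l = np.array([0.0, 1.0, 0.3])
hrir_r = np.array([0.0, 0.0, 0.0, 0.0, 0.5, 0.15])
out = binaural_render(np.random.randn(1000), hrir_l, hrir_r)
```

A full renderer would use measured HRIR pairs per source direction (and partitioned convolution for real-time use), but the signal path is the same.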
However, as characteristics such as depth and positioning (ITD / ILD) are still maintained within encoded audio regardless of loudspeaker placements, sound sources remain decoupled from the loudspeakers in a holographic image field (Paul McGowen, 2019); the addition of natural room acoustics may then further enhance the listener's perceptual experience. This can be illustrated as a virtual acoustic room within a naturally acoustic room: a liminal conjunction between defined, internal virtual acoustic space and changing external physical space.
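The ITD / ILD cues referred to here can also be measured from a rendered stereo pair. The sketch below is a simplified broadband estimator assuming NumPy: ITD as the lag of the cross-correlation peak and ILD as an RMS level ratio; a perceptual analysis would work per frequency band, but the principle is the same.

```python
import numpy as np

def estimate_cues(left, right, sample_rate):
    """Broadband ITD / ILD estimate from a stereo pair.
    ITD: lag (s) of the cross-correlation peak, positive when left leads.
    ILD: RMS level difference (dB), positive when left is louder."""
    corr = np.correlate(right, left, mode="full")
    lag = np.argmax(corr) - (len(left) - 1)      # samples by which right lags
    rms = lambda x: np.sqrt(np.mean(x ** 2) + 1e-20)
    return lag / sample_rate, 20 * np.log10(rms(left) / rms(right))

# Synthetic leftward source: right channel delayed 10 samples, ~6 dB quieter.
sr = 48000
sig = np.random.randn(4800)
left = sig
right = 0.5 * np.concatenate([np.zeros(10), sig[:-10]])
itd, ild = estimate_cues(left, right, sr)
```

Applied before and after a playback chain, such a measurement shows whether the encoded ITD / ILD relationships survive, independent of the loudspeakers used.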
3.b.ii.2 Creative Approach and Supplementary Application
The exocentric spectra of the resulting audio, transmitted through virtual acoustics and rendered into stereo, is one of the main creative conceptualisations which has driven the artistic approach to this research project focused on generating virtual acoustic spaces. Although this process is described technically elsewhere in this document, covering the creative reasoning is relevant to fully expressing the approach to the project.
The auditory aesthetics of virtual acoustics intertwine with an overarching creative approach (see Appendix R - J:1), which combines conceptual ideas of the immersive culture phenomenon and applies a philosophy to the generation of sound to create a type of in-between liminal space. VAS as an aesthetic, whilst an antithesis to the nostalgic liminal-space visual aesthetic, still maintains the same form of liminality, emulating subjective and illusionary exocentric space through displacement. To supplement and accompany the practical creative outputs generated throughout the action iteration sessions of this research, and to demonstrate both the creative and mixing application potential, an additional illustrative AV set of material will be produced. The supplementary VA AV demonstrates the in-scene interactivity of source placements and listener perspective, and how the present virtual acoustic encoding functions propagate signal and cues relative to listener position (see Appendix R - J:2).
3.b.iii Implementation Method and Development
3.b.iii.1 Unity Engine Virtual Rendering Environment
For implementation development & documentation, attention should be given to the document Appendix L - Primary Action Research Iteration Sessions & Demonstrations for the full scope of practical & creative applications & outputs within this research project.
As the project demands the use of a virtual rendering environment to produce the output of a virtually simulated acoustic space, Unity Engine has been chosen as the foundational software within which to process audio. The framework of a virtual environment is set out astutely as interactive head referenced computer displays that give users the illusion of displacement to another location (Stephen R Ellis, 1994) (see Appendix R - K:1).
As this project’s context is within audio, Unity will be utilised within the iterative research sessions as a virtual rendering environment, as it is integrated with the programming capabilities necessary to make encoding for auditory displacement possible (see Appendix R - K:2). A nuanced capability requirement for this project is geometric modelling, which encapsulates a number of functions to represent objects within the virtual space (John M. Hollerbach et al, 1999) and is essential for interactive acoustic modelling and scene building. As these capabilities are required to render physics-based auditory scenes [VAS], this software framework meets the requirements and will be implemented.
3.b.iii.2 Operation and Application Method
The initially empty virtual rendering environment forms the basis of VAS, simulating 3D space computationally. Within this preliminary scene, a gameobject asset appropriately denoted Sphere (Appendix R - L:1) exists within the virtual rendering scene, and displays parameters and functions within the inspector which are utilised to transform the specific object (position / orientation / size) and its intrinsic properties (see Appendix R - L:2).
Through a combination of these functions and variables, complex scenes can be generated which incorporate the required functionalities of geometric modelling for simulated physics-based auditory scene building. Utilising these gameobjects, audio sources and receivers make up the core components of the audio rendering signal chain. Along with generating architectural gameobject geometry, it is also possible to model room geometry within the rendering environment, to simulate interior space and to imbue virtual acoustic characteristics onto stereo signals.
Below are some examples of rendered virtual spaces within Unity using both simple and complex geometry. These are a combination of gameobject-based models and SKP models, the latter gathered from 3D Warehouse, a repository of .skp files created within SketchUp (3D Warehouse, 2023).
The models illustrate the concept of virtual rendered environments, as they represent 3D space simulated through computational processing. These dry environments, however, still lack crucial components not only to render sound, but also to create the illusionary displacement of the user. These two components are intrinsically connected, as the orientation of this project focuses on the auditory aspect. Once sound source and receiver components, which utilise the encoding processes set out in the audio encoding inquiry, are programmed with additional gameobject components, the audio renders should displace the listener into these virtual auditory spaces.
To virtually render audio and encode these room models with acoustic characteristics, the developmental implementation will consist of iterative action research aiming to produce accurate renders of exocentric audio which convey room characteristics. This will inform the research inquiry into the developmental application of these virtual rendering environment spaces with regard to the audio encoding inquiry research and implementation process itself.
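As a simplified picture of how room geometry yields acoustic characteristics, the sketch below builds a first-order image-source impulse response for a shoebox room. It is an illustrative approximation assuming NumPy, not the ray-casting engine used in this project; the room dimensions, positions, and absorption coefficient are arbitrary example values.

```python
import numpy as np

def shoebox_ir(room, src, mic, sample_rate=48000, c=343.0, absorption=0.3):
    """First-order image-source impulse response for a rectangular room.
    room: (Lx, Ly, Lz) in metres; src / mic: 3D positions inside the room."""
    images = [np.array(src, dtype=float)]            # direct path first
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = np.array(src, dtype=float)
            img[axis] = 2.0 * wall - img[axis]       # mirror source in the wall
            images.append(img)
    dists = [np.linalg.norm(img - np.array(mic, dtype=float)) for img in images]
    ir = np.zeros(int(sample_rate * max(dists) / c) + 1)
    for i, d in enumerate(dists):
        gain = 1.0 / max(d, 1e-3)                    # 1/r distance attenuation
        if i > 0:
            gain *= 1.0 - absorption                 # one wall bounce
        ir[int(sample_rate * d / c)] += gain         # delay = propagation time
    return ir

# Arbitrary example: 6 x 4 x 3 m room, source 2 m in front of the receiver.
ir = shoebox_ir(room=(6.0, 4.0, 3.0), src=(2.0, 2.0, 1.5), mic=(4.0, 2.0, 1.5))
```

Higher-order reflections, frequency-dependent materials, and scattering are what the full ray-based render adds; this sketch only shows why geometry and material parameters shape the resulting impulse response.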
4. Virtual Acoustics Overview
Summary report of developmental iteration sessions
4.a Research Iteration Sessions - Findings And Interpretation
Following investigation into integration (see Appendix L - Steam Audio integration into Unity Engine for recording and mixing in Logic Pro X), and upon completion of the iteration sessions of implementation and development (see Appendix L - Primary Action Research Iteration Sessions & Demonstrations) alongside the 8-track music production (see Appendix A), findings evidence that 2-channel stereo is capable in its capacity to carry encoded signals that emulate multi-channel playback (see Appendix L - Session 2).
Additionally, it is evident (see Appendix L - Impulse Response Generation and Observation) that synthesised acoustic characteristics (see Appendix L - Virtual Acoustic Space Models) are imparted by the active geometry models present as signal transmits through Virtual Acoustic Space (see Appendix L - Session 1 / 3 / 4 / 5 Demos). In this way, auditory displacement of the listener is achievable, presented throughout the full portfolio of demonstration material within Appendix L.
Virtual Acoustics constitutes a dynamic and alternative processing method, with raycasted cue signals successfully rendering a panoramic soundfield upon HRTF encoding of the Ambisonic pickup from the listener's perspective (see Appendix L - Session 11 - Track 15 VA Demo). The utilisation of virtual rendering environments to achieve this is both suitable and advantageous, as it presents innumerable variety regarding both approach and practical accessibility to encoding functions.
Findings also demonstrate the requirement for thorough understanding of the technical background research context and discourse, especially for troubleshooting further configuration development and informed fundamental technical comprehension of the implementation (see Appendix L - Pilot Session).
Practical and creative use of virtual acoustic space configured within a virtual rendering environment is prevalent, with a wide variety of implementation methods allowing for exploratory approaches to the mixing stage of audio post-production (see Appendix L - Session 9 / 10 Demos).
Findings indicate that the variety of potential applications ranges from initial compositional work (see Appendix L - Session 12 - TapeTransitionTexture Demo), novel recording render methods (see Appendix L - Session 14 - VA Mic 1 Virtual Mic Test), wide practical applications for mixing (see Appendix L - Session 6 / 13 - Stem Demos) including remixing (see Appendix L - Session 8), and final post-production mastering (see Appendix L - Session 7).
Interpreting these findings, the established encoding framework leads to significantly worthwhile experimental versatility, evidenced within the documented range of approaches to implementation. The iterative research session approach to implementation and development has also allowed for a rigorous inquiry with constructive flexibility, informed by the wide frame of background research context.
Final outputs of demo material culminate in the 8-track music production (see Appendix A), which cements the evident potential for practical and creative application, even going so far as to form an underlying liminal philosophy to artistic approach and utilisation (see supplementary AV - Creative Approach And Supplementary Application).
4.b Reflection On Results
Reflecting on the practical output of audio tracks produced within the context of implementation, the final EP has demonstrated, from the position of the participatory practitioner, that use of this framework is creatively and practically worthwhile. Throughout the progression of the tracks, the variety of applied implementations supports an experimental approach with capacity for creative flexibility.
Regarding Engine Room (Appendix L, Track 1), the iterative development here focused on understanding the integration process itself within a mixing workflow. This being the session which would lead into the EP's conceptualisation as a whole piece, the process of utilising the software framework of Unity Engine alongside Logic Pro X immediately showed promise for swift practicality, comparable to analogue processing equipment (see Appendix L - Session 6).
Monitoring real-time rendering lends the process to a focused and committed approach: renders solidify encoded properties, as there is no opportunity to alter them later without re-rendering. This steered the approach towards committing to high quality pre-render, preventing repeated reworking. Alongside this, the adaptability of parameters in real time allows for quick alterations on the fly, resulting in fast exploratory experimentation which quickly brings about positive and unexpected results for consideration.
Continuing with you&me. (Appendix L, Track 2) brought about demonstrable application alongside a standard stereo mixing process, used in this context more as a processing unit for emphasising spectral characteristics of guitar and vocal layerings (see Appendix L - Session 13). Tonal adjustments resulting from encoding included dynamic vertical movement across the full stereo width, responsive to input frequency ranges, giving a textural harmonic quality that is distinct in its image.
Moving onto Sapphire / Burden / How We Livin ? (Appendix L, Tracks 3 / 5 / 6), all were approached at the mastering post-production stage to understand the different application methods present, along with inquiry into whether the phase relationships of dense stereo information could persist and be taken into exocentric space to any degree. Following several iterations with experimental adaptation across the encoding framework, the end results demonstrated that existing phase relationships are maintained throughout encoding. In addition, the existing stereo image created through source placement can be accompanied by the additionally rendered cue signals, which also maintain correct directionality, with incoming cues arriving in the correct orientations to fit what is already there.
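One simple way to sanity-check such phase relationships is an inter-channel correlation measurement, a crude stand-in for the correlation meters common at the mastering stage. The sketch below assumes NumPy and synthetic test signals; it is illustrative, not the analysis method used in the sessions.

```python
import numpy as np

def stereo_correlation(left, right):
    """Pearson correlation between channels: +1 fully in phase,
    0 decorrelated, -1 fully out of phase (cancels when summed to mono)."""
    l = left - left.mean()
    r = right - right.mean()
    return float(np.sum(l * r) / np.sqrt(np.sum(l ** 2) * np.sum(r ** 2)))

# Synthetic checks with a 220 Hz sine over one second.
t = np.arange(48000) / 48000.0
tone = np.sin(2 * np.pi * 220.0 * t)
in_phase = stereo_correlation(tone, 0.8 * tone)      # ≈ +1.0
out_phase = stereo_correlation(tone, -tone)          # ≈ -1.0
```

Comparing this value on stems before and after encoding gives a quick, objective indication that phase relationships have not collapsed towards cancellation.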
The mastering implementation does emphasise the nuance in the addition of encoded signal: as it consists of small incremental reconstructions, the overall effect can be lost on an untrained ear without conscious awareness of what is present. This relates to the factor of subjectivity: as participatory practitioner, the understanding of what takes place is apparent; however, it must be recognised that most listeners will not consciously identify the final encoded sound. Although on the surface this counters how worthwhile the application method is, taking into account all aspects of practical and creative application, the utilisation in this case still lends significant value despite its subtlety.
With the tracks Record Store / What Will Happen ? (Appendix L, Tracks 4 / 7), the production context revolved around application to remixing. The initial investigation here was whether room tone was a consistent factor in renders; when the same VAS was used, the overall tonality of renders did result in uniform room characteristics (see Appendix L - Session 8). Additionally, novel placements of sources were arranged, with experimentation revolving around finding distinctly perceptible placements in the nearer field (see Appendix L - Session 9). Stems benefitted from encoding characteristics as layering takes on more defined depth and dynamic imaging, with dense areas of the pre-render mix subsequently being diffused into a wider stereo image. Symmetrical placements had caused issues with phase cancellation in some configurations involving stereo stems; however, this was resolved with flipped placements for each side of the listener's perspective (see Appendix L - Session 10 - Strings Arrangement), or with additional ambisonic source components.
Finally, with Ingleton Road (Appendix L, Track 8), documentation on the production of a fully encoded multi-track mix using only processed audio demonstrates the applicability to any mixing context. With the resulting track fully encoded, VAS placement techniques are apparent, with the rendered sonic sphere being distinct from a standard stereo field. Additionally, encoded virtual acoustic spectra are present in the used VAS' frequency response, visualised through a spectrum analyser (see Appendix L - Session 11).
Consistent virtual acoustics are present across all encoded stems, containing the specified acoustic timbre resulting from the defined parameters of assigned geometry material (see Appendix L - Session 3).
Differentiation between VAS room models is also evident in the testing undertaken (see Appendix L - Session 4 - Demos), where parameters are held consistent with the exception of room models. Varying interiors and their respective volumes result in varying spectra and tonality, which can be adapted to best fit the mixing application at hand through exploratory adaptation of positioning (see Appendix L - Session 5), or through creative recording techniques (see Appendix L - Session 14). The included VAS models provided within the VA Studio assets demonstrate the creative potential for variety in room emulation modelling, supported by the additionally included impulse response archive (Appendix L - IR Pack).
4.c Discussions
Opening discussions on the variety of implementations, the versatility of the encoding framework lent itself to the experimental process of design and configuration. Although other application methods are possible and surely worth independent inquiry, the integration of Logic Pro X with Unity Engine as the operating virtual rendering environment has led to a streamlined software workstation for VAS rendering (see Appendix B - VA Studio).
Compared to existing software packages, this alternative method includes the flexibility for different environments and audio workstations to be integrated together to fit the user's requirements. Without restriction to only those presented here, the encoding framework therefore lends itself to an inclusivity of approach and a resulting accessibility of VAS encoded audio. Going further, accessibility of playback of encoded signals is evidenced (Appendix L - Session 2) by the lack of necessity for anything beyond existing stereo playback mediums.
As implementation has focused on virtual room acoustics simulation and spatial encoding of virtual environments, it is in line with research project outputs from the University of York (2023). This could therefore inform further inquiry and development into this niche within the field, as well as steer potential application nearer to the context of creative use in music post-production. Demo outputs are also in line with the foundational functionality found in outputs from City University of Hong Kong (2023a) and the Spatial Hearing Lab (2022), but have successfully taken encoded characteristics further with the addition of virtual acoustic spectra synthesis.
The conjunction of space and sound within the framework of an interconnected form of being in scene, as inseparable from the source image itself (Denis Smalley, 2007), has been significantly impactful to the utilisation of virtual environments in generating virtual acoustic space. These approaches have imparted a creative perspective on the encoding framework, which has left a lasting impression on the produced practical works. Following presented principles, such as specifying acoustically proximate space from the listener's vantage point to enhance their listening (Dmitry N. Zotkin et al, 2004), has been significant both in producing implementations and in communicating aspects of enhanced positioning within the rendered spherical soundfield.
Regarding the creative aims of this inquiry, and considering the success of the enhanced perception of space intrinsic to the stereo waveform, this imbued perceptive spectra has resulted in a musicality to operation of the encoding framework. In line with Ulf A. S. Holbrook (2019), who claims that the space in which music is experienced is as much a part of the music as the timbral material itself, this instilling encapsulates that these encoded stereo waveforms carry their space with them – they are space-bearers (Denis Smalley, 2007).
On discussion of the research method, the effectiveness of the action research approach in bringing about iterative documented development has been invaluable to the research project as a whole. As practical outputs have been informed by numerous aspects of research, undertaking an immersion in this field has been not only fruitful but also retrospectively appropriate. During the undertaking, the number of facets interworking with one another across the wide project scope came with challenges, being a relatively heavy load for myself in the participatory practitioner role. However, this approach has led to substantiated outputs, documented and applied in practice, which are of significant value in informing this inquiry and potential further research.
Following the requirement for informed technical understanding, and contrary to those applications studied by Ruth Maria Ingendoh (2023), the action research method of iterative cycles of action and reflection has addressed many practical / technical / creative concerns (Lisa M. Vaughn, 2020), resulting in implementations that have been critically considered and substantially informed. Allowing for experimental versatility has not only led to wider flexibility of creative approach, but has also brought about an exciting exploratory approach to mixing, differentiating significantly from standard practice in the field. Following the experimental method of participatory technical development provided by W.J. Cilliers (2007) has brought about an alternative approach to the technological context of post-production mixing practice. The relevance of this to engineers in the field is that the framework invites an alternative method of approach, which could inform future applications across a range of implementation styles and configurations. With the additionally imparted synthesis of virtual acoustic spectra, the resulting enhanced stereo surround-sound could present numerous applications within growing areas of technological hardware and software models.
4.d Conclusions
In summary of the results drawn from inquiry, findings demonstrate the compatibility between multichannel signal and stereo audio through a process of encoding deconstruction & reconstruction. Through rendering ray-casted cue signals via ambisonic impulse pickup, virtual acoustic spectral characteristics can be generated to emulate natural perceptibility of sound sources. Within the provided VA Studio, generation of room characteristics from user-defined parameters and models hands creative freedom to the post-production practitioner for rendering enhanced stereo recordings in an experimental and exploratory process. The proposed method demonstrates versatility and variety in application approaches, providing an alternative route for further progression within the technological context of immersive audio post-production.
Evaluating the research focus and objectives, thorough understanding of the technological context has been critical in informing and carrying out the practical application and the action research based methodology. Configuration development has been directly guided by this research context, with the informed approach also lending significantly to the success of establishing the presented encoding framework within a post-production audio context.
Creative outputs have demonstrated the suitability and versatility of the encoding framework in approach to music making & post-production mixing. Additionally, the success of creating a VA studio session imbued with VAS for utilisation has proved fruitful and beneficial in both creative adoption and practical application. With encouragement that other engineers in the field may try this immersive stereo audio rendering method, the effectiveness of the action research methodology to approach iterative sessions provides a detailed and documented interpretivist yet informed account of utilisation which could prove insightful.
Regarding next steps for further action, work will continue to push the quality of rendered signal to further progress this alternative method. As limitations are far off with regard to quality parameters and source counts, further exploratory investigation into these and the variety of applicable methods will no doubt impart further insight and development.
With regard to the produced EP works, demonstrable mixing and implementation applications are evident from working within the VA Studio. Configuration and development has been insightful for creative application, and for practical understanding of the present encoding functions at hand which play a key part of the immersive audio landscape.
With the research focus honing in on the specific niche of immersive audio generation for stereo formatting, the wide scope of included background research and practical application circles, in an exocentric manner, around the focal point of significantly enhancing stereo audio with enveloping panoramic virtual acoustic spectra. Through the emanation of LR dual signal sources in virtual acoustic space, deconstructed into thousands of individualised rays, recorded via multichannel impulse responses, and reconstructed into a psychoacoustically perceived stereo signal, auditory displacement has been attained. In this way, fruition of the overarching creative philosophical tenet of liminal fusion has been realised.
-
Bibliography
3D Warehouse (2023), “Models”, Repository, [Online] : Available at :
https://3dwarehouse.sketchup.com/search/models
[Accessed : 12/10/23]
Andreas Silzle (2003), “Quality of Head-Related Transfer Functions - Some Practical Remarks”, [Online] : Available at : https://www.researchgate.net/publication/344532774_Quality_of_Head-Related_Transfer_Functions_-_Some_Practical_Remarks
[Accessed : 05/10/23]
Amray (ND), “Amray Tool”, Amcoustics, [Online], Available at : https://amcoustics.com/tools/amray
[Accessed : 28/09/23]
Apple Inc (2023), “Overview of B-format surround encoding in Impulse Response Utility”, [Online] : Available at :
https://support.apple.com/en-ie/guide/logicpro-iru/dev022fbc493/mac
[Accessed : 26/07/23]
Applied Psychoacoustics Lab (2023), “Virtuoso”, [Online] : Available at :
https://apl-hud.com/product/virtuoso/
[Accessed : 11/07/23]
AudioKinetic (2023), “Wwise Spatial Audio”, [Online] : Available at :
https://www.audiokinetic.com/en/products/wwise-spatial-audio/
[Accessed : 11/07/23]
Audio Futures (2023), “360 WalkMix Creator”, [Online] : Available at : https://360ra.com/ [Accessed : 11/07/23]
Authority Media (2023), “What is bit depth? Everything you need to know”, [Online] : Available at :
https://www.soundguys.com/audio-bit-depth-explained-23706/#:~:text=On%20balance%2C%2016%20bits%20
[Accessed : 26/07/23]
Blue Ripple Sound (2023), “HOA Technical Notes”, [Online] : Available at:
http://www.blueripplesound.com/notes/hoa
[Accessed : 26/07/23]
Brereton, J (2017), “Music perception and performance in virtual acoustic spaces.”, [Online] : Available at:
https://psycnet.apa.org/record/2017-21406-012
[Accessed : 26/07/23]
Carl Jung (1954), “an account of the transference phenomena based on the illustrations to the “Rosarium Philosophorum.” 4. Immersion in the bath”, The Practice of Psychotherapy (The Collected Works of C. G. Jung, Volume 16), Princeton University Press. 1966. 2nd ed (p. 241-246).
Camille Bordeau et al (2023), “Cross-modal correspondence enhances elevation localization in visual-to-auditory sensory substitution”, [Online] : Available at: https://www.frontiersin.org/articles/10.3389/fpsyg.2023.1079998/full
[Accessed : 26/07/23]
Charles Kivunja et al (2017) “Understanding and Applying Research Paradigms in Educational Contexts”, [Online] : Available at : https://files.eric.ed.gov/fulltext/EJ1154775.pdf
[Accessed 06/10/23]
City University of Hong Kong (2023a), “Spatial Hearing”, [Online] : Available at:
https://auditoryneuroscience.com/spatial-hearing
[Accessed : 26/07/23]
City University of Hong Kong (2023b), “Virtual Acoustic Space”, [Online] : Available at:
https://auditoryneuroscience.com/spatial-hearing/virtual-acoustic-space
[Accessed : 26/07/23]
City University of Hong Kong (2023c), “Spatial Hearing - Acoustic cues for sound location”, [Online] : Available at: https://auditoryneuroscience.com/spatial-hearing/acoustic-cues-sound-location
[Accessed : 9/08/23]
Cloud Developers (2022), “Introduction to audio encoding”, Cloud Speech-to-text > Documentation, Google, [Online] : Available at : https://cloud.google.com/speech-to-text/docs/encoding
[Accessed : 20/09/23]
Craig Smith (ND), “Gold Tape”, USC School of Cinematic Arts, [Online] : Available at :
https://archive.org/details/GOLD_TAPE_43_Seismic_Effects/sound-effect-libraries-red-gold-sunset-editorial
[Accessed : 16/11/23]
Denis Smalley (2007), “Space-form and the acousmatic image”, [Online] : Available at :
https://www.cambridge.org/core/journals/organised-sound/article/abs/spaceform-and-the-acous matic-image/8B80E6A25A065A3D37DA7F9568A23432
[Accessed : 10/10/23]
Dmitry N. Zotkin, et al (2004), “Rendering Localized Spatial Audio in a Virtual Auditory Space”, [Online] : Available at: http://users.umiacs.umd.edu/~ramani/pubs/ZDD_cvs.pdf
[Accessed : 26/07/23]
Dr. Jason Corey (2017), “Developing Critical Listening Skills Through Technical Ear Training - Jason Corey”, [Online] : Available at :
https://www.youtube.com/watch?v=Y7e8pHLRT4c
[Accessed 15/10/23]
Edgar Y (2010), “Optimal Crosstalk Cancellation for Binaural Audio with Two Loudspeakers”, [Online] : Available at : https://3d3a.princeton.edu/sites/g/files/toruqf931/files/documents/BACCHPaperV4d_0.pdf
[Accessed : 05/10/23]
Frederic L et al (1989), “Headphone simulation of free‐field listening. I: Stimulus synthesis”, [Online] : Available at : https://pubs.aip.org/asa/jasa/article-abstract/85/2/858/807019/Headphone-simulation-of-free-field-listening-I?redirectedFrom=fulltext
[Accessed : 05/10/23]
George Berry (2017), “Binaural in Music”, [Online] : Available at :
https://geberry.wixsite.com/binaural/challenges-and-issues
[Accessed : 05/10/23]
Isaac Engel et al (2021a), “Auditory models: from binaural processing to multimodal cognition”, [Online] : Available at :
https://acta-acustica.edpsciences.org/articles/aacus/full_html/2022/01/aacus210029/aacus210029.html#:~:text=When%20a%20sound%20field%20is%20encoded%20into%20the,as%20truncation%20order%2C%20which%20dictates%20its%20spatial%20resolution.
[Accessed : 05/10/23]
Isaac Engel et al (2021b), “Improving Binaural Rendering with Bilateral Ambisonics and MagLS”, [Online] : Available at : https://www.researchgate.net/profile/Isaac-Engel-3/publication/354048148_Improving_Binaural_Rendering_with_Bilateral_Ambisonics_and_MagLS/links/6120da68232f9558659e2d60/Improving-Binaural-Rendering-with-Bilateral-Ambisonics-and-MagLS.pdf
[Accessed : 10/11/23]
Jakob H et al (2004) “Managing Risk in Software Process Improvement: An Action Research Approach”, Mis Quarterly, [Online] : Available at : https://www.jstor.org/stable/25148645 [Accessed 02/07/23]
John M. Hollerbach et al (1999), “Virtual Environment Rendering”, [Online] : Available at :
https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=d0ae6753a4f16c1a69258f518680744a5969f806
[Accessed : 12/10/23]
Jonathan G. Koomey (2001), Turning Numbers into Knowledge: Mastering the Art of Problem Solving, Analytics Press. 2001. 1st ed (p. 96).
J. L. Gonzalez-Mora (2006), “Seeing the world by hearing: Virtual Acoustic Space (VAS) a new space perception system for blind people.”, [Online] : Available at: https://ieeexplore.ieee.org/document/1684482
[Accessed : 26/07/23]
Jorge Juan Chico (2022), “Audio encoding fundamentals”, University of Seville, [Online] : Available at :
https://www.youtube.com/watch?v=nTo4mwd_aKg
[Accessed : 20/09/23]
Jörn Nettingsmeier (2010), “Higher order Ambisonics - a future-proof 3D audio technique”, [Online] : Available at :
https://www.researchgate.net/publication/308785855_Higher_order_Ambisonics_-_a_future-proof_3D_audio_technique
[Accessed 04/08/23]
Josh Luis Gonzalez Mora (2002), “Virtual Acoustic Space Research”, [Online] : Available at:
http://research.iac.es/proyecto/eavi/investigacion.html
[Accessed : 26/07/23]
Katsuhiro Chiba (2016), “Synthetic HRTF 3D Audio Test 2 (for Headphones)”, [Online] : Available at :
https://youtu.be/QhzgQ2j0miI
[Accessed : 26/07/23]
King’s College London Baroque Orchestra (ND), “Telemann's La Lyra Overture”, [Online] : Available at :
https://cambridge-mt.com/ms/mtk/
[Accessed : 23/10/23]
Lakulish et al (2021a), “Steam Audio Integration”, [Online]: Available at: https://valvesoftware.github.io/steam-audio/doc/unity/index.html [Accessed: 26/07/23]
Lakulish et al (2023b), “Steam Audio”, [Online]: Available at: https://github.com/ValveSoftware/steam-audio [Accessed: 26/07/23]
Lisa M. Vaughn (2020), “Participatory Research Methods – Choice Points in the Research Process - Fig 1 Participatory Research Frameworks, Orientations, and Approaches”, [Online]: Available at: https://jprm.scholasticahq.com/article/13244-participatory-research-methods-choice-points-in-the-research-process?attachment_id=36974 [Accessed: 09/08/23]
M.A. Gerzon (1980), “Practical Periphony”, [Online]: Available at: https://intothesoundfield.music.ox.ac.uk/what-is-ambisonics [Accessed: 05/08/23]
Marije A.J. Baalman (2010), “Spatial Composition Techniques and Sound Spatialisation Technologies”, [Online]: Available at: https://www.cambridge.org/core/journals/organised-sound/article/abs/spatial-composition-techniques-and-sound-spatialisation-technologies/F1B8A56697E8F9922689E8586A37EC6A [Accessed: 10/10/23]
Martin Naef (2002), “Spatialized audio rendering for immersive virtual environments”, VRST, [Online]: Available at: https://dl.acm.org/doi/abs/10.1145/585740.585752 [Accessed: 12/10/23]
Mike Senior (2023), “The 'Mixing Secrets' Free Multitrack Download Library”, [Online]: Available at: https://www.cambridge-mt.com/ms/mtk/ [Accessed: 11/07/23]
Music Production Glossary (2023), “Critical Listening”, [Online]: Available at: https://musicproductionglossary.com/what-is-critical-listening/ [Accessed: 15/10/23]
Nicholas Tsingos (2004), “Perceptual audio rendering of complex virtual environments”, ACM Transactions on Graphics, [Online]: Available at: https://dl.acm.org/doi/abs/10.1145/1015706.1015710 [Accessed: 12/10/23]
N.J.V. Vlaun et al (2016), “A Sound Working Environment: Optimizing the Acoustic Properties of Open Plan Workspaces Using Parametric Models”, Delft University of Technology, [Online]: Available at: https://www.researchgate.net/publication/303913923_A_Sound_Working_Environment_Optimizing_the_Acoustic_Properties_of_Open_Plan_Workspaces_Using_Parametric_Models [Accessed: 28/09/23]
Odeon Room Acoustics Software (2023), “Ray tracing and hybrid methods in ODEON Room Acoustics Software”, [Online]: Available at: https://youtu.be/vkDHgH00MFQ [Accessed: 11/07/23]
Oercommons (ND), “Coordinate Plotter”, [Online]: Available at: https://oercommons.s3.amazonaws.com/media/courseware/relatedresource/file/imth-6-1-9-6-1-coordinate_plane_plotter/index.html [Accessed: 20/10/23]
Paul Hodges (2011), “Channel Formats”, [Online]: Available at: https://ambisonic.info/ambisonics/channels.html [Accessed: 26/07/23]
Paul McGowan (2019), “What is holographic audio imaging?”, [Online]: Available at: https://youtu.be/luVAoECz0a0 [Accessed: 11/07/23]
Rachel Pain et al (2012), “Participatory Action Research Toolkit”, [Online]: Available at: https://www.dur.ac.uk/resources/beacon/PARtoolkit.pdf [Accessed: 06/10/23]
RapidTables (ND), “Binary Calculator”, [Online]: Available at: https://www.rapidtables.com/convert/number/binary-to-octal.html [Accessed: 20/09/23]
Ruth Maria Ingendoh (2023), “Binaural beats to entrain the brain? A systematic review of the effects of binaural beat stimulation on brain oscillatory activity, and the implications for psychological research and intervention”, [Online]: Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10198548/ [Accessed: 11/07/23]
Shigeo Sekito (1975), “The Word II”, Special Sound Series Vol. 2: The Word LP, Columbia Records
SketchUp (2023), “SketchUp Online App”, [Online]: Available at: https://app.sketchup.com/app?hl=en [Accessed: 12/10/23]
Spatial Hearing Lab (2022), “Demonstration of 3D audio rendering using individualized Head-Related Transfer Function (HRTF)”, [Online]: Available at: https://youtu.be/ZTDOhZDkek4 [Accessed: 26/07/23]
Stanford University (2023), “Immersive virtual acoustic spaces”, [Online]: Available at: https://otl.stanford.edu/researchers/high-impact-technology-hit-fund/Immersive-virtual-acoustic-spaces [Accessed: 26/07/23]
Stephen R Ellis (1994), “What Are Virtual Environments?”, NASA Ames Research Center, [Online]: Available at: https://goldberg.berkeley.edu/courses/S06/IEOR-170-S06/docs/EllisVE.pdf [Accessed: 12/10/23]
StudyTonight (2023), “Game Engine and History of Game Development”, [Online]: Available at: https://www.studytonight.com/3d-game-engineering-with-unity/game-engine [Accessed: 12/10/23]
Tapio Lokki (2008), “Handbook of Signal Processing in Acoustics - Virtual Acoustics”, [Online]: Available at: https://link.springer.com/chapter/10.1007/978-0-387-30441-0_39 [Accessed: 26/07/23]
Teach Me Audio (2020), “Audio Spectrum”, [Online]: Available at: https://www.teachmeaudio.com/mixing/techniques/audio-spectrum [Accessed: 15/10/23]
The Lonely Wild, “Scar”, Chasing White Light, Entertainment One Music, [Online]: Available at: https://cambridge-mt.com/ms/mtk/#Acoustic [Accessed: 20/10/23]
Tone Boosters (ND), “Spectrogram”, [Online]: Available at: https://www.toneboosters.com/tb_spectrogram_v1.html [Accessed: 17/10/23]
Ulf A. S. Holbrook (2019), “Sound Objects and Spatial Morphologies”, [Online]: Available at: https://www.cambridge.org/core/journals/organised-sound/article/abs/sound-objects-and-spatial-morphologies/3F20D79AAB66D8E1059F8D27778D7632 [Accessed: 10/10/23]
Unity Technologies (2023), “Documentation”, [Online]: Available at: https://docs.unity.com/ [Accessed: 12/10/23]
University of York (2023), “Spatial audio and virtual acoustics”, [Online]: Available at: https://www.york.ac.uk/physics-engineering-technology/research/communication-technologies/audio-and-acoustics/spatial-audio-virtual-acoustics/ [Accessed: 26/07/23]
U.P. Svensson (2002), “Modelling acoustic spaces for audio virtual reality”, [Online]: Available at: https://www.researchgate.net/publication/215514469_Modelling_acoustic_spaces_for_audio_virtual_reality [Accessed: 26/07/23]
Vikash Jugoo et al (2016), “The use of action research in a computer programming module taught using a blended learning environment”, [Online]: Available at: https://www.researchgate.net/publication/312093693_The_use_of_action_research_in_a_computer_programming_module_taught_using_a_blended_learning_environment [Accessed: 06/10/23]
Wavtones (ND), “Online Audio Frequency Signal Generator”, [Online]: Available at: https://www.wavtones.com/functiongenerator.php [Accessed: 17/10/23]
Wildcat (2019), “HRTF Demo - Steam Audio”, [Online]: Available at: https://youtu.be/c6SDKfHCDm8 [Accessed: 26/07/23]
W.J. Cilliers (2007), “Research methods”, [Online]: Available at: https://base.socioeco.org/docs/research_methods.pdf [Accessed: 06/10/23]
Xiao-li Zhong et al (2014), “Head-Related Transfer Functions and Virtual Auditory Display”, Soundscape Semiotics, [Online]: Available at: https://www.intechopen.com/chapters/45612 [Accessed: 22/09/23]
Zuber-Skerritt (1991), “Action Research for Change and Development”, Routledge, 1st ed., [Online]: Available at: https://www.taylorfrancis.com/books/edit/10.4324/9781003248491/action-research-change-development-ortrun-zuber-skerritt [Accessed: 06/10/23]