Integrating X3D and glTF

By Nicholas Polys, Ph.D.


There is great excitement around the development of new features and formats, from glTF 2.0 to X3D 4.0. This blog provides a high-level view of the leading open formats and how and why they should be used together.

See glTF + X3D integration on WebGL examples:

  • X3DOM demo set – shows numerous glTF models in X3D scenes, including WebVR
  • X_ITE examples – shows numerous glTF models in X3D scenes
  • K Shell’s examples: one and two – show a glTF model in X3D scenes
  • Andreas Plesch’s interactive mashup – shows X3D with glTF together with regular X3D Shapes, Chaser nodes,
    Routing, and HTML event integration

The Web

The evolution of the World Wide Web stepped into a new era with the widespread support of the HTML5 Standard. HTML5 and the associated suite of Standards for the Web Platform provide a rich multimedia platform to deliver information and services. More recently, Web browsers have improved the speed of JavaScript engines and given access to graphics hardware through WebGL. The result is a performant mix of declarative content, procedural graphics, and event-based logic.

At the Web Application layer, most developers are familiar with the markup, styling, and scripting logic that enables clients and servers to interact through the Document Object Model (DOM): a higher-level abstraction of a document. In contrast, low-level 3D graphics APIs such as OpenGL and WebGL allow developers to program every polygon and shader routine and to work much closer to the hardware. For 3D graphics, the higher-level abstraction is called a scene graph (by analogy, a 3D Document Object Model). A high-level API thus enables developers to work with objects, appearances, lighting, animations, and interactions rather than low-level Graphics Library calls. This rich scene graph is exactly the sweet spot that the ISO-IEC Standards Virtual Reality Modeling Language (VRML) and Extensible 3D (X3D) were designed for.
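To make the scene graph idea concrete, here is a minimal sketch in Python. The `Node` class and `world_positions` function are hypothetical names invented for this illustration; they are not part of X3D or of any engine. The sketch only shows the core idea that children inherit the placement of their parents, much as DOM elements inherit context from their ancestors.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    """Hypothetical scene-graph node: a name, a local offset, and children."""
    name: str
    translation: Vec3 = (0.0, 0.0, 0.0)
    children: List["Node"] = field(default_factory=list)


def world_positions(node: Node, parent: Vec3 = (0.0, 0.0, 0.0)) -> Iterator[Tuple[str, Vec3]]:
    """Walk the tree depth-first, accumulating translations from the root down."""
    world = tuple(p + t for p, t in zip(parent, node.translation))
    yield node.name, world
    for child in node.children:
        yield from world_positions(child, world)


# A lamp sits on a table; moving the table would move the lamp with it.
scene = Node("room", children=[
    Node("table", (2.0, 0.0, 1.0), [Node("lamp", (0.0, 1.0, 0.0))]),
    Node("door", (-3.0, 0.0, 0.0)),
])

positions = dict(world_positions(scene))
print(positions["lamp"])  # (2.0, 1.0, 1.0)
```

A real scene graph also carries rotations, scales, appearances, sensors, and routes, but the parent-to-child accumulation shown here is the part that makes it behave like a document tree.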

The ability to access 3D graphics hardware from within the Web browser was a true game changer. Where user-installed programs – browser plug-ins – were once required to interact with the Metaverse, JavaScript ‘shims’ (libraries and shadow DOMs) now enable multi-namespace content to co-exist in the Web browser environment, including interactive 3D visualizations for immersive platforms and mobile phones. VRML and X3D content from the days of SGI and Sun runs in all the Web browsers and several engines, on hardware and form factors we could barely imagine back then (such as 27-million-pixel stereo CAVEs, $250 consumer HMDs, and cardboard fold-ups for mobile phones). Such developments are especially exciting because 3D assets, like documents, can be produced, represented, transformed, composed, and consumed across the Web and across time.

ISO-IEC Standards: X3D & VRML

Over two decades of graphics innovation and community have proven the durability of declarative Standards for Web3D content and the Standard scene graph. The steady progress of royalty-free, publicly available Web3D ISO-IEC Standards demonstrates the value of interactive 3D content compatible with the WWW. X3D is backward-compatible with VRML; this means that the interactive 3D worlds built before GPUs, Broadband, or Linux run even faster today. As enterprises realize that getting serious about 3D data requires a strategy longer than Silicon Valley cycles, Standards become central. From infrastructure such as urban and rural development, to power, to scientific results and resources for education, to medical records for a lifetime, the value and investment is clearly in the interoperability and portability of 3D data and scenes.

As new technologies and features stabilize, Standards evolve. The not-for-profit Web3D Consortium is the vehicle for developing and standardizing new extensions and functionalities to the ISO-IEC VRML, HAnim, and X3D Standards. The Extensibility of VRML and X3D has been proven over 25 years of academic papers where leading-edge graphics techniques have been developed, tested, and proposed for standardization. X3D yields 20,200 documents in Google Scholar and 4,440 in Semantic Scholar; VRML yields 84,600 documents in Google Scholar and 13,100 in Semantic Scholar. Since the specifications work through the Web3D Consortium and the ISO-IEC process, the Standards technology is vetted and ratified by experts from around the world.

X3D provides a rich and extensible scene graph that has stood the test of time for interactive 3D content and proven its adaptability for the evolving WWW, emerging consumer devices, and new graphical techniques, such as Projective Texture Mapping (PTM) and Physically Based Rendering (PBR). X3D is modular, and Version 3.3 has extended its standard components to include support for GIS, Medical, and CAD data. Its encodings now include XML, UTF-8 (Classic VRML), and binary; a JSON encoding is proposed and in the pipeline for standardization. X3D’s API bindings are specified through a Unified Object Model for JavaScript and Java, with C#, C++, and Python specified and in the standardization pipeline. X3D’s XML encoding brings the benefits of integrating with the Web Platform; for example, the XML ecosystem supports compression, encryption, authentication, and digital signatures on 3D assets.

X3D Version 4.0 is currently in development, leveraging many of the lessons from the W3C Declarative 3D Community Group and the open source, plugin-free, WebGL-based implementations of X3DOM and X_ITE. Both of these JavaScript X3D implementations have demonstrated integration with WebGL, HTML5, and the DOM. And both now include the ability to integrate the new kid on the block, glTF 2.0.


glTF (GL Transmission Format) Version 2.0 is a specification by the Khronos Group designed to be lean and mean: a transport format that could provide binary data for GPUs directly, without needing to be parsed. For example, if you are presenting a webpage with a preview of a complex 3D model on it, you would not necessarily want to incur the time and processing overhead of parsing large arrays of coordinates, indices, and UV maps into memory and then passing them to the GPU. This is especially true if your Web application does not need to access or change the specific vertex information through the DOM.

glTF includes a new lighting and material model called Physically Based Rendering (PBR), which specifies different physical properties of a material and its interaction with light in the larger scene (its appearance). glTF also targets the download of objects and animations over the network. Using buffers and accessors described in JSON headers, binary arrays of geometry and animations can be passed from the wire to the GPU without per-element parsing. Solving this low-level delivery problem well (and elegantly) is the reason for glTF’s attraction. glTF specifies the data typing for a minimal scene graph that can represent shape, appearance, and animation information. The early Fraunhofer SRC work for binary data delivery in X3D is very similar to the data structures deployed by glTF [1]. The Khronos Group describes glTF as the ‘JPEG of 3D’ (but without licensing or royalties).
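The buffer and accessor idea is easier to see with a small example. The following Python sketch builds a minimal glTF 2.0 JSON document for a single triangle and then reads the vertex positions back through the accessor. The triangle data and the `read_vec3_accessor` helper are assumptions made for illustration; the helper only handles tightly packed FLOAT/VEC3 data in a base64 data URI, and a real engine would hand the bytes to the GPU rather than unpack them on the CPU as done here.

```python
import base64
import json
import struct

# Three vertices of a triangle, packed as little-endian 32-bit floats.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
payload = b"".join(struct.pack("<3f", *v) for v in vertices)

gltf = {
    "asset": {"version": "2.0"},
    "buffers": [{
        "byteLength": len(payload),
        "uri": "data:application/octet-stream;base64,"
               + base64.b64encode(payload).decode("ascii"),
    }],
    # 34962 is the ARRAY_BUFFER target (vertex attributes).
    "bufferViews": [{"buffer": 0, "byteOffset": 0,
                     "byteLength": len(payload), "target": 34962}],
    "accessors": [{
        "bufferView": 0, "byteOffset": 0,
        "componentType": 5126,  # FLOAT
        "count": len(vertices), "type": "VEC3",
        "min": [0.0, 0.0, 0.0], "max": [1.0, 1.0, 0.0],
    }],
    "meshes": [{"primitives": [{"attributes": {"POSITION": 0}}]}],
    "nodes": [{"mesh": 0}],
    "scenes": [{"nodes": [0]}],
    "scene": 0,
}


def read_vec3_accessor(doc, index):
    """Decode a FLOAT/VEC3 accessor (illustrative helper, data-URI buffers only)."""
    accessor = doc["accessors"][index]
    if accessor["componentType"] != 5126 or accessor["type"] != "VEC3":
        raise ValueError("this sketch only handles FLOAT VEC3 accessors")
    view = doc["bufferViews"][accessor["bufferView"]]
    uri = doc["buffers"][view["buffer"]]["uri"]
    raw = base64.b64decode(uri.split(",", 1)[1])
    start = view.get("byteOffset", 0) + accessor.get("byteOffset", 0)
    # Each VEC3 of 32-bit floats occupies 12 bytes when tightly packed.
    return [struct.unpack_from("<3f", raw, start + 12 * i)
            for i in range(accessor["count"])]


document_text = json.dumps(gltf)  # what would travel over the wire
decoded = read_vec3_accessor(json.loads(document_text), 0)
print(decoded)
```

Note how the JSON part only describes where the numbers live and how to interpret them; the numbers themselves stay in one opaque binary block.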


The design goal of glTF 2.0 is to be the JPEG of 3D. In the same way, X3D and VRML are the HTML of 3D – a higher-level representation that can compose JPEGs (glTF and others) into a Document: a 3D, VR, or AR World. Thus glTF’s sweet spot is focusing on the asset delivery problem between network and GPU. Its core does not represent many of the crucial ingredients for 3D worlds, including lights, interactivity, and structured metadata functionality, leaving them up to the application. In this way, glTF is much closer to the graphics hardware (lowest common denominator), while X3D is much closer to the Web and the Application layer (greatest common denominator). X3D’s sweet spot is composing interactive scenes (via its Scene Graph) and connecting them with higher-level logic, APIs, and services.

Let’s use some concrete examples:


  • Consider that ‘touching’ a lamp turns on and off a light, or that ‘touching’ a door handle will open or close the door no matter what the display and interaction device. These interactive aspects of a model can be represented in X3D/VRML, but not glTF.
  • Consider an architectural model designed to be appreciated and experienced by a walk-through in first person view. X3D offers built-in navigation and
    avatars for a walk mode, or guided exploration through a series of defined perspectives, with the level of user control managed by the
    model designer: X3D represents this information, but glTF does not.
  • Getting close to a door or inside an elevator will trigger an animation. Other sensors such as visibility, collision, ‘dragging’ can be described in an X3D scene.
  • More complex behaviors and event logic such as shifting gears or creating puzzles and games can be part of the scene, travelling with the model and carrying its interaction semantics.
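As a rough sketch of the lamp example above, the Python snippet below generates an X3D XML fragment in which touching a lamp shape toggles a light. The DEF names (`LampTouch`, `LampState`, `LampLight`) and the geometry are made up for this illustration. The routing relies on the standard BooleanToggle node, which flips its state on each TRUE event, so each press of the TouchSensor switches the PointLight.

```python
import xml.etree.ElementTree as ET

scene = ET.Element("Scene")

# The lamp: a sensor and the geometry it watches share one parent Group.
lamp = ET.SubElement(scene, "Group")
ET.SubElement(lamp, "TouchSensor", DEF="LampTouch", description="Toggle the lamp")
shape = ET.SubElement(lamp, "Shape")
ET.SubElement(shape, "Sphere", radius="0.2")

# The light starts switched off; BooleanToggle remembers the on/off state.
ET.SubElement(scene, "PointLight", DEF="LampLight", on="false", location="0 2 0")
ET.SubElement(scene, "BooleanToggle", DEF="LampState")


def route(parent, from_node, from_field, to_node, to_field):
    """Append a ROUTE element that wires an output field to an input field."""
    return ET.SubElement(parent, "ROUTE", fromNode=from_node, fromField=from_field,
                         toNode=to_node, toField=to_field)


# Press (isActive TRUE) -> flip the stored state -> drive the light.
route(scene, "LampTouch", "isActive", "LampState", "set_boolean")
route(scene, "LampState", "toggle", "LampLight", "on")

x3d_fragment = ET.tostring(scene, encoding="unicode")
print(x3d_fragment)
```

The behavior is declared as data inside the scene, so it travels with the model no matter which display or pointing device the viewer uses.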


  • Without Lights, the 3D world is black. In many cases, such as architectural models, lighting placement, type, and color are crucial aspects of the model and its presentation. X3D and tools support the classic Lighting model; in glTF, lighting needs to be set up in the external application, or by use of an optional Extension.
  • The material model of many 3D objects is still defined through Blinn-Phong shading Appearances. X3D and tools support these natively, while glTF requires a (yet to be finalized) Extension.
  • GLSL Shaders are also supported by X3D and the HTML5 X3D engines; in glTF, these require an Extension.
  • PBR rendering is compact and visually attractive, but requires that many older models be translated to the new paradigm. PBR is natively supported in the HTML5 X3D engines (see links above), and is the subject of X3D 4.0. PBR benefits from environment lights, and X3D already includes support for environmental CubeMaps through the CubeMapTexturing Component.

Web Informatics

  • Worlds move; objects travel across networks of machines, authors, and consumers. For enterprises, Metadata is a central requirement to track an asset’s provenance or licensing, or to cross-reference it with different vocabularies or ontologies. X3D enables multiple Metadata tags to be attached to any node in the scene. glTF currently has unstructured metadata; an extension for structured metadata is in draft.
  • URL/URIs are literally the link to connect information and resources over the Web. In X3D, the Anchor semantic is the same as in HTML. glTF also uses URIs to reference its buffers and image resources, but glTF scenes cannot link to other glTF scenes. In X3D, teleporting to another scene is a core feature.
  • It is common to build complex scenes out of simpler objects. Composing worlds from other worlds is done in X3D (and VRML) with the Inline mechanism. In this way, rich X3D worlds can also be built by Inlining glTF assets, as demonstrated by X3DOM and targeted as an X3D 4.0 extension!
  • Leveraging the XML side of HTML5 provides many benefits for quickly building powerful Web applications. For example, X3D developers can immediately take advantage of the XML ecosystem and W3C Standards, such as compression, encryption, and authentication at the element level.
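The points above about Metadata, Anchor, and Inline can be combined in one small scene. The Python sketch below writes an X3D XML document that Inlines a glTF asset, attaches a license tag to it, and links to another world. The file names `car.glb` and `garage.x3d`, the license value, and the `version="4.0"` attribute are assumptions for illustration only; whether a given engine resolves a glTF URL inside Inline depends on that engine.

```python
import xml.etree.ElementTree as ET


def mfstring(*values):
    """Format values as an X3D MFString attribute: each entry in double quotes."""
    return " ".join(f'"{v}"' for v in values)


def build_world(model_url, next_world_url):
    """Build an X3D document tree; both URLs are supplied by the caller."""
    x3d = ET.Element("X3D", profile="Immersive", version="4.0")
    scene = ET.SubElement(x3d, "Scene")

    car = ET.SubElement(scene, "Transform", DEF="Car", translation="0 0 -5")
    # Metadata is attached to, and travels with, the node it describes.
    ET.SubElement(car, "MetadataString", name="license", value=mfstring("CC-BY-4.0"))
    # Inline pulls the external asset into this scene graph.
    ET.SubElement(car, "Inline", url=mfstring(model_url))

    # Anchor: selecting its child geometry loads another world, like an HTML link.
    anchor = ET.SubElement(scene, "Anchor", url=mfstring(next_world_url),
                           description="Enter the garage")
    sign = ET.SubElement(anchor, "Shape")
    ET.SubElement(sign, "Box", size="1 1 0.1")
    return x3d


world = build_world("car.glb", "garage.x3d")
# ElementTree escapes the inner double quotes as &quot;, which is valid XML.
world_xml = ET.tostring(world, encoding="unicode")
print(world_xml)
```

The glTF file stays a compact, self-contained asset, while the surrounding X3D scene supplies placement, provenance, and navigation.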

In 2018, Web3D Community Members made a feature-by-feature comparison of X3D and glTF showing their different values and complementarity (here). Readers are encouraged to explore and experiment themselves! Tradeoffs are everywhere in real life; in order to make good decisions, we must be well-informed. Hopefully this blog brings a bit more clarity to the value proposition behind each format and its design.


The myriad of applications and enterprises on the Web work at different levels. The Web Platform is designed to support such an ecosystem. Different enterprises will pick different levels of abstraction to build their data investments and their 3D business. X3D can be justified for both wide delivery and long-term durability; the archive quality of ISO-IEC X3D also means that authors can have confidence in their content on timescales longer than Silicon Valley lifecycles.

Understanding the role and goal of each technology, its community, specifications, and Standards is essential for a successful approach and strategy. If the car model simply drives around the world, use glTF for the car model; if you want to make sure the gears and doors, lights and cameras work as expected, use X3D. If the appearance of the car should be exchangeable by changing a single attribute in the DOM, use X3D. If you want your 3D scenes and model data to be viable in 10, 20 plus years, use X3D.
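As a small illustration of changing an appearance through a single attribute, the Python sketch below edits the `diffuseColor` of a Material in an X3D XML fragment. Python's ElementTree stands in here for the browser DOM; in a Web page the equivalent step would be a DOM attribute update on the same element. The scene fragment, the DEF name `CarPaint`, and the `repaint` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

SCENE_XML = """
<Scene>
  <Shape>
    <Appearance>
      <Material DEF='CarPaint' diffuseColor='0.8 0.1 0.1'/>
    </Appearance>
    <Box size='4 1.5 2'/>
  </Shape>
</Scene>
"""


def repaint(scene, def_name, rgb):
    """Set diffuseColor on the Material whose DEF matches def_name."""
    material = scene.find(f".//Material[@DEF='{def_name}']")
    if material is None:
        raise KeyError(f"no Material with DEF={def_name!r}")
    material.set("diffuseColor", " ".join(f"{c:g}" for c in rgb))
    return material


scene = ET.fromstring(SCENE_XML)
repaint(scene, "CarPaint", (0.1, 0.2, 0.9))
print(scene.find(".//Material").get("diffuseColor"))  # 0.1 0.2 0.9
```

Because the material is an addressable element rather than bytes inside a binary buffer, the change is one attribute write.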

The main differences can be summarized as follows:

X3D and glTF differences

X3D | glTF
--- | ---
Declarative & DOM-ready | Non-declarative
Core support for scene linkage, Inlining, and Metadata | No scene linkage
Describes complex scenes with multiple models, lighting, and interaction | Describes objects’ geometry and their optical properties
Not-for-profit and ISO-IEC recognition of a Standard | Not-for-profit publication of a Specification
Phong material + GLSL (X3D 4.0 will support PBR by Inlining glTF 2.0 models) | PBR material

Just as HTML Web pages compose and lay out images (JPEGs), text, and multimedia into documents, X3D worlds compose models (like glTF), images, text, and multimedia into scenes. At the higher level of declarative scene graphs and DOM-integrated Web applications, there are numerous strategies and tools. On the production side, a number of open source tools support X3D and VRML content creation, including Blender, MeshLab, OpenCascade, PostGIS, Titania, Paraview, Chimera, VMD, etc. On the client side, the open source X3DOM and X_ITE demonstrate the power of compatibility for interactive 3D delivery in the Web ecosystem in an HTML-like, declarative style.

Both X3D 4.0 and glTF 2.0 are under development and it is an exciting time for groups to get involved and build the best of both worlds! Current work in the Web3D Consortium includes investigating methods to access the internal assets of Inlined glTF scene graphs, and the integration of characters with the forthcoming Humanoid Animation 2.0 (HAnim) ISO-IEC Standard.


[1] M. Limper, M. Thöner, J. Behr, D. W. Fellner: “SRC – a streamable format for generalized web-based 3D data transmission”, Proceedings of ACM Web 3D 2014, pp. 35-43.
