Codex Blender Reconstruction Benchmark

Dataset Layout

The source/ folder contains the ground-truth geometry. The oneshot/ folder contains a single rendered reference per object, and muti-view-6-ortho/ contains six orthographic inputs per object: front, back, left, right, top, and bottom. Generated results live in ai-oneshot/ and ai-mv6o/. The full set comprises nine objects: a wooden chair, teapot, teacup, spoon, abacus, acorn, acoustic guitar, anchor, and a stylized anime character.
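The layout above can be sketched as a small path helper. The folder names come from the description; the per-object subfolders, the hyphenated slug rule, and the .png file names are assumptions made for illustration and may not match the benchmark's actual naming.

```python
from pathlib import PurePosixPath

# Folder names as given in the dataset layout description.
INPUT_DIRS = {"oneshot": "oneshot", "six_view": "muti-view-6-ortho"}
OUTPUT_DIRS = {"oneshot": "ai-oneshot", "six_view": "ai-mv6o"}
SIX_VIEWS = ("front", "back", "left", "right", "top", "bottom")

OBJECTS = (
    "wooden chair", "teapot", "teacup", "spoon", "abacus",
    "acorn", "acoustic guitar", "anchor", "anime character",
)


def slug(name):
    """Hypothetical naming rule: lowercase words joined by hyphens."""
    return "-".join(name.lower().split())


def expected_inputs(root, name, mode):
    """Return the reference-image paths one object would need (assumed file names)."""
    base = PurePosixPath(root) / INPUT_DIRS[mode] / slug(name)
    if mode == "oneshot":
        return [base / "reference.png"]
    return [base / f"{view}.png" for view in SIX_VIEWS]
```

Calling expected_inputs("bench", "teapot", "six_view") returns one path per orthographic view, which makes it easy to spot a missing input before a run.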

Review Method

The review is based on the provided renders plus read-only Blender scene metadata: object counts, mesh counts, material counts, and vertex/face totals. No mesh or image assets were edited during this pass.
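A minimal sketch of how such read-only counts could be gathered. The function reads only attributes that Blender's Python API exposes on objects (obj.type, obj.data.vertices, obj.data.polygons). The stand-in objects are hypothetical and exist only so the sketch runs outside Blender. This is not necessarily the script used for the review.

```python
from types import SimpleNamespace


def summarize_scene(objects, materials):
    """Collect read-only counts; nothing on the objects is modified."""
    meshes = [obj for obj in objects if obj.type == "MESH"]
    return {
        "objects": len(objects),
        "meshes": len(meshes),
        "materials": len(materials),
        "vertices": sum(len(obj.data.vertices) for obj in meshes),
        "faces": sum(len(obj.data.polygons) for obj in meshes),
    }


def _fake_mesh(n_verts, n_faces):
    """Hypothetical stand-in for a Blender mesh object."""
    data = SimpleNamespace(vertices=[None] * n_verts, polygons=[None] * n_faces)
    return SimpleNamespace(type="MESH", data=data)


# Two meshes and one non-mesh object, so the mesh filter is exercised.
demo_objects = [
    _fake_mesh(8, 6),
    _fake_mesh(4, 1),
    SimpleNamespace(type="CAMERA", data=None),
]
demo_summary = summarize_scene(demo_objects, materials=["wood"])
```

Inside Blender, the same function could be called as summarize_scene(list(bpy.data.objects), list(bpy.data.materials)). Objects that share one mesh datablock are counted once per object in this sketch.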

Overall Pattern

One-shot reconstruction produced stronger canonical silhouettes for the teacup, teapot, and chair. The six-view run helped most on the spoon, but it also introduced orthographic-view artifacts and occasional over-modeling. Beyond those four objects, the full set adds challenges in topology, scale, repetition, thin details, symmetry, and character shape.

At a glance

Scorecard

Object	Source complexity	One-shot result	Six-view result	Best reconstruction
Wooden chair	172 vertices, 146 faces	Recognizable chair; many primitives; simplified grain.	Thicker, more textured, but back-heavy and proportionally off.	One-shot
Teapot	3,241 vertices, 3,464 faces	Strong body/lid/spout/handle read; clean but simplified.	Captures width and rings, with extra pads and a kinked spout.	One-shot
Teacup	2,659 vertices, 2,600 faces	Best silhouette match; cup, saucer, rim, and handle are coherent.	Good rim and saucer detail, but faceting and weak handle structure.	One-shot
Spoon	1,571 vertices, 1,555 faces	Flat, oversized bowl; boundary artifacts dominate.	Cleaner spoon silhouette and handle; bowl still too shallow.	Six-view
Abacus	5,840 vertices, 4,940 faces	Strong repeated-bead structure; simplified frame and bead variation.	Cleaner orthographic alignment with five rods and split bead groups.	Six-view
Acorn	354 vertices, 352 faces	Readable low-poly cap/body split with strong silhouette.	Closer frontal framing; still simplified surface detail.	One-shot
Acoustic guitar	5,684 vertices, 5,706 faces	Identifies strings, neck, headstock, and sound hole, but body shape drifts.	More complete guitar grammar with cleaner outline and bridge detail.	Six-view
Anchor	858 vertices, 878 faces	Strong anchor read with ring, stock, shank, arms, and flukes.	Similar structure with sharper flukes and more front-facing symmetry.	Six-view
Anime girl casual outfit	95,128 vertices, 126,768 faces	Readable T-pose character; simplified limbs, hair, and outfit detail.	More centered and consistent; still procedural and low-detail.	Six-view
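The scorecard can be tallied with a few lines of Python. The rows below are copied from the table; the variable names are illustrative only.

```python
from collections import Counter

# (object, source vertices, source faces, best reconstruction) from the scorecard.
SCORECARD = [
    ("Wooden chair", 172, 146, "One-shot"),
    ("Teapot", 3241, 3464, "One-shot"),
    ("Teacup", 2659, 2600, "One-shot"),
    ("Spoon", 1571, 1555, "Six-view"),
    ("Abacus", 5840, 4940, "Six-view"),
    ("Acorn", 354, 352, "One-shot"),
    ("Acoustic guitar", 5684, 5706, "Six-view"),
    ("Anchor", 858, 878, "Six-view"),
    ("Anime girl casual outfit", 95128, 126768, "Six-view"),
]

# Count how often each input mode produced the preferred reconstruction.
wins = Counter(best for _, _, _, best in SCORECARD)

# Order the objects by source vertex count to see the complexity range.
by_size = sorted(SCORECARD, key=lambda row: row[1])

print(dict(wins))
print(by_size[0][0], "->", by_size[-1][0])
```

By this count the six-view run is preferred for five objects and the one-shot run for four, and source complexity ranges from the 172-vertex chair to the 95,128-vertex character.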

Per-object review

Results

Wooden Chair

The source is a compact rustic chair with rounded wooden members, a slatted seat, three horizontal back rails, angled legs, and visible wood texture.

Source .blend

Wooden chair one-shot input — One-shot input

Wooden chair one-shot reconstruction render — One-shot output

Wooden chair six-view reconstruction render — Six-view output

One-shot review

This is the more readable of the two chair results. It reconstructs the seat, front legs, rear uprights, back rails, and lower stretcher. The result uses 137 mesh objects and 1,492 vertices, which suggests a procedural assembly of simple parts rather than one compact mesh. The main misses are material fidelity and organic construction: the wood appears as light cylinders with dark scratch-like streaks, and the rustic irregularity of the source is mostly flattened.
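The object and vertex counts can be turned into two quick ratios. The source vertex count comes from the scorecard, and the interpretation is only a rough indicator.

```python
source_vertices = 172    # wooden chair source, from the scorecard
oneshot_objects = 137    # mesh objects in the one-shot result
oneshot_vertices = 1492  # vertices in the one-shot result

# Average size of each generated part.
vertices_per_object = oneshot_vertices / oneshot_objects

# How much heavier the result is than the source mesh.
vertex_ratio = oneshot_vertices / source_vertices

print(round(vertices_per_object, 1), round(vertex_ratio, 1))
```

That works out to about 10.9 vertices per object and roughly 8.7 times the source vertex count, which is consistent with many small primitives standing in for a single low-poly mesh.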

Open generated .blend

Six-view review

The six-view result has fewer objects and a stronger procedural wood pattern, but the modeled chair is over-thick and back-dominant. It adds an extra back rail and turns several cylindrical members into large rectangular posts. As a 3D object it is more textured, but as a reconstruction it drifts further from the source proportions.

Open generated .blend

Teapot

The source is a squat white teapot with an oval body, domed lid, small knob, loop handle, spout, and subtle rim/foot-ring details.

Source .obj

Teapot one-shot reconstruction render — One-shot output

Teapot six-view reconstruction render — Six-view output

One-shot review

The one-shot teapot is a strong semantic reconstruction: the rounded body, lid, knob, spout, handle, and foot ring all land in the expected places. It is smoother and cleaner than the source preview, and the spout opening is simplified into a blunt capped end, but the whole object reads correctly from normal viewing distance.

Open generated .blend

Six-view review

The six-view teapot uses 20 mesh objects and captures the flattened body and rim stack, but it over-interprets view cues as extra side pads and a vertical front feature. The spout bends into a segmented, kinked tube with a visible cap. It is useful as a diagnostic example: more views did not automatically mean a cleaner fused object.

Open generated .blend

Teacup

The source is a white cup and saucer with a tapered cup wall, rounded lip, circular saucer, small foot ring, and a simple C-shaped handle.

Source .obj

Teacup one-shot reconstruction render — One-shot output

Teacup six-view reconstruction render — Six-view output

One-shot review

This is one of the best one-shot results. It captures the cup taper, open top, saucer, handle, rim, inner shadow, and foot ring. The geometry is idealized and the render is cropped tightly, but the reconstruction preserves the main object grammar with few distracting artifacts.

Open generated .blend

Six-view review

The six-view teacup adds more mechanical rim and saucer detail, yet the result becomes less faithful. The saucer turns faceted, and the handle is reduced to a front-facing vertical cue rather than a clear loop. It demonstrates a common six-view failure: local view evidence is preserved, but global 3D consistency weakens.
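One possible way to make this failure measurable, which the review itself does not do, is to render the fused model from each orthographic direction and compare silhouettes with the reference views. The sketch below assumes the silhouettes are already available as equally sized binary masks; the function names are hypothetical.

```python
def silhouette_iou(mask_a, mask_b):
    """Intersection over union of two equally sized binary masks (lists of rows)."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                inter += 1
            if a or b:
                union += 1
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


def worst_view(reference_masks, render_masks):
    """Return (view, iou) for the view whose silhouettes agree least."""
    scores = {
        view: silhouette_iou(reference_masks[view], render_masks[view])
        for view in reference_masks
    }
    view = min(scores, key=scores.get)
    return view, scores[view]


# Toy masks: the front view matches, the side view is twice as wide as it should be.
reference = {
    "front": [[0, 1, 1, 0], [0, 1, 1, 0]],
    "side": [[1, 1, 0, 0], [1, 1, 0, 0]],
}
rendered = {
    "front": [[0, 1, 1, 0], [0, 1, 1, 0]],
    "side": [[1, 1, 1, 1], [1, 1, 1, 1]],
}
view, score = worst_view(reference, rendered)
```

A result that scores well from the front but poorly from the side or top would match the pattern described here: individual cues survive, while the fused object does not hold together across views.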

Open generated .blend

Spoon

The source is a white ceramic spoon with a shallow oval bowl, smooth neck, tapered handle, rounded end, and subtle concavity.

Source .obj

Spoon one-shot reconstruction render — One-shot output

Spoon six-view reconstruction render — Six-view output

One-shot review

The one-shot spoon is the weakest output in the set. It understands that the object is long, white, and flat, but the bowl becomes an oversized plate-like polygon with boundary artifacts and a decorative tan edge. The shallow concavity is indicated by a grey patch rather than convincing bowl geometry.

Open generated .blend

Six-view review

The six-view spoon is the clearest improvement from added views. It has a cleaner handle, neck, and bowl relationship, and the object reads as a single ceramic spoon. The bowl remains too shallow and ends in a central dimple, but the silhouette is far more controlled than the one-shot version.

Open generated .blend

Abacus

The abacus, the first new reconstruction target, combines repeated beads, parallel rods, and a simple side-frame structure. It tests whether a modeling agent can preserve count, spacing, and alignment without importing the source mesh.

Source .obj

Abacus one-shot reconstruction render — One-shot output

Abacus six-view reconstruction render — Six-view output

One-shot review

The one-shot abacus captures the important part grammar: side supports, top rail, horizontal rods, and repeated bead groups. It reads correctly, though the render is cropped close and the source's exact five-row layout is interpreted procedurally rather than copied.

Open generated .blend

Six-view review

The six-view output is more frontally organized and preserves the five rod rows clearly. Beads are simplified ellipsoids and the wooden side posts are blockier than the source, but the result is coherent and useful as a reconstruction baseline.

Open generated .blend

Acorn

The acorn target is compact and mostly radial, with a low-poly cap, dark nut body, lip band, pointed bottom, and short angled stem. It is a useful contrast to the more mechanical targets.

Source .obj

Acorn one-shot reconstruction render — One-shot output

Acorn six-view reconstruction render — Six-view output

One-shot review

The one-shot version captures the acorn's main stacked structure: dark lower nut, broad lighter cap, lip shadow, faceted sides, and short stem. It exaggerates the cap as a cleaner geometric cone, but the object identity is clear.

Open generated .blend

Six-view review

The six-view version is framed more like a reference-sheet render and keeps the cap/body separation crisp. It remains procedural and simplified, with low-poly facets standing in for the source's subtler organic surface.

Open generated .blend

Acoustic Guitar

The guitar adds thin strings, small tuning hardware, a tall neck, sound hole, bridge, and a broad resonant body. It is the most detail-heavy new asset.

Source .obj

Acoustic guitar one-shot input — One-shot input

Acoustic guitar one-shot reconstruction render — One-shot output

Acoustic guitar six-view reconstruction render — Six-view output

One-shot review

The one-shot output gets the instrument identity right: strings, neck, frets, headstock, tuning pegs, bridge, and sound hole are all represented. The body becomes too hourglass-like and upright, and the render includes a simple floor plane, but the part vocabulary is strong for a single image.

Open generated .blend

Six-view review

The six-view reconstruction improves the full-body outline and adds more complete edge binding, rosette, bridge, and peg details. It still relies on procedural surfaces rather than true hollow construction, but it is a cleaner benchmark result than the one-shot version.

Open generated .blend

Acoustic guitar front orthographic input

Acoustic guitar right orthographic input

Acoustic guitar bottom orthographic input

Anchor

The anchor target is mostly symmetrical and metallic, with a top ring, horizontal stock, central shank, curved arms, and pointed flukes.

Source .obj

Anchor one-shot reconstruction render — One-shot output

Anchor six-view reconstruction render — Six-view output

One-shot review

The one-shot anchor is a strong semantic match. It reconstructs the ring, crossbar, center shank, curved arms, collars, and pointed flukes. Some curvature is simplified into primitive arcs and planar fluke surfaces, but the result reads clearly.

Open generated .blend

Six-view review

The six-view version keeps the same part structure with a more frontal, symmetrical presentation and sharper fluke plates. It is still procedural rather than forged, but it preserves the major silhouette cues across the orthographic references.

Open generated .blend

Anime Girl Casual Outfit

The character target is a detailed stylized human in a T-pose, with long brown hair, large anime eyes, a white crop top, black athletic shorts, bare legs, and sneakers. The source asset is attributed to the linked Sketchfab model.

Source .glb Attribution

Anime girl casual outfit one-shot input — One-shot input

Anime girl casual outfit one-shot reconstruction render — One-shot output

Anime girl casual outfit six-view reconstruction render — Six-view output

One-shot review

The one-shot reconstruction identifies the major character cues: T-pose arms, head and neck, large eyes, long hair mass, white top, black shorts, legs, and shoes. It is intentionally procedural and far simpler than the source, with cylindrical limbs, blocky hair locks, approximate clothing boundaries, and no fine facial or fabric detail.

Open generated .blend

Six-view review

The six-view reconstruction is more centered and consistent as a benchmark render. It keeps the face, hair curtain, outstretched arms, outfit blocks, shoes, and simplified body proportions readable from the main view. The result still compresses the original character into primitives, so hands, hair strands, laces, folds, and material nuance remain approximations.

Open generated .blend

Anime girl casual outfit front orthographic input

Anime girl casual outfit back orthographic input

Anime girl casual outfit left orthographic input

Anime girl casual outfit right orthographic input

Anime girl casual outfit top orthographic input

Anime girl casual outfit bottom orthographic input

What the benchmark shows

Takeaways

Canonical objects are handled better than subtle geometry.

The teapot and teacup benefit from familiar object priors: body, lid, handle, rim, and saucer are reconstructed even from a single render. The spoon's shallow concavity is much harder because it depends on small depth cues rather than named parts.

More views help when the main ambiguity is plan shape.

The six-view spoon improves because top and side information constrain the long handle and bowl outline. That same extra information does not guarantee better results for the chair, teacup, or teapot, where part fusion and proportion matter more.

View fusion is the main bottleneck.

Six-view outputs sometimes preserve orthographic cues as literal geometry: teapot pads, teacup faceting, and stored reference planes/materials in the Blender files. Better reconstruction would need stronger constraints for symmetry, continuity, and part identity across views.
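As an illustration of what a symmetry constraint could check, the sketch below mirrors a point set across one axis plane and measures how far each mirrored point lands from the nearest original point. It assumes the object is centered on the mirror plane and uses a brute-force nearest-neighbor search; it is a hypothetical diagnostic, not part of the benchmark.

```python
import math


def symmetry_error(points, axis=0):
    """Mean distance from each mirrored point to its nearest original point."""
    total = 0.0
    for p in points:
        mirrored = list(p)
        mirrored[axis] = -mirrored[axis]  # reflect across the plane axis = 0
        total += min(math.dist(mirrored, q) for q in points)
    return total / len(points)


# A point set that is symmetric about x = 0, and one with a displaced left point.
symmetric = [(1, 0, 0), (-1, 0, 0), (0, 2, 0)]
lopsided = [(1, 0, 0), (-1.5, 0, 0), (0, 2, 0)]

print(symmetry_error(symmetric))
print(symmetry_error(lopsided))
```

A score of zero means every vertex has a mirror partner. The search is quadratic in the number of points, so a spatial index would be needed for dense meshes such as the character.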