Dataset Layout

The source/ folder contains the ground-truth geometry. The oneshot/ folder contains single rendered references, and muti-view-6-ortho/ contains front, back, left, right, top, and bottom orthographic inputs. Generated results live in ai-oneshot/ and ai-mv6o/. The full set covers nine objects: a wooden chair, teapot, teacup, spoon, abacus, acorn, acoustic guitar, anchor, and a stylized anime character.
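As a minimal sketch, the layout above could be sanity-checked before a review pass. The folder names are copied as written in the description; the function name `missing_folders` and the idea of a single dataset root directory are assumptions made for illustration.

```python
from pathlib import Path

# Folder names exactly as spelled in the layout description.
EXPECTED_FOLDERS = (
    "source",             # ground-truth geometry
    "oneshot",            # single rendered references
    "muti-view-6-ortho",  # six orthographic inputs
    "ai-oneshot",         # generated results, one-shot run
    "ai-mv6o",            # generated results, six-view run
)


def missing_folders(root):
    """Return the expected top-level folders that do not exist under root."""
    root = Path(root)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]
```

Calling `missing_folders(".")` from the (assumed) dataset root returns an empty list when all five folders are present.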

Review Method

The review is based on the provided renders plus read-only Blender scene metadata: object counts, mesh counts, material counts, and vertex/face totals. No mesh or image assets were edited during this pass.
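The metadata listed above can be gathered without modifying a scene. The sketch below is one possible way to do it, not necessarily the script used for this review. It reads only attributes that Blender objects expose (`type`, `data.vertices`, `data.polygons`, `material_slots`), so it also runs on plain stand-in objects outside Blender. The names `SceneStats` and `summarize_scene` are hypothetical, and counting only materials assigned to mesh objects is an assumption.

```python
from collections import namedtuple

SceneStats = namedtuple("SceneStats", "objects meshes materials vertices faces")


def summarize_scene(objects):
    """Tally read-only statistics over an iterable of Blender-like objects."""
    objects = list(objects)
    meshes = [obj for obj in objects if obj.type == "MESH"]

    # Collect the names of materials actually assigned to mesh objects.
    materials = set()
    for obj in meshes:
        for slot in obj.material_slots:
            if slot.material is not None:
                materials.add(slot.material.name)

    # Totals are per object, so linked duplicates are counted once per use.
    return SceneStats(
        objects=len(objects),
        meshes=len(meshes),
        materials=len(materials),
        vertices=sum(len(obj.data.vertices) for obj in meshes),
        faces=sum(len(obj.data.polygons) for obj in meshes),
    )
```

Inside Blender's Python console, `import bpy` followed by `summarize_scene(bpy.data.objects)` would produce the tally for the open file. Nothing in the function writes to the scene.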

Overall Pattern

One-shot reconstruction produced stronger canonical silhouettes for the teacup, teapot, and chair. The six-view run helped most on the spoon, but it also introduced orthographic-view artifacts and occasional over-modeling. The full set adds varied topology, scale, repetition, thin details, symmetry, and character-shape challenges.

At a glance

Scorecard

| Object | Source complexity | One-shot result | Six-view result | Best reconstruction |
| --- | --- | --- | --- | --- |
| Wooden chair | 172 vertices, 146 faces | Recognizable chair; many primitives; simplified grain. | Thicker, more textured, but back-heavy and proportionally off. | One-shot |
| Teapot | 3,241 vertices, 3,464 faces | Strong body/lid/spout/handle read; clean but simplified. | Captures width and rings, with extra pads and a kinked spout. | One-shot |
| Teacup | 2,659 vertices, 2,600 faces | Best silhouette match; cup, saucer, rim, and handle are coherent. | Good rim and saucer detail, but faceting and weak handle structure. | One-shot |
| Spoon | 1,571 vertices, 1,555 faces | Flat, oversized bowl; boundary artifacts dominate. | Cleaner spoon silhouette and handle; bowl still too shallow. | Six-view |
| Abacus | 5,840 vertices, 4,940 faces | Strong repeated-bead structure; simplified frame and bead variation. | Cleaner orthographic alignment with five rods and split bead groups. | Six-view |
| Acorn | 354 vertices, 352 faces | Readable low-poly cap/body split with strong silhouette. | Closer frontal framing; still simplified surface detail. | One-shot |
| Acoustic guitar | 5,684 vertices, 5,706 faces | Identifies strings, neck, headstock, and sound hole, but body shape drifts. | More complete guitar grammar with cleaner outline and bridge detail. | Six-view |
| Anchor | 858 vertices, 878 faces | Strong anchor read with ring, stock, shank, arms, and flukes. | Similar structure with sharper flukes and more front-facing symmetry. | Six-view |
| Anime girl casual outfit | 95,128 vertices, 126,768 faces | Readable T-pose character; simplified limbs, hair, and outfit detail. | More centered and consistent; still procedural and low-detail. | Six-view |
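The scorecard can be tallied mechanically. The sketch below copies the source vertex and face counts and the "Best reconstruction" column from the rows above. The helper names `tally_wins` and `median_source_vertices` are made up for illustration.

```python
from collections import Counter
from statistics import median

# (object, source vertices, source faces, best reconstruction) from the scorecard.
SCORECARD = [
    ("Wooden chair", 172, 146, "One-shot"),
    ("Teapot", 3241, 3464, "One-shot"),
    ("Teacup", 2659, 2600, "One-shot"),
    ("Spoon", 1571, 1555, "Six-view"),
    ("Abacus", 5840, 4940, "Six-view"),
    ("Acorn", 354, 352, "One-shot"),
    ("Acoustic guitar", 5684, 5706, "Six-view"),
    ("Anchor", 858, 878, "Six-view"),
    ("Anime girl casual outfit", 95128, 126768, "Six-view"),
]


def tally_wins(rows):
    """Count how many objects each input mode reconstructed best."""
    return Counter(best for _name, _verts, _faces, best in rows)


def median_source_vertices(rows, label):
    """Median source vertex count among the objects won by one input mode."""
    return median(verts for _name, verts, _faces, best in rows if best == label)


wins = tally_wins(SCORECARD)
for label in sorted(wins):
    print(label, wins[label], median_source_vertices(SCORECARD, label))
```

On these rows the tally is four one-shot wins against five six-view wins, and the six-view wins sit on sources with a higher median vertex count (5,684 against 1,506.5). Nine objects are too few to treat that as more than a hint.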

Per-object review

Results

Wooden Chair

The source is a compact rustic chair with rounded wooden members, a slatted seat, three horizontal back rails, angled legs, and visible wood texture.

Source .blend
Figures: one-shot input, one-shot output, six-view output.

One-shot review

This is the most readable chair result. It reconstructs the seat, front legs, rear uprights, back rails, and lower stretcher. The result uses 137 mesh objects and 1,492 vertices, which suggests a procedural assembly of simple parts rather than a compact mesh. The main misses are material fidelity and organic construction: the wood appears as light cylinders with dark scratch-like streaks, and the rustic irregularity of the source is mostly flattened.
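The "procedural assembly" reading follows from simple arithmetic on the two figures quoted above. A minimal sketch, with a hypothetical helper name:

```python
def mean_vertices_per_object(total_vertices, mesh_objects):
    """Average vertex count per mesh object in a scene."""
    if mesh_objects <= 0:
        raise ValueError("mesh_objects must be positive")
    return total_vertices / mesh_objects


# Figures quoted for the one-shot chair: 1,492 vertices across 137 mesh objects.
density = mean_vertices_per_object(1492, 137)
print(f"{density:.1f} vertices per mesh object")
```

The average comes out near 11 vertices per object. For comparison, a plain cube has 8 vertices, so most parts must be very simple primitives.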

Open generated .blend

Six-view review

The six-view result has fewer objects and a stronger procedural wood pattern, but the modeled chair is over-thick and back-dominant. It adds an extra back rail and turns several cylindrical members into large rectangular posts. As a 3D object it is more textured, but as a reconstruction it drifts further from the source proportions.

Open generated .blend
Figures: six orthographic inputs (front, back, left, right, top, bottom).

Teapot

The source is a squat white teapot with an oval body, domed lid, small knob, loop handle, spout, and subtle rim/foot-ring details.

Source .obj
Figures: one-shot input, one-shot output, six-view output.

One-shot review

The one-shot teapot is a strong semantic reconstruction: the rounded body, lid, knob, spout, handle, and foot ring all land in the expected places. It is smoother and cleaner than the source preview, and the spout opening is simplified into a blunt capped end, but the whole object reads correctly from normal viewing distance.

Open generated .blend

Six-view review

The six-view teapot uses 20 mesh objects and captures the flattened body and rim stack, but it over-interprets view cues as extra side pads and a vertical front feature. The spout bends into a segmented, kinked tube with a visible cap. It is useful as a diagnostic example: more views did not automatically mean a cleaner fused object.

Open generated .blend
Figures: six orthographic inputs (front, back, left, right, top, bottom).

Teacup

The source is a white cup and saucer with a tapered cup wall, rounded lip, circular saucer, small foot ring, and a simple C-shaped handle.

Source .obj
Figures: one-shot input, one-shot output, six-view output.

One-shot review

This is one of the best one-shot results. It captures the cup taper, open top, saucer, handle, rim, inner shadow, and foot ring. The geometry is idealized and the render is cropped tightly, but the reconstruction preserves the main object grammar with few distracting artifacts.

Open generated .blend

Six-view review

The six-view teacup adds more mechanical rim and saucer detail, yet the result becomes less faithful. The saucer turns faceted, and the handle is reduced to a front-facing vertical cue rather than a clear loop. It demonstrates a common six-view failure: local view evidence is preserved, but global 3D consistency weakens.

Open generated .blend
Figures: six orthographic inputs (front, back, left, right, top, bottom).

Spoon

The source is a white ceramic spoon with a shallow oval bowl, smooth neck, tapered handle, rounded end, and subtle concavity.

Source .obj
Figures: one-shot input, one-shot output, six-view output.

One-shot review

The one-shot spoon is the weakest output in the set. It understands that the object is long, white, and flat, but the bowl becomes an oversized plate-like polygon with boundary artifacts and a decorative tan edge. The shallow concavity is indicated by a grey patch rather than convincing bowl geometry.

Open generated .blend

Six-view review

The six-view spoon is the clearest improvement from added views. It has a cleaner handle, neck, and bowl relationship, and the object reads as a single ceramic spoon. The bowl remains too shallow and ends in a central dimple, but the silhouette is far more controlled than the one-shot version.

Open generated .blend
Figures: six orthographic inputs (front, back, left, right, top, bottom).

Abacus

The abacus, the first new reconstruction target, uses repeated beads, parallel rods, and a simple side-frame structure. It tests whether a modeling agent can preserve count, spacing, and alignment without importing the source mesh.
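Count, spacing, and alignment can be checked numerically once bead centers are available. The sketch below is illustrative only: it assumes bead centers have already been extracted as (x, z) pairs, with z as the rod height. The function names and the tolerance are hypothetical and do not come from the benchmark.

```python
def group_by_row(centers, tol=1e-3):
    """Group (x, z) bead centers into rows whose z values agree within tol."""
    rows = []
    for x, z in sorted(centers, key=lambda c: c[1]):
        if rows and abs(rows[-1][0] - z) <= tol:
            rows[-1][1].append(x)  # same rod as the previous bead
        else:
            rows.append((z, [x]))  # start a new rod row
    return [(z, sorted(xs)) for z, xs in rows]


def row_gaps(rows):
    """Vertical gaps between consecutive rod rows."""
    heights = [z for z, _xs in rows]
    return [upper - lower for lower, upper in zip(heights, heights[1:])]
```

A reconstruction that preserves the layout should yield the same number of rows as the source, matching bead counts per row, and near-constant values from `row_gaps`.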

Source .obj
Figures: one-shot input, one-shot output, six-view output.

One-shot review

The one-shot abacus captures the important part grammar: side supports, top rail, horizontal rods, and repeated bead groups. It reads correctly, though the render is cropped close and the source's exact five-row layout is interpreted procedurally rather than copied.

Open generated .blend

Six-view review

The six-view output is more frontally organized and preserves the five rod rows clearly. Beads are simplified ellipsoids and the wooden side posts are blockier than the source, but the result is coherent and useful as a reconstruction baseline.

Open generated .blend
Figures: six orthographic inputs (front, back, left, right, top, bottom).

Acorn

The acorn target is compact and mostly radial, with a low-poly cap, dark nut body, lip band, pointed bottom, and short angled stem. It is a useful contrast to the more mechanical targets.

Source .obj
Figures: one-shot input, one-shot output, six-view output.

One-shot review

The one-shot version captures the acorn's main stacked structure: dark lower nut, broad lighter cap, lip shadow, faceted sides, and short stem. It exaggerates the cap as a cleaner geometric cone, but the object identity is clear.

Open generated .blend

Six-view review

The six-view version is framed more like a reference-sheet render and keeps the cap/body separation crisp. It remains procedural and simplified, with low-poly facets standing in for the source's subtler organic surface.

Open generated .blend
Figures: six orthographic inputs (front, back, left, right, top, bottom).

Acoustic Guitar

The guitar adds thin strings, small tuning hardware, a tall neck, sound hole, bridge, and a broad resonant body. It is the most detail-heavy new asset.

Source .obj
Figures: one-shot input, one-shot output, six-view output.

One-shot review

The one-shot output gets the instrument identity right: strings, neck, frets, headstock, tuning pegs, bridge, and sound hole are all represented. The body becomes too hourglass-like and upright, and the render includes a simple floor plane, but the part vocabulary is strong for a single image.

Open generated .blend

Six-view review

The six-view reconstruction improves the full-body outline and uses more complete edge binding, rosette, bridge, and peg details. It still relies on procedural surfaces rather than true hollow construction, but it is a cleaner benchmark target than the one-shot version.

Open generated .blend
Figures: six orthographic inputs (front, back, left, right, top, bottom).

Anchor

The anchor target is mostly symmetrical and metallic, with a top ring, horizontal stock, central shank, curved arms, and pointed flukes.

Source .obj
Figures: one-shot input, one-shot output, six-view output.

One-shot review

The one-shot anchor is a strong semantic match. It reconstructs the ring, crossbar, center shank, curved arms, collars, and pointed flukes. Some curvature is simplified into primitive arcs and planar fluke surfaces, but the result reads clearly.

Open generated .blend

Six-view review

The six-view version keeps the same part structure with a more frontal, symmetrical presentation and sharper fluke plates. It is still procedural rather than forged, but it preserves the major silhouette cues across the orthographic references.

Open generated .blend
Figures: six orthographic inputs (front, back, left, right, top, bottom).

Anime Girl Casual Outfit

The character target is a detailed stylized human in a T-pose, with long brown hair, large anime eyes, a white crop top, black athletic shorts, bare legs, and sneakers. The source asset is attributed to the linked Sketchfab model.

Figures: one-shot input, one-shot output, six-view output.

One-shot review

The one-shot reconstruction identifies the major character cues: T-pose arms, head and neck, large eyes, long hair mass, white top, black shorts, legs, and shoes. It is intentionally procedural and far simpler than the source, with cylindrical limbs, blocky hair locks, approximate clothing boundaries, and no fine facial or fabric detail.

Open generated .blend

Six-view review

The six-view reconstruction is more centered and consistent as a benchmark render. It keeps the face, hair curtain, outstretched arms, outfit blocks, shoes, and simplified body proportions readable from the main view. The result still compresses the original character into primitives, so hands, hair strands, laces, folds, and material nuance remain approximations.

Open generated .blend
Figures: six orthographic inputs (front, back, left, right, top, bottom).

What the benchmark shows

Takeaways

Canonical objects are handled better than subtle geometry.

The teapot and teacup benefit from familiar object priors: body, lid, handle, rim, and saucer are reconstructed even from a single render. The spoon's shallow concavity is much harder because it depends on small depth cues rather than named parts.

More views help when the main ambiguity is plan shape.

The six-view spoon improves because top and side information constrain the long handle and bowl outline. That same extra information does not guarantee better results for the chair, teacup, or teapot, where part fusion and proportion matter more.

View fusion is the main bottleneck.

Six-view outputs sometimes preserve orthographic cues as literal geometry: teapot pads, teacup faceting, and stored reference planes/materials in the Blender files. Better reconstruction would need stronger constraints for symmetry, continuity, and part identity across views.
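One of the constraints named above, symmetry, can be expressed as a simple score. The sketch below is an illustration rather than part of the benchmark. It mirrors a point set across a coordinate plane and measures how far each mirrored point lands from the nearest original point. The function name and the choice of the plane through the origin are assumptions.

```python
import math


def mirror_symmetry_error(points, axis=0):
    """Mean distance from each mirrored point to its nearest original point.

    Points are reflected across the plane where coordinate `axis` is zero.
    A value near 0 means the set is close to mirror-symmetric.
    """
    pts = [tuple(p) for p in points]
    if not pts:
        raise ValueError("points must not be empty")
    total = 0.0
    for p in pts:
        mirrored = tuple(-c if i == axis else c for i, c in enumerate(p))
        # Brute-force nearest neighbour; fine for small vertex sets.
        total += min(math.dist(mirrored, q) for q in pts)
    return total / len(pts)
```

A mostly symmetrical target such as the anchor should score close to zero once centered on its mirror plane. The search is quadratic in the number of points, so a large mesh would need a spatial index instead.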