

When Tencent Street View invited us to tackle their patchwork of masked frames (think pedestrians, scaffolding, or obtrusive cars), we jumped at the chance. The image completion engine we delivered now fills gaps so convincingly that it helped the program earn first prize in the China Electronics Society Science and Technology Progress Awards.
What we built
- Structure-aware fill. Instead of postcard-style diffusion, we baked architectural priors into the generator so it respects vanishing lines, facade symmetries, and the glitter of traffic at night (first sketch after this list).
- Speed-first pipeline. Tencent ingests millions of frames per day, so we rewrote the inference stack with TensorRT plugins, mixed precision, and patch-wise scheduling to hit near-real-time throughput (second sketch below).
- Quality gates. A discriminator trained on historical panoramas rejects uncanny fills; flagged tiles fall back to traditional photogrammetry, so editors keep trust in the automation (third sketch below).
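
To make "structure-aware" concrete, here is a minimal sketch of one way to inject a structural prior: the masked frame, its mask, and a precomputed edge/line map are stacked as input channels, so the generator can condition on facade geometry while it fills. Everything here (`StructureAwareGenerator`, the channel layout, the layer widths) is illustrative, not our production architecture.

```python
# Toy encoder-decoder that conditions on a structural prior (edge map).
import torch
import torch.nn as nn

class StructureAwareGenerator(nn.Module):
    def __init__(self, base: int = 64):
        super().__init__()
        # 3 RGB + 1 mask + 1 edge/vanishing-line map = 5 input channels.
        self.encode = nn.Sequential(
            nn.Conv2d(5, base, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(base, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, rgb, mask, edges):
        # Zero the hole so the network must reconstruct it from context
        # and from the structural prior, not from leaked pixels.
        x = torch.cat([rgb * (1 - mask), mask, edges], dim=1)
        return self.decode(self.encode(x))

frame = torch.rand(1, 3, 256, 256)                  # street-view tile
mask = (torch.rand(1, 1, 256, 256) > 0.9).float()   # 1 = pixel to fill
edges = torch.rand(1, 1, 256, 256)                  # stand-in for a real edge map
fill = StructureAwareGenerator()(frame, mask, edges)  # (1, 3, 256, 256)
```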
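We cannot publish the TensorRT plugins themselves, but the two throughput ideas translate to a few lines of plain PyTorch: run each patch under FP16 autocast and blend overlapping tiles back together. The tile size and overlap below are placeholder values, and the sketch assumes a CUDA tensor and a same-size image-to-image model.

```python
import torch

@torch.inference_mode()
def tiled_infer(model, frame, tile=512, overlap=64):
    """Patch-wise inference with mixed precision; values are illustrative."""
    _, _, h, w = frame.shape
    out = torch.zeros_like(frame)
    hits = torch.zeros(1, 1, h, w, device=frame.device)
    step = tile - overlap
    for top in range(0, h, step):
        for left in range(0, w, step):
            bottom, right = min(top + tile, h), min(left + tile, w)
            patch = frame[:, :, top:bottom, left:right]
            # Mixed precision: matmuls/convs run in FP16 on the GPU.
            with torch.autocast("cuda", dtype=torch.float16):
                filled = model(patch)
            out[:, :, top:bottom, left:right] += filled.float()
            hits[:, :, top:bottom, left:right] += 1
    return out / hits  # average where overlapping tiles disagree
```

Overlapping and averaging the tiles hides seams at patch borders, which matters once panoramas are stitched back together.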
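The quality gate itself reduces to a simple routing rule. The sketch below uses toy stand-ins for the generator, the discriminator, and the photogrammetry fallback, and the threshold is illustrative.

```python
import torch

def gated_fill(tile, mask, generator, discriminator, fallback, threshold=0.8):
    """Return (filled_tile, source_tag). All callables are placeholders."""
    filled = generator(tile, mask)
    score = discriminator(filled).mean().item()  # realism score in [0, 1]
    if score >= threshold:
        return filled, "gan"
    # Flagged tile: route to the traditional photogrammetry path.
    return fallback(tile, mask), "photogrammetry"

# Toy stand-ins so the gate can be exercised end to end.
tile = torch.rand(1, 3, 256, 256)
mask = (torch.rand(1, 1, 256, 256) > 0.9).float()
fill, source = gated_fill(
    tile, mask,
    generator=lambda t, m: t * (1 - m),                        # pretend fill
    discriminator=lambda t: torch.sigmoid(t.mean(dim=(1, 2, 3))),
    fallback=lambda t, m: t,                                   # pretend photogrammetry
)
```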
Why it was hard
Street scenes are chaotic. Street lamps slice through trees, window reflections shift with every meter, and anything larger than a scooter moves while you render. Getting the generator to anchor to global context without hallucinating new buildings required layered attention windows and copious augmentation.
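
A rough sketch of the layered-window idea: attend over the same feature map at several window sizes and average the results, so small windows preserve local texture while large ones anchor the fill to global scene context. The window sizes and the plain dot-product attention here are simplifications for illustration, not the production design.

```python
import torch
import torch.nn.functional as F

def windowed_attention(x, window):
    """Self-attention within non-overlapping windows of a (B, C, H, W) map.
    Assumes H and W are divisible by `window`."""
    b, c, h, w = x.shape
    # Split into windows -> (B * num_windows, window*window, C).
    t = x.unfold(2, window, window).unfold(3, window, window)
    t = t.permute(0, 2, 3, 4, 5, 1).reshape(-1, window * window, c)
    attn = F.softmax(t @ t.transpose(1, 2) / c ** 0.5, dim=-1)
    t = attn @ t
    # Fold windows back to (B, C, H, W).
    t = t.reshape(b, h // window, w // window, window, window, c)
    return t.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)

def layered_attention(x, windows=(8, 16, 32)):
    # Average attention outputs across window scales.
    return sum(windowed_attention(x, w) for w in windows) / len(windows)

feats = torch.rand(1, 64, 64, 64)
out = layered_attention(feats)  # (1, 64, 64, 64)
```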
Where it lands
- Digital preservation. Historic streets shot during renovation can be restored for cultural archives.
- AR navigation. Cleaned background plates make it easier to overlay arrows and contextual widgets without masking artifacts.
What’s next
We are experimenting with semantic edit controls so operators can request “remove truck, keep shadows” or “recover mural colors,” and we are pairing the tool with LiDAR priors to stay honest about geometry.



