[News Week] 2024 Bilingual Anchor Desk: "News Early Bird" Episode 8 News Broadcast

Posted by: Pan Ping | Published: 2024-05-13

Vidu

China's Shengshu Technology and Tsinghua University have unveiled Vidu, a text-to-video model capable of generating 16-second clips at 1080p resolution with a single click.

The announcement was made at the 2024 Zhongguancun Forum in Beijing, where the developers positioned Vidu as a strong competitor to OpenAI's Sora.

Like Sora, Vidu is capable of producing 16-second clips at 1080p resolution.

Vidu is based on a Universal Vision Transformer (U-ViT) architecture, which the company says allows it to simulate the real physical world with multi-camera view generation.

This architecture was reportedly developed by the Shengshu Technology team in September 2022 and, if so, would predate the diffusion transformer (DiT) architecture used by Sora.

According to the company, Vidu can generate videos of complex scenes that adhere to real-world physics, with realistic lighting and shadows and detailed facial expressions.

The model can also produce imaginative content, creating surreal scenes that do not exist in reality with depth and complexity.

Vidu's multi-camera capabilities allow for the generation of dynamic shots, seamlessly transitioning between long shots, close-ups, and medium shots within a single scene.

In its demo, the company attempted to recreate scenes similar to those OpenAI shared when it released Sora.

While Vidu is an impressive accomplishment and a testament to China's rapid progress in AI research, a side-by-side comparison with Sora reveals that the generated videos are not at Sora's level of realism.

The output falls short of Sora's in visual fidelity.

However, the temporal consistency Vidu achieves is commendable, and the technology has the potential for further refinement over time.

Source: Daily English Listening (每日英语听力)

Script | Xia Jiabao

Broadcast | Wang Xinyu

Review | Hu Haiyan, Sheng Yumeng, Li Xueying

Layout | Tian Zhuowen