
Great Week For AI!
Monday: Stanford Alpaca 7B
Tuesday: GPT-4, Anthropic releases Claude, Google's PaLM API, Adept AI raises $350M, and Google adds generative AI to Workspace
Wednesday: PyTorch 2.0 and Midjourney V5
Thursday: Microsoft 365 Copilot
Memoji on Steroids: This AI Model Can Reconstruct 3D Avatars from Videos. Time to meet Vid2Avatar, a tool that learns high-fidelity 3D human avatars from videos captured in the wild. It needs no ground-truth supervision, no priors extracted from large datasets, and no external segmentation modules. You just give it a video of someone, and it generates a robust 3D avatar for you.

LERF (Language Embedded Radiance Fields): LERF optimizes a dense, multi-scale 3D language field by volume rendering CLIP embeddings along training rays, supervising these embeddings with multi-scale CLIP features across multi-view training images. After optimization, LERF can extract 3D relevancy maps for language queries interactively in real time. LERF enables pixel-aligned queries of the distilled 3D CLIP embeddings without relying on region proposals, masks, or fine-tuning, supporting long-tail open-vocabulary queries hierarchically across the volume.
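
To make the core step concrete, here is a minimal sketch in PyTorch (not the official LERF code; the tensor shapes and the plain cosine "relevancy" score are our assumptions, and the paper supervises with multi-scale CLIP features and a more careful scoring scheme): volume-render per-sample language embeddings along one ray, then score the rendered embedding against a text query.

```python
# Minimal sketch, assuming per-sample CLIP-style embeddings are available along a ray.
import torch
import torch.nn.functional as F

def render_language_embedding(sigmas, deltas, embeds):
    """Alpha-composite per-sample embeddings along a ray (standard volume rendering)."""
    alphas = 1.0 - torch.exp(-sigmas * deltas)                             # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1), 1 - alphas]), 0)[:-1]  # transmittance
    weights = alphas * trans                                               # rendering weights
    emb = (weights[:, None] * embeds).sum(dim=0)                           # weighted embedding sum
    return F.normalize(emb, dim=0)                                         # back onto the unit sphere

N, D = 64, 512                                              # samples per ray, embedding dim
sigmas = torch.rand(N)                                      # densities from the field
deltas = torch.full((N,), 0.05)                             # spacing between samples
embeds = F.normalize(torch.randn(N, D), dim=1)              # per-sample language embeddings

query = F.normalize(torch.randn(D), dim=0)                  # stand-in for a CLIP text embedding
pixel = render_language_embedding(sigmas, deltas, embeds)
print(f"relevancy: {torch.dot(pixel, query).item():.3f}")   # pixel-aligned language query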

Automatic Reasoning & Tool-Use of LLMs: Automatic Reasoning and Tool-use (ART) is a framework that uses frozen LLMs to automatically generate intermediate reasoning steps as a program. Given a new task to solve, ART selects demonstrations of multi-step reasoning and tool use from a task library. At test time, ART seamlessly pauses generation whenever external tools are called, and integrates their output before resuming generation. ART achieves a substantial improvement over few-shot prompting and automatic CoT on unseen tasks in the BIG-Bench and MMLU benchmarks, and matches the performance of hand-crafted CoT prompts on a majority of these tasks. A minimal sketch of the pause-and-resume loop follows below.
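
Here is that sketch; the bracketed [tool(args)] call syntax, the toy calculator, and the scripted stub LLM are illustrative assumptions, not the paper's actual task and tool libraries.

```python
# Minimal sketch of ART's generate-pause-resume loop, under assumed conventions.

TOOLS = {"calc": lambda expr: str(eval(expr))}   # toy tool library (eval is for this demo only)

def stub_llm(text: str) -> str:
    """Stand-in for a frozen LLM that writes its reasoning steps as a program."""
    if "->" not in text:
        return text + "\nStep 1: compute the product. [calc(17 * 23)]"
    return text + "\nStep 2: so the answer is the value returned above."

def run_art(prompt: str, max_rounds: int = 4) -> str:
    text = prompt
    for _ in range(max_rounds):
        text = stub_llm(text)                    # generate until the model stops
        if not text.endswith(")]"):              # generation did not stop at a tool call
            break
        start = text.rfind("[")                  # parse the pending call, e.g. calc(17 * 23)
        name, args = text[start + 1 : -2].split("(", 1)
        if name not in TOOLS:
            break
        result = TOOLS[name](args)               # pause: run the external tool
        text += f" -> {result}"                  # integrate its output, then resume
    return text

print(run_art("Q: What is 17 * 23?"))
```

Running it prints the prompt, the generated step ending in a tool call, the spliced-in tool result (-> 391), and the resumed final step.
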
Microsoft Proposes VALL-E X: Cross-lingual speech synthesis is an approach for carrying a speaker's voice from one language to another. The cross-lingual neural codec language model that the researchers have introduced is called VALL-E X. It extends the VALL-E text-to-speech model and inherits its strong in-context learning capabilities.
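
As a loose illustration of the conditioning idea (the token layout, vocabulary sizes, and the random stub standing in for the trained codec language model are all assumptions, not actual VALL-E X code): the in-context prompt concatenates source-language text tokens, target-language text tokens, and the source speaker's acoustic codec tokens, and the model then autoregressively emits target-language codec tokens in the same voice.

```python
# A loose sketch of cross-lingual, prompt-conditioned codec-token generation.
import torch

ACOUSTIC_VOCAB = 1024                            # e.g. one codec quantizer level

def stub_codec_lm(seq: torch.Tensor) -> torch.Tensor:
    """Stand-in for the trained model: logits over the next acoustic token."""
    return torch.randn(ACOUSTIC_VOCAB)

def cross_lingual_tts(src_text_ids, tgt_text_ids, src_acoustic_ids, max_tokens=75):
    # In-context prompt: source text, target text, then the speaker's acoustic tokens.
    seq = torch.cat([src_text_ids, tgt_text_ids, src_acoustic_ids])
    generated = []
    for _ in range(max_tokens):                         # autoregressive decoding
        nxt = stub_codec_lm(seq).argmax().unsqueeze(0)  # greedy here; sampling in practice
        seq = torch.cat([seq, nxt])
        generated.append(nxt)
    return torch.cat(generated)                  # codec tokens for a codec decoder/vocoder

tokens = cross_lingual_tts(torch.randint(0, 100, (20,)),
                           torch.randint(0, 100, (18,)),
                           torch.randint(0, ACOUSTIC_VOCAB, (150,)))
print(tokens.shape)                              # torch.Size([75])
```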