AI + Voice Cloning + Face Animation + NLP
An AI memory-preservation toolkit that brings old photos to life. Upload a photograph, feed it stories and voice recordings, and NostalgiQ generates a talking, animated version of that person, preserving their likeness, personality, and voice for future generations.
01 Overview
NostalgiQ combines face detection, voice cloning, talking head generation, and personality prediction into a single pipeline. Give it a photo and some text or audio, and it produces a video of that person speaking in their own voice with their own mannerisms. Built for families preserving the memory of loved ones.
02 The Pipeline
The system processes inputs through a multi-stage pipeline: face analysis extracts identity and features, text/audio is processed for voice synthesis, and a talking head model animates the face to match the speech.
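The three stages described above can be sketched as a simple orchestration in Python. The function and class names here are illustrative, not the project's actual API; each stage is a stub standing in for the real model calls.

```python
from dataclasses import dataclass

# Hypothetical stage interfaces -- names are illustrative stand-ins
# for the real face, voice, and animation modules.

@dataclass
class FaceProfile:
    identity: str
    crop_path: str

def analyze_face(photo_path: str) -> FaceProfile:
    """Stage 1: detect and crop the face, extract identity features."""
    return FaceProfile(identity="person_0", crop_path="crops/person_0.jpg")

def synthesize_speech(text: str, voice_id: str) -> str:
    """Stage 2: render the text as audio in the cloned voice."""
    return f"audio/{voice_id}.wav"

def animate_face(face: FaceProfile, audio_path: str) -> str:
    """Stage 3: drive the cropped face with the synthesized audio."""
    return f"videos/{face.identity}.mp4"

def run_pipeline(photo_path: str, text: str, voice_id: str) -> str:
    face = analyze_face(photo_path)
    audio = synthesize_speech(text, voice_id)
    return animate_face(face, audio)

video = run_pipeline("grandma.jpg", "Hello, dear.", "voice_abc")
```

The point of the sketch is the data flow: the face profile and the synthesized audio are produced independently, then joined by the animation stage.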
03 Core Modules
Face analysis: detects faces in photos and videos using InsightFace, DeepFace, and MediaPipe. Clusters identities across multiple images, estimates age, extracts facial landmarks, and generates scene descriptions with CLIP. Object detection via YOLOv8 and text extraction via EasyOCR provide additional context. Outputs cropped faces and a metadata.json with the full analysis.
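The exact schema of metadata.json isn't shown here, but a plausible shape follows from the outputs the module describes (crops, age estimate, landmarks, a CLIP scene caption, YOLO objects, OCR text). All field names below are assumptions for illustration:

```python
import json

# Illustrative metadata.json payload -- field names are assumptions
# inferred from the face module's described outputs.
metadata = {
    "faces": [
        {
            "identity": "person_0",
            "crop": "crops/person_0.jpg",
            "estimated_age": 62,
            "landmarks": "landmarks/person_0.npy",
        }
    ],
    "scene_description": "an old family photo in a garden",
    "objects": ["person", "bench"],
    "ocr_text": [],
}

with open("metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```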
Talking head generation: three engines are available. SadTalker runs locally and produces realistic talking head videos from a single image plus audio. The HeyGen API generates video in the cloud from a public image URL and text. The D-ID API serves as an alternative cloud option. Each takes a still photograph and produces a video of that person speaking.
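With three interchangeable backends, a natural design is a small dispatcher that puts them behind one interface. The class and method names below are illustrative, and each `render` body is a stub, not a real integration:

```python
# Minimal engine dispatcher -- a sketch of how the three backends
# could share one interface; names and signatures are assumptions.

class SadTalkerEngine:
    def render(self, image_path: str, audio_path: str) -> str:
        # Local inference: image file + audio file in, video file out.
        return "out/sadtalker.mp4"

class HeyGenEngine:
    def render(self, image_url: str, text: str) -> str:
        # Cloud API: public image URL + text in, hosted video URL out.
        return "https://example.com/heygen.mp4"

class DIDEngine:
    def render(self, image_url: str, audio_url: str) -> str:
        # Alternative cloud option with a similar contract.
        return "https://example.com/did.mp4"

ENGINES = {"sadtalker": SadTalkerEngine, "heygen": HeyGenEngine, "d-id": DIDEngine}

def get_engine(name: str):
    try:
        return ENGINES[name]()
    except KeyError:
        raise ValueError(f"unknown engine: {name}") from None

video = get_engine("sadtalker").render("crops/person_0.jpg", "audio/clone.wav")
```

Keeping the engines behind a common `render` call lets the rest of the pipeline stay ignorant of whether generation happens locally or in the cloud.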
Voice and language: ElevenLabs voice cloning creates a synthetic voice from audio samples. Whisper handles speech-to-text transcription. Gemini generates conversational responses in the style of the person, based on their writing samples and personality profile.
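For the speech-synthesis step, a minimal sketch of an ElevenLabs text-to-speech call can be built with the standard library. The endpoint path, `xi-api-key` header, and `model_id` value follow ElevenLabs' public REST API as I understand it, but should be verified against current documentation; the request is constructed here without being sent:

```python
import json
import urllib.request

def build_tts_request(voice_id: str, text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request.

    Endpoint and field names follow ElevenLabs' public REST API at the
    time of writing; confirm against the official docs before use.
    """
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    body = json.dumps({
        "text": text,
        "model_id": "eleven_multilingual_v2",  # model choice is an assumption
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_tts_request("voice_abc", "Hello from the past.", "sk-...")
```

Sending the request returns audio bytes in the cloned voice, which the talking head stage then consumes.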
1. Upload a photo: a photograph of the person. The face pipeline detects, crops, and analyzes the face automatically.
2. Provide voice or text: audio recordings for voice cloning, or text samples for personality prediction and speech synthesis.
3. Animate: SadTalker, HeyGen, or D-ID animates the face to match the synthesized speech. The result is a video of the person talking.
4. Converse: ask questions and receive responses in the person's voice and personality, powered by Gemini and the NLP personality model.
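The conversational step chains three of the modules above: transcribe the question, generate a personality-conditioned reply, then voice it. The sketch below stubs all three calls; the real Whisper, Gemini, and ElevenLabs integrations would replace the stand-in bodies:

```python
# Sketch of the interactive loop: question audio -> text (Whisper) ->
# personality-conditioned reply (Gemini) -> cloned-voice audio.
# All three functions are stubs; names and signatures are assumptions.

def transcribe(audio_path: str) -> str:
    """Whisper stand-in: speech-to-text."""
    return "What was your wedding day like?"

def generate_reply(question: str, persona: str) -> str:
    """Gemini stand-in: reply in the person's style."""
    return f"[{persona}] It rained, and we laughed the whole way home."

def speak(text: str, voice_id: str) -> str:
    """ElevenLabs stand-in: text to cloned-voice audio file."""
    return f"audio/reply_{voice_id}.wav"

def converse(audio_path: str, persona: str, voice_id: str) -> str:
    question = transcribe(audio_path)
    reply = generate_reply(question, persona)
    return speak(reply, voice_id)
```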
04 Technology Stack
Face analysis: InsightFace and DeepFace for detection and identity clustering. MediaPipe for landmark extraction. CLIP for scene understanding. YOLOv8 for object detection.
Voice cloning: the ElevenLabs API clones a voice from audio samples. The synthetic voice speaks new text while preserving the original tone, cadence, and character.
Talking head generation: SadTalker for local inference, HeyGen and D-ID for cloud-based video generation. Each takes a still image and audio to produce a realistic speaking video.
NLP: transformer-based personality prediction from text samples. Gemini generates responses matching the predicted personality. Whisper transcribes audio to text.
Frontend: React/TypeScript interface (App.tsx) for uploading photos, recording audio, and viewing the generated talking portraits. Clean, emotional UI design.
Backend: Python Flask backend orchestrating the pipeline. It manages model inference, API calls, and file processing, and serves the generated video output.
05 Skills