
Welcome back to a new VSN Explorer session! Today, we're thrilled to introduce one of our most powerful new features: integration with Artificial Intelligence (AI).
This integration is designed to solve a problem that many of you have reported: having a large volume of video content in your system, but not having the time or resources for full cataloging. The AI integration in VSN Explorer offers an automated solution to this challenge.
The Power of Automated Pre-Cataloging
Imagine a massive library of video assets that no one has had time to catalog. While human validation will always be necessary, the system can now perform an automatic first-pass pre-cataloging, drastically reducing the manual effort required.
Let's dive into VSN Explorer to see this in action.
A New Layer of Segmentation
When you access an asset in VSN Explorer, you'll notice a significant addition in the segment section. We've introduced a new segment layer that leverages the power of AI to perform several crucial tasks:
Transcription: The AI transcribes the video's entire audio track.
Speaker/Character Detection: It detects the people or characters present in the video.
System Upload: All this data is automatically uploaded to the system, allowing the end-user (you!) to review, adjust, and utilize the full transcript, including who is speaking at any given moment.
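To make this concrete, here is a minimal sketch of what one record on the AI segment layer might look like. The field names and timecode format are illustrative assumptions, not VSN Explorer's actual schema:

```python
from dataclasses import dataclass

@dataclass
class AISegment:
    """One record on the new AI segment layer (hypothetical shape)."""
    start_tc: str    # segment start timecode, e.g. "00:00:05:00"
    end_tc: str      # segment end timecode
    transcript: str  # speech transcribed for this span
    speaker: str     # detected speaker: a trained name or a generic label

# The kind of pre-cataloged records the AI might upload for human review:
segments = [
    AISegment("00:00:05:00", "00:00:12:14",
              "Hoy vamos a preparar un plato sencillo.", "Karlos Arguiñano"),
    AISegment("00:00:12:15", "00:00:20:02",
              "Suena muy bien.", "Speaker 9"),
]
for seg in segments:
    print(f"[{seg.start_tc}-{seg.end_tc}] {seg.speaker}: {seg.transcript}")
```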
Speaker Detection and Training
To surface the AI's results, the system uses a specialized layer we call 'Speech Talks', which displays the video's transcription. For known figures, like the famous chef Karlos Arguiñano in our first example, the system can automatically display the speaker's name and image.
📝 Important: This automatic naming requires pre-configuration and training within the Thesaurus. You teach the system: "When you find this person, assign this photo and this name." Once trained, the system will recognize them automatically in future assets.
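As a rough illustration of that training step, here is a minimal sketch using an in-memory dictionary as a stand-in for the Thesaurus. In the real product this configuration happens in the Thesaurus itself; the function names, signature IDs, and URL are hypothetical:

```python
# An in-memory stand-in for the Thesaurus mapping (hypothetical).
thesaurus: dict[str, dict[str, str]] = {}

def train_speaker(signature_id: str, name: str, photo_url: str) -> None:
    """Teach the system: when you find this person, assign this photo and name."""
    thesaurus[signature_id] = {"name": name, "photo": photo_url}

def resolve_speaker(signature_id: str) -> str:
    """Return the trained name, or a generic label for untrained speakers."""
    entry = thesaurus.get(signature_id)
    return entry["name"] if entry else f"Speaker ({signature_id})"

train_speaker("sig-001", "Karlos Arguiñano", "https://example.com/arguinano.jpg")
print(resolve_speaker("sig-001"))  # -> Karlos Arguiñano
print(resolve_speaker("sig-009"))  # -> Speaker (sig-009), still untrained
```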
Deeper Dives: News and Complex Assets
For a shorter video, finding information is simple. But for longer, more complex assets, like a 30-minute news bulletin, the new features shine.
In the segments tab of a complex asset, you now have two powerful search options:
1. In-Text Search
The In-Text feature recognizes and highlights key entities within the transcription:
People: Easily search for people mentioned (e.g., "Buffon" or "Santa Claus"). Clicking a name immediately takes you to the point in the video where they are mentioned.
Places: Search for locations (e.g., "Madrid"). The system highlights every mention in the transcript (six, in our example) and lets you jump from one to the next using the play button.
Organizations, etc.: The AI detects and categorizes various entity types, making it simple to navigate large amounts of text; the sketch after this list illustrates the idea.
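Conceptually, jumping between mentions means collecting the timecodes of every segment that contains the entity. The segment shape here is an assumption, not the actual data model:

```python
# Segment shape and sample text are assumptions for illustration only.
segments = [
    {"start_tc": "00:02:10:00", "text": "Buffon saved twice in Madrid tonight."},
    {"start_tc": "00:14:32:05", "text": "Back in Madrid, the council announced cuts."},
]

def find_mentions(segments: list[dict], term: str) -> list[str]:
    """Return the start timecode of every segment that mentions `term`."""
    return [s["start_tc"] for s in segments if term.lower() in s["text"].lower()]

# Clicking a highlighted entity in the UI is the equivalent of seeking the
# player to each of these timecodes in turn:
print(find_mentions(segments, "Madrid"))  # ['00:02:10:00', '00:14:32:05']
```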
2. Visual Recognition
The system is also capable of reading what appears visually in the video:
Speakers/Faces: The AI detects faces and speakers. If a speaker is untrained, they will be labeled generically (e.g., "Speaker 9"). However, once you train the system (e.g., "Speaker 9 is X person"), it correctly tags them going forward, even showing their face next to the segment.
OCR (Optical Character Recognition): Any text appearing on screen, such as graphics or lower thirds, is automatically captured and searchable.
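The retagging behavior can be pictured like this: once you train an identity, the generic label is replaced wherever it appears, and OCR text is searched like any other field. Both the segment shape and the names below are illustrative assumptions:

```python
# Both the segment shape and the names are illustrative assumptions.
segments = [
    {"speaker": "Speaker 9", "ocr_text": "BREAKING NEWS"},
    {"speaker": "Speaker 9", "ocr_text": "MADRID 18:00"},
]

def retag_speaker(segments: list[dict], generic_label: str, real_name: str) -> None:
    """Replace a generic AI label with the trained identity everywhere."""
    for seg in segments:
        if seg["speaker"] == generic_label:
            seg["speaker"] = real_name

retag_speaker(segments, "Speaker 9", "Gianluigi Buffon")

# OCR text captured from graphics and lower thirds is searchable like any field:
hits = [s for s in segments if "madrid" in s["ocr_text"].lower()]
print(hits)  # [{'speaker': 'Gianluigi Buffon', 'ocr_text': 'MADRID 18:00'}]
```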
Advanced Search Capabilities
The true game-changer is how this data is integrated into VSN Explorer's search functionality. You are no longer limited to the video's main metadata; you can search deep within the AI-generated segment metadata.
In the Advanced Search, you can now filter by:
All Text: Search across all recognized text, including transcription and graphics.
Specific Speaker: Search for moments when a specific person is talking or when their face appears.
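As a sketch of how such a filter might behave over AI segment metadata (the field names are assumptions, not VSN Explorer's actual search API):

```python
# Hypothetical filter over AI segment metadata; field names are assumptions.
def matches(segment: dict, all_text: str | None = None,
            speaker: str | None = None) -> bool:
    blob = (segment.get("transcript", "") + " " + segment.get("ocr_text", "")).lower()
    if all_text and all_text.lower() not in blob:
        return False  # no hit in transcription or on-screen graphics
    if speaker and segment.get("speaker") != speaker:
        return False  # this person is not talking or appearing here
    return True

segment = {"transcript": "el presupuesto municipal", "ocr_text": "MADRID 18:00",
           "speaker": "Speaker 9"}
print(matches(segment, all_text="madrid"))           # True: hit in the OCR text
print(matches(segment, speaker="Karlos Arguiñano"))  # False: different speaker
```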
Combining Metadata for Pinpoint Accuracy
You can also combine AI segment metadata with the system's core metadata for incredibly precise searches. For example, you could ask VSN Explorer to find:
"All assets created after January 1st, 2025, where the AI has detected the face of Karlos Arguiñano."
This enables far more complex and precise searches, saving countless hours of manual review.
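Expressed as a hypothetical query payload, that example might look like the following; the structure, field names, and operators are illustrative only, not VSN Explorer's actual API:

```python
from datetime import date

# A hypothetical combined query: core asset metadata plus an AI segment filter.
combined_query = {
    "asset": {
        "creation_date": {"after": date(2025, 1, 1).isoformat()},  # core metadata
    },
    "ai_segments": {
        "face_detected": "Karlos Arguiñano",  # AI-generated segment metadata
    },
}
print(combined_query)
```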
Refining and Adjusting the AI Data
Since the system provides a preliminary catalog, you have the option to refine it:
Editing Transcripts: You can enter edit mode and adjust the transcription if you find an error (e.g., changing "cebolla" (onion) to "sartén" (frying pan)).
Adjusting Timings: The start and end times of segments can be precisely adjusted.
Replacing Recognized Faces/Speakers: If the AI misidentifies a person, you can use the MAM's tools to replace the recognized face or speaker with the correct one.
Once your adjustments are complete, simply save, and your asset is perfectly cataloged.
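In code terms, the review pass boils down to a few small corrections to the AI-generated record before saving. A minimal sketch, with an assumed segment shape:

```python
# The segment shape is an assumption for illustration; in the real UI these
# corrections happen in edit mode on the asset.
segment = {
    "start_tc": "00:03:10:00",
    "end_tc": "00:03:18:12",
    "transcript": "Ponemos la cebolla al fuego.",
    "speaker": "Speaker 9",
}

# 1. Fix a transcription error (the AI heard "cebolla" instead of "sartén").
segment["transcript"] = segment["transcript"].replace("cebolla", "sartén")
# 2. Nudge the segment timing to match the actual speech.
segment["start_tc"] = "00:03:09:15"
# 3. Replace a misidentified speaker with the correct, trained identity.
segment["speaker"] = "Karlos Arguiñano"

print(segment)  # save in the UI and the asset is cataloged
```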
The AI integration in VSN Explorer is a monumental step toward solving the content cataloging bottleneck, providing a highly reliable and precise first layer of metadata for your entire video library.