Video caption result
Schema for video caption metadata.
Keyframe
Bases: BaseModel
Represents a keyframe extracted from a video segment.
Attributes:
| Name | Type | Description |
|---|---|---|
| `time_offset` | `float` | Time within the segment where the keyframe occurs. |
| `caption` | `str \| None` | Text description of this specific keyframe. |
Segment
Bases: BaseModel
Represents a video segment with its captions, transcripts, and keyframes.
Attributes:
| Name | Type | Description |
|---|---|---|
| `start_time` | `float \| None` | The segment's starting time in seconds. |
| `end_time` | `float \| None` | The segment's ending time in seconds. |
| `transcripts` | `list[AudioTranscript]` | Optional list of transcripts for the segment. |
| `segment_caption` | `list[str]` | The single, rich description of the segment's action/plot. |
| `keyframes` | `list[Keyframe]` | Optional list of keyframes extracted from the segment. |
ensure_caption()
Ensure the segment has a caption, falling back to keyframes or transcripts if needed.
ensure_keyframes()
Ensure every keyframe's time offset is non-negative.
ensure_transcripts()
Ensure every transcript's time offset is non-negative.
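The checks described for `Segment` can be sketched roughly as follows. This is a minimal illustration using plain dataclasses, not the actual pydantic models: `KeyframeSketch`, `SegmentSketch`, `transcript_texts`, `check_keyframe_offsets`, and `fallback_caption` are hypothetical names. The sketch also assumes that a negative offset raises an error and that keyframe captions are preferred over transcript text, neither of which this page states.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KeyframeSketch:
    """Hypothetical stand-in for Keyframe."""
    time_offset: float
    caption: Optional[str] = None


@dataclass
class SegmentSketch:
    """Hypothetical stand-in for Segment (timing fields omitted)."""
    segment_caption: list[str] = field(default_factory=list)
    keyframes: list[KeyframeSketch] = field(default_factory=list)
    # Assumption: AudioTranscript's fields are not shown on this page,
    # so transcripts are reduced to plain text strings here.
    transcript_texts: list[str] = field(default_factory=list)


def check_keyframe_offsets(segment: SegmentSketch) -> None:
    """Raise if any keyframe has a negative time offset (assumed behavior)."""
    for keyframe in segment.keyframes:
        if keyframe.time_offset < 0:
            raise ValueError(f"negative time_offset: {keyframe.time_offset}")


def fallback_caption(segment: SegmentSketch) -> list[str]:
    """Return the segment caption, or a fallback built from other fields."""
    if segment.segment_caption:
        return segment.segment_caption
    # Assumed priority: keyframe captions first, then transcript text.
    from_keyframes = [kf.caption for kf in segment.keyframes if kf.caption]
    if from_keyframes:
        return from_keyframes
    return list(segment.transcript_texts)


segment = SegmentSketch(
    keyframes=[KeyframeSketch(0.5, "A dog runs"), KeyframeSketch(2.0)],
    transcript_texts=["Good boy!"],
)
check_keyframe_offsets(segment)
captions = fallback_caption(segment)
```

In the real models these checks would more likely live in pydantic validators on the class itself; the free functions here only show the logic.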
VideoCaptionMetadata
Bases: BaseModel
Metadata for video captioning results.
Attributes:
| Name | Type | Description |
|---|---|---|
| `video_summary` | `str` | A high-level summary of the entire video's plot, topic, or main events. |
| `segments` | `list[Segment]` | List of video segments with their captions and metadata. |
ensure_segment_end_time_greater_than_start_time()
Ensure segment end time is greater than start time.
If a segment's end time is equal to or lower than its start time, the next segment's start time minus 1 is used as the end time.
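The end-time repair rule can be illustrated with a short sketch. It uses a plain dataclass rather than the real pydantic model, and `SegmentTimes` and `repair_end_times` are hypothetical names. Leaving segments with missing times untouched, and leaving the last segment untouched because it has no successor, are assumptions not stated on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SegmentTimes:
    """Hypothetical stand-in holding only the timing fields of Segment."""
    start_time: Optional[float] = None
    end_time: Optional[float] = None


def repair_end_times(segments: list[SegmentTimes]) -> list[SegmentTimes]:
    """Replace any end_time <= start_time with the next start_time minus 1."""
    for index, segment in enumerate(segments):
        if segment.start_time is None or segment.end_time is None:
            continue  # assumption: missing times are left untouched
        if segment.end_time > segment.start_time:
            continue  # already valid
        has_next = index + 1 < len(segments)
        if has_next and segments[index + 1].start_time is not None:
            segment.end_time = segments[index + 1].start_time - 1
        # assumption: with no next segment, the value is left as is
    return segments


segments = [SegmentTimes(0.0, 0.0), SegmentTimes(12.0, 20.0)]
repaired = repair_end_times(segments)
print(repaired[0].end_time)  # 11.0
```

Here the first segment's end time equals its start time, so it is replaced by the second segment's start time minus 1.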