Function Repository Resource:

YouTubeTranscript

Get YouTube transcripts

Contributed by: Anton Antonov

ResourceFunction["YouTubeTranscript"][id]

gets the transcript of the YouTube video with identifier id.

ResourceFunction["YouTubeTranscript"] extracts the captions of the video, if they exist.
The transcript is returned as plain text.
The YouTube Data API has usage quotas.
Not all YouTube videos have automatic or manual captions. If no captions are available, the function returns a Failure object indicating this.
ResourceFunction["YouTubeTranscript"] processes "captionTracks", a field of the video metadata provided by the YouTube Data API.
The field "captionTracks" is an array of objects, where each object represents a single caption track (e.g., for a specific language or type).
From "captionTracks", the "baseUrl" string is extracted; this is the URL from which the caption content is fetched.
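
The extraction step described in these notes can be sketched on a small, made-up metadata fragment. This is only an illustration, not the function's actual implementation: the JSON below is hypothetical sample data, the example.com URLs are placeholders, and the key names "baseUrl" and "languageCode" are assumptions about how the track objects are keyed.

```wolfram
(* Hypothetical metadata fragment; real video metadata has many more fields. *)
metadataJSON = "{\"captionTracks\": [
  {\"baseUrl\": \"https://example.com/captions?lang=en\", \"languageCode\": \"en\"},
  {\"baseUrl\": \"https://example.com/captions?lang=de\", \"languageCode\": \"de\"}]}";

(* "RawJSON" imports JSON objects as associations and arrays as lists. *)
tracks = ImportString[metadataJSON, "RawJSON"]["captionTracks"];

(* One caption URL per track. *)
allURLs = Lookup[tracks, "baseUrl"]

(* Pick the track for one language (assumed key "languageCode") and take its URL. *)
englishTrack = SelectFirst[tracks, #["languageCode"] === "en" &];
englishURL = englishTrack["baseUrl"]
```

With real metadata, the selected URL would then be fetched to obtain the caption content.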

Examples

Basic Examples (2) 

Get a video transcript:

In[1]:=
transcript = ResourceFunction[
CloudObject[
    "https://www.wolframcloud.com/obj/antononcube/DeployedResources/Function/YouTubeTranscript"]]["ewU83vHwN8Y"];
transcript // StringLength
Out[2]=

Here is an excerpt:

In[3]:=
SeedRandom[332];
lines = StringSplit[transcript, "\n"];
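(* Pick a start line between 1 and the last position that leaves room for 11 lines *)
p = RandomInteger[{1, Max[1, Length[lines] - 10]}];
lines[[p ;; Min[p + 10, Length[lines]]]] // StringRiffle[#, "\n"] &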
Out[6]=

If the video identifier is not found or the video has no captions, a Failure object is returned. For example:

In[7]:=
ResourceFunction[
CloudObject[
  "https://www.wolframcloud.com/obj/antononcube/DeployedResources/Function/YouTubeTranscript"]]["89328ewU83vHwN8Y"]
Out[7]=
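
Since an unsuccessful call gives a Failure object rather than a string, code that consumes the result may want to check it with FailureQ before doing any string processing. The following is a minimal sketch: transcriptOrDefault is a hypothetical helper, not part of the resource function, and the Failure tag and message used here are made-up stand-ins so that the example runs without network access.

```wolfram
(* Hypothetical helper: return the transcript if there is one, otherwise a fallback value. *)
transcriptOrDefault[res_, default_ : ""] := If[FailureQ[res], default, res];

(* Stand-in for a failed call; the tag and message are invented for illustration. *)
failed = Failure["NoTranscript", <|"MessageTemplate" -> "No captions found."|>];

transcriptOrDefault[failed, "No transcript available."]

(* A successful call gives a string, which is passed through unchanged. *)
transcriptOrDefault["first line\nsecond line"]
```

The same check can guard later steps, such as StringLength or an LLM summary, so that they only run on actual transcript text.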

Get a video transcript:

In[8]:=
transcript = ResourceFunction[
CloudObject[
    "https://www.wolframcloud.com/obj/antononcube/DeployedResources/Function/YouTubeTranscript"]]["_yUW-TGGKOc"];

Show the number of characters, words, and lines:

In[9]:=
Clear[TextStats];
TextStats[txt_String] := AssociationThread[{"Characters", "Words", "Lines"}, Through[{StringLength, Length@*TextWords, Length[StringSplit[#, "\n"]] &}[txt]]];
TextStats[transcript]
Out[10]=

Summarize the transcript:

In[11]:=
LLMResourceFunction["Summarize"][transcript]
Out[11]=

Get a video transcript and show a table of themes:

In[12]:=
transcript = ResourceFunction[
CloudObject[
    "https://www.wolframcloud.com/obj/antononcube/DeployedResources/Function/YouTubeTranscript"]]["_yUW-TGGKOc"];
Clear[GridTableFormFromJSON];
GridTableFormFromJSON[json_String] := ResourceFunction["GridTableForm"][
   Dataset[Association /@ ImportString[StringReplace[json, {"```json" -> "", "```" -> ""}],
        "JSON"]] /. {x_String :> Style[x, FontFamily -> "Times New Roman"]}];
GridTableFormFromJSON[
 LLMResourceFunction["ThemeTableJSON"][transcript, "article", 30]]
Out[15]=