[YouTube] Add support for extracting auto-translated captions#997
extractor/src/main/java/org/schabi/newpipe/extractor/stream/SubtitlesStream.java
    .setAutoGenerated(isAutoGenerated)
    .setAutoTranslated(false)
    .build());
    if (i == 0 && caption.getBoolean("isTranslatable")
I would not base the extraction on the index, but rather on whether the subtitles are auto-generated:
    - if (i == 0 && caption.getBoolean("isTranslatable")
    + if (isAutoGenerated && caption.getBoolean("isTranslatable")
Also, this PR doesn't add support for translating uploaded subtitles. For instance, see https://www.youtube.com/watch?v=_cMxraX_5RE: you can translate from German to French and from English to French, and the two translations are different.
We may need another property in SubtitlesStream for this.
I don't understand why we should use isAutoGenerated here. For better quality, it should be !isAutoGenerated. Manually added captions should be exact.
I was also wondering whether we should provide the auto-translated captions by default. Extracting the data for ~100 SubtitlesStreams and generating them takes some time. I definitely wouldn't recommend doing this for all available languages by default. On the other hand, we could provide a method which does this when needed.
I decided to extract all available subtitles, but made sure to speed up the process. It's up to the frontends to filter the subtitles.
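To illustrate what "filtering on the frontend" could look like, here is a minimal sketch. It does not use the extractor itself: `Track` is a hypothetical stand-in for `SubtitlesStream` that models only a language tag and the two flags discussed in this thread, and `withoutAutoTranslated` is an illustrative helper name, not part of the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CaptionFilterSketch {

    // Hypothetical stand-in for SubtitlesStream: only a language tag and the
    // two flags discussed in this PR are modeled here.
    static final class Track {
        final String languageTag;
        final boolean autoGenerated;
        final boolean autoTranslated;

        Track(final String languageTag, final boolean autoGenerated,
              final boolean autoTranslated) {
            this.languageTag = languageTag;
            this.autoGenerated = autoGenerated;
            this.autoTranslated = autoTranslated;
        }
    }

    // Returns a new list containing only the tracks that are not
    // auto-translated, preserving their original order.
    static List<Track> withoutAutoTranslated(final List<Track> all) {
        final List<Track> kept = new ArrayList<>();
        for (final Track track : all) {
            if (!track.autoTranslated) {
                kept.add(track);
            }
        }
        return kept;
    }

    public static void main(final String[] args) {
        final List<Track> all = Arrays.asList(
                new Track("de", false, false),  // uploaded by the creator
                new Track("en", true, false),   // speech-to-text
                new Track("fr", false, true));  // machine-translated
        for (final Track track : withoutAutoTranslated(all)) {
            System.out.println(track.languageTag);
        }
    }
}
```

A frontend that wants the translated tracks as well could skip this step, or only apply it when the list is shown in a compact picker.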
2bcc0a9 to efce384
What happened to this?
Closes #977 Based on and addresses TeamNewPipe/NewPipe#8023
Faster and ordered: captions provided by the user are at the beginning of the list, auto-translated captions are at the end
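The ordering described in this commit message could be sketched roughly as follows. This is not the code from the PR: `Track` is a hypothetical stand-in for `SubtitlesStream`, `rank` and `ordered` are illustrative names, and placing auto-generated (but not translated) captions in the middle is an assumption, since the message only states where user-provided and auto-translated captions end up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class CaptionOrderSketch {

    // Hypothetical stand-in for SubtitlesStream with just the fields needed here.
    static final class Track {
        final String label;
        final boolean autoGenerated;
        final boolean autoTranslated;

        Track(final String label, final boolean autoGenerated,
              final boolean autoTranslated) {
            this.label = label;
            this.autoGenerated = autoGenerated;
            this.autoTranslated = autoTranslated;
        }
    }

    // 0 = user-provided, 1 = auto-generated (assumed position), 2 = auto-translated.
    static int rank(final Track track) {
        if (track.autoTranslated) {
            return 2;
        }
        return track.autoGenerated ? 1 : 0;
    }

    // Returns a sorted copy. List.sort is stable, so tracks with the same rank
    // keep their relative order from the input.
    static List<Track> ordered(final List<Track> tracks) {
        final List<Track> copy = new ArrayList<>(tracks);
        copy.sort(Comparator.comparingInt(CaptionOrderSketch::rank));
        return copy;
    }

    public static void main(final String[] args) {
        final List<Track> tracks = Arrays.asList(
                new Track("fr (translated)", false, true),
                new Track("en (auto)", true, false),
                new Track("de", false, false),
                new Track("en", false, false));
        for (final Track track : ordered(tracks)) {
            System.out.println(track.label);
        }
    }
}
```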
efce384 to 9730de2
Is something blocking this PR from getting merged (except the needed rebase)? @AudricV (some user asked via email about this feature)
Extract auto-translated captions for YouTube videos.
API changes 🟢
SubtitlesStream
This adds isAutoTranslated() next to isAutoGenerated() to distinguish between auto-generated subtitles, which use speech-to-text, and auto-translated captions, which are based on Google Translate.
Additionally, getBaseLocale(), getDisplayBaseLanguageName() and getBaseLanguageTag() were added to access information about the language that the auto-translation is based on.
Issues closed by this PR
Closes #977
Based on and addresses TeamNewPipe/NewPipe#8023
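As a rough idea of how a frontend might use the new accessors listed under "API changes", here is a sketch that builds a display label for a caption track. It is an assumption-heavy illustration: `Track` is a hypothetical stand-in for `SubtitlesStream`, its `baseLocale` field only mirrors what `getBaseLocale()` is described as exposing (the real return types are not stated in this PR description), and `label` is an illustrative helper name.

```java
import java.util.Locale;

final class CaptionLabelSketch {

    // Hypothetical stand-in for SubtitlesStream. baseLocale is assumed to be
    // null when the track is not auto-translated.
    static final class Track {
        final Locale locale;
        final Locale baseLocale;
        final boolean autoTranslated;

        Track(final Locale locale, final Locale baseLocale,
              final boolean autoTranslated) {
            this.locale = locale;
            this.baseLocale = baseLocale;
            this.autoTranslated = autoTranslated;
        }
    }

    // Builds an English label and, for auto-translated tracks, appends the
    // language the translation is based on.
    static String label(final Track track) {
        final String name = track.locale.getDisplayLanguage(Locale.ENGLISH);
        if (track.autoTranslated && track.baseLocale != null) {
            return name + " (auto-translated from "
                    + track.baseLocale.getDisplayLanguage(Locale.ENGLISH) + ")";
        }
        return name;
    }

    public static void main(final String[] args) {
        final Track uploaded =
                new Track(Locale.forLanguageTag("de"), null, false);
        final Track translated = new Track(
                Locale.forLanguageTag("fr"), Locale.forLanguageTag("de"), true);
        System.out.println(label(uploaded));
        System.out.println(label(translated));
    }
}
```

Showing the base language this way would let users tell apart the German-to-French and English-to-French translations mentioned in the review.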