Label handling commit breaks the imdb finetuning script #3
Thomas, thanks for sharing this code! I noticed that commit 8d9c237 seems to have broken the default behavior of the classification fine-tuning scripts. In the previous version, the imdb and trec dictionaries each had a key called 'labels', but a line in finetuning_train.py still references that now-deleted key.
I updated the line to use DATASETS_LABELS_URL['imdb']['test'] as intended, but the S3 bucket doesn't seem to have the IMDB test labels file.
See below:
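# cached_path is assumed to come from pytorch_pretrained_bert, as in the repo's scripts
from pytorch_pretrained_bert import cached_path

file_path = "https://s3.amazonaws.com/datasets.huggingface.co/imdb/test.labels.txt"
label_file = cached_path(file_path)  # downloads the file and returns the local cache path
with open(label_file, "r", encoding="utf-8") as f:
    all_lines = f.readlines()
print(all_lines[:5])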
Gives:
['<?xml version="1.0" encoding="UTF-8"?>\n', '<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>imdb/test.labels.txt</Key><RequestId>3D9E7C511167A0FB</RequestId><HostId>RiidOcrHfFaqxW9tmUXRppE/G3lsYoCZcq+uaYDi2yPPoe8mv/Og6PMuUncwk+B53tGsvcCZMWk=</HostId></Error>']
Does the test file for IMDB still exist with this name? This doesn't seem to be an issue with TREC.
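For anyone hitting the same thing, here is a minimal sketch of how the downloaded content could be checked for an S3 error body before it gets parsed as labels. It assumes the cached file holds the raw response text, as in the output above. `s3_error_code` is a hypothetical helper of mine, not something in the repo, and the sample body is a shortened copy of the response above.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def s3_error_code(text: str) -> Optional[str]:
    """Return the S3 error code if `text` is an S3 XML error body, else None.

    Hypothetical helper for illustration; not part of the repo.
    """
    try:
        root = ET.fromstring(text.strip().encode("utf-8"))
    except ET.ParseError:
        return None  # not XML, so presumably a real label file
    if root.tag != "Error":
        return None
    return root.findtext("Code")


# Shortened copy of the error body shown above (RequestId and HostId omitted).
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Error><Code>NoSuchKey</Code>"
    "<Message>The specified key does not exist.</Message>"
    "<Key>imdb/test.labels.txt</Key></Error>"
)
print(s3_error_code(sample))
```

With the snippet above, this could be called as `s3_error_code("".join(all_lines))`; a non-None result would mean the cache holds an error page rather than labels.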