Description
```python
from transformers import LayoutLMv3ForTokenClassification
# Helper functions from the layoutreader repository
from v3.helpers import prepare_inputs, boxes2inputs, parse_logits

model = LayoutLMv3ForTokenClassification.from_pretrained("hantian/layoutreader")

# List of [left, top, right, bottom] bboxes of spans; values should be in the range 0 to 1000
page_width = 612
page_height = 792

# filtered_boxes holds the boxes extracted by the YOLO-based model (defined elsewhere)
norm_boxes = []
for (x1, y1, x2, y2) in filtered_boxes:
    nx1 = int(x1 / page_width * 1000)
    ny1 = int(y1 / page_height * 1000)
    nx2 = int(x2 / page_width * 1000)
    ny2 = int(y2 / page_height * 1000)
    norm_boxes.append([nx1, ny1, nx2, ny2])

inputs = boxes2inputs(norm_boxes)
inputs = prepare_inputs(inputs, model)
logits = model(**inputs).logits.cpu().squeeze(0)
orders = parse_logits(logits, len(norm_boxes))
print(orders)
```
I am using this sample code on bounding boxes extracted by a YOLO-based model. The predicted order is completely random, and I don't know where I am making a mistake. Do you have sample images and bbox data for testing?
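
One thing that might be worth ruling out is whether the normalized boxes actually fall inside the 0 to 1000 range. The sketch below is only an illustration: `normalize_boxes` is a hypothetical helper (not part of layoutreader), and it assumes the boxes are given as `(x1, y1, x2, y2)` in the same units as `page_width` and `page_height`. If the detector reports pixel coordinates of a rendered image while the page size is given in other units, the check would fail and point at a unit mismatch.

```python
def normalize_boxes(boxes, page_width, page_height, scale=1000):
    """Scale [left, top, right, bottom] boxes to integers in 0..scale.

    Hypothetical helper for illustration; raises ValueError when a box is
    inverted or falls outside the page, instead of passing it to the model.
    """
    norm_boxes = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        # A box whose right/bottom edge precedes its left/top edge is malformed.
        if x2 < x1 or y2 < y1:
            raise ValueError(f"box {i} is inverted: {(x1, y1, x2, y2)}")
        scaled = [
            int(x1 / page_width * scale),
            int(y1 / page_height * scale),
            int(x2 / page_width * scale),
            int(y2 / page_height * scale),
        ]
        # Values outside 0..scale suggest the box and page units do not match.
        if any(v < 0 or v > scale for v in scaled):
            raise ValueError(f"box {i} is out of range after scaling: {scaled}")
        norm_boxes.append(scaled)
    return norm_boxes
```

With this helper, `norm_boxes = normalize_boxes(filtered_boxes, page_width, page_height)` would replace the normalization loop above and fail early on suspicious input.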
Thanks :)