bug: pdf splitting modifies returned csv elements #201

@Coniferish

Describe the bug
When output_format is set to csv, the API response differs depending on whether split_pdf_page is True or False. When it is False, the elements contain an extra metadata field, text_as_html. As a result, the element IDs do not match between the two responses either.

To Reproduce
_test_unstructured_client/integration/test_decorators.py::test_integration_split_csv_response illustrates this, but it currently passes because it only asserts on a shortened string.

Expected behavior
The response should be identical whether split_pdf_page is True or False.
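
A stricter comparison for the test might look something like the sketch below. It is only an illustration: the helper names (`parse_csv`, `diff_csv_responses`) and the sample payloads are made up for this example and are not real API output or part of the client. The idea is to compare the full header and every cell rather than a shortened string, so an extra text_as_html column or a changed element ID would be caught.

```python
import csv
import io


def parse_csv(payload: str):
    """Return (header, rows) for a CSV payload given as a string."""
    reader = csv.DictReader(io.StringIO(payload))
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def diff_csv_responses(split_csv: str, unsplit_csv: str) -> dict:
    """Compare the full header and per-row values instead of a string prefix."""
    split_cols, split_rows = parse_csv(split_csv)
    unsplit_cols, unsplit_rows = parse_csv(unsplit_csv)
    shared = [c for c in split_cols if c in unsplit_cols]
    # Collect (row index, column) pairs whose values differ in shared columns.
    mismatched = [
        (i, col)
        for i, (a, b) in enumerate(zip(split_rows, unsplit_rows))
        for col in shared
        if a[col] != b[col]
    ]
    return {
        "only_in_split": [c for c in split_cols if c not in unsplit_cols],
        "only_in_unsplit": [c for c in unsplit_cols if c not in split_cols],
        "row_count_equal": len(split_rows) == len(unsplit_rows),
        "mismatched_cells": mismatched,
    }


# Hypothetical payloads that mimic the reported symptom; not real API output.
SPLIT_CSV = "type,element_id,text\nTable,id-split,hello\n"
UNSPLIT_CSV = (
    "type,element_id,text,text_as_html\n"
    "Table,id-unsplit,hello,<table></table>\n"
)

# A check on a shortened string still passes even though the payloads differ.
prefix_matches = SPLIT_CSV[:20] == UNSPLIT_CSV[:20]

result = diff_csv_responses(SPLIT_CSV, UNSPLIT_CSV)
print(prefix_matches)
print(result)
```

In a test, asserting that `only_in_split`, `only_in_unsplit`, and `mismatched_cells` are all empty (and `row_count_equal` is true) would fail on the behavior described above, whereas the shortened-string assertion does not.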


Labels: bug
