[Solved] LibreOffice File format error found at SAXParse

Help with installation and general system troubleshooting questions concerning the office suite LibreOffice.
okuribidreams
Posts: 5
Joined: Thu Jun 15, 2017 10:50 pm

[Solved] LibreOffice File format error found at SAXParse

Post by okuribidreams »

Hello, I am an editor, and yesterday I finished editing this file for a book due to be published soon.
The coworker who received it after me told me they could not open it, and it gives me the SAXParse error too, which I can't find any way to fix.
My coworker's deadline is on the 22nd, so if any of you could find a way to fix this corrupted file, I'd be indebted to you forever.

My email is gaia.marino.87<at>gmail.com

The file: https://www.dropbox.com/s/xj859j4nva7db ... .docx?dl=0

Thank you so much!
Last edited by Hagar Delest on Sat Jun 17, 2017 4:23 pm, edited 1 time in total.
Reason: tagged [Solved].
OpenOffice 3.1 on Windows 10
RoryOF
Moderator
Posts: 34787
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: LibreOffice File format error found at SAXParse

Post by RoryOF »

The file as placed on Dropbox opened without error in OpenOffice on my Xubuntu system. In case it is of help I have sent you an .odt version of it, which you can resave to .docx.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
okuribidreams
Posts: 5
Joined: Thu Jun 15, 2017 10:50 pm

Re: LibreOffice File format error found at SAXParse

Post by okuribidreams »

Thank you so much for your help, but sadly the file is not complete: it should be around 100k words, but this version has slightly more than 50k. I guess the error must be around where the file stops for you.
OpenOffice 3.1 on Windows 10
RoryOF
Moderator
Posts: 34787
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: LibreOffice File format error found at SAXParse

Post by RoryOF »

Yes, you are correct: the last words in the XML file are " Una nera piuma di rondine." However, the XML file parses correctly, so there is presumably some erroneous cutoff point in the file beyond which the parser will not travel. I think the error occurs at, or close to, location 1833069. I'm at the end of a very long day and won't try anything, as I might damage the file even more. Perhaps someone else will make an attempt.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
okuribidreams
Posts: 5
Joined: Thu Jun 15, 2017 10:50 pm

Re: LibreOffice File format error found at SAXParse

Post by okuribidreams »

Thank you anyway so much for your help!
At least this way we'll have half of the file if no one else is able to fix it, and we won't have to start from the very beginning. I'm still holding on to hope!
OpenOffice 3.1 on Windows 10
okuribidreams
Posts: 5
Joined: Thu Jun 15, 2017 10:50 pm

Re: LibreOffice File format error found at SAXParse

Post by okuribidreams »

(Also: even just extracting the unformatted text and sending it to me would be a HUGE help. I can reformat, but having to re-edit half of the book in one day at most would be crazy. Thank you in advance to anyone who might be able to help!)
OpenOffice 3.1 on Windows 10
robleyd
Moderator
Posts: 5265
Joined: Mon Aug 19, 2013 3:47 am
Location: Murbko, Australia

Re: LibreOffice File format error found at SAXParse

Post by robleyd »

I've had a look and found the same as RoryOF: the content of the XML file ends with " Una nera piuma di rondine." I checked with a simple text viewer; these are the last few hundred bytes of the relevant file.

Code: Select all

piuma di rondine. </w:t></w:r></w:p><w:sectPr><w:footerReference w:type="default" r:id="rId2"/><w:type w:val="nextPage"/>
<w:pgSz w:w="11906" w:h="16838"/><w:pgMar w:left="1134" w:right="1134" w:header="0" w:top="1134" w:footer="1134" w:bottom="1191" w:gutter="0"/>
<w:pgNumType w:fmt="decimal"/><w:formProt w:val="false"/><w:textDirection w:val="lrTb"/><w:docGrid w:type="default" w:linePitch="360" w:charSpace="4294961151"/>
</w:sectPr></w:body></w:document>
So unfortunately there is no text beyond that to recover.
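
For anyone who wants to repeat this check without a text viewer, here is a minimal Python sketch. It assumes the .docx is still a readable ZIP archive with the main text stored in word/document.xml (the usual location in a .docx). The function name and the 400-byte default are only illustrative.

```python
import zipfile


def tail_of_document_xml(docx_path, n_bytes=400):
    """Return the last n_bytes of word/document.xml as text.

    Assumes docx_path is a readable ZIP archive containing
    word/document.xml. Undecodable bytes are replaced, not raised.
    """
    with zipfile.ZipFile(docx_path) as archive:
        data = archive.read("word/document.xml")
    return data[-n_bytes:].decode("utf-8", errors="replace")


# Hypothetical usage; "book.docx" is a placeholder file name:
# print(tail_of_document_xml("book.docx"))
```

If the tail ends with the closing </w:body></w:document> tags, as in the listing above, the XML part itself was written out to the end.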

As a last resort you might try recovering any deleted temporary files created for this document. As soon as possible do the following:

Download Recuva (Windows only) or PhotoRec (Windows, macOS, Linux and other OSes); you only need one of them. Let it do an in-depth recovery of deleted files on your computer. You may get a file containing some or all of your data, or you may not. Do this as a first priority: any other use of the computer may overwrite deleted files that still exist on disk and prevent their recovery. There is no guarantee that you will recover anything useful.
Slackware 15 64 bit
Apache OpenOffice 4.1.15
LibreOffice 24.8.3.2; SlackBuild for 24.8.3 by Eric Hameleers
RoryOF
Moderator
Posts: 34787
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: LibreOffice File format error found at SAXParse

Post by RoryOF »

I'm not yet online to do any editing, but I note that the quote above sits about 3.4 MB into the file, whereas the cut-off point is at about 1.8 MB, so the content is present. The trick will be to extract it. The interesting question is why the code cuts off at 1.8 MB.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
RoryOF
Moderator
Posts: 34787
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: LibreOffice File format error found at SAXParse

Post by RoryOF »

I've had another look at this: I'm getting nowhere with the tools at my disposal.

For others who may wish to try, the error shows up at "uomo di casa" (no quotes). OpenOffice finishes displaying just there, about halfway through the file, although there is as much content again not displayed. XML Copy Editor declares the file to be well formed and Firefox shows the entire XML properly formatted as XML.
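
For those who prefer a script to XML Copy Editor, the following Python sketch runs the same two checks: is word/document.xml well formed, and how far into the file does a given phrase sit? It uses Python's own SAX parser, so a clean result here does not say anything about why the office suite stops; it only confirms the XML is well formed. The function name is made up for illustration, and the phrase search will miss text that is split across several formatting runs.

```python
import xml.sax
import zipfile


def inspect_document_xml(docx_path, phrase):
    """Return (well_formed, byte_offset_of_phrase, total_bytes).

    byte_offset_of_phrase is -1 if the phrase does not occur as one
    contiguous UTF-8 byte sequence in word/document.xml.
    """
    with zipfile.ZipFile(docx_path) as archive:
        data = archive.read("word/document.xml")

    try:
        # A bare ContentHandler ignores all content; we only care
        # whether the parser reaches the end without a fatal error.
        xml.sax.parseString(data, xml.sax.ContentHandler())
        well_formed = True
    except xml.sax.SAXParseException:
        well_formed = False

    offset = data.find(phrase.encode("utf-8"))
    return well_formed, offset, len(data)


# Hypothetical usage; "book.docx" is a placeholder file name:
# print(inspect_document_xml("book.docx", "uomo di casa"))
```

Comparing the returned offset with the total size shows how much of the file lies beyond the point where the display stops.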
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
RoryOF
Moderator
Posts: 34787
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: LibreOffice File format error found at SAXParse

Post by RoryOF »

I have managed to extract the entire text (please check) as plain text, which I have emailed to the OP.
Edit: In answer to an off-list question: I used Calibre's Convert books mechanism.

Also, if you reformat the text file (a relatively trivial task, 20 minutes?), please do work and Save in .odt format. Save As .docx _only_ when finished.
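
For readers without Calibre, a rough alternative is to pull the text straight out of the XML. The Python sketch below is not what Calibre does internally, and nobody tried it on the file in this thread. It assumes word/document.xml is well formed and uses the standard WordprocessingML namespace. The function name is illustrative. It keeps only paragraph text, so tabs, line breaks inside a paragraph, footnotes and headers are ignored, and text inside text boxes may appear twice.

```python
import xml.etree.ElementTree as ET
import zipfile

# Standard WordprocessingML namespace, in ElementTree's {uri}tag form.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_plain_text(docx_path):
    """Return the body text of a .docx, one line per paragraph."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join all text runs (w:t) belonging to this paragraph.
        paragraphs.append(
            "".join(t.text or "" for t in para.iter(W_NS + "t"))
        )
    return "\n".join(paragraphs)


# Hypothetical usage; both file names are placeholders:
# with open("book.txt", "w", encoding="utf-8") as out:
#     out.write(docx_to_plain_text("book.docx"))
```

Writing the result to a separate .txt file leaves the original .docx untouched, which matters when it is the only copy.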
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
FJCC
Moderator
Posts: 9457
Joined: Sat Nov 08, 2008 8:08 pm
Location: Colorado, USA

Re: LibreOffice File format error found at SAXParse

Post by FJCC »

I recovered the entire text with correct formatting by opening the file in WordPad. Word 2016 wouldn't open it at all, and OpenOffice, as Rory found, brought in about half of the text. WordPad opened it without any problem. I emailed that text to the OP.
OpenOffice 4.1 on Windows 10 and Linux Mint
If your question is answered, please go to your first post, select the Edit button, and add [Solved] to the beginning of the title.
RoryOF
Moderator
Posts: 34787
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: LibreOffice File format error found at SAXParse

Post by RoryOF »

I am puzzled (but not going to lose sleep) over what was wrong internally in the .xml file.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
okuribidreams
Posts: 5
Joined: Thu Jun 15, 2017 10:50 pm

Re: LibreOffice File format error found at SAXParse

Post by okuribidreams »

You guys are truly my heroes. Thank you so much for your help.

It's a good thing WordPad seemed to open it without issues! I feel bad about not trying it first, but I sadly never rely on it. I will from now on.
Thanks again so much for your help, all of you.
OpenOffice 3.1 on Windows 10