[Solved] LibreOffice File format error found at SAXParse
Posted: Thu Jun 15, 2017 11:19 pm
by okuribidreams
Hello, I am an editor and yesterday I finished editing this file for a book due to be published soon.
The coworker who received it after me says they cannot open it, and when I try, it gives me the SAXParse error, which I can't find any way to fix.
My coworker's deadline is on the 22nd so if any of you could find a way to fix this corrupted file, I'd be indebted to you forever.
My email is gaia.marino.87<at>gmail.com
The file:
https://www.dropbox.com/s/xj859j4nva7db ... .docx?dl=0
Thank you so much!
Re: LibreOffice File format error found at SAXParse
Posted: Thu Jun 15, 2017 11:32 pm
by RoryOF
The file as placed on Dropbox opened without error in OpenOffice on my Xubuntu system. In case it is of help I have sent you an .odt version of it, which you can resave to .docx.
Re: LibreOffice File format error found at SAXParse
Posted: Thu Jun 15, 2017 11:49 pm
by okuribidreams
Thank you so much for your help, but sadly, the file is not complete - it should be around 100k words but that's slightly more than 50k. I guess the error must be around where the file stops for you.
Re: LibreOffice File format error found at SAXParse
Posted: Fri Jun 16, 2017 12:07 am
by RoryOF
Yes, you are correct: the last words in the XML file are " Una nera piuma di rondine." However, the XML file parses correctly and there is presumably some erroneous cutoff point in the file, beyond which the parser will not travel. I think the error occurs at (close to) location 1833069. I'm at the end of a very long day and won't try anything as I might damage the file even more. Perhaps someone else will make an attempt.
Re: LibreOffice File format error found at SAXParse
Posted: Fri Jun 16, 2017 12:12 am
by okuribidreams
Thank you anyway so much for your help!
At least this way we'll have half of the file if no one else is able to fix it, and we won't have to start from the very beginning. I'm still holding on to hope!
Re: LibreOffice File format error found at SAXParse
Posted: Fri Jun 16, 2017 12:23 am
by okuribidreams
(Also -- even just managing to extract and send me the unformatted text would be a HUGE help -- I can reformat, but having to re-edit half of the book in one day at most would be crazy. Thank you in advance to anyone who might be able to help!)
Re: LibreOffice File format error found at SAXParse
Posted: Fri Jun 16, 2017 3:54 am
by robleyd
I've had a look and found the same as RoryOF; the content of the XML file ends with " Una nera piuma di rondine." I've checked with a simple text viewer; these are the last few hundred bytes of the relevant file.
piuma di rondine. </w:t></w:r></w:p>
<w:sectPr>
<w:footerReference w:type="default" r:id="rId2"/>
<w:type w:val="nextPage"/>
<w:pgSz w:w="11906" w:h="16838"/>
<w:pgMar w:left="1134" w:right="1134" w:header="0" w:top="1134" w:footer="1134" w:bottom="1191" w:gutter="0"/>
<w:pgNumType w:fmt="decimal"/>
<w:formProt w:val="false"/>
<w:textDirection w:val="lrTb"/>
<w:docGrid w:type="default" w:linePitch="360" w:charSpace="4294961151"/>
</w:sectPr></w:body></w:document>
So unfortunately there is no text beyond that to recover.
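For anyone who wants to repeat this check without a text viewer: a .docx is a zip archive, and the body text normally lives in the member word/document.xml. Below is a minimal Python sketch of that check. The function names and the file name in the usage note are my own placeholders, and the member name is an assumption that holds for typical .docx files.

```python
import zipfile


def read_member(docx_path, member="word/document.xml"):
    """Return the raw bytes of one member of a .docx (zip) archive.

    The default member name is an assumption; it is where the body
    text of a typical .docx is stored.
    """
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read(member)


def tail_bytes(data, count=400):
    """Return (total_size, last `count` bytes) of a bytes object."""
    start = max(len(data) - count, 0)
    return len(data), data[start:]


def show_tail(docx_path, count=400):
    """Print the size of word/document.xml and its last `count` bytes."""
    size, tail = tail_bytes(read_member(docx_path), count)
    print(f"document.xml is {size} bytes; last {len(tail)} bytes:")
    # errors="replace" avoids a crash if the slice starts mid-character.
    print(tail.decode("utf-8", errors="replace"))
```

Calling show_tail("book.docx") (placeholder name) should print something like the dump above if the file ends where I found it to end.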
As a last resort you might try recovering any deleted temporary files created for this document. As soon as possible do the following:
Download Recuva (Windows only) or PhotoRec (Windows, macOS, Linux and other OSes); you only need one of them. Let it do an in-depth recovery of deleted files on your computer. You may get a file containing some or all of your data (or not). Do this as a first priority; any other use of the computer may overwrite deleted files that still exist on disk and prevent their recovery. There is no guarantee that you will recover anything useful.
Re: LibreOffice File format error found at SAXParse
Posted: Fri Jun 16, 2017 6:43 am
by RoryOF
I'm not yet online to do any editing, but I note that the text quoted above sits about 3.4 MB into the file, whereas the cut-off point is at about 1.8 MB, so the content is present. The trick will be to extract it. The interesting question is why the import stops at 1.8 MB.
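If anyone wants to compare offsets like these on their own copy, here is a rough Python sketch that reports the byte position of a phrase inside word/document.xml. The helper names are hypothetical, and it assumes the phrase is stored contiguously in the XML; Word often splits a sentence across several runs, in which case a plain byte search will not find it.

```python
import zipfile


def read_document_xml(docx_path):
    """Return the raw bytes of word/document.xml from a .docx archive."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read("word/document.xml")


def find_offsets(xml_bytes, phrase, encoding="utf-8"):
    """Return every byte offset at which `phrase` occurs in `xml_bytes`."""
    needle = phrase.encode(encoding)
    offsets = []
    start = xml_bytes.find(needle)
    while start != -1:
        offsets.append(start)
        # Continue searching one byte past the previous hit.
        start = xml_bytes.find(needle, start + 1)
    return offsets


def report(docx_path, phrases):
    """Print where each phrase sits relative to the size of the XML."""
    data = read_document_xml(docx_path)
    for phrase in phrases:
        hits = find_offsets(data, phrase)
        print(f"{phrase!r}: {hits} (file is {len(data)} bytes)")
```

For example, report("book.docx", ["piuma di rondine"]) with a placeholder file name would show how far into the XML the closing words are.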
Re: LibreOffice File format error found at SAXParse
Posted: Fri Jun 16, 2017 9:32 am
by RoryOF
I've had another look at this: I'm getting nowhere with the tools at my disposal.
For others who may wish to try, the error shows up at "uomo di casa" (no quotes). OpenOffice finishes displaying just there, about halfway through the file, although there is as much content again not displayed. XML Copy Editor declares the file to be well formed and Firefox shows the entire XML properly formatted as XML.
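The well-formedness check can also be scripted. The sketch below runs the XML through Python's standard SAX parser and reports the position of the first error, if there is one. It is only an illustration with a function name of my own choosing; given that XML Copy Editor and Firefox both accept the file, I would expect it to report no error here, which would again point at the importing application and not at the XML syntax.

```python
import xml.sax


def check_well_formed(xml_bytes):
    """Parse XML with SAX and report the first well-formedness error.

    Returns None if the document parses cleanly, otherwise a tuple
    (line, column, message) describing where the parser gave up.
    """
    try:
        # A bare ContentHandler ignores all content; we only care
        # whether the parser reaches the end without an error.
        xml.sax.parseString(xml_bytes, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getLineNumber(), exc.getColumnNumber(), exc.getMessage()
    return None
```

Feed it the bytes of word/document.xml taken from the .docx archive. Note that this tests only XML syntax, not whether the content is valid WordprocessingML.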
Re: LibreOffice File format error found at SAXParse
Posted: Fri Jun 16, 2017 9:48 am
by RoryOF
I have managed to extract the entire text (please check) as plain text, which I have emailed to the OP.
Edit: In answer to an offlist question: I used Calibre's Convert books mechanism.
Also, if you reformat the text file - a relatively trivial task (20 minutes?) - do please work and Save in .odt format. Save As .docx _only_ when finished.
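For those without Calibre, plain text can also be pulled straight from the XML. To be clear, this is not what Calibre does internally; it is a different, much cruder approach, sketched in Python under the assumption that the body text is in word/document.xml and uses the usual WordprocessingML namespace. The function names are mine. It keeps only text, tabs and line breaks, drops all formatting, and would repeat text from paragraphs nested inside other paragraphs (text boxes, for instance).

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by typical (transitional) .docx files; assumed here.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_P = f"{{{W_NS}}}p"      # paragraph
W_T = f"{{{W_NS}}}t"      # text run content
W_TAB = f"{{{W_NS}}}tab"  # tab character
W_BR = f"{{{W_NS}}}br"    # line break


def paragraphs_from_xml(xml_bytes):
    """Return the text of each w:p element as a list of strings."""
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_P):
        pieces = []
        for node in para.iter():
            if node.tag == W_T:
                pieces.append(node.text or "")
            elif node.tag == W_TAB:
                pieces.append("\t")
            elif node.tag == W_BR:
                pieces.append("\n")
        paragraphs.append("".join(pieces))
    return paragraphs


def docx_to_text(docx_path):
    """Return the body text of a .docx, one paragraph per line."""
    with zipfile.ZipFile(docx_path) as archive:
        xml_bytes = archive.read("word/document.xml")
    return "\n".join(paragraphs_from_xml(xml_bytes))
```

Writing the result of docx_to_text("book.docx") (placeholder name) to a .txt file gives something that can be opened in Writer and reformatted.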
Re: LibreOffice File format error found at SAXParse
Posted: Fri Jun 16, 2017 2:16 pm
by FJCC
I recovered the entire text with correct formatting by opening the file in WordPad. Word 2016 wouldn't open it at all and OpenOffice, as Rory found, brought in about half of the text. WordPad opened it without any problem. I emailed that text to the OP.
Re: LibreOffice File format error found at SAXParse
Posted: Fri Jun 16, 2017 2:19 pm
by RoryOF
I am puzzled (but not going to lose sleep) over what was wrong internally in the .xml file.
Re: LibreOffice File format error found at SAXParse
Posted: Fri Jun 16, 2017 3:02 pm
by okuribidreams
You guys are truly my heroes. Thank you so much for your help.
It's a good thing WordPad seemed to open it without issues! I feel bad about not trying it first, but I sadly never rely on it. I will from now on.
Thanks again so much for your help, all of you.