Is it possible to extract meta information from MS Office files and/or PDFs with PHP?
So I have files (.doc, .docx, .xls, .xlsx, .pdf) that are on a server.
Is it possible (and if it is, how) to extract metadata from these files using PHP? I'm looking for things like author, keywords, title, etc.
In Office documents, it's the information stored along with the document properties (File... Properties... Summary in 2003, Prepare... Properties in 2007).
In PDFs, it's the information found in the document properties.
This is not on a Windows server.
I managed to extract a lot of meta information using xpdf on a Linux system a few years back. Nowadays, though, Zend_Pdf is probably your best bet. I haven't used it myself, but it looks good and promises what you need. It seems to have no library dependencies, either.
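To illustrate the xpdf route, here is a minimal sketch that shells out to xpdf's `pdfinfo` tool and parses its "Key: value" output. It assumes a `pdfinfo` binary is on the PATH; the function names `parsePdfinfoOutput` and `readPdfMetadata` are made up for this example.

```php
<?php
// Parse the "Key: value" lines printed by pdfinfo into an associative array.
function parsePdfinfoOutput(string $output): array
{
    $meta = [];
    foreach (preg_split('/\R/', $output) as $line) {
        // Split at the first colon only; values such as dates contain colons.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2 && trim($parts[0]) !== '') {
            $meta[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $meta;
}

// Assumption: a `pdfinfo` binary (from xpdf or poppler-utils) is installed.
function readPdfMetadata(string $path): array
{
    $output = shell_exec('pdfinfo ' . escapeshellarg($path) . ' 2>/dev/null');
    return is_string($output) ? parsePdfinfoOutput($output) : [];
}
```

Calling `readPdfMetadata('/path/to/file.pdf')` should return whichever of `Title`, `Author`, `Keywords` and so on the PDF actually defines; fields that are not set are simply absent from the array.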
For Word .doc files, if you don't find a better way, you could plug in an OpenOffice server instance / command line and convert the files to ODT, which is XML and parseable. That is, if it's not possible to extract the metadata per macro; it should be, but I don't know how much work that is. This OpenOffice forum entry gives a ton of starting points for automated conversion.
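Once a file has been converted to ODT, the metadata sits in a `meta.xml` entry inside the ODT's ZIP container. A minimal sketch of reading it, assuming PHP's zip and SimpleXML extensions are available (the conversion step itself is not shown, and the function name `readOdtMetadata` is made up for this example):

```php
<?php
// Read title, authors and keywords from the meta.xml entry of an ODT file.
function readOdtMetadata(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a ZIP archive");
    }
    $xml = $zip->getFromName('meta.xml');
    $zip->close();

    $doc = ($xml === false) ? false : simplexml_load_string($xml);
    if ($doc === false) {
        return [];
    }

    // The root is office:document-meta; the properties live in office:meta.
    $office = $doc->children('urn:oasis:names:tc:opendocument:xmlns:office:1.0');
    if (!isset($office->meta)) {
        return [];
    }
    $dc = $office->meta->children('http://purl.org/dc/elements/1.1/');
    $meta = $office->meta->children('urn:oasis:names:tc:opendocument:xmlns:meta:1.0');

    // ODF stores each keyword in its own meta:keyword element.
    $keywords = [];
    foreach ($meta->keyword as $keyword) {
        $keywords[] = (string) $keyword;
    }

    return [
        'title'            => (string) $dc->title,
        'author'           => (string) $meta->{'initial-creator'},
        'last_modified_by' => (string) $dc->creator,
        'keywords'         => $keywords,
    ];
}
```

Whether a given property survives the conversion from .doc depends on the import filter, so treat empty values as "not transported" rather than "not set".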
The ...x formats are a sort of XML, so it should be possible to fetch the metadata from them. Alternatively, you should be able to use OpenOffice's conversion filters here as well, if they transport the metadata.
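For .docx and .xlsx, the files are ZIP archives of XML parts, and the title, author and keywords are stored in the `docProps/core.xml` part. A minimal sketch of fetching them, assuming PHP's zip and SimpleXML extensions are available (the function name `readOoxmlCoreProperties` is made up for this example):

```php
<?php
// Read the core properties (title, author, keywords) from a .docx or .xlsx.
function readOoxmlCoreProperties(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a ZIP archive");
    }
    // The core properties part is optional, so it may be missing.
    $xml = $zip->getFromName('docProps/core.xml');
    $zip->close();

    $doc = ($xml === false) ? false : simplexml_load_string($xml);
    if ($doc === false) {
        return [];
    }

    // Title, creator and subject use Dublin Core; keywords use the cp namespace.
    $dc = $doc->children('http://purl.org/dc/elements/1.1/');
    $cp = $doc->children(
        'http://schemas.openxmlformats.org/package/2006/metadata/core-properties'
    );

    return [
        'title'    => (string) $dc->title,
        'author'   => (string) $dc->creator,
        'subject'  => (string) $dc->subject,
        'keywords' => (string) $cp->keywords,
    ];
}
```

The same function works for both .docx and .xlsx, since both store these properties in the same part; properties that are not set come back as empty strings.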