I'm a beginner at R and having a bit of trouble using the tm package.

I need to extract specific data from pages 55 through 300 of this PDF file and thought that R might be a good way to do so. (If anyone has a better idea, please let me know!) I did some searching and installed the tm package and xpdf. I've tried reading this and tried zx8754's solution, with no luck.

I suspect it has something to do with the readPDF command - I get the following:

    Error in readPDF(PdftotextOptions = "-layout") : unused argument

I think it has to do with trying to use the tm package and xpdf together, so I read Tony Breyal's solution (I can't post more than 2 links), added pdfinfo and pdftotext to my environment variables (I'm on Win 8) and restarted. I'm sure I'm missing something - right now I have pdftotext.exe in my R working directory.

Can anyone help me configure this so that the tm package calls the xpdf programs correctly and readPDF works as it should? Again, I'm very new to this, so apologies if I'm way off.

To get you started, here is an example of a complete readPDF command for reading a PDF file. readPDF threw an error when I tried to retrieve the PDF file directly from the link you provided, so I downloaded the PDF file to my working directory first. In the call below, filename holds the path to that downloaded file.

    library(tm)
    doc <- readPDF(control = list(text = "-layout"))(elem = list(uri = filename), language = "en", id = "id1")

The code above converts the PDF file to text and stores the result in doc. doc is actually a list, as can be seen with the following code:

    str(doc)

Among other things, the output shows the extracted text lines in the content element:

    $ content: chr " STATE UNIVERSITY SYSTEM OF FLORIDA" "" "EXPENDITURE ANALYSIS" " 2006-2007"
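Since the question asks for pages 55 through 300 only, here is a rough sketch of one way to pick out a page range once the text is in a character vector such as doc$content. It assumes that pdftotext marks each page break with a form feed character ("\f") at the start of the next page's first line, which is its usual behaviour but worth checking on your own output. The toy vector and the names lines, page and wanted are made up for illustration.

```r
# Toy stand-in for doc$content (assumption: each new page starts with "\f").
lines <- c("Page one, line 1", "Page one, line 2",
           "\fPage two, line 1", "Page two, line 2",
           "\fPage three, line 1")

# Page number of each line: 1 plus the number of form feeds seen so far.
page <- cumsum(grepl("\f", lines, fixed = TRUE)) + 1

# Keep only the pages of interest (2 to 3 here; 55 to 300 in the question).
wanted <- lines[page >= 2 & page <= 3]

# Drop the page-break marker from the lines that carry it.
wanted <- sub("\f", "", wanted, fixed = TRUE)

print(wanted)
```

With the real document you would replace the toy vector with doc$content and the range with 55 and 300.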