
pd.read_csv timing out

Hi

I am trying to read a large NASA data file into a pandas DataFrame. It was working fine yesterday, but overnight it stopped working and now gives the error below:

IncompleteRead: IncompleteRead(1079967499 bytes read, 740902801 more expected)

I am using a very basic pd.read_csv call to import the .TAB file, which had been working fine. Is the problem with the file itself, my browser, my PC, or my internet connection? I can't even open the file in a browser, because the download never reaches the end of the file. That makes me think it is not a Python setting, such as a timeout that needs to be extended, but rather a problem with either my internet connection or the host site. The number of bytes read and the number still expected seem to be the same every time. It also fails whether I request all columns or only a minimal subset, e.g. one column.

df = pd.read_csv ("hirise-pds.lpl.arizona.edu/PDS/INDEX/EDRCUMINDEX.TAB", header=None, usecols=col_list)

Thanks if you can help narrow down the issue for me. I have also asked the hosts at NASA if there is an issue with the web page.
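
One way to narrow this down might be to download the file to disk first and only then hand the local copy to pandas, so that a network failure and a parsing failure show up as separate steps. The following is a minimal sketch using only the standard library; the function name `download_to_file` and the 60-second timeout are assumptions made for illustration, not anything the host site prescribes. It compares the bytes actually written against the `Content-Length` header, when the server sends one, to detect a truncated download.

```python
import shutil
import urllib.request
from pathlib import Path


def download_to_file(url, dest, timeout=60):
    """Stream `url` to `dest` and return (bytes_written, expected_bytes).

    `expected_bytes` is taken from the Content-Length header and is None
    when the server does not report one. The name and the default timeout
    are illustrative assumptions.
    """
    dest = Path(dest)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        header = response.headers.get("Content-Length")
        expected = int(header) if header is not None else None
        with open(dest, "wb") as out:
            # Copy in blocks so the whole file is never held in memory.
            shutil.copyfileobj(response, out)
    return dest.stat().st_size, expected
```

If the two numbers differ, or the call itself raises, the download was cut short and the problem lies with the connection or the server rather than with pandas. If they match, the local file can be passed to `pd.read_csv` with the same `header=None, usecols=col_list` arguments as above.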

1 Answer

vishaljlf39
My guess is that you don't have enough memory to read the entire file at once.
The fact that it can run for a long time is most likely because the OS is swapping memory out to disk (paging on Windows),
and it can keep doing this for quite a while until the paging file on Windows, or the swap space on Linux, fills up.
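
If memory does turn out to be the limit, one option is the `chunksize` parameter of `pd.read_csv`, which returns an iterator of smaller DataFrames instead of one large one. Below is a minimal sketch that only counts rows, to show the pattern; the helper name `count_rows_in_chunks` and the chunk size of 100,000 rows are assumptions, and the loop body would be replaced by whatever per-chunk processing is actually needed.

```python
import pandas as pd


def count_rows_in_chunks(source, chunksize=100_000, **read_csv_kwargs):
    """Read `source` piece by piece and return the total number of rows.

    Only one chunk is held in memory at a time. The helper name and the
    default chunk size are illustrative assumptions.
    """
    total = 0
    reader = pd.read_csv(source, header=None, chunksize=chunksize, **read_csv_kwargs)
    for chunk in reader:
        # Each `chunk` is an ordinary DataFrame with at most `chunksize` rows.
        total += len(chunk)
    return total
```

Extra keyword arguments such as `usecols=col_list` are passed straight through to `pd.read_csv`, so the column selection from the question still applies.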
