pd.read_csv timing out


I am trying to read a large NASA data file into a pandas DataFrame. It was working fine yesterday, but overnight it stopped working and now gives the error below:

IncompleteRead: IncompleteRead(1079967499 bytes read, 740902801 more expected)

I am using a very basic read_csv call to import the tab-separated file, which was working fine before. Is the issue with the file itself, my browser, my PC, or my internet connection? I can't even open the file in a browser, as the download never reaches the end of the file. That leads me to think it is not a Python setting (i.e. a timeout that needs extending) but rather my internet connection or the host site having problems. The number of bytes read and not read always seems to be the same. The problem occurs whether I ask for a minimal set of columns (i.e. one) or for all columns.

df = pd.read_csv("", header=None, usecols=col_list)

Thanks if you can help narrow down the issue for me. I have also asked the hosts at NASA if there is an issue with the web page.

1 Answer

My guess is that you don't have enough memory to read the entire file at once.
The fact that it can run for a long time is most likely because the OS is swapping memory out to disk (paging on Windows),
which it can do for quite a while until the paging file on Windows, or the swap space on Linux, fills up.
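
If memory is the problem, one way to check this guess is to read the file in pieces instead of all at once. The following is only a minimal sketch, assuming the file is tab-separated and has no header row, as described in the question. The function name `read_in_chunks`, the chunk size, and the small in-memory sample are hypothetical and used only for illustration; `chunksize`, `usecols`, `sep` and `header` are standard `pd.read_csv` parameters.

```python
import io

import pandas as pd


def read_in_chunks(source, usecols=None, chunksize=100_000):
    """Read a tab-separated file piece by piece and return one DataFrame.

    `source` can be a path, a URL, or a file-like object.
    """
    pieces = []
    # With chunksize set, read_csv returns an iterator of DataFrames
    # rather than parsing the whole file in one go.
    reader = pd.read_csv(
        source,
        sep="\t",
        header=None,
        usecols=usecols,
        chunksize=chunksize,
    )
    for chunk in reader:
        # Filter or aggregate each chunk here if the full result is too big.
        pieces.append(chunk)
    return pd.concat(pieces, ignore_index=True)


# Tiny in-memory stand-in for the real file (hypothetical data).
sample = io.StringIO("1\t2\t3\n4\t5\t6\n7\t8\t9\n")
df = read_in_chunks(sample, usecols=[0, 2], chunksize=2)
print(df.shape)
```

Note that concatenating every chunk still requires the final DataFrame to fit in memory. If it does not, reduce each chunk inside the loop (select rows, aggregate, or write it out) instead of keeping all of them.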

