Description
When I reindex, I see memory usage go up and stay up.
The path I traced was pd.DataFrame.assign -> pd.Series.reindex -> pd.Index.reindex.
I'm not sure what the ultimate source of the issue is.
import numpy as np
import pandas as pd
import psutil

K = 10
N = 1000 * 1000

for i in range(K):
    # any of these will cause the issue
    #df = (pd.DataFrame(np.random.rand(N, 2))
    #      .assign(series=pd.Series(range(N+1))))
    #series = pd.Series(np.random.rand(N)).reindex(range(N+1))
    index = pd.Index(range(N)).reindex(range(N-1))
    # psutil.virtual_memory().percent is system-wide, not per-process
    print('finished reindex iter {:2d}, using {:3.1f} percent memory'
          .format(i, psutil.virtual_memory().percent))
This prints:
finished reindex iter 0, using 45.6 percent memory
finished reindex iter 1, using 45.9 percent memory
finished reindex iter 2, using 46.1 percent memory
finished reindex iter 3, using 46.4 percent memory
finished reindex iter 4, using 46.7 percent memory
finished reindex iter 5, using 47.0 percent memory
finished reindex iter 6, using 47.2 percent memory
finished reindex iter 7, using 47.5 percent memory
finished reindex iter 8, using 47.7 percent memory
finished reindex iter 9, using 48.0 percent memory
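Since psutil.virtual_memory().percent reports memory for the whole machine, one way to narrow this down is to measure only what this process still holds after each iteration. The following is a minimal sketch, not part of the original report: traced_growth and reindex_workload are hypothetical helper names, and it assumes the relevant allocations are visible to Python's tracemalloc module. Allocations made by native code that bypasses Python's allocator hooks will not show up, so a flat result here would not rule out a leak.

```python
import gc
import tracemalloc


def traced_growth(workload, iterations=5):
    """Run workload repeatedly and return, for each iteration, the number of
    traced bytes still allocated relative to the starting point."""
    gc.collect()
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()
    growth = []
    for _ in range(iterations):
        workload()
        # Collect garbage first so reference cycles are not mistaken for a leak.
        gc.collect()
        current, _ = tracemalloc.get_traced_memory()
        growth.append(current - baseline)
    tracemalloc.stop()
    return growth


def reindex_workload(n=1000 * 1000):
    """Hypothetical wrapper around the reindex call from the script above."""
    # Imported here so traced_growth itself only needs the standard library.
    import pandas as pd

    pd.Index(range(n)).reindex(range(n - 1))
```

Calling traced_growth(reindex_workload, iterations=10) returns a list of byte counts. If the values keep climbing from one iteration to the next, memory is being retained inside the process; if they stay roughly flat, the rise in the system-wide percentage may come from somewhere tracemalloc cannot see.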
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.14.32-pv-ts1
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24.1
numpy: 1.10.4
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 5.0.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: None
xlsxwriter: 0.9.2
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None