-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Datatypes not preserved on pd.read_excel #60088
Comments
I think this is worth changing, can I take it ? |
Hello, I tested excel and found that as long as there is only true in the table, it will be converted to TRUE, which is a keyword in excel. So maybe user will not have True in a cell. |
But TRUE is parsed as True in the first column and 1 in the back, which I think is worth changing.Maybe we should make the default values the same for different columns |
I created a excel to test it. The data is:
If we read it directly, the result is:
If we read it with
I think it has something to do with the way the data is created. |
As the docstring states, use I checked both calamine and openpyxl, both readers are reading integer or Boolean values instead of e.g. from openpyxl import load_workbook
wb = load_workbook('test.xlsx', data_only=True)
for row in wb.worksheets[0].rows:
for cell in row:
print(cell, cell.value, cell.internal_value, cell.data_type) As pandas only gets values through third-party libraries, they would need to support this first. It is likely there is a technical limitation in the Excel spec that prevents this, but I'm not certain. As there is nothing pandas can do here, closing this issue. |
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Input data
![image](https://private-user-images.githubusercontent.com/63579905/379311079-e8d9a588-a49f-4007-bb41-c262431d9d36.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyMjY2MTEsIm5iZiI6MTczOTIyNjMxMSwicGF0aCI6Ii82MzU3OTkwNS8zNzkzMTEwNzktZThkOWE1ODgtYTQ5Zi00MDA3LWJiNDEtYzI2MjQzMWQ5ZDM2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEwVDIyMjUxMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTZkOTEyZDhjN2Y2MTQ5NjFmZTUxNmQ0YWMzZTUzYzAyYWQ1ZTQyNzg4NDU4NmQyMWNmNzhkYWQ0MzY1ZDQ5OGMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.7i_q7iMeXpCF5WMX7GLrO-w9LyTZZYWOTVHB6-1hoPo)
Datatype is preserved, but values are modified
TRUE -> 1
![image](https://private-user-images.githubusercontent.com/63579905/379312377-753f38bd-a987-4565-81b1-51a2033ea3b5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyMjY2MTEsIm5iZiI6MTczOTIyNjMxMSwicGF0aCI6Ii82MzU3OTkwNS8zNzkzMTIzNzctNzUzZjM4YmQtYTk4Ny00NTY1LTgxYjEtNTFhMjAzM2VhM2I1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEwVDIyMjUxMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWEwM2IxMjBhMjE1MjU2YjAzNjFkM2VhOGFjYjJmODdhYWU5NDhkNjg4YTM1NzQ3Nzg5MDczYTc2NWQ1MWYwNjEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.UbVOCCWn2vqjrnmvyR4drM5lgL2fO5IeTfJ0VXC276k)
values are modified, even if i read as string
TRUE <-> 1
![image](https://private-user-images.githubusercontent.com/63579905/379313724-c7d2e216-327b-4440-a6ed-090a455a4342.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyMjY2MTEsIm5iZiI6MTczOTIyNjMxMSwicGF0aCI6Ii82MzU3OTkwNS8zNzkzMTM3MjQtYzdkMmUyMTYtMzI3Yi00NDQwLWE2ZWQtMDkwYTQ1NWE0MzQyLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEwVDIyMjUxMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWQ5YjVmNjE5MzgwMmVmZWRjMjdlNDNhODRkYmMzN2M1MzA1NGMwOWY4MmExYjFlZjQxNDIyMzE0NWJjY2MxYTEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.UflcIOHcaH7b2pESY75vF67JEbh_aaaibRsxwKwqVds)
additionally, I want TRUE in uppercase, if it is changed to True, i can't find difference if user has 'True in a cell
Expected Behavior
There should be a way to preserve values and datatypes as it is
Installed Versions
INSTALLED VERSIONS
commit : d9cdd2e
python : 3.11.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_India.1252
pandas : 2.2.2
numpy : 1.26.2
pytz : 2024.1
dateutil : 2.8.2
setuptools : 69.0.2
pip : 23.1.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.18.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.9.0
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.12.0
sqlalchemy : 2.0.23
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None
The text was updated successfully, but these errors were encountered: