Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting "IO Error: GDAL Error (4): `<file>.xlsx' not recognized as a supported file format." when calling st_read on registered .xlsx file #1870

Open
rpbouman opened this issue Sep 28, 2024 · 1 comment

Comments

@rpbouman
Copy link

What happens?

  • WASM is initialized
  • spatial extension loaded and installed
  • .xlsx file registered with registerFileHandle using DuckDBDataProtocol.BROWSER_FILEREADER
  • query SELECT * FROM st_read( '<filename>' ) fails with

Error: IO Error: GDAL Error (4): test.xlsx' not recognized as a supported file format.`

To Reproduce

I'm running this page locally:

<!doctype html>
<html>
  <head>
    <title>DuckDB WASM: Reading XSLX</title>
  </head>
  <body>
    <div id="content">
      <div id="initialization">
        <div id="status">Initializing, please wait...</div>
        <ul id="initActions">
        </ul>
      </div>
      <div style="display:none" id="ui">
        <label>Choose a file<label>: <input type="file" id="fileInput" accept=".xlsx, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"/>
      </div>
    </div>
    <script type="application/javascript">
      function statusUpdate(action) {
        var list = document.getElementById('initActions');
        list.innerHTML += `<li>${action}</li>`;
      }
      
      function alertError(message, error){
        var message = `${message}: ${error.message}\n\n${error.stack}`;
        console.error(message);
        alert(message);
      }
      
      function afterInit(){
        document.getElementById('initialization').style.display = 'none';
        document.getElementById('fileInput').addEventListener('change', async function(event){
          var files = event.currentTarget.files;
          if (files.length === 0){ 
            alert('No file selected!');
            return;
          }
          
          var file = files[0];
          var duckdb = window.duckdb;
          try {
            await duckdb.instance.registerFileHandle(
              file.name, 
              file, 
              duckdb.api.DuckDBDataProtocol.BROWSER_FILEREADER, 
              true
            );
            alert(`File registered!`);
          }
          catch (e){
            alertError('Error registering file', e);
            return;
          }

          try {
            var sql = `SELECT * FROM st_read('${file.name}')`;
            var result = await duckdb.connection.query(sql);
            debugger;
          }
          catch(e){
            alertError('Error executing query on file', e);
            return;
          }
        });
        document.getElementById('ui').style.display = 'block';
      }
    </script>
    <script type="module">
      import * as duckdb from 'https://cdn.jsdelivr.net/npm/@duckdb/[email protected]/+esm';

      statusUpdate('module loaded');

      const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();
      const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);
      statusUpdate('jsdelivr bundle selected');

      const worker_url = URL.createObjectURL(
        new Blob([`importScripts("${bundle.mainWorker}");`], {type: 'text/javascript'})
      );
      const worker = new Worker(worker_url);
      const logger = new duckdb.ConsoleLogger();
      const db = new duckdb.AsyncDuckDB(logger, worker);
      URL.revokeObjectURL(worker_url);

      await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
      statusUpdate('DuckDB instance created');

      const connection = await db.connect();
      statusUpdate('DuckDB connection created');
      
      var versionResult = await connection.query(`SELECT version() as versioninfo`);
      var versionInfo = versionResult.get(0)['versioninfo'];
      statusUpdate(`DuckDB ${versionInfo} initialized.`);
      console.log(`DuckDB version: ${versionInfo}`) ;

      // expose duckdb as global
      window.duckdb = {
        api: duckdb,
        instance: db,
        connection: connection
      };
      
      statusUpdate('Installing spatial extension');
      await connection.query('INSTALL SPATIAL');
      statusUpdate('Loading spatial extension');
      await connection.query('LOAD SPATIAL');
      afterInit();
    </script>
    
  </body>
</html>

Sample xlsx file is attached.
test.xlsx

Browser/Environment:

Chrome 129.0.6668.71, MS Edge 129.0.2792.65

Device:

Windows 11 Desktop

DuckDB-Wasm Version:

using https://cdn.jsdelivr.net/npm/@duckdb/[email protected]/+esm

DuckDB-Wasm Deployment:

https://cdn.jsdelivr.net/npm/@duckdb/[email protected]/+esm

Full Name:

Roland Bouman

Affiliation:

EPAM Systems BV Netherlands

@rpbouman
Copy link
Author

I should have included: I can open the file fine using duckdb cli:

DuckDb> SELECT * FROM st_read( 'C:\roland\projects\duckdb-wasm-snippets\3. reading xlsx\test.xlsx' );
┌────────────┬──────────┐
│   Fruit    │ Calories │
│  varchar   │  int32   │
├────────────┼──────────┤
│ Apple      │       52 │
│ Banana     │       89 │
│ Cantaloupe │       39 │
└────────────┴──────────┘
Run Time (s): real 0.045 user 0.000000 sys 0.015625
changes:   0   total_changes: 0
DuckDb>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant