"KeyError: 'building_id'" when running lcm_simulate #19

Open
lisalan520 opened this issue Jun 2, 2015 · 33 comments

Comments

@lisalan520

Hi,

I'm also having a problem running the 'hlcm_simulate' and 'elcm_simulate' models with my own data. Both models raise KeyError: 'building_id'. I've checked my data and found nothing unusual. I've also managed to break the model into individual steps and run them one by one. Do you have any idea what could be wrong? Thank you!

Here is my error message:

Running model 'hlcm_simulate'
There are 450501 total available units
    and 359815 total choosers
    but there are 0 overfull buildings
    for a total of 90686 temporarily empty units
    in 81292 buildings total in the region
Assigned 0 choosers to new units

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-2-012e08452343> in <module>()
----> 1 sim.run(["hlcm_simulate"])

C:\Anaconda\lib\site-packages\urbansim\sim\simulation.pyc in run(models, years, data_out, out_interval)
   1458                 model = get_model(model_name)
   1459                 t2 = time.time()
-> 1460                 model()
   1461                 print("Time to execute model '{}': {:.2f}s".format(
   1462                       model_name, time.time()-t2))

C:\Anaconda\lib\site-packages\urbansim\sim\simulation.pyc in __call__(self)
    670             kwargs = _collect_variables(names=self._argspec.args,
    671                                         expressions=self._argspec.defaults)
--> 672             return self._func(**kwargs)
    673 
    674     def _tables_used(self):

C:\Users\xzhang\Documents\PythonScripts\Marion_urbansim_test_0514_with_building_ids\models.pyc in hlcm_simulate(households, buildings, zones)
     39     return utils.lcm_simulate("hlcm.yaml", households, buildings, zones,
     40                               "building_id", "residential_units",
---> 41                               "vacant_residential_units")
     42 
     43 

C:\Users\xzhang\Documents\PythonScripts\Marion_urbansim_test_0514_with_building_ids\utils.pyc in lcm_simulate(cfg, choosers, buildings, nodes, out_fname, supply_fname, vacant_fname)
    198 
    199     # go from units back to buildings
--> 200     new_buildings = pd.Series(units.ix[new_units.values][out_fname].values,
    201                               index=new_units.index)
    202 

C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
   1676             return self._getitem_multilevel(key)
   1677         else:
-> 1678             return self._getitem_column(key)
   1679 
   1680     def _getitem_column(self, key):

C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in _getitem_column(self, key)
   1683         # get column
   1684         if self.columns.is_unique:
-> 1685             return self._get_item_cache(key)
   1686 
   1687         # duplicate columns & possible reduce dimensionaility

C:\Anaconda\lib\site-packages\pandas\core\generic.pyc in _get_item_cache(self, item)
   1050         res = cache.get(item)
   1051         if res is None:
-> 1052             values = self._data.get(item)
   1053             res = self._box_item_values(item, values)
   1054             cache[item] = res

C:\Anaconda\lib\site-packages\pandas\core\internals.pyc in get(self, item, fastpath)
   2563 
   2564             if not isnull(item):
-> 2565                 loc = self.items.get_loc(item)
   2566             else:
   2567                 indexer = np.arange(len(self.items))[isnull(self.items)]

C:\Anaconda\lib\site-packages\pandas\core\index.pyc in get_loc(self, key)
   1179         loc : int if unique index, possibly slice or mask if not
   1180         """
-> 1181         return self._engine.get_loc(_values_from_object(key))
   1182 
   1183     def get_value(self, series, key):

C:\Anaconda\lib\site-packages\pandas\index.pyd in pandas.index.IndexEngine.get_loc (pandas\index.c:3656)()

C:\Anaconda\lib\site-packages\pandas\index.pyd in pandas.index.IndexEngine.get_loc (pandas\index.c:3534)()

C:\Anaconda\lib\site-packages\pandas\hashtable.pyd in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:11911)()

C:\Anaconda\lib\site-packages\pandas\hashtable.pyd in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:11864)()

KeyError: 'building_id'
@fscottfoti
Contributor

I think (I'm not totally sure) that you need a building_id in the households table. Do you have that column? The vacant_residential_units requires there to be a building_id.

@lisalan520
Author

Thanks for replying!
I have a 'building_id' column in both my households and jobs data. When I did the step-by-step test, there was no problem calculating the vacant_residential_units...

@fscottfoti
Contributor

OK - yeah I've definitely seen this before but am having a hard time remembering the problem.

I would try 2 things - first, try naming the index of your buildings table -

buildings.index.name = 'building_id'

then I would double check the index for duplicates

pd.Series(buildings.index).value_counts() and see if the top row has a value > 1.
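
The two checks suggested above could be sketched roughly as follows. The `buildings` frame here is a hypothetical stand-in with made-up values, not the real data from this issue.

```python
import pandas as pd

# Hypothetical stand-in for the buildings table (values are made up).
buildings = pd.DataFrame(
    {"residential_units": [2, 1, 3]},
    index=[1, 2, 5],
)

# Name the index so that a later reset_index() yields a 'building_id' column.
buildings.index.name = "building_id"

# Count how often each index value occurs; value_counts() sorts descending,
# so a first count above 1 means the index contains duplicates.
counts = pd.Series(buildings.index).value_counts()
has_duplicates = counts.iloc[0] > 1

# Shortcut that answers the same question.
index_is_unique = buildings.index.is_unique
```

Either `has_duplicates` being true or `index_is_unique` being false would point to a duplicated building ID.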

@lisalan520
Author

I've checked my buildings data with the method you suggested. There are no duplicates in buildings.index, and I got the same error message...

@fscottfoti
Contributor

And you tried changing the name of the index?


@lisalan520
Author

Yes I did. Could it be related to data type?

@fscottfoti
Contributor

Is building_id a float because there are NaNs? If so, that is very likely it - it should be an int column.
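
A quick way to test that hypothesis, sketched with a hypothetical `households` frame. The `-1` placeholder for unplaced households is an assumption made for illustration, not something this thread prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical households table with one unplaced household (NaN building_id).
households = pd.DataFrame({"building_id": [1, 5, np.nan, 9]})

# A single NaN forces the whole column to a float dtype.
is_float = pd.api.types.is_float_dtype(households["building_id"])

# One possible cleanup (an assumption): replace NaN with -1 and cast to int.
cleaned = households["building_id"].fillna(-1).astype("int64")
```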

@lisalan520
Author

The building_id is an int column. Should it be consecutive integers? I have something like [1,2,3,5,9,10]. Will this be an issue?

@fscottfoti
Contributor

Definitely does NOT have to be consecutive.

@jiffyclub
Member

Are you sure you're talking about the right table? The error is occurring here:

C:\Users\xzhang\Documents\PythonScripts\Marion_urbansim_test_0514_with_building_ids\utils.pyc in lcm_simulate(cfg, choosers, buildings, nodes, out_fname, supply_fname, vacant_fname)
    198 
    199     # go from units back to buildings
--> 200     new_buildings = pd.Series(units.ix[new_units.values][out_fname].values,
    201                               index=new_units.index)
    202 

And I think the most likely way to get a KeyError there is if the units DataFrame doesn't have a 'building_id' column. Which table is units?

@fscottfoti
Contributor

Units comes from here:

https://github.com/synthicity/urbansim_defaults/blob/master/urbansim_defaults/utils.py#L358

It's an expansion of the original buildings table and it needs a building_id to get back to the buildings.

I really think the building_id comes from the call to .reset_index() right there, and that the index has to be named building_id to get the building_id column there. If the index is named building_id, I'm not sure why it wouldn't have the column after that.
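
The dependence on the index name can be seen in a small sketch. Both frames are hypothetical. `reset_index()` turns a named index into a column of that name, and an unnamed index into a column called `index`.

```python
import pandas as pd

# Same hypothetical data, once with a named index and once without.
named = pd.DataFrame(
    {"residential_units": [2, 1]},
    index=pd.Index([1, 5], name="building_id"),
)
unnamed = pd.DataFrame({"residential_units": [2, 1]}, index=[1, 5])

# A named index becomes a column with that name.
named_cols = list(named.reset_index().columns)

# An unnamed index becomes a column called 'index'.
unnamed_cols = list(unnamed.reset_index().columns)
```

So if the name is lost anywhere before `reset_index()`, a later lookup of `'building_id'` would raise a KeyError like the one reported here.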

@lisalan520
Author

From my step-by-step test, the unit table looks like this:

image

and it does have building_id...

@jiffyclub
Member

It looks like @lisalan520 is not using the same version of lcm_simulate @fscottfoti linked to. Any reason to think that could be a problem?

@fscottfoti
Contributor

It seems like it's a lot different - the line number has gone from 200 to 437. @lisalan520 what version are you using?

I don't know for sure, but it's definitely possible the new version would fix the problem. I have made some small changes in the function in the past 2-3 months. If we know what version @lisalan520 is running maybe we can diff them?

@lisalan520
Author

The lcm_simulate I used comes from here:

https://github.com/synthicity/sanfran_urbansim/blob/master/utils.py

I have the same code as in the link. I'm using UrbanSim 1.3. I'll try to update my urbansim to see whether it solves the problem.

@lisalan520
Author

It seems the two 2.0 versions both use the discrete choice model, which should not solve the problem here. I'll try to run the discrete choice model again and hope my computer can handle it this time. Many thanks!

@fscottfoti
Contributor

So just to be clear: when you print out units, building_id is there, and we're looking at the expression units.loc[new_units.values][out_fname], where out_fname is equal to building_id. Can you print out units.loc[new_units.values]? Is building_id somehow missing from the result? What is the expression equal to?

@lisalan520
Author

'units' is a dis-aggregated table of 'buildings' according to the vacant_units value. 'new_units' comes from lcm model predict. 'new_units.values' is used to pick rows from 'units' where 'units.index = new_units.values'

Here is a capture:

image

@fscottfoti
Contributor

So can you then run units.loc[new_units.values][out_fname]? What am I missing?

@lisalan520
Author

I was able to run units.loc[new_units.values]["building_id"] to get the results. But when I define out_fname= building_id, I cannot run units.loc[new_units.values][out_fname] here.

@fscottfoti
Contributor

Can you print units.columns? Grasping at straws here...

@lisalan520
Author

Here it is:
image

@fscottfoti
Contributor

Interesting - and you put building_id in quotes above so that it's a string? Not sure what's going on here, but it's definitely a Pandas issue - there's no UrbanSim happening here that I can see.

@jiffyclub
Member

Note that the code @lisalan520 is using calls .ix, not .loc. I wonder if that's making a difference.

@lisalan520
Author

I tried both .loc and .ix and they have the same problem with using building_id without quotes.
image

@jiffyclub
Member

You're not going to be able to use building_id without quotes; it has to be a string or a variable that refers to a string.
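
To illustrate the difference, here is a minimal sketch with a hypothetical `units` frame. A quoted string and a variable holding that string select the same column, while a bare name is looked up as a Python variable and fails before pandas is involved.

```python
import pandas as pd

# Hypothetical units table (values are made up).
units = pd.DataFrame({"building_id": [1, 1, 5]}, index=[0, 1, 2])
out_fname = "building_id"  # a variable that refers to a string

# These two lookups are equivalent.
by_literal = units.loc[[0, 2]]["building_id"]
by_variable = units.loc[[0, 2]][out_fname]

# A bare, undefined name raises NameError, not a pandas KeyError.
try:
    units.loc[[0, 2]][building_id]  # noqa: F821 (intentionally undefined)
    bare_name_error = None
except NameError as exc:
    bare_name_error = type(exc).__name__
```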

@jiffyclub
Member

@lisalan520 You're not in SF, are you? I wish I could debug this in person. We might also be able to use a Google Hangout, I think I can drive your computer from those.

@lisalan520
Author

Sorry, for some reason I thought it was building_id in my models.py. I just checked it and it was "building_id" when I got the error...

I'm in Indianapolis. I'll check if I can use Google Hangout on this computer. Thanks!

@lisalan520
Author

Hi @jiffyclub I think we can try Google Hangouts. So how do I connect with you?

@jiffyclub
Member

Just had a call with @lisalan520 and for some reason for her the expression

    units = locations_df.loc[np.repeat(vacant_units.index.values,
                             vacant_units.values.astype('int'))].reset_index()

is resulting in the 'building_id' label on locations_df.index being dropped. She's using Pandas 0.14.1 and is going to try updating to 0.16.1 to see if that has been fixed (I suspect it has been fixed, since @fscottfoti hasn't run into the same problem).
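
One defensive option, sketched here with hypothetical data, is to re-assert the index name between the `.loc` selection and `reset_index()`, so the result does not depend on whether a given pandas version keeps the label. This is an illustration only, not the code in urbansim_defaults.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for locations_df and vacant_units.
locations_df = pd.DataFrame(
    {"zone_id": [10, 20, 30]},
    index=pd.Index([1, 5, 9], name="building_id"),
)
vacant_units = pd.Series([2, 0, 1], index=locations_df.index)

# Repeat each building's row once per vacant unit.
repeated = locations_df.loc[
    np.repeat(vacant_units.index.values, vacant_units.values.astype("int"))
]

# Re-assert the label in case .loc dropped it (as reported for pandas 0.14.1).
repeated.index.name = locations_df.index.name

units = repeated.reset_index()
```

With the name restored, `units` gets a `building_id` column regardless of what `.loc` did to the label.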

@lisalan520
Author

Many thanks @jiffyclub !

I could only update my Pandas to 0.16.0 due to our firewall. The problem was still there. At this moment I don't think the error comes from pandas but I will continue to update it to 0.16.1.

Meanwhile, with the problem we've found, I changed out_fname to 'index' in the code:
new_buildings = pd.Series(units.loc[new_units.values]['index'].values, index=new_units.index)

and the model works fine after this change, though I still don't understand why "building_id" turns into "index" after the .loc call...

But it seems the problem is solved for now. Thank you very much! I really appreciate your help!
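
The workaround above hardcodes 'index', which would break again on a setup where the label is preserved. A slightly more portable variant might look like the sketch below. The `ensure_column` helper is hypothetical and is not part of UrbanSim or pandas.

```python
import pandas as pd


def ensure_column(units, out_fname):
    """Return units with an out_fname column, renaming a stray 'index' column."""
    if out_fname in units.columns:
        return units
    if "index" in units.columns:
        return units.rename(columns={"index": out_fname})
    raise KeyError(out_fname)


# Hypothetical units table where the label was lost and the column is 'index'.
units = pd.DataFrame({"index": [1, 1, 9], "zone_id": [10, 10, 30]})

# Hypothetical model output: chooser id -> chosen row of units.
new_units = pd.Series([2, 0], index=[101, 102])

units = ensure_column(units, "building_id")

# Go from units back to buildings.
new_buildings = pd.Series(
    units.loc[new_units.values]["building_id"].values,
    index=new_units.index,
)
```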

@jiffyclub
Member

So weird that locations_df.reset_index() preserves the name, but locations_df.loc[].reset_index() doesn't! But glad you have something working.
