While testing imports of large volumes of AIS data, the `platform.get_sensor()` call appeared to be slow. It calls `find_sensor`, which also appeared to be slow.
I tried to speed this up by (both attempts are sketched below):
- Adding indexes on fields such as `Sensors.name` and `lower(Sensors.name)`
- Changing the existence check to query only the `sensor_id` initially, and fetching the full object only if a matching sensor exists
Neither change made a noticeable difference. This should be investigated further, as there must be a way to speed this up.
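For reference, a minimal, self-contained sketch of the two attempts, assuming SQLAlchemy 1.4+ and a toy `Sensor` model (the real model and session handling live in `pepys_import.core.store`; the index names, column types and the `platform_id` value below are illustrative only):

```python
import uuid

from sqlalchemy import Column, Index, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Sensor(Base):
    """Toy stand-in for the real Sensors table in pepys-import."""

    __tablename__ = "Sensors"
    sensor_id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()))
    name = Column(String)
    host = Column(String)


# Attempt 1: plain and lower-cased (functional) indexes on Sensors.name
Index("idx_sensors_name", Sensor.__table__.c.name)
Index("idx_sensors_name_lower", func.lower(Sensor.__table__.c.name))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    sensor_name, platform_id = "GPS", "some-platform-id"  # illustrative values

    # Attempt 2: check for an existing sensor by querying only the primary key...
    row = (
        session.query(Sensor.sensor_id)
        .filter(func.lower(Sensor.name) == sensor_name.lower())
        .filter(Sensor.host == platform_id)
        .first()
    )

    # ...and fetch the full object only if a match was found
    sensor_obj = None
    if row is not None:
        sensor_obj = (
            session.query(Sensor).filter(Sensor.sensor_id == row[0]).first()
        )
```

The `sensor_id`-only existence check can be seen in the profiled `add_to_sensors` and `find_sensor` code below; as noted above, neither change made the import noticeably faster.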
A few outputs of the line profiler are below:
A profile of `add_to_sensors`:
File: /Users/robin/Documents/IanMayo/pepys-import/pepys_import/core/store/data_store.py
Function: add_to_sensors at line 415
Line # Hits Time Per Hit % Time Line Contents
==============================================================
415 # @profile
416 def add_to_sensors(
417 self,
418 name,
419 sensor_type,
420 host_name,
421 host_nationality,
422 host_identifier,
423 privacy,
424 change_id,
425 host_id=None,
426 ):
427 """
428 Adds the specified sensor to the :class:`Sensor` table if not already present.
429
430 :param name: Name of sensor
431 :type name: String
432 :param sensor_type: Type of sensor
433 :type sensor_type: String
434 :param host_name: Name of Platform that sensor belongs to
435 :type host_name: String
436 :param host_nationality: Nationality of Platform that sensor belongs to
437 :type host_nationality: String
438 :param host_identifier: Identifier of Platform that sensor belongs to
439 :type host_identifier: String
440 :param privacy: :class:`Privacy` of :class:`State`
441 :type privacy: String
442 :param change_id: ID of the :class:`Change` object
443 :type change_id: Integer or UUID
444 :param host_id: ID of Platform that sensor belongs to (optional, can be passed instead
445 of host_name, host_nationality and host_identifier)
446 :return: Created Sensor entity
447
448 Notes:
449 To specify the platform that the added sensor should belong to you can either:
450 - Specify the host_name, host_nationality and host_identifier parameters, to uniquely identify the Platform
451 - Specify the host_id parameter to give the ID of the Platform, and set host_name, host_nationality and host_identifier to None
452 """
453 648 2054.0 3.2 0.0 if host_id is not None:
454 648 3080322.0 4753.6 32.7 host = self.search_platform_by_id(host_id)
455 else:
456 host = self.search_platform(host_name, host_nationality, host_identifier)
457
458 648 30472.0 47.0 0.3 sensor_type = self.search_sensor_type(sensor_type)
459 648 13113.0 20.2 0.1 privacy = self.search_privacy(privacy)
460
461 648 823.0 1.3 0.0 if sensor_type is None:
462 raise MissingDataException("Sensor Type is missing/invalid")
463 648 763.0 1.2 0.0 elif host is None:
464 raise MissingDataException("Host is missing/invalid")
465 648 630.0 1.0 0.0 elif privacy is None:
466 raise MissingDataException("Privacy is missing/invalid")
467
468 # Check if entry already exists with these details, and if so, just return it
469 # Just check the unique fields - in this case: name and host
470 # TODO: Possibly update when we get final uniqueness info from client
471 # results = (
472 # self.session.query(self.db_classes.Sensor)
473 # .filter(func.lower(self.db_classes.Sensor.name) == lowercase_or_none(name))
474 # .filter(self.db_classes.Sensor.host == host.platform_id)
475 # .all()
476 # )
477 648 1033.0 1.6 0.0 results = (
478 1944 1227691.0 631.5 13.0 self.session.query(self.db_classes.Sensor.sensor_id)
479 648 202341.0 312.3 2.1 .filter(func.lower(self.db_classes.Sensor.name) == lowercase_or_none(name))
480 648 88637.0 136.8 0.9 .filter(self.db_classes.Sensor.host == host.platform_id)
481 .all()
482 )
483
484 648 1070.0 1.7 0.0 if len(results) == 1:
485 # Don't add it, as it already exists - just return it
486 sensor_obj = self.session.query(self.db_classes.Sensor).filter(self.db_classes.Sensor.sensor_id == results[0])
487 return sensor_obj
488 648 817.0 1.3 0.0 elif len(results) > 1:
489 assert (
490 False
491 ), "Fatal error: Duplicate entries found in Sensors table" # pragma: no cover
492
493 1296 63896.0 49.3 0.7 sensor_obj = self.db_classes.Sensor(
494 648 645.0 1.0 0.0 name=name,
495 648 4159.0 6.4 0.0 sensor_type_id=sensor_type.sensor_type_id,
496 648 2211.0 3.4 0.0 host=host.platform_id,
497 648 2557.0 3.9 0.0 privacy_id=privacy.privacy_id,
498 )
499 648 75116.0 115.9 0.8 self.session.add(sensor_obj)
500 648 2621989.0 4046.3 27.9 self.session.flush()
501
502 648 1990714.0 3072.1 21.2 self.add_to_logs(table=constants.SENSOR, row_id=sensor_obj.sensor_id, change_id=change_id)
503 648 799.0 1.2 0.0 return sensor_obj
A profile of `get_sensor`:
File: /Users/robin/Documents/IanMayo/pepys-import/pepys_import/core/store/common_db.py
Function: get_sensor at line 200
Line # Hits Time Per Hit % Time Line Contents
==============================================================
200 # @profile
201 def get_sensor(
202 self,
203 data_store,
204 sensor_name=None,
205 sensor_type=None,
206 privacy=None,
207 change_id=None,
208 ):
209 """
210 Lookup or create a sensor of this name for this :class:`Platform`.
211 Specified sensor will be added to the :class:`Sensor` table.
212 It uses find_sensor method to search existing sensors.
213
214 :param data_store: DataStore object to to query DB and use missing data resolver
215 :type data_store: DataStore
216 :param sensor_name: Name of :class:`Sensor`
217 :type sensor_name: String
218 :param sensor_type: Type of :class:`Sensor`
219 :type sensor_type: String
220 :param privacy: Privacy of :class:`Sensor`
221 :type privacy: String
222 :param change_id: ID of the :class:`Change` object
223 :type change_id: Integer or UUID
224 :return: Created :class:`Sensor` entity
225 :rtype: Sensor
226 """
227 1720 3819.0 2.2 0.0 Sensor = data_store.db_classes.Sensor
228
229 1720 8820846.0 5128.4 48.7 sensor = Sensor().find_sensor(data_store, sensor_name, self.platform_id)
230 1720 987.0 0.6 0.0 if sensor:
231 928 406.0 0.4 0.0 return sensor
232
233 792 30676.0 38.7 0.2 sensor_type_obj = data_store.search_sensor_type(sensor_type)
234 792 13172.0 16.6 0.1 privacy_obj = data_store.search_privacy(privacy)
235 792 841.0 1.1 0.0 if sensor_type_obj is None or privacy_obj is None:
236 # We don't have access to the platform type attribute on self
237 # as it has been expunged by now, so query the database and check it
238 platform = (
239 data_store.session.query(data_store.db_classes.Platform)
240 .filter(data_store.db_classes.Platform.platform_id == self.platform_id)
241 .one()
242 )
243 platform_type_name = platform.platform_type_name
244 if platform_type_name == "Unknown":
245 # If we're dealing with an unknown Platform, then don't ask the user for
246 # sensor details, just create them with whatever information we've got
247 # and use UUIDs/Unknown for the missing bits
248 if sensor_name is None:
249 sensor_name = str(uuid.uuid4())
250
251 if sensor_type_obj is None:
252 sensor_type = "Unknown"
253 else:
254 sensor_type = sensor_type_obj.name
255
256 if privacy_obj is None:
257 privacy = get_lowest_privacy(data_store)
258 else:
259 privacy = privacy_obj.name
260
261 return data_store.add_to_sensors(
262 name=sensor_name,
263 sensor_type=sensor_type,
264 host_name=None,
265 host_nationality=None,
266 host_identifier=None,
267 host_id=self.platform_id,
268 privacy=privacy,
269 change_id=change_id,
270 )
271 resolved_data = data_store.missing_data_resolver.resolve_sensor(
272 data_store, sensor_name, sensor_type, self.platform_id, privacy, change_id
273 )
274 # It means that new sensor added as a synonym and existing sensor returned
275 if isinstance(resolved_data, Sensor):
276 return resolved_data
277 elif len(resolved_data) == 3:
278 (
279 sensor_name,
280 sensor_type_obj,
281 privacy_obj,
282 ) = resolved_data
283
284 1584 1810.0 1.1 0.0 assert isinstance(
285 792 1131.0 1.4 0.0 sensor_type_obj, data_store.db_classes.SensorType
286 ), "Type error for Sensor Type entity"
287 1584 1266.0 0.8 0.0 assert isinstance(
288 792 681.0 0.9 0.0 privacy_obj, data_store.db_classes.Privacy
289 ), "Type error for Privacy entity"
290
291 1584 9229404.0 5826.6 50.9 return data_store.add_to_sensors(
292 792 502.0 0.6 0.0 name=sensor_name,
293 792 3763.0 4.8 0.0 sensor_type=sensor_type_obj.name,
294 792 548.0 0.7 0.0 host_name=None,
295 792 532.0 0.7 0.0 host_nationality=None,
296 792 538.0 0.7 0.0 host_identifier=None,
297 792 2980.0 3.8 0.0 host_id=self.platform_id,
298 792 4093.0 5.2 0.0 privacy=privacy_obj.name,
299 792 614.0 0.8 0.0 change_id=change_id,
300 )
A profile of `find_sensor`:
File: /Users/robin/Documents/IanMayo/pepys-import/pepys_import/core/store/common_db.py
Function: find_sensor at line 120
Line # Hits Time Per Hit % Time Line Contents
==============================================================
120 @classmethod
121 # @profile
122 def find_sensor(cls, data_store, sensor_name, platform_id):
123 """
124 This method tries to find a Sensor entity with the given sensor_name. If it
125 finds, it returns the entity. If it is not found, it searches synonyms.
126
127 :param data_store: A :class:`DataStore` object
128 :type data_store: DataStore
129 :param sensor_name: Name of :class:`Sensor`
130 :type sensor_name: String
131 :param platform_id: Primary key of the Platform that Sensor belongs to
132 :type platform_id: int
133 :return:
134 """
135 # If we don't have a sensor name then we can't search by name!
136 1720 1223.0 0.7 0.0 if sensor_name is None:
137 return None
138
139 1720 3824.0 2.2 0.0 cached_result = data_store._sensor_cache.get((sensor_name, platform_id))
140 1720 736.0 0.4 0.0 if cached_result:
141 11 4.0 0.4 0.0 return cached_result
142
143 1709 2423.0 1.4 0.0 sensor_id = (
144 5127 2385713.0 465.3 27.3 data_store.session.query(data_store.db_classes.Sensor.sensor_id)
145 1709 179611.0 105.1 2.1 .filter(data_store.db_classes.Sensor.name == sensor_name)
146 1709 136191.0 79.7 1.6 .filter(data_store.db_classes.Sensor.host == platform_id)
147 .first()
148 )
149 1709 151643.0 88.7 1.7
150 1709 1137.0 0.7 0.0 if sensor_id:
151 1834 5740946.0 3130.3 65.7 sensor = data_store.session.query(data_store.db_classes.Sensor).filter(
152 917 77917.0 85.0 0.9 data_store.db_classes.Sensor.sensor_id == sensor_id[0]
153 ).first()
154 917 51977.0 56.7 0.6 data_store.session.expunge(sensor)
155 917 1386.0 1.5 0.0 data_store._sensor_cache[(sensor_name, platform_id)] = sensor
156 917 298.0 0.3 0.0 return sensor
157
158 792 332.0 0.4 0.0 return None