Finding/adding sensor is slow #1098

Open
robintw opened this issue Dec 2, 2021 · 0 comments
robintw commented Dec 2, 2021

While testing imports of large volumes of AIS data, the platform.get_sensor() call appeared to be slow. It calls find_sensor, which also appeared to be slow.

I tried to speed this up by:

  • Adding indexes to fields like Sensors.name and lower(Sensors.name)
  • Querying only the sensor_id when checking whether a sensor with particular details exists, and requesting the full object only if it does.

Neither change made a noticeable difference. This needs further investigation, as there must be a way to speed this up.
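
One way to check whether an added index is actually used is to ask the database for its query plan. The following is a minimal sketch using SQLite from the standard library. The `sensors` table and the index name `ix_sensors_lower_name` are simplified stand-ins, not the real Pepys schema. It illustrates that an index on lower(name) can only serve a filter written as lower(name), and not an exact-match filter on name:

```python
import sqlite3

# Hypothetical, simplified stand-in for the Sensors table (not the real Pepys schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensors (sensor_id INTEGER PRIMARY KEY, name TEXT, host INTEGER)")
# Expression index matching a filter written as lower(name) = ?
conn.execute("CREATE INDEX ix_sensors_lower_name ON sensors (lower(name))")


def query_plan(sql, params):
    """Return the 'detail' column of EXPLAIN QUERY PLAN joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row
    return " | ".join(row[-1] for row in rows)


# Filter on lower(name), as in the add_to_sensors existence check
plan_lower = query_plan("SELECT sensor_id FROM sensors WHERE lower(name) = ?", ("ais",))
# Filter on name directly, as in find_sensor
plan_exact = query_plan("SELECT sensor_id FROM sensors WHERE name = ?", ("AIS",))

print("lower(name) filter:", plan_lower)
print("name filter:       ", plan_exact)
```

The exact wording of the plan differs between SQLite versions, and PostgreSQL has its own EXPLAIN output, so this only shows the kind of check that could be run against the real database.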

A few outputs of the line profiler are below:

A profile of add_to_sensors:

File: /Users/robin/Documents/IanMayo/pepys-import/pepys_import/core/store/data_store.py
Function: add_to_sensors at line 415

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   415                                               # @profile
   416                                               def add_to_sensors(
   417                                                   self,
   418                                                   name,
   419                                                   sensor_type,
   420                                                   host_name,
   421                                                   host_nationality,
   422                                                   host_identifier,
   423                                                   privacy,
   424                                                   change_id,
   425                                                   host_id=None,
   426                                               ):
   427                                                   """
   428                                                   Adds the specified sensor to the :class:`Sensor` table if not already present.
   429
   430                                                   :param name: Name of sensor
   431                                                   :type name: String
   432                                                   :param sensor_type: Type of sensor
   433                                                   :type sensor_type: String
   434                                                   :param host_name: Name of Platform that sensor belongs to
   435                                                   :type host_name: String
   436                                                   :param host_nationality: Nationality of Platform that sensor belongs to
   437                                                   :type host_nationality: String
   438                                                   :param host_identifier: Identifier of Platform that sensor belongs to
   439                                                   :type host_identifier: String
   440                                                   :param privacy: :class:`Privacy` of :class:`State`
   441                                                   :type privacy: String
   442                                                   :param change_id: ID of the :class:`Change` object
   443                                                   :type change_id: Integer or UUID
   444                                                   :param host_id: ID of Platform that sensor belongs to (optional, can be passed instead
   445                                                                   of host_name, host_nationality and host_identifier)
   446                                                   :return: Created Sensor entity
   447
   448                                                   Notes:
   449                                                   To specify the platform that the added sensor should belong to you can either:
   450                                                    - Specify the host_name, host_nationality and host_identifier parameters, to uniquely identify the Platform
   451                                                    - Specify the host_id parameter to give the ID of the Platform, and set host_name, host_nationality and host_identifier to None
   452                                                   """
   453       648       2054.0      3.2      0.0          if host_id is not None:
   454       648    3080322.0   4753.6     32.7              host = self.search_platform_by_id(host_id)
   455                                                   else:
   456                                                       host = self.search_platform(host_name, host_nationality, host_identifier)
   457
   458       648      30472.0     47.0      0.3          sensor_type = self.search_sensor_type(sensor_type)
   459       648      13113.0     20.2      0.1          privacy = self.search_privacy(privacy)
   460
   461       648        823.0      1.3      0.0          if sensor_type is None:
   462                                                       raise MissingDataException("Sensor Type is missing/invalid")
   463       648        763.0      1.2      0.0          elif host is None:
   464                                                       raise MissingDataException("Host is missing/invalid")
   465       648        630.0      1.0      0.0          elif privacy is None:
   466                                                       raise MissingDataException("Privacy is missing/invalid")
   467
   468                                                   # Check if entry already exists with these details, and if so, just return it
   469                                                   # Just check the unique fields - in this case: name and host
   470                                                   # TODO: Possibly update when we get final uniqueness info from client
   471                                                   # results = (
   472                                                   #     self.session.query(self.db_classes.Sensor)
   473                                                   #     .filter(func.lower(self.db_classes.Sensor.name) == lowercase_or_none(name))
   474                                                   #     .filter(self.db_classes.Sensor.host == host.platform_id)
   475                                                   #     .all()
   476                                                   # )
   477       648       1033.0      1.6      0.0          results = (
   478      1944    1227691.0    631.5     13.0              self.session.query(self.db_classes.Sensor.sensor_id)
   479       648     202341.0    312.3      2.1              .filter(func.lower(self.db_classes.Sensor.name) == lowercase_or_none(name))
   480       648      88637.0    136.8      0.9              .filter(self.db_classes.Sensor.host == host.platform_id)
   481                                                       .all()
   482                                                   )
   483
   484       648       1070.0      1.7      0.0          if len(results) == 1:
   485                                                       # Don't add it, as it already exists - just return it
   486                                                       sensor_obj = self.session.query(self.db_classes.Sensor).filter(self.db_classes.Sensor.sensor_id == results[0][0]).first()
   487                                                       return sensor_obj
   488       648        817.0      1.3      0.0          elif len(results) > 1:
   489                                                       assert (
   490                                                           False
   491                                                       ), "Fatal error: Duplicate entries found in Sensors table"  # pragma: no cover
   492
   493      1296      63896.0     49.3      0.7          sensor_obj = self.db_classes.Sensor(
   494       648        645.0      1.0      0.0              name=name,
   495       648       4159.0      6.4      0.0              sensor_type_id=sensor_type.sensor_type_id,
   496       648       2211.0      3.4      0.0              host=host.platform_id,
   497       648       2557.0      3.9      0.0              privacy_id=privacy.privacy_id,
   498                                                   )
   499       648      75116.0    115.9      0.8          self.session.add(sensor_obj)
   500       648    2621989.0   4046.3     27.9          self.session.flush()
   501
   502       648    1990714.0   3072.1     21.2          self.add_to_logs(table=constants.SENSOR, row_id=sensor_obj.sensor_id, change_id=change_id)
   503       648        799.0      1.2      0.0          return sensor_obj
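
In the profile above, search_platform_by_id accounts for 32.7% of the time, and the only things the profiled code does with the result are a None check and reading host.platform_id. One possible direction, shown here only as a sketch, is to memoise platform lookups by ID. `PlatformLookupCache` and `fake_search_platform_by_id` are hypothetical names for illustration and are not part of Pepys. Whether this helps depends on how often the same platform is looked up during an import:

```python
class PlatformLookupCache:
    """Hypothetical helper, not part of Pepys: memoises platform lookups by ID."""

    def __init__(self, fetch_platform):
        # fetch_platform stands in for a call such as search_platform_by_id
        self._fetch_platform = fetch_platform
        self._platforms = {}

    def get(self, platform_id):
        """Return the platform for platform_id, querying at most once per ID that is found."""
        if platform_id in self._platforms:
            return self._platforms[platform_id]
        platform = self._fetch_platform(platform_id)
        if platform is not None:
            # Cache only successful lookups, so a platform added later is still found
            self._platforms[platform_id] = platform
        return platform


# Usage with a fake lookup that records how often the "database" is queried
calls = []


def fake_search_platform_by_id(platform_id):
    calls.append(platform_id)
    return {"platform_id": platform_id} if platform_id != 99 else None


cache = PlatformLookupCache(fake_search_platform_by_id)
first = cache.get(1)
second = cache.get(1)  # served from the cache, no second query
missing = cache.get(99)
missing_again = cache.get(99)  # misses are not cached, so this queries again
print(calls)
```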

A profile of get_sensor:

File: /Users/robin/Documents/IanMayo/pepys-import/pepys_import/core/store/common_db.py
Function: get_sensor at line 200

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   200                                               # @profile
   201                                               def get_sensor(
   202                                                   self,
   203                                                   data_store,
   204                                                   sensor_name=None,
   205                                                   sensor_type=None,
   206                                                   privacy=None,
   207                                                   change_id=None,
   208                                               ):
   209                                                   """
   210                                                    Lookup or create a sensor of this name for this :class:`Platform`.
   211                                                    Specified sensor will be added to the :class:`Sensor` table.
   212                                                    It uses find_sensor method to search existing sensors.
   213
   214                                                   :param data_store: DataStore object to to query DB and use missing data resolver
   215                                                   :type data_store: DataStore
   216                                                   :param sensor_name: Name of :class:`Sensor`
   217                                                   :type sensor_name: String
   218                                                   :param sensor_type: Type of :class:`Sensor`
   219                                                   :type sensor_type: String
   220                                                   :param privacy: Privacy of :class:`Sensor`
   221                                                   :type privacy: String
   222                                                   :param change_id: ID of the :class:`Change` object
   223                                                   :type change_id: Integer or UUID
   224                                                   :return: Created :class:`Sensor` entity
   225                                                   :rtype: Sensor
   226                                                   """
   227      1720       3819.0      2.2      0.0          Sensor = data_store.db_classes.Sensor
   228
   229      1720    8820846.0   5128.4     48.7          sensor = Sensor().find_sensor(data_store, sensor_name, self.platform_id)
   230      1720        987.0      0.6      0.0          if sensor:
   231       928        406.0      0.4      0.0              return sensor
   232
   233       792      30676.0     38.7      0.2          sensor_type_obj = data_store.search_sensor_type(sensor_type)
   234       792      13172.0     16.6      0.1          privacy_obj = data_store.search_privacy(privacy)
   235       792        841.0      1.1      0.0          if sensor_type_obj is None or privacy_obj is None:
   236                                                       # We don't have access to the platform type attribute on self
   237                                                       # as it has been expunged by now, so query the database and check it
   238                                                       platform = (
   239                                                           data_store.session.query(data_store.db_classes.Platform)
   240                                                           .filter(data_store.db_classes.Platform.platform_id == self.platform_id)
   241                                                           .one()
   242                                                       )
   243                                                       platform_type_name = platform.platform_type_name
   244                                                       if platform_type_name == "Unknown":
   245                                                           # If we're dealing with an unknown Platform, then don't ask the user for
   246                                                           # sensor details, just create them with whatever information we've got
   247                                                           # and use UUIDs/Unknown for the missing bits
   248                                                           if sensor_name is None:
   249                                                               sensor_name = str(uuid.uuid4())
   250
   251                                                           if sensor_type_obj is None:
   252                                                               sensor_type = "Unknown"
   253                                                           else:
   254                                                               sensor_type = sensor_type_obj.name
   255
   256                                                           if privacy_obj is None:
   257                                                               privacy = get_lowest_privacy(data_store)
   258                                                           else:
   259                                                               privacy = privacy_obj.name
   260
   261                                                           return data_store.add_to_sensors(
   262                                                               name=sensor_name,
   263                                                               sensor_type=sensor_type,
   264                                                               host_name=None,
   265                                                               host_nationality=None,
   266                                                               host_identifier=None,
   267                                                               host_id=self.platform_id,
   268                                                               privacy=privacy,
   269                                                               change_id=change_id,
   270                                                           )
   271                                                       resolved_data = data_store.missing_data_resolver.resolve_sensor(
   272                                                           data_store, sensor_name, sensor_type, self.platform_id, privacy, change_id
   273                                                       )
   274                                                       # It means that new sensor added as a synonym and existing sensor returned
   275                                                       if isinstance(resolved_data, Sensor):
   276                                                           return resolved_data
   277                                                       elif len(resolved_data) == 3:
   278                                                           (
   279                                                               sensor_name,
   280                                                               sensor_type_obj,
   281                                                               privacy_obj,
   282                                                           ) = resolved_data
   283
   284      1584       1810.0      1.1      0.0          assert isinstance(
   285       792       1131.0      1.4      0.0              sensor_type_obj, data_store.db_classes.SensorType
   286                                                   ), "Type error for Sensor Type entity"
   287      1584       1266.0      0.8      0.0          assert isinstance(
   288       792        681.0      0.9      0.0              privacy_obj, data_store.db_classes.Privacy
   289                                                   ), "Type error for Privacy entity"
   290
   291      1584    9229404.0   5826.6     50.9          return data_store.add_to_sensors(
   292       792        502.0      0.6      0.0              name=sensor_name,
   293       792       3763.0      4.8      0.0              sensor_type=sensor_type_obj.name,
   294       792        548.0      0.7      0.0              host_name=None,
   295       792        532.0      0.7      0.0              host_nationality=None,
   296       792        538.0      0.7      0.0              host_identifier=None,
   297       792       2980.0      3.8      0.0              host_id=self.platform_id,
   298       792       4093.0      5.2      0.0              privacy=privacy_obj.name,
   299       792        614.0      0.8      0.0              change_id=change_id,
   300                                                   )
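
As shown, get_sensor returns the result of add_to_sensors without writing to _sensor_cache, so the first later lookup of a newly created sensor still goes to the database. A minimal sketch of a get-or-create helper that caches on both paths follows. `SensorLookup`, `fake_find` and `fake_create` are hypothetical and use plain dicts instead of ORM objects. In the real code the entity would presumably need the same expunge handling that find_sensor applies before caching:

```python
class SensorLookup:
    """Hypothetical helper: caches sensors whether they were found or newly created."""

    def __init__(self, find_in_db, create_in_db):
        self._find_in_db = find_in_db
        self._create_in_db = create_in_db
        # Keys mirror the (sensor_name, platform_id) tuple used by _sensor_cache
        self._cache = {}

    def get_or_create(self, sensor_name, platform_id):
        key = (sensor_name, platform_id)
        cached = self._cache.get(key)
        if cached is not None:
            return cached
        sensor = self._find_in_db(sensor_name, platform_id)
        if sensor is None:
            sensor = self._create_in_db(sensor_name, platform_id)
        # Populate the cache on both the "found" and the "created" path
        self._cache[key] = sensor
        return sensor


# Usage with fake database functions that count their calls
db = {}
stats = {"find": 0, "create": 0}


def fake_find(name, platform_id):
    stats["find"] += 1
    return db.get((name, platform_id))


def fake_create(name, platform_id):
    stats["create"] += 1
    db[(name, platform_id)] = {"name": name, "host": platform_id}
    return db[(name, platform_id)]


lookup = SensorLookup(fake_find, fake_create)
for _ in range(3):
    sensor = lookup.get_or_create("AIS", 42)
print(stats)
```

With only 11 cache hits in 1720 calls in the find_sensor profile, any gain depends on how often the same (sensor_name, platform_id) pair recurs within one import.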

A profile of find_sensor:

File: /Users/robin/Documents/IanMayo/pepys-import/pepys_import/core/store/common_db.py
Function: find_sensor at line 120

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   120                                               @classmethod
   121                                               # @profile
   122                                               def find_sensor(cls, data_store, sensor_name, platform_id):
   123                                                   """
   124                                                   This method tries to find a Sensor entity with the given sensor_name. If it
   125                                                   finds, it returns the entity. If it is not found, it searches synonyms.
   126
   127                                                   :param data_store: A :class:`DataStore` object
   128                                                   :type data_store: DataStore
   129                                                   :param sensor_name: Name of :class:`Sensor`
   130                                                   :type sensor_name: String
   131                                                   :param platform_id:  Primary key of the Platform that Sensor belongs to
   132                                                   :type platform_id: int
   133                                                   :return:
   134                                                   """
   135                                                   # If we don't have a sensor name then we can't search by name!
   136      1720       1223.0      0.7      0.0          if sensor_name is None:
   137                                                       return None
   138
   139      1720       3824.0      2.2      0.0          cached_result = data_store._sensor_cache.get((sensor_name, platform_id))
   140      1720        736.0      0.4      0.0          if cached_result:
   141        11          4.0      0.4      0.0              return cached_result
   142
   143      1709       2423.0      1.4      0.0          sensor_id = (
   144      5127    2385713.0    465.3     27.3              data_store.session.query(data_store.db_classes.Sensor.sensor_id)
   145      1709     179611.0    105.1      2.1              .filter(data_store.db_classes.Sensor.name == sensor_name)
   146      1709     136191.0     79.7      1.6              .filter(data_store.db_classes.Sensor.host == platform_id)
   147                                                       .first()
   148                                                   )
   149      1709     151643.0     88.7      1.7
   150      1709       1137.0      0.7      0.0          if sensor_id:
   151      1834    5740946.0   3130.3     65.7              sensor = data_store.session.query(data_store.db_classes.Sensor).filter(
   152       917      77917.0     85.0      0.9                  data_store.db_classes.Sensor.sensor_id == sensor_id[0]
   153                                                       ).first()
   154       917      51977.0     56.7      0.6              data_store.session.expunge(sensor)
   155       917       1386.0      1.5      0.0              data_store._sensor_cache[(sensor_name, platform_id)] = sensor
   156       917        298.0      0.3      0.0              return sensor
   157
   158       792        332.0      0.4      0.0          return None
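
To compare the two queries in find_sensor, the Time column can be summed over the lines of each statement and divided by the number of calls. The sketch below does that arithmetic with the figures from the profile above. It assumes that the time recorded against the blank line 149 belongs to the ID-only statement. The timer unit is not shown in the output, so the results are in profiler timer units:

```python
# Time values copied from the find_sensor profile (profiler timer units)
id_query_time = 2423.0 + 2385713.0 + 179611.0 + 136191.0 + 151643.0  # lines 143-149
id_query_calls = 1709
full_query_time = 5740946.0 + 77917.0  # lines 151-152
full_query_calls = 917

id_per_call = id_query_time / id_query_calls
full_per_call = full_query_time / full_query_calls
ratio = full_per_call / id_per_call

print(f"ID-only query:     {id_per_call:.0f} per call")
print(f"Full-object query: {full_per_call:.0f} per call")
print(f"Ratio:             {ratio:.1f}x")
```

On these figures, loading the full Sensor object costs roughly four times as much per call as the ID-only query, which would explain why splitting the lookup into two steps did not help when the sensor already exists.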