feat: add key enumeration and rank estimation metrics #10

Open
wants to merge 32 commits into
base: main

BoyeGuillaume
Collaborator

Rank Estimation and Key Enumeration

Rank estimation and key enumeration are two metrics that evaluate the feasibility of a full key recovery attack based on an underlying score metric. In practice, however, all score metrics introduced in this PR perform poorly against both real and simulated data. Because the approach is modular, these poor metrics can easily be substituted with more interesting ones. It could also be interesting to evaluate neural networks and other types of machine learning algorithms as replacements for these scores in the future.

Changes

  • A new plugin type for scores
  • Rank estimation metric
  • Key Enumeration metric
  • Add a parallelisation utility providing a faster parallel for-loop
  • Improve performance of the Pearson correlation computation using parallelisation
  • Improve the command-line parser to support more advanced arguments, such as character escaping ('\ ') and quoted strings (e.g. "a string with spaces counting as a single argument")

This pull request is the result of a semester project.

BoyeGuillaume and others added 30 commits March 1, 2023 18:51
Changes:
* Implement `CsvLoader`, a hard-coded loader for CSV files
* Add a new vendored library for displaying progress bars
  (https://github.com/p-ranav/indicators)
* Add warning push/pop to prevent indicators warnings from polluting
  the compilation output

TODO:
* Create a more generic `CsvLoader`
Changes :
* Add support for strings containing spaces (you can now write "a string
  with space", which gets parsed as a single argument)
* Add support for special characters (\t, \n, \\, \r currently
  supported). Note that this may cause problems with legacy commands, as
  Windows-style paths are now ill-defined
* Add tuple<uint32_t, uint32_t> as a valid argument type. The syntax for
  such a tuple is `<first_integer>:<second_integer>`, for instance
  `0:15`
* Prepare changes for lists of arguments
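
The `<first_integer>:<second_integer>` syntax described above could be parsed roughly as follows. This is only a minimal sketch: `parse_uint_pair` is a hypothetical helper name, not the parser added in this PR, and it assumes plain decimal digits with no surrounding whitespace.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper (not the project's actual parser): parses "a:b" into
// a pair of unsigned 32-bit integers and returns std::nullopt on bad input.
std::optional<std::pair<std::uint32_t, std::uint32_t>>
parse_uint_pair(const std::string& text) {
    const std::size_t sep = text.find(':');
    if (sep == std::string::npos) return std::nullopt;

    // Parses a non-empty run of decimal digits, rejecting values that do not
    // fit in 32 bits.
    auto parse_u32 = [](const std::string& s) -> std::optional<std::uint32_t> {
        if (s.empty()) return std::nullopt;
        std::uint64_t value = 0;
        for (char c : s) {
            if (c < '0' || c > '9') return std::nullopt;
            value = value * 10 + static_cast<std::uint64_t>(c - '0');
            if (value > UINT32_MAX) return std::nullopt;
        }
        return static_cast<std::uint32_t>(value);
    };

    const auto first = parse_u32(text.substr(0, sep));
    const auto second = parse_u32(text.substr(sep + 1));
    if (!first || !second) return std::nullopt;
    return std::make_pair(*first, *second);
}
```

With this sketch, `parse_uint_pair("0:15")` yields the pair (0, 15), while inputs such as `0-15` or `:15` are rejected.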
Changes:
* Add a multiline feature to the command line (if a line ends with '\',
  the prompt asks for a new line)
* Add multiple-value options (an option can now have more than one value;
  each value is a [name,type] pair). This nevertheless does not change
  the old API
* Add the ability to specify an option multiple times (using a sub
  ArgumentList)
* Change the prompt from 'metrisca>' to 'metrisca $ ' because we are
  fancy
* Add the `clear` command, which should clear the console (it should also
  support macOS and Linux, even though I did not test it)

TODO:
* Clean up the API because it is messy
Changes:
* Add the key_enumeration metric subcommand and setup the parser
* Add new arguments relative to this command within arg_list.hpp
* Add and register the new KeyEnumerationMetric plugin
* Initialized the KeyEnumerationMetric plugin

TODO:
* Finish the initialization of the plugin
* Implementation of the plugin
Changes:
 * Add the implementation of the ::Compute method for the
   KeyEnumerationMetric plugin

Bug:
 * A Numerical nightmare
Changes:
 * Remove the KeyEnumerationMetric plugin
 * Remove the plugin registration
 * Remove the command in the CLI
 * Remove the no longer required TupleUInt from the argument list

Notes:

Key enumeration works by applying a CPA attack on the target, then
filtering for the best key. The idea is to compute the Pearson
correlation matrix, then select the sample that leaks the most. The old
implementation assumed the user would select a range; the algorithm
would then correlate that range with the model under a key
hypothesis
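
As a rough illustration of the idea in the note above, the sketch below computes a Pearson coefficient per sample and picks the sample with the largest absolute correlation. The names `pearson` and `most_leaking_sample` and the `traces[t][s]` layout are assumptions made for this example, not the plugin's actual code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation between two equally sized series. Returns NaN when a
// series has zero variance, in which case the coefficient is undefined.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    if (n == 0 || y.size() != n) return std::nan("");
    double mean_x = 0.0, mean_y = 0.0;
    for (std::size_t i = 0; i < n; ++i) { mean_x += x[i]; mean_y += y[i]; }
    mean_x /= static_cast<double>(n);
    mean_y /= static_cast<double>(n);
    double cov = 0.0, var_x = 0.0, var_y = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = x[i] - mean_x;
        const double dy = y[i] - mean_y;
        cov += dx * dy;
        var_x += dx * dx;
        var_y += dy * dy;
    }
    if (var_x == 0.0 || var_y == 0.0) return std::nan("");
    return cov / std::sqrt(var_x * var_y);
}

// Assumed layout: traces[t][s] is sample s of trace t, and model[t] is the
// predicted leakage of trace t under one key hypothesis.
// Returns the index of the sample with the largest |correlation|.
std::size_t most_leaking_sample(const std::vector<std::vector<double>>& traces,
                                const std::vector<double>& model) {
    const std::size_t samples = traces.empty() ? 0 : traces[0].size();
    std::vector<double> column(traces.size());
    std::size_t best = 0;
    double best_abs = -1.0;
    for (std::size_t s = 0; s < samples; ++s) {
        for (std::size_t t = 0; t < traces.size(); ++t) column[t] = traces[t][s];
        const double r = std::fabs(pearson(column, model));
        // NaN compares false, so samples with undefined correlation are skipped.
        if (r > best_abs) { best_abs = r; best = s; }
    }
    return best;
}
```

Repeating this for each key hypothesis and ranking the hypotheses by their best correlation is the usual shape of a CPA attack; how this PR aggregates the scores may differ.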
Changes:
 * Add RankEstimationMetric plugin
Changes:
 * Add implementation of the metric RankEstimationMetric
 * Register this metric as a new plugin
 * Integrate this plugin into metriscacli
 * Improve the LazyFunction
* Check that the dataset does not use a fixed plaintext; otherwise the
Pearson correlation coefficient is ill-defined and cannot be computed
* Add new type of error for unknown IO
* Add loading bar to the pearson_distinguisher because staring at a
  black screen is really sad
* Add a binary loader plugin to load binary traces
* Rework main to take this binary loader
* Add the ability to programmatically set the key byte considered when
  evaluating a model
* Fix rank estimation
* Add parallel loop to speed up pearson distinguisher
* Fix issue with rank_estimation
Add pthread as a link option for the Linux target in order to use the C++
std threading library. By default it is not linked on most platforms,
which produces a linking error when trying to link the metriscacli
executable
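
For context, a parallel loop built on `std::thread` (the reason pthread must be linked on Linux) might look like the sketch below. The name `parallel_for` and its signature are assumptions for illustration; the utility added in this PR may be structured differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical parallel loop: splits [begin, end) into contiguous chunks and
// runs body(i) for every index, one chunk per thread. The body must be safe
// to call concurrently for distinct indices.
void parallel_for(std::size_t begin, std::size_t end,
                  const std::function<void(std::size_t)>& body,
                  unsigned thread_count = std::thread::hardware_concurrency()) {
    if (end <= begin) return;
    const std::size_t total = end - begin;
    // hardware_concurrency() may return 0, so use at least one worker.
    std::size_t workers = std::max<std::size_t>(1, thread_count);
    workers = std::min(workers, total);
    const std::size_t chunk = (total + workers - 1) / workers;

    std::vector<std::thread> threads;
    threads.reserve(workers);
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t lo = begin + w * chunk;
        const std::size_t hi = std::min(end, lo + chunk);
        if (lo >= hi) break;
        threads.emplace_back([lo, hi, &body] {
            for (std::size_t i = lo; i < hi; ++i) body(i);
        });
    }
    for (auto& t : threads) t.join();
}
```

On Linux toolchains where the threading symbols are not part of the default link, a program using this needs something like `-pthread` at link time, which is the failure mode the commit above addresses.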
* Add convolution utility
* Remove old code no longer in use
* Refactor a little bit the rank estimation metric to properly handle
multi-step computation
* Add construction of the histogram to the rank estimation metric
* Fix small typo that prevented the code from compiling
* Start implementation of lucian proba code

* Grouping traces by their expected output
* Taking the average of each of this group
* Adding new meta-information to the rank_estimation_metric
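
The grouping and averaging steps listed above can be sketched as follows. The function name `mean_trace_per_class`, the one-byte label type, and the `traces[t][s]` layout are assumptions for this example; all traces are assumed to have the same length.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Groups traces by their expected output (labels[t]) and returns the mean
// trace of each group. Assumes every trace has the same number of samples.
std::map<std::uint8_t, std::vector<double>>
mean_trace_per_class(const std::vector<std::vector<double>>& traces,
                     const std::vector<std::uint8_t>& labels) {
    std::map<std::uint8_t, std::vector<double>> sums;
    std::map<std::uint8_t, std::size_t> counts;
    const std::size_t n = std::min(traces.size(), labels.size());

    for (std::size_t t = 0; t < n; ++t) {
        std::vector<double>& sum = sums[labels[t]];
        if (sum.empty()) sum.assign(traces[t].size(), 0.0);
        const std::size_t len = std::min(sum.size(), traces[t].size());
        for (std::size_t s = 0; s < len; ++s) sum[s] += traces[t][s];
        ++counts[labels[t]];
    }

    // Turn each per-class sum into a per-class mean.
    for (auto& [label, sum] : sums) {
        for (double& v : sum) v /= static_cast<double>(counts[label]);
    }
    return sums;
}
```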

* continue working on lucian code

* Start adding Cholesky inverse

* Add operator * between matrices
* Add transpose function for matrices
* Add Cholesky decomposition and inverse for Matrix<T>

NOTES : For now these functions are not efficient due to memory copy
and reallocation. However this is only a prototype and doing something
that works should be our priority

* Add log-determinant computation in the key_rank
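
As a reference point for the Cholesky and log-determinant commits above, here is a minimal sketch of both. It uses a plain `std::vector<std::vector<double>>` as a stand-in for the project's `Matrix<T>`, and the function names are assumptions. The input is assumed to be square and symmetric.

```cpp
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

using Mat = std::vector<std::vector<double>>;  // stand-in for Matrix<T>

// Returns the lower-triangular factor L with A = L * L^T, or std::nullopt
// when A is not (numerically) positive definite.
std::optional<Mat> cholesky(const Mat& a) {
    const std::size_t n = a.size();
    Mat l(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            double sum = a[i][j];
            for (std::size_t k = 0; k < j; ++k) sum -= l[i][k] * l[j][k];
            if (i == j) {
                if (sum <= 0.0) return std::nullopt;
                l[i][i] = std::sqrt(sum);
            } else {
                l[i][j] = sum / l[j][j];
            }
        }
    }
    return l;
}

// log det(A) = 2 * sum_i log(L_ii), computed from the Cholesky factor L.
// Summing logs avoids the overflow or underflow of a direct determinant.
double log_determinant(const Mat& l) {
    double sum = 0.0;
    for (std::size_t i = 0; i < l.size(); ++i) sum += std::log(l[i][i]);
    return 2.0 * sum;
}
```

Working in the log domain is a common way to keep Gaussian likelihood computations stable when covariance matrices are large or badly conditioned.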

* Fix some bugs and refactor rank_estimation

* Fixing some bug with the Matrix<T> construct
* Refactor the rank_estimation_metric to split the metric into two
phases: first we retrieve the probabilities, then we run the key
enumeration algorithm (i.e. retrieve the rank of the whole key, or an
approximation of it)

* Add debug code for probabilities

* stuff

* Applies multiple fixes

* don't know

* Merging two parts together

* Merging the compute probabilities part with the histogram in order to achieve the goal.

Note that issues may currently occur with the metric due to poor error handling. Also, the matrix looks suspect.

* send help

* Fix stability issue by reducing matrix size

* Reduce the matrix size by keeping only the 60 most important samples
* Fix find-bin overflowing when the sample is greater than the
maximum or smaller than the minimum by clamping the output
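
The clamping fix described above amounts to something like the sketch below. `find_bin` here is a hypothetical stand-alone version with assumed parameters (histogram range and bin count), not the function from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Maps value to a bin index in [0, bin_count - 1] for a histogram covering
// [lo, hi]. Values below lo land in the first bin and values at or above hi
// land in the last bin, so the index can never run past the histogram.
std::size_t find_bin(double value, double lo, double hi, std::size_t bin_count) {
    assert(bin_count > 0 && hi > lo && !std::isnan(value));
    const double scaled = (value - lo) / (hi - lo) * static_cast<double>(bin_count);
    const double clamped =
        std::clamp(std::floor(scaled), 0.0, static_cast<double>(bin_count - 1));
    return static_cast<std::size_t>(clamped);
}
```

Clamping before the cast matters: converting a negative or too-large `double` to `std::size_t` is undefined behaviour, which is one way such an overflow can show up.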

* Fix convolution and add tests for it using numpy
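
For reference, a direct full convolution with the same output as `numpy.convolve` in its default `full` mode can be sketched as below. In histogram-based rank estimation, convolving per-byte score histograms is the usual way to obtain the histogram of combined scores; whether this PR's utility is used exactly that way is an assumption, and the function name is illustrative.

```cpp
#include <cstddef>
#include <vector>

// Full discrete convolution: out[k] = sum over i + j == k of a[i] * b[j].
// The result has a.size() + b.size() - 1 entries (empty if either input is).
std::vector<double> convolve(const std::vector<double>& a,
                             const std::vector<double>& b) {
    if (a.empty() || b.empty()) return {};
    std::vector<double> out(a.size() + b.size() - 1, 0.0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = 0; j < b.size(); ++j) {
            out[i + j] += a[i] * b[j];
        }
    }
    return out;
}
```

This direct form costs O(n·m); for large histograms an FFT-based convolution is the common alternative.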

* Parallelize convolution stage to speed up compute

* Add error handling for unknown key

* improve parallel library

* Fix buffer overflow bug with histogram and bad behavior

* feat: start working on key rank enumeration

* pubfb

* Bug fixes and add rank_enumeration as submetric

* Fix scores not being sorted during key enumeration
* Add KeyEnumerationMetric to the list of plugins during construction of
metriSCA
* Add subcommand key_enumeration in order to use this metric from the
cli

* Fix bug during initialization of KeyEnum plugin

* Add output to csv option in key enumeration

* fix bugs

* key enumeration bug fix

* Move away from the simpler heuristic to profiling

* Remove old score function
* Add simplest form of profiling using a second dataset
* Started moving away from old heuristic

* no clue

* fix: use of training set as testing set
@BoyeGuillaume BoyeGuillaume marked this pull request as ready for review August 21, 2023 06:20