Network Simulator is a library that generates artificial social network data, using a news item as a seed.
Network Simulator is available in the PyPI repository as 'network-simulator'. To install it, you need to have pip installed. Then install the library with the following command: `pip install network-simulator`
To complete the installation, an additional dependency, `libenchant`, is required. On Windows it does not need to be installed manually, but on Linux it does, and you should use one of the following commands:
Depending on the distribution, different packages may be available. For instance, on Ubuntu you can use:
The simulation schema is split into environment agents and network agents.
In this particular case, the default schema on which the simulation is based (in `schema.py`, adapted from the soil schema example) is as follows:
The environment agent models the probabilities, which are the same for every agent in the network and at every time. Time refers to the step the simulation is currently at. The environment agent also models each agent's connection time. The time at which each agent enters the network follows a normal distribution whose parameters, `mean_time_connection` and `var_time_connection` (in discrete units), can be modified in the file `run_and_prompts.ipynb`. An agent becomes susceptible once it is connected to the network, and it stays connected until its state changes to `died`.
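As a rough illustration of the connection-time model, the sketch below draws one connection time per agent from a normal distribution. Only `mean_time_connection` and `var_time_connection` come from the text above; the function name, the rounding to whole steps and the clamping at step 0 are assumptions made for this example, not the library's implementation.

```python
import math
import random


def sample_connection_times(n_agents, mean_time_connection, var_time_connection, seed=None):
    """Draw a discrete connection step for each agent (illustrative sketch)."""
    rng = random.Random(seed)
    # random.gauss expects a standard deviation, so convert the variance.
    std = math.sqrt(var_time_connection)
    times = []
    for _ in range(n_agents):
        sampled = rng.gauss(mean_time_connection, std)
        # Assumption: times are rounded to whole steps and cannot be negative.
        times.append(max(0, round(sampled)))
    return times


connection_times = sample_connection_times(10, mean_time_connection=5, var_time_connection=4, seed=42)
```

An agent whose sampled time is, say, 7 would stay unconnected until the simulation step exceeds 7.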
The social network agents model autonomous individuals within the social network. They have states, which are functions that run once per step.
At step 0, no agent is able to interact in the social network. This is modeled in the state `time_0`.
Subsequently, an agent becomes susceptible when the social network time is greater than its connection time. This is modeled in the state `no_susceptible`.
Thirdly, the agent can be infected by the news directly if its parameter `has_tv` is set to true; in that case, it becomes infected with probability `prob_tv_spread`. It can also be infected by the infected nodes it is connected to, in other words, its neighbors in the social network. In this case, it becomes infected with probability `prob_neighbor_spread`. This is modeled in the state `neutral`.
Once the agent is infected, its behavior changes. The agent tries to infect each of its neighbors with probability `prob_neighbor_spread`. It can also be reinfected with probability `prob_backsliding`, which means that the node rejoins the network. In that case, it samples an infected node to reply to (using the degree in the social network as a weight), even if that node is not a neighbor of the original node. Finally, the node can die with probability `prob_died`. It is important to note that the agent always 'infects' with its last message, i.e., every reply is directed at the last message.
If an agent dies, it leaves the social network. This is modeled in the state `died`.
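The state sequence described above can be summarized as a small transition function. This is a simplified sketch, not the simulator's code: the function name and its arguments are hypothetical, backsliding and message creation are left out, and giving each infected neighbor one independent infection chance is an assumption.

```python
import random


def next_state(state, step, time_connection, has_tv, infected_neighbors, probs, rng):
    """Return the agent's state for the next step (simplified sketch)."""
    if state == "time_0":
        # Nobody interacts at step 0; agents then wait for their connection time.
        return "no_susceptible"
    if state == "no_susceptible":
        # The agent becomes susceptible once network time exceeds its connection time.
        return "neutral" if step > time_connection else "no_susceptible"
    if state == "neutral":
        if has_tv and rng.random() < probs["prob_tv_spread"]:
            return "infected"
        # Assumption: each infected neighbor gets one independent chance.
        if any(rng.random() < probs["prob_neighbor_spread"] for _ in range(infected_neighbors)):
            return "infected"
        return "neutral"
    if state == "infected":
        return "died" if rng.random() < probs["prob_died"] else "infected"
    # An agent that has died has left the network and stays in that state.
    return "died"


rng = random.Random(0)
probs = {"prob_tv_spread": 0.1, "prob_neighbor_spread": 0.05, "prob_died": 0.01}
state = "time_0"
for step in range(20):
    state = next_state(state, step, time_connection=3, has_tv=True,
                       infected_neighbors=2, probs=probs, rng=rng)
```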
The types of social network agents are as follows:
Dumb Agent: It has the base behavior of a regular agent.
Herd Agent: It has the behavior of a Dumb Agent, with the following change:
- The probability of being infected by a neighbor is weighted by the number of neighbors. This probability is lower than for a Dumb Agent, unless all of its neighbors are infected.
Wise Agent: It has the behavior of a Herd Agent, with the following changes:
- The node can now be cured, which means switching to the opposite opinion (response and stance). It can also try to cure other agents with probability `prob_neighbor_cure`, but only if they are of the wise type and hold a different stance (excluding the neutral stance). This is modeled in the state `cured`.
- It can also cure itself, with probability `prob_neighbor_cure` * `vecinos_curados` / `vecinos_infectados` (cured neighbors over infected neighbors). Apart from this, the agent adopts the behavior of a Herd Agent. This is modeled in the state `infected`.
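The two weighted probabilities mentioned for these agent types can be written out as follows. The self-cure formula follows the expression given above. The herd formula (the base probability scaled by the fraction of infected neighbors) is only one plausible reading of "weighted by the number of neighbors" and should be treated as an assumption. Function names and the handling of zero denominators are hypothetical.

```python
def herd_infection_probability(prob_neighbor_spread, infected_neighbors, total_neighbors):
    """Assumed herd rule: scale the base probability by the infected fraction."""
    if total_neighbors == 0:
        return 0.0
    # Equals prob_neighbor_spread only when every neighbor is infected.
    return prob_neighbor_spread * infected_neighbors / total_neighbors


def wise_self_cure_probability(prob_neighbor_cure, cured_neighbors, infected_neighbors):
    """prob_neighbor_cure * cured neighbors / infected neighbors, capped at 1."""
    if infected_neighbors == 0:
        # Assumption: with no infected neighbors the ratio is undefined, so return 0.
        return 0.0
    return min(1.0, prob_neighbor_cure * cured_neighbors / infected_neighbors)


p_herd = herd_infection_probability(0.5, infected_neighbors=2, total_neighbors=4)
p_cure = wise_self_cure_probability(0.4, cured_neighbors=1, infected_neighbors=2)
```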
The connection network between users is generated with the Barabási-Albert method, which takes two parameters, `n` and `m`.
`n` is the number of social network agents in the social graph. `m` is the number of edges with which each new agent connects, using node degree as the connection probability weight. (For more information, see Barabási-Albert Method.)
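To make the roles of `n` and `m` concrete, here is a minimal pure-Python sketch of preferential attachment. It is not the simulator's generator; NetworkX offers `barabasi_albert_graph(n, m)` for real use, and whether this library relies on it is not stated here. The function name and the choice of `m` unconnected starting nodes are assumptions.

```python
import random


def barabasi_albert_edges(n, m, seed=None):
    """Return the edge list of a Barabási-Albert style graph (sketch)."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))  # the first new node links to the m initial nodes
    degree_pool = []          # each node appears once per incident edge
    for source in range(m, n):
        edges.extend((source, target) for target in targets)
        degree_pool.extend(targets)
        degree_pool.extend([source] * m)
        # Picking uniformly from the pool is picking nodes proportionally to degree.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(degree_pool))
        targets = sorted(chosen)
    return edges


edges = barabasi_albert_edges(n=20, m=3, seed=7)
```

Every node added after the initial `m` contributes exactly `m` edges, so the graph has `(n - m) * m` edges in total.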
- `id_message`: A unique interaction id. It ranges between 1 and the number of interactions.
- `state`: Indicates the state in which the interaction was produced.
- `stance`: Indicates whether the agent is in favor of (`agree`) or against (`against`) the news. It can change during the simulation, but each agent has a default value. In the literature, this is known as 'affective polarization'.
- `response`: Indicates the kind of comment the agent makes. It can be `support`, `deny`, `question`, or `comment`. The probabilities depend on the weights given as inputs, but for certain stances some responses are blocked. In the literature, this is referred to as the 'truth stance'.
- `repost`: Indicates whether a post is a mention of the previous one or an 'original' post. It can be 0 or 1. The repost probability is given as input and is not used for backsliding interactions.
- `method`: Indicates the infection method. It can be `backsliding` (if the agent relapses into infection), `tv` (if the infection came through the root node) or `friend` (if the infection came through a neighbor).
- `cause`: Indicates the cause of the infection, if it came through a neighbor. It can be any of the social agent ids. It ranges between 1 and `n`.
- `parent_id`: Indicates the id of the parent interaction. It ranges between 1 and the number of interactions.
- If the interaction method is `backsliding`, the agent keeps its stance and response.
- If the interaction is a `repost`, the response is `support` and the stance is `agree`.
- If the stance is `against`, the probability of the response `support` is 0.
- If the stance is `agree`, the probability of the responses `question` and `deny` is 0.
- If the stance is `neutral`, the probability of the responses `support` and `deny` is 0.
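The stance rules above amount to zeroing some response weights before sampling. The sketch below shows one way to do that. The table of blocked responses is taken from the rules, while the function name and the example weights are hypothetical. The weights do not need to sum to 1 because `random.choices` normalizes them.

```python
import random

RESPONSES = ("support", "deny", "question", "comment")

# Responses whose probability is 0 for each stance, as listed in the rules.
BLOCKED = {
    "against": {"support"},
    "agree": {"question", "deny"},
    "neutral": {"support", "deny"},
}


def sample_response(stance, weights, rng):
    """Sample a response for the given stance, skipping blocked responses."""
    blocked = BLOCKED.get(stance, set())
    allowed = [r for r in RESPONSES if r not in blocked and weights.get(r, 0) > 0]
    if not allowed:
        raise ValueError("no response has a positive weight for this stance")
    return rng.choices(allowed, weights=[weights[r] for r in allowed], k=1)[0]


rng = random.Random(3)
example_weights = {"support": 5, "deny": 2, "question": 2, "comment": 1}
response = sample_response("against", example_weights, rng)
```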
- Default agent parameters: These are the parameters that every agent has, unless a change is specified.
- Network agent configuration: Here, each agent class is created. Each one has a weight, `weight`, that indicates how likely it is to appear in the social network compared to the rest. The specific parameters desired for each one can also be set. Finally, the correct type must be given in `type`.
- Environment agent configuration: Here, every spread probability is set. The parameters of the connection-time normal distribution, the time intervals, and the social network graph generator and its parameters are also configured. Note: By default, the simulator always simulates one step fewer than the number set.
- Response probabilities: Weights must be given for each response and agent in a dictionary (the weights do not need to sum to 1).
The cells in the file `example.ipynb` must be executed. They are ready to run; in short, they do the following:
- Create the `.yml` file that specifies the simulation parameters.
- Execute the soil command in cmd, which runs the simulation.
- Retrieve the soil output data.
- Pivot the data for easier reading.
- Define the title and body of the news.
- Define a Post instance, which will be the root node of the interaction tree.
- Define the initial and final time of the simulation.
- Use the data of each interaction to create Post instances and add them to the interaction tree.
- Get the prompt of each interaction, passing as parameters the language, the minimum and maximum number of characters, and the user description.
- Define an endpoint to send the prompt to and get a reply from. The reply is the text of the interaction, in essence, what the network agent posts in the social network.
  - English: It is suggested to use `gpt 3.5`, pre-implemented in the gpt3_5/gpt3_5 module.
  - Spanish: It is suggested to use a fine-tuned `LlaMa 2`.
- Compute the correctness of the reply (in terms of the ratio of non-existent words). If the correctness is low, an attempt is made to correct the text. (This step is optional, but it ensures better results.)
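The correctness check in the last step can be pictured as follows. This is a sketch under assumptions: correctness is taken to be one minus the ratio of non-existent words, and the dictionary lookup is passed in as a function so the example runs without extra dependencies. With PyEnchant installed, `enchant.Dict("en_US").check` could be supplied instead of the toy vocabulary. The function name is hypothetical.

```python
import re


def correctness(text, is_known_word):
    """Return the share of words in `text` that the dictionary recognizes."""
    # Keep alphabetic tokens only; digits and punctuation are ignored.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 1.0
    unknown = sum(1 for word in words if not is_known_word(word))
    return 1.0 - unknown / len(words)


# Toy vocabulary standing in for a real spelling dictionary.
vocabulary = {"the", "news", "is", "true"}
score = correctness("The news is trve", vocabulary.__contains__)
```

A caller could then compare `score` against a threshold of its choosing and request a corrected text when it falls below it.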
The tree is visualized in DFS order. This structure is fully editable.
- `get_data.py`: Contains the functions used in `example.ipynb` for formatting, retrieving and cleaning soil output data.
- `ia.py`: Contains the functions used to provide the prompt and get the reply from the endpoint.
- `post.py`: Defines the Post class, which inherits from the anytree class Node. The Post class manages everything related to an interaction.
- `templates.py`: Stores the prompt templates that are sent to the endpoint.
- `transform_time.py`: Transforms simulator steps into continuous time in minutes and seconds. Note: It considers a single time interval and samples the exact time inside it from a normal distribution.
- `spelling_checker.py`: Contains the function that calculates the correctness of a text and the one that creates the prompt to correct a previous text.
- `yml_create_functions.py`: Contains the function that transforms the initial input parameters into .yml format.
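As an illustration of the idea behind `transform_time.py`, the sketch below maps a discrete step to a time inside that step's interval, sampled from a normal distribution. Everything specific here is assumed for the example: the function name, the interval length parameter, centring the distribution on the middle of the interval, the spread of one sixth of the interval, and clamping to the interval bounds.

```python
import random


def step_to_minutes_seconds(step, interval_seconds, rng):
    """Map a simulation step to (minutes, seconds) inside its interval."""
    start = step * interval_seconds
    end = start + interval_seconds - 1
    centre = start + interval_seconds / 2
    # Assumption: most samples should fall inside the interval, so use a small spread.
    sampled = rng.gauss(centre, interval_seconds / 6)
    # Clamp so the time never leaves the interval of this step.
    total_seconds = int(min(max(sampled, start), end))
    return divmod(total_seconds, 60)


rng = random.Random(0)
minutes, seconds = step_to_minutes_seconds(step=2, interval_seconds=60, rng=rng)
```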
The Post class stores each of the attributes of an interaction and also implements the following methods to handle the interaction:
- Gets the prompt of each interaction.
- Generates the prompt if the interaction is a reply to the root post.
- Generates the prompt if the interaction is not a reply to the root post. The reply is directed to the first non-repost ancestor.
- Among the interactions, gets the first ancestor that is not a repost.
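The ancestor lookup in the last method can be sketched with a plain parent pointer. The actual Post class inherits from the anytree `Node` class; the stand-in class below, its name, and its attributes are hypothetical and only illustrate the traversal.

```python
class PostSketch:
    """Minimal stand-in for an interaction node (not the library's Post class)."""

    def __init__(self, id_message, repost=False, parent=None):
        self.id_message = id_message
        self.repost = repost
        self.parent = parent

    def first_non_repost_ancestor(self):
        """Walk up the tree and return the first ancestor that is not a repost."""
        node = self.parent
        while node is not None and node.repost:
            node = node.parent
        return node


root = PostSketch(0)                                 # the news itself
original = PostSketch(1, parent=root)                # an original reply
repost = PostSketch(2, repost=True, parent=original)
reply = PostSketch(3, parent=repost)                 # a reply to the repost
target = reply.first_non_repost_ancestor()
```

Here `target` is `original`, since the repost in between is skipped.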
The external libraries that were used are (versions are suggested):
- Python (>=3.6.x)
- Anytree (>=2.8.0)
- Soil (==0.20.7)
- Scipy (>=1.8.0)
- Numpy (>=1.24.3)
- Pyenchant (>=3.2.2)
A ready-to-use example is available at the following link in Google Colab.
A ready-to-use example is available at the following link. Note that Jupyter Notebook must be installed to run the example, but you can build your own without an .ipynb file.
To run this example, you need:
- The openai dependencies, installed with the following command: `pip install network-simulator[openai]`
- A schema, located in `schema/schema.py`. The default schema is at the following link.
- An API key, in this case for openai. It should be located in `parameters.py`, in the variable `API_KEY`.