msc placeholder: 5G overlay infrastructure for decentralised learning ??? #7258

Open
synctext opened this issue Jan 13, 2023 · 60 comments

@synctext
Member

synctext commented Jan 13, 2023

Thesis defense target: 21 June 2024. Survey target: end of July 2023.
Would like to have a fresh master thesis topic, not incremental improvement of other thesis work.
Starting roughly Q1 2023 or summer of 2023, flexible. update: starting lit. survey 2nd May
update 2: literature survey finished: 3 oct 2023.

RTOS expertise. AWS. Dream of contributing to The Linux Kernel. Byte-level stuff OK, even assembly person in the age of Javascript :-) Like to use machine learning, but not invent new ML stuff or central focus of thesis (no unsupervised learning, no online learning). Thus more ML that is: adversarial, byzantine, decentralised, personalised, local-first AI, edge-devices only, low-power hardware accelerated. Prefer to utilise advanced algorithms msc course knowledge.

Possible brainstorm starting idea: start building the fastest machine learning based on hardware acceleration. First step is get the hardware running fast, stepwise modify algorithms and tweak towards machine learning for learn-to-rank, learn-through-consumption, or even learn-about-trust (reputation graph, work graph, MeritRank inspired etc). Promised phones to test.

  • Applied ML direction {less interested}. Related work to astronomical hardware cost for AI. OpenAI has spent at least $63M on hardware:
In February 2018, the Organization entered into a two year services agreement with Google, LLC for cloud computing
services. The terms include a minimum spend commitment of $63M during the service period which the Organization
had fully satisfied as of the date of financial statement issuance

https://rct.doj.ca.gov/Verification/Web/Download.aspx?saveas=560291.pdf&document_id=09027b8f803a8976 [source]

@synctext
Member Author

synctext commented Mar 23, 2023

Concrete idea for NAT survey

This survey describes progress towards a fully connected Internet. Currently, mobile devices do not fully participate in the network: smartphones are unable to receive messages from others. Only Facebook, Google, and other servers in the cloud are able to communicate with billions of smartphone users. In the name of security, billions of users have a constrained network, without freedom to communicate.

Find on scholar

Year scientific article or report
2000 SIP, NAT, and Firewalls - master thesis KTH
2003 Network convergence and the NAT/Firewall problems
2005 Characterization and measurement of tcp traversal through nats and firewalls
2006 Implementation and performance study of a new NAT/firewall signaling protocol
2008 A Better Approach than Carrier-Grade-NAT
2008 Free-riding, fairness, and firewalls in p2p file-sharing
2009 A measurement of NAT and firewall characteristics in peer-to-peer systems
2011 Delft work UDP NAT and Firewall Puncturing in the Wild
2011 Tribler: P2p media search and sharing
2013 Assessing the impact of carrier-grade NAT on network applications
2013 Common requirements for carrier-grade NATs (CGNs)
2013 A Royal Opinion on Carrier Grade NATs
2013 BT Retail Tests IP Address Sharing
2014 On the performance and fairness of BitTorrent-like data swarming systems with NAT devices
2014 Deterministic Address Mapping to Reduce Logging in Carrier-Grade NAT Deployments
2016 Carrier-grade NAT—is it really secure for customers? A test on a Turkish service provider
2016 A multi-perspective analysis of carrier-grade NAT deployment
2016 Statistical network monitoring: Methodology and application to carrier-grade NAT
2016 Overudp: Tunneling transport layer protocols in udp for p2p application of ipv4
2018 Inferring carrier-grade NAT deployment in the wild
2018 IETF Internet Standard draft on Trustchain
2020 birthday paradox solution https://tailscale.com/blog/how-nat-traversal-works/
2020 https://github.com/danderson/nat-birthday-paradox
2021 A QUIC (K) Way Through Your Firewall?
2021 Hardware details, Fortigate: https://news.ycombinator.com/item?id=27489797
2022 How NAT traversal works — NAT notes for nerds
2023 Doomed to Repeat with IPv6? Characterization of NAT-centric Security in SOHO Routers

Taken from the master thesis of 2000:
image

ToDo1: 30 citations to carrier grade NAT, and all these topics.
ToDo2: taxonomy list, https://www.rfc-editor.org/rfc/rfc3234

Finally, we investigated various telecom providers in The Netherlands about their NAT and blocking practices. We procured 12 SIM cards and measured their behavior. See full connectivity matrix of Sim-to-Sim card. Only 3 offer free Internet... {ToDo}.

TODO: register at https://mare.ewi.tudelft.nl/project 📝

@OrestisKan

@synctext registration is for the thesis not for the literature survey no ?

@synctext
Member Author

indeed, it's nice if you register your thesis as early as possible.

@OrestisKan

Literature_Survey.pdf

@synctext
Member Author

synctext commented May 31, 2023

Feel free to add a bit more content on reproducing state-of-the-art literature.

scientific problem of universal connectivity is not explained clearly. Storyline goes too fast, page 2 already has "port-restricted cone NAT". Take .5 page for a tutorial on the concept of an incoming connection. Need structure!

Section 5. Reproducing results from literature
After presenting the 34 relevant prior works covered in this survey, we now combine the state-of-the-art results. Using a practical experimental evaluation we reproduced the best-of-class algorithms presented in the discovered literature. We confirmed the findings of the body of literature within our reproduction experiment. Our simple app reproduces the NAT penetration algorithms of the main literature [2,5,17]. The cardinal outcome of our experimental CGNAT evaluation is the success rate, something often lacking in studies. The success rate for various Dutch telecom providers is determined to be: 97%.
ETC.

EDIT: brainstorm about master thesis focus. Idea for title: "5G overlay infrastructure for edge-based decentralised learning". Context to sell your perfect_overlay effort. Only need a few weeks doing a minimal-viable-product of decentralised machine learning. Simply take this gossip-based ML algorithm and running code. Goal: 100 actual nodes {mixed real ARM Android and x86 Kotlin}!

@OrestisKan

OrestisKan commented May 31, 2023

  1. Polish text
  2. Create taxonomy table with the literature (from the literature survey from other student): https://arxiv.org/abs/2212.06436
  3. Create an app and test penetration-rate using ipv8-kotlin

@synctext
Member Author

synctext commented Jun 19, 2023

MARE: "5G overlay infrastructure for decentralised learning"

Update:

@synctext synctext changed the title from "msc placeholder: brainstorm on search, real-time, or hardware" to "msc placeholder: 5G overlay infrastructure for decentralised learning ???" on Jun 19, 2023
@synctext
Member Author

synctext commented Jul 10, 2023

Goal: mechanism for one phone to help another phone to puncture their carrier-grade NAT.

@OrestisKan

Literature_Survey.pdf

@synctext
Member Author

synctext commented Jul 17, 2023

Nearly done with the Lit Survey. [38] citations to forums and scientific papers. Great result to include:
puncture log
Just put as-simple-as-possible description of the SIM cards from 6 different 4G/5G providers.

Research Assistant job: send 50 UDP packets, count how many arrive. Repeat for all SIM-card combinations. Test the performance of EVA, note that you can then quickly run out of your 100-ish MByte SIM data quota. Read from Rahim on the binary transport protocol called EVA. See some example code: https://github.com/KoningR/eurotoken/blob/5c84348ba16dd9ce4b97e53ff52a5cefe9ee97c1/src/main/kotlin/evatest/EvaApplication.kt
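
A minimal sketch of the "send 50 UDP packets, count how many arrive" measurement, for illustration only. The function names, the 4-byte sequence-number payload, and the 5-second idle timeout are assumptions, not part of IPv8 or EVA; the sender and receiver would each run on one phone of a SIM-card pair.

```python
import socket

def loss_report(sent, received_seqs):
    """Return (delivered, lost, delivery_ratio) for one SIM-card pair."""
    delivered = len(set(received_seqs))      # ignore duplicated datagrams
    ratio = (delivered / sent) if sent else 0.0
    return delivered, sent - delivered, ratio

def send_probes(target, n_packets=50):
    """Sender side: emit n_packets numbered UDP datagrams to target = (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(n_packets):
            sock.sendto(seq.to_bytes(4, "big"), target)

def collect_probes(bind_addr, n_packets=50, timeout_s=5.0):
    """Receiver side: collect sequence numbers until all arrive
    or no datagram arrives for timeout_s seconds (assumed value)."""
    seen = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(bind_addr)
        sock.settimeout(timeout_s)
        while len(set(seen)) < n_packets:
            try:
                data, _ = sock.recvfrom(64)
            except socket.timeout:
                break
            seen.append(int.from_bytes(data[:4], "big"))
    return seen
```

Numbering the packets keeps the count honest when the network duplicates or reorders datagrams; `loss_report(50, collect_probes(("0.0.0.0", 9000)))` would then give one cell of the SIM-to-SIM matrix (port 9000 is an arbitrary example).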

@OrestisKan

OrestisKan commented Aug 18, 2023

Literature_Survey (2).pdf

  • Add the pictures with the explanations my dad said
  • Table with red green on which carriers failed and succeeded and why Lyca failed
  • Picture of the sim cards and the phones etc
  • Also the picture above (17th july)

Lyca uses a symmetric NAT. The rest (Lebara, T-Mobile, and Vodafone) could communicate across carriers, while they all failed with Lyca (even Lyca-to-Lyca communication failed). Theoretically, Lyca-to-Lyca communication may be achievable with the birthday paradox. We need to determine the address and port predictability in order to understand how long it would take to penetrate the NAT and how long it would take for Lyca to block the requests.

Willingness to travel (and I have accommodation maybe?)

  • Luxembourg
  • France
  • Italy
  • Germany
  • Croatia
  • UK
  • Ireland
  • Greece
  • Cyprus
  • Portugal
  • Belgium
  • Austria
  • Romania
  • North Macedonia
  • Bosnia and Herzegovina
  • Serbia
  • New Jersey
  • Vancouver Canada
  • Australia
  • Egypt
  • Chile

Reason for traveling: live physical testing of 4G and 5G communications and procurement of SIM cards

Research assistantship ending 30/09/23!

@synctext
Member Author

synctext commented Aug 18, 2023

  • €50 for Cyprus based 3G/4G SIM cards for local experimental study
  • The goal of the app was to determine whether IPv8 could make two phones each on a different carrier’s 5G running kotlin-ipv8 and a computer running on WiFi the JVM version of IPv8 discover each other and communicate by penetrating potential NATs that are in the way. Please replace with a more scientific wording, leave out all engineering. Suggestion: With real-world 4G/5G SIM-cards we experimentally determined the efficacy of the NAT puncturing methods described in the literature. For this purpose we significantly re-factored a networking library called "IPv8" developed in 2020 at Delft University of Technology. The IPv8 library uses Kotlin to implement the UDP NAT puncturing approach documented in an expired IETF Internet Standard draft from 2018[REF].
  • Action for adviser: try to allocate €5000 (???) in hardware and travel expenses for on-site EU-wide networking experiments within master thesis or Research Assistant in Tribler Lab position. Back of envelope calculation, 50 SIM cards x €20 = €1000. Remaining travel budget then €4000. Android Machine Learning is working. With @quintene porting of TensorFlow Lite on Android, we could together test "Decentralised AI beyond federated learning on 5G".
  • IPv6 is happening after 30 years, needs testing. https://news.ycombinator.com/item?id=32798003
  • Discussing post-Delft degree options: responsible AI for 5G grant funding https://www.ngi.eu/opencalls/#ngizeroreview
  • Has Amazon Big Tech experience
  • Ambitious master thesis {or goal of entire lab this year}, greatly expanding the scope (builds on your assembly & Big Tech experience)
    1. flawless EU-wide NAT puncturing
    2. Effortless UDP-based binary transfer between any two phones
    3. Cryptographic key pair for any unique smartphone for privacy-respecting self-sovereign identity (IPv8 ID)
    4. Cryptographic certificates of rendezvous between any two phones
    5. Minimal viable realisation of ranking function and MeritRank, see https://arxiv.org/pdf/2308.07148.pdf
    6. Gossip exchange with bias to peers ranked as trustworthy
    7. Decentralised AI. integrate gossip exchange with @quintene for decentralised machine learning (BeyondFederated - truly decentralised learning at the edge #7254).

@OrestisKan

Final Literature Survey with the suggested improvements
Literature_Survey.pdf

@synctext
Member Author

synctext commented Sep 11, 2023

Comments on this latest survey:

  • Title could be more informative by mentioning survey and carrier-grade NAT: "Survey and experiments on carrier-grade NATs".
  • "Internet connectivity nowadays has become a fundamental necessity.", superior flow of opening line by dropping "nowadays".
  • "While CGNAT has been successful in conserving IPv4 addresses", more a technique for maximizing usage.
  • "NAT Punctuting", "stealling", typo stuff. No spell checking ? 🤕
  • "TABLE I: Overview of all peer-to-peer techniques to establish communication behind NATs" order by year?
  • Figure 3: " Two machines behind Firewalls using a synchronizer", please expand fully to 1 or 2 columns (+ fix "send sdummy")
  • "the maximum packets per second that a machine can send is ≈ 56000,", confusing, too little info, birthday paradox?
  • "C. Peers where both are behind an EDM-based NAT", define the cardinal performance parameter of NAT Hole Opening Time. If 10k packets get out per second and any UDP hole created by any single packet is valid for 30s, we obtain a birthday paradox match with 300k UDP packets! If both sides have similar parameters we get 300k x 300k pairings and you only need 1 match. Right?? 🔢
  • "behind an EDM-based NAT to achieve an almost instant collision with 99.9% probability.", puncture
  • "TABLE II: Carriers that succeeded in transmitting a package to another carrier" 👍 🎊 👍
  • "FIG. 9: The sims used for testing" great 12-SIM transport box with X-Ray shielding. Please maximise its size, and also make Figure 8 full page width for readability. Move it much earlier, not after the references: show the setup before even presenting results.
  • "11 years after the launch of IPv6", check your dates. RFC 1883, Proposed Standard
  • Add IPv8 outdated IETF draft Internet Standard, https://datatracker.ietf.org/doc/html/draft-pouwelse-trustchain-01#section-4
  • Survey needs full bib info: "I. Livadariu, K. Benson, A. Elmokashfi, A. Dhamdhere, and A. Dainotti, “Inferring carrier-grade nat deployment in the wild,” 2018."
  • mention usage of fixed ICMP echo request packets. They can test your connection? mention reply trick: https://samy.pl/pwnat/
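
The birthday-paradox arithmetic raised in the EDM comment above can be sanity-checked with a short sketch. It assumes that one side holds `open_ports` live mappings drawn uniformly at random from a single 16-bit port space on one public IPv4 address, and that the other side sends `probes` packets to distinct random ports. Real carrier-grade NATs may violate both assumptions, and the function name is illustrative.

```python
PORT_SPACE = 65536  # assumption: mappings spread uniformly over all 16-bit ports

def hit_probability(open_ports, probes, port_space=PORT_SPACE):
    """Probability that at least one of `probes` distinct random probes
    lands on one of `open_ports` currently open NAT mappings."""
    if open_ports <= 0 or probes <= 0:
        return 0.0
    miss = 1.0
    for i in range(min(probes, port_space)):
        free = port_space - open_ports - i   # ports that would still be a miss
        if free <= 0:
            return 1.0                       # every remaining port is a hit
        miss *= free / (port_space - i)
    return 1.0 - miss

if __name__ == "__main__":
    for probes in (256, 1024, 4096):
        print(probes, round(hit_probability(256, probes), 4))
```

Under these assumptions a few hundred open mappings and roughly a thousand probes already give a high hit probability, so the puncture time is dominated by how long each mapping stays open relative to the send rate.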

@OrestisKan

Literature_Survey (1).pdf
Latest (hopefully final) version with all the suggestions for improvements that you requested

@OrestisKan

OrestisKan commented Sep 20, 2023

@synctext the birthday attack between a phone on Vodafone 5G and an emulator on eduroam WiFi worked and they managed to connect. It still needs optimizations because it's heavy, but at least we know it works! More details in my Slack message.

What's left to do:

  • Make it even more lightweight
  • Slow down the request sending so it doesn't potentially get flagged as an attack, and send requests at slightly random intervals?
  • Dynamic "reset" simulations by figuring out the NAT model
  • Order the strategies from easy to hard for speed-up, since a birthday attack is not always needed: first figure out the NAT type; if it is easy, try fixed ports and then port enumeration; if one side is hard, use the birthday attack
  • Gather network diagnostics?
  • unit tests
  • cleanup code
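
The easy-to-hard ordering in the list above could be sketched as follows. The two difficulty labels, the classification rule (same external port seen by several servers means an endpoint-independent, "easy" mapping), and the strategy names are assumptions for illustration, not the implemented logic.

```python
from enum import Enum

class NatDifficulty(Enum):
    EASY = "easy"   # assumed: external port is reused for every destination
    HARD = "hard"   # assumed: a new external port per destination (symmetric)

def classify(mapped_ports):
    """Classify a NAT from the external ports that different servers
    observed for the same local socket."""
    if not mapped_ports:
        raise ValueError("need at least one observed mapping")
    return NatDifficulty.EASY if len(set(mapped_ports)) == 1 else NatDifficulty.HARD

def puncture_plan(local, remote):
    """Return strategy names to try in order, cheapest first."""
    if local is NatDifficulty.EASY and remote is NatDifficulty.EASY:
        return ["fixed_port", "port_enumeration"]
    return ["birthday_attack"]   # at least one side is hard
```

For example, `puncture_plan(classify([40001, 40001]), classify([51200, 51377]))` falls through to the birthday attack because the second peer shows a different port per destination.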

@OrestisKan

OrestisKan commented Oct 3, 2023

@synctext
Member Author

synctext commented Oct 3, 2023

Solid progress! Survey completed, now ready for Arxiv submission.
Thesis brainstorm: link the TensorFlow Lite which Quinten van Es got operational to birthday attack. get healthy IPv8 overlay. focus on binary transfer for "decentralised Artificial Intelligence". Fix the "information diffusion problem". measure UDP bandwidth throughput. EVA protocol also: this whole issue warning bad code 😷 Determine bottleneck. Improve. Write thesis DONE!!!

Improve activity grid principle of status of each of the 25 connected IPv8 peers.
image

Related IPFS work: https://github.com/plprobelab/network-measurements/blob/master/results/rfm15-nat-hole-punching.md
The measurement was designed to provide insights into when and why the DCUtR protocol fails in NAT hole punching and to provide recommendations for improvement. In total, we tracked 6.25M hole punches from 212 clients (API keys). The clients were deployed in 39 different countries and hole punched remote peers in 167 different countries. Our top findings were that: libp2p’s hole punching success rate is around 70%.
https://research.protocol.ai/publications/decentralized-hole-punching/

@OrestisKan

OrestisKan commented Oct 17, 2023

THESIS TITLE (draft): First 5G deployment of Distributed Artificial Intelligence

IEEE_Conference_Template.pdf

Measure: UDP bandwidth, bottlenecks, timeouts on Android client and NATs, connection reset time and port association time, all possible conditions that make successful communication possible and complete understanding of all possible factors that cause a communication failure. Determine if there is an upper bound to the number of concurrent IPs that a device can talk to(e.g. 63 works and adding a 64th may break the least recently used).

Reliable data transfer: compare UDP and EVA protocol in terms of effective throughput, packet loss, congestion

Measure the exact NAT behaviour!

Measure NAT hole opening time!

I have 10 or 12 operational SIM cards. I have two phones, hence I can use 2 SIM cards at a time.

@synctext
Member Author

synctext commented Oct 17, 2023

update "This is brute forcing the public IP"{+port}, nice and sharp description somebody from Canada gave your work.

@OrestisKan

OrestisKan commented Nov 8, 2023

SURVEY to be announced by Arxiv tomorrow
I added tests for:

  • UDP bandwidth measurement
  • NAT reset time

TODO:

  • Integrate Birthday attack & measurements into IPv8
  • Add the rest of the measurement tests mentioned on the comment above
  • Understand IPv8 codebase
  • Create a documentation explaining some code overview of IPv8 as a guide through the codebase. This : https://github.com/Tribler/kotlin-ipv8/tree/master/doc only contains tutorials on how to use the API but no explanations on the engine
  • Publish my codebase

Goal by Christmas:

  1. Quantify all measurements for the simcards
  2. Integrate the Birthday Attack in IPv8

@OrestisKan

OrestisKan commented Nov 9, 2023

Lit Survey is published: https://arxiv.org/abs/2311.04658

Edited to fix the broken reference link

@OrestisKan

OrestisKan commented Nov 16, 2023

I HAVE CODE FOR:

Measuring:

  • UDP bandwidth

  • bottlenecks

  • Round-trip time of packets using "ping-pong"

  • timeouts on Android clients and NATs

  • connection reset time and port association time

  • all possible conditions that make successful communication possible

  • Complete understanding of all possible factors that cause a communication failure.

  • Determine if there is an upper bound to the number of concurrent IPs that a device can talk to(e.g. 63 works and adding a 64th may break the least recently used).

  • Reliable data transfer: compare UDP and EVA protocol in terms of effective throughput, packet loss, congestion

  • Measure the exact NAT behaviour!

  • Measure NAT hole opening time!

@OrestisKan

  • gather data from other providers + MTN
  • Analyze and model the data and behaviour of each NAT
  • Write problem description and methodology

@synctext
Member Author

synctext commented Feb 26, 2024

  • in Paris doing 4G measurements
  • Orange (no pre-paid, ID-check mandatory, 15min chat required in French), SFR France SIM cards for starting. 30 Euro.
  • Buy Lebara again in France, because it could be different hardware or Internet settings. Goal is to get beyond 50 SIM-cards in 1 thesis photo, Figure-1.
  • Buy Bouygues, O2, Free Mobile, Nomad, Traveltomtom?
  • Experiment with e-SIM? You order an e-sim card for France on the internet, you receive a QR code, scan it, follow the simple steps and within less than 2 minutes you have a France e-sim card installed on your phone.
  • Please phase and plan your research. No generic tool yet. no QUIC or uTP binary transfer yet. But invest in a good measurement script to deeply understand the port behaviour ❗ How predictable and repeating is the port selection? Mapping behaviour is completely known? Port ranges? timeouts? MTU: measure it please! Jumbo frames???
  • Current measurement script
    • uses the server in Delft with lots of ports
    • gathering the mappings of the 4G NATs
    • timeout measurements between 2 phones. increasing time-out, waiting for packet drop. Fragile measurement methodology: timeout==end-of-measurement.
  • Sprint goal: Belgium and Norway Oslo next. process mapping. If time is left: get this uTP lib going between 2 servers (and phones) starting writing down mapping results.
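
One possible way to make the timeout measurement less fragile than "timeout == end-of-measurement" is to probe each idle interval with a fresh mapping and bisect. This is a sketch under assumptions, not the current script: `mapping_survives(idle_seconds)` is a hypothetical callback that opens a new mapping, stays silent for the given time, and reports whether a packet from the server still gets through.

```python
def estimate_timeout(mapping_survives, low=0.0, high=600.0, tolerance=1.0):
    """Bisect the NAT idle timeout in seconds.

    mapping_survives(idle) -> bool must use a fresh mapping on every call,
    so one dropped probe does not end the whole measurement.
    Returns None when the mapping is still alive at `high`.
    """
    if mapping_survives(high):
        return None
    while high - low > tolerance:
        mid = (low + high) / 2
        if mapping_survives(mid):
            low = mid      # still open: the timeout is longer than mid
        else:
            high = mid     # closed: the timeout is at most mid
    return (low + high) / 2

if __name__ == "__main__":
    # Simulated NAT with a 30 s idle timeout, for illustration only.
    print(estimate_timeout(lambda idle: idle < 30.0))
```

Repeating each probe a few times and taking a majority vote would make the bisection robust against random packet loss, at the cost of a longer measurement.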

@synctext
Member Author

  • 2009 Natcracker: Nat combinations matter references birthday paradox, great work by KTH. Claims 27 unique NAT types exists?? 🤔
  • 2005 NATBLASTER: Establishing TCP connections between hosts behind NATs
  • 2024 NATexploder: decentralised federated learning with NAT puncturing 😅
    • ChatGPT 3.5 prompt: Cool existing NAT puncturing algorithms are natcracker and natblaster. please give me another cool name?
    • "NATgrenade", "NATmaverick" more AI generated names
    • "How about "NatForge"? It conveys the idea of crafting or forging a pathway through network address translation (NAT) barriers. It also implies strength and resilience, suggesting that the algorithm can efficiently navigate through NAT configurations."

@OrestisKan

OrestisKan commented Mar 18, 2024

Updates 18/03

  • Timeout is server based, tests both the timeout between sends and the timeout of the response
  • Added MTU calculation and jumbo frames check
  • added function to see amount of simultaneous mappings the NAT can hold
  • Added feature to gather mappings without closing the ports, since some data showed that when the socket closes the NAT mapping disappears (too early to analyse results)
  • currently gathering Belgium data
  • End of the week will gather Norway data

Roaming update: For some SIM cards virtually nothing changes when you leave home, because traffic is tunnelled home. Lyca NL, Lebara NL, MTN CY, and Lebara FR were tested and change the IP while roaming.

Check if while roaming it behaves the same as the partner (open research question)

Server right now:

  • Listens on all 65.5k sockets concurrently.

  • Supports timeout tests, MTU tests, concurrent connections tests and mapping test

  • Create a probability density function on the port mappings.

  • Create state transition function for when the "buckets" shift (stochastic function)

@synctext
Member Author

synctext commented Mar 18, 2024

  • Blockchain Engineering master students are doing binary transfer one team and other advanced team {repo}
  • Weird story about UDP.close() triggering the destruction of port mappings 💩
    • no birthday paradox if you close ports
    • clear protocol layering and decoupling violation
    • my advice: invest maximum 2h to wireshark this phenomenon
  • current 65k open socket server strategy is random
    • starts transmission from random socket
    • all other packets are also random
    • Randomness is essence of birthday collision. If other side has non-randomness, don't do a trivial linear sweep.
    • the reply strategy: off by default (only for time-out measurements)
    • future feature???: reply with 1 UDP response to measure random packet loss
    • record when "connected" if the IPv4 address changes of carrier-grade NAT. (test with 4G.reboot()??)
  • This sprint: please post the probability density function of multiple SIM cards on this issue. As you reported until some unknown condition is met. Try to find out what is happening or what the state transition probabilities are. Also start the first thesis writing: why, what, and how you are measuring.

update idea to use more external IPv4 addresses on your server. That means expanding your testing infrastructure with probing from multiple addresses. Can you start measuring for a while from 1 address and predict what the other address will see as port mapping? {hope this is understandable}.
If any stranger on the Internet can help you predict your port mapping (or not) you've made scientific progress. Both positive and negative outcomes are progress and thesis material. thnx The following 5 IPs are assigned to your server: YYY.ZZZ.119.XXX :

@OrestisKan

OrestisKan commented Apr 5, 2024

Gathered the Belgian and Norwegian data this week and fixed the bugs in the server that were causing it to crash. Updated the paper with some changes on the measurements used and data gathered.

I believe I now have a sufficient number of SIM cards in my possession, so I'll focus on analyzing the results of this sprint.

Todo:

  • Create a probability density function on the port mappings.
  • Create state transition function for when the "buckets" shift (stochastic function)
  • Make use of the multiple IP addresses! Measure for a while from one address and then try to see if I can predict what the other address will see as a port mapping
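
As a starting point for the port-mapping density in the list above, a minimal sketch of an empirical distribution over port bins. The bin width of 256 is an assumption chosen to line up with possible 256-port blocks; the input is simply the list of external ports the server observed for one SIM card.

```python
from collections import Counter

def port_pmf(ports, bin_width=256):
    """Empirical probability mass per port bin.

    Returns {bin_start_port: fraction_of_observed_mappings}, sorted by port.
    """
    counts = Counter(p // bin_width for p in ports)
    total = len(ports)
    return {b * bin_width: c / total for b, c in sorted(counts.items())}

if __name__ == "__main__":
    observed = [0, 1, 256, 257, 258, 512]   # made-up ports for illustration
    for start, mass in port_pmf(observed).items():
        print(start, round(mass, 3))
```

Comparing the result for mappings seen from one server address with that from another address gives a first, crude check of whether one vantage point predicts the other.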

First_5G_deployment_of_Distributed_Artificial_Intelligence.pdf

Planning to charter a private Piper Aircraft soon to do a sim card run in another EU member state

@synctext
Member Author

synctext commented Apr 5, 2024

  • Your thesis does not mention why symmetric NATs exist
  • Good start in Problem Description of "we want distributed machine learning, hence we do ultra low-level networking stuff". You need a more structured storyline:
    • strong focused opening like: "We conducted successful birthday attacks on the Vodafone 5G network to enable pure peer-to-peer based decentralised learning. Our motivation is to empower users to freely use generative AI, without Big Tech, clouds, or servers in general."
    • We aim to give users democratic strategic autonomy
    • Machine learning is essential, but without cloud please. No privacy loss please.
    • We can't yet make heavy machine learning based apps such as TikTok or Youtube with user privacy, autonomy, and essentially cloud-free.
    • Hence we need freedom to communication in 4G/5G, form a cloud-free phone-to-phone overlay, and decentralised federated machine learning.
    • "5G overlay infrastructure for decentralised learning" 💥 😮 👏
    • So no 4 types of NATs according to Huawei in the first pages!
  • How could we use a blog post on a high-traffic nerd website for your thesis?
    • point to your cool stuff (requirement: thesis draft .PDF)
    • point to Android .APK on apps store for broad volunteer testing
    • crawl results for thesis graphs
    • make figures && make thesis.pdf && defend && DONE 🏁
    • Actually, this is no small task to deploy, 5G measure, and crawl. Like a secondary thesis project task
  • (thesis writing style) Preliminary results indicate for 16 Sim-Sim card pairs we obtained an average birthday paradox attack success rate of 10-20% with one outlier of 40% success with 10 tries (e.g. stats caveats). We are investigating the low probability of success and aim to transition from a brute-force methodology towards biased attacks. We are currently collecting more data from our 17 procured SIM cards. 1 week planned for integration of decentralised k-means clustering based ML.
  • Script can not yet exploit the multiple server IPv4 addresses for 5G measurements

@OrestisKan

OrestisKan commented Apr 21, 2024

Updates last sprint:

  • Lebara
    Uses 256 queues of 256 ports each of incremental sequence numbers. The port number of the first port in each queue is always divisible by 256.

  • KPN
    It is the same as Lebara but with fewer users per NAT; thus, in 32.5% of queue assignments, a user is assigned a full queue. In almost 40% of the cases the initial port number of the queue (port number mod 256 = 0) is available.

  • Lyca
    Port numbers are randomly assigned, they follow no order, and there is no particular mapping and they span the whole port number space (after analyzing 0.5M mappings)

  • Vodafone:
    Seems to be randomly assigned but spans only the first 15k port numbers, with some bias towards the middle of that range. Some port numbers are used for multiple connections. Requires more analysis. Seems to follow a normal distribution.
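
The Lebara/KPN observation above (queues of 256 sequentially numbered ports, each starting at a multiple of 256) suggests a much smaller search space once a single mapping is known. A minimal sketch, assuming allocation continues upward inside the same queue and does not wrap around; both are assumptions still to be confirmed by the measurements.

```python
QUEUE_SIZE = 256  # from the observation: 256 queues of 256 ports each

def queue_base(port):
    """First port of the queue that `port` belongs to (always divisible by 256)."""
    return port - port % QUEUE_SIZE

def candidate_ports(observed_port):
    """Ports likely to be assigned next, given one observed external port.

    Assumes sequential allocation within the queue and no wrap-around.
    """
    end = queue_base(observed_port) + QUEUE_SIZE
    return list(range(observed_port + 1, end))

if __name__ == "__main__":
    print(queue_base(51234), len(candidate_ports(51234)))
```

With such a model a biased attack probes at most 255 ports instead of sampling the whole 16-bit space, which is where an improvement over the uniform birthday attack would come from.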

Updated thesis:

Next Sprint:

  • Discuss thesis committee and schedule defence day, since I'll need to book tickets for it, and summer will make it harder to schedule a day that suits everyone.
  • Fix problem description
  • Finish analyzing behaviour of NATs (Vodafone + other European ones, + figure out the policy of how you are assigned to a queue for Lebara and KPN)
  • Use the insights from these analyses into a birthday attack and evaluate whether there are improvements and document them

@synctext
Member Author

synctext commented Apr 24, 2024

  • great progress with thesis!
  • No "Fix problem description" please, first finish the experiments and thesis section
  • Work towards green light moment please 🍏
  • UK, Romania, Croatia SIM cards plans are ongoing 👏 (16 SIMs operational)
  • e-SIM buy on Internet for final big thesis numbers 🐎
  • 24h prediction cycle. Congestion causes increased prediction difficulty. Empty at night 🌃
  • passport requirement for enrolment!??
  • Apple might cause an explosion of on-device LLM research 😲
  • TABLE II: Nat Types of all the carriers tested and the location of the test
    • add the Norway 6 seconds and 300s Vodafone
    • Puncture difficulty: 👌 to 🔥 🔥 🔥 🔥 🔥 ?
    • try some MOD 256 math expression (like Bubblesort complexity or trust formula)
    • Example from prior Delft thesis work “Universal Trust Machine”, https://arxiv.org/abs/2301.06938: image
  • Student work of IPv8 on Android 3 MByte/sec binary transfer 💥

@OrestisKan

Vodafone NL fitted on a beta distribution
vodaphone-betta-distribution
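
For reference, a beta fit of this kind can be approximated without external libraries using the method of moments. This is a sketch and not necessarily the fitting procedure behind the figure: it assumes ports are scaled into [0, 1] by dividing by an upper bound of 15000 (the "first 15k port numbers" reported earlier), and the function name is illustrative.

```python
from statistics import fmean, pvariance

def fit_beta_moments(ports, max_port=15000):
    """Method-of-moments estimate of Beta(alpha, beta) for ports scaled to [0, 1].

    max_port is an assumed upper bound of the observed port range.
    """
    xs = [p / max_port for p in ports]
    mean = fmean(xs)
    var = pvariance(xs, mean)
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("sample moments are not compatible with a beta distribution")
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common
```

A maximum-likelihood fit would usually be tighter, but the moment estimate is a quick check that the shape parameters are stable from one measurement day to the next.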

@OrestisKan

OrestisKan commented May 22, 2024

Progress update:

shifted focus from buying SIMs to getting the library to work. Library for BirthdayAttack is done and theoretically works on the unit test by testing historical data, and the algorithms for port prediction seem to be an improvement from randomness.

After importing this into an Android app, compiling it, and running it on the phone, it stops sending packets at around the 29k mark (out of ~250k). No error, no crash; it simply stops sending packets (a print message is written every time a packet is sent).

I bought physical sims from different carriers: 4 from Cyprus, 2 from Romania, and 6 from the UK, bringing the total to 21. Waiting for delivery for sims from Greece, Portugal and possibly Turkey (Turkey is not guaranteed due to extra charges and generally hard to test; maybe I can manage 1)

@synctext
Member Author

synctext commented May 22, 2024

  • Not going as fast as desired.
  • Focus on getting sufficient experimental results to claim functional Edge-AI on 5G 👁️
  • EdgeAI-5G: 5G Edge AI with full decentralisation and NAT puncturing
  • reduce scope of thesis, just only buy 2 e-SIMs, just to get a light feel for it.
  • key underlying result: how much effort to connect various EU SIM cards (seconds delay+byte cost)
  • then you can do binary transfer and on-device k-means stuff.
  • hard deadline 30sep thesis.

@OrestisKan

OrestisKan commented Jun 8, 2024

@synctext roaming potentially ruins the birthday attack. Foreign (roaming) carriers all seem impenetrable, which is suspicious. Local carriers (the Netherlands before, Cyprus now) seem to work fine. Currently looking into it, but carriers that were easy to penetrate suddenly are not as soon as one is roaming (even though the roaming IP shows an IP of the carrier's country). Looking into whether the mapping changes while roaming and how the behaviour changes.

Update: the NAT mapping timer of Norwegian carrier Telia falls from 300 seconds in Norway to just 2 seconds when roaming, and MyCall's falls to 17. The time to live of Belgium's Lyca is so short that it is not even logged.

Testing KPN showed no change in the timer, but hours of unsuccessful attempts to penetrate it make me believe that there is some difference. Same with Vodafone NL, which was very easy to penetrate and is now suddenly impossible to penetrate.
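
The mapping timers mentioned above (300 s, 17 s, 2 s) can be bracketed with a simple bisection. This is a minimal sketch, not the measurement code from the thesis: `mapping_alive_after` is a hypothetical probe that opens a fresh mapping, stays silent for the given number of seconds, and reports whether an inbound packet still gets through. It also assumes the timer is stable between probes, which may not hold for a carrier whose behaviour changes over time.

```python
from typing import Callable, Tuple


def bracket_mapping_timeout(
    mapping_alive_after: Callable[[float], bool],
    max_idle_s: float = 600.0,
    resolution_s: float = 1.0,
) -> Tuple[float, float]:
    """Return (lower_bound, upper_bound) in seconds for a NAT mapping timer.

    mapping_alive_after(idle) is a hypothetical probe supplied by the caller:
    True if the mapping still forwards inbound packets after `idle` seconds
    of silence, False if it has expired.
    """
    if not mapping_alive_after(0.0):
        raise RuntimeError("no mapping could be opened at all")
    if mapping_alive_after(max_idle_s):
        # The timer is longer than the longest idle period we are willing to wait.
        return max_idle_s, float("inf")
    low, high = 0.0, max_idle_s  # invariant: alive at `low`, expired at `high`
    while high - low > resolution_s:
        mid = (low + high) / 2.0
        if mapping_alive_after(mid):
            low = mid
        else:
            high = mid
    return low, high


if __name__ == "__main__":
    # Simulated NAT whose mapping expires after 17 seconds of silence.
    print(bracket_mapping_timeout(lambda idle: idle < 17.0))
```

Each probe costs roughly `idle` seconds of waiting, so the cap `max_idle_s` dominates the measurement time for carriers with long timers.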

@synctext
Member Author

synctext commented Jun 11, 2024

  • roaming and non-roaming is very distinct! Double the measurement results 💥
  • Maximise automation of measurement
  • last graph with new experimental results was 25 April.
  • Thesis polish examples
    • Table 1; each attack is 170k packets, not mentioned
    • transaction ID undefined in algorithm 6
  • Thesis structure
    • detailed measurement of each carrier
      • time-dependent open mappings! 😱
      • congestion of network around you or connected radio tower
    • also optimised puncture method, like taking the Vodafone port probability into account
    • manual tuned methods
    • new section with complete matrix? and double -but partial- with roaming issue
  • simultaneous socket issue (64k server + 15k Android)
    • try to quantify how hard things are. Example: will a 2 second timeout give a puncture time comparable to when the sun dies?
  • central storyline: highly unstable, unpredictable, random technique. depends on SIMs, depends on time of day, depends on location (roaming).

@OrestisKan

Latest paper with new graphs and all.

The improvement to the Birthday Attack seems to be working. Waiting for Odido results (~Saturday/Sunday) to quantify by how much it improves (hypothesis test).

TODOs:

  • Improve text
  • Fix background section
  • Introduction
  • Conclusion
  • Future Work
  • Abstract
  • Fill in all tables when all data are done (~2 weeks tops)
  • Add Cyprus data when I get the phones back?
  • Cleanup libraries to make them production ready

First_5G_deployment_of_Distributed_Artificial_Intelligence.pdf

@synctext
Member Author

synctext commented Jun 28, 2024

  • Wording of master thesis is still in early rough draft form. Results are nice. No communication lib or AI use-case yet 🧐
  • Gathering 5G data means 4-5 days of work per 5G provider. Analysis is also lots of manual work.
  • "anecdotal evidence of the change": no anecdotes in the experimental section please
  • "Upon attempting to remeasure this from Cyprus": too informal language for a master thesis; rather "roaming behaviour and performance analysis"
  • "TABLE IV: Timeouts of various carriers in seconds. On the left side are timeouts for waiting for communication establishment (i.e., the initial packet is sent, but no response has yet been received from the server). On the right side is the communication timeout, i.e., communication is established, but no communication occurs. (UB=Upper Bound, LB= Lower Bound. The two-letter code next to the carrier is the ISO country code.)": use short, single-line table titles. Explanation goes into the text.
  • "TALK about the new library etc and analyse the results blabla The evaluation results of 10 runs per carrier using the improved Birthday attack are shown in table": the network-aware exploit for connectivity section is still 4 lines, with an outdated results table 👀
  • "F. Improved Birthday Attack Evaluation and Findings": upgrade to a section. 5G provider-aware NAT puncturing
  • Two experimental sections: 1) understand these 5G providers 2) improved NAT-puncturing
  • Need for a 3rd experimental section with scientific core (final experimental section, 2 weeks)
    • you show that with minimal work you can do Phone-to-Phone AI swarm intelligence (Scann lib)
    • OR another one is Phone-to-Phone TikTok alternative in our Superapp
    • we have running code for K-Means on phones, but complex code

@OrestisKan

OrestisKan commented Jun 28, 2024

By next meeting:

  • Fill in all tables missing data
  • Explain that there is an improvement in connectivity
  • Improve the writing and make it scientific, more coherent and more explanatory

Least risky 3rd exp section is p2p tiktok

@OrestisKan

OrestisKan commented Jul 3, 2024

Updated and improved the text of the METHODOLOGY chapter. Made it more clear and explained why each test is useful. Better explained the algorithms and the tests that were to be performed

First_5G_deployment_of_Distributed_Artificial_Intelligence.pdf

@OrestisKan

First_5G_deployment_of_Distributed_Artificial_Intelligence.pdf

Improved the writing in methodology and evaluation; added hypothesis testing to show the increase in connectivity; presented the new library; analysed Odido, Orange FR, and SFR; added parameters of Epic CY and CytaMobile-Vodafone CY.

@synctext
Member Author

synctext commented Jul 16, 2024

Taxonomy of NAT boxes and blog posts are not introduction or problem description section material (.5 page). Methodology is "Architecture and design of ..."

Example of intro (page 1 of thesis only + abstract):
This work empowers citizens to take back control of The Internet and AI. AI is expected to improve industries like healthcare, manufacturing and customer service, leading to higher-quality experiences for both workers and customers. The leading AI hardware company could be worth $50 trillion within a decade in the scenario where generative AI leads to an industrial revolution. Companies are making small steps toward "Level 5 intelligence". Companies are expected to profit, leading to further market concentration. Ordinary citizens are not expected to profit from this, increasing inequality. Politicians publicly ponder: whoever controls AI will control the world. Our work is meticulously designed to counter this.

Our work empowers citizens to take back control of their life. More specifically, we present the self-organising technology stack to take back The Internet and AI. Who owns The Internet? Who controls AI? The Internet is essentially private property, with few exceptions. Big Tech AI is built with copyrighted works [REF]. Google, Facebook, Amazon, Apple, Tencent, and others operate the central components of our daily digital lives. For instance, we require permission from Google and Apple to publish software for mobile devices. Their monopoly power means no other meaningful method exists to reach billions of smartphone users with newly created apps. News and media are dominated by American and China-based AI-driven monopolies.

We introduce a novel type of low-level network overlay and proof-of-principle zero-server AI network. Our zero-server architecture offers various networking primitives. These serve as the basic building blocks for creating full fledged AI alternatives for the services of "trusted" third parties or Big Tech companies.

We crafted a decentralised TikTok to demonstrate the viability of our work. Our proof-of-principle social media app does not require any servers, avoids using any cloud, bypasses the need for any legal entity, and abstains from any centrality in general. Relentless improvements in mobile hardware now enable both generative AI and on-device alternatives for the cloud. Our app builds a fully decentralised media experience by building upon the swarming-based BitTorrent protocol.

Our main contribution is bypassing the carrier-grade NAT hardware inside 5G networks. The depletion of IPv4 addresses and lack of cybersecurity forces 5G network operators to violate Internet protocols.
=== END of INTRO ===

Problem Description
Our central scientific problem is how to devise an AI-overlay network on 5G. We aim to create a new ownership model of generative AI. By creating fully decentralised machine learning on 5G it is possible to make AI which is owned by both nobody and everybody. The challenge is to overcome the restricted communication in 5G networks. These mobile networks are exclusively designed to communicate with the cloud. Since 2017 Delft University has successfully expanded its research on decentralised 5G overlay networks. See the Android devices participating in a peer-to-peer based network overlay. {INSERT PICTURE PAGE 2}

Architecture of an AI overlay network on 5G {1 page or less}
We present the detailed technical work required for creating
{INSERT PICTURE of 2-3 phones running decentral tiktok with successful puncture SIMs}
We have a different design and a different approach than the one the TinyML community is pursuing. Our architecture is designed to decentralise generative AI, as demonstrated in our prior AI decentralisation efforts. The ability for autonomous AI agents to communicate using 5G is cardinal.

Extensive measurements of 5G networks
Present results. Present that Table 2.

  • Carrier-grade NATs solve an old unsolved cybersecurity issue: nobody can prevent you from communicating on The Internet. Drain the battery of any smartphone user you dislike. {not mentioned, just IPv4 depletion, phones also have IPv6 often}
  • Algorithms mentioned are not algorithms, just pseudo code with implementation details like create socket and including a pseudo-unique measurement identifier
  • Too trivial to mention, just 10 lines. Algorithm 2 Function to find the connection initiation timeout upper and lower bounds {same for algorithm 3}

@OrestisKan

OrestisKan commented Jul 16, 2024

  • Fix algorithms to appeal to math people
  • Fix methodology chapter to be Architecture and implementation etc
  • Create a third experiment with TikTok
  • Write introduction
  • Fix background
  • Combine table II with table V (MTU and Jumbo frames) and the table of NAT types
  • Add the hypothesis test parameters H_0, H_1 and the significance value somewhere in table VII so that it is self-explanatory
  • Finish analysing all providers that I gathered data from
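
One possible way to write down the hypothesis test parameters from the list above, assuming a one-sided comparison of connection success probabilities between the baseline and the improved Birthday Attack. The actual test and significance value used in the thesis are not given in this thread; $p_{\text{base}}$, $p_{\text{impr}}$ and $\alpha$ are placeholder symbols.

```latex
H_0:\; p_{\text{impr}} \le p_{\text{base}}
\qquad
H_1:\; p_{\text{impr}} > p_{\text{base}}
\qquad
\text{reject } H_0 \text{ if the } p\text{-value} < \alpha
```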

@synctext
Member Author

synctext commented Sep 4, 2024

Draft editing for possible APNIC blog post

5G carrier-grade NAT puncturing for social good

Students from Delft University of Technology have been trying to fix The Internet for 25 years. One specific master thesis we want to highlight is the successful puncturing of a symmetric NAT (Network Address Translator) in a mobile 5G network. It was used to create a decentralised TikTok alternative. Many of our students create operational distributed systems for social good. Our dream is that one day our Internet protocols for identity, trust, money, data, and intelligence will re-decentralise The Internet.

University education sometimes has a gap with the outside world. Education from the Tribler Lab at Delft University specifically aims to close that gap. Within our courses at BSc, MSc, and PhD level we aim to confront students with real-world complexity. To prepare students for a company job we require that students push running code to GitHub for a passing grade.

Running code of 5G puncturing.
(ToDo expand into details)
We explain in great detail how a master student travelled through Europe to buy 4G/5G SIM cards

Delft students developed operational Internet protocols for identity, trust, money, data, and intelligence. Contributing to social good is the focus of our learn-by-doing educational methodology. In the past 25 years we have seen a growing amount of political and societal passion in our students. Cynical voices within education are negative about Gen Z. It was said during the opening of the academic year in Eindhoven that we all turned "fat, lazy, and happy". Our experience is different, but education needs to evolve as generations differ. Gen Z is exposed to TikTok videos about doomsday glaciers, forever wars, and flooding of coastal cities within their lifetime. We see that many of our students get maximum motivation by being able to contribute to something bigger. At Delft we see that Gen Z demands more than a single common lab assignment given to the yearly population of 500 computer science engineers. Individual projects to fix the world lead to superior learning outcomes. The boomer generation shaped society mostly based on capital. Gen Z now lives on an Internet where a few companies run everything and have become quasi-monopolists in their respective domains, such as search, social networks, or e-commerce. This affects not only the online life of students, but also defines the offline rules, where these gatekeeping companies impact sectors such as taxi services, dining, and retail.

Complete infographic of AI society.
Trust, pure-phone, SSI, DAO, markets, Euro, LART?
1999 example Aaron?

puncturing of 5G infrastructure of operators
For the past 25 years Delft University of Technology is actively

OK

@synctext
Member Author

synctext commented Sep 9, 2024

More polish is needed; email thesis 10Sep evening.

  • TABLE II: Timeouts of various carriers in seconds. Polish 2x location columns, replace with country name or delete.
  • Same for tunnel. Make complete or remove
  • Polish: "Evaluation results, shown in Table III, indicate that provider-aware NAT puncturing led to successful connections in four additional combinations". The actual table is a few pages later, please fix.
  • "From the Birthday Paradox calculator [14], one can get a 50% success rate of a match after sending 77162 packets, and for a 99.9% success rate, 243587 packets are needed." This is incomplete! Either add that you're assuming that ports never close again on the NAT device or give a timeout of a certain number of minutes. Second, you're also assuming a certain pool size of Internet IPv4 addresses, from which a semi-random number is again drawn for both sides.
  • Page 4 says: "This will still be attempted based on the Birthday Paradox 99.9% likelihood of success, i.e. a connectivity attempt is comprised of 243587 packets." Page 9 uses this silent definition of "Attempt" without explanation or reference. A clearer name would be "Puncture Unit"; it is hard to justify this semi-arbitrary Orestis numbering system.
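
The packet counts quoted in the list above can be roughly reproduced with the standard birthday approximation n ≈ sqrt(2 · d · ln(1 / (1 − p))). This is a minimal sketch, assuming a space of d = 2^32 equally likely values (a size inferred from the quoted numbers, not stated in the thesis excerpt) and mappings that never expire while packets are being sent, which is the very assumption flagged above as missing. The function names are illustrative.

```python
import math


def packets_for_success(p: float, space: int = 2**32) -> float:
    """Approximate number of random draws needed for a match with probability p.

    Standard birthday approximation: n ~ sqrt(2 * d * ln(1 / (1 - p))).
    `space` (d) defaults to 2**32, an assumption inferred from the quoted figures.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


def success_probability(n: int, space: int = 2**32) -> float:
    """Approximate match probability after n draws (same approximation)."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))


if __name__ == "__main__":
    for target in (0.5, 0.999):
        print(target, round(packets_for_success(target)))
    print(success_probability(77162), success_probability(243587))
```

Under these assumptions both results land within about ten packets of the quoted 77162 and 243587. A finite mapping timeout shrinks the number of mappings that are open at the same time, so the real success rate can be much lower than this bound suggests.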

@synctext
Member Author

synctext commented Sep 18, 2024

ToDo:

  • defense room booking
  • Front page cover, https://github.com/Tribler/tribler/files/12357573/FROSTDAO__Collective_Ownership_of_wealth_using_FROST.pdf
  • Library upload
  • MARE
    • student Final Examination Form
    • [X]
  • Email final thesis to 2 professors;
    • add small comment: "added figure 1" + carrier-aware NAT puncturing clarification on page X
  • invite the Tribler lab to your defence
  • Presentation (share draft on Monday!!!)
    • restricted communication + restricted consumer electronics
    • Problem Description
    • Symmetric NATs Figure 1 picture + picture of SIM cards
    • Puncturing + Birthday attack
    • Experimental setup
    • Highlighted measurement result 1-2 5G networks
    • Highlights of puncturing tool (Table III)
    • Screenshot (or animated .GIF) of DeToks
