Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added sampling to dnscap #15

Open
wants to merge 13 commits into
base: develop
Choose a base branch
from
Open

Added sampling to dnscap #15

wants to merge 13 commits into from

Conversation

chegger
Copy link

@chegger chegger commented Sep 14, 2016

Added sampling to dnscap. With the new -q option, the user is now able to decide whether the output should be sampled as follows: Only every nth query is output and only responses that correspond to one of these sampled queries will be output.
e.g.:
dnscap -r ./file.pcap -g -q 10 will output only every 10th query and these queries' corresponding responses.
Due to the nature of this sampling, the -q option is not compatible with -s r (of which the user would informed by usage()).

Updated dnscap.1.in to match this addition.

Added uthash/uthash.h since I needed a hashtable for the sampling to work. The dnscap and uthash licenses should be compatible.


I've made a very similar pull request at an outdated repository about 16 days ago already:
#10

Now that I' have based my work on the DNS-OARC repo, there are still two issues to talk about:

  • "Making the -q option more versatile"
    According to an unfinished master's thesis, sampling the way I implemented, is as suitable as a random distribution for DNS traffic. Since this thesis is not published yet, I cannot provide a source to it.
    Also, might it not be difficult to produce good random numbers at such high package processing speeds? A way could be using the content of the single packets to produce random numbers from (like the transaction id)?.
    All in all, I'm worried that certain packages could be favored over others.
  • "Using the already existing generic hashtable dnscap/plugins/rssm/hashtable.c"
    Simply put, I think it would mean a lot of unnecessary work that I probably can't do properly, since I don't have a lot of experience yet. This would probably make the hash table rather inefficient.
    I have absolutely no objection to somebody else doing so, though (although uthash seems quite OK to me).

Added a module that allows the output to be sampled. With the new -q option, the user is now able to decide whether the output should be sampled as follows: Only every nth query is output and only responses that correspond to one of these sampled queries will be output.
e.g.:
dnscap -r ./file.pcap -g -q 10 will output only every 10th query and these queries' corresponding responses.
Updated dnscap.1.in to match the Sampling Module addition
uthash is used as the hash table library in the Sampling Module
Added a module that allows the output to be sampled. With the new -q option, the user is now able to decide whether the output should be sampled as follows: Only every nth query is output and only responses that correspond to one of these sampled queries will be output.
e.g.:
dnscap -r ./file.pcap -g -q 10 will output only every 10th query and these queries' corresponding responses.
@jelu
Copy link
Member

jelu commented Sep 15, 2016

You are calling this a module (or plugin) but it is not one, is there a reason you did not implement this under plugins/sampling/ ?

Copy link
Member

@jelu jelu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, please use more spaces in both control sections and argument lists and try to mimic the coding style of the rest of the code.

@@ -53,6 +53,7 @@
.Op Fl E Ar datetime
.Op Fl P Ar plugin.so
.Op Fl U Ar str
.Op Fl q Ar unsigned int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The phrasing unsigned int is not very user friendly, use something like nth instead.

@@ -353,6 +354,11 @@ Enable immediate mode on interfaces.
Append "and
.Ar str
" to the pcap filter.
.It Fl q Ar unsigned int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I'll change it!

Causes the output to be sampled after the application of all other filters.
Only every nth dns initiation will be output and responses are only output
if they correspond to one of the sampled queries. This option cannot be
used with option.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last sentence is unclear, specify the options that can't be used.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very sorry, this must have been a copy and paste error. The last sentence is only half of the sentence it should be, I'll correct that.

@@ -1099,6 +1130,11 @@ parse_args(int argc, char *argv[]) {
usage("the -L and -l options are mutually exclusive");
if (background && (dumptrace || preso))
usage("the -b option is incompatible with -d and -g");
if (sample && chooseSidesResponse)
{
if (!((dir_wanted & DIR_INITIATE) != 0) && ((dir_wanted & DIR_RESPONSE) != 0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why check this? If both are true then both options have been used.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be incompatible with -s r. I fixed this flaw.
I'm actually sorry, I thought I had tested this..

if (sample == TRUE)
{
ns_msg dnsmsgSample;
ns_initparse(dnspkt,dnslen,&dnsmsgSample);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part of libbind and dnscap can be compiled without libbind.

Copy link
Author

@chegger chegger Sep 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean ns_msg and ns_initparse? Is there another way to retrieve the transaction id of a query? I've seen ns_initparse being used in the output functions in dump_dns.c, so I assumed it was safe to use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at the top #if HAVE_NS_INITPARSE && ..., you will need to disable all the sampling code if libbind is missing or create your own functions for parsing the required information.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I'll disable the sampling code if libbind is missing. In future i might try to write my own parsing functions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sampling code is still compiled even if libbind does not exist, checking if sample is true or not does not remove the dependency of the functions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but in parse_args() sample is only set to true if libbind does exist, by checking #if HAVE_NS_INITPARSE && HAVE_NS_PARSERR && HAVE_NS_SPRINTRR . Is this checkback not suitable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do know the difference between if() and #if right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do and I just realized where my mistake in thinking was at. I will work on it right away.

ns_initparse(dnspkt,dnslen,&dnsmsgSample);
samplePacket *currentQuery;
void *ipAddrBufferFrom = &from.u;
void *ipAddrBufferTo = &to.u;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for these void * you can type cast it in one go below.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh okay, I initially tried to cast it in one go, but i got a compiler error. Ill try again, thank you!

querycount++;
if(querycount % sampleAmount == 0)
{
currentQuery = malloc(sizeof(*currentQuery));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not check if malloc() was successful.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhm okay, thank you! I'll add all malloc checks that need to be added.

}
else
{
sample_lookup_key *lookup_key = (sample_lookup_key*)malloc(sizeof(*lookup_key));;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not check if malloc() was successful.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhm okay, thank you! I'll add all malloc checks that need to be added.

Copy link
Author

@chegger chegger Sep 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, i should also you calloc instead of malloc, since I initialize the pointers with 0s afterwards anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, one calloc() instead of malloc() and memset().

{
currentQuery = malloc(sizeof(*currentQuery));
memset(currentQuery, 0, sizeof(*currentQuery));
currentQuery->from = *fromBuffer;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are meant to be in samplePacket->from ?
From the struct definition from is just an unsigned and this would essentially put the first unsigned from the fromBuffer into currentQuery. That would be the first part of iaddr.u and that is only the IPv4 address (or a very small part of it) which means this does not work for IPv6.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

samplePacket->from should contain IP addresses (both IPv4 and v6 would be optimal).

On my system unsigned is 32 bits (which is the size of IPv4 adresses), so it should be able to store a full IPv4. I'll change it to uint_32 so it also works on all systems.
As to IPv6, I did not find out how to handle such big numbers properly. I thought it would be better to take a small portion of the IPv6 addresses than not to work with IPv6 at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry you need to find a way to support both v4 and v6 otherwise I will reject the pull request. I do not know how exactly uthash works but maybe you can use the either iaddr which from is.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially used from but I came to the conclusion that it doesn't work properly. I'll give it another try though. If I don't get it to work like this, I'll look for another way to support IPv6 addresses.

In case I can't get it to work with IPv6 at all, I'll inform you, so you can reject this pull request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at plugins/rssm/rssm.c for an example of how to make a hash table with both v4 and v6 addresses ("my_hashtbl").

dnscap_common.h defines 'iaddr' which you should use to store v4/v6 addrs.

rssm.c (and other plugin .c files) have iaddr_hash().

At some point it would make sense to move hashtbl.c and supporting functions out of plugins and into src.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a way that will work perfectly fine. Ill commit it on Monday!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry but you have implemented this in a very dangerous way and the use of AMOUNT_32BIT_IN_128BIT is unneeded. You are copying in and out of different structures without any check of structure sizes and seeing that the hash can take binary blobs as keys you should be able to put two iaddr structures within the hash entry for both to and from v4/v6 addresses.

@chegger
Copy link
Author

chegger commented Sep 15, 2016

You are calling this a module (or plugin) but it is not one, is there a reason you did not implement this under plugins/sampling/ ?

You are right, this is not a module or a plugin, that's why it's not implemented in plugins/sampling/. The reason why I call it module is that I lack better terminology. What should it be called? An Addition?

Overall, please use more spaces in both control sections and argument lists and try to mimic the coding style of the rest of the code.

Alright! I was under the false impression that I did an alright job at mimicking the coding style.

By the way, I'm genuinely grateful for your extremely constructive feedback and your time! It's worth a lot!

@jelu
Copy link
Member

jelu commented Sep 15, 2016

Just "sampling" is sufficient.

@jelu
Copy link
Member

jelu commented Sep 15, 2016

I also see that you have created this pull request from your develop branch, please in the future do pull requests from feature branches.

git checkout -b feature_name develop
git commit ...
git push -u origin feature_name

@chegger
Copy link
Author

chegger commented Sep 15, 2016

I also see that you have created this pull request from your develop branch, please in the future do pull requests from feature branches.

Okay, thanks for letting me know! As of now, I haven't even come to set up github locally on my computer yet. I'm still exclusively using the web interface. Takes some time to get into it.

@chegger chegger changed the title Added a Sampling Module to dnscap Added sampling to dnscap Sep 15, 2016
Changed parameter for the q option from "unsigned int" to "nth"
changed the parameter of the q option from "unsigned int" to "nth"
Applied most of the suggested corrections suggested by jelu
Disabled the sampling code if dnscap is not compiled with libbind.
if (!((dir_wanted & DIR_INITIATE) != 0) && ((dir_wanted & DIR_RESPONSE) != 0))
usage("the -q option is incompatible with -s r");
if(((dir_wanted & DIR_RESPONSE) != 0) && ((dir_wanted & DIR_INITIATE) == 0))
usage("the -q option is incompatible with -s r");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second glance I do not see the reason to have chooseSidesResponse, dir_wanted is set to both direction at start and you really only need to check if DIR_RESPONSE is not set.

Corrected chooseSidesResponse logic.
Added a proper way to support ipv6 addresses for sampling.
Fixed the libbind dependency logic for sampling
Updated the support for ipv4/v6 in the sampling code.
Also made some slight changes to the libbind dependency logic.
rephrased a comment from "Sampling Module" to "sampling"
@chegger
Copy link
Author

chegger commented Sep 26, 2016

Well, @jelu what do you think now?

@jelu
Copy link
Member

jelu commented Sep 26, 2016

Sorry @SuperAmbitiousItGuy but I am pressed for time on other issues this week, hope to look at this next week and I won't be including this in v1.1.0 so it will not be merged just yet even if approved.

jelu added a commit that referenced this pull request Oct 11, 2016
Update pcap_thread to v1.0.1, add travis check that dnscap can run
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants