Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle functions stored and called from struct fields #23

Open
picatz opened this issue Jan 3, 2024 · 4 comments
Open

Handle functions stored and called from struct fields #23

picatz opened this issue Jan 3, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@picatz
Copy link
Owner

picatz commented Jan 3, 2024

Today, functions that are stored and called from struct fields aren't handled properly when constructing the call graph. Let's look at the following example program where a function called (run) is used in the command struct:

package main

import (
	"database/sql"
	"fmt"
	"net/http"
	"os"
)

func login(db *sql.DB, q string) error {
	_, err := db.Query(q) // want "potential sql injection"
	if err != nil {
		return err
	}
	return nil
}

func startServer() error {
	db, err := sql.Open("sqlite3", ":memory:")
	if err != nil {
		return err
	}

	mux := http.NewServeMux()

	mux.HandleFunc("/login", func(w http.ResponseWriter, r *http.Request) {
		pass, _ := r.URL.User.Password()
		login(db, pass)
	})

	return http.ListenAndServe(":8080", mux)
}

type command struct {
	name string
	run  func(args []string) error
}

type commands []*command

func (c commands) run(args []string) error {
	for _, cmd := range c {
		if cmd.name == args[0] {
			return cmd.run(args[1:])
		}
	}

	return fmt.Errorf("unknown command: %s", args[0])
}

type cli struct {
	commands commands
}

func (c *cli) run(args []string) error {
	return c.commands.run(args)
}

func main() {
	c := &cli{
		commands{
			{
				name: "start-server",
				run: func(args []string) error {
					startServer()
					return nil
				},
			},
		},
	}

	err := c.run(os.Args[1:])
	if err != nil {
		panic(err)
	}
}

The resulting call graph ends up being:

n10:net/http.ListenAndServe

n2:main$1
        → n3:startServer

n3:startServer
        → n7:database/sql.Open
        → n8:net/http.NewServeMux
        → n9:(*net/http.ServeMux).HandleFunc
        → n10:net/http.ListenAndServe

n6:(*database/sql.DB).Query

n5:startServer$1
        → n4:login
        → n11:(*net/url.Userinfo).Password

n7:database/sql.Open

n8:net/http.NewServeMux

n9:(*net/http.ServeMux).HandleFunc
        → n5:startServer$1

n11:(*net/url.Userinfo).Password

n0:main
        → n1:(*cli).run

n1:(*cli).run

n4:login
        → n6:(*database/sql.DB).Query

☝️ Notice how (*database/sql.DB).Query is there, but there's no call path that can be constructed back to main. To illustrate this, we can start at the SQL query, and work our way back:

n6:(*database/sql.DB).Query → n4:login → n5:startServer$1 → n9:(*net/http.ServeMux).HandleFunc → n3:startServer  → n2:main$1

n2:main$1, while it looks like it's n0:main, and that's even almost the case, it's really the anonymous function defined in main (func(args []string) error {...}). It's also just a coincidence of the example, and that "chain" could be broken otherwise; like, if c := &cli{ ... } happened in a global variable instead.

So, we need a way to link n2:main$1 back to n0:main to solve this issue.

@picatz picatz added the bug Something isn't working label Jan 3, 2024
@picatz
Copy link
Owner Author

picatz commented Jan 19, 2024

A visualization of the call graph being constructed, hopefully demonstrating the missing link that needs to be achieved to fix this bug:

untitled (13)

@picatz
Copy link
Owner Author

picatz commented Jan 19, 2024

Some other call graph algorithm visualizations for fun:

data = {
  const csvData = await FileAttachment("[email protected]").csv();
  
  // assuming your data has source and target columns
  const links = csvData.map(d => ({source: d.source_func, target: d.target_func}));
  
  // get unique nodes
  const nodes = Array.from(new Set(links.flatMap(l => [l.source, l.target])), id => ({id}));

  return {nodes, links};
}
ForceGraph(data, {
  nodeId: (d) => d.id, // node identifier, to match links
  nodeTitle: (d) => d.id, // hover text
  width: 4000,
  height: 4000,
  invalidation // stop when the cell is re-run
})

callgraphutil.NewVulncheckCallGraph(ctx, mainFn.Prog, srcFns)

untitled (14)

rta.Analyze([]*ssa.Function{mainFn}, true).CallGraph

untitled (15)

cha.CallGraph(mainFn.Prog)

untitled (17)

static.CallGraph(mainFn.Prog)

untitled (16)

cg := vta.CallGraph(vtaFns, cha.CallGraph(mainFn.Prog))

vtaFns := make(map[*ssa.Function]bool)
for _, fn := range srcFns {
	vtaFns[fn] = true
}
vtaFns[mainFn] = true

untitled (18)

@picatz
Copy link
Owner Author

picatz commented Jan 19, 2024

It's interesting seeing the similar structure in the CHA or RTA construction and the one from callgraphutil being used right now in sqli and other analyzers:

Note

While these structures look very similar, there are some key differences in the construction that aid in later analysis.

For example, looking at rta the "middle" node is actually a (reflect.Value).Call, not an (*net/http.ServerMux).HandleFunc.

callgraphutil.NewGraph(mainFn, buildSSA.SrcFuncs...)

Screenshot 2024-01-18 at 8 24 25 PM

rta.Analyze([]*ssa.Function{mainFn}, true).CallGraph

Screenshot 2024-01-18 at 8 24 34 PM

cha.CallGraph(mainFn.Prog)

Screenshot 2024-01-18 at 8 29 10 PM

@picatz
Copy link
Owner Author

picatz commented Feb 19, 2024

Continuing to revisit this problem, attempting to rephrase it: I'm trying to find instances of function calls that are reachable from main. Functions stored and called from struct fields, especially those called extra indirectly from slice indexing, need to be accounted.


This also begs the question, for the case of SQL injection analysis, if there might be merit in dropping the "reachable from main" requirement for results. If we worked "backwards" in the analysis, starting from a (*database/sql.DB).Query node, we could identify if it was used within the http.Handler, and the related *http.Request object used with it:

n6:(*database/sql.DB).Query → n4:login → n5:startServer$1 → n9:(*net/http.ServeMux).HandleFunc 

Today, we identify paths from the "root" (main), and then find paths to our sinks (like the SQL query):

taint/check.go

Line 70 in ca0546c

sinkPaths := callgraphutil.PathsSearchCallTo(cg.Root, sink)

🤔 If we started at n6, we could work out who directly called it (n4), and continue along until we reach one of our sources. I'm curious if it may also end up being faster / cheaper in the long run, because we don't always need to re-start out searches from main.


But, it might be nice to have both "forward" and "backward" call graph path analysis that are robust to use the "right tool for the job" when possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant