diff --git a/README.md b/README.md
index cbc1dce..a6c45f7 100755
--- a/README.md
+++ b/README.md
@@ -3,113 +3,35 @@ CIS565: Project 2: CUDA Pathtracer
 -------------------------------------------------------------------------------
 Fall 2012
 -------------------------------------------------------------------------------
-Due Friday, 10/12/2012
+Due Sunday, 10/12/2012
 -------------------------------------------------------------------------------
 
--------------------------------------------------------------------------------
-NOTE:
--------------------------------------------------------------------------------
-This project requires an NVIDIA graphics card with CUDA capability! Any card after the Geforce 8xxx series will work. If you do not have an NVIDIA graphics card in the machine you are working on, feel free to use any machine in the SIG Lab or in Moore100 labs. All machines in the SIG Lab and Moore100 are equipped with CUDA capable NVIDIA graphics cards. If this too proves to be a problem, please contact Patrick or Karl as soon as possible.
-
--------------------------------------------------------------------------------
-INTRODUCTION:
--------------------------------------------------------------------------------
-In this project, you will extend your raytracer from Project 1 into a full CUDA based global illumination pathtracer. 
-
-For this project, you may either choose to continue working off of your codebase from Project 1, or you may choose to use the included basecode in this repository. The basecode for Project 2 is the same as the basecode for Project 1, but with some missing components you will need filled in, such as the intersection testing and camera raycasting methods. 
-
-How you choose to extend your raytracer into a pathtracer is a fairly open-ended problem; the supplied basecode is meant to serve as one possible set of guidelines for doing so, but you may choose any approach you want in your actual implementation, including completely scrapping the provided basecode in favor of your own from-scratch solution.
+BLOG Link: http://seunghoon-cis565.blogspot.com/2012/10/project-2-cuda-pathtracer.html
 
 -------------------------------------------------------------------------------
-CONTENTS:
+A brief description
 -------------------------------------------------------------------------------
-The Project2 root directory contains the following subdirectories:
-	
-* src/ contains the source code for the project. Both the Windows Visual Studio solution and the OSX makefile reference this folder for all source; the base source code compiles on OSX and Windows without modification.
-* scenes/ contains an example scene description file.
-* renders/ contains two example renders: the raytraced render from Project 1 (GI_no.bmp), and the same scene rendered with global illumination (GI_yes.bmp). 
-* PROJ1_WIN/ contains a Windows Visual Studio 2010 project and all dependencies needed for building and running on Windows 7.
-* PROJ1_OSX/ contains a OSX makefile, run script, and all dependencies needed for building and running on Mac OSX 10.8. 
-
-The Windows and OSX versions of the project build and run exactly the same way as in Project0 and Project1.
+The goal of this project is to implement a simple path tracing algorithm using CUDA.
 
 -------------------------------------------------------------------------------
-REQUIREMENTS:
+Features
 -------------------------------------------------------------------------------
-In this project, you are given code for:
-
-* All of the basecode from Project 1, plus:
-* Intersection testing code for spheres and cubes
-* Code for raycasting from the camera
-
-You will need to implement the following features. A number of these required features you may have already implemented in Project 1. If you have, you are ahead of the curve and have less work to do! 
-
-* Full global illumination (including soft shadows, color bleeding, etc.) by pathtracing rays through the scene. 
+- Basic
+* Full global illumination (including soft shadows, color bleeding, etc.)
 * Properly accumulating emittance and colors to generate a final image
 * Supersampled antialiasing
-* Parallelization by ray instead of by pixel via string compaction (see the Physically-based shading and pathtracing lecture slides from 09/24 if you don't know what this refers to)
+* Parallelization by ray instead of by pixel via stream compaction
 * Perfect specular reflection
+  
 
-You are also required to implement at least two of the following features. Some of these features you may have already implemented in Project 1. If you have, you may NOT resubmit those features and instead must pick two new ones to implement.
 
-* Additional BRDF models, such as Cook-Torrance, Ward, etc. Each BRDF model may count as a separate feature. 
-* Texture mapping 
-* Bump mapping
-* Translational motion blur
-* Fresnel-based Refraction, i.e. glass
-* OBJ Mesh loading and rendering without KD-Tree
-* Interactive camera
-* Integrate an existing stackless KD-Tree library, such as CUKD (https://github.com/unvirtual/cukd)
+- Additional
+* Interactive camera via keyboard and mouse
 * Depth of field
 
-Alternatively, implementing just one of the following features can satisfy the "pick two" feature requirement, since these are correspondingly more difficult problems:
-
-* Physically based subsurface scattering and transmission
-* Implement and integrate your own stackless KD-Tree from scratch. 
-* Displacement mapping
-* Deformational motion blur
-
-As yet another alternative, if you have a feature or features you really want to implement that are not on this list, let us know, and we'll probably say yes!
-
 -------------------------------------------------------------------------------
-NOTES ON GLM:
+How to build
 -------------------------------------------------------------------------------
-This project uses GLM, the GL Math library, for linear algebra. You need to know two important points on how GLM is used in this project:
-
-* In this project, indices in GLM vectors (such as vec3, vec4), are accessed via swizzling. So, instead of v[0], v.x is used, and instead of v[1], v.y is used, and so on and so forth.
-* GLM Matrix operations work fine on NVIDIA Fermi cards and later, but pre-Fermi cards do not play nice with GLM matrices. As such, in this project, GLM matrices are replaced with a custom matrix struct, called a cudaMat4, found in cudaMat4.h. A custom function for multiplying glm::vec4s and cudaMat4s is provided as multiplyMV() in intersections.h.
-
--------------------------------------------------------------------------------
-BLOG
--------------------------------------------------------------------------------
-As mentioned in class, all students should have student blogs detailing progress on projects. If you already have a blog, you can use it; otherwise, please create a blog using www.blogger.com or any other tool, such as www.wordpress.org. Blog posts on your project are due on the SAME DAY as the project, and should include:
-
-* A brief description of the project and the specific features you implemented.
-* A link to your github repo if the code is open source.
-* At least one screenshot of your project running.
-* A 30 second or longer video of your project running.  To create the video use http://www.microsoft.com/expression/products/Encoder4_Overview.aspx 
-
--------------------------------------------------------------------------------
-THIRD PARTY CODE POLICY
--------------------------------------------------------------------------------
-* Use of any third-party code must be approved by asking on Piazza.  If it is approved, all students are welcome to use it.  Generally, we approve use of third-party code that is not a core part of the project.  For example, for the ray tracer, we would approve using a third-party library for loading models, but would not approve copying and pasting a CUDA function for doing refraction.
-* Third-party code must be credited in README.md.
-* Using third-party code without its approval, including using another student's code, is an academic integrity violation, and will result in you receiving an F for the semester.
-
--------------------------------------------------------------------------------
-SELF-GRADING
--------------------------------------------------------------------------------
-* On the submission date, email your grade, on a scale of 0 to 100, to Karl, yiningli@seas.upenn.edu, with a one paragraph explanation.  Be concise and realistic.  Recall that we reserve 30 points as a sanity check to adjust your grade.  Your actual grade will be (0.7 * your grade) + (0.3 * our grade).  We hope to only use this in extreme cases when your grade does not realistically reflect your work - it is either too high or too low.  In most cases, we plan to give you the exact grade you suggest.
-* Projects are not weighted evenly, e.g., Project 0 doesn't count as much as the path tracer.  We will determine the weighting at the end of the semester based on the size of each project.
-
--------------------------------------------------------------------------------
-SUBMISSION
--------------------------------------------------------------------------------
-As with the previous project, you should fork this project and work inside of your fork. Upon completion, commit your finished project back to your fork, and make a pull request to the master repository.
-You should include a README.md file in the root directory detailing the following
-
-* A brief description of the project and specific features you implemented
-* At least one screenshot of your project running, and at least one screenshot of the final rendered output of your pathtracer
-* Instructions for building and running your project if they differ from the base code
-* A link to your blog post detailing the project
-* A list of all third-party code used
\ No newline at end of file
+I developed this project in Visual Studio 2010.
+The solution file is located at "PROJ1_WIN/565Raytracer.sln".
+You should be able to build it without modification.
\ No newline at end of file
diff --git a/ScreenCapture_10-15-2012 2.58.42 PM.wmv b/ScreenCapture_10-15-2012 2.58.42 PM.wmv
new file mode 100755
index 0000000..35f6887
Binary files /dev/null and b/ScreenCapture_10-15-2012 2.58.42 PM.wmv differ
diff --git a/renders/GI_no.bmp b/renders/GI_no.bmp
deleted file mode 100755
index 72c61e3..0000000
Binary files a/renders/GI_no.bmp and /dev/null differ
diff --git a/renders/GI_yes.bmp b/renders/GI_yes.bmp
deleted file mode 100644
index bc87343..0000000
Binary files a/renders/GI_yes.bmp and /dev/null differ
diff --git a/scenes/sampleScene.txt b/scenes/sampleScene.txt
index 936135b..871330f 100755
--- a/scenes/sampleScene.txt
+++ b/scenes/sampleScene.txt
@@ -1,7 +1,7 @@
 MATERIAL 0				//white diffuse
 RGB         1 1 1       
 SPECEX      0      
-SPECRGB     1 1 1      
+SPECRGB     0 0 0      
 REFL        0       
 REFR        0        
 REFRIOR     0       
@@ -38,7 +38,7 @@ MATERIAL 3 				//red glossy
 RGB         .63 .06 .04      
 SPECEX      0      
 SPECRGB     1 1 1       
-REFL        0       
+REFL        0.2       
 REFR        0        
 REFRIOR     2       
 SCATTER     0        
@@ -47,10 +47,10 @@ RSCTCOEFF   0
 EMITTANCE   0
 
 MATERIAL 4 				//white glossy
-RGB         1 1 1     
+RGB         1 1 1 
 SPECEX      0      
-SPECRGB     1 1 1      
-REFL        0       
+SPECRGB     1 1 1       
+REFL        0.2       
 REFR        0        
 REFRIOR     2      
 SCATTER     0        
@@ -74,7 +74,7 @@ MATERIAL 6 				//green glossy
 RGB         .15 .48 .09      
 SPECEX      0      
 SPECRGB     1 1 1     
-REFL        0       
+REFL        0.2       
 REFR        0        
 REFRIOR     2.6       
 SCATTER     0        
@@ -106,6 +106,18 @@ ABSCOEFF    0 0 0
 RSCTCOEFF   0
 EMITTANCE   15
 
+MATERIAL 9				// perfect specular(mirror)
+RGB         1 1 1        
+SPECEX      0      
+SPECRGB     1 1 1       
+REFL        1       
+REFR        0        
+REFRIOR     0       
+SCATTER     0        
+ABSCOEFF    0 0 0      
+RSCTCOEFF   0
+EMITTANCE   0
+
 CAMERA
 RES         800 800
 FOVY        25
@@ -182,7 +194,7 @@ SCALE       .01 10 10
 
 OBJECT 5
 sphere
-material 4
+material 9
 frame 0
 TRANS       0 2 0
 ROTAT       0 180 0
@@ -226,4 +238,28 @@ SCALE       .3 3 3
 frame 1
 TRANS       0 10 0
 ROTAT       0 0 90
-SCALE       .3 3 3
\ No newline at end of file
+SCALE       .3 3 3
+
+OBJECT 9
+cube
+material 9
+frame 0
+TRANS       -4.9 5 0
+ROTAT       0 0 0
+SCALE       .01 5 5
+frame 1
+TRANS       -4.9 5 0
+ROTAT       0 0 0
+SCALE       .01 5 5
+
+OBJECT 10
+sphere
+material 4
+frame 0
+TRANS       -2 2 -2
+ROTAT       0 0 0
+SCALE       2 2 2
+frame 1
+TRANS       -2 5 -2
+ROTAT       0 180 0
+SCALE       3 3 3
diff --git a/src/interactions.h b/src/interactions.h
index e18cfff..0951576 100755
--- a/src/interactions.h
+++ b/src/interactions.h
@@ -7,6 +7,7 @@
 #define INTERACTIONS_H
 
 #include "intersections.h"
+#include "glm/gtx/norm.hpp"
 
 struct Fresnel {
   float reflectionCoefficient;
@@ -44,9 +45,9 @@ __host__ __device__ glm::vec3 calculateTransmissionDirection(glm::vec3 normal, g
 }
 
 //TODO (OPTIONAL): IMPLEMENT THIS FUNCTION
-__host__ __device__ glm::vec3 calculateReflectionDirection(glm::vec3 normal, glm::vec3 incident) {
+__host__ __device__ glm::vec3 calculateReflectionDirection(glm::vec3 normal, glm::vec3 incident) {//TODO: inline?
   //nothing fancy here
-  return glm::vec3(0,0,0);
+	return glm::normalize(incident - 2.0f * normal * glm::dot(normal, incident));
 }
 
 //TODO (OPTIONAL): IMPLEMENT THIS FUNCTION
@@ -90,7 +91,11 @@ __host__ __device__ glm::vec3 calculateRandomDirectionInHemisphere(glm::vec3 nor
 //Now that you know how cosine weighted direction generation works, try implementing non-cosine (uniform) weighted random direction generation. 
 //This should be much easier than if you had to implement calculateRandomDirectionInHemisphere.
 __host__ __device__ glm::vec3 getRandomDirectionInSphere(float xi1, float xi2) {
-  return glm::vec3(0,0,0);
+	// reference: Slide 7 in http://www.cs.sjsu.edu/~teoh/teaching/previous/cs116b_sp08/lectures/lecture16_raytracing.ppt
+	float q = TWO_PI * xi1;
+	float f = acos(2.f*xi2 - 1);
+
+	return glm::normalize(glm::vec3(cos(q)*sin(f), sin(q)*sin(f), cos(f)));
 }
 
 //TODO (PARTIALLY OPTIONAL): IMPLEMENT THIS FUNCTION
@@ -102,5 +107,12 @@ __host__ __device__ int calculateBSDF(ray& r, glm::vec3 intersect, glm::vec3 nor
   return 1;
 };
 
+// Returns 1 for specular reflection (with probability equal to reflectivity), 0 for diffuse
+__host__ __device__ inline int decideDiffOrSpec(float reflectivity, float randNumber)
+{	
+	if (randNumber <= reflectivity) return 1;
+	else return 0;
+}
+
 #endif
-    
\ No newline at end of file
+    
diff --git a/src/intersections.h b/src/intersections.h
index 714e918..2233a0f 100755
--- a/src/intersections.h
+++ b/src/intersections.h
@@ -159,7 +159,7 @@ __host__ __device__  float boxIntersectionTest(glm::vec3 boxMin, glm::vec3 boxMa
 
 
 
-    normal = multiplyMV(box.transform, glm::vec4(currentNormal,0.0));
+	normal = glm::normalize(multiplyMV(box.transform, glm::vec4(currentNormal,0.0)));
     return glm::length(intersectionPoint-ro.origin);
 }
 
@@ -285,4 +285,37 @@ __host__ __device__ glm::vec3 getRandomPointOnSphere(staticGeom sphere, float ra
   return randPoint;
 }
 
-#endif
\ No newline at end of file
+__host__ __device__ int findClosestIntersection(const staticGeom* geoms, int numOfGeoms, const ray& r,
+                                                glm::vec3* closestIntersection, glm::vec3* closestIntersectionNormal,
+                                                float* closestDistance) {
+  // Returns the index of the closest geometry hit by the ray,
+  // or -1 if the ray hits nothing
+  glm::vec3 intersectionPoint, normal;
+  float intersectionDistance;
+  *closestDistance = FLT_MAX;
+  int closestGeomInd = -1;
+
+  for (int i = 0; i < numOfGeoms; i++) { // for each object
+    if (geoms[i].type == SPHERE) {
+      intersectionDistance = sphereIntersectionTest(geoms[i], r, intersectionPoint, normal);
+    } else if (geoms[i].type == CUBE) {
+      intersectionDistance = boxIntersectionTest(geoms[i], r, intersectionPoint, normal);
+    } else { // not-supported object type
+      continue;
+    }
+
+    if (intersectionDistance < EPSILON) { // object is missed
+      continue; 
+    } else if (intersectionDistance < *closestDistance) { // closer is found
+      *closestDistance = intersectionDistance;
+      *closestIntersection = intersectionPoint;
+      *closestIntersectionNormal = normal;
+      closestGeomInd = i;
+    }
+  }	
+
+  return closestGeomInd;
+}
+
+
+#endif
diff --git a/src/main.cpp b/src/main.cpp
index 4e94892..bccd2b5 100755
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -1,3 +1,4 @@
+
 // CIS565 CUDA Raytracer: A parallel raytracer for Patrick Cozzi's CIS565: GPU Computing at the University of Pennsylvania
 // Written by Yining Karl Li, Copyright (c) 2012 University of Pennsylvania
 // This file includes code from:
@@ -6,6 +7,8 @@
 //       Yining Karl Li's TAKUA Render, a massively parallel pathtracing renderer: http://www.yiningkarlli.com
 
 #include "main.h"
+#include "glm/gtx/rotate_vector.hpp"
+
 
 //-------------------------------
 //-------------MAIN--------------
@@ -90,6 +93,8 @@ int main(int argc, char** argv){
   #else
 	  glutDisplayFunc(display);
 	  glutKeyboardFunc(keyboard);
+	  glutSpecialFunc(keyboard_special);
+	  glutMouseFunc(mouse);
 
 	  glutMainLoop();
   #endif
@@ -229,6 +234,74 @@ void runCuda(){
 		}
 	}
 
+	void keyboard_special(int key, int x, int y)
+	{// callback function for glutSpecialFunc
+		switch (key)
+		{
+		case(GLUT_KEY_LEFT):
+			stopCudaAndClearStates();
+			renderCam->positions[targetFrame] -= //TODO: OpenGL texture seems to be flipped along x-axis
+				0.5f * glm::normalize(glm::cross(renderCam->ups[targetFrame], renderCam->views[targetFrame]));			
+			initCuda();
+			break;
+		case(GLUT_KEY_RIGHT):
+			stopCudaAndClearStates();
+			renderCam->positions[targetFrame] -= //TODO: OpenGL texture seems to be flipped along x-axis
+				0.5f * glm::normalize(glm::cross(renderCam->views[targetFrame], renderCam->ups[targetFrame]));			
+			initCuda();
+			break;
+		case(GLUT_KEY_UP):
+			stopCudaAndClearStates();
+			renderCam->positions[targetFrame] += 0.5f*renderCam->views[targetFrame];
+			initCuda();
+			break;
+		case(GLUT_KEY_DOWN):
+			stopCudaAndClearStates();
+			renderCam->positions[targetFrame] -= 0.5f*renderCam->views[targetFrame];
+			initCuda();
+			break;
+		}
+
+		glutPostRedisplay();
+		return;
+	}
+
+	void mouse(int button, int state, int x, int y)
+	{// callback function for mouse
+		switch (button)
+		{
+		case GLUT_LEFT_BUTTON:
+			prev_mouse_x = x;
+			prev_mouse_y = y;
+			glutMotionFunc(motion_left);
+			break;
+		}
+		glutPostRedisplay();
+		return;
+	}
+
+	void motion_left(int x, int y)
+	{// callback function for motion when left button is pressed
+		if (!(x < 0 || x > width || y < 0 || y > height)) {
+			float theta = -(x - prev_mouse_x)*TWO_PI/width; //TODO: OpenGL texture seems to be flipped along x-axis
+			float rho = (y - prev_mouse_y)*TWO_PI/height;
+
+			stopCudaAndClearStates();
+
+			renderCam->views[targetFrame] = glm::rotateY(renderCam->views[targetFrame], theta);
+			renderCam->views[targetFrame] = glm::rotateX(renderCam->views[targetFrame], rho);
+			renderCam->views[targetFrame] = glm::normalize(renderCam->views[targetFrame]);
+			
+			renderCam->ups[targetFrame] = glm::rotateY(renderCam->ups[targetFrame], theta);
+			renderCam->ups[targetFrame] = glm::rotateX(renderCam->ups[targetFrame], rho);
+			renderCam->ups[targetFrame] = glm::normalize(renderCam->ups[targetFrame]);
+
+			initCuda();
+
+			glutPostRedisplay();
+		}
+		return;
+	}
 #endif
 
 
@@ -396,3 +469,18 @@ void shut_down(int return_code){
   #endif
   exit(return_code);
 }
+
+void stopCudaAndClearStates()
+{
+	cudaDeviceSynchronize();
+	cudaDeviceReset(); 
+
+	//clear image buffer
+	iterations = 0;
+	for (int i=0; i<renderCam->resolution.x*renderCam->resolution.y; i++){
+		renderCam->image[i] = glm::vec3(0,0,0);
+	}
+	if (!singleFrameMode) {
+		targetFrame = 0;
+	}
+}
\ No newline at end of file
diff --git a/src/main.h b/src/main.h
index 55daf50..30a26f4 100755
--- a/src/main.h
+++ b/src/main.h
@@ -78,6 +78,11 @@ void runCuda();
 #else
 	void display();
 	void keyboard(unsigned char key, int x, int y);
+	void keyboard_special(int key, int x, int y);	// callback function for glutSpecialFunc
+	void mouse(int button, int state, int x, int y); // callback function for mouse
+	void motion_left(int x, int y); // callback function for motion when left button is pressed
+	int prev_mouse_x = 0;
+	int prev_mouse_y = 0;
 #endif
 
 //-------------------------------
@@ -104,5 +109,6 @@ void cleanupCuda();
 void deletePBO(GLuint* pbo);
 void deleteTexture(GLuint* tex);
 void shut_down(int return_code);
+void stopCudaAndClearStates();
 
 #endif
\ No newline at end of file
diff --git a/src/raytraceKernel.cu b/src/raytraceKernel.cu
index d473c89..a4118f3 100755
--- a/src/raytraceKernel.cu
+++ b/src/raytraceKernel.cu
@@ -16,6 +16,9 @@
 #include "interactions.h"
 #include <vector>
 #include "glm/glm.hpp"
+#include <thrust/remove.h>
+#include <thrust/device_ptr.h> 
+#include "glm/gtx/vector_access.hpp"
 
 void checkCUDAError(const char *msg) {
   cudaError_t err = cudaGetLastError();
@@ -30,7 +33,7 @@ void checkCUDAError(const char *msg) {
 __host__ __device__ glm::vec3 generateRandomNumberFromThread(glm::vec2 resolution, float time, int x, int y){
   int index = x + (y * resolution.x);
    
-  thrust::default_random_engine rng(hash(index*time));
+  thrust::default_random_engine rng(hash(index)*hash(time));
   thrust::uniform_real_distribution<float> u01(0,1);
 
   return glm::vec3((float) u01(rng), (float) u01(rng), (float) u01(rng));
@@ -41,11 +44,11 @@ __host__ __device__ ray raycastFromCameraKernel(glm::vec2 resolution, float time
    
   int index = x + (y * resolution.x);
    
-  thrust::default_random_engine rng(hash(index*time));
+  thrust::default_random_engine rng(hash(index)*hash(time));
   thrust::uniform_real_distribution<float> u01(0,1);
   
   //standard camera raycast stuff
-  glm::vec3 E = eye;
+  glm::vec3 Eye = eye;
   glm::vec3 C = view;
   glm::vec3 U = up;
   float fovx = fov.x;
@@ -55,18 +58,37 @@ __host__ __device__ ray raycastFromCameraKernel(glm::vec2 resolution, float time
   
   glm::vec3 A = glm::cross(C, U);
   glm::vec3 B = glm::cross(A, C);
-  glm::vec3 M = E+C;
+  glm::vec3 M = Eye+C;
   glm::vec3 H = (A*float(CD*tan(fovx*(PI/180))))/float(glm::length(A));
   glm::vec3 V = (B*float(CD*tan(-fovy*(PI/180))))/float(glm::length(B));
+
+  // jitter the sample position within the pixel to achieve anti-aliasing
+  float dx = (float)u01(rng) - 0.5f;
+  float dy = (float)u01(rng) - 0.5f;
   
-  float sx = (x)/(resolution.x-1);
-  float sy = (y)/(resolution.y-1);
-  
+  float sx = (x+dx)/(resolution.x-1);
+  float sy = (y+dy)/(resolution.y-1);
+   
   glm::vec3 P = M + (((2*sx)-1)*H) + (((2*sy)-1)*V);
-  glm::vec3 PmE = P-E;
-  glm::vec3 R = E + (float(200)*(PmE))/float(glm::length(PmE));
+   P = Eye + glm::normalize((P - Eye))*5.f;
+
+#ifdef RAYTRACEKERNEL_DEPTH_OF_FIELD 
+  float lensRadius = 0.05f;
+  float randomAngle = (float)u01(rng) * TWO_PI;
+
+  float randomDistance = lensRadius * (float)u01(rng);
   
-  glm::vec3 direction = glm::normalize(R);
+  float cam_dx = randomDistance * cos(randomAngle);
+  float cam_dy = randomDistance * sin(randomAngle);
+
+  Eye += cam_dx*glm::normalize(H) + cam_dy*glm::normalize(V);	
+#endif
+  
+  glm::vec3 PmE = P-Eye;
+  //glm::vec3 R = Eye + (float(200)*(PmE))/float(glm::length(PmE));
+ // glm::vec3 R = Eye + PmE/glm::length(PmE);
+  
+  glm::vec3 direction = glm::normalize(PmE);
   //major performance cliff at this point, TODO: find out why!
   ray r;
   r.origin = eye;
@@ -85,18 +107,19 @@ __global__ void clearImage(glm::vec2 resolution, glm::vec3* image){
 }
 
 //Kernel that writes the image to the OpenGL PBO directly. 
-__global__ void sendImageToPBO(uchar4* PBOpos, glm::vec2 resolution, glm::vec3* image){
+__global__ void sendImageToPBO(uchar4* PBOpos, glm::vec2 resolution, glm::vec3* image, int numOfIters){
   
   int x = (blockIdx.x * blockDim.x) + threadIdx.x;
   int y = (blockIdx.y * blockDim.y) + threadIdx.y;
   int index = x + (y * resolution.x);
   
   if(x<=resolution.x && y<=resolution.y){
+	  float oneOverIters = 1.f/(float)numOfIters;
 
       glm::vec3 color;      
-      color.x = image[index].x*255.0;
-      color.y = image[index].y*255.0;
-      color.z = image[index].z*255.0;
+      color.x = image[index].x*oneOverIters*255.0;
+      color.y = image[index].y*oneOverIters*255.0;
+      color.z = image[index].z*oneOverIters*255.0;
 
       if(color.x>255){
         color.x = 255;
@@ -120,42 +143,58 @@ __global__ void sendImageToPBO(uchar4* PBOpos, glm::vec2 resolution, glm::vec3*
 
 //TODO: IMPLEMENT THIS FUNCTION
 //Core raytracer kernel
-__global__ void raytraceRay(glm::vec2 resolution, float time, cameraData cam, int rayDepth, glm::vec3* colors, 
-                            staticGeom* geoms, int numberOfGeoms, material* materials, int numberOfMaterials){
+__global__ void raytraceRay(float time,
+                            int rayDepth,
+                            glm::vec3* colors, 
+                            const staticGeom* geoms, int numberOfGeoms,
+                            const material* materials, int numberOfMaterials,
+                            PixelRay* pixelRays, int numOfRays,
+                            glm::vec3* acc_refl_diff_colors)
+{
+	const int index = blockDim.x*blockIdx.x + threadIdx.x;
+	PixelRay* const pixRay = &pixelRays[index];
 
-  int x = (blockIdx.x * blockDim.x) + threadIdx.x;
-  int y = (blockIdx.y * blockDim.y) + threadIdx.y;
-  int index = x + (y * resolution.x);
+	if (index < numOfRays) {
+		const int pixID = pixRay->pixelID;
+		glm::vec3 intersectionPoint, normal;
+		float intersectionDistance;
+		int intersectionGeomInd = findClosestIntersection(geoms, numberOfGeoms, pixRay->r,
+			&intersectionPoint, &normal, &intersectionDistance); 
 
-  ray r = raycastFromCameraKernel(resolution, time, x, y, cam.position, cam.view, cam.up, cam.fov);
-
-  if((x<=resolution.x && y<=resolution.y)){
-
-    float MAX_DEPTH = 100000000000000000;
-    float depth = MAX_DEPTH;
-
-    for(int i=0; i<numberOfGeoms; i++){
-        glm::vec3 intersectionPoint;
-        glm::vec3 intersectionNormal;
-       if(geoms[i].type==SPHERE){
-           depth = sphereIntersectionTest(geoms[i], r, intersectionPoint, intersectionNormal);
-        }else if(geoms[i].type==CUBE){
-            depth = boxIntersectionTest(geoms[i], r, intersectionPoint, intersectionNormal);
-        }else if(geoms[i].type==MESH){
-            //triangle tests go here
-        }else{
-            //lol?
-        }
-        if(depth<MAX_DEPTH && depth>-EPSILON){
-          MAX_DEPTH = depth;
-          colors[index] = materials[geoms[i].materialid].color;
-        }
-    }
+		if (intersectionGeomInd == -1) { // no hit!
+			// terminate this ray
+			pixRay->pixelID = -1;
+			return;
+		}
+
+		const material objectMaterial = materials[geoms[intersectionGeomInd].materialid];
+
+		if (objectMaterial.emittance > 0.f) { // light source
+			colors[pixID] += acc_refl_diff_colors[pixID] * objectMaterial.emittance * objectMaterial.color;
+			pixRay->pixelID = -1;
+			return;
+		}    
+		// non-emissive material: randomly choose between specular and diffuse reflection
+		thrust::default_random_engine rng(hash(pixID)*hash(time)*hash(rayDepth));
+		thrust::uniform_real_distribution<float> u01(0,1);   
+		int diffOrSpec = decideDiffOrSpec(objectMaterial.hasReflective, (float)u01(rng));
 
+		if (diffOrSpec == 1) { // 1. specular reflection
+			pixRay->r.direction = glm::normalize(calculateReflectionDirection(normal, pixRay->r.direction));
+			pixRay->r.origin = intersectionPoint + RAY_BIAS_AMOUNT*pixRay->r.direction;
+			return;
+		}
 
+		// 2. diffuse reflection
+		//colors[pixID] += acc_refl_diff_colors[pixID] * objectMaterial.emittance;
+		acc_refl_diff_colors[pixID] *= objectMaterial.color;
 
-    //colors[index] = generateRandomNumberFromThread(resolution, time, x, y);
-   }
+		pixRay->r.direction = glm::normalize(calculateRandomDirectionInHemisphere(normal,
+			(float)u01(rng),
+			(float)u01(rng)));
+		pixRay->r.origin = intersectionPoint + RAY_BIAS_AMOUNT*pixRay->r.direction;
+		return;
+	}
 }
 
 
@@ -163,18 +202,18 @@ __global__ void raytraceRay(glm::vec2 resolution, float time, cameraData cam, in
 // Wrapper for the __global__ call that sets up the kernel calls and does a ton of memory management
 void cudaRaytraceCore(uchar4* PBOpos, camera* renderCam, int frame, int iterations, material* materials, int numberOfMaterials, geom* geoms, int numberOfGeoms){
   
-  int traceDepth = 1; //determines how many bounces the raytracer traces
+   const int numOfPixels = (int)renderCam->resolution.x * (int)renderCam->resolution.y;
 
   // set up crucial magic
   int tileSize = 8;
   dim3 threadsPerBlock(tileSize, tileSize);
   dim3 fullBlocksPerGrid((int)ceil(float(renderCam->resolution.x)/float(tileSize)), (int)ceil(float(renderCam->resolution.y)/float(tileSize)));
-  
+
   //send image to GPU
   glm::vec3* cudaimage = NULL;
-  cudaMalloc((void**)&cudaimage, (int)renderCam->resolution.x*(int)renderCam->resolution.y*sizeof(glm::vec3));
-  cudaMemcpy( cudaimage, renderCam->image, (int)renderCam->resolution.x*(int)renderCam->resolution.y*sizeof(glm::vec3), cudaMemcpyHostToDevice);
-  
+  cudaMalloc((void**)&cudaimage, numOfPixels*sizeof(glm::vec3));
+  cudaMemcpy( cudaimage, renderCam->image, numOfPixels*sizeof(glm::vec3), cudaMemcpyHostToDevice);
+
   //package geometry and materials and sent to GPU
   staticGeom* geomList = new staticGeom[numberOfGeoms];
   for(int i=0; i<numberOfGeoms; i++){
@@ -205,19 +244,56 @@ void cudaRaytraceCore(uchar4* PBOpos, camera* renderCam, int frame, int iteratio
   cam.up = renderCam->ups[frame];
   cam.fov = renderCam->fov;
 
-  //kernel launches
-  raytraceRay<<<fullBlocksPerGrid, threadsPerBlock>>>(renderCam->resolution, (float)iterations, cam, traceDepth, cudaimage, cudageoms, numberOfGeoms, cudamaterials, 
-                                                                              numberOfMaterials);
+  // prepare some data structures for each pixel
+  // 1. a randomized PixelRay, 2. the accumulated reflective and diffuse color
+  cameraData* cuda_cam = NULL;
+  cudaMalloc((void**)&cuda_cam, sizeof(cameraData));
+  cudaMemcpy(cuda_cam, &cam, sizeof(cameraData), cudaMemcpyHostToDevice);
 
-  sendImageToPBO<<<fullBlocksPerGrid, threadsPerBlock>>>(PBOpos, renderCam->resolution, cudaimage);
+  PixelRay* cuda_pixelRays = NULL;
+  cudaMalloc((void**)&cuda_pixelRays, numOfPixels*sizeof(PixelRay));
+
+  glm::vec3* cuda_acc_refl_diff_colors = NULL;  
+  cudaMalloc((void**)&cuda_acc_refl_diff_colors, numOfPixels*sizeof(glm::vec3));
+
+  preparePathTracing<<<fullBlocksPerGrid, threadsPerBlock>>>(cuda_cam, (float)iterations,
+                                                            cuda_pixelRays, cuda_acc_refl_diff_colors);
+  
+  int numOfAliveRays = numOfPixels; 
+  thrust::device_ptr<PixelRay> ray_end_ptr;
+  const int rayThreadsPerBlock = 256; //TODO: play with different numbers
+  for (int i = 0; numOfAliveRays > 0 && i < RAYTRACEKERNEL_RAY_BOUNCE_MAX; i++) {
+    const int rayBlocksPerGrid = (int)ceil((float)(numOfAliveRays+1)/rayThreadsPerBlock);
+    
+    //kernel launches
+    raytraceRay<<<rayBlocksPerGrid, rayThreadsPerBlock>>>((float)iterations,
+                                                          i,
+                                                          cudaimage,
+                                                          cudageoms, numberOfGeoms,
+                                                          cudamaterials, numberOfMaterials,
+                                                          cuda_pixelRays, numOfAliveRays,
+                                                          cuda_acc_refl_diff_colors);
+    checkCUDAError("raytraceRay kernel failed!");
+
+    // perform stream compaction on PixelRays
+    thrust::device_ptr<PixelRay> cuda_pixelRays_devPtr(cuda_pixelRays);
+    ray_end_ptr =
+        thrust::remove_if(cuda_pixelRays_devPtr, cuda_pixelRays_devPtr + numOfAliveRays, isTerminated());
+    numOfAliveRays = (int)(ray_end_ptr - cuda_pixelRays_devPtr);
+  }
+
+  sendImageToPBO<<<fullBlocksPerGrid, threadsPerBlock>>>(PBOpos, renderCam->resolution, cudaimage, iterations);
 
   //retrieve image from GPU
   cudaMemcpy( renderCam->image, cudaimage, (int)renderCam->resolution.x*(int)renderCam->resolution.y*sizeof(glm::vec3), cudaMemcpyDeviceToHost);
 
   //free up stuff, or else we'll leak memory like a madman
-  cudaFree( cudaimage );
   cudaFree( cudageoms );
   cudaFree( cudamaterials );
+  cudaFree( cuda_pixelRays );
+  cudaFree( cuda_acc_refl_diff_colors );
+  cudaFree( cuda_cam );
+  cudaFree( cudaimage );
   delete [] geomList;
 
   // make certain the kernel has completed 
@@ -225,3 +301,36 @@ void cudaRaytraceCore(uchar4* PBOpos, camera* renderCam, int frame, int iteratio
 
   checkCUDAError("Kernel failed!");
 }
+
+
+// kernel to prepare some data structures needed by PathTracing
+__global__ void preparePathTracing(const cameraData* cam, float time,
+                                 PixelRay* pixelRays, glm::vec3* acc_refl_diff_colors)
+{
+  int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+  int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+  int index = x + (y * cam->resolution.x);
+
+  if (x < (int)cam->resolution.x && y < (int)cam->resolution.y) {
+    // 1. pixel rays
+    pixelRays[index].r =
+        raycastFromCameraKernel(cam->resolution, time, x, y, cam->position, cam->view, cam->up, cam->fov);
+    pixelRays[index].pixelID = index;
+
+    // 2. initialize the accumulator for reflective and diffuse colors
+    glm::set(acc_refl_diff_colors[index], 1.f, 1.f, 1.f);
+  }
+}
+
+
+//// kernel to accumulate colors achieved in this iteration to the total accumulator
+//__global__ void accumulateColors(const glm::vec2 resolution, glm::vec3* totalAccumulator, const glm::vec3* accuThisIter)
+//{
+//	int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+//	int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+//	int index = x + (y * resolution.x);
+//
+//	if (index < (int)resolution.x*resolution.y) {
+//		totalAccumulator[index] += accuThisIter[index];
+//	}
+//}
\ No newline at end of file
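
Review note: the bounce loop in cudaRaytraceCore sizes each launch from the number of live rays and then compacts the ray buffer with thrust::remove_if and the isTerminated predicate. The sketch below is a host-only C++ analogue of those two steps, not the code in this patch: it uses std::remove_if in place of thrust::remove_if, a stripped-down PixelRay with no ray payload, and two hypothetical helpers (blocksFor, compactRays) introduced only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Host-side stand-in for PixelRay from raytraceKernel.h (ray payload omitted).
struct PixelRay {
  int pixelID;  // a negative value means the ray has terminated
};

// Same test as the isTerminated functor in the patch.
struct isTerminated {
  bool operator()(const PixelRay& pixRay) const { return pixRay.pixelID < 0; }
};

// Hypothetical helper: integer ceiling division, i.e. the number of blocks
// needed so that blocks * threadsPerBlock >= numRays.
inline int blocksFor(int numRays, int threadsPerBlock) {
  return (numRays + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical helper: moves the live rays among the first numAlive entries
// to the front, keeping their relative order, and returns how many remain.
inline int compactRays(std::vector<PixelRay>& rays, int numAlive) {
  assert(numAlive >= 0 && numAlive <= static_cast<int>(rays.size()));
  std::vector<PixelRay>::iterator newEnd =
      std::remove_if(rays.begin(), rays.begin() + numAlive, isTerminated());
  return static_cast<int>(newEnd - rays.begin());
}
```

Entries past the returned count are left in an unspecified state, which is why the loop only ever launches over the first numOfAliveRays elements. The kernel still needs its own bounds check, because the last block is usually only partly filled.
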
diff --git a/src/raytraceKernel.h b/src/raytraceKernel.h
index 331e5ce..552fd14 100755
--- a/src/raytraceKernel.h
+++ b/src/raytraceKernel.h
@@ -6,7 +6,7 @@
 //       Yining Karl Li's TAKUA Render, a massively parallel pathtracing renderer: http://www.yiningkarlli.com
 
 #ifndef RAYTRACEKERNEL_H
-#define PATHTRACEKERNEL_H
+#define RAYTRACEKERNEL_H
 
 #include <stdio.h>
 #include <thrust/random.h>
@@ -15,6 +15,35 @@
 #include "sceneStructs.h"
 #include <cutil_math.h>
 
+#define RAYTRACEKERNEL_RAY_BOUNCE_MAX 10
+#define RAYTRACEKERNEL_DEPTH_OF_FIELD 
+
+typedef struct
+{
+  ray r;
+  
+  // pixel index in 1D
+  // traversing 2D image plane in the row-major order
+  // a negative value marks the ray as terminated (see isTerminated)
+  int pixelID; 
+} PixelRay;
+
+struct isTerminated
+{
+  __host__ __device__
+  bool operator()(const PixelRay& pixRay) const
+  {
+    return pixRay.pixelID < 0;
+  }
+};
+
 void cudaRaytraceCore(uchar4* pos, camera* renderCam, int frame, int iterations, material* materials, int numberOfMaterials, geom* geoms, int numberOfGeoms);
 
+// kernel to prepare some data structures needed by PathTracing
+__global__ void preparePathTracing(const cameraData* cam, float time,
+                                 PixelRay* pixelRays, glm::vec3* acc_refl_diff_colors);
+
+//// kernel to accumulate colors achieved in this iteration to the total accumulator
+//__global__ void accumulateColors(const glm::vec2 resolution, glm::vec3* totalAccumulator, const glm::vec3* accuThisIter);
+
 #endif
diff --git a/src/utilities.h b/src/utilities.h
index 5842c33..288491b 100755
--- a/src/utilities.h
+++ b/src/utilities.h
@@ -17,13 +17,13 @@
 #include <vector>
 #include "cudaMat4.h"
 
-const float PI                          =3.1415926535897932384626422832795028841971;
-const float TWO_PI                      =6.2831853071795864769252867665590057683943;
-const float SQRT_OF_ONE_THIRD           =0.5773502691896257645091487805019574556476;
+#define PI                          ((float)3.1415926535897932384626433832795028841971)
+#define TWO_PI                      ((float)6.2831853071795864769252867665590057683943)
+#define SQRT_OF_ONE_THIRD           ((float)0.5773502691896257645091487805019574556476)
 const float E                           =2.7182818284590452353602874713526624977572;
-const float EPSILON                     =.000000001;
+#define EPSILON                     ((float).000000001)
 const float ZERO_ABSORPTION_EPSILON     =0.00001;
-const float RAY_BIAS_AMOUNT             =0.0002;
+#define RAY_BIAS_AMOUNT             ((float)0.0002)
 
 namespace utilityCore {
     extern float clamp(float f, float min, float max);