Intro Slide
First, I want to thank you for taking the time to be on my committee, especially Dr. Chen, who has worked with me on this project for nearly a year. I truly appreciate all the help!
For my project, I applied an inverse reinforcement learning framework to a robotic pick-and-place task.
The reason I chose this is that I have had the opportunity at my company to automate a significant number of tasks. Currently, the way we automate these tasks is by constraining nearly every aspect of the task.
These opportunities are getting scarce, and I am looking for new techniques to automate some of the less constrained ones.
In industry, many automation projects have the goal of replacing work already being performed.
The people or machines already performing the task have generated a wealth of data; however, it's rare that engineers take advantage of that data.
Which brings me to my goals for the project…
Goals
I had some personal and professional goals for this project. I wanted a project that had the potential to improve processes at work, meaning I wanted to create frameworks for automating non-traditional projects. By non-traditional I mean stochastic processes, ones that have some degree of randomness.
Since this is my IRL project, I wanted one that didn’t require feature extraction from a human, so I could focus on the IRL.
There are a lot of projects on the floor where we could generate expert trajectories from operator data. This approach is being studied heavily right now;
however, it requires multiple neural networks, many of them convolutional networks that perform feature extraction on video.
This adds significant complexity and makes it difficult to adjust parameters in the expert trajectory. So the first approach was improving an existing automated process.
game video
As you watch the user move,
I want you to think about what he could be trying to achieve
(wait for video to finish)
Read the assumption
Green good
red bad
This is inverse learning
What you just did was inverse learning
You saw the trajectory of someone who knows how to play and inferred the objective. This is what we are going to try to accomplish in this thesis.
And you could see that, given enough of the paths, we could get a pretty good idea of the objectives.
So... how do we get a computer to do this?
If I say a state is the grid position,
then one option is to define a reward and make that reward proportional to the states the user reaches.
If we count how many times an agent reaches a state and multiply that count by the “importance” of the state, we have a way to compare an expert to a novice.
Then, over time, we push the novice toward the expert path.
To do this we define the reward as a weighted sum: the importance of each state times the number of times that state is visited. There are a couple of issues with this alone.
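A minimal sketch of the counting idea, assuming a hypothetical 3x3 grid. The two paths and the importance weights are made-up numbers for illustration, not values from the project:

```python
from collections import Counter

# A state is a (row, col) grid position. Both paths are hypothetical.
expert_path = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
novice_path = [(0, 0), (1, 0), (1, 1), (1, 0), (2, 0)]

# Hypothetical "importance" of each state; unlisted states count as 0.
weights = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0, (2, 2): 5.0, (1, 1): -2.0}


def visit_counts(path):
    """Count how many times each state is reached along a path."""
    return Counter(path)


def path_reward(path, weights):
    """Weighted sum: importance of each state times its visit count."""
    counts = visit_counts(path)
    return sum(weights.get(state, 0.0) * n for state, n in counts.items())


expert_reward = path_reward(expert_path, weights)
novice_reward = path_reward(novice_path, weights)
print("expert:", expert_reward, "novice:", novice_reward)
```

With these made-up weights the expert path scores higher than the novice path, and that gap is what we would use to push the novice toward the expert.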
Infinite number of equations
The biggest problem is that there are infinitely many solutions.
The trivial one: all zeros. With no reward, the expert is just moving randomly.
Undervaluing or overvaluing states:
since it is a weighted sum, undervalued and overvalued states could produce the same sum as the true reward.
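A small sketch of why the weighted sum alone is ambiguous. The state names, visit counts, and weights are all hypothetical, chosen only so the arithmetic is easy to follow:

```python
# Hypothetical visit counts for an expert and a novice over states A, B, C.
expert_counts = {"A": 2, "B": 1}
novice_counts = {"A": 1, "C": 2}

# Three candidate weight vectors (all made up for illustration).
true_w = {"A": 1.0, "B": 4.0}  # pretend this is the "true" reward
alt_w = {"A": 3.0, "B": 0.0}   # overvalues A, undervalues B
zero_w = {}                    # the trivial solution: every weight is 0


def total(counts, w):
    """Weighted sum of visit counts; states missing from w weigh 0."""
    return sum(w.get(state, 0.0) * n for state, n in counts.items())


# Different weights, same expert total: the sum alone cannot tell them apart.
same_total = total(expert_counts, true_w) == total(expert_counts, alt_w)

# All-zero weights: expert and novice tie, so the expert looks random.
zero_tie = total(expert_counts, zero_w) == total(novice_counts, zero_w)

print(same_total, zero_tie)
```

Both checks come out true, which is the problem: the data is consistent with the true weights, the distorted weights, and the all-zero weights at the same time.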
HOW DO WE FIX THIS?….
maximum entropy
We introduce a constraint, one that makes the path selection easier.
We pick the distribution over paths with the maximum entropy.
What is entropy
In physics, entropy is associated with chaos or random motion; it's a measure of how difficult something is to predict.
It's very similar in information theory. I like to think of it as fewer constraints: the more constraints, the more fixed the trajectory becomes, but you are making assumptions that may not be correct.
To the right, we can see the optimization problem for max entropy.
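To make "difficulty to predict" concrete, here is a rough sketch that computes Shannon entropy (in bits) for a few probability distributions over four hypothetical paths. The probabilities are invented for illustration, and this is only the entropy measure itself, not the full optimization problem from the slide:

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical distributions over four candidate paths.
uniform = [0.25, 0.25, 0.25, 0.25]   # no preference: hardest to predict
peaked = [0.97, 0.01, 0.01, 0.01]    # strong preference for one path
deterministic = [1.0, 0.0, 0.0, 0.0] # fully fixed trajectory

for name, dist in [("uniform", uniform), ("peaked", peaked),
                   ("deterministic", deterministic)]:
    print(name, round(entropy(dist), 3))
```

The uniform case has the highest entropy and the fully fixed trajectory has none. Choosing the maximum-entropy distribution that still matches the expert data means committing to as few extra assumptions as possible.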