thesis

Last updated 4:23 PM on 3/15/26

13 Terms

Intro Slide

First, I want to thank you all for taking the time to be on my committee, especially Dr. Chen, who has worked with me for nearly a year on this project. I truly appreciate all the help!

For my project, I applied an inverse reinforcement learning (IRL) framework to a robotic pick-and-place task.

The reason I chose this is that my company has given me the opportunity to automate a significant number of tasks. Currently, the way we automate these tasks is by constraining nearly every aspect of the task.

These opportunities are getting scarce, and I am looking for new techniques to automate some of the less constrained ones.

In industry, many automation projects have the goal of replacing work already being performed.

People or machines are already performing the task, so a wealth of data exists; however, it's rare that engineers take advantage of that data.

That brings me to my goals for the project…

Goals

I had some personal and professional goals for this project. I wanted a project that had the potential to improve processes at work, meaning I wanted to create frameworks for automating non-traditional projects. By non-traditional, I mean stochastic processes: ones that have some degree of randomness.

Since this is my IRL project, I wanted one that didn’t require feature extraction from a human, so I could focus on the IRL itself.

There are a lot of projects on the floor where we could generate expert trajectories from operator data, and this is an active area of research.

However, this requires multiple neural networks (NNs), many of them convolutional NNs that perform feature extraction on video.

This adds significant complexity and makes it difficult to adjust parameters in the expert trajectory. So my first approach was to improve an existing automated process.

game video

As you watch the user move, I want you to think about what he could be trying to achieve.

(wait for video to finish)

Read the assumption

Green = good

Red = bad

This Is Inverse Learning

What you just did was inverse learning

You saw the trajectory of someone who knows how to play and inferred the objective. This is what we are going to try to accomplish in this thesis.

And as you could see, given enough paths, we could get a pretty good idea of the objectives.

So... how do we get a computer to do this?

If I said a state was the grid position, then one option is to define a reward and make that reward proportional to the states the user reaches.

If we counted how many times an agent reaches each state and multiplied that count by the “importance” of the state, we would have a way to compare an expert to a novice.

Then, over time, we push the novice toward the expert path.

To do this, we define the reward as a weighted sum: the importance of each state times the number of times that state is visited. On its own, this approach has a couple of issues.
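A minimal sketch of this counting idea in Python follows. It assumes a hypothetical 3x3 grid where each state is a (row, col) position, and it uses hand-picked importance weights: +1 for a green goal cell and -1 for a red cell. In IRL these weights are exactly what we are trying to recover. They are fixed here only to show how the weighted sum scores an expert path against a novice path. The names `visitation_counts` and `path_reward` are illustrative and are not taken from the thesis code.

```python
from collections import Counter

# Hypothetical 3x3 grid: a state is a (row, col) position.
# Hypothetical importance weights: +1 for the green goal, -1 for the red cell, 0 elsewhere.
weights = {(r, c): 0.0 for r in range(3) for c in range(3)}
weights[(2, 2)] = 1.0   # green cell (good)
weights[(1, 1)] = -1.0  # red cell (bad)

def visitation_counts(path):
    """Count how many times the agent reaches each state along a path."""
    return Counter(path)

def path_reward(path, weights):
    """Reward = sum over states of importance(state) * visits(state)."""
    counts = visitation_counts(path)
    return sum(weights[s] * n for s, n in counts.items())

# Hypothetical trajectories: the expert goes around the red cell, the novice goes through it.
expert = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
novice = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)]

print(path_reward(expert, weights))  # 1.0
print(path_reward(novice, weights))  # 0.0
```

Both paths reach the goal. The novice loses reward only because its visit count for the red cell is nonzero, and that difference in visitation is what we would push on.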

Infinite number of solutions

The biggest problem is that there are infinitely many solutions.

The trivial one is all zeros: no reward at all, which would mean the expert is just moving randomly.

Undervaluing or overvaluing states: since the reward is a weighted sum, undervaluing some states and overvaluing others can produce the same sum as the true reward.
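The same kind of hypothetical grid shows the ambiguity. In this sketch `true_w` is a made-up "true" weighting. The other three are alternatives to it:

- `zero_w` scores every path identically.
- `scaled_w` ranks the paths exactly as the true weighting does.
- `shifted_w` undervalues the goal while overvaluing another cell the expert passes through, yet gives the expert path the same total.

```python
from collections import Counter

def path_reward(path, weights):
    """Weighted sum of state visits; states missing from `weights` count as 0."""
    counts = Counter(path)
    return sum(weights.get(s, 0.0) * n for s, n in counts.items())

# Hypothetical trajectories on a 3x3 grid (states are (row, col) positions).
expert = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
novice = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)]

true_w = {(2, 2): 1.0, (1, 1): -1.0}                  # hypothetical "true" importance
zero_w = {}                                           # trivial solution: every state is worth 0
scaled_w = {(2, 2): 10.0, (1, 1): -10.0}              # same preferences, 10x larger
shifted_w = {(0, 1): 0.5, (2, 2): 0.5, (1, 1): -1.0}  # goal undervalued, (0, 1) overvalued

for name, w in [("true", true_w), ("zero", zero_w), ("scaled", scaled_w), ("shifted", shifted_w)]:
    print(name, path_reward(expert, w), path_reward(novice, w))
# true 1.0 0.0
# zero 0.0 0.0
# scaled 10.0 0.0
# shifted 1.0 -0.5
```

Looking only at the expert's score, `true_w` and `shifted_w` are indistinguishable, and `zero_w` explains any behavior at all. Feature counts alone cannot pick one reward.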

HOW DO WE FIX THIS?…

maximum entropy

We introduce a constraint, one that makes choosing among the possible solutions easier.

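We pick the maximum-entropy solution: among all probability distributions over paths that match the expert's state visits, we choose the one with the highest entropy.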

What is entropy?

In physics, entropy is associated with disorder or random motion; it's a measure of how difficult a system is to predict.

It is closely related to entropy in information theory. I like to think of it as having fewer constraints: the more constraints we add, the more fixed the trajectory becomes, but each constraint is an assumption that may not be correct.
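The small sketch below makes "difficulty to predict" concrete. It computes Shannon entropy, in bits, for a few hypothetical distributions over four candidate paths; the distributions are made up for illustration. Forcing a single path, which is the most constrained case, gives zero entropy. Spreading probability evenly gives the maximum, log2(4) = 2 bits.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H(p) = -sum p * log2(p), in bits; zero-probability terms contribute 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical distributions over four candidate paths.
fully_constrained = [1.0, 0.0, 0.0, 0.0]  # we insist on exactly one path
skewed = [0.7, 0.1, 0.1, 0.1]             # strong preference, some uncertainty left
uniform = [0.25, 0.25, 0.25, 0.25]        # no preference among paths

print(entropy_bits(fully_constrained))  # 0.0
print(entropy_bits(skewed))             # between 0 and 2
print(entropy_bits(uniform))            # 2.0
```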

To the right, we can see the optimization problem for maximum entropy.
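The slide's exact equation is not reproduced in these notes. For reference, the formulation usually associated with maximum-entropy IRL (Ziebart et al., 2008) is sketched below. The symbols are my own labels and may differ from the slide:

- $\zeta$ is a path.
- $f_\zeta$ is that path's vector of state visit counts, assuming one-hot state features as in the weighted-sum reward above.
- $\tilde{f}$ is the expert's average visit counts.
- $\theta$ holds the state importance weights.

The closed form on the last line assumes deterministic transitions.

```latex
\begin{aligned}
\max_{P}\quad & -\sum_{\zeta} P(\zeta)\,\log P(\zeta) \\
\text{subject to}\quad & \sum_{\zeta} P(\zeta)\, f_{\zeta} = \tilde{f}, \qquad \sum_{\zeta} P(\zeta) = 1, \qquad P(\zeta) \ge 0 \\[4pt]
\Longrightarrow\quad & P(\zeta \mid \theta) = \frac{\exp\!\left(\theta^{\top} f_{\zeta}\right)}{Z(\theta)}, \qquad Z(\theta) = \sum_{\zeta'} \exp\!\left(\theta^{\top} f_{\zeta'}\right)
\end{aligned}
```

Here $\theta$ appears as the Lagrange multiplier on the matching constraint and plays the role of the reward weights. We fit it by maximizing the likelihood of the expert paths. The gradient is $\tilde{f} - \sum_{\zeta} P(\zeta \mid \theta)\, f_{\zeta}$, the expert's visits minus the expected visits, which corresponds to the "push the novice toward the expert" step.

This also addresses the ambiguity from the previous card:

- $\theta = 0$ gives a uniform distribution over paths, which in general does not reproduce the expert's visit counts.
- Scaling $\theta$ changes how sharply the distribution concentrates, so scaled weightings are no longer interchangeable.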
