thesis


Last updated 4:23 PM on 3/15/26

1. Intro Slide

First, I want to thank you for taking the time to serve on my committee, especially Dr. Chen, who has worked with me for nearly a year on this project. I truly appreciate all the help!

For my project, I applied an inverse reinforcement learning (IRL) framework to a robotic pick-and-place task.

The reason I chose this is that my company has given me the opportunity to automate a significant number of tasks. Currently, we automate these tasks by constraining nearly every aspect of the task.

Those opportunities are becoming scarce, and I am looking for new techniques to automate some of the less constrained tasks.

In industry, many automation projects have the goal of replacing work that is already being performed.

The people or machines already performing the task have generated a wealth of data, yet engineers rarely take advantage of it.

Which brings me to my goals for the project…

2. Goals

I had both personal and professional goals for this project. I wanted a project with the potential to improve processes at work, meaning I wanted to create frameworks for automating non-traditional projects. By non-traditional I mean stochastic processes, ones with some degree of randomness.

Since this is my IRL project, I wanted one that didn't require feature extraction from a human, so I could focus on the IRL itself.

There are a lot of projects on the floor where we can generate expert trajectories from operator data. This approach is currently being studied extensively.

However, this requires multiple neural networks, many of them convolutional, to perform feature extraction on video.

This adds significant complexity and makes it difficult to adjust parameters in the expert trajectory. So my first approach was to improve an existing automated process.

3. Game Video

As you watch the user move,

I want you to think about what he could be trying to achieve.

(wait for video to finish)

Read the assumption:

  Green is good,

  red is bad.

4. This Is Inverse Learning

What you just did was inverse learning.

You saw the trajectory of someone who knows how to play and inferred the objective. This is what we are going to try to accomplish in this thesis.

And as you could see, given enough of the paths, we could get a pretty good idea of the objectives.

So... how do we get a computer to do this?

  If I said a state was the grid position,

  then one option is to define a reward and make that reward proportional to the states the user reaches.

  If we counted how many times an agent reaches each state and multiplied that by the "importance" of the state, we would have a way to compare an expert to a novice.

  Then, over time, we push the novice toward the expert path.

To do this, we define the reward as a weighted sum: the importance of each state times the visits to that state. There are a couple of issues with this alone.
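The state-visitation idea above can be sketched in a toy example. Everything here is invented for illustration: the 3x3 grid, the trajectories, and the "importance" weights (which put all value on a single goal state) are assumptions, not part of the actual thesis setup.

```python
import numpy as np

# Toy grid world: states are cell indices 0..8 on a 3x3 grid.
N_STATES = 9

def visitation_counts(trajectories, n_states=N_STATES):
    """Count how many times each state is visited across all trajectories."""
    counts = np.zeros(n_states)
    for traj in trajectories:
        for s in traj:
            counts[s] += 1
    return counts

def linear_reward(counts, w):
    """Total reward as a weighted sum: importance of each state
    times the number of visits to that state."""
    return float(np.dot(w, counts))

# Hypothetical "importance" weights: only state 8 (a goal corner) is valued.
w = np.zeros(N_STATES)
w[8] = 1.0

expert = [[0, 1, 2, 5, 8], [0, 3, 6, 7, 8]]   # both trajectories reach the goal
novice = [[0, 1, 0, 3, 0]]                    # wanders near the start

print(linear_reward(visitation_counts(expert), w))  # expert scores higher
print(linear_reward(visitation_counts(novice), w))
```

Comparing these two scores is exactly the expert-vs-novice comparison described above; pushing the novice toward the expert means raising its score under the inferred weights.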

5. Infinite Number of Solutions

The biggest problem is that there are infinitely many solutions.

  The trivial one: all zeros. With no reward at all, the expert is just moving randomly.

  Undervaluing or overvaluing states: since the reward is a weighted sum, an undervalued or overvalued state could produce the same sum as the true reward.
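The weighted-sum ambiguity can be shown numerically. The visitation counts and weight vectors below are made up for illustration: two different "importance" assignments yield exactly the same total reward on the observed data, so the data alone cannot distinguish them.

```python
import numpy as np

# Hypothetical visitation counts for three states.
counts = np.array([3.0, 1.0, 2.0])

w_true = np.array([1.0, 2.0, 0.5])    # the "true" state importance
w_zero = np.zeros(3)                  # trivial solution: no reward at all
w_shift = np.array([0.0, 5.0, 0.5])   # undervalues state 0, overvalues state 1

# w_true:  3*1.0 + 1*2.0 + 2*0.5 = 6.0
# w_shift: 3*0.0 + 1*5.0 + 2*0.5 = 6.0  -> same sum, different weights
print(np.dot(w_true, counts), np.dot(w_shift, counts), np.dot(w_zero, counts))
```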

HOW DO WE FIX THIS?

6. Maximum Entropy

We introduce a constraint, one that makes the path selection easier.

We pick the path with the maximum entropy.

What is entropy?

In physics, entropy is associated with chaos or random motion; it's a measure of how difficult something is to predict.

It's very similar in information theory. I like to think of it as fewer constraints: the more constraints, the more fixed the trajectory becomes, but you are making assumptions that may not be correct.

To the right, we can see the maximum-entropy optimization problem.
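For context, in the standard maximum-entropy IRL formulation (Ziebart et al.), the entropy-maximizing trajectory distribution that matches the expert's feature expectations turns out to be exponential in reward, P(τ) ∝ exp(θ·f_τ). A minimal numerical sketch of that model follows; the three trajectory rewards are invented, and this is not the slide's exact formulation.

```python
import numpy as np

def maxent_trajectory_probs(rewards):
    """MaxEnt model: P(tau) proportional to exp(reward(tau))."""
    z = np.exp(rewards - np.max(rewards))  # subtract max for numerical stability
    return z / z.sum()

def entropy(p):
    """Shannon entropy H(p) = -sum p*log(p), natural log."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

rewards = np.array([2.0, 1.0, 0.0])   # three candidate trajectories
p = maxent_trajectory_probs(rewards)
uniform = np.ones(3) / 3              # maximum-entropy distribution, log(3)

# With no reward information, the uniform distribution maximizes entropy;
# matching reward constraints makes the distribution less uniform,
# lowering entropy below log(3) while committing to nothing beyond the data.
print(entropy(p) < entropy(uniform))   # True
```

The design point is that MaxEnt resolves the ambiguity from the previous slide: among all distributions consistent with the expert data, it commits to the one that assumes the least.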

