Network Management
Network management.
In this section of the course,
we're going to talk about network management.
Now network management is the process of administering
and managing computer networks.
When you perform network management,
you're involved with fault analysis,
performance management,
provisioning of networks and clients,
as well as maintaining the quality of service.
To properly manage your network,
you need to ensure that you have
the right network documentation in place,
which includes the physical network diagrams,
logical network diagrams, wiring diagrams,
site survey reports, audit and assessment reports,
and baseline configurations,
as these are some of the key
and foundational pieces of documentation necessary
to fully understand where your network resides
and how it operates.
Also, you need to look at various performance metrics
and sensor data from across your network and its devices.
For example, if you don't know that one of your routers
is experiencing high processor utilization,
you may not be able to predict an upcoming network failure
or prevent it from occurring.
As you manage your network, you're also going to be working
to monitor the data flowing across this network,
which means you need to be able
to use things like NetFlow data to better understand
where this data is flowing.
On a network device level,
you can also manage individual interfaces
by considering their status, statistics
and the errors they generate.
Finally, it becomes really important
to understand the environmental factors
that can affect your network.
After all, our network devices are not operating
in the middle of nowhere,
but instead they're located inside of our data centers
and telecommunication closets.
To keep them operating at peak performance,
we need to ensure our devices are experiencing
the right environmental conditions,
and that means we have to make sure
these devices have the proper power, space and cooling
within their operating environments.
So in this section, we're going to focus on domain three,
network operations, and we're going to talk
about objectives 3.1 and 3.2.
Objective 3.1 states that given a scenario,
you must use the appropriate statistics and sensors
to ensure network availability.
And objective 3.2 states that you must explain
the purpose of organizational documents and policies.
So let's get started with our coverage of network management
in this section of the course.
Common documentation.
In this lesson, we're going to cover the common documentation
that you're going to use in your enterprise networks.
This includes physical network diagrams,
logical network diagrams, wiring diagrams,
site survey reports, audit and assessment reports,
and baseline configurations.
First, we have physical network diagrams.
A physical network diagram is used to show
the actual physical arrangement of the components
that make up your network
including the cabling and the hardware.
Typically, these diagrams give you a bird's eye view
of the network in its physical space,
and it looks a lot like a floor plan,
but you may also see physical network diagrams
that show how things are going to be cabled
within an individual rack of a data center as well.
For example, if I have a physical network diagram
showing the floor plan of a small office,
I can notate on that floor plan
exactly where each IP-based CCTV camera is going to be installed.
Now, in this example,
you can see I have nine IP-based cameras
and how the cable is going to run back to a central point,
such as this network video recorder
that contains nine power over Ethernet ports
to run this camera system.
I could just as easily have another floor plan like this
showing where all my network jacks are going to be located
in an office and how the cables are being run back
to a patch panel, and from that patch panel
back to an edge switch that connects to all these devices.
Inside my data center, though, I'm usually more concerned
with how things are physically located
within one single rack.
And so I can create a rack diagram.
For example, here's a diagram showing a rack,
containing two storage area network controllers,
two firewalls, two switches,
three virtual machine host servers running ESXi,
a backup server, a modular smart array,
and a tape backup library.
From this diagram, you can clearly see where in this rack
each of those units is going to be located
and which network cables will connect to which ports
on which devices.
Another version of this type of diagram
may include a front view,
which also shows the location inside the cabinet.
And it can have the different device names
and the IP addresses for each of those devices,
but it won't show the actual network cables
and where they connect
because you're looking at the front of those devices.
Now, another type of physical network diagram we have
is used to provide documentation
for how our main distribution frame or MDF
and our intermediate distribution frame or IDF
are connected and cabled.
In this example, you could see a very generic MDF
and IDF layout for a typical three-story office building.
Here, we have the MDF on the bottom right corner
of the first floor,
and then a smaller IDF on the right corner
of each of the remaining floors.
There's an interconnection between each IDF and the MDF,
and each floor has a single network cable
running to a jack in an office.
Now, this of course is a very oversimplified diagram
or an overview diagram,
but we can also create more detailed diagrams,
depending on how much detail we want.
Those could show each rack inside the MDF or the IDF,
what they would look like,
and how they're cabled, including their edge switches,
patch panels, and other networking equipment.
The second type of documentation we need to cover
is logical network diagrams.
Unlike the physical network diagrams,
which show exactly which port each cable connects to
and how it's run on the physical floor plan or the rack layout,
we use a logical diagram for a different purpose.
We're going to use this to illustrate the flow of data
across a network and it's going to be used to show
how devices are communicating with each other.
These logical diagrams will include things like the subnets,
the network objects and devices,
the routing protocols and domains, voice gateways,
traffic flow, and network segments within a given network.
Traditionally, network diagrams were drawn by hand
using symbols to represent the different network devices
like routers, switches, firewalls,
intrusion detection systems, and clients.
In this example, I'm using the standard Cisco notation
to demonstrate how the various switches and routers
are being connected to form this network.
On the logical diagram, we also include the IP addresses
and the interface identifiers such as G0/1
or gigabit Ethernet 0/1 or ATM1/0,
which is for an ATM interface for our routers and switches.
Notice the routers are being represented by a circle
with four arrows, two pointing inward,
and two pointing outward.
Switches are going to be represented by a square
with four arrows, all pointing outward.
Servers like a DHCP, DNS, or TFTP server
are represented by a large rectangle server icon.
And the computers are going to be shown
using a computer icon.
Another symbol you may see included
is an intrusion detection system
or intrusion prevention system,
which is going to be a rectangle
that contains a circle inside of it
with two arrows crossing over the circle.
A firewall is usually represented by a brick wall
and an access point is going to be represented
by a rectangle with a series of radio waves going out of it
from the left to the right.
Now, as you look at various network diagrams
on the internet, you may come across some more
modern network diagrams that remove the symbols
and instead use pictures of the actual networking equipment
being used in the diagrams.
In this example, you can see the router
connected to the switches and those switches are connected
to the client PCs.
Next, we have a wiring diagram
and this is something we already looked at briefly
as part of our physical network diagrams.
Wiring diagrams can accompany both physical
and logical network diagrams,
as long as they clearly label
which cable is connected to which port.
The more in-depth wiring diagrams
are going to include a floor plan or a rack diagram,
so you can see exactly where the cables are being run
in the physical environment.
Next, we have site survey reports.
These are often conducted as part of a wireless survey
or an assessment.
Now, a wireless site survey, sometimes
called an RF or radio frequency site survey
or a wireless survey,
is the process of planning and designing a wireless network
to provide a wireless solution
that will deliver the required wireless coverage,
data rates, network capacity, roaming capability,
and quality of service or QoS.
In this example, you can see a floor plan that includes
the locations of each wireless access point.
Then radiating out from each access point,
you see bands of color, going from green to yellow
to orange to red.
And this indicates the strength of the wireless signal.
Now, when you see green, that's a strong signal.
When you see red, that's a weaker signal.
Wired site surveys are also conducted sometimes,
but in these cases
it's usually done as part of a preparation
for a major upgrade or installation.
With a wired site survey,
the installation team is going to come out
and look at your MDFs, your IDFs, and your data centers
to determine if you have the right power, space, and cooling
to support whatever new equipment
you're going to be installing as part of that upgrade.
For example, if I was going to install three new racks
of equipment in your data center,
I need to go out there and look at it
and make sure you have the physical space required
to hold those three racks.
In addition to that, I need to make sure
you have a powerful enough HVAC system
to remove all the extra heat that my new equipment
in these three racks is going to produce.
I also want to make sure your site has the right power
and backup generators and battery backups
so that you can handle all the extra power
that's going to be drawn by all this new equipment.
Next, we have audit and assessment reports.
Audit and assessment reports
are delivered to your organization
after a formal assessment has been conducted.
These reports will contain an executive summary,
an overview of the assessment scope and objectives,
the assumptions and limitations of the assessment,
the methods and tools used during the assessment,
a diagram showing the current environment and systems,
the security requirements,
a summary of findings and recommendations,
and the results of the audit.
Essentially, this report is going to contain
all the issues the audit team found with your organization,
as well as anything your organization
is already doing right,
and things they should continue to keep doing.
Finally, we have baseline configurations.
The documented baseline configurations
are the most stable versions of a device's configuration.
These baseline configurations
are a documented set of specifications
for an information system or a configuration item
within that system, that has been formally reviewed
and agreed on at a given point in time,
and which can now only be changed
through change control procedures.
So, if you want to change the baseline
due to an operational need, you need to follow
the proper configuration management procedures
to request those changes.
Those changes will then be properly tested and approved,
and they become part of the new baseline
for those devices moving forward.
As you can see, there is a bunch of documentation
that you're going to use in your enterprise networks,
including your physical network diagrams,
logical network diagrams, wiring diagrams,
site survey reports, audit and assessment reports,
and baseline configurations.
Performance metrics.
In this lesson, we're going to talk
all about performance metrics and how they're used
to ensure network availability.
Now, network performance metrics are a large part
of network monitoring.
Network performance monitoring is the end-to-end
monitoring of your end users' experience.
This differs from traditional monitoring though,
because traditional monitoring is focused on performance
between two points, like a switch and a router,
but with network performance monitoring,
we're going to look at the overall end user experience
by monitoring the performance
from the end user's workstation to the final destination
that they're trying to reach.
So to help us monitor network performance,
there's really going to be three key metrics
that we're going to use.
These are latency, bandwidth and jitter.
The first metric is latency.
Now, latency is the measure of the time that it takes
for data to reach its destination across a network.
Usually we measure network latency as the round trip time
from a workstation to the distant end
and back to the workstation.
We report this time in milliseconds.
Now, for example, let's say you open up your command prompt
and you enter the command ping 8.8.8.8, and you hit enter.
You're going to get a response that tells you
how long it took for an ICMP packet to leave your computer,
reach the Google DNS server located at 8.8.8.8,
and return to your computer again.
In my case, this took an average time of 38.2 milliseconds
across four consecutive ping requests.
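Now, if you'd rather script that latency check instead of running ping by hand, here's a minimal Python sketch that wraps the system ping command; the 8.8.8.8 target and the count of four echoes mirror the example above, and the parsing assumes a Linux-style summary line, so adjust it for your own platform.

import re
import subprocess

def average_ping_ms(host: str = "8.8.8.8", count: int = 4) -> float:
    # Run the operating system's ping utility (Linux-style -c flag assumed; Windows uses -n).
    result = subprocess.run(["ping", "-c", str(count), host],
                            capture_output=True, text=True, check=True)
    # The summary line usually looks like: rtt min/avg/max/mdev = 10.1/38.2/60.3/5.0 ms
    match = re.search(r"= [\d.]+/([\d.]+)/", result.stdout)
    if not match:
        raise ValueError("Could not parse the average round-trip time")
    return float(match.group(1))

print(f"Average RTT: {average_ping_ms():.1f} ms")

Running this should give you a number comparable to what you'd see by running ping manually.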
Now it's important to measure the round trip delay
for network latency because a computer that uses
a TCP/IP network can send only a limited amount of data
to its destination at one time,
and then it sits and waits for an acknowledgement
that that data was received before it sends out
more data across the network.
So if you have high latency or a long round trip delay,
this can drastically slow down
your overall network performance for your end users.
Now, if you're seeing consistent delays
or even just spikes in the delay time in your network,
this could indicate a major performance issue
that's about to occur.
For regular web traffic these delays
aren't usually noticeable,
but if you're using streaming video applications,
things like voice over IP, or you're playing video games,
these delays are extremely noticeable
and they can cause a lot of problems for your end users.
Our second metric we need to monitor is known as bandwidth.
Now bandwidth is the maximum rate of data transfer
across a given network.
Now, technically bandwidth is actually
a theoretical concept that measures how much data
could be transferred from a source to a destination
under ideal conditions.
But in reality, when we're talking about our networks
and our connections, they're rarely operating
at the perfect or ideal conditions.
Therefore we often measure something known as throughput
instead of bandwidth to monitor our network performance.
Throughput is the actual measure of data
as it's being successfully transferred
from the source to the destination,
but you'll often hear people use the terms
bandwidth and throughput interchangeably,
not realizing there is a difference.
Technically, bandwidth is a theoretical limit
where throughput is the reality of what you're achieving.
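Before we look at an online speed test, here's a rough Python sketch of what a throughput measurement is actually doing: time how long a known download takes and divide the bits transferred by the seconds elapsed. The URL here is just a placeholder, so point it at any large test file you control.

import time
import urllib.request

def measure_throughput_mbps(url: str) -> float:
    # Time the download, then convert bytes per second into megabits per second.
    start = time.monotonic()
    with urllib.request.urlopen(url) as response:
        data = response.read()
    elapsed = time.monotonic() - start
    return (len(data) * 8) / (elapsed * 1_000_000)

# Hypothetical test file; substitute a large file you host yourself.
print(f"{measure_throughput_mbps('https://example.com/100MB.bin'):.1f} Mbps")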
So if you want to run a bandwidth speed test
or more accurately, a throughput test for your network,
you can go to something like speedtest.net and click on go,
and you'll have a series of downloads and uploads
that'll occur from your workstation
to their server and back.
Then it will report to you how fast or slow
your connection was in terms of throughput.
In this example, you can see my results indicate
I have a throughput with a top download speed
of 240 megabits per second,
and a top upload speed of around 241 megabits per second.
The problem with that is that my actual bandwidth,
my theoretical limit, should be 650 megabits per second
for downloads and 310 megabits per second for uploads.
So why is my throughput so much less?
Well, when I was doing this test,
I connected to my office network using my wifi adapter
and not directly connecting through a wired switch.
At the same time, there are other people
in the office using the connection,
and all of these factors lead to a less-than-ideal
environment, which makes my throughput much lower
than my expected bandwidth.
As I make different changes to my network,
I can retest the throughput to see if those changes
help or hurt my overall throughput.
For example, if I switched
from a wireless internet connection
to a wired internet connection,
I'd be able to see a dramatic increase
in overall throughput that I wouldn't see
over that wireless connection.
The third metric we need to monitor is known as jitter.
Jitter is the network condition that occurs
when there's a varying time delay in the sending of data packets
over a network connection.
Now jitter is really a big problem
for any real-time applications that you may be supporting
on your network.
If you're doing things like video conferences,
voice over IP, or virtual desktop infrastructure,
all of these are negatively affected by jitter.
Basically, jitter is simply a variation
in the delay of the packets.
And this can cause some really strange side effects,
especially for your voice and video calls.
If you've ever been in a video conference
and somebody starts speaking,
and then all of a sudden you hear their voice
start speeding up for about five or 10 seconds,
and then it returns back to normal speed,
that usually is because of jitter on their network.
If you have good quality of service management in place,
you shouldn't experience a lot of jitter,
but if you're not doing QoS properly,
then jitter will occur.
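To make that idea of variation concrete, here's a small Python sketch, assuming you've already collected a handful of round-trip times in milliseconds; it reports the average change between consecutive samples, which is a simplified version of the smoothed calculation real-time protocols use.

def estimate_jitter_ms(latencies_ms):
    # Jitter here is the average absolute difference between consecutive delay samples.
    if len(latencies_ms) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])]
    return sum(diffs) / len(diffs)

# Five ping samples in milliseconds: the one 52 ms spike is what drives the jitter up.
print(estimate_jitter_ms([38.1, 38.4, 37.9, 52.0, 38.2]))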
You see, when your network suffers from congestion,
the network devices like your routers and switches
are going to be unable to send out the same amount of traffic
as what they're receiving.
This causes their packet buffers to start to fill up,
and eventually they'll start to drop packets
if they have too much in the buffers.
This is known as packet loss.
Now, when this happens, your TCP packets
are going to get resent, and this causes
increased network load again.
Now on the other hand, if the buffer begins to fill up,
but then the network congestion eases up,
those buffers will be able to quickly send
all of their contents to the destination.
The destination will then try to process them all,
but usually it can't do that.
And this leads to delays in processing
that can result in jitter on the endpoint device as well.
So to prevent jitter, we want to ensure our network
is using quality of service properly.
We want to make sure we're categorizing
and prioritizing our voice and video traffic over
the other types of traffic.
Also, we need to verify our network connections
and our devices have enough capacity to support
the amount of data that we're trying to transfer.
As a network administrator,
it's your responsibility to always monitor
your network's performance and the three key metrics
you always should be keeping track of
are latency, bandwidth or throughput, and jitter.
Sensors.
In this lesson,
we're going to talk about sensors
that help us monitor the performance of our network devices,
those devices like routers, switches, and firewalls.
Now, these sensors can be used to monitor
the device's temperature, its CPU usage, and its memory,
and these things can be key indicators
of whether a device is operating properly
or is about to suffer a catastrophic failure.
Our first sensor measurement we need to talk about
is the temperature of the device.
Now, most network devices like your routers,
switches, and firewalls
have the ability to report on the temperature
within their chassis.
Now, depending on the model,
there may be only one or two temperature readings
or on some larger enterprise devices,
you may have a temperature reading
on each and every controller, processor, interface card,
and other components inside the system.
Now, the temperature sensors
can be used to measure the air temperature
at the intake
and the air temperature at the exhaust outlet, at a minimum.
Now, for each of these sensors,
you can set up minor and major temperature thresholds.
A minor temperature threshold is used to set off an alarm
when a rising temperature is detected,
but it hasn't reached dangerous levels yet.
When this occurs,
a system message is displayed,
an SNMP notification is sent,
and an environmental alarm can be sounded.
Now, when you have a major temperature threshold,
this is going to be used to set off an alarm
when the temperature reaches dangerous conditions.
At this level, we want to still display those system messages,
get that SNMP notification,
and have the environmental alarm sounded.
But in addition to that,
the device can actually start to load shed
by turning off different functions to reduce the temperature
being generated by the device's processor.
For example, let's say you have a router
with multiple processing cards in it.
That device may shut down one of those processing cards
to prevent the entire system from overheating.
That's what I mean by load shedding.
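If you want to picture how those minor and major thresholds work in software, here's a minimal Python sketch; the 45 and 60 degree Celsius values are placeholders I picked for illustration, since the real limits are defined by the device vendor.

def check_temperature(celsius, minor=45, major=60):
    # Placeholder thresholds; real devices ship with vendor-defined limits.
    if celsius >= major:
        return "major: display message, send SNMP notification, sound alarm, begin load shedding"
    if celsius >= minor:
        return "minor: display message, send SNMP notification, sound alarm"
    return "ok"

print(check_temperature(48))   # crosses the minor threshold
print(check_temperature(63))   # crosses the major threshold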
Now, when a device runs at excessive temperatures
for too long,
the performance will decrease on that device
and the lifespan will decline on that device as well.
Over time, that device can even suffer
a catastrophic failure from overheating.
Our second sensor measurement we need to talk about
is CPU usage or utilization on the device.
At their core,
routers, switches, and firewalls
are just specialized computers.
When these devices are running under normal conditions,
their CPU or central processing unit
should have minimal utilization
somewhere in the range of 5 to 40%.
But if the device begins to become extremely busy
or receives too many packets from its neighboring devices,
the CPU can become over-utilized
and the utilization percentage will increase.
Now, if the CPU utilization gets too high,
the device could become unable to process any more requests
and it'll start to drop packets
or the entire connection could fail.
Usually, when you see a high processor utilization rate,
this is an indication of a misconfigured network
or a network under attack.
If the network is misconfigured,
for example, let's say you have a switch
that's misconfigured,
you can end up having a broadcast storm that occurs,
and that's going to create
an excessive amount of broadcast traffic
that'll cause the switch's CPU to become over-utilized
as it tries to process all those requests.
Similarly, if you have a lot of complex
and intricate ACLs on your router,
and then people start sending a lot of inbound traffic,
that router has to go through all of those ACLs
for each packet, and that can make it become unresponsive
due to high CPU usage.
As an administrator,
you need to monitor the CPU utilization
in your network devices
to determine if they're operating properly,
if they're misconfigured,
or if they're under attack.
The third sensor measurement we use
is memory utilization for the device.
Similar to high CPU utilization,
high memory utilization can be indicative
of a larger problem in your network.
If your devices begin to use too much memory,
this can lead to system hangs, processor crashes,
and other undesirable behavior.
To help protect against this,
you should have minor, severe,
and critical memory threshold warnings
set up in your devices
and reporting back to your centralized monitoring dashboard
using SNMP.
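As a sketch of how that SNMP reporting might be collected, here's a short Python example that shells out to the snmpget command from the Net-SNMP tools; the device address, community string, and OIDs are placeholders, so substitute the CPU and memory objects from your vendor's MIB.

import subprocess

def snmp_get(host, community, oid):
    # Requires the Net-SNMP command-line tools; -Oqv prints just the value.
    result = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Placeholder OIDs - look these up in your vendor's MIB documentation.
CPU_OID = "1.3.6.1.4.1.99999.1.1.0"
MEM_OID = "1.3.6.1.4.1.99999.1.2.0"

cpu = int(snmp_get("192.0.2.1", "public", CPU_OID))
mem = int(snmp_get("192.0.2.1", "public", MEM_OID))
if cpu > 80 or mem > 80:
    print(f"Investigate this device: CPU {cpu}%, memory {mem}%")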
As a baseline,
your network devices should operate
at around 40% memory utilization
under normal working conditions.
During busier times,
you may see this rise up to 60 to 70%,
and during peak times it may be up to 80%,
but if you're constantly seeing memory utilization
above 80%,
you may need to install a larger
or more powerful device for your network,
or you could be under an attack
that's been causing excessive load
for an extended amount of time.
As you begin to operate your networks in the real world,
you're going to begin to see what normal looks like
for your particular network.
As you see temperatures rising
or CPU and memory utilization increasing,
this can trigger alarms indicating a network misconfiguration
or a network performance issue that's happening right now.
Then you need to investigate the root cause of that
and solve those issues
by bringing those metrics back to a normal level
within your baseline.
NetFlow data.
In this lesson, we're going to talk about NetFlow data
and how it's used to conduct traffic flow analysis
within our networks.
In order to best monitor traffic in our network,
we can either use full packet capture or NetFlow data.
Now, as you might've guessed,
packet captures can take up a lot of storage space
and they can grow quickly in size.
For example, if I'm conducting a full packet capture
on my home network each day,
I would need several gigabytes of storage
just for my small family,
because every single packet that goes in or out of my house
would be captured and logged.
Every video game my son is playing online,
every YouTube video he watches,
every Netflix show my wife is binging.
All of that will be captured bit-by-bit
inside of that full packet capture.
Now, a full packet capture, or FPC,
is going to capture the entire packet.
This includes the header and the payload
for all the traffic that's entering or leaving your network.
As I said, this would be a ton of information
and quickly eat up all of our storage.
Now, because full packet capture takes up so much space,
we often don't collect it in a lot of organizations.
Most businesses and organizations instead will use NetFlow.
Now, NetFlow data, and other similar protocols like that,
are used to conduct something known as flow analysis.
Flow analysis will rely on a flow collector
as a means of recording metadata
and statistics about network traffic
instead of recording each and every frame
or every single packet
that's going in or out of our network.
This allows us to use flow analysis tools
that provide network traffic statistics
sampled by the collector.
Now, by doing this, we can capture information
about the traffic flow instead of the data contained
within that data flow.
And this saves us a lot of storage space.
Now with NetFlow and flow analysis,
we're not going to have the contents of
what's going over the network like we would
with a full packet capture,
but we can still gather a lot of metadata
and information about the network traffic
that's helpful to us in our monitoring.
This information is stored inside a database
and it can be queried later by different tools
to produce different reports and graphs.
Now, the great thing about flow analysis is
it's going to allow us to highlight trends and patterns
in the traffic being generated by our network.
And this becomes really useful
in our network performance monitoring.
Flow analysis will allow us to get alerts
based on different anomalies we might see
and different patterns or triggers
that are outside of our expected baselines.
These tools also have a visualization component
that allows us to quickly create
a map of different network connections
and the associated flow patterns over those connections.
By identifying different traffic patterns
that might reveal bad behavior, malware in transit,
tunneling, or other bad things out there,
we're going to be able to quickly respond
to these potential problems or incidents.
Now, there are a few different tools we can use
when dealing with traffic flow analysis.
This includes things like NetFlow, Zeek,
and the Multi Router Traffic Grapher.
Let's take a look at each of these for a moment.
First, we have NetFlow.
NetFlow is a Cisco-developed means of reporting
network flow information to a structured database.
NetFlow is actually one of the first data flow analyzers
that was created out there, and eventually,
it became the basis for the standard
that everyone started to use under the term IPFIX,
or IP Flow Information Export.
Now, NetFlow allows us to define a particular traffic flow
based on different packets that
share the same characteristics.
For example, if we want to identify packets
with the same source and destination IP,
this could signify there's a session between those two hosts
and it should be considered one data flow
that we can collect information on.
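To illustrate the idea of grouping packets into flows, here's a small Python sketch; the packet list is made-up sample data, and each flow key is just the shared characteristics we talked about, so only counters get stored, never the payload.

from collections import defaultdict

# Made-up sample packets: (src_ip, dst_ip, src_port, dst_port, protocol, bytes)
packets = [
    ("10.0.0.5", "203.0.113.10", 51000, 443, "TCP", 1200),
    ("10.0.0.5", "203.0.113.10", 51000, 443, "TCP", 800),
    ("10.0.0.7", "8.8.8.8", 55301, 53, "UDP", 76),
]

flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
for src, dst, sport, dport, proto, size in packets:
    key = (src, dst, sport, dport, proto)   # the flow key: shared characteristics
    flows[key]["packets"] += 1
    flows[key]["bytes"] += size

for key, stats in flows.items():
    print(key, stats)   # metadata only - no packet contents are kept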
Now, when you look at NetFlow data,
you can capture information about the packets
that are going over these devices,
like the network protocol and interface that's being used,
the version and type of IP being used,
the source and destination IP address,
the source and destination port, or the IP type of service.
All of this information can be gathered using NetFlow
and then analyzed and displayed visually
using our different tools.
For example, here you can see that I'm using SolarWinds
as a tool to show the NetFlow data of a network.
But you could also review this data
in a text-based environment using the NetFlow exports themselves.
In this graphical environment though,
it becomes really easy to see that
there are 15 different traffic flows.
And if I expand the 15th data flow,
we can see the source and destination IP,
the source port, the destination port,
some basic information about that data flow,
but we're not seeing the content of any of those packets
that were part of this data flow.
For us to be able to do that,
we would have to have a full packet capture,
but here we only captured the metadata
or the information about those traffic flows.
Now, if you want to be able to have the best of both worlds,
you can use something like Zeek.
Now, Zeek is a hybrid tool
that passively monitors your network like a sniffer,
but it's only going to log full packet captures
based on data of potential interest.
Essentially, Zeek is going to sample the data
going across the network, just like NetFlow does,
but when Zeek finds something that it deems interesting,
based on the parameters and rules you've configured,
it's going to log the entire packet for that traffic
and then send it over to our cybersecurity analysts
for further investigation.
This method helps us reduce our storage
and processing requirements,
and it gives us the ability to have all this data
in a single database.
Now, one of the great things about Zeek is that
it performs normalization of this data as well,
and then stores it as either a tab-delimited
or JavaScript Object Notation, or JSON, formatted text file.
This allows you to use it with
lots of other different cybersecurity tools
and different network monitoring tools as well.
For example, now that I have this normalized data,
I can import that data into another tool for visualization,
searching and analysis.
Here, I've imported my Zeek logs into Splunk,
and then I can have my cybersecurity analyst
search for specific information during a potential incident.
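If you'd rather work with those Zeek logs directly instead of a tool like Splunk, here's a rough Python sketch that reads a JSON-formatted conn.log; it assumes Zeek was configured for JSON output and that the file sits in the current directory, and the 10-megabyte threshold is just an arbitrary example.

import json

# Assumes Zeek's JSON output: one JSON object per line in conn.log.
with open("conn.log") as log:
    for line in log:
        record = json.loads(line)
        src = record.get("id.orig_h")        # originating host in Zeek's schema
        dst = record.get("id.resp_h")        # responding host
        sent = record.get("orig_bytes") or 0
        if sent > 10_000_000:
            print(f"Large upload: {src} -> {dst} ({sent} bytes)")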
Now, the third tool we have to talk about is
MRTG, or the Multi Router Traffic Grapher.
The Multi Router Traffic Grapher is a tool
that's used to create graphs to show network traffic flows
going through our network interfaces
on different routers and switches
and it does this by polling these appliances
using SNMP, the Simple Network Management Protocol.
So, what is useful about a visualization like this?
Well, you're going to be able to start seeing patterns emerge
that may be outside of your baseline.
For example, here in the top graph,
you could see a big spike in traffic
between 2:00 AM and 4:00 AM.
Is that normal?
Well, maybe, and maybe not,
but it's something we should further investigate and analyze
because we're seeing this big spike occur
between 2:00 AM and 4:00 AM.
And that might be something normal,
like doing offsite backups,
or it could be something malicious.
If it was the case of something that was normal,
like an offsite backup,
you're going to see this big spike in traffic
because we're sending a backup copy of
all of our data offsite to a cloud provider facility.
That might be a reasonable explanation.
And in that case, I wouldn't need to worry
because I would see that every single night
and I'd be used to seeing it.
Now, on the other hand,
maybe that server has been infected with malware
and every day between 2:00 and 4:00 AM,
it's going to send all of the data back to the bad actors
while all my administrators are at home sleeping.
This is considered data exfiltration
as part of an attack campaign,
and that's something you want to be on the lookout for.
Now, just looking at this graphic,
I don't know which of these two cases it is.
Is this something normal, like a backup,
or is this something malicious?
But if you know your organization
and you know your baselines,
now you can look at this graph and identify
what should be investigated based on seeing
that spike between 2:00 AM and 4:00 AM,
and then figuring out where
that additional traffic flow is going, and why.
If we suspected something was malicious here,
like somebody exfiltrating our data,
then we might set up a network sniffer
in front of our file server and see
what traffic is leaving the network and where it's going.
Then, based on that, we may have an incident response
on our hands and need to do our cleanup.
Now at this point, we just don't know
if this is malicious or not,
but we do know it's something different
and something that is outside of the normal baseline
as indicated by that big spike.
So, it's important for us to investigate it
for the health of our network.
Interface Statistics.
In this lesson, we're going to talk about interface statistics
and how they're used to monitor our network's performance.
Now, if you're new to networking,
you may be wondering what exactly is an interface?
Well, an interface is just one of the physical
or logical ports on a router, switch, or firewall.
In enterprise-level devices,
each interface can generate its own statistics
and maintain its own status.
In this lesson, we're going to explore the link state,
the speed and duplex status,
the send and receive traffic statistics,
the cyclic redundancy check statistics,
and the protocol packet and byte counts
that are collected for our network devices.
To help guide our discussions,
I'm going to be using the output from a Cisco router
for an interface called f0/0,
which simply means it's a fast Ethernet or Cat5 connection
going from this physical interface on slot zero
and port zero of a given router.
Now, first you can see, we have the Link State.
A link state is used to communicate whether or not
a given interface has a cable connected to it
and a valid protocol to use for communication.
For example, if I connected a fast Ethernet
unshielded twisted pair cable to the interface
on 0/0 of this router,
and then plug in the other end into another router
to create a connection,
I should see fast Ethernet 0/0 is up, line protocol is up.
This indicates that the interface is physically up
and the protocol is operational.
If we're using Ethernet, that means that frames
are able to enter and leave this interface.
Next, we have some information about the interface itself,
such as the MAC address and the IP address assigned to it.
After that, we see there's an MTU size set to 1500 bytes,
which is the default used for Ethernet.
And then we have the bandwidth,
which is set at 100,000 kilobits per second,
or 100 megabits per second.
This makes sense because I'm using fast Ethernet
or Cat5 cabling for our connection.
This speed is also used by the router
when it's trying to calculate the metrics
for the routing protocols like OSPF and EIGRP,
since they rely on the connection speed
in making their determinations and their link costs.
Next, we have the reliability,
which is being shown here as 255 out of 255.
This means if the connection begins to have more input
or output errors, you're going to see the reliability value drop.
Basically, you read this as reliability equals
the number of packets divided by the total number of frames.
So, 255 over 255 is the best reliability,
and it indicates there were no packets or frames
dropped so far.
txload is our next statistic.
And this is going to indicate
how busy the router is transmitting frames
over this connection.
At one out of 255, this router is not very busy at all.
rxload is like txload, but instead of transmitting,
we're going to be measuring
how busy the router is in terms of receiving frames.
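As a rough worked example of how to read those load counters, assuming txload scales the 5-minute output rate against the configured bandwidth, a lightly loaded link looks like this; the 392 kilobit per second rate is a made-up figure.

bandwidth_bps = 100_000_000      # 100 megabits per second of configured bandwidth
output_rate_bps = 392_000        # hypothetical 5-minute output rate
txload = round(output_rate_bps / bandwidth_bps * 255)
print(f"txload = {txload}/255")  # rounds to about 1/255, a nearly idle link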
Next, we have the encapsulation type being used.
In this case, we're using ARPA,
or the Advanced Research Projects Agency setting,
which indicates that we're using standard Ethernet.
This is because ARPA developed standard Ethernet,
and we're using Ethernet frames
for our encapsulation.
Now, if you're using something different,
like a serial link or a frame relay,
it would say something different here instead of ARPA.
But if you're using Ethernet,
you should expect to see ARPA right here.
Next, we have the keepalive,
and this is set to 10 seconds, which is the default.
This is how often the router
is going to send a keepalive packet
to other devices that it's connected to,
to check if they're still up and online.
Next, we have a line that says full-duplex,
100 megabits per second, 100BaseTX/FX.
Now, this indicates whether this interface is using half
or full-duplex, and in this case we're using full-duplex.
It also tells you what the bandwidth is,
and the interface type you're using.
In this case, as I said, we're using full-duplex
and we're using 100 megabits per second as our bandwidth,
and we have a fast Ethernet interface type,
and it's either using copper or fiber cabling,
because it says TX/FX.
Now, next we're going to have our ARP type.
And in this case, again, we're going to use ARPA.
The timeout here tells us how long the ARP cache
is going to remember a binding, and when it will be cleared.
In this case, we're using the default time of four hours.
The next lines are the last input, last output,
and last clearing of the counters.
In this case, the router was just rebooted,
so they're all set to zero
because they were all just cleared.
Next, we have our input queue,
which tells us how many packets are in the input queue,
and their maximum size.
In this case, the maximum size is 75 packets for our queue.
Drops is the number of packets
that have been dropped so far.
Flushes is used to count the Selective Packet Discards
that have occurred; basically, when the router
or switch sees a sign that it needs to start shedding some load,
it starts dropping packets selectively.
SPD is a mechanism that's going to drop
your lowest-priority packets when the CPU becomes too busy,
so that you can save capacity
for higher-priority packets as a form of quality of service.
Now, the total output drops here is at zero.
This means that we've had no drops
because we never had a full output queue.
Since we have a hundred megabit per second connection,
as long as we're communicating
with another hundred-megabit-per-second connection,
we should see this stay at zero dropped packets.
If we started using a 20 megabit per second connection
from our ISP, for instance,
then we would likely experience
network congestion, because we're sending at 100,
but they can only take it at 20.
That would cause a problem for us.
And at that point, some of our packets might get dropped.
Next, we have our queuing strategy
for our quality of service.
In this case, we're using First In, First Out,
which is known as FIFO.
This is the default for this type of router.
Next, we have output queue size and the maximum.
Currently, our queue is empty and it's showing zero packets.
Now, the maximum queue size here is set at 40.
So, if more than 40 packets build up,
the queue is not going to be able to hold them
and the rest of those will get dropped.
Next, we have our five-minute input and output rates.
These are the average rates
at which packets are being received and transmitted.
Packets input is our next line.
And here we can see 923 packets input were received,
for a total of 158,866 bytes of data being processed.
The next line contains the received broadcasts.
And in this case, we received 860 broadcast frames.
We also have runts,
giants and throttles counted here as well.
Now, a runt is an Ethernet frame
that is less than 64 bytes in size.
It's really small, that's why it's a runt.
A giant is any Ethernet frame
that exceeds the 802.3 frame size of 1,518 bytes.
It's really large, so it's a giant.
Throttles are going to occur
when the interface fails to buffer the incoming packets.
If this is a high number, this is an indicator
that you may be having quality of service issues
for your end users.
Next, we have input errors,
CRC, frame, overrun, and ignored.
The input error counter will go up whenever the interface
is receiving a frame with any kind of error in it.
This can be something like a runt, a giant,
no buffer available, CRC errors, or other things like that.
CRC is the number of packets that were received,
but failed the cyclic redundancy checksum,
or CRC check upon receiving them.
If the checksum generated by the sender
doesn't match the one calculated by this interface
when it receives that frame,
a CRC error is counted and the packet gets rejected.
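To see the checksum comparison in action, here's a tiny Python sketch using the CRC-32 function from the standard library; real Ethernet hardware computes its own frame check sequence, so treat this purely as an illustration of the sender and receiver comparing checksums.

import zlib

frame_payload = b"example frame contents"
sender_crc = zlib.crc32(frame_payload)

# Simulate a single bit flipped in transit.
corrupted = bytes([frame_payload[0] ^ 0x01]) + frame_payload[1:]
receiver_crc = zlib.crc32(corrupted)

if receiver_crc != sender_crc:
    print("CRC mismatch - count an input error and discard the frame")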
Now, frame is used to count the number of packets
where a CRC error occurred
and a non-integer number of octets was received.
Overrun is used to count
how often the interface was unable to receive traffic
due to an insufficient hardware buffer.
Ignored is going to be used to count the number of packets
that the interface ignored since the hardware interface
was low on the internal buffers.
If you're experiencing a lot of noise on the connection
or a broadcast storm,
this ignore count will start to rise drastically.
Next, we have the watchdog counter, which is used to count
how many times the watchdog timer has expired.
This happens whenever a packet over 2048 bytes is received.
The next line contains the input packets
with dribble condition detected,
which means that a slightly longer than default frame
was received by the interface.
For example, we talked about the fact that the MTU size
was 1500 bytes by default,
but a frame wasn't considered a giant
until it reached 1,518 bytes.
So, if I got a frame that was 1,510 bytes in size,
it's technically above the MTU size,
but it's not yet a giant.
So it would still be processed,
but it would be added here on the dribble condition counter,
so I can know that I'm starting to get packets
above 1500 bytes.
Next, we have the packet output counter,
and this is the number of packets that have been sent
and the size of those transmissions in bytes.
The underrun is the number of times a sender
has operated faster than the router can handle,
and this causes buffer problems or dropped packets.
Next, we have the output errors,
and this is just like our input errors, the only difference
is we're now counting the number of collisions
and the interface resets that are occurring as a result.
A collision is counted
anytime a packet needs to be retransmitted
because an Ethernet collision occurred.
Since we're using full-duplex, this number should be zero.
If it's not zero, something's wrong.
Next, we have the interface reset,
and this counts the number of times an interface
had to be completely reset since the last reboot.
Next, we have unknown protocol drops.
Anytime a packet is dropped
because our device can't determine what protocol it was,
it's going to be counted under the unknown protocol drops.
For example,
if you're not supposed to receive older types of protocols
like IPX traffic and AppleTalk on your router,
but somebody sends you a message that's formatted that way,
your router is going to drop it,
and it's not going to know what it was,
because it's not a properly formatted IP message
or Ethernet frame.
So that counter is going to go up.
Next, we have babbles, late collision, and deferred.
Now, a babble is used to count any transmitted frame
that is larger than 1,518 bytes.
This is similar to our giants,
but we're going to use this when we're transmitting,
instead of receiving.
A babble is for transmitting, a giant is for receiving.
Late collisions are going to be used
to count the number of collisions that occur
after the interface started transmitting its frame.
And deferred is used to count the number of frames
that were transmitted successfully
after waiting because the media was busy.
So, if your devices are using CSMA/CD
or collision detection, it's going to detect the media as busy,
it's going to wait, and then it's going to transmit.
When this happens,
this number is going to go up because it had to wait.
Again, we should see zero for late collisions
and deferred here
because we're using a full-duplex connection,
but if we're using a half-duplex connection,
there will be some numbers there.
Next, we have the lost carrier and the no carrier.
This is the number of times that the carrier was lost
or not present during the transmission.
The carrier we're talking about here
is the signal on the connection.
Finally, we have the output buffer failures and swapped out.
The Output Buffer Failure is going to be used
to count the number of times the packet was not output
from the output hold queue
because of a shortage of shared memory.
An Output Buffer Swap Out
is going to be the number of packets stored in the main memory
when the queue was full.
If this number is very high,
that means that you're likely experiencing
a busy time in your networks.
Now, for the exam, you don't need to know all these things
and memorize all their definitions,
but you should be aware of some key statistics here
on the interface.
Things like the link state, the speed and duplex status,
the send and receive traffic statistics,
the cyclic redundancy check statistics,
the protocol packet and byte counts,
the CRC errors, the giants, the runts,
and the encapsulation errors.
On the exam, you may get a question
that involves troubleshooting a device,
and you're going to see
an interface statistics screen like this,
and then you're going to have to recommend a solution
to that problem.
For example, if the question asks,
why the device is operating slowly,
and you see the connection is set to half-duplex
instead of full-duplex,
that would be a reason for the slowdown,
because you effectively cut your bandwidth in half,
since the device has to listen before transmitting.
Or, if you see a large amount of collisions,
but you're running full-duplex,
that would indicate there's two devices
connected to this same switch port,
and that is causing you issues.
Or, maybe you see there's a lot of CRC errors,
this could indicate a dirty fiber connector,
or an unshielded twisted pair cable
that's subject to too much electromagnetic interference.
This could be caused by lots of different things,
such as your cable being improperly run over
a fluorescent light or near a power line,
or something like that.
My point is,
it's important to be able to read the interface statistics
so you can then troubleshoot
your network connectivity issues
in your routers and switches.
Environmental sensors.
In this lesson,
we're going to talk about environmental sensors
that help us monitor our physical environments
where our network devices are operating,
such as our data centers, our telecommunication closets,
and our main distribution frames.
These sensors are going to be used to monitor
our environmental conditions.
Things like our temperature and humidity,
as well as the electrical power status and whether or not
we may be experiencing flooding.
After all, all of these routers and switches
are sitting in a telecommunication closet somewhere,
and nobody's sitting in there with them
looking at them every day.
So, how am I going to keep track of all of them?
How do I know the power is still on?
How do I know there's enough cooling there?
How do I know they haven't gotten covered in water
from a leaking pipe?
Well, this is where environmental monitoring
becomes extremely important.
Environmental monitoring relies on different types
of sensors that can be configured
to report back periodically,
or can be polled from a central monitoring station
repeatedly, to maintain the status of those areas.
Our network devices need to operate in a cool and dry place.
To maintain the proper temperature and humidity,
we can have sensors that communicate with our HVAC system.
If the temperature begins to get too hot,
the HVAC system can increase the airflow
and cool the telecommunication closets more.
If the area gets too cold, it can reduce the airflow
and bring the temperature back to the right range.
Most network devices want to be operating
between 50 and 90 degrees Fahrenheit.
So, using an automated HVAC system with sensors
can help ensure that occurs.
Additionally, we need to ensure this area
maintains the right humidity levels.
If there's too much humidity,
this can cause condensation in the equipment,
and that leads to water on our circuit boards,
which will destroy our network devices.
Conversely, if we have humidity that's too low,
static electricity can build up
and it can short out our equipment.
Therefore, we always want to make sure our humidity range
is between 40 and 60%.
Again, by having proper humidity sensors
connected to our HVAC systems,
we can increase or decrease the humidity
to keep it in that perfect 40 to 60% range.
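Here's a minimal Python sketch of the kind of check a monitoring station could run against those sensor readings, using the 50 to 90 degree Fahrenheit and 40 to 60 percent ranges from this lesson; the sample readings at the bottom are made up.

def environment_alerts(temp_f, humidity_pct):
    alerts = []
    # Ranges from the lesson: 50-90 degrees Fahrenheit, 40-60% relative humidity.
    if not 50 <= temp_f <= 90:
        alerts.append(f"temperature out of range: {temp_f} F")
    if not 40 <= humidity_pct <= 60:
        alerts.append(f"humidity out of range: {humidity_pct}%")
    return alerts

print(environment_alerts(95, 35))   # both readings would trigger alarms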
Next, we need to ensure all our devices have power.
We can install sensors on our power lines,
or use our power distribution centers
to track the power levels
going into our pieces of networking equipment.
This allows us to know if there's a surge, a spike,
a brownout, a blackout, or simply dirty power.
All of this can be remotely monitored
by our central monitoring systems
by using internet of things devices like power sensors.
Finally, we need to ensure devices
are not subject to flooding.
Again, we can place sensors in our telecommunication closets
and other non-human occupied spaces,
to detect if there's any water on the floor
due to a burst pipe or other sources of flooding.
These sensors can detect the change from dry to wet,
and when they become wet, they sound an alarm
or send a signal back to our central monitoring panel.
Remember, when it comes to our network equipment
and data centers, our devices need to be cool,
at the right humidity, and receive clean power as input,
and they need to stay dry from flooding
in order to continue their operations
day after day without any interruption.
Network management.
In this section of the course,
we're going to talk about network management.
Now network management is the process of administering
and managing computer networks.
When you perform network management,
you're involved with thought analysis,
performance management,
provisioning of networks and clients,
as well as maintaining the quality of service.
To properly manage your network,
you need to ensure that you have
the right network documentation in place,
which includes the physical network diagrams,
logical network diagrams, wiring diagrams,
site survey reports, audit and assessment reports,
and baseline configurations,
as these are some of the key
and foundational pieces of documentation necessary
to fully understand where your network resides
and how it operates.
Also, you need to look at various performance metrics
and sensor data from across your network and its devices.
For example, if you don't know that one of your routers
is experiencing high processor utilization,
you may not be able to predict an upcoming network failure
or prevent it from occurring.
As you manage your network, you're also going to be working
to monitor the data flowing across this network,
which means you need to be able
to use things like NetFlow data to better understand
where this data is flowing.
On a network device level,
you can also manage individual interfaces
by considering their status, statistics
and the errors they generate.
Finally, it becomes really important
to understand the environmental factors
that can affect your network.
After all, our network devices are not operating
in the middle of nowhere,
but instead they're located inside of our data centers
and telecommunication closets.
To keep them operating at peak performance,
our devices need to ensure they're experiencing
the right environmental conditions,
and that means we have to make sure
these devices have the proper power, space and cooling
within their operating environments.
So in this section, we're going to focus on domain three,
network operations, and we're going to talk
about objectives 3.1 and 3.2.
Objective 3.1 states that given a scenario,
you must use the appropriate statistics and sensors
to ensure network availability.
And objective 3.2 states that you must explain
the purpose of organizational documents and policies.
So let's get started with our coverage of network management
in this section of the course.
Common documentation.
In this lesson, we're going to cover the common documentation
that you're going to use in your enterprise networks.
This includes physical network diagrams,
logical network diagrams, wiring diagrams,
site survey reports, audit and assessment reports,
and baseline configurations.
First, we have physical network diagrams.
A physical network diagram is used to show
the actual physical arrangement of the components
that make up your network
including the cabling and the hardware.
Typically, these diagrams give you a bird's eye view
of the network in its physical space,
and it looks a lot like a floor plan,
but you may also see physical network diagrams
that show how things are going to be cabled
within an individual rack of a data center as well.
For example, if I have a physical network diagram
showing the floor plan of a small office,
I can notate on that floor plan
exactly where each IP-based CCTV is going to be installed.
Now, in this example,
you can see I have nine IP-based cameras
and how the cable is going to run back to a central point,
such as this network video recorder
that contains nine power over Ethernet ports
to run this camera system.
I could just as easily have another flow plan like this
showing where all my network jacks are going to be located
in an office and how the cables are being run back
to a patch panel, and from that patch panel
back to an edge switch that connects to all these devices.
Inside my data center, though, I'm usually more concerned
with how things are physically located
within one single rack.
And so I can create a rack diagram.
For example, here's a diagram showing a rack,
containing two storage area network controllers,
two firewalls, two switches,
three virtual machine host servers running ESXi,
a backup server, a modular smart array,
and a tape backup library.
From this diagram, you can clearly see where in this rack
each of those units is going to be located
and which network cables will connect to which ports
on which devices.
Another version of this type of diagram
may include a front view,
which also shows the location inside the cabinet.
And it can have the different device names
and the IP addresses for each of those devices,
but it won't show the actual network cables
and where they connect
because you're looking at the front of those devices.
Now, another type of physical network diagram we have
is used to provide documentation
for how our main distribution frame or MDF
and our intermediate distribution frame or IDF
are connected and cabled.
In this example, you could see a very generic MDF
and IDF layout for a typical three-story office building.
Here, we have the MDF on the bottom right corner
of the first floor,
and then a smaller IDF on the right corner
of each of the remaining floors.
There's an interconnection between each IDF and the MDF,
and each floor has a single network cable
running to a jack into an office.
Now, this of course is a very oversimplified diagram
or an overview diagram,
but we can also have additionally detailed diagrams
depending on how much work we want.
That could show each rack inside the MDF or the IDF
and what they would look like,
how they're cabled, including their edge switches,
patch panels, and other networking equipment.
The second type of documentation we need to cover
is logical network diagrams.
Unlike the physical network diagrams
that show exactly which port or cable
is going to connect it to and how it's ran
on the physical floor plan or the rack layout,
we use a logical diagram.
We're going to use this to illustrate the flow of data
across a network and it's going to be used to show
how devices are communicating with each other.
These logical diagrams will include things like the subnets,
the network objects and devices,
the routing protocols and domains, voice gateways,
traffic flow, and network segments within a given network.
Traditionally, network diagrams were drawn by hand
and using symbols to represent the different network devices
like routers, switches, firewalls,
intrusion detection systems, and clients.
In this example, I'm using the standard Cisco notation
to demonstrate how the various switches and routers
are being connected to form this network.
On the logical diagram, we also include the IP addresses
and the interface identifiers such as G0/1
or gigabit Ethernet 0/1 or ATM1/0,
which is for an ATM interface for our routers and switches.
Notice the routers are being represented by a circle
with four arrows, two pointing inward,
and two pointing outward.
Switches are going to be represented by a square
with four arrows, all pointing outward.
Servers like a DHCP, DNS, or TFTP server
are represented by a large rectangle server icon.
And the computers are going to be shown
using a computer icon.
Another symbol you may see included
is an intrusion detection system
or intrusion prevention system,
which is going to be a rectangle
that contains a circle inside of it
with two arrows crossing over the circle.
A firewall is usually represented by a brick wall
and an access point is going to be represented
by a rectangle with a series of radio waves going out of it
from the left to the right.
Now, as you look at various network diagrams
on the internet, you may come across some more
modern network diagrams that remove the symbols
and instead use pictures of networking equipment
that's going to be used in the diagrams instead.
In this example, you can see the router
connected to the switches and those switches are connected
to the client PCs.
Next, we have a wiring diagram
and this is something we already looked at briefly
as part of our physical network diagrams.
Wiring diagrams can accompany both physical
and logical network diagrams,
as long as they clearly label
which cable is connected to which port.
The more in-depth wiring diagrams
are going to include a floor plan or a rack diagram,
so you can see exactly where the cables are being run
in the physical environment.
Next, we have site survey reports.
These are often conducted as part of a wireless survey
or an assessment.
Now, a wireless site survey, sometimes
called an RF or radio frequency site survey
or a wireless survey,
is the process of planning and designing a wireless network
to provide a wireless solution
that will deliver the required wireless coverage,
data rates, network capacity, roaming capability,
and quality of service or QoS.
In this example, you can see a floor plan that shows
the location of each wireless access point.
Then, radiating out from each access point,
you see bands of color, going from green to yellow
to orange to red.
And this indicates the strength of the wireless signal.
Now, when you see green, that's a strong signal.
When you see red, that's a weaker signal.
Wired site surveys are also conducted sometimes,
but in these cases
it's usually done as part of a preparation
for a major upgrade or installation.
With a wired site survey,
the installation team is going to come out
and look at your MDFs, your IDFs, and your data centers
to determine if you have the right power, space, and cooling
to support whatever new equipment
you're going to be installing as part of that upgrade.
For example, if I was going to install three new racks
of equipment in your data center,
I need to go out there and look at it
and make sure you have the physical space required
to hold those three racks.
In addition to that, I need to make sure
you have a powerful enough HVAC system
to remove all the extra heat that my new equipment
in these three racks is going to produce.
I also want to make sure your site has the right power
and backup generators and battery backups
so that you can handle all the extra power
that's going to be drawn by all this new equipment.
Next, we have audit and assessment reports.
Audit and assessment reports
are delivered to your organization
after a formal assessment has been conducted.
These reports will contain an executive summary,
an overview of the assessment scope and objectives,
the assumptions and limitations of the assessment,
the methods and tools used during the assessment,
a diagram showing the current environment and systems,
the security requirements,
a summary of findings and recommendations,
and the results of the audit.
Essentially, this report is going to contain
all the issues the audit team found with your organization,
as well as anything your organization
is already doing right,
and things they should continue to keep doing.
Finally, we have baseline configurations.
The documented baseline configuration
is the most stable version of a device's configuration.
These baseline configurations
are a documented set of specifications
for an information system, or a configuration item
within that system, that has been formally reviewed
and agreed on at a given point in time,
and which can now only be changed
through change control procedures.
So, if you want to change the baseline
due to an operational need, you need to follow
the proper configuration management procedures
to request those changes.
Those changes will then be properly tested and approved,
and they become part of the new baseline
for those devices moving forward.
As you can see, there is a bunch of documentation
that you're going to use in your enterprise networks,
including your physical network diagrams,
logical network diagrams, wiring diagrams,
site survey reports, audit and assessment reports,
and baseline configurations.
Performance metrics.
In this lesson, we're going to talk
all about performance metrics and how they're used
to ensure network availability.
Now, network performance metrics are a large part
of network monitoring.
Network performance monitoring is the end-to-end
network monitoring of your end user experience.
This differs from traditional monitoring though,
because traditional monitoring is focused on performance
between two points, like a switch and a router,
but with network performance monitoring,
we're going to look at the overall end user experience
by monitoring the performance
from the end user's workstation to the final destination
that they're trying to reach.
So to help us monitor network performance,
there's really going to be three key metrics
that we're going to use.
These are latency, bandwidth and jitter.
The first metric is latency.
Now, latency is the measure of the time that it takes
for data to reach its destination across a network.
Usually we measure network latency as the round trip time
from a workstation to the distant end
and back to the workstation.
We report this time in milliseconds.
Now, for example, let's say you open up your command prompt
and you enter the command ping 8.8.8.8, and you hit enter.
You're going to get a response that tells you
how long it took for an ICMP packet to leave your computer,
reach the Google DNS server located at 8.8.8.8,
and return to your computer again.
In my case, this took an average time of 38.2 milliseconds
when I ran four consecutive ping requests.
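If you want to see how a measurement like that could work
behind the scenes, here's a minimal Python sketch
that approximates round-trip latency by timing
a TCP connection to that same Google DNS server.
It's only an approximation of what ping reports,
since it times a TCP handshake rather than an ICMP echo,
and the target address and port are just example values.

import socket
import time

def measure_rtt(host="8.8.8.8", port=53, samples=4):
    """Approximate round-trip latency by timing TCP handshakes, in milliseconds."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=2):
            pass  # completing the handshake is our round trip
        times.append((time.perf_counter() - start) * 1000)
    return sum(times) / len(times)

print(f"Average round-trip time: {measure_rtt():.1f} ms")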
Now it's important to measure the round trip delay
for network latency because a computer that uses
a TCP/IP network can send only a limited amount of data
to its destination at one time,
and then it sits and waits for an acknowledgement
that that data was received before it sends out
more data across the network.
So if you have high latency or a long round trip delay,
this can drastically slow down
your overall network performance for your end users.
Now, if you're seeing consistent delays
or even just spikes in the delay time in your network,
this could indicate a major performance issue
that's about to occur.
For regular web traffic these delays
aren't usually noticeable,
but if you're using streaming video applications,
things like Voice over IP, or you're playing video games,
these delays are extremely noticeable
and they can cause a lot of problems for your end users.
The second metric we need to monitor is known as bandwidth.
Now bandwidth is the maximum rate of data transfer
across a given network.
Now, technically bandwidth is actually
a theoretical concept that measures how much data
could be transferred from a source to a destination
under ideal conditions.
But in reality, when we're talking about our networks
and our connections, they're rarely operating
at the perfect or ideal conditions.
Therefore we often measure something known as throughput
instead of bandwidth to monitor our network performance.
Throughput is the actual measure of data
as it's being successfully transferred
from the source to the destination,
but you'll often hear people use the terms
bandwidth and throughput interchangeably,
not realizing there is a difference.
Technically, bandwidth is a theoretical limit
where throughput is the reality of what you're achieving.
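To make that distinction concrete, here's a small Python sketch
that calculates throughput the same way a speed test does,
by dividing the bits actually transferred by the time it took.
The byte count and the duration here are just made-up sample values.

def throughput_mbps(bytes_transferred, seconds):
    """Throughput is the data actually delivered divided by the time taken."""
    return (bytes_transferred * 8) / (seconds * 1_000_000)

# Example: 300 million bytes transferred in 10 seconds
print(f"{throughput_mbps(300_000_000, 10):.1f} Mbps")  # prints 240.0 Mbps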
So if you want to do a bandwidth speed test
or more accurately, a throughput test for your network,
you can go to something like speedtest.net and click on go,
and you'll have a series of downloads and uploads
that'll occur from your workstation
to their server and back.
Then it will report to you how fast or slow
your connection was in terms of throughput.
In this example, you can see my results indicate
I have a throughput with a top download speed
of 240 megabits per second,
and a top upload speed of around 241 megabits per second.
The problem with that is that my actual bandwidth,
my theoretical limit, should be 650 megabits per second
for downloads and 310 megabits per second for uploads.
So why is my throughput so much less?
Well, when I was doing this test,
I connected to my office network using my wifi adapter
and not directly connecting through a wired switch.
At the same time, there's other people
in the office using the connection,
and all of these factors lead to a less than ideal
environment and this makes my throughput much lower
than my expected bandwidth.
As I make different changes to my network,
I can retest the throughput to see if those changes
help or hurt my overall throughput.
For example, if I switched
from a wireless internet connection
to a wired internet connection,
I'll be able to see a dramatic increase
in overall throughput that I wouldn't see
over that wireless connection.
The third metric we need to monitor is known as jitter.
Jitter is the network condition that occurs
when the time delay in sending data packets
over a network connection varies.
Now jitter is really a big problem
for any real-time applications that you may be supporting
on your network.
If you're doing things like video conferences
and Voice over IP, and virtual desktop infrastructure,
all of these are negatively affected by jitter.
Basically, jitter is simply a variation
in the delay of the packets.
And this can cause some really strange side effects,
especially for your voice and video calls.
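Here's a minimal Python sketch of one common way
to estimate jitter: take a series of latency samples
and average the variation between consecutive measurements.
The sample values below are made up for illustration.

def jitter_ms(latencies):
    """Estimate jitter as the average change between consecutive delay samples."""
    diffs = [abs(b - a) for a, b in zip(latencies, latencies[1:])]
    return sum(diffs) / len(diffs)

samples = [38.2, 39.0, 37.8, 52.4, 38.5]  # round-trip times in milliseconds
print(f"Estimated jitter: {jitter_ms(samples):.1f} ms")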
If you've ever been in a video conference
and somebody starts speaking,
and then all of a sudden you hear their voice
start speeding up for about five or 10 seconds,
and then it returns back to normal speed,
that usually is because of jitter on their network.
If you have good quality of service management in place,
you shouldn't experience a lot of jitter,
but if you're not doing QoS properly,
then jitter will occur.
You see, when your network suffers from congestion,
the network devices like your routers and switches
are going to be unable to send the equivalent amount of traffic
as what they're receiving.
This causes their packet buffers to start to fill up,
and eventually they'll start to drop packets
if they have too much in the buffers.
This is known as packet loss.
Now, when this happens, your TCP packets
are going to get resent, and this causes
increased network load again.
Now on the other hand, if the buffer begins to fill up,
but then the network congestion eases up
those buffers will be able to quickly send
all of their contents to the destination.
The destination will then try to process them all,
but usually it can't do that.
And this leads to delays in processing
that can result in jitter on the endpoint device as well.
So to prevent jitter, we want to ensure our network
is using quality of service properly.
We want to make sure we're categorizing
and prioritizing our voice and video traffic over
the other types of traffic.
Also, we need to verify our network connections
and our devices are large enough to support
the amount of data that we're trying to transfer.
As a network administrator,
it's your responsibility to always monitor
your network's performance and the three key metrics
you always should be keeping track of
are latency, bandwidth or throughput, and jitter.
Sensors.
In this lesson,
we're going to talk about sensors
that help us monitor the performance of our network devices,
those devices like routers, switches, and firewalls.
Now, these sensors can be used to monitor
the device's temperature, its CPU usage, and its memory,
and these things can be key indicators
of whether a device is operating properly
or is about to suffer a catastrophic failure.
Our first sensor measurement we need to talk about
is the temperature of the device.
Now, most network devices like your routers,
switches, and firewalls
have the ability to report on the temperature
within their chassis.
Now, depending on the model,
there may be only one or two temperature readings
or on some larger enterprise devices,
you may have a temperature reading
on each and every controller, processor, interface card,
and things like that inside the system.
Now, the temperature sensors
can be used to measure, at a minimum,
the air temperature at the intake
and the air temperature at the exhaust outlet.
Now, for each of these sensors,
you can set up minor and major temperature thresholds.
A minor temperature threshold is used to set off an alarm
when a rising temperature is detected,
but it hasn't reached dangerous levels yet.
When this occurs,
a system message is displayed,
an SNMP notification is sent,
and an environmental alarm can be sounded.
Now, when you have a major temperature threshold,
this is going to be used to set off an alarm
when the temperature reaches dangerous conditions.
At this level, we want to still display those system messages,
get that SNMP notification,
and have the environmental alarm sounded.
But in addition to that,
the device can actually start to load shed
by turning off different functions to reduce the temperature
being generated by the device's processor.
For example, let's say you have a router
with multiple processing cards in it.
That device may shut down one of those processing cards
to prevent the entire system from overheating.
That's what I mean by load shedding.
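To picture how those thresholds might be evaluated,
here's a minimal Python sketch that classifies
a chassis temperature reading and decides whether to alert
or start load shedding.
The threshold values here are invented examples,
since the real ones depend on the specific device.

# Example thresholds only; real values come from the device vendor
MINOR_THRESHOLD_C = 55
MAJOR_THRESHOLD_C = 70

def check_temperature(reading_c):
    """Return the action a device might take for a given chassis temperature."""
    if reading_c >= MAJOR_THRESHOLD_C:
        return "major alarm: send SNMP notification and begin load shedding"
    if reading_c >= MINOR_THRESHOLD_C:
        return "minor alarm: send SNMP notification, temperature rising"
    return "normal: no action needed"

print(check_temperature(62))  # minor alarm
print(check_temperature(75))  # major alarm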
Now, when a device runs at excessive temperatures
for too long,
the performance of that device will decrease
and its lifespan will decline as well.
Over time, that device can even suffer
a catastrophic failure from overheating.
Our second sensor measurement we need to talk about
is CPU usage or utilization on the device.
At their core,
routers, switches, and firewalls
are just specialized computers.
When these devices are running under normal conditions,
their CPU or central processing unit
should have minimal utilization
somewhere in the range of 5 to 40%.
But if the device begins to become extremely busy
or receives too many packets from its neighboring devices,
the CPU can become over-utilized
and the utilization percentage will increase.
Now, if the CPU utilization gets too high,
the device could become unable to process any more requests
and it'll start to drop packets
or the entire connection could fail.
Usually, when you see a high processor utilization rate,
this is an indication of a misconfigured network
or a network under attack.
If the network is misconfigured,
for example, let's say you have a switch
that's misconfigured,
you can end up having a broadcast storm occur,
and that's going to create
an excessive amount of broadcast traffic
that'll drive up the switch's CPU utilization
as it tries to process all those requests.
Similarly, if you have a lot of complex
and intricate ACLs on your router,
and people start sending a lot of inbound traffic,
that router has to go through all of those ACLs
for each packet, and that can make it become unresponsive
due to high CPU usage.
As an administrator,
you need to monitor the CPU utilization
in your network devices
to determine if they're operating properly,
if they're misconfigured,
or if they're under attack.
The third sensor measurement we use
is memory utilization for the device.
Similar to high CPU utilization,
high memory utilization can be indicative
of a larger problem in your network.
If your devices begin to use too much memory,
this can lead to system hangs, processor crashes,
and other undesirable behavior.
To help protect against this,
you should have minor, severe,
and critical memory threshold warnings
set up in your devices
and reporting back to your centralized monitoring dashboard
using SNMP.
As a baseline,
your network devices should operate
at around 40% memory utilization
under normal working conditions.
During busier times,
you may see this rise up to 60 to 70%,
and during peak times it may be up to 80%,
but if you're constantly seeing memory utilization
above 80%,
you may need to install a larger
or more powerful device for your network,
or you could be under an attack
that's been causing an excessive load
for an extended period of time.
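As a simple illustration of that kind of monitoring logic,
here's a Python sketch that buckets CPU or memory
utilization readings against the rough percentages
we just discussed.
The exact thresholds you'd use in practice
should come from your own baseline.

def classify_utilization(percent):
    """Roughly bucket a CPU or memory utilization percentage."""
    if percent > 80:
        return "critical: investigate, upgrade, or check for an attack"
    if percent > 60:
        return "elevated: expected during busy or peak periods"
    if percent >= 40:
        return "normal baseline"
    return "idle to normal"

for reading in (35, 55, 72, 91):
    print(f"{reading}% -> {classify_utilization(reading)}")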
As you begin to operate your networks in the real world,
you're going to begin to see what normal looks like
for your particular network.
As you see temperatures rising
or CPU and memory utilization increasing,
this can trigger alarms indicating a network misconfiguration
or a network performance issue that's happening right now.
Then you need to investigate the root cause of that
and solve those issues
by bringing those metrics back to a normal level
within your baseline.
NetFlow data.
In this lesson, we're going to talk about NetFlow data
and how it's used to conduct traffic flow analysis
within our networks.
In order to best monitor traffic in our network,
we can either use full packet capture or NetFlow data.
Now, as you might've guessed,
packet captures can take up a lot of storage space
and they can grow quickly in size.
For example, if I'm conducting a full packet capture
on my home network each day,
I would need several gigabytes of storage
just for my small family,
because every single packet that goes in or out of my house
would be captured and logged.
Every video game my son is playing online,
every YouTube video he watches,
every Netflix show my wife is binging.
All of that will be captured bit-by-bit
inside of that full packet capture.
Now, a full packet capture, or FPC,
is going to capture the entire packet.
This includes the header and the payload
for all the traffic that's entering or leaving your network.
As I said, this would be a ton of information
and quickly eat up all of our storage.
Now, because full packet capture takes up so much space,
we often don't collect it in a lot of organizations.
Most businesses and organizations instead will use NetFlow.
Now, NetFlow data, and other similar protocols like that,
are used to conduct something known as flow analysis.
Flow analysis will rely on a flow collector
as a means of recording metadata
and statistics about network traffic
instead of recording each and every frame
or every single packet
that's going in or out of our network.
This allows us to use flow analysis tools
that provide network traffic statistics
sampled by the collector.
Now, by doing this, we can capture information
about the traffic flow instead of the data contained
within that data flow.
And this saves us a lot of storage space.
Now with NetFlow and flow analysis,
we're not going to have the contents of
what's going over the network like we would
with a full packet capture,
but we can still gather a lot of metadata
and information about the network traffic
that's helpful to us in our monitoring.
This information is stored inside a database
and it can be queried later by different tools
to produce different reports and graphs.
Now, the great thing about flow analysis is
it's going to allow us to highlight trends and patterns
in the traffic being generated by our network.
And this becomes really useful
in our network performance monitoring.
Flow analysis will allow us to get alerts
based on different anomalies we might see
and different patterns or triggers
that are outside of our expected baselines.
These tools also have a visualization component
that allows us to quickly create
a map of different network connections
and the associated flow patterns over those connections.
By identifying different traffic patterns
that might reveal bad behavior, malware in transit,
tunneling, or other bad things out there,
we're going to be able to quickly respond
to these potential problems or incidents.
Now, there are a few different tools we can use
when dealing with traffic flow analysis.
This includes things like NetFlow, Zeek,
and the Multi Router Traffic Grapher.
Let's take a look at each of these for a moment.
First, we have NetFlow.
NetFlow is a Cisco-developed means of reporting
network flow information to a structured database.
NetFlow was actually one of the first data flow analyzers
ever created, and eventually,
it became the basis for the standard that
everyone started to use, known as IPFIX,
or IP Flow Information Export.
Now, NetFlow allows us to define a particular traffic flow
based on different packets that
share the same characteristics.
For example, if we want to identify packets
with the same source and destination IP,
this could signify there's a session between those two hosts
and it should be considered one data flow
that we can collect information on.
Now, when you look at NetFlow data,
you can capture information about the packets
that are going over these devices,
like the network protocol interface that's being used,
the version and type of IP being used,
the source and destination IP address,
the source and destination port, and the IP type of service.
All of this information can be gathered using NetFlow
and then analyzed and displayed visually
using our different tools.
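To give you a sense of how those fields become flow records,
here's a tiny Python sketch that rolls packets sharing
the same key fields up into a single flow entry with counters,
instead of storing every packet in full.
The packet tuples below are invented sample data.

from collections import defaultdict

# Key each flow on the fields a NetFlow-style collector commonly uses
flows = defaultdict(lambda: {"packets": 0, "bytes": 0})

packets = [
    # (source IP, destination IP, source port, destination port, protocol, bytes)
    ("10.0.0.5", "8.8.8.8", 51514, 53, "UDP", 76),
    ("10.0.0.5", "8.8.8.8", 51514, 53, "UDP", 82),
    ("10.0.0.7", "93.184.216.34", 49222, 443, "TCP", 1500),
]

for src, dst, sport, dport, proto, size in packets:
    key = (src, dst, sport, dport, proto)
    flows[key]["packets"] += 1
    flows[key]["bytes"] += size

for key, stats in flows.items():
    print(key, stats)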
For example, here you can see that I'm using SolarWinds
as a tool to show the NetFlow data of a network.
But you could also review this data
in a text-based environment using the NetFlow exports themselves.
In this graphical environment though,
it becomes really easy to see that
there are 15 different traffic flows.
And if I expand the 15th data flow,
we can see the source and destination IP,
the source port, the destination port,
some basic information about that data flow,
but we're not seeing the content of any of those packets
that were part of this data flow.
For us to be able to do that,
we would have to have a full packet capture,
but here we only captured the metadata
or the information about those traffic flows.
Now, if you want to be able to have the best of both worlds,
you can use something like Zeek.
Now, Zeek is a hybrid tool
that passively monitors your network like a sniffer,
but it's only going to log full packet captures
based on data of potential interest.
Essentially, Zeek is going to sample the data
going across the network, just like NetFlow does,
but when Zeek finds something that it deems interesting,
based on the parameters and rules you've configured,
it's going to log the entire packet for that traffic
and then send it over to our cybersecurity analyst
for further investigation.
This method helps us reduce our storage
and processing requirements,
and it gives us the ability to have all this data
in a single database.
Now, one of the great things about Zeek is that
it performs normalization of this data as well,
and then stores it as either a tab-delimited text file
or a JavaScript Object Notation, or JSON, formatted text file.
This allows you to use it with
lots of other different cybersecurity tools
and different network monitoring tools as well.
For example, now that I have this normalized data,
I can import that data into another tool for visualization,
searching and analysis.
Here, I've imported my Zeek logs into Splunk,
and then I can have my cybersecurity analyst
search for specific information during a potential incident.
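Because Zeek can write its logs as JSON-formatted text,
you can load them with almost any tool you like.
Here's a minimal Python sketch that reads a JSON connection log
line by line and prints the largest flows.
The file name and field names are based on what Zeek's conn.log
typically looks like in JSON mode,
so treat them as assumptions for your own environment.

import json

# Zeek's conn.log in JSON mode typically has one JSON object per line
with open("conn.log") as log_file:
    records = [json.loads(line) for line in log_file if line.strip()]

# Sort connections by bytes sent from the originator, largest first
records.sort(key=lambda r: r.get("orig_bytes") or 0, reverse=True)

for record in records[:5]:
    print(record.get("id.orig_h"), "->", record.get("id.resp_h"),
          record.get("orig_bytes"), "bytes")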
Now, the third tool we have to talk about is
MRTG, or the Multi Router Traffic Grapher.
The Multi Router Traffic Grapher is a tool
that's used to create graphs to show network traffic flows
going through our network interfaces
on different routers and switches
and it does this by polling these appliances
using SNMP, the Simple Network Management Protocol.
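To give you a feel for that kind of SNMP polling,
here's a short Python sketch that uses the third-party
pysnmp library to read an interface's inbound octet counter
from the standard IF-MIB.
The device address, community string, and interface index
are placeholders you'd replace for your own network.

from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

# IF-MIB::ifInOctets for interface index 1
oid = ObjectIdentity("1.3.6.1.2.1.2.2.1.10.1")

error_indication, error_status, _, var_binds = next(getCmd(
    SnmpEngine(),
    CommunityData("public"),                 # placeholder community string
    UdpTransportTarget(("192.0.2.1", 161)),  # placeholder device address
    ContextData(),
    ObjectType(oid),
))

if error_indication or error_status:
    print("SNMP poll failed:", error_indication or error_status)
else:
    for name, value in var_binds:
        print(name.prettyPrint(), "=", value.prettyPrint())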
So, what is useful about a visualization like this?
Well, you're going to be able to start seeing patterns emerge
that may be outside of your baseline.
For example, here in the top graph,
you could see a big spike in traffic
between 2:00 AM and 4:00 AM.
Is that normal?
Well, maybe, and maybe not,
but it's something we should further investigate and analyze
because we're seeing this big spike occur
between 2:00 AM and 4:00 AM.
And that might be something normal,
like doing offsite backups,
or it could be something malicious.
If it was the case of something that was normal,
like an offsite backup,
you're going to see this big spike in traffic
because we're sending a backup copy of
all of our data offsite to a cloud provider facility.
That might be a reasonable explanation.
And in that case, I wouldn't need to worry
because I would see that every single night
and I'd be used to seeing it.
Now, on the other hand,
maybe that server has been infected with malware
and every day between 2:00 AM and 4:00 AM,
it's going to send all of the data back to the bad actors
while all my administrators are at home sleeping.
This is considered data exfiltration
as part of an attack campaign,
and that's something you want to be on the lookout for.
Now, just looking at this graphic,
I don't know which of these two cases it is.
Is this something normal, like a backup,
or is this something malicious?
But if you know your organization
and you know your baselines,
now you can look at this graph and identify
what should be investigated based on seeing
that spike between 2:00 AM and 4:00 AM,
and then figuring out where
that additional traffic flow is going, and why.
If we suspected something was malicious here,
like somebody exfiltrating our data,
then we might set up a network sniffer
in front of our file server and see
what traffic is leaving the network and where it's going.
Then, based on that, we may have an incident response
on our hands and need to do our cleanup.
Now at this point, we just don't know
if this is malicious or not,
but we do know it's something different
and something that is outside of the normal baseline
as indicated by that big spike.
So, it's important for us to investigate it
for the health of our network.
Interface Statistics.
In this lesson, we're going to talk about interface statistics
and how they're used to monitor our network's performance.
Now, if you're new to networking,
you may be wondering what exactly is an interface?
Well, an interface is just one of the physical
or logical switch ports on a router, switch or firewall.
In enterprise-level devices,
each interface can generate its own statistics
and maintain its own status.
In this lesson, we're going to explore the link state,
the speed and duplex status,
the send and receive traffic statistics,
the cyclic redundancy check statistics,
and the protocol packet and byte counts
that are collected for our network devices.
To help guide our discussions,
I'm going to be using the output from a Cisco router
for an interface called f0/0,
which simply means it's a fast Ethernet or Cat5 connection
going from this physical interface on slot zero
and port zero of a given router.
Now, first you can see, we have the Link State.
A link state is used to communicate whether or not
a given interface has a cable connected to it
and a valid protocol to use for communication.
For example, if I connected a fast Ethernet
unshielded twisted pair cable to the interface
on 0/0 of this router,
and then plug in the other end into another router
to create a connection,
I should see fast Ethernet 0/0 is up, line protocol is up.
This indicates that the interface is physically up
and the protocol is operational.
If we're using Ethernet, that means that frames
are able to be entering and leaving this interface.
Next, we have some information about the interface itself,
such as the MAC address and the IP address assigned to it.
After that, we see there's an MTU size of 1500 bytes,
which is the default used in Ethernet.
And then we have the bandwidth
being set at 100,000 kilobits per second,
which is 100 megabits per second.
This makes sense because I'm using fast Ethernet
or Cat5 cabling for our connection.
This speed is also used by the router
when it's trying to calculate the metrics
for the routing protocols like OSPF and EIGRP,
since they rely on the connection speed
when making their determinations and calculating their link costs.
Next, we have the reliability,
which is being shown here as 255 out of 255.
This means if the connection begins to have more input
or output errors, you're going to see the reliability lower.
Basically, you read this as reliability equals
the number of packets divided by the total number of frames.
So, 255 over 255 is the best reliability,
and it indicates that no packets or frames
have been dropped so far.
txload is our next statistic.
And this is going to indicate
how busy the router is transmitting frames
over this connection.
At one out of 255, this router is not very busy at all.
rxload is like txload, but instead of transmitting,
we're going to be measuring
how busy the router is in terms of receiving frames.
Next, we have the encapsulation type being used.
In this case, we're using ARPA,
or the Advanced Research Projects Agency setting,
which indicates that we're using standard Ethernet.
This is because ARPA developed standard Ethernet,
and we're using Ethernet frames
for our encapsulation.
Now, if you're using something different,
like a serial link or a frame relay,
it would say something different here instead of ARPA.
But if you're using Ethernet,
you should expect to see ARPA right here.
Next, we have the keepalive,
and this is set to 10 seconds, which is the default.
This is how often the router
is going to send a keepalive packet
to other devices that it's connected to,
to check if they're still up and online.
Next, we have a line that says full-duplex,
100 megabits per second, 100BaseTX/FX.
Now, this indicates whether this interface is using half
or full-duplex, and in this case we're using full-duplex.
It also tells you what the bandwidth is,
and the interface type you're using.
In this case, as I said, we're using full-duplex
and we're using 100 megabits per second as our bandwidth,
and we have a fast Ethernet interface type,
and it's either using copper or fiber cabling,
because it says TX/FX.
Now, next we're going to have our ARP type.
And in this case, again, we're going to use ARPA.
The timeout here tells us how long the ARP cache
is going to remember each binding, and when it will be cleared.
In this case, we're using the default time of four hours.
The next two lines are the last input, last output,
and last clearing of the counters.
In this case, the router was just rebooted,
so they're all set to zero
because they were all just cleared.
Next, we have our input queue,
which tells us how many packets are in the input queue,
and their maximum size.
In this case, the maximum size is 75 packets for our queue.
Drops is the number of packets
that have been dropped so far.
Flushes is used to count the Selective Packet Discards
that have occurred.
Basically, when the router or switch gets a signal
that it needs to start shedding some load,
it starts dropping packets selectively.
SPD is a protocol that's going to drop
your lowest priority packets when the CPU becomes too busy,
so that way you can save capacity
for higher priority packets as a form of quality of service.
Now, the total output drops here is at zero.
This means that we've had no drops
because we never had a full output queue.
Since we have a hundred megabit per second connection,
as long as we're communicating
with another a hundred megabit per second connection,
we should see this stay at zero drop packets.
If we started using a 20 megabit per second connection
from our ISP, for instance,
then we would likely experience network congestion here,
because we're sending at 100 megabits per second,
but they can only receive at 20.
That would cause a problem for us,
and at that point, some of our packets might get dropped.
Next, we have our queuing strategy
for our quality of service.
In this case, we're using First In, First Out,
which is known as FIFO.
This is the default for this type of router.
Next, we have output queue size and the maximum.
Currently, our queue is empty and it's showing zero packets.
Now, the maximum queue size here is set at 40.
So, if I receive more than 40 packets,
the queue is not going to be able to hold it
and the rest of those will get dropped.
Next, we have our five-minute input and output rates.
These are the average rates
at which packets are being received and transmitted.
Packets input is our next line,
and here we can see 923 packets were received
for a total of 158,866 bytes of data being processed.
The next line contains the received broadcasts,
and in this case, we received 860 broadcast frames.
We also have runts,
giants and throttles counted here as well.
Now, a runt is an Ethernet frame
that is less than 64 bytes in size.
It's really small, that's why it's a runt.
A giant is any Ethernet frame
that exceeds the 802.3 frame size of 1,518 bytes.
It's really large, so it's a giant.
Throttles are going to occur
when the interface fails to buffer the incoming packets.
If this is a high number, this is an indicator
that you may be having quality of service issues
to your end users.
Next, we have input errors,
CRC, frame, overrun, and ignored.
The input error counter will go up whenever the interface
is receiving a frame with any kind of error in it.
This can be something like a runt, a giant,
no buffer available, CRC errors, or other things like that.
CRC is the number of packets that were received
but failed the cyclic redundancy check,
or CRC, when they arrived.
If the checksum generated by the sender
doesn't match the one calculated by this interface
when it receives that frame,
a CRC error is counted and the packet gets rejected.
Now, frame is used to count the number of packets
that were received with a CRC error
and a non-integer number of octets.
Overrun is used to count
how often the interface was unable to receive traffic
due to an insufficient hardware buffer.
Ignored is going to be used to count the number of packets
that the interface ignored because the hardware interface
was running low on internal buffers.
If you're experiencing a lot of noise on the connection
or a broadcast storm,
this ignore count will start to rise drastically.
Next, we have the watchdog counter, which is used to count
how many times the watchdog timer has expired.
This happens whenever a packet over 2048 bytes is received.
The next line contains the input packets
with dribble condition detected,
which means that a slightly longer than default frame
was received by the interface.
For example, we talked about the fact that the MTU size
was 1500 bytes by default,
but a frame wasn't considered a giant
until it reached 1,518 bytes.
So, if I got a frame that was 1,510 bytes in size,
it's technically above the MTU size,
but it's not yet a giant.
So it would still be processed,
but it would be added here on the dribble condition counter,
so I can know that I'm starting to get packets
above 1500 bytes.
Next, we have the packet output counter,
and this is the number of packets that have been sent
and the size of those transmissions in bytes.
The underruns counter is the number of times the sender
has operated faster than the router can handle,
and this can cause buffering issues or dropped packets.
Next, we have the output errors,
and this is just like our input errors, the only difference
is we're now counting the number of collisions
and the interface resets that are occurring as a result.
A collision is counted
anytime a packet needs to be retransmitted
because an Ethernet collision occurred.
Since we're using full-duplex, this number should be zero.
If it's not zero, something's wrong.
Next, we have the interface reset,
and this counts the number of times an interface
had to be completely reset since the last reboot.
Next, we have unknown protocol drops.
Anytime a packet is dropped
but our device can't determine what protocol it was carrying,
it's going to be counted under the unknown protocol drops.
For example,
if you're not supposed to receive older types of protocols
like IPX traffic and AppleTalk on your router,
but somebody sends you a message that's formatted that way,
your router is going to drop it,
and it's not going to know what it was,
because it's not a properly formatted IP message
or an Ethernet frame.
So that counter is going to go up.
Next, we have babbles, late collision, and deferred.
Now, a babble is used to count any frame
that is transmitted and is larger than 1,518 bytes.
This is similar to our giants,
but we're going to use this when we're transmitting,
instead of receiving.
A babble is for transmission, a giant is for receive.
Late collisions are going to be used
to count the number of collisions that occur
after the interface started transmitting its frame.
And deferred is used to count the number of frames
that were transmitted successfully
after waiting because the media was busy.
So, if your devices are using CSMA/CD
or collision detection, they're going to detect the media as busy,
wait, and then transmit.
When this happens,
this number is going to go up because it had to wait.
Again, we should see zero for late collisions
and deferred here
because we're using a full-duplex connection,
but if we're using a half-duplex connection,
there will be some numbers there.
Next, we have the lost carrier and the no carrier counters.
This is the number of times that the carrier was lost
or not present during the transmission.
The carrier we're talking about here
is the signal on the connection.
Finally, we have the output buffer failures and swapped out.
The Output Buffer Failure is going to be used
to count the number of times the packet was not output
from the output hold queue
because of a shortage of shared memory.
An Output Buffer Swap Out
is going to be the number of packets stored in the main memory
when the queue was full.
If this number is very high,
that means that you're likely experiencing
a busy time in your networks.
Now, for the exam, you don't need to know all these things
and memorize all their definitions,
but you should be aware of some key statistics here
on the interface.
Things like the link state, the speed and duplex status,
the send and receive traffic statistics,
the cyclic redundancy check statistics,
the protocol packet and byte counts,
the CRC errors, the giants, the runts,
and the encapsulation errors.
On the exam, you may get a question
that involves troubleshooting a device,
and you're going to see
an interface statistics screen like this,
and then you're going to have to recommend a solution
to that problem.
For example, if the question asks
why the device is operating slowly,
and you see the connection is set to half-duplex
instead of full-duplex,
that would be a reason for the slowdown,
because you've effectively cut your bandwidth in half,
since the device has to listen before transmitting.
Or, if you see a large amount of collisions,
but you're running full-duplex,
that would indicate there are two devices
connected to the same switch port,
and that is causing you issues.
Or, maybe you see there are a lot of CRC errors,
which could indicate a dirty fiber connector
or an unshielded twisted pair cable
that's subject to too much electromagnetic interference.
This could be caused by lots of different things,
such as your cable being improperly run over
a fluorescent light or near a power line,
or something like that.
My point is,
it's important to be able to read the interface statistics
so you can then troubleshoot
your network connectivity issues
in your routers and switches.
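As a small illustration of that troubleshooting workflow,
here's a Python sketch that scans the text of a show interfaces
output for a few of the counters we just discussed.
The sample text and the regular expressions are based on
the typical Cisco output format,
so adjust them for your own devices.

import re

sample = """FastEthernet0/0 is up, line protocol is up
  Half-duplex, 100Mb/s, 100BaseTX/FX
     120 input errors, 95 CRC, 0 frame, 0 overrun, 0 ignored
     0 output errors, 14 collisions, 1 interface resets"""

duplex = re.search(r"(Half|Full)-duplex", sample).group(1)
crc_errors = int(re.search(r"(\d+) CRC", sample).group(1))
collisions = int(re.search(r"(\d+) collisions", sample).group(1))

if duplex == "Half":
    print("Half-duplex link: expect reduced throughput; check for a duplex mismatch")
if crc_errors > 0:
    print(f"{crc_errors} CRC errors: check cabling, connectors, and interference")
if collisions > 0 and duplex == "Full":
    print(f"{collisions} collisions on a full-duplex link: something is wrong")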
Environmental sensors.
In this lesson,
we're going to talk about environmental sensors
that help us monitor our physical environments
where our network devices are operating,
such as our data centers, our telecommunication closets,
and our main distribution frames.
These sensors are going to be used to monitor
our environmental conditions.
Things like our temperature and humidity,
as well as the electrical power status, and whether or not
we may be experiencing flooding.
After all, all of these routers and switches
are sitting in a telecommunication closet somewhere,
and nobody's sitting in there with them
looking at them every day.
So, how am I going to keep track of all of them?
How do I know the power is still on?
How do I know there's enough cooling there?
How do I know they haven't gotten covered in water
from a leaking pipe?
Well, this is where environmental monitoring
becomes extremely important.
Environmental monitoring relies on different types
of sensors that can be configured
to report back periodically,
or can be polled from a central monitoring station
repeatedly, to maintain the status of those areas.
Our network devices need to operate in a cool and dry place.
To maintain the proper temperature and humidity,
we can have sensors that communicate with our HVAC system.
If the temperature begins to get too hot,
the HVAC system can increase the airflow
and cool the telecommunication closets more.
If the area gets too cold, it can reduce the airflow
and bring the temperature back to the right range.
Most network devices want to be operating
between 50 and 90 degrees Fahrenheit.
So, using an automated HVAC system with sensors
can help ensure that occurs.
Additionally, we need to ensure this area
maintains the right humidity levels.
If there's too much humidity,
this can cause condensation in the equipment,
and that leads to water on our circuit boards,
which will destroy our network devices.
Conversely, if we have humidity that's too low,
static electricity can build up
and it can short out our equipment.
Therefore, we always want to make sure our humidity range
is between 40 and 60%.
Again, by having proper humidity sensors
connected to our HVAC systems,
we can increase or decrease the humidity
to keep it in that perfect 40 to 60% range.
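Here's a minimal Python sketch of the kind of range check
an environmental monitoring system performs,
using the temperature and humidity ranges we just talked about.
The sensor readings in the example are made-up values.

# Acceptable operating ranges discussed in this lesson
TEMP_RANGE_F = (50, 90)        # degrees Fahrenheit
HUMIDITY_RANGE_PCT = (40, 60)  # percent relative humidity

def check_environment(temp_f, humidity_pct):
    """Return any environmental alerts for a telecom closet reading."""
    alerts = []
    if not TEMP_RANGE_F[0] <= temp_f <= TEMP_RANGE_F[1]:
        alerts.append(f"temperature {temp_f}F out of range, adjust HVAC airflow")
    if not HUMIDITY_RANGE_PCT[0] <= humidity_pct <= HUMIDITY_RANGE_PCT[1]:
        alerts.append(f"humidity {humidity_pct}% out of range, risk of condensation or static")
    return alerts or ["environment normal"]

print(check_environment(95, 35))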
Next, we need to ensure all our devices have power.
We can install sensors on our power lines,
or use our power distribution centers
to track the power levels
going into our pieces of networking equipment.
This allows us to know if there's a surge, a spike,
a brownout, a blackout, or simply dirty power.
All of this can be remotely monitored
by our central monitoring systems
by using internet of things devices like power sensors.
Finally, we need to ensure devices
are not subject to flooding.
Again, we can place sensors in our telecommunication closets
and other non-human occupied spaces,
to detect if there's any water on the floor
due to a burst pipe or other sources of flooding.
These sensors can detect the change from dry to wet,
and when they become wet, they sound an alarm
or send a signal back to our central monitoring panel.
Remember, when it comes to our network equipment
and data centers, our devices need to be cool,
at the right humidity, and receive clean power as input,
and they need to stay dry from flooding
in order to continue doing their operations
day after day without any interruption.