Network Management
Network management.
In this section of the course,
we're going to talk about network management.
Now network management is the process of administering
and managing computer networks.
When you perform network management,
you're involved with fault analysis,
performance management,
provisioning of networks and clients,
as well as maintaining the quality of service.
To properly manage your network,
you need to ensure that you have
the right network documentation in place,
which includes the physical network diagrams,
logical network diagrams, wiring diagrams,
site survey reports, audit and assessment reports,
and baseline configurations,
as these are some of the key
and foundational pieces of documentation necessary
to fully understand where your network resides
and how it operates.
Also, you need to look at various performance metrics
and sensor data from across your network and its devices.
For example, if you don't know that one of your routers
is experiencing high processor utilization,
you may not be able to predict an upcoming network failure
or prevent it from occurring.
As you manage your network, you're also going to be working
to monitor the data flowing across this network,
which means you need to be able
to use things like NetFlow data to better understand
where this data is flowing.
On a network device level,
you can also manage individual interfaces
by considering their status, statistics
and the errors they generate.
Finally, it becomes really important
to understand the environmental factors
that can affect your network.
After all, our network devices are not operating
in the middle of nowhere,
but instead they're located inside of our data centers
and telecommunication closets.
To keep them operating at peak performance,
we need to ensure our devices are experiencing
the right environmental conditions,
and that means we have to make sure
these devices have the proper power, space and cooling
within their operating environments.
So in this section, we're going to focus on domain three,
network operations, and we're going to talk
about objectives 3.1 and 3.2.
Objective 3.1 states that given a scenario,
you must use the appropriate statistics and sensors
to ensure network availability.
And objective 3.2 states that you must explain
the purpose of organizational documents and policies.
So let's get started with our coverage of network management
in this section of the course.
Common documentation.
In this lesson, we're going to cover the common documentation
that you're going to use in your enterprise networks.
This includes physical network diagrams,
logical network diagrams, wiring diagrams,
site survey reports, audit and assessment reports,
and baseline configurations.
First, we have physical network diagrams.
A physical network diagram is used to show
the actual physical arrangement of the components
that make up your network
including the cabling and the hardware.
Typically, these diagrams give you a bird's eye view
of the network in its physical space,
and it looks a lot like a floor plan,
but you may also see physical network diagrams
that show how things are going to be cabled
within an individual rack of a data center as well.
For example, if I have a physical network diagram
showing the floor plan of a small office,
I can notate on that floor plan
exactly where each IP-based CCTV camera is going to be installed.
Now, in this example,
you can see I have nine IP-based cameras
and how the cable is going to run back to a central point,
such as this network video recorder
that contains nine power over Ethernet ports
to run this camera system.
I could just as easily have another floor plan like this
showing where all my network jacks are going to be located
in an office and how the cables are being run back
to a patch panel, and from that patch panel
back to an edge switch that connects to all these devices.
Inside my data center, though, I'm usually more concerned
with how things are physically located
within one single rack.
And so I can create a rack diagram.
For example, here's a diagram showing a rack,
containing two storage area network controllers,
two firewalls, two switches,
three virtual machine host servers running ESXi,
a backup server, a modular smart array,
and a tape backup library.
From this diagram, you can clearly see where in this rack
each of those units is going to be located
and which network cables will connect to which ports
on which devices.
Another version of this type of diagram
may include a front view,
which also shows the location inside the cabinet.
And it can have the different device names
and the IP addresses for each of those devices,
but it won't show the actual network cables
and where they connect
because you're looking at the front of those devices.
Now, another type of physical network diagram we have
is used to provide documentation
for how our main distribution frame or MDF
and our intermediate distribution frame or IDF
are connected and cabled.
In this example, you could see a very generic MDF
and IDF layout for a typical three-story office building.
Here, we have the MDF on the bottom right corner
of the first floor,
and then a smaller IDF on the right corner
of each of the remaining floors.
There's an interconnection between each IDF and the MDF,
and each floor has a single network cable
running to a jack in an office.
Now, this of course is a very oversimplified diagram
or an overview diagram,
but we can also create more detailed diagrams,
depending on how much detail we want.
Those could show each rack inside the MDF or the IDF,
what they would look like,
and how they're cabled, including their edge switches,
patch panels, and other networking equipment.
The second type of documentation we need to cover
is logical network diagrams.
Unlike the physical network diagrams,
which show exactly which port each cable connects to
and how it's run on the physical floor plan or the rack layout,
we use a logical diagram for a different purpose.
We're going to use this to illustrate the flow of data
across a network and it's going to be used to show
how devices are communicating with each other.
These logical diagrams will include things like the subnets,
the network objects and devices,
the routing protocols and domains, voice gateways,
traffic flow, and network segments within a given network.
Traditionally, network diagrams were drawn by hand
using symbols to represent the different network devices
like routers, switches, firewalls,
intrusion detection systems, and clients.
In this example, I'm using the standard Cisco notation
to demonstrate how the various switches and routers
are being connected to form this network.
On the logical diagram, we also include the IP addresses
and the interface identifiers such as G0/1
or gigabit Ethernet 0/1 or ATM1/0,
which is for an ATM interface for our routers and switches.
Notice the routers are being represented by a circle
with four arrows, two pointing inward,
and two pointing outward.
Switches are going to be represented by a square
with four arrows, all pointing outward.
Servers like a DHCP, DNS, or TFTP server
are represented by a large rectangle server icon.
And the computers are going to be shown
using a computer icon.
Another symbol you may see included
is an intrusion detection system
or intrusion prevention system,
which is going to be a rectangle
that contains a circle inside of it
with two arrows crossing over the circle.
A firewall is usually represented by a brick wall
and an access point is going to be represented
by a rectangle with a series of radio waves going out of it
from the left to the right.
Now, as you look at various network diagrams
on the internet, you may come across some more
modern network diagrams that remove the symbols
and instead use pictures of the actual networking equipment
being used in the diagrams.
In this example, you can see the router
connected to the switches and those switches are connected
to the client PCs.
Next, we have a wiring diagram
and this is something we already looked at briefly
as part of our physical network diagrams.
Wiring diagrams can accompany both physical
and logical network diagrams,
as long as they clearly label
which cable is connected to which port.
The more in-depth wiring diagrams
are going to include a floor plan or a rack diagram,
so you can see exactly where the cables are being run
in the physical environment.
Next, we have site survey reports.
These are often conducted as part of a wireless survey
or an assessment.
Now, a wireless site survey, sometimes
called an RF or radio frequency site survey
or a wireless survey,
is the process of planning and designing a wireless network
to provide a wireless solution
that will deliver the required wireless coverage,
data rates, network capacity, roaming capability,
and quality of service or QoS.
In this example, you can see a floor plan that includes
the locations of each wireless access point.
Then radiating out from each access point,
you see bands of color, going from green to yellow
to orange to red.
And this indicates the strength of the wireless signal.
Now, when you see green, that's a strong signal.
When you see red, that's a weaker signal.
Wired site surveys are also conducted sometimes,
but in these cases
it's usually done as part of a preparation
for a major upgrade or installation.
With a wired site survey,
the installation team is going to come out
and look at your MDFs, your IDFs, and your data centers
to determine if you have the right power, space, and cooling
to support whatever new equipment
you're going to be installing as part of that upgrade.
For example, if I was going to install three new racks
of equipment in your data center,
I need to go out there and look at it
and make sure you have the physical space required
to hold those three racks.
In addition to that, I need to make sure
you have a powerful enough HVAC system
to remove all the extra heat that my new equipment
in these three racks is going to produce.
I also want to make sure your site has the right power
and backup generators and battery backups
so that you can handle all the extra power
that's going to be drawn by all this new equipment.
Next, we have audit and assessment reports.
Audit and assessment reports
are delivered to your organization
after a formal assessment has been conducted.
These reports will contain an executive summary,
an overview of the assessment scope and objectives,
the assumptions and limitations of the assessment,
the methods and tools used during the assessment,
a diagram showing the current environment and systems,
the security requirements,
a summary of findings and recommendations,
and the results of the audit.
Essentially, this report is going to contain
all the issues the audit team found with your organization,
as well as anything your organization
is already doing right,
and things they should continue to keep doing.
Finally, we have baseline configurations.
The documented baseline configurations
are the most stable versions of a device's configuration.
These baseline configurations
are a documented set of specifications
for an information system or a configuration item
within that system, that has been formally reviewed
and agreed on at a given point in time,
and which can now only be changed
through change control procedures.
So, if you want to change the baseline
due to an operational need, you need to follow
the proper configuration management procedures
to request those changes.
Those changes will then be properly tested and approved,
and they become part of the new baseline
for those devices moving forward.
As you can see, there is a bunch of documentation
that you're going to use in your enterprise networks,
including your physical network diagrams,
logical network diagrams, wiring diagrams,
site survey reports, audit and assessment reports,
and baseline configurations.
Performance metrics.
In this lesson, we're going to talk
all about performance metrics and how they're used
to ensure network availability.
Now, network performance metrics are a large part
of network monitoring.
Network performance monitoring is the end-to-end
monitoring of your end users' experience.
This differs from traditional monitoring though,
because traditional monitoring is focused on performance
between two points, like a switch and a router,
but with network performance monitoring,
we're going to look at the overall end user experience
by monitoring the performance
from the end user's workstation to the final destination
that they're trying to reach.
So to help us monitor network performance,
there's really going to be three key metrics
that we're going to use.
These are latency, bandwidth and jitter.
The first metric is latency.
Now, latency is the measure of the time that it takes
for data to reach its destination across a network.
Usually we measure network latency as the round trip time
from a workstation to the distant end
and back to the workstation.
We report this time in milliseconds.
Now, for example, let's say you open up your command prompt
and you enter the command ping 8.8.8.8, and you hit enter.
You're going to get a response that tells you
how long it took for an ICMP packet to leave your computer,
reach the Google DNS server located at 8.8.8.8,
and return to your computer again.
In my case, this took an average time of 38.2 milliseconds
across four consecutive ping requests.
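Now, if you'd rather script that latency check instead of running ping by hand, here's a minimal Python sketch that wraps the system ping command; the 8.8.8.8 target and the count of four echoes mirror the example above, and the parsing assumes a Linux-style summary line, so adjust it for your own platform.

import re
import subprocess

def average_ping_ms(host: str = "8.8.8.8", count: int = 4) -> float:
    # Run the operating system's ping utility (Linux-style -c flag assumed; Windows uses -n).
    result = subprocess.run(["ping", "-c", str(count), host],
                            capture_output=True, text=True, check=True)
    # The summary line usually looks like: rtt min/avg/max/mdev = 10.1/38.2/60.3/5.0 ms
    match = re.search(r"= [\d.]+/([\d.]+)/", result.stdout)
    if not match:
        raise ValueError("Could not parse the average round-trip time")
    return float(match.group(1))

print(f"Average RTT: {average_ping_ms():.1f} ms")

Running this should give you a number comparable to what you'd see by running ping manually.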
Now it's important to measure the round trip delay
for network latency because a computer that uses
a TCP/IP network can send only a limited amount of data
to its destination at one time,
and then it sits and waits for an acknowledgement
that that data was received before it sends out
more data across the network.
So if you have high latency or a long round trip delay,
this can drastically slow down
your overall network performance for your end users.
Now, if you're seeing consistent delays
or even just spikes in the delay time in your network,
this could indicate a major performance issue
that's about to occur.
For regular web traffic these delays
aren't usually noticeable,
but if you're using streaming video applications,
things like voice over IP, or you're playing video games,
these delays are extremely noticeable
and they can cause a lot of problems for your end users.
Our second metric we need to monitor is known as bandwidth.
Now bandwidth is the maximum rate of data transfer
across a given network.
Now, technically bandwidth is actually
a theoretical concept that measures how much data
could be transferred from a source to a destination
under ideal conditions.
But in reality, when we're talking about our networks
and our connections, they're rarely operating
at the perfect or ideal conditions.
Therefore we often measure something known as throughput
instead of bandwidth to monitor our network performance.
Throughput is the actual measure of data
as it's being successfully transferred
from the source to the destination,
but you'll often hear people use the terms
bandwidth and throughput interchangeably,
not realizing there is a difference.
Technically, bandwidth is a theoretical limit
where throughput is the reality of what you're achieving.
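Before we look at an online speed test, here's a rough Python sketch of what a throughput measurement is actually doing: time how long a known download takes and divide the bits transferred by the seconds elapsed. The URL here is just a placeholder, so point it at any large test file you control.

import time
import urllib.request

def measure_throughput_mbps(url: str) -> float:
    # Time the download, then convert bytes per second into megabits per second.
    start = time.monotonic()
    with urllib.request.urlopen(url) as response:
        data = response.read()
    elapsed = time.monotonic() - start
    return (len(data) * 8) / (elapsed * 1_000_000)

# Hypothetical test file; substitute a large file you host yourself.
print(f"{measure_throughput_mbps('https://example.com/100MB.bin'):.1f} Mbps")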
So if you want to run a bandwidth speed test
or more accurately, a throughput test for your network,
you can go to something like speedtest.net and click on go,
and you'll have a series of downloads and uploads
that'll occur from your workstation
to their server and back.
Then it will report to you how fast or slow
your connection was in terms of throughput.
In this example, you can see my results indicate
I have a throughput with a top download speed
of 240 megabits per second,
and a top upload speed of around 241 megabits per second.
The problem with that is that my actual bandwidth,
my theoretical limit, should be 650 megabits per second
for downloads and 310 megabits per second for uploads.
So why is my throughput so much less?
Well, when I was doing this test,
I connected to my office network using my wifi adapter
and not directly connecting through a wired switch.
At the same time, there are other people
in the office using the connection,
and all of these factors lead to a less-than-ideal
environment, which makes my throughput much lower
than my expected bandwidth.
As I make different changes to my network,
I can retest the throughput to see if those changes
help or hurt my overall throughput.
For example, if I switched
from a wireless internet connection
to a wired internet connection,
I'd be able to see a dramatic increase
in overall throughput that I wouldn't see
over that wireless connection.
The third metric we need to monitor is known as jitter.
Jitter is the network condition that occurs
when there's a varying time delay in the sending of data packets
over a network connection.
Now jitter is really a big problem
for any real-time applications that you may be supporting
on your network.
If you're doing things like video conferences,
voice over IP, or virtual desktop infrastructure,
all of these are negatively affected by jitter.
Basically, jitter is simply a variation
in the delay of the packets.
And this can cause some really strange side effects,
especially for your voice and video calls.
If you've ever been in a video conference
and somebody starts speaking,
and then all of a sudden you hear their voice
start speeding up for about five or 10 seconds,
and then it returns back to normal speed,
that usually is because of jitter on their network.
If you have good quality of service management in place,
you shouldn't experience a lot of jitter,
but if you're not doing QoS properly,
then jitter will occur.
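To make that idea of variation concrete, here's a small Python sketch, assuming you've already collected a handful of round-trip times in milliseconds; it reports the average change between consecutive samples, which is a simplified version of the smoothed calculation real-time protocols use.

def estimate_jitter_ms(latencies_ms):
    # Jitter here is the average absolute difference between consecutive delay samples.
    if len(latencies_ms) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])]
    return sum(diffs) / len(diffs)

# Five ping samples in milliseconds: the one 52 ms spike is what drives the jitter up.
print(estimate_jitter_ms([38.1, 38.4, 37.9, 52.0, 38.2]))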
You see, when your network suffers from congestion,
the network devices like your routers and switches
are going to be unable to send out the same amount of traffic
as what they're receiving.
This causes their packet buffers to start to fill up,
and eventually they'll start to drop packets
if they have too much in the buffers.
This is known as packet loss.
Now, when this happens, your TCP packets
are going to get resent, and this causes
increased network load again.
Now on the other hand, if the buffer begins to fill up,
but then the network congestion eases up,
those buffers will be able to quickly send
all of their contents to the destination.
The destination will then try to process them all,
but usually it can't do that.
And this leads to delays in processing
that can result in jitter on the endpoint device as well.
So to prevent jitter, we want to ensure our network
is using quality of service properly.
We want to make sure we're categorizing
and prioritizing our voice and video traffic over
the other types of traffic.
Also, we need to verify our network connections
and our devices have enough capacity to support
the amount of data that we're trying to transfer.
As a network administrator,
it's your responsibility to always monitor
your network's performance and the three key metrics
you always should be keeping track of
are latency, bandwidth or throughput, and jitter.
Sensors.
In this lesson,
we're going to talk about sensors
that help us monitor the performance of our network devices,
those devices like routers, switches, and firewalls.
Now, these sensors can be used to monitor
the device's temperature, its CPU usage, and its memory,
and these things can be key indicators
of whether a device is operating properly
or is about to suffer a catastrophic failure.
Our first sensor measurement we need to talk about
is the temperature of the device.
Now, most network devices like your routers,
switches, and firewalls
have the ability to report on the temperature
within their chassis.
Now, depending on the model,
there may be only one or two temperature readings
or on some larger enterprise devices,
you may have a temperature reading
on each and every controller, processor, interface card,
and other components inside the system.
Now, the temperature sensors
can be used to measure the air temperature
at the intake
and the air temperature at the exhaust outlet, at a minimum.
Now, for each of these sensors,
you can set up minor and major temperature thresholds.
A minor temperature threshold is used to set off an alarm
when a rising temperature is detected,
but it hasn't reached dangerous levels yet.
When this occurs,
a system message is displayed,
an SNMP notification is sent,
and an environmental alarm can be sounded.
Now, when you have a major temperature threshold,
this is going to be used to set off an alarm
when the temperature reaches dangerous conditions.
At this level, we want to still display those system messages,
get that SNMP notification,
and have the environmental alarm sounded.
But in addition to that,
the device can actually start to load shed
by turning off different functions to reduce the temperature
being generated by the device's processor.
For example, let's say you have a router
with multiple processing cards in it.
That device may shut down one of those processing cards
to prevent the entire system from overheating.
That's what I mean by load shedding.
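If you want to picture how those minor and major thresholds work in software, here's a minimal Python sketch; the 45 and 60 degree Celsius values are placeholders I picked for illustration, since the real limits are defined by the device vendor.

def check_temperature(celsius, minor=45, major=60):
    # Placeholder thresholds; real devices ship with vendor-defined limits.
    if celsius >= major:
        return "major: display message, send SNMP notification, sound alarm, begin load shedding"
    if celsius >= minor:
        return "minor: display message, send SNMP notification, sound alarm"
    return "ok"

print(check_temperature(48))   # crosses the minor threshold
print(check_temperature(63))   # crosses the major threshold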
Now, when a device runs at excessive temperatures
for too long,
the performance will decrease on that device
and the lifespan will decline on that device as well.
Over time, that device can even suffer
a catastrophic failure from overheating.
Our second sensor measurement we need to talk about
is CPU usage or utilization on the device.
At their core,
routers, switches, and firewalls
are just specialized computers.
When these devices are running under normal conditions,
their CPU or central processing unit
should have minimal utilization
somewhere in the range of 5 to 40%.
But if the device begins to become extremely busy
or receives too many packets from its neighboring devices,
the CPU can become over-utilized
and the utilization percentage will increase.
Now, if the CPU utilization gets too high,
the device could become unable to process any more requests
and it'll start to drop packets
or the entire connection could fail.
Usually, when you see a high processor utilization rate,
this is an indication of a misconfigured network
or a network under attack.
If the network is misconfigured,
for example, let's say you have a switch
that's misconfigured,
you can end up having a broadcast storm that occurs,
and that's going to create
an excessive amount of broadcast traffic
that'll cause the switch's CPU to become over-utilized
as it tries to process all those requests.
Similarly, if you have a lot of complex
and intricate ACLs on your router,
and then people start sending a lot of inbound traffic,
that router has to go through all of those ACLs
for each packet, and that can make it become unresponsive
due to high CPU usage.
As an administrator,
you need to monitor the CPU utilization
in your network devices
to determine if they're operating properly,
if they're misconfigured,
or if they're under attack.
The third sensor measurement we use
is memory utilization for the device.
Similar to high CPU utilization,
high memory utilization can be indicative
of a larger problem in your network.
If your devices begin to use too much memory,
this can lead to system hangs, processor crashes,
and other undesirable behavior.
To help protect against this,
you should have minor, severe,
and critical memory threshold warnings
set up in your devices
and reporting back to your centralized monitoring dashboard
using SNMP.
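As a sketch of how that SNMP reporting might be collected, here's a short Python example that shells out to the snmpget command from the Net-SNMP tools; the device address, community string, and OIDs are placeholders, so substitute the CPU and memory objects from your vendor's MIB.

import subprocess

def snmp_get(host, community, oid):
    # Requires the Net-SNMP command-line tools; -Oqv prints just the value.
    result = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Placeholder OIDs - look these up in your vendor's MIB documentation.
CPU_OID = "1.3.6.1.4.1.99999.1.1.0"
MEM_OID = "1.3.6.1.4.1.99999.1.2.0"

cpu = int(snmp_get("192.0.2.1", "public", CPU_OID))
mem = int(snmp_get("192.0.2.1", "public", MEM_OID))
if cpu > 80 or mem > 80:
    print(f"Investigate this device: CPU {cpu}%, memory {mem}%")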
As a baseline,
your network devices should operate
at around 40% memory utilization
under normal working conditions.
During busier times,
you may see this rise up to 60 to 70%,
and during peak times it may be up to 80%,
but if you're constantly seeing memory utilization
above 80%,
you may need to install a larger
or more powerful device for your network,
or you could be under an attack
that's been causing excessive load
for an extended amount of time.
As you begin to operate your networks in the real world,
you're going to begin to see what normal looks like
for your particular network.
As you see temperatures rising
or CPU and memory utilization increasing,
this can trigger alarms indicating a network misconfiguration
or a network performance issue that's happening right now.
Then you need to investigate the root cause of that
and solve those issues
by bringing those metrics back to a normal level
within your baseline.
NetFlow data.
In this lesson, we're going to talk about NetFlow data
and how it's used to conduct traffic flow analysis
within our networks.
In order to best monitor traffic in our network,
we can either use full packet capture or NetFlow data.
Now, as you might've guessed,
packet captures can take up a lot of storage space
and they can grow quickly in size.
For example, if I'm conducting a full packet capture
on my home network each day,
I would need several gigabytes of storage
just for my small family,
because every single packet that goes in or out of my house
would be captured and logged.
Every video game my son is playing online,
every YouTube video he watches,
every Netflix show my wife is binging.
All of that will be captured bit-by-bit
inside of that full packet capture.
Now, a full packet capture, or FPC,
is going to capture the entire packet.
This includes the header and the payload
for all the traffic that's entering or leaving your network.
As I said, this would be a ton of information
and quickly eat up all of our storage.
Now, because full packet capture takes up so much space,
we often don't collect it in a lot of organizations.
Most businesses and organizations instead will use NetFlow.
Now, NetFlow data, and other similar protocols like that,
are used to conduct something known as flow analysis.
Flow analysis will rely on a flow collector
as a means of recording metadata
and statistics about network traffic
instead of recording each and every frame
or every single packet
that's going in or out of our network.
This allows us to use flow analysis tools
that provide network traffic statistics
sampled by the collector.
Now, by doing this, we can capture information
about the traffic flow instead of the data contained
within that data flow.
And this saves us a lot of storage space.
Now with NetFlow and flow analysis,
we're not going to have the contents of
what's going over the network like we would
with a full packet capture,
but we can still gather a lot of metadata
and information about the network traffic
that's helpful to us in our monitoring.
This information is stored inside a database
and it can be queried later by different tools
to produce different reports and graphs.
Now, the great thing about flow analysis is
it's going to allow us to highlight trends and patterns
in the traffic being generated by our network.
And this becomes really useful
in our network performance monitoring.
Flow analysis will allow us to get alerts
based on different anomalies we might see
and different patterns or triggers
that are outside of our expected baselines.
These tools also have a visualization component
that allows us to quickly create
a map of different network connections
and the associated flow patterns over those connections.
By identifying different traffic patterns
that might reveal bad behavior, malware in transit,
tunneling, or other bad things out there,
we're going to be able to quickly respond
to these potential problems or incidents.
Now, there are a few different tools we can use
when dealing with traffic flow analysis.
This includes things like NetFlow, Zeek,
and the Multi Router Traffic Grapher.
Let's take a look at each of these for a moment.
First, we have NetFlow.
NetFlow is a Cisco-developed means of reporting
network flow information to a structured database.
NetFlow is actually one of the first data flow analyzers
that was created out there, and eventually,
it became the basis for the standard
that everyone started to use under the term IPFIX,
or IP Flow Information Export.
Now, NetFlow allows us to define a particular traffic flow
based on different packets that
share the same characteristics.
For example, if we want to identify packets
with the same source and destination IP,
this could signify there's a session between those two hosts
and it should be considered one data flow
that we can collect information on.
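To illustrate the idea of grouping packets into flows, here's a small Python sketch; the packet list is made-up sample data, and each flow key is just the shared characteristics we talked about, so only counters get stored, never the payload.

from collections import defaultdict

# Made-up sample packets: (src_ip, dst_ip, src_port, dst_port, protocol, bytes)
packets = [
    ("10.0.0.5", "203.0.113.10", 51000, 443, "TCP", 1200),
    ("10.0.0.5", "203.0.113.10", 51000, 443, "TCP", 800),
    ("10.0.0.7", "8.8.8.8", 55301, 53, "UDP", 76),
]

flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
for src, dst, sport, dport, proto, size in packets:
    key = (src, dst, sport, dport, proto)   # the flow key: shared characteristics
    flows[key]["packets"] += 1
    flows[key]["bytes"] += size

for key, stats in flows.items():
    print(key, stats)   # metadata only - no packet contents are kept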
Now, when you look at NetFlow data,
you can capture information about the packets
that are going over these devices,
like the network protocol and interface that's being used,
the version and type of IP being used,
the source and destination IP address,
the source and destination port, or the IP type of service.
All of this information can be gathered using NetFlow
and then analyzed and displayed visually
using our different tools.
For example, here you can see that I'm using SolarWinds
as a tool to show the NetFlow data of a network.
But you could also review this data
in a text-based environment using the NetFlow exports themselves.
In this graphical environment though,
it becomes really easy to see that
there are 15 different traffic flows.
And if I expand the 15th data flow,
we can see the source and destination IP,
the source port, the destination port,
some basic information about that data flow,
but we're not seeing the content of any of those packets
that were part of this data flow.
For us to be able to do that,
we would have to have a full packet capture,
but here we only captured the metadata
or the information about those traffic flows.
Now, if you want to be able to have the best of both worlds,
you can use something like Zeek.
Now, Zeek is a hybrid tool
that passively monitors your network like a sniffer,
but it's only going to log full packet captures
based on data of potential interest.
Essentially, Zeek is going to sample the data
going across the network, just like NetFlow does,
but when Zeek finds something that it deems interesting,
based on the parameters and rules you've configured,
it's going to log the entire packet for that traffic
and then send it over to our cybersecurity analysts
for further investigation.
This method helps us reduce our storage
and processing requirements,
and it gives us the ability to have all this data
in a single database.
Now, one of the great things about Zeek is that
it performs normalization of this data as well,
and then stores it as either a tab-delimited
or JavaScript Object Notation, or JSON, formatted text file.
This allows you to use it with
lots of other different cybersecurity tools
and different network monitoring tools as well.
For example, now that I have this normalized data,
I can import that data into another tool for visualization,
searching and analysis.
Here, I've imported my Zeek logs into Splunk,
and then I can have my cybersecurity analyst
search for specific information during a potential incident.
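If you'd rather work with those Zeek logs directly instead of a tool like Splunk, here's a rough Python sketch that reads a JSON-formatted conn.log; it assumes Zeek was configured for JSON output and that the file sits in the current directory, and the 10-megabyte threshold is just an arbitrary example.

import json

# Assumes Zeek's JSON output: one JSON object per line in conn.log.
with open("conn.log") as log:
    for line in log:
        record = json.loads(line)
        src = record.get("id.orig_h")        # originating host in Zeek's schema
        dst = record.get("id.resp_h")        # responding host
        sent = record.get("orig_bytes") or 0
        if sent > 10_000_000:
            print(f"Large upload: {src} -> {dst} ({sent} bytes)")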
Now, the third tool we have to talk about is
MRTG, or the Multi Router Traffic Grapher.
The Multi Router Traffic Grapher is a tool
that's used to create graphs to show network traffic flows
going through our network interfaces
on different routers and switches
and it does this by polling these appliances
using SNMP, the Simple Network Management Protocol.
So, what is useful about a visualization like this?
Well, you're going to be able to start seeing patterns emerge
that may be outside of your baseline.
For example, here in the top graph,
you could see a big spike in traffic
between 2:00 AM and 4:00 AM.
Is that normal?
Well, maybe, and maybe not,
but it's something we should further investigate and analyze
because we're seeing this big spike occur
between 2:00 AM and 4:00 AM.
And that might be something normal,
like doing offsite backups,
or it could be something malicious.
If it was the case of something that was normal,
like an offsite backup,
you're going to see this big spike in traffic
because we're sending a backup copy of
all of our data offsite to a cloud provider facility.
That might be a reasonable explanation.
And in that case, I wouldn't need to worry
because I would see that every single night
and I'd be used to seeing it.
Now, on the other hand,
maybe that server has been infected with malware
and every day between 2:00 and 4:00 AM,
it's going to send all of the data back to the bad actors
while all my administrators are at home sleeping.
This is considered data exfiltration
as part of an attack campaign,
and that's something you want to be on the lookout for.
Now, just looking at this graphic,
I don't know which of these two cases it is.
Is this something normal, like a backup,
or is this something malicious?
But if you know your organization
and you know your baselines,
now you can look at this graph and identify
what should be investigated based on seeing
that spike between 2:00 AM and 4:00 AM,
and then figuring out where
that additional traffic flow is going, and why.
If we suspected something was malicious here,
like somebody exfiltrating our data,
then we might set up a network sniffer
in front of our file server and see
what traffic is leaving the network and where it's going.
Then, based on that, we may have an incident response
on our hands and need to do our cleanup.
Now at this point, we just don't know
if this is malicious or not,
but we do know it's something different
and something that is outside of the normal baseline
as indicated by that big spike.
So, it's important for us to investigate it
for the health of our network.
Interface Statistics.
In this lesson, we're going to talk about interface statistics
and how they're used to monitor our network's performance.
Now, if you're new to networking,
you may be wondering what exactly is an interface?
Well, an interface is just one of the physical
or logical ports on a router, switch, or firewall.
In enterprise-level devices,
each interface can generate its own statistics
and maintain its own status.
In this lesson, we're going to explore the link state,
the speed and duplex status,
the send and receive traffic statistics,
the cyclic redundancy check statistics,
and the protocol packet and byte counts
that are collected for our network devices.
To help guide our discussions,
I'm going to be using the output from a Cisco router
for an interface called f0/0,
which simply means it's a fast Ethernet or Cat5 connection
going from this physical interface on slot zero
and port zero of a given router.
Now, first you can see, we have the Link State.
A link state is used to communicate whether or not
a given interface has a cable connected to it
and a valid protocol to use for communication.
For example, if I connected a fast Ethernet
unshielded twisted pair cable to the interface
on 0/0 of this router,
and then plug in the other end into another router
to create a connection,
I should see fast Ethernet 0/0 is up, line protocol is up.
This indicates that the interface is physically up
and the protocol is operational.
If we're using Ethernet, that means that frames
are able to enter and leave this interface.
Next, we have some information about the interface itself,
such as the MAC address and the IP address assigned to it.
After that, we see there's an MTU size set to 1500 bytes,
which is the default used for Ethernet.
And then we have the bandwidth,
which is set at 100,000 kilobits per second,
or 100 megabits per second.
This makes sense because I'm using fast Ethernet
or Cat5 cabling for our connection.
This speed is also used by the router
when it's trying to calculate the metrics
for the routing protocols like OSPF and EIGRP,
since they rely on the connection speed
in making their determinations and their link costs.
Next, we have the reliability,
which is being shown here as 255 out of 255.
This means if the connection begins to have more input
or output errors, you're going to see the reliability value drop.
Basically, you read this as reliability equals
the number of packets divided by the total number of frames.
So, 255 over 255 is the best reliability,
and it indicates there were no packets or frames
dropped so far.
txload is our next statistic.
And this is going to indicate
how busy the router is transmitting frames
over this connection.
At one out of 255, this router is not very busy at all.
rxload is like txload, but instead of transmitting,
we're going to be measuring
how busy the router is in terms of receiving frames.
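As a rough worked example of how to read those load counters, assuming txload scales the 5-minute output rate against the configured bandwidth, a lightly loaded link looks like this; the 392 kilobit per second rate is a made-up figure.

bandwidth_bps = 100_000_000      # 100 megabits per second of configured bandwidth
output_rate_bps = 392_000        # hypothetical 5-minute output rate
txload = round(output_rate_bps / bandwidth_bps * 255)
print(f"txload = {txload}/255")  # rounds to about 1/255, a nearly idle link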
Next, we have the encapsulation type being used.
In this case, we're using ARPA,
or the Advanced Research Projects Agency setting,
which indicates that we're using standard Ethernet.
This is because ARPA developed standard Ethernet,
and we're using Ethernet frames
for our encapsulation.
Now, if you're using something different,
like a serial link or a frame relay,
it would say something different here instead of ARPA.
But if you're using Ethernet,
you should expect to see ARPA right here.
Next, we have the keepalive,
and this is set to 10 seconds, which is the default.
This is how often the router
is going to send a keepalive packet
to other devices that it's connected to,
to check if they're still up and online.
Next, we have a line that says full-duplex,
100 megabits per second, 100BaseTX/FX.
Now, this indicates whether this interface is using half
or full-duplex, and in this case we're using full-duplex.
It also tells you what the bandwidth is,
and the interface type you're using.
In this case, as I said, we're using full-duplex
and we're using 100 megabits per second as our bandwidth,
and we have a fast Ethernet interface type,
and it's either using copper or fiber cabling,
because it says TX/FX.
Now, next we're going to have our ARP type.
And in this case, again, we're going to use ARPA.
The timeout here tells us how long the ARP cache
is going to remember a binding, and when it will be cleared.
In this case, we're using the default time of four hours.
The next lines are the last input, last output,
and last clearing of the counters.
In this case, the router was just rebooted,
so they're all set to zero
because they were all just cleared.
Next, we have our input queue,
which tells us how many packets are in the input queue,
and their maximum size.
In this case, the maximum size is 75 packets for our queue.
Drops is the number of packets
that have been dropped so far.
Flushes is used to count the Selective Packet Discards
that have occurred; basically, when the router
or switch sees a sign that it needs to start shedding some load,
it starts dropping packets selectively.
SPD is a mechanism that's going to drop
your lowest-priority packets when the CPU becomes too busy,
so that you can save capacity
for higher-priority packets as a form of quality of service.
Now, the total output drops here is at zero.
This means that we've had no drops
because we never had a full output queue.
Since we have a hundred megabit per second connection,
as long as we're communicating
with another hundred-megabit-per-second connection,
we should see this stay at zero dropped packets.
If we started using a 20 megabit per second connection
from our ISP, for instance,
then we would likely experience
network congestion, because we're sending at 100,
but they can only take it at 20.
That would cause a problem for us.
And at that point, some of our packets might get dropped.
Next, we have our queuing strategy
for our quality of service.
In this case, we're using First In, First Out,
which is known as FIFO.
This is the default for this type of router.
Next, we have output queue size and the maximum.
Currently, our queue is empty and it's showing zero packets.
Now, the maximum queue size here is set at 40.
So, if more than 40 packets build up,
the queue is not going to be able to hold them
and the rest of those will get dropped.
Next, we have our five-minute input and output rates.
These are the average rates
at which packets are being received and transmitted.
Packets input is our next line.
And here we can see 923 packets input were received,
for a total of 158,866 bytes of data being processed.
The next line contains the received broadcasts.
And in this case, we received 860 broadcast frames.
We also have runts,
giants and throttles counted here as well.
Now, a runt is an Ethernet frame
that is less than 64 bytes in size.
It's really small, that's why it's a runt.
A giant is any Ethernet frame
that exceeds the 802.3 frame size of 1,518 bytes.
It's really large, so it's a giant.
Throttles are going to occur
when the interface fails to buffer the incoming packets.
If this is a high number, this is an indicator
that you may be having quality of service issues
for your end users.
Next, we have input errors,
CRC, frame, overrun, and ignored.
The input error counter will go up whenever the interface
is receiving a frame with any kind of error in it.
This can be something like a runt, a giant,
no buffer available, CRC errors, or other things like that.
CRC is the number of packets that were received,
but failed the cyclic redundancy checksum,
or CRC check upon receiving them.
If the checksum generated by the sender
doesn't match the one calculated by this interface
when it receives that frame,
a CRC error is counted and the packet gets rejected.
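To see the checksum comparison in action, here's a tiny Python sketch using the CRC-32 function from the standard library; real Ethernet hardware computes its own frame check sequence, so treat this purely as an illustration of the sender and receiver comparing checksums.

import zlib

frame_payload = b"example frame contents"
sender_crc = zlib.crc32(frame_payload)

# Simulate a single bit flipped in transit.
corrupted = bytes([frame_payload[0] ^ 0x01]) + frame_payload[1:]
receiver_crc = zlib.crc32(corrupted)

if receiver_crc != sender_crc:
    print("CRC mismatch - count an input error and discard the frame")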
Now, frame is used to count the number of packets
where a CRC error occurred
and a non-integer number of octets was received.
Overrun is used to count
how often the interface was unable to receive traffic
due to an insufficient hardware buffer.
Ignored is going to be used to count the number of packets
that the interface ignored since the hardware interface
was low on the internal buffers.
If you're experiencing a lot of noise on the connection
or a broadcast storm,
this ignore count will start to rise drastically.
Next, we have the watchdog counter, which is used to count
how many times the watchdog timer has expired.
This happens whenever a packet over 2048 bytes is received.
The next line contains the input packets
with dribble condition detected,
which means that a slightly longer than default frame
was received by the interface.
For example, we talked about the fact that the MTU size
was 1500 bytes by default,
but a frame wasn't considered a giant
until it reached 1,518 bytes.
So, if I got a frame that was 1,510 bytes in size,
it's technically above the MTU size,
but it's not yet a giant.
So it would still be processed,
but it would be added here on the dribble condition counter,
so I can know that I'm starting to get packets
above 1500 bytes.
Next, we have the packet output counter,
and this is the number of packets that have been sent
and the size of those transmissions in bytes.
The underrun is the number of times a sender
has operated faster than the router can handle,
and this causes buffer problems or dropped packets.
Next, we have the output errors,
and this is just like our input errors, the only difference
is we're now counting the number of collisions
and the interface resets that are occurring as a result.
A collision is counted
anytime a packet needs to be retransmitted
because an Ethernet collision occurred.
Since we're using full-duplex, this number should be zero.
If it's not zero, something's wrong.
Next, we have the interface reset,
and this counts the number of times an interface
had to be completely reset since the last reboot.
Next, we have unknown protocol drops.
Anytime a packet is dropped
because our device can't determine what protocol it was,
it's going to be counted under the unknown protocol drops.
For example,
if you're not supposed to receive older types of protocols
like IPX traffic and AppleTalk on your router,
but somebody sends you a message that's formatted that way,
your router is going to drop it,
and it's not going to know what it was,
because it's not a properly formatted IP message
or Ethernet frame.
So that counter is going to go up.
Next, we have babbles, late collision, and deferred.
Now, a babble is used to count any transmitted frame
that is larger than 1,518 bytes.
This is similar to our giants,
but we're going to use this when we're transmitting,
instead of receiving.
A babble is for transmitting, a giant is for receiving.
Late collisions are going to be used
to count the number of collisions that occur
after the interface started transmitting its frame.
And deferred is used to count the number of frames
that were transmitted successfully
after waiting because the media was busy.
So, if your devices are using CSMA/CD
or collision detection, it's going to detect the media as busy,
it's going to wait, and then it's going to transmit.
When this happens,
this number is going to go up because it had to wait.
Again, we should see zero for late collisions
and deferred here
because we're using a full-duplex connection,
but if we're using a half-duplex connection,
there will be some numbers there.
Next, we have the lost carrier and the no carrier.
This is the number of times that the carrier was lost
or not present during the transmission.
The carrier we're talking about here
is the signal on the connection.
Finally, we have the output buffer failures and swapped out.
The Output Buffer Failure is going to be used
to count the number of times the packet was not output
from the output hold queue
because of a shortage of shared memory.
An Output Buffer Swap Out
is going to be the number of packets stored in the main memory
when the queue was full.
If this number is very high,
that means that you're likely experiencing
a busy time in your networks.
Now, for the exam, you don't need to know all these things
and memorize all their definitions,
but you should be aware of some key statistics here
on the interface.
Things like the link state, the speed and duplex status,
the send and receive traffic statistics,
the cyclic redundancy check statistics,
the protocol packet and byte counts,
the CRC errors, the giants, the runts,
and the encapsulation errors.
On the exam, you may get a question
that involves troubleshooting a device,
and you're going to see
an interface statistics screen like this,
and then you're going to have to recommend a solution
to that problem.
For example, if the question asks,
why the device is operating slowly,
and you see the connection is set to half-duplex
instead of full-duplex,
that would be a reason for the slowdown,
because you effectively cut your bandwidth in half,
since the device has to listen before transmitting.
Or, if you see a large amount of collisions,
but you're running full-duplex,
that would indicate there's two devices
connected to this same switch port,
and that is causing you issues.
Or, maybe you see there's a lot of CRC errors,
this could indicate a dirty fiber connector,
or an unshielded twisted pair cable
that's subject to too much electromagnetic interference.
This could be caused by lots of different things,
such as your cable being improperly run over
a fluorescent light or near a power line,
or something like that.
My point is,
it's important to be able to read the interface statistics
so you can then troubleshoot
your network connectivity issues
in your routers and switches.
Environmental sensors.
In this lesson,
we're going to talk about environmental sensors
that help us monitor our physical environments
where our network devices are operating,
such as our data centers, our telecommunication closets,
and our main distribution frames.
These sensors are going to be used to monitor
our environmental conditions.
Things like our temperature and humidity,
as well as the electrical power status and whether or not
we may be experiencing flooding.
After all, all of these routers and switches
are sitting in a telecommunication closet somewhere,
and nobody's sitting in there with them
looking at them every day.
So, how am I going to keep track of all of them?
How do I know the power is still on?
How do I know there's enough cooling there?
How do I know they haven't gotten covered in water
from a leaking pipe?
Well, this is where environmental monitoring
becomes extremely important.
Environmental monitoring relies on different types
of sensors that can be configured
to report back periodically,
or can be polled from a central monitoring station
repeatedly, to maintain the status of those areas.
Our network devices need to operate in a cool and dry place.
To maintain the proper temperature and humidity,
we can have sensors that communicate with our HVAC system.
If the temperature begins to get too hot,
the HVAC system can increase the airflow
and cool the telecommunication closets more.
If the area gets too cold, it can reduce the airflow
and bring the temperature back to the right range.
Most network devices want to be operating
between 50 and 90 degrees Fahrenheit.
So, using an automated HVAC system with sensors
can help ensure that occurs.
Additionally, we need to ensure this area
maintains the right humidity levels.
If there's too much humidity,
this can cause condensation in the equipment,
and that leads to water on our circuit boards,
which will destroy our network devices.
Conversely, if we have humidity that's too low,
static electricity can build up
and it can short out our equipment.
Therefore, we always want to make sure our humidity range
is between 40 and 60%.
Again, by having proper humidity sensors
connected to our HVAC systems,
we can increase or decrease the humidity
to keep it in that perfect 40 to 60% range.
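Here's a minimal Python sketch of the kind of check a monitoring station could run against those sensor readings, using the 50 to 90 degree Fahrenheit and 40 to 60 percent ranges from this lesson; the sample readings at the bottom are made up.

def environment_alerts(temp_f, humidity_pct):
    alerts = []
    # Ranges from the lesson: 50-90 degrees Fahrenheit, 40-60% relative humidity.
    if not 50 <= temp_f <= 90:
        alerts.append(f"temperature out of range: {temp_f} F")
    if not 40 <= humidity_pct <= 60:
        alerts.append(f"humidity out of range: {humidity_pct}%")
    return alerts

print(environment_alerts(95, 35))   # both readings would trigger alarms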
Next, we need to ensure all our devices have power.
We can install sensors on our power lines,
or use our power distribution centers
to track the power levels
going into our pieces of networking equipment.
This allows us to know if there's a surge, a spike,
a brownout, a blackout, or simply dirty power.
All of this can be remotely monitored
by our central monitoring systems
by using internet of things devices like power sensors.
Finally, we need to ensure devices
are not subject to flooding.
Again, we can place sensors in our telecommunication closets
and other non-human occupied spaces,
to detect if there's any water on the floor
due to a burst pipe or other sources of flooding.
These sensors can detect the change from dry to wet,
and when they become wet, they sound an alarm
or send a signal back to our central monitoring panel.
Remember, when it comes to our network equipment
and data centers, our devices need to be cool,
at the right humidity, and receive clean power as input,
and they need to stay dry from flooding
in order to continue their operations
day after day without any interruption.
Network management.
In this section of the course,
we're going to talk about network management.
Now network management is the process of administering
and managing computer networks.
When you perform network management,
you're involved with thought analysis,
performance management,
provisioning of networks and clients,
as well as maintaining the quality of service.
To properly manage your network,
you need to ensure that you have
the right network documentation in place,
which includes the physical network diagrams,
logical network diagrams, wiring diagrams,
site survey reports, audit and assessment reports,
and baseline configurations,
as these are some of the key
and foundational pieces of documentation necessary
to fully understand where your network resides
and how it operates.
Also, you need to look at various performance metrics
and sensor data from across your network and its devices.
For example, if you don't know that one of your routers
is experiencing high processor utilization,
you may not be able to predict an upcoming network failure
or prevent it from occurring.
As you manage your network, you're also going to be working
to monitor the data flowing across this network,
which means you need to be able
to use things like NetFlow data to better understand
where this data is flowing.
On a network device level,
you can also manage individual interfaces
by considering their status, statistics
and the errors they generate.
Finally, it becomes really important
to understand the environmental factors
that can affect your network.
After all, our network devices are not operating
in the middle of nowhere,
but instead they're located inside of our data centers
and telecommunication closets.
To keep them operating at peak performance,
our devices need to ensure they're experiencing
the right environmental conditions,
and that means we have to make sure
these devices have the proper power, space and cooling
within their operating environments.
So in this section, we're going to focus on domain three,
network operations, and we're going to talk
about objectives 3.1 and 3.2.
Objective 3.1 states that given a scenario,
you must use the appropriate statistics and sensors
to ensure network availability.
And objective 3.2 states that you must explain
the purpose of organizational documents and policies.
So let's get started with our coverage of network management
in this section of the course.
Common documentation.
In this lesson, we're going to cover the common documentation
that you're going to use in your enterprise networks.
This includes physical network diagrams,
logical network diagrams, wiring diagrams,
site survey reports, audit and assessment reports,
and baseline configurations.
First, we have physical network diagrams.
A physical network diagram is used to show
the actual physical arrangement of the components
that make up your network
including the cabling and the hardware.
Typically, these diagrams give you a bird's eye view
of the network in its physical space,
and it looks a lot like a floor plan,
but you may also see physical network diagrams
that show how things are going to be cabled
within an individual rack of a data center as well.
For example, if I have a physical network diagram
showing the floor plan of a small office,
I can notate on that floor plan
exactly where each IP-based CCTV is going to be installed.
Now, in this example,
you can see I have nine IP-based cameras
and how the cable is going to run back to a central point,
such as this network video recorder
that contains nine power over Ethernet ports
to run this camera system.
I could just as easily have another flow plan like this
showing where all my network jacks are going to be located
in an office and how the cables are being run back
to a patch panel, and from that patch panel
back to an edge switch that connects to all these devices.
Inside my data center, though, I'm usually more concerned
with how things are physically located
within one single rack.
And so I can create a rack diagram.
For example, here's a diagram showing a rack,
containing two storage area network controllers,
two firewalls, two switches,
three virtual machine host servers running ESXi,
a backup server, a modular smart array,
and a tape backup library.
From this diagram, you can clearly see where in this rack
each of those units is going to be located
and which network cables will connect to which ports
on which devices.
Another version of this type of diagram
may include a front view,
which also shows the location inside the cabinet.
And it can have the different device names
and the IP addresses for each of those devices,
but it won't show the actual network cables
and where they connect
because you're looking at the front of those devices.
Now, another type of physical network diagram we have
is used to provide documentation
for how our main distribution frame or MDF
and our intermediate distribution frame or IDF
are connected and cabled.
In this example, you could see a very generic MDF
and IDF layout for a typical three-story office building.
Here, we have the MDF on the bottom right corner
of the first floor,
and then a smaller IDF on the right corner
of each of the remaining floors.
There's an interconnection between each IDF and the MDF,
and each floor has a single network cable
running to a jack into an office.
Now, this of course is a very oversimplified diagram
or an overview diagram,
but we can also have additionally detailed diagrams
depending on how much work we want.
That could show each rack inside the MDF or the IDF
and what they would look like,
how they're cabled, including their edge switches,
patch panels, and other networking equipment.
The second type of documentation we need to cover
is logical network diagrams.
Unlike the physical network diagrams
that show exactly which port or cable
is going to connect it to and how it's ran
on the physical floor plan or the rack layout,
we use a logical diagram.
We're going to use this to illustrate the flow of data
across a network and it's going to be used to show
how devices are communicating with each other.
These logical diagrams will include things like the subnets,
the network objects and devices,
the routing protocols and domains, voice gateways,
traffic flow, and network segments within a given network.
Traditionally, network diagrams were drawn by hand
and using symbols to represent the different network devices
like routers, switches, firewalls,
intrusion detection systems, and clients.
In this example, I'm using the standard Cisco notation
to demonstrate how the various switches and routers
are being connected to form this network.
On the logical diagram, we also include the IP addresses
and the interface identifiers such as G0/1
or gigabit Ethernet 0/1 or ATM1/0,
which is for an ATM interface for our routers and switches.
Notice the routers are being represented by a circle
with four arrows, two pointing inward,
and two pointing outward.
Switches are going to be represented by a square
with four arrows, all pointing outward.
Servers like a DHCP, DNS, or TFTP server
are represented by a large rectangle server icon.
And the computers are going to be shown
using a computer icon.
Another symbol you may see included
is an intrusion detection system
or intrusion prevention system,
which is going to be a rectangle
that contains a circle inside of it
with two arrows crossing over the circle.
A firewall is usually represented by a brick wall
and an access point is going to be represented
by a rectangle with a series of radio waves going out of it
from the left to the right.
Now, as you look at various network diagrams
on the internet, you may come across some more
modern network diagrams that remove the symbols
and instead use pictures of networking equipment
that's going to be used in the diagrams instead.
In this example, you can see the router
connected to the switches and those switches are connected
to the client PCs.
Next, we have a wiring diagram
and this is something we already looked at briefly
as part of our physical network diagrams.
Wiring diagrams can accompany both physical
and logical network diagrams,
as long as they clearly label
which cable is connected to which port.
The more in-depth wiring diagrams
are going to include a floor plan or a rack diagram,
so you can see exactly where the cables are being run
in the physical environment.
Next, we have site survey reports.
These are often conducted as part of a wireless survey
or an assessment.
Now, a wireless site survey, sometimes
called an RF or radio frequency site survey
or a wireless survey,
is the process of planning and designing a wireless network
to provide a wireless solution
that will deliver the required wireless coverage,
data rates, network capacity, roaming capability,
and quality of service or QoS.
In this example, you can see a floor plan that shows
the location of each wireless access point.
Then, radiating out from each access point,
you see bands of color, going from green to yellow
to orange to red.
And this indicates the strength of the wireless signal.
Now, when you see green, that's a strong signal.
When you see red, that's a weaker signal.
Wired site surveys are also conducted sometimes,
but in these cases
it's usually done as part of a preparation
for a major upgrade or installation.
With a wired site survey,
the installation team is going to come out
and look at your MDFs, your IDFs, and your data centers
to determine if you have the right power, space, and cooling
to support whatever new equipment
you're going to be installing as part of that upgrade.
For example, if I was going to install three new racks
of equipment in your data center,
I need to go out there and look at it
and make sure you have the physical space required
to hold those three racks.
In addition to that, I need to make sure
you have a powerful enough HVAC system
to remove all the extra heat that my new equipment
in these three racks is going to produce.
I also want to make sure your site has the right power
and backup generators and battery backups
so that you can handle all the extra power
that's going to be drawn by all this new equipment.
Next, we have audit and assessment reports.
Audit and assessment reports
are delivered to your organization
after a formal assessment has been conducted.
These reports will contain an executive summary,
an overview of the assessment scope and objectives,
the assumptions and limitations of the assessment,
the methods and tools used during the assessment,
a diagram showing the current environment and systems,
the security requirements,
a summary of findings and recommendations,
and the results of the audit.
Essentially, this report is going to contain
all the issues the audit team found with your organization,
as well as anything your organization
is already doing right,
and things they should continue to keep doing.
Finally, we have baseline configurations.
The documented baseline configuration
is the most stable version of a device's configuration.
These baseline configurations
are a documented set of specifications
for an information system, or a configuration item
within that system, that has been formally reviewed
and agreed on at a given point in time,
and which can now only be changed
through change control procedures.
So, if you want to change the baseline
due to an operational need, you need to follow
the proper configuration management procedures
to request those changes.
Those changes will then be properly tested and approved,
and they become part of the new baseline
for those devices moving forward.
As you can see, there is a bunch of documentation
that you're going to use in your enterprise networks,
including your physical network diagrams,
logical network diagrams, wiring diagrams,
site survey reports, audit and assessment reports,
and baseline configurations.
Performance metrics.
In this lesson, we're going to talk
all about performance metrics and how they're used
to ensure network availability.
Now, network performance metrics are a large part
of network monitoring.
Network performance monitoring is the end-to-end
network monitoring of your end user experience.
This differs from traditional monitoring though,
because traditional monitoring is focused on performance
between two points, like a switch and a router,
but with network performance monitoring,
we're going to look at the overall end user experience
by monitoring the performance
from the end user's workstation to the final destination
that they're trying to reach.
So to help us monitor network performance,
there's really going to be three key metrics
that we're going to use.
These are latency, bandwidth and jitter.
The first metric is latency.
Now, latency is the measure of the time that it takes
for data to reach its destination across a network.
Usually we measure network latency as the round trip time
from a workstation to the distant end
and back to the workstation.
We report this time in milliseconds.
Now, for example, let's say you open up your command prompt
and you enter the command ping 8.8.8.8, and you hit enter.
You're going to get a response that tells you
how long it took for an ICMP packet to leave your computer,
reach the Google DNS server located at 8.8.8.8,
and return to your computer again.
In my case, this took an average time of 38.2 milliseconds
when I ran four consecutive ping requests.
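If you want to see how a measurement like that could work
behind the scenes, here's a minimal Python sketch
that approximates round-trip latency by timing
a TCP connection to that same Google DNS server.
It's only an approximation of what ping reports,
since it times a TCP handshake rather than an ICMP echo,
and the target address and port are just example values.

import socket
import time

def measure_rtt(host="8.8.8.8", port=53, samples=4):
    """Approximate round-trip latency by timing TCP handshakes, in milliseconds."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=2):
            pass  # completing the handshake is our round trip
        times.append((time.perf_counter() - start) * 1000)
    return sum(times) / len(times)

print(f"Average round-trip time: {measure_rtt():.1f} ms")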
Now it's important to measure the round trip delay
for network latency because a computer that uses
a TCP/IP network can send only a limited amount of data
to its destination at one time,
and then it sits and waits for an acknowledgement
that that data was received before it sends out
more data across the network.
So if you have high latency or a long round trip delay,
this can drastically slow down
your overall network performance for your end users.
Now, if you're seeing consistent delays
or even just spikes in the delay time in your network,
this could indicate a major performance issue
that's about to occur.
For regular web traffic these delays
aren't usually noticeable,
but if you're using streaming video applications,
things like Voice over IP, or you're playing video games,
these delays are extremely noticeable
and they can cause a lot of problems for your end users.
The second metric we need to monitor is known as bandwidth.
Now bandwidth is the maximum rate of data transfer
across a given network.
Now, technically bandwidth is actually
a theoretical concept that measures how much data
could be transferred from a source to a destination
under ideal conditions.
But in reality, when we're talking about our networks
and our connections, they're rarely operating
at the perfect or ideal conditions.
Therefore we often measure something known as throughput
instead of bandwidth to monitor our network performance.
Throughput is the actual measure of data
as it's being successfully transferred
from the source to the destination,
but you'll often hear people use the terms
bandwidth and throughput interchangeably,
not realizing there is a difference.
Technically, bandwidth is a theoretical limit
where throughput is the reality of what you're achieving.
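To make that distinction concrete, here's a small Python sketch
that calculates throughput the same way a speed test does,
by dividing the bits actually transferred by the time it took.
The byte count and the duration here are just made-up sample values.

def throughput_mbps(bytes_transferred, seconds):
    """Throughput is the data actually delivered divided by the time taken."""
    return (bytes_transferred * 8) / (seconds * 1_000_000)

# Example: 300 million bytes transferred in 10 seconds
print(f"{throughput_mbps(300_000_000, 10):.1f} Mbps")  # prints 240.0 Mbps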
So if you want to do a bandwidth speed test
or more accurately, a throughput test for your network,
you can go to something like speedtest.net and click on go,
and you'll have a series of downloads and uploads
that'll occur from your workstation
to their server and back.
Then it will report to you how fast or slow
your connection was in terms of throughput.
In this example, you can see my results indicate
I have a throughput with a top download speed
of 240 megabits per second,
and a top upload speed of around 241 megabits per second.
The problem with that is that my actual bandwidth,
my theoretical limit, should be 650 megabits per second
for downloads and 310 megabits per second for uploads.
So why is my throughput so much less?
Well, when I was doing this test,
I connected to my office network using my wifi adapter
and not directly connecting through a wired switch.
At the same time, there's other people
in the office using the connection,
and all of these factors lead to a less than ideal
environment and this makes my throughput much lower
than my expected bandwidth.
As I make different changes to my network,
I can retest the throughput to see if those changes
help or hurt my overall throughput.
For example, if I switched
from a wireless internet connection
to a wired internet connection,
I'll be able to see a dramatic increase
in overall throughput that I wouldn't see
over that wireless connection.
The third metric we need to monitor is known as jitter.
Jitter is the network condition that occurs
when the time delay in sending data packets
over a network connection varies.
Now jitter is really a big problem
for any real-time applications that you may be supporting
on your network.
If you're doing things like video conferences
and Voice over IP, and virtual desktop infrastructure,
all of these are negatively affected by jitter.
Basically, jitter is simply a variation
in the delay of the packets.
And this can cause some really strange side effects,
especially for your voice and video calls.
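Here's a minimal Python sketch of one common way
to estimate jitter: take a series of latency samples
and average the variation between consecutive measurements.
The sample values below are made up for illustration.

def jitter_ms(latencies):
    """Estimate jitter as the average change between consecutive delay samples."""
    diffs = [abs(b - a) for a, b in zip(latencies, latencies[1:])]
    return sum(diffs) / len(diffs)

samples = [38.2, 39.0, 37.8, 52.4, 38.5]  # round-trip times in milliseconds
print(f"Estimated jitter: {jitter_ms(samples):.1f} ms")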
If you've ever been in a video conference
and somebody starts speaking,
and then all of a sudden you hear their voice
start speeding up for about five or 10 seconds,
and then it returns back to normal speed,
that usually is because of jitter on their network.
If you have good quality of service management in place,
you shouldn't experience a lot of jitter,
but if you're not doing QoS properly,
then jitter will occur.
You see, when your network suffers from congestion,
the network devices like your routers and switches
are going to be unable to send the equivalent amount of traffic
as what they're receiving.
This causes their packet buffers to start to fill up,
and eventually they'll start to drop packets
if they have too much in the buffers.
This is known as packet loss.
Now, when this happens, your TCP packets
are going to get resent, and this causes
increased network load again.
Now on the other hand, if the buffer begins to fill up,
but then the network congestion eases up
those buffers will be able to quickly send
all of their contents to the destination.
The destination will then try to process them all,
but usually it can't do that.
And this leads to delays in processing
that can result in jitter on the endpoint device as well.
So to prevent jitter, we want to ensure our network
is using quality of service properly.
We want to make sure we're categorizing
and prioritizing our voice and video traffic over
the other types of traffic.
Also, we need to verify our network connections
and our devices are large enough to support
the amount of data that we're trying to transfer.
As a network administrator,
it's your responsibility to always monitor
your network's performance and the three key metrics
you always should be keeping track of
are latency, bandwidth or throughput, and jitter.
Sensors.
In this lesson,
we're going to talk about sensors
that help us monitor the performance of our network devices,
those devices like routers, switches, and firewalls.
Now, these sensors can be used to monitor
the device's temperature, its CPU usage, and its memory,
and these things can be key indicators
of whether a device is operating properly
or is about to suffer a catastrophic failure.
Our first sensor measurement we need to talk about
is the temperature of the device.
Now, most network devices like your routers,
switches, and firewalls
have the ability to report on the temperature
within their chassis.
Now, depending on the model,
there may be only one or two temperature readings
or on some larger enterprise devices,
you may have a temperature reading
on each and every controller, processor, interface card,
and things like that inside the system.
Now, the temperature sensors
can be used to measure, at a minimum,
the air temperature at the intake
and the air temperature at the exhaust outlet.
Now, for each of these sensors,
you can set up minor and major temperature thresholds.
A minor temperature threshold is used to set off an alarm
when a rising temperature is detected,
but it hasn't reached dangerous levels yet.
When this occurs,
a system message is displayed,
an SNMP notification is sent,
and an environmental alarm can be sounded.
Now, when you have a major temperature threshold,
this is going to be used to set off an alarm
when the temperature reaches dangerous conditions.
At this level, we want to still display those system messages,
get that SNMP notification,
and have the environmental alarm sounded.
But in addition to that,
the device can actually start to load shed
by turning off different functions to reduce the temperature
being generated by the device's processor.
For example, let's say you have a router
with multiple processing cards in it.
That device may shut down one of those processing cards
to prevent the entire system from overheating.
That's what I mean by load shedding.
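To picture how those thresholds might be evaluated,
here's a minimal Python sketch that classifies
a chassis temperature reading and decides whether to alert
or start load shedding.
The threshold values here are invented examples,
since the real ones depend on the specific device.

# Example thresholds only; real values come from the device vendor
MINOR_THRESHOLD_C = 55
MAJOR_THRESHOLD_C = 70

def check_temperature(reading_c):
    """Return the action a device might take for a given chassis temperature."""
    if reading_c >= MAJOR_THRESHOLD_C:
        return "major alarm: send SNMP notification and begin load shedding"
    if reading_c >= MINOR_THRESHOLD_C:
        return "minor alarm: send SNMP notification, temperature rising"
    return "normal: no action needed"

print(check_temperature(62))  # minor alarm
print(check_temperature(75))  # major alarm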
Now, when a device runs at excessive temperatures
for too long,
the performance of that device will decrease
and its lifespan will decline as well.
Over time, that device can even suffer
a catastrophic failure from overheating.
Our second sensor measurement we need to talk about
is CPU usage or utilization on the device.
At their core,
routers, switches, and firewalls
are just specialized computers.
When these devices are running under normal conditions,
their CPU or central processing unit
should have minimal utilization
somewhere in the range of 5 to 40%.
But if the device begins to become extremely busy
or receives too many packets from its neighboring devices,
the CPU can become over-utilized
and the utilization percentage will increase.
Now, if the CPU utilization gets too high,
the device could become unable to process any more requests
and it'll start to drop packets
or the entire connection could fail.
Usually, when you see a high processor utilization rate,
this is an indication of a misconfigured network
or a network under attack.
If the network is misconfigured,
for example, let's say you have a switch
that's misconfigured,
you can end up having a broadcast storm occur,
and that's going to create
an excessive amount of broadcast traffic
that'll drive up the switch's CPU utilization
as it tries to process all those requests.
Similarly, if you have a lot of complex
and intricate ACLs on your router,
and people start sending a lot of inbound traffic,
that router has to go through all of those ACLs
for each packet, and that can make it become unresponsive
due to high CPU usage.
As an administrator,
you need to monitor the CPU utilization
in your network devices
to determine if they're operating properly,
if they're misconfigured,
or if they're under attack.
The third sensor measurement we use
is memory utilization for the device.
Similar to high CPU utilization,
high memory utilization can be indicative
of a larger problem in your network.
If your devices begin to use too much memory,
this can lead to system hangs, processor crashes,
and other undesirable behavior.
To help protect against this,
you should have minor, severe,
and critical memory threshold warnings
set up in your devices
and reporting back to your centralized monitoring dashboard
using SNMP.
As a baseline,
your network devices should operate
at around 40% memory utilization
under normal working conditions.
During busier times,
you may see this rise up to 60 to 70%,
and during peak times it may be up to 80%,
but if you're constantly seeing memory utilization
above 80%,
you may need to install a larger
or more powerful device for your network,
or you could be under an attack
that's been causing an excessive load
for an extended period of time.
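As a simple illustration of that kind of monitoring logic,
here's a Python sketch that buckets CPU or memory
utilization readings against the rough percentages
we just discussed.
The exact thresholds you'd use in practice
should come from your own baseline.

def classify_utilization(percent):
    """Roughly bucket a CPU or memory utilization percentage."""
    if percent > 80:
        return "critical: investigate, upgrade, or check for an attack"
    if percent > 60:
        return "elevated: expected during busy or peak periods"
    if percent >= 40:
        return "normal baseline"
    return "idle to normal"

for reading in (35, 55, 72, 91):
    print(f"{reading}% -> {classify_utilization(reading)}")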
As you begin to operate your networks in the real world,
you're going to begin to see what normal looks like
for your particular network.
As you see temperatures rising
or CPU and memory utilization increasing,
this can trigger alarms indicating a network misconfiguration
or a network performance issue that's happening right now.
Then you need to investigate the root cause of that
and solve those issues
by bringing those metrics back to a normal level
within your baseline.
NetFlow data.
In this lesson, we're going to talk about NetFlow data
and how it's used to conduct traffic flow analysis
within our networks.
In order to best monitor traffic in our network,
we can either use full packet capture or NetFlow data.
Now, as you might've guessed,
packet captures can take up a lot of storage space
and they can grow quickly in size.
For example, if I'm conducting a full packet capture
on my home network each day,
I would need several gigabytes of storage
just for my small family,
because every single packet that goes in or out of my house
would be captured and logged.
Every video game my son is playing online,
every YouTube video he watches,
every Netflix show my wife is binging.
All of that will be captured bit-by-bit
inside of that full packet capture.
Now, a full packet capture, or FPC,
is going to capture the entire packet.
This includes the header and the payload
for all the traffic that's entering or leaving your network.
As I said, this would be a ton of information
and quickly eat up all of our storage.
Now, because full packet capture takes up so much space,
we often don't collect it in a lot of organizations.
Most businesses and organizations instead will use NetFlow.
Now, NetFlow data, and other similar protocols like that,
are used to conduct something known as flow analysis.
Flow analysis will rely on a flow collector
as a means of recording metadata
and statistics about network traffic
instead of recording each and every frame
or every single packet
that's going in or out of our network.
This allows us to use flow analysis tools
that provide network traffic statistics
sampled by the collector.
Now, by doing this, we can capture information
about the traffic flow instead of the data contained
within that data flow.
And this saves us a lot of storage space.
Now with NetFlow and flow analysis,
we're not going to have the contents of
what's going over the network like we would
with a full packet capture,
but we can still gather a lot of metadata
and information about the network traffic
that's helpful to us in our monitoring.
This information is stored inside a database
and it can be queried later by different tools
to produce different reports and graphs.
Now, the great thing about flow analysis is
it's going to allow us to highlight trends and patterns
in the traffic being generated by our network.
And this becomes really useful
in our network performance monitoring.
Flow analysis will allow us to get alerts
based on different anomalies we might see
and different patterns or triggers
that are outside of our expected baselines.
These tools also have a visualization component
that allows us to quickly create
a map of different network connections
and the associated flow patterns over those connections.
By identifying different traffic patterns
that might reveal bad behavior, malware in transit,
tunneling, or other bad things out there,
we're going to be able to quickly respond
to these potential problems or incidents.
Now, there are a few different tools we can use
when dealing with traffic flow analysis.
This includes things like NetFlow, Zeek,
and the Multi Router Traffic Grapher.
Let's take a look at each of these for a moment.
First, we have NetFlow.
NetFlow is a Cisco-developed means of reporting
network flow information to a structured database.
NetFlow was actually one of the first data flow analyzers
ever created, and eventually,
it became the basis for the standard that
everyone started to use, known as IPFIX,
or IP Flow Information Export.
Now, NetFlow allows us to define a particular traffic flow
based on different packets that
share the same characteristics.
For example, if we want to identify packets
with the same source and destination IP,
this could signify there's a session between those two hosts
and it should be considered one data flow
that we can collect information on.
Now, when you look at NetFlow data,
you can capture information about the packets
that are going over these devices,
like the network protocol interface that's being used,
the version and type of IP being used,
the source and destination IP address,
the source and destination port, and the IP type of service.
All of this information can be gathered using NetFlow
and then analyzed and displayed visually
using our different tools.
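To give you a sense of how those fields become flow records,
here's a tiny Python sketch that rolls packets sharing
the same key fields up into a single flow entry with counters,
instead of storing every packet in full.
The packet tuples below are invented sample data.

from collections import defaultdict

# Key each flow on the fields a NetFlow-style collector commonly uses
flows = defaultdict(lambda: {"packets": 0, "bytes": 0})

packets = [
    # (source IP, destination IP, source port, destination port, protocol, bytes)
    ("10.0.0.5", "8.8.8.8", 51514, 53, "UDP", 76),
    ("10.0.0.5", "8.8.8.8", 51514, 53, "UDP", 82),
    ("10.0.0.7", "93.184.216.34", 49222, 443, "TCP", 1500),
]

for src, dst, sport, dport, proto, size in packets:
    key = (src, dst, sport, dport, proto)
    flows[key]["packets"] += 1
    flows[key]["bytes"] += size

for key, stats in flows.items():
    print(key, stats)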
For example, here you can see that I'm using SolarWinds
as a tool to show the NetFlow data of a network.
But you could also review this data
in a text-based environment using the NetFlow exports themselves.
In this graphical environment though,
it becomes really easy to see that
there are 15 different traffic flows.
And if I expand the 15th data flow,
we can see the source and destination IP,
the source port, the destination port,
some basic information about that data flow,
but we're not seeing the content of any of those packets
that were part of this data flow.
For us to be able to do that,
we would have to have a full packet capture,
but here we only captured the metadata
or the information about those traffic flows.
Now, if you want to be able to have the best of both worlds,
you can use something like Zeek.
Now, Zeek is a hybrid tool
that passively monitors your network like a sniffer,
but it's only going to log full packet captures
based on data of potential interest.
Essentially, Zeek is going to sample the data
going across the network, just like NetFlow does,
but when Zeek finds something that it deems interesting,
based on the parameters and rules you've configured,
it's going to log the entire packet for that traffic
and then send it over to our cybersecurity analyst
for further investigation.
This method helps us reduce our storage
and processing requirements,
and it gives us the ability to have all this data
in a single database.
Now, one of the great things about Zeek is that
it performs normalization of this data as well,
and then stores it as either a tab-delimited text file
or a JavaScript Object Notation, or JSON, formatted text file.
This allows you to use it with
lots of other different cybersecurity tools
and different network monitoring tools as well.
For example, now that I have this normalized data,
I can import that data into another tool for visualization,
searching and analysis.
Here, I've imported my Zeek logs into Splunk,
and then I can have my cybersecurity analyst
search for specific information during a potential incident.
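Because Zeek can write its logs as JSON-formatted text,
you can load them with almost any tool you like.
Here's a minimal Python sketch that reads a JSON connection log
line by line and prints the largest flows.
The file name and field names are based on what Zeek's conn.log
typically looks like in JSON mode,
so treat them as assumptions for your own environment.

import json

# Zeek's conn.log in JSON mode typically has one JSON object per line
with open("conn.log") as log_file:
    records = [json.loads(line) for line in log_file if line.strip()]

# Sort connections by bytes sent from the originator, largest first
records.sort(key=lambda r: r.get("orig_bytes") or 0, reverse=True)

for record in records[:5]:
    print(record.get("id.orig_h"), "->", record.get("id.resp_h"),
          record.get("orig_bytes"), "bytes")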
Now, the third tool we have to talk about is
MRTG, or the Multi Router Traffic Grapher.
The Multi Router Traffic Grapher is a tool
that's used to create graphs to show network traffic flows
going through our network interfaces
on different routers and switches
and it does this by polling these appliances
using SNMP, the Simple Network Management Protocol.
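To give you a feel for that kind of SNMP polling,
here's a short Python sketch that uses the third-party
pysnmp library to read an interface's inbound octet counter
from the standard IF-MIB.
The device address, community string, and interface index
are placeholders you'd replace for your own network.

from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

# IF-MIB::ifInOctets for interface index 1
oid = ObjectIdentity("1.3.6.1.2.1.2.2.1.10.1")

error_indication, error_status, _, var_binds = next(getCmd(
    SnmpEngine(),
    CommunityData("public"),                 # placeholder community string
    UdpTransportTarget(("192.0.2.1", 161)),  # placeholder device address
    ContextData(),
    ObjectType(oid),
))

if error_indication or error_status:
    print("SNMP poll failed:", error_indication or error_status)
else:
    for name, value in var_binds:
        print(name.prettyPrint(), "=", value.prettyPrint())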
So, what is useful about a visualization like this?
Well, you're going to be able to start seeing patterns emerge
that may be outside of your baseline.
For example, here in the top graph,
you could see a big spike in traffic
between 2:00 AM and 4:00 AM.
Is that normal?
Well, maybe, and maybe not,
but it's something we should further investigate and analyze
because we're seeing this big spike occur
between 2:00 AM and 4:00 AM.
And that might be something normal,
like doing offsite backups,
or it could be something malicious.
If it was the case of something that was normal,
like an offsite backup,
you're going to see this big spike in traffic
because we're sending a backup copy of
all of our data offsite to a cloud provider facility.
That might be a reasonable explanation.
And in that case, I wouldn't need to worry
because I would see that every single night
and I'd be used to seeing it.
Now, on the other hand,
maybe that server has been infected with malware
and every day between 2:00 AM and 4:00 AM,
it's going to send all of the data back to the bad actors
while all my administrators are at home sleeping.
This is considered data exfiltration
as part of an attack campaign,
and that's something you want to be on the lookout for.
Now, just looking at this graphic,
I don't know which of these two cases it is.
Is this something normal, like a backup,
or is this something malicious?
But if you know your organization
and you know your baselines,
now you can look at this graph and identify
what should be investigated based on seeing
that spike between 2:00 AM and 4:00 AM,
and then figuring out where
that additional traffic flow is going, and why.
If we suspected something was malicious here,
like somebody exfiltrating our data,
then we might set up a network sniffer
in front of our file server and see
what traffic is leaving the network and where it's going.
Then, based on that, we may have an incident response
on our hands and need to do our cleanup.
Now at this point, we just don't know
if this is malicious or not,
but we do know it's something different
and something that is outside of the normal baseline
as indicated by that big spike.
So, it's important for us to investigate it
for the health of our network.
Interface Statistics.
In this lesson, we're going to talk about interface statistics
and how they're used to monitor our network's performance.
Now, if you're new to networking,
you may be wondering what exactly is an interface?
Well, an interface is just one of the physical
or logical switch ports on a router, switch or firewall.
In enterprise-level devices,
each interface can generate its own statistics
and maintain its own status.
In this lesson, we're going to explore the link state,
the speed and duplex status,
the send and receive traffic statistics,
the cyclic redundancy check statistics,
and the protocol packet and byte counts
that are collected for our network devices.
To help guide our discussions,
I'm going to be using the output from a Cisco router
for an interface called f0/0,
which simply means it's a fast Ethernet or Cat5 connection
going from this physical interface on slot zero
and port zero of a given router.
Now, first you can see, we have the Link State.
A link state is used to communicate whether or not
a given interface has a cable connected to it
and a valid protocol to use for communication.
For example, if I connected a fast Ethernet
unshielded twisted pair cable to the interface
on 0/0 of this router,
and then plug in the other end into another router
to create a connection,
I should see fast Ethernet 0/0 is up, line protocol is up.
This indicates that the interface is physically up
and the protocol is operational.
If we're using Ethernet, that means that frames
are able to be entering and leaving this interface.
Next, we have some information about the interface itself,
such as the MAC address and the IP address assigned to it.
After that, we see there's an MTU size of 1500 bytes,
which is the default used in Ethernet.
And then we have the bandwidth
being set at 100,000 kilobits per second,
which is 100 megabits per second.
This makes sense because I'm using fast Ethernet
or Cat5 cabling for our connection.
This speed is also used by the router
when it's trying to calculate the metrics
for the routing protocols like OSPF and EIGRP,
since they rely on the connection speed
when making their determinations and calculating their link costs.
Next, we have the reliability,
which is being shown here as 255 out of 255.
This means if the connection begins to have more input
or output errors, you're going to see the reliability lower.
Basically, you read this as reliability equals
the number of packets divided by the total number of frames.
So, 255 over 255 is the best reliability,
and it indicates that no packets or frames
have been dropped so far.
txload is our next statistic.
And this is going to indicate
how busy the router is transmitting frames
over this connection.
At one out of 255, this router is not very busy at all.
rxload is like txload, but instead of transmitting,
we're going to be measuring
how busy the router is in terms of receiving frames.
Next, we have the encapsulation type being used.
In this case, we're using ARPA,
or the Advanced Research Projects Agency setting,
which indicates that we're using standard Ethernet.
This is because ARPA developed standard Ethernet,
and we're using Ethernet frames
for our encapsulation.
Now, if you're using something different,
like a serial link or a frame relay,
it would say something different here instead of ARPA.
But if you're using Ethernet,
you should expect to see ARPA right here.
Next, we have the keepalive,
and this is set to 10 seconds, which is the default.
This is how often the router
is going to send a keepalive packet
to other devices that it's connected to,
to check if they're still up and online.
Next, we have a line that says full-duplex,
100 megabits per second, 100BaseTX/FX.
Now, this indicates whether this interface is using half
or full-duplex, and in this case we're using full-duplex.
It also tells you what the bandwidth is,
and the interface type you're using.
In this case, as I said, we're using full-duplex
and we're using 100 megabits per second as our bandwidth,
and we have a fast Ethernet interface type,
and it's either using copper or fiber cabling,
because it says TX/FX.
Now, next we're going to have our ARP type.
And in this case, again, we're going to use ARPA.
The timeout here tells us how long the ARP cache
is going to remember each binding, and when it will be cleared.
In this case, we're using the default time of four hours.
The next two lines are the last input, last output,
and last clearing of the counters.
In this case, the router was just rebooted,
so they're all set to zero
because they were all just cleared.
Next, we have our input queue,
which tells us how many packets are in the input queue,
and their maximum size.
In this case, the maximum size is 75 packets for our queue.
Drops is the number of packets
that have been dropped so far.
Flushes is used to count the Selective Packet Discards
that have occurred.
Basically, when the router or switch gets a signal
that it needs to start shedding some load,
it starts dropping packets selectively.
SPD is a protocol that's going to drop
your lowest priority packets when the CPU becomes too busy,
so that way you can save capacity
for higher priority packets as a form of quality of service.
Now, the total output drops here is at zero.
This means that we've had no drops
because we never had a full output queue.
Since we have a hundred megabit per second connection,
as long as we're communicating
with another a hundred megabit per second connection,
we should see this stay at zero drop packets.
If we started using a 20 megabit per second connection
from our ISP, for instance,
then we would likely experience network congestion here,
because we're sending at 100 megabits per second,
but they can only receive at 20.
That would cause a problem for us,
and at that point, some of our packets might get dropped.
Next, we have our queuing strategy
for our quality of service.
In this case, we're using First In, First Out,
which is known as FIFO.
This is the default for this type of router.
Next, we have output queue size and the maximum.
Currently, our queue is empty and it's showing zero packets.
Now, the maximum queue size here is set at 40.
So, if I receive more than 40 packets,
the queue is not going to be able to hold it
and the rest of those will get dropped.
Next, we have our five-minute input and output rates.
These are the average rates
at which packets are being received and transmitted.
Packets input is our next line,
and here we can see 923 packets were received
for a total of 158,866 bytes of data being processed.
The next line contains the received broadcasts,
and in this case, we received 860 broadcast frames.
We also have runts,
giants and throttles counted here as well.
Now, a runt is an Ethernet frame
that is less than 64 bytes in size.
It's really small, that's why it's a runt.
A giant is any Ethernet frame
that exceeds the 802.3 frame size of 1,518 bytes.
It's really large, so it's a giant.
Throttles are going to occur
when the interface fails to buffer the incoming packets.
If this is a high number, this is an indicator
that you may be having quality of service issues
to your end users.
Next, we have input errors,
CRC, frame, overrun, and ignored.
The input error counter will go up whenever the interface
is receiving a frame with any kind of error in it.
This can be something like a runt, a giant,
no buffer available, CRC errors, or other things like that.
CRC is the number of packets that were received
but failed the cyclic redundancy check,
or CRC, when they arrived.
If the checksum generated by the sender
doesn't match the one calculated by this interface
when it receives that frame,
a CRC error is counted and the packet gets rejected.
Now, frame is used to count the number of packets
that were received with a CRC error
and a non-integer number of octets.
Overrun is used to count
how often the interface was unable to receive traffic
due to an insufficient hardware buffer.
Ignored is going to be used to count the number of packets
that the interface ignored because the hardware interface
was running low on internal buffers.
If you're experiencing a lot of noise on the connection
or a broadcast storm,
this ignore count will start to rise drastically.
Next, we have the watchdog counter, which is used to count
how many times the watchdog timer has expired.
This happens whenever a packet over 2048 bytes is received.
The next line contains the input packets
with dribble condition detected,
which means that a slightly longer than default frame
was received by the interface.
For example, we talked about the fact that the MTU size
was 1500 bytes by default,
but a frame wasn't considered a giant
until it reached 1,518 bytes.
So, if I got a frame that was 1,510 bytes in size,
it's technically above the MTU size,
but it's not yet a giant.
So it would still be processed,
but it would be added here on the dribble condition counter,
so I can know that I'm starting to get packets
above 1500 bytes.
Next, we have the packet output counter,
and this is the number of packets that have been sent
and the size of those transmissions in bytes.
The underruns counter is the number of times the sender
has operated faster than the router can handle,
and this can cause buffering issues or dropped packets.
Next, we have the output errors,
and this is just like our input errors, the only difference
is we're now counting the number of collisions
and the interface resets that are occurring as a result.
A collision is counted
anytime a packet needs to be retransmitted
because an Ethernet collision occurred.
Since we're using full-duplex, this number should be zero.
If it's not zero, something's wrong.
Next, we have the interface reset,
and this counts the number of times an interface
had to be completely reset since the last reboot.
Next, we have unknown protocol drops.
Anytime a packet is dropped
but our device can't determine what protocol it was carrying,
it's going to be counted under the unknown protocol drops.
For example,
if you're not supposed to receive older types of protocols
like IPX traffic and AppleTalk on your router,
but somebody sends you a message that's formatted that way,
your router is going to drop it,
and it's not going to know what it was,
because it's not a properly formatted IP message
or an Ethernet frame.
So that counter is going to go up.
Next, we have babbles, late collision, and deferred.
Now, a babble is used to count any frame
that is transmitted and is larger than 1,518 bytes.
This is similar to our giants,
but we're going to use this when we're transmitting,
instead of receiving.
A babble is for transmission, a giant is for receive.
Late collisions are going to be used
to count the number of collisions that occur
after the interface started transmitting its frame.
And deferred is used to count the number of frames
that were transmitted successfully
after waiting because the media was busy.
So, if your devices are using CSMA/CD
or collision detection, they're going to detect the media as busy,
wait, and then transmit.
When this happens,
this number is going to go up because it had to wait.
Again, we should see zero for late collisions
and deferred here
because we're using a full-duplex connection,
but if we're using a half-duplex connection,
there will be some numbers there.
Next, we have the lost carrier and the no carrier counters.
This is the number of times that the carrier was lost
or not present during the transmission.
The carrier we're talking about here
is the signal on the connection.
Finally, we have the output buffer failures and swapped out.
The Output Buffer Failure is going to be used
to count the number of times the packet was not output
from the output hold queue
because of a shortage of shared memory.
An Output Buffer Swap Out
is going to be the number of packets stored in the main memory
when the queue was full.
If this number is very high,
that means that you're likely experiencing
a busy time in your networks.
Now, for the exam, you don't need to know all these things
and memorize all their definitions,
but you should be aware of some key statistics here
on the interface.
Things like the link state, the speed and duplex status,
the send and receive traffic statistics,
the cyclic redundancy check statistics,
the protocol packet and byte counts,
the CRC errors, the giants, the runts,
and the encapsulation errors.
On the exam, you may get a question
that involves troubleshooting a device,
and you're going to see
an interface statistics screen like this,
and then you're going to have to recommend a solution
to that problem.
For example, if the question asks
why the device is operating slowly,
and you see the connection is set to half-duplex
instead of full-duplex,
that would be a reason for the slowdown,
because you've effectively cut your bandwidth in half,
since the device has to listen before transmitting.
Or, if you see a large amount of collisions,
but you're running full-duplex,
that would indicate there are two devices
connected to the same switch port,
and that is causing you issues.
Or, maybe you see there are a lot of CRC errors,
which could indicate a dirty fiber connector
or an unshielded twisted pair cable
that's subject to too much electromagnetic interference.
This could be caused by lots of different things,
such as your cable being improperly run over
a fluorescent light or near a power line,
or something like that.
My point is,
it's important to be able to read the interface statistics
so you can then troubleshoot
your network connectivity issues
in your routers and switches.
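As a small illustration of that troubleshooting workflow,
here's a Python sketch that scans the text of a show interfaces
output for a few of the counters we just discussed.
The sample text and the regular expressions are based on
the typical Cisco output format,
so adjust them for your own devices.

import re

sample = """FastEthernet0/0 is up, line protocol is up
  Half-duplex, 100Mb/s, 100BaseTX/FX
     120 input errors, 95 CRC, 0 frame, 0 overrun, 0 ignored
     0 output errors, 14 collisions, 1 interface resets"""

duplex = re.search(r"(Half|Full)-duplex", sample).group(1)
crc_errors = int(re.search(r"(\d+) CRC", sample).group(1))
collisions = int(re.search(r"(\d+) collisions", sample).group(1))

if duplex == "Half":
    print("Half-duplex link: expect reduced throughput; check for a duplex mismatch")
if crc_errors > 0:
    print(f"{crc_errors} CRC errors: check cabling, connectors, and interference")
if collisions > 0 and duplex == "Full":
    print(f"{collisions} collisions on a full-duplex link: something is wrong")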
Environmental sensors.
In this lesson,
we're going to talk about environmental sensors
that help us monitor our physical environments
where our network devices are operating,
such as our data centers, our telecommunication closets,
and our main distribution frames.
These sensors are going to be used to monitor
our environmental conditions.
Things like our temperature and humidity,
as well as the electrical power status, and whether or not
we may be experiencing flooding.
After all, all of these routers and switches
are sitting in a telecommunication closet somewhere,
and nobody's sitting in there with them
looking at them every day.
So, how am I going to keep track of all of them?
How do I know the power is still on?
How do I know there's enough cooling there?
How do I know they haven't gotten covered in water
from a leaking pipe?
Well, this is where environmental monitoring
becomes extremely important.
Environmental monitoring relies on different types
of sensors that can be configured
to report back periodically,
or can be polled from a central monitoring station
repeatedly, to maintain the status of those areas.
Our network devices need to operate in a cool and dry place.
To maintain the proper temperature and humidity,
we can have sensors that communicate with our HVAC system.
If the temperature begins to get too hot,
the HVAC system can increase the airflow
and cool the telecommunication closets more.
If the area gets too cold, it can reduce the airflow
and bring the temperature back to the right range.
Most network devices want to be operating
between 50 and 90 degrees Fahrenheit.
So, using an automated HVAC system with sensors
can help ensure that occurs.
Additionally, we need to ensure this area
maintains the right humidity levels.
If there's too much humidity,
this can cause condensation in the equipment,
and that leads to water on our circuit boards,
which will destroy our network devices.
Conversely, if we have humidity that's too low,
static electricity can build up
and it can short out our equipment.
Therefore, we always want to make sure our humidity range
is between 40 and 60%.
Again, by having proper humidity sensors
connected to our HVAC systems,
we can increase or decrease the humidity
to keep it in that perfect 40 to 60% range.
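Here's a minimal Python sketch of the kind of range check
an environmental monitoring system performs,
using the temperature and humidity ranges we just talked about.
The sensor readings in the example are made-up values.

# Acceptable operating ranges discussed in this lesson
TEMP_RANGE_F = (50, 90)        # degrees Fahrenheit
HUMIDITY_RANGE_PCT = (40, 60)  # percent relative humidity

def check_environment(temp_f, humidity_pct):
    """Return any environmental alerts for a telecom closet reading."""
    alerts = []
    if not TEMP_RANGE_F[0] <= temp_f <= TEMP_RANGE_F[1]:
        alerts.append(f"temperature {temp_f}F out of range, adjust HVAC airflow")
    if not HUMIDITY_RANGE_PCT[0] <= humidity_pct <= HUMIDITY_RANGE_PCT[1]:
        alerts.append(f"humidity {humidity_pct}% out of range, risk of condensation or static")
    return alerts or ["environment normal"]

print(check_environment(95, 35))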
Next, we need to ensure all our devices have power.
We can install sensors on our power lines,
or use our power distribution centers
to track the power levels
going into our pieces of networking equipment.
This allows us to know if there's a surge, a spike,
a brownout, a blackout, or simply dirty power.
All of this can be remotely monitored
by our central monitoring systems
by using internet of things devices like power sensors.
Finally, we need to ensure devices
are not subject to flooding.
Again, we can place sensors in our telecommunication closets
and other non-human occupied spaces,
to detect if there's any water on the floor
due to a burst pipe or other sources of flooding.
These sensors can detect the change from dry to wet,
and when they become wet, they sound an alarm
or send a signal back to our central monitoring panel.
Remember, when it comes to our network equipment
and data centers, our devices need to be cool,
at the right humidity, and receive clean power as input,
and they need to stay dry from flooding
in order to continue doing their operations
day after day without any interruption.