Proposal for operational and security monitoring of the digital enterprise environment

Analysis, design, and coordination of operational and security monitoring across application, identity, and infrastructure components of the digital environment.

Context

The digital enterprise environment consisted of a combination of cloud and on-premises components, including identity services, integration points, and application platforms. Operational responsibilities were divided among multiple teams and individual service owners.

Problem

Historically, monitoring was handled in isolation at the level of individual technologies, without a unified view of the availability, security, and operational health of the entire environment. Incidents were often detected only when users reported them, and operational and security signals were not correlated.

Constraints

  • Hybrid environment combining cloud and on-premises services
  • Different service owners with different operational priorities
  • Dependence on existing monitoring tools and processes
  • Need to separate operational and security monitoring

My role

Solution architect responsible for analyzing the digital environment, designing operational and security monitoring of individual components, and coordinating requirements between service owners and the monitoring team.

Solution

The proposed solution was a unified approach to operational and security monitoring, developed in cooperation with the owners of individual components. For each service, meaningful operational metrics and separate security use cases were defined. These requirements were formalized into specifications and then handed over to the monitoring team for implementation.

Below is an example of a monitoring requirements matrix for a regulated business environment, covering the network, identity, integration, and infrastructure layers. It maps technical signals (availability, performance, capacity, security) to severity and ownership, enabling repeatable incident detection, classification, and clear accountability. The specific checks depend on the environment of the company in question.
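Before the tables themselves, here is a minimal sketch of how a single matrix row might be captured as a structured record when a specification is handed to the monitoring team. The class name, field names, and the validate helper are illustrative assumptions, not the format used in the project; only the column meanings and the sample values come from the matrix.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    """Severity levels that appear in the example matrix."""
    CRITICAL = "Critical"
    MAJOR = "Major"
    HIGH = "High"


@dataclass(frozen=True)
class MonitoringRequirement:
    """One row of the matrix (hypothetical field names)."""
    component: str
    layer: str
    monitored: str
    signal_type: str
    how: str
    trigger: str
    severity: Severity
    owner: str
    notes: str = ""

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the row is complete."""
        required = ("component", "layer", "monitored", "signal_type",
                    "how", "trigger", "owner")
        return [f"missing {name}" for name in required
                if not getattr(self, name).strip()]


# Sample row taken from the operational matrix.
dns_availability = MonitoringRequirement(
    component="DNS",
    layer="Network",
    monitored="Name resolution availability",
    signal_type="Availability",
    how="DNS query (A/AAAA)",
    trigger="Timeout",
    severity=Severity.CRITICAL,
    owner="Network team",
    notes="Core dependency for all services",
)
```

Keeping every row in one shape makes it easy to reject a requirement that lacks an owner or a trigger before it reaches implementation.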

Operational monitoring

| Component | Layer | What is monitored | Signal type | How | Trigger / Threshold | Severity | Primary owner | Notes |
|---|---|---|---|---|---|---|---|---|
| DNS | Network | Name resolution availability | Availability | DNS query (A/AAAA) | Timeout | Critical | Network team | Core dependency for all services |
| DNS | Network | Query latency | Performance | DNS response time | Latency above agreed threshold | Major | Network team | Early signal of network issues |
| DHCP | Network | Scope capacity | Capacity | Lease utilization | Utilization above agreed threshold | Major | Network team | Scope exhaustion prevents new clients from connecting |
| F5 Load Balancer | Network / L7 | VIP availability | Availability | Health check | Virtual server (VS) down | Critical | Network team | Entry point for applications |
| F5 Load Balancer | Network / L7 | Pool member health | Availability | Node/pool status | Healthy members < N | Major | Network team | Detects backend degradation |
| Firewall | Network / Security | Dropped packets | Security / Network | Firewall counters | Spike over baseline | Major | SecOps | Detects misrouting or attack |
| Proxy | Network | Outbound connectivity | Availability | Synthetic HTTP probe | Timeout / 5xx | Critical | Network team | Affects SaaS and external APIs |
| Active Directory | Identity | LDAP availability | Availability | LDAP bind check | Bind failure | Critical | Identity team | Authentication dependency |
| Active Directory | Identity | Replication health | Consistency | AD replication status | Replication delay | Major | Identity team | Prevents stale identity data |
| Active Directory | Identity | Authentication failures | Security | Auth error rate | Spike over baseline | Major | Identity team | Detects misconfiguration or attack |
| NTP | Infrastructure | Time synchronization | Availability | Time drift check | Drift above agreed threshold | Major | Platform team | Critical for authentication and logs |
| Monitoring Agent | Observability | Agent heartbeat | Availability | Heartbeat signal | Heartbeat missing for agreed time | Major | Platform team | Blind spot detection |
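The two DNS rows (timeout as Critical, latency above the agreed threshold as Major) can be illustrated with a small probe. This is a sketch under stated assumptions: the function name probe_dns, the injectable resolve callable, and the returned labels are hypothetical, and a real implementation would live in whatever monitoring tool the team already uses.

```python
import time
from typing import Callable


def probe_dns(resolve: Callable[[str], object],
              name: str,
              latency_threshold_s: float) -> str:
    """Classify one DNS check the way the matrix does.

    Returns "Critical" if resolution fails, "Major" if it succeeds but
    takes longer than the agreed threshold, and "OK" otherwise.
    """
    start = time.monotonic()
    try:
        resolve(name)
    except OSError:
        # socket.gaierror and socket.timeout are both subclasses of OSError.
        return "Critical"
    elapsed = time.monotonic() - start
    return "Major" if elapsed > latency_threshold_s else "OK"
```

With the standard library resolver, a call could look like probe_dns(socket.gethostbyname, "example.com", 0.2) after importing socket; the 0.2 second threshold is a placeholder for the value agreed with the network team, and how long a failing lookup takes depends on the system resolver configuration. Passing the resolver in as a parameter keeps the severity logic testable without network access.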


Security monitoring

| Component | Layer | Use case | What is monitored | Signal type | How | Trigger / Threshold | Severity | Primary owner | Notes |
|---|---|---|---|---|---|---|---|---|---|
| DNS | Network | DNS abuse / tunneling | Abnormal query patterns | Security | DNS logs / security monitoring tool | Spike in TXT/long queries | High | SecOps | Early sign of data exfiltration |
| DNS | Network | Malware C2 resolution | Resolution of known bad domains | Security | Threat intel feed + DNS logs | Match on IOC | Critical | SecOps | Blocks malware communication |
| Firewall | Network / Security | Unauthorized access attempt | Denied inbound connections | Security | Firewall logs | Repeated denies from same source | High | SecOps | Recon or brute-force attempt |
| Firewall | Network / Security | Policy violation | Traffic outside allowed zones | Security | Firewall policy logs | Rule hit anomaly | High | SecOps | Detects misconfigured or bypassed flows |
| Proxy | Network / Security | Suspicious outbound traffic | Requests to risky categories | Security | Proxy logs + URL categories | Access to malware/phishing category | Critical | SecOps | User or service compromise |
| Active Directory | Identity | Brute-force authentication | Failed logon attempts | Security | AD security events | Failures > baseline | Critical | Identity / SecOps | Credential stuffing or password spray |
| Active Directory | Identity | Privilege escalation | Group membership changes | Security | AD audit logs | Admin group modification | Critical | Identity / SecOps | High-impact identity event |
| Active Directory | Identity | Suspicious Kerberos activity | Ticket anomalies | Security | Kerberos logs | Golden/Silver ticket patterns | Critical | SecOps | Advanced attack detection |
| Load Balancer | L7 | Application abuse | Unusual request rate | Security | L7 metrics | Traffic spike per client | High | AppSec | Bot or DoS behavior |
| NTP | Infrastructure | Time manipulation attempt | Time drift anomalies | Security | NTP offset monitoring | Sudden drift change | High | Platform team | Can impact authentication and logging |
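As an illustration of the "Repeated denies from same source" trigger in the firewall use case, the following sketch counts denied inbound connections per source address inside a time window. The function name, the (timestamp, source) event shape, and the window and count parameters are assumptions made for the example; in practice this rule would be expressed in the security monitoring tool, with values agreed with SecOps.

```python
from collections import Counter
from typing import Iterable


def repeated_denies(events: Iterable[tuple[float, str]],
                    window_s: float,
                    min_denies: int,
                    now: float) -> dict[str, int]:
    """Return sources with at least min_denies denied connections
    in the last window_s seconds.

    events: (timestamp_in_seconds, source_address) pairs, one per
    denied inbound connection taken from firewall logs.
    """
    recent = Counter(
        src for ts, src in events
        if now - window_s <= ts <= now
    )
    return {src: count for src, count in recent.items()
            if count >= min_denies}
```

Sources returned by such a function would raise a High severity alert owned by SecOps, matching the row in the matrix. The same counting pattern could be adapted to failed logon attempts in Active Directory, with a baseline in place of the fixed count.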


Key decisions

  • Division of monitoring into operational and security perspectives
  • Definition of monitoring requirements in cooperation with service owners
  • Focus on monitoring key integration and identity points
  • Separation of monitoring design from its technical implementation

Outcome

  • Better overview of the operational health of the digital environment
  • Faster identification and triage of incidents
  • Clearly defined monitoring responsibilities across teams
  • Meaningful security use cases linked to real-world operations
  • Higher operational stability of business-critical services

Technologies & Standards

SNMP, REST, SOAP, Synthetic Monitoring, HTTP Probes, Proxy, Identity Services, Observability