Richard Gunstone - Blog

Recently:


Dorset Chartered Institute for IT 2023/2024 Annual General Meeting

For several years I've supported BCS, The Chartered Institute for IT, Dorset Branch in a variety of roles including Chairman, and since 2021 I have served as a committee member.

Before a brief AGM we'll have a guest talk from Andrew Radcliffe, which is open to all, both members and non-members of BCS.

Guest presentation from Spyrosoft

In 2021, Spyrosoft was recognised by the Financial Times as the fastest-growing tech company in Europe; find out how they achieved it.

Spyrosoft is a global technology services company, and Andrew aims to take you behind the scenes: telling the story of how it was built from the ground up, describing its unique customer engagement approach, and sharing the successes, failures and learnings along the way, before giving an insight into future plans.

In 2021, many companies faced another year of slower growth as the pandemic continued to constrain business activity. Delayed recovery in demand across European economies and the implications of Brexit complicated this trading period.

Accenture reported that half of all European companies had seen a revenue or profit decline in the previous year and didn't expect immediate improvement. Based on revenue growth over the preceding three years, as compiled by data provider Statista, the Financial Times' annual FT1000 review of Europe's fastest-growing companies identified Spyrosoft as the fastest-growing technology company in Europe.

This ranking was based on the highest compound annual growth rate in revenue between 2016 and 2019.

About the speaker - Andrew Radcliffe

Andrew Radcliffe BSc MSc is an experienced CTO, technology entrepreneur and Co-Founder of Spyrosoft.

Having extensive software development experience in international markets, Andrew has been instrumental in leading the delivery of digital technologies into numerous companies around the world.

Andrew has over 30 years' experience in software engineering, working across the UK public sector, telecommunications, corporate real estate, HR, aerospace and legal sectors in the UK, USA, Sweden, Poland and Austria. He was formerly a Digital Advisor to the UK Government and Head of Development at Ordnance Survey, the national mapping authority of Great Britain, responsible for technology delivery and operation.

Andrew Radcliffe BSc MSc's LinkedIn profile

The BCS Dorset Branch presentation and AGM event, on Monday 22 January, is now online at BCS Events.

Some recent and forthcoming events

Some recent and forthcoming events from the events calendar may be worth viewing if of interest:

Tuesday 31 October, 5pm, Delivering Cyber Education in HEI, by Mazhar Malik, organised by the Cybercrime Forensics SG – details and registration at https://www.bcs.org/events-calendar/2023/october/webinar-delivering-cyber-education-in-hei/

Wednesday 1 November, 6pm, after the BCS AGD AGM, Generative AI and Metaverse: what a powerful combination!, by Ian Hughes – details and registration at https://www.bcs.org/events-calendar/2023/november/webinar-generative-ai-and-metaverse-what-a-powerful-combination/

Monday 6 November, 7pm, Reviews and Inspections used in various ways, by Niels Malotaux, followed by the Quality SG AGM – details and registration at https://www.bcs.org/events-calendar/2023/november/webinar-reviews-and-inspections-used-in-various-ways-bcs-quality-sg/

Friday 1 December, 5pm, AI driving force of Cyber Security, by Mazhar Malik, organised by the Cybercrime Forensics SG with the Hampshire and Dorset Branches, followed by the AGM of the Cybercrime Forensics SG – details and registration at https://www.bcs.org/events-calendar/2023/december/webinar-ai-driving-force-of-cyber-security-and-the-agm-of-cybecrime-forensics-sg/

November 21, 2023


Redevelopment

I've been meaning to reduce the overhead of running this site, given it's rarely updated, and also to migrate it to a new hosting platform.

These pages are now generated by a home-grown static site generator written in Python, which builds the content of the entire site from Markdown sources.
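The core of such a generator is surprisingly small. As a rough illustration (this is a sketch, not the actual generator: the content/output folder names are invented and it assumes the third-party markdown package), the essential loop looks something like this:

    import pathlib
    import markdown  # third-party package: pip install markdown

    SRC = pathlib.Path("content")  # assumed folder of Markdown sources
    OUT = pathlib.Path("output")   # assumed folder for generated HTML
    OUT.mkdir(exist_ok=True)

    for md_file in sorted(SRC.glob("*.md")):
        # Convert each Markdown body to an HTML fragment
        body = markdown.markdown(md_file.read_text(encoding="utf-8"))
        # Wrap it in a minimal page template and write it out under the same name
        page = f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>\n"
        (OUT / md_file.with_suffix(".html").name).write_text(page, encoding="utf-8")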

Despite having run Python programming courses a few years back as a university lecturer, I was surprised how rusty programming skills can get, so it was very refreshing to revisit some fairly rudimentary Python coding in a text editor. I must do more.

November 19, 2023


Institute for Security Science and Technology event (October 2023)

Joining colleagues from the NHS and academia, I recently took part in a panel discussion chaired by Prova Health CMO, Dr Saira Ghafur, hosted by the Institute for Security Science and Technology (ISST) at Imperial College London.

This interesting panel discussion explored how healthcare systems can become more resilient against cyber attacks.

It was fascinating to hear the range of insights from the panel on the challenges and solutions for increasing cyber security in healthcare, where systems rely increasingly on digital technologies and cyber threats pose a serious risk to patient safety.

The most interesting aspect, as always, was the audience participation, questions, and post-event discussions.

October 19, 2023


The key ingredients for good patch management governance

What has made enterprise patch management tougher recently is how dynamic and dispersed computing assets are, as well as the sheer number of installed software components to patch. In addition, patch management processes and technology take different forms depending on the type of assets (e.g., OT, IoT, mobile, cloud, traditional IT, virtual machines, containers). The result is that many organizations are unable to keep up with patching. Patching often becomes primarily reactive (i.e., quickly deploy a patch when a severe vulnerability is being widely exploited) versus proactive (i.e., quickly deploy patches to correct many vulnerabilities before exploitation is likely to occur). (NIST SP 800-40r4 [4])

Patching is the oft-mentioned bane of any information security practice. A significant proportion of security benefit can be obtained by getting the basics right, and patch management is one of those basics. Many organisations fail to do it correctly, having at best a haphazard patching regime relying on individual technical expertise or, worse, no coordination of patch management at all.

Statements like "we adhere to Patch Tuesday release cycles" or "our Linux hosts are patched using an on-premise staging server" are usually indicators that further work is needed. But these can be a beneficial baseline nevertheless: even organisations working at these relatively low levels of patch management maturity were able to avoid exploitation by the WannaCry attack [1,2,3], so even some automation is beneficial.

As SANS note [1,2,3], patch management is one of the biggest security and compliance challenges organisations face; many large data breaches succeeded precisely because patches, often for a critical security update, weren't applied.

The underlying data does not paint a great backdrop for patching: consistently since 2001, the number of published vulnerabilities with CVSS scores has trended upwards [1,2,3]. Worse, much of this growth is in the medium and high severity categories, which outpace growth in low-severity vulnerabilities; as a crude inference, the risk of exploitation due to missing patches has therefore significantly increased over the last 20 years. Granted, CVSS data does not read across easily to patches, and the value of CVSS is often debated (c.f. prioritised vulnerability ratings etc.), but as a broadly interpreted indicator it is valuable in the context of patch management.

To make matters worse, the time between the discovery of a vulnerability in an application or operating system and the emergence of a viable exploit has drastically shrunk, to a matter of hours in some cases. This is, in some respects, a toxic combination, which, of course, threat actors are very willing to exploit.

Yet patch management has not received anywhere near the same level of emphasis in organisational governance or technical journals, and arguably not in compliance/audit frameworks either.

So how does it all go wrong for so many organisations? The dearth of industry reporting suggests organisations suffer from:

In addition to all of these, software vendors have been changing their patch release practices in ways that, if anything, create added yet hidden risk for corporate security. For example, several vendors such as Microsoft now roll up patches to prevent patch fragmentation and integration issues [1,2,3]. While the practice of rolling up patches is ostensibly a good thing, since the roll-ups are cumulative when installed, they create additional change management considerations for IT functions, which makes the overall process of patch management potentially more complex.

What good looks like

One useful practice to embed in an organisation is a coherent, cogent patch management policy and associated technical standards. The extent to which organisations can achieve the latter will vary, but an effective policy is a good baseline and enables:

Good patch management policies contain:

Policy governance for out-of-band (OOB) or emergency patches is a further element that good patch management policies embed. This element is often missing, and applies to critical patches released by vendors, typically for vulnerabilities that are being actively exploited or that pose too significant a risk to the customer base to leave to a regular release cadence.

For larger organisations, OOB patching provisions in patch management governance enable incident response teams and any security operations capabilities to reduce attack surface exposure quickly, by directing the relevant teams to apply patches once released and following any necessary integration/compatibility tests. Still more mature organisations can implement risk assessment processes around OOB patching to perform impact analysis and make recommendations.

Of course, all of this is just a small part of the wider range of considerations, processes and governance. NIST SP 800-40r4 has some useful suggestions and is well worth a read [4]. The application of patches also touches upon a myriad of other areas, including how you approach patching of systems that are not online all of the time. An effective reporting regime usually produces some interesting results, with the main one being data that shows how difficult it can be to patch the entirety of the IT estate, which raises the obvious question as to how that is managed.

More widely, the culture (or "putting it into the corporate DNA") argument is often mentioned in the context of patch management; it revolves around a series of complementary and interlocking strategies performed in an organisation to achieve good practices that are consistently followed.

Achieving cultural change is difficult, but to get there some key governance practices need to be put in place. An effective patch management policy (potentially with associated minimum standards) is a necessary first step to improving organisational security maturity; even better, implement it as part of a comprehensive ISMS if you have the resources to do so.

References

  1. Michael Hoehl, Agile Security Patching, SANS Institute, 2018. https://www.sans.org/white-papers/38410/

  2. Ken MacLeod, Patch Management and the Need for Metrics, SANS Institute. https://www.sans.org/white-papers/1461/

  3. Daniel Voldal, A Practical Methodology for Implementing a Patch Management Process, SANS Institute, 2003. https://www.sans.org/white-papers/1206/

  4. NIST SP 800-40r4, Guide to Enterprise Patch Management Planning. https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-40r4.pdf

July 10, 2022


Livepatching under Ubuntu and reflections on patch management

So, how do you get kernel livepatching for Ubuntu on your home devices? Indeed, what is kernel livepatching, and is it worth using at all?

This sometimes obscure capability is frequently enabled on Cloud Service Provider VMs, but it's not that well known and can be easily overlooked.

First, a bit of history. Commonly mentioned in connection with Ubuntu Server (e.g. in CSPs), kernel live patching (KLP) is actually a feature of the Linux kernel. Some of the early workable implementations were based on the "Ksplice" project undertaken at MIT, later bought by Oracle. Live patches are distributed as kernel modules, which are dynamically loaded into the kernel at install time. Each module includes a set of functions intended to take the place of existing functions in the running kernel. Once the livepatch is loaded and enabled, the kernel starts using the replacement functions automatically. The Linux kernel pages have a lot more information on the technical underpinnings of live patching, but in essence a variety of checks are carried out to ensure the process can operate smoothly. Once the new functions are active, existing uses of the earlier functions are converged onto the new ones.

Livepatching has a number of limitations. First, only traceable functions can be patched. There are also limitations around the placement of the dynamic ftrace call site in relation to a function.

There is more information on the above in the Linux kernel documentation, and Red Hat's presentation of the process is pretty good.

Livepatching support can be found in Amazon EC2 instances, on-prem Amazon Linux 2, Gentoo, Oracle Linux, RHEL 7 and later, SUSE, and Ubuntu 16.04 and later. Due to differing implementations, the mechanism used by each distribution varies, generally being based on livepatch, kpatch-git, kpatch, ksplice, or kGraft. Live patching is a desirable feature and is typically only available with a paid-for support package; if enterprise deployment is desired, it's going to require a suitable budget and subscriptions.

Turning to Ubuntu, the kernel live patching capability is part of the distribution. Building on the kernel's underlying live patching support, this enables a patch flow from Ubuntu that remediates critical vulnerabilities without the need for a system restart and reload of the kernel. This is obviously extremely helpful in server scenarios where uptime must be maintained. Both high and critical vulnerabilities in the Linux kernel can be remediated in this way.

In Ubuntu, this is implemented using a client application (Livepatch) which connects to Canonical's live patch servers. In enterprise environments it's also possible to set up an on-premise server in place of the Canonical cloud servers. Livepatches are cumulative, so each livepatch includes the CVEs addressed by previous ones.

Livepatching in this scheme does not cover CPU firmware and microcode, shared libraries and low-level dependencies, nor BIOS or EFI updates. If the kernel is hardened to prevent the loading of modules (the mechanism by which the likes of kpatch, kGraft and livepatch operate), then it may not be possible to perform KLP, and in these cases a reboot will be necessary.

Once a livepatch has been tested by Canonical, it is released for the supported kernels.

Unfortunately, livepatch is not enabled by default on Ubuntu. In some installations it is (namely CSP-provisioned VMs), but for the end user further steps are necessary. This essentially boils down to creating an Ubuntu Advantage account. At first glance you might get the impression that UA requires enterprise-level support packages to be purchased, but a closer look at the UI reveals support for up to 3 hosts for free (personal use).

Once the UA account is created, you can copy and paste the token from UA into your root shell session, attaching the subscription and enabling the support as follows:

# ua attach [TOKEN]
# ua enable livepatch
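Once attached, it's straightforward to confirm the service is active; output will vary by kernel and subscription:

# ua status
# canonical-livepatch status --verbose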

It's worth noting that, due to the limited scope of livepatching under Ubuntu, it's always a good idea to think about automatic updates (e.g. by using unattended-upgrades) or some other way of ensuring your hosts are patched regularly: live patching is narrowly restricted to high and critical CVEs relating to the kernel, and the production of live patches is vendor-driven.
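For the regular (non-kernel) patching side, a minimal unattended-upgrades setup on Ubuntu is as simple as:

# apt install unattended-upgrades
# dpkg-reconfigure -plow unattended-upgrades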

You can find more information about livepatch under Ubuntu at the following links: Ubuntu Livepatch and Ubuntu Advantage.

So what, in conclusion, are the benefits and drawbacks?

Patching is essentially a race against time: ensuring a host has the required patches quickly enough to reduce the potential for an attacker to develop a workable exploit (or, indeed, to take a workable exploit and use it to obtain unauthorised access to your systems). The time between a patch being made available and the time it is applied is effectively a window in which an attacker can (a) obtain the patch, (b) reverse engineer it to develop an exploit, and (c) use the exploit to attack a system. Notwithstanding this flow of events, the situation is significantly worse if the attacker has harnessed a zero-day that has, unwittingly, been identified and remediated through the development of a patch by a supplier. The attacker, in this second scenario, is in possession of the exploit and could use it in advance of a patch being made available, or during the window between patch availability and patch application. In either case, the pace at which a patch is applied has significant benefit, and this is where live patching can deliver ROI in reducing attack surface.

The shortcomings of live patching are mainly around change management: the risk that automated application of patches or changes of functionality leads to faulty behaviour or incompatibility with a managed configuration. While conventional patch cycles are typically managed, in an enterprise context, through an approved change management process, live patching short-circuits this. In extremis, this can lead to catastrophic availability issues if a live patch is applied that is incompatible with an existing change-managed enterprise configuration in one form or another. Related to this, and somewhat intertwined, is the obvious risk of technical deficiencies in the live patch itself that lead to availability issues or unexpected changes in functionality.

Ultimately live patching is a risk-balance decision: either accept the risk of automatic patch application, and the potential risk to availability, as a reasonable trade-off for reducing the risk of high-impact exploitation, or not. In the final analysis, after working through all the potential options, this usually boils down to a carefully managed approach that prioritises at-risk hosts and services over others where live patching may not be as essential (for example, hosts protected by other security controls such as gateways, reverse proxies, WAFs, API gateways, etc.)

On the risk presented by automatic mass-patching, Ubuntu at least offers some options if you run an on-premise livepatch server on your LAN. See https://ubuntu.com/security/livepatch/docs/on_prem for more info.

The takeaways for a typical enterprise would therefore be:

June 30, 2022


More adventures in home working connectivity: enter the Unifi Security Gateway and Unifi Controller

It's been a while since I've updated my blog with something useful. I've recently been working diligently on home networking connectivity following a house move, so it seemed like a good time to write up new developments.

Moving out to rural Wales brought with it some Internet connectivity challenges, but fortunately, as I wrote in my previous blog post, having 4G LTE capability on a router paid dividends. The first challenge was encountering, to my surprise, a remarkably low guaranteed speed with no green-cabinet capacity, resulting in a line speed of around 0.4Mbps (down from 70Mbps). This made 4G LTE indispensable and, to cut a long story short, it worked extremely well while I turned my attention to Plusnet and BT OpenReach in an attempt to resurrect at least some level of usable connectivity.

Fortunately, it has proven possible to improve the line speed, although guaranteed speed remains incredibly low. I am unsure how OpenReach and Plusnet can have such a difference in their picture of a VDSL line, but erring on the side of caution I have kept my DrayTek 4G LTE router for emergencies.

As I wrote about in a previous blog post, the Unifi AP Lite is an excellent unit and in my new abode, despite 100-year-old stone walls in some cases a foot thick, it does a sterling job of creating both 2.4GHz and 5GHz 802.11 bubbles that extend to all parts of the house.

However, the Unifi AP is only partially configurable using the phone app; in particular, using the app it's only possible to configure the AP in WPA/WPA2 mode. Many of the desirable features aren't accessible from the app; they become available only when you install a Unifi Controller on the network.

In the rather confusing sprawl of Unifi product names, to get access to the Controller functionality you either buy and install the Unifi Cloud Key, a hardware device that is basically a packaged Unifi Controller, or you download and configure the Controller yourself, either on a dedicated system or as a VM.

I opted for a VM and installed the Controller on a vanilla Ubuntu VM. Unifi do not make the process of rolling your own controller particularly easy: it's not straightforward to install and certainly not turn-key. It requires some manual work to reconfigure and bring up the controller daemon, which listens on a predefined web port serving the Unifi Controller web application. Expect to spend a couple of hours researching forums for configuration settings. Once the configuration work is complete, it is fortunately easy to use.
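For reference, the rough shape of a manual controller install looks something like the sketch below. The repository line is Ubiquiti's published one at the time of writing, but the package's Java and MongoDB dependencies shift between releases, so treat this as illustrative rather than turn-key:

# echo 'deb https://www.ui.com/downloads/unifi/debian stable ubiquiti' > /etc/apt/sources.list.d/100-ubnt-unifi.list
# wget -O /etc/apt/trusted.gpg.d/unifi-repo.gpg https://dl.ui.com/unifi/unifi-repo.gpg
# apt update && apt install unifi

The controller web interface then answers on https://<host>:8443.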

I also added a new Unifi Security Gateway (USG) to the mix, which I’ll return to in a moment.

Some of the features the controller offers include:

On the security front the controller adds a lot more in terms of features, including:

Wireless networks can be fully configured using the controller. It’s also possible to create segmented guest SSIDs and to isolate users using them, including a variety of guest portal options. The controller also allows the following features of 802.11 to be configured:

The USG is capable of talking PPPoE out of the WAN port, which allows a conventional VDSL modem to be used. I previously bought a DrayTek Vigor VDSL modem and had it integrated with a Cisco firewall, which handled the PPPoE. As I talked about in my previous blog posts, I boxed this up when I switched over to the DrayTek 2862 with 4G capability.

So I decided to relegate the DrayTek LTE router to a separate hotspot, and instead brought the Vigor modem back into the mix by connecting it directly to the WAN port of the USG. After adding the Plusnet account details, it was quickly connected. Latency over the whole stack of kit also seemed to improve using the modem, edging below 9ms.

I then connected the Unifi AP Lite to the WAN2 port of the USG (reconfigured as a second LAN port), and connected the fixed network to the LAN port.

The controller was then able to adopt both the AP and the USG. Some of the interesting features I enabled after integrating the USG included:

The USG in particular allows a number of useful capabilities:

The unified DHCP management offered through the controller is useful in achieving a number of goals, including:

With integrated updates using the controller, updates can be scheduled, and you can configure separate schedules for specific devices, allowing the impact of upgrades to be minimised.

Perhaps the most useful element of the USG is that it is effectively a culvert for all Internet-bound traffic. This allows the firewall engine to be used to implement a default-deny with exceptions on outbound traffic, potentially also allowing the implementation of enforced proxy access. While this creates some management overhead, it is helpful for SMEs who need more capable Internet connectivity patterns e.g. in support of certification routes such as Cyber Essentials.

So what have I been able to implement using all of this?

June 21, 2021


Combining LTE and VDSL on the Draytek 2862

Following on from my earlier blog posts using the Vigor 2862 series of "hybrid" routers, I thought I’d put together an update on further refining the potential of the implementation I’ve been developing.

In my earlier posts I was extolling the virtues of the 2862 for resilient communications, and in particular, making use of an LTE (4G) cellular SIM combined with conventional VDSL broadband to achieve a resilient home working setup. LTE results are good when you combine the DrayTek with upgraded internal or external antennas. It’s possible, at some times of the day, to achieve 50Mbps down and 25+Mbps up through LTE alone.

Now, in practice, this is all well and good. You can configure VDSL as the primary bearer, and then fall back to LTE in the event the router starts to experience issues with VDSL. But it's not quite as simple as that: this default mode remains clunky to operate.

First off, it's worth reviewing the WAN > Internet Access options for both the VDSL and LTE connections. Under "WAN Connection Detection" there are a few ways of monitoring the link for up/down state. For VDSL, for example, the 2862 can monitor the PPP link or alternatively use ICMP to ping a remote target. This is the foundation for resilience in this router, whether you use active/passive or active/active WAN configurations.

The DrayTek documentation seems to recommend using ICMP ping if possible, though it notes a daily bandwidth hit of around 20MB from pinging once per second. ICMP has the advantage of testing not just the local bearer connectivity but the full route out to an Internet destination, so it's much more comprehensive than PPP or ARP detection. However, that is also its weakness: it only tests one Internet route, which may for some reason become temporarily unavailable; the Internet is a distributed routing model (at least in theory; whether ISPs really operate that way in practice is a separate discussion) that might still be able to route even if one path fails. You also need a target IP that you can reliably ping without causing concern to an unsuspecting system operator.

For each WAN connection, "WAN > General Setup > Active Mode" is the setting that configures the dynamic behaviour of the WAN link. In a simple setup, the VDSL might be configured as "Always On" and the LTE connection as "Failover". In the event the link detection identifies that VDSL has failed, the LTE in this configuration would start and communications would be re-established using the SIM card.

However, this is not the most elegant setup. For one, the SIM card will take potentially 5-10 seconds to start up and become serviceable for data connections, which will inevitably lead to some disruption in network connectivity. The second limitation, especially when you have an unlimited data SIM, is that you're potentially missing out on an additional 20-60Mbps of bandwidth that the SIM card provides. If it spends most of its time inactive, you lose this added bandwidth, and in addition it never really gets tested for the resilience scenario. On top of all that, LTE upload speed is much more symmetric than VDSL, which is asymmetric by design, so the relative boost in upload throughput is significant.

So this got me thinking about how to maximise the benefit from the additional LTE connection. Fortunately it’s possible to develop a more sophisticated setup with the Draytek that really drives home the ROI on both connections and the unit itself.

In this more elaborate setup, I set about achieving the following:

Though LTE latency has reduced significantly compared to HSDPA, depending on the connection it may still be 4 or 5 times that of VDSL. Because the DrayTek cannot truly bond the bearers to support transparent migration across the WAN links, any source allocated to LTE is also limited to the bandwidth of LTE for the duration of the session/call. Under heavy contention LTE bandwidth may drop significantly, and in the presence of traffic management by the cellular provider or MVNO you might find data rates capped. It is therefore a safer bet to treat LTE as suited to applications that are slightly more elastic in nature, such as downloads, updates, Sky Q downloads, streaming, etc. It is also susceptible to co-channel interference, being a shared medium, so throughput can go up and down depending on nearby LTE stations such as handsets or other sources of interference.

In contrast, VDSL is perhaps less prone to significant contention (though it is, ultimately, contention based). It also benefits in good locations from low latency, so is better suited to applications that are sensitive to latency (UX applications, online gaming, etc.)

Based on this, it’s possible to configure the DrayTek 2862 to make better use of both VDSL and LTE. Reconfiguring the Internet access configuration for each to "Always on" is the first step.

The next step is to configure the DrayTek routing panel under "Routing > Load-Balance/Route Policy". Click each of the index numbers to set the default routing policy for specific protocol/source/destination/port patterns. Then configure the "send via…" section to preferentially select the bearer of first preference, and set "Failover to" to the alternate WAN link (either LTE or VDSL). Optionally configure "Failback" to revert to the rule's preferred bearer once it restores service after an outage (though note that this will clear existing sessions immediately). Finally, enable the check box adjacent to each rule.

The only other point worth noting on the LB/RP policy is the ordering of rules. I have a final "catch all" rule at the bottom of the list to preferentially select VDSL over LTE (with failover to LTE). This ensures anything not explicitly configured for LTE goes over VDSL. I've not yet experimented with leaving this out, which may cause the router to fall back to the "Load Balance Mode" setting under "WAN > General Setup"; that should enable me to achieve the goal of combining bandwidth.

The default load-balance mode on the 2862, according to DrayTek, "makes the routing decision according to the ratio of each WAN's remaining bandwidth, which is calculated by subtracting the bandwidth currently used from the maximum bandwidth". Reviewing the DrayTek documentation, "Auto Weight" looks the best option for LTE, as it dynamically calculates recent peak throughput to make the decision on remaining bandwidth.

In summary, with some work on load-balancing rules it’s possible to keep LTE and VDSL up all of the time, use the extra LTE bandwidth alongside VDSL, and have a seamless failover that minimises LTE startup time.

March 12, 2021


More adventures in home-working connectivity

In my previous blog post I touched upon my setup with a Draytek 130 Vigor VDSL modem, combined with my Juniper firewall, and the potential for the SRX to provide resilient communications across multiple bearers.

Winding forward, I took delivery this week of a Draytek 2862 modem/router/firewall device, a considerable upgrade on the 130 modem.

This unit is a wholly different proposition to my previous idea of using the SRX and an Android AP to provide some kind of automatic failover. It is positioned as DrayTek's flagship router/firewall and has a number of interesting features, including the ability to support 4 WAN bearers (VDSL, ADSL, Ethernet and LTE) plus, as you'd expect, IPv6, high availability, SMS, VLANs, content filtering, QoS, and a host of other useful features.

With a Vodafone unlimited data SIM floating around (actually from the MVNO Lebara), I decided to implement my previous idea of resilient home-working comms using this device instead.

The interesting features for me at the moment in this device are:

Configuring Long Term Evolution (LTE)

The setup process was very straightforward. A SIM card can be added either using a USB dongle or using the in-built SIM card slot (on "ln" models).

Configuring the unit to support Plusnet's VDSL infrastructure is also straightforward from the management console. Mapping across my previous configuration, I set the display name for the LTE connection ("VodafoneLTE"), enabled Load Balancing on LTE, configured "3G/4G LTE Modem DHCP mode", and set the APN name to "uk.lebara.mobi" and username/password both to "wap". The 2862 also has an MTU discovery tool built-in for MTU value, and this returned "1500".

I also configured the WAN Connection Detection method for LTE to "Ping Detect" to a server I have in a Data Centre, with values of TTL=255, Interval=1, Retry=5.

Committing the changes for LTE fired up the SIM and it worked flawlessly, with an RSRP of -83dBm and an RSSI of -58dBm (both "Excellent" quality according to the 2862 Dashboard).

Plusnet VDSL configuration

VDSL2 configuration for Plusnet was also straightforward, using the settings I talked about in my previous blog post. The settings I used were:

Other features

There are a number of interesting features in this device. It's possible to share bandwidth across all configured bearers, and fallback is naturally supported, e.g. if VDSL2 fails.

Over LTE the down/up speeds are around 18Mbps in both directions, a considerable improvement on the upload rate offered by the asymmetric VDSL profile. This is not, however, close to the theoretical LTE rates of 50Mbps up and 150Mbps down, but throughput is inherently variable as LTE is a shared medium.

Other useful features to enable are under the Firewall configuration, including:

Quality of Service

Under Bandwidth Management > Quality of Service, I configured QoS for both VDSL and LTE, though a value for up/down rates must be entered manually for LTE as it is not auto-detected. I used the standard sharing of 25% each for Classes 1-3. VoIP prioritisation is also useful to switch on. For belt and braces I assigned precedence tags to each class (precedence 1, 3 and 6), though these are only really useful if Plusnet honours them within its own infrastructure (unlikely).

Under Bandwidth Management > APP QoS, I enabled a variety of protocols for Class 1 High QoS handling by the router, such as Netflix, Spotify, Zoom, Citrix and TeamViewer.

SMS remote reporting

Handily, the unit also supports status reports over SMS: send the string "router status" followed by the password/PIN to the number assigned to the LTE SIM card.

The WiFi WAN bearer feature is also interesting, which provides the potential to use public WiFi (such as BT WiFi) as a further fallback for connectivity if VDSL and LTE fail.

Conclusions

There are a lot of features in this unit that I won't cover in this post, but first impressions are this is a high quality unit that is eminently capable of providing a resilient communications setup.

All that was left was to reconfigure the SRX to use a standard IP WAN interface (instead of PPP).

I also opted to remove NAT in the SRX and present the internal IP address space to the 2862. My aim here (further work down the line) is to shape the use of bearers based on IP address, ensuring IPs that need low latency use VDSL and others take advantage of the LTE WAN connection.

It's also possible in the 2862 to shepherd traffic based on a port profile to a particular bearer, so for instance under "Routing >> Load-Balance/Route Policy" you could direct all traffic for Spotify ports to LTE.

More to come on that when I get some time. Now to get my morning coffee before an early start.

February 03, 2021


A simple Certificate Authority for a Home Network

Self-signed certificates (SSCs) are less than ideal. The most problematic effect of using an SSC is the numerous warnings generated by browsers and tools using SSL or TLS, but all of the usual information security limitations also apply. These include weak authentication of remote services, potential issues with applications that rely on SSL or TLS to your own servers, and so on. SSCs do not possess a trust chain to a known trusted issuer, and therefore application support can be patchy and interrupt UX.

SSCs are also not great if you intend to use your system on untrusted networks: an application may prompt you to accept a new SSC from a destination on an untrusted network and, inadvertently, you accept a certificate from a malicious target. It's much better to build trust through your own CA-generated certificates to better handle these kinds of eventualities.

Self-signed certificates do have advantages. For private IP ranges, it is difficult to find a reputable CA (i.e., a CA that your OS and browser trust) that will issue a certificate for a private IP. SSCs are quite easy to generate, and are sometimes auto-generated by server applications. You can self-validate them. In some cases they are quick and easy to deploy, and certainly easier than establishing your own CA.

However, with EasyRSA, a tool for managing an X.509 PKI, it is not that difficult to build your own internal CA. In this blog post I run through the setup of EasyRSA on an Ubuntu server, and then walk through the process of adding the new CA to the trusted roots of a Windows 10 system.

Setting up internal DNS using Bind

It makes a lot of sense to set up an internal DNS service using Bind for your home network. This provides a lot of flexibility, and is relatively easy to do. I won’t go through this here – a good guide to get started is this article on DigitalOcean and there are many similar guides.

In your forward zone, ensure you have IN A records present for the host names you wish to create certificates for. These can point to the same IP address.

Some examples you may wish to create are interca.yourdomain and server1.yourdomain, where server1 is the first server you'll create a certificate for.
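A corresponding forward-zone fragment might look like this (the domain and address are purely illustrative):

    ca.yourdomain.        IN A 192.168.1.10
    interca.yourdomain.   IN A 192.168.1.10
    server1.yourdomain.   IN A 192.168.1.10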

Installing EasyRSA

The installation command is as follows:

# apt install easy-rsa

Build the CA folder

For simplicity I’ve put this under /home, but there are likely to be more secure options particularly if you are intending to use EasyRSA in a production environment:

# cd /home && make-cadir myca

Edit the vars file

This file contains a number of settings to customise for your installation. Updating the following lines in the file to suitable values will complete this step:

# These are the default values for fields
# which will be placed in the certificate.
# Don't leave any of these fields blank.
export KEY_COUNTRY="UK"
export KEY_PROVINCE="England"
export KEY_CITY="London"
export KEY_ORG="Your_organisation_name"
export KEY_EMAIL="Your_email"
export KEY_OU="Your_organisation_name"
# X509 Subject Field
export KEY_NAME="Your_default_key_name"

Source the vars file

When executing commands with EasyRSA, you'll need to load the vars file into the current session. Use the following commands:

# cd /home/myca
# . vars

Build the CA

Use the following command:

# ./build-ca

Follow the prompts given, ensuring the "Common Name" refers to the DNS name of the CA server. Once complete, you will have created your root issuing CA. We will later export this certificate and add it to the trusted certificate store under Windows, to ensure subsequent certificates are recognised. The security of the information generated at this stage is critical, as it is the basis of all trust in the internal CA.
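It's worth inspecting what you've just created; OpenSSL can display the subject and validity dates of the new root (the path assumes the default keys folder under the CA directory):

# openssl x509 -in /home/myca/keys/ca.crt -noout -subject -dates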

Build the Intermediate CA

In a similar way, generate the intermediate CA using the following command:

# ./build-inter interca.yourdomain

Again follow the prompts, ensuring the Common Name reflects the interca DNS entry created in your Bind (or equivalent) internal DNS service. The CA has now been created and can be used to generate certificates as required.

Build the first certificate

Build a certificate for a host using the following command:

# ./build-key-server server1.yourdomain

Follow the prompts, ensuring the Common Name reflects the corresponding entry in internal DNS for the server.
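At this point you can sanity-check the issued certificate against the CA chain with OpenSSL. This is a sketch assuming the default keys folder; if your server certificate was signed directly by the root, the -untrusted argument can be dropped:

# cd /home/myca/keys
# openssl verify -CAfile ca.crt -untrusted interca.yourdomain.crt server1.yourdomain.crt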

Add the Root and Intermediate CA certificates to trusted stores

Export the certificates for the root and intermediate CAs to target devices. How this is done depends on your setup; by one means or another, get the ca.crt file (located in the "keys" folder) and the "interca.yourdomain.crt" file onto the local file system of the target device.

On Windows 10, double-click the ca.crt file and click "Install Certificate". Add this to the "Current User" and manually specify the trust store selecting "Trusted Root Certification Authorities". Similarly, install the Intermediate CA crt file under "Intermediate Certification Authorities".

It's not strictly necessary to add the intermediate CA to the trust store, but Windows provides a store for it, and it can sometimes help if a remote application fails to present a suitable chain due to misconfiguration.
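If you prefer the command line to the import wizard, certutil achieves the same result when run from the folder containing the files:

C:\> certutil -user -addstore Root ca.crt
C:\> certutil -user -addstore CA interca.yourdomain.crt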

Install server certificates into corresponding applications

Install the server-specific crt files and associated key files into the required applications. Restart services as required.

Summary

This is more or less the sequence of steps needed to create a rudimentary Internal CA using EasyRSA, though you’d need to implement further steps to achieve a baseline assurance level.

This approach has significant limitations, both technical and assurance-related. These considerations, plus a couple of other points, are worth bearing in mind:

Internal CAs are complex animals, requiring significant policy, standard, and procedure wraps, supporting operational processes and similar. For most businesses, hiring the services of an external consultant is essential if an Internal CA is to achieve the assurance required.

January 16, 2021


Using Imapfilter (the swiss army knife for IMAP mailbox management)

Imapfilter is one of the more useful email management tools available on Linux. Using Imapfilter, you can connect to remote IMAP mailboxes, perform queries, and take actions based on user-customisable filters. Running Imapfilter periodically allows simple processes to be implemented. The commands it can perform on a remote mailbox include delete, copy, move, and flag.

An interesting feature of Imapfilter is the capability to maintain connections to multiple IMAP mailboxes at the same time, and take actions between remote mailboxes.

Imapfilter is configured using the Lua programming language, in a config.lua file usually located in the ~/.imapfilter folder.

Some of the example queries that can be sent to an IMAP mailbox include:

For example, using Imapfilter the following activities could be performed:

One potential use case is managing legacy mailboxes: by migrating emails directly using Imapfilter, one can avoid the pitfalls of some common alternatives. For instance, repeated polling via some webmail solutions can leave original messages behind, decreasing available quota until the mailbox reaches capacity.
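As a flavour of what the Lua configuration looks like, here is a minimal illustrative config.lua; the server, credentials and mailbox names are placeholders:

    -- Connect to a remote mailbox (connection details are placeholders)
    account = IMAP {
        server = 'imap.example.com',
        username = 'user@example.com',
        password = 'secret',
        ssl = 'tls1.2',
    }

    -- Select messages in the Inbox older than 30 days...
    local old = account.INBOX:is_older(30)

    -- ...and move them into an Archive folder on the same account
    old:move_messages(account['Archive'])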

Here are some examples of config.lua and further discussion of use cases and example code:

https://fossies.org/linux/imapfilter/samples/config.lua

https://raymii.org/s/blog/Filtering_IMAP_mail_with_imapfilter.html

January 15, 2021


Overhauling QoS on my Junos SRX

Welcome December! In my previous blog post I had, in haste, neglected to test my QoS configuration, and I found using the interface stats command (show interfaces extensive ge-X/X/X) that all of the packets were being sent under the Best Effort class.

This evening I updated the configuration to follow the Juniper multifield classifier approach, and also enabled DSCP classification.

Under firewall > family inet, I changed the filter to the following:

        filter mf-classifier {
            term ssh {
                from {
                    protocol [ tcp udp ];
                    port 22;
                }
                then {
                    forwarding-class Premium-data;
                    accept;
                }
            }
            term counterstrike1 {
                from {
                    protocol [ tcp udp ];
                    port 27015;
                }
                then {
                    forwarding-class Premium-data;
                    accept;
                }
            }
            term counterstrike2 {
                from {
                    protocol [ tcp udp ];
                    port 27020;
                }
                then {
                    forwarding-class Premium-data;
                    accept;
                }
            }
            term counterstrike3 {
                from {
                    protocol [ tcp udp ];
                    port 27005;
                }
                then {
                    forwarding-class Premium-data;
                    accept;
                }
            }
            term counterstrike4 {
                from {
                    protocol [ tcp udp ];
                    port 51840;
                }
                then {
                    forwarding-class Premium-data;
                    accept;
                }
            }
            term wificalling1 {
                from {
                    protocol [ tcp udp ];
                    port 500;
                }
                then {
                    forwarding-class Voice;
                    accept;
                }
            }
            term wificalling2 {
                from {
                    protocol [ tcp udp ];
                    port 4500;
                }
                then {
                    forwarding-class Voice;
                    accept;
                }
            }
            term spotify {
                from {
                    protocol [ tcp udp ];
                    port 4070;
                }
                then {
                    forwarding-class Voice;
                    accept;
                }
            }
            term sip {
                from {
                    protocol [ tcp udp ];
                    port 5060-5061;
                }
                then {
                    forwarding-class Voice;
                    accept;
                }
            }
            term rtp {
                from {
                    protocol [ tcp udp ];
                    port 16384-32767;
                }
                then {
                    forwarding-class Voice;
                    accept;
                }
            }
            term webex {
                from {
                    protocol [ tcp udp ];
                    port 9000;
                }
                then {
                    forwarding-class Voice;
                    accept;
                }
            }
            term CORP {
                from {
                    address {
                        192.168.5.0/24;
                    }
                    protocol [ tcp udp ];
                }
                then {
                    forwarding-class Voice;
                    accept;
                }
            }
            term accept-all {
                then accept;
            }
        }

Then, under class-of-service, I defined:

class-of-service {
    forwarding-classes {
        queue 0 BE-data;
        queue 1 Premium-data;
        queue 2 Voice;
        queue 3 NC;
    }
    interfaces {
        ge-0/0/0 {
            unit 0 {
                classifiers {
                    dscp default;
                }
            }
        }
        ge-0/0/1 {
            unit 0 {
                classifiers {
                    dscp default;
                }
            }
        }
        ge-0/0/2 {
            unit 0 {
                classifiers {
                    dscp default;
                }
            }
        }
        ge-0/0/3 {
            unit 0 {
                classifiers {
                    dscp default;
                }
            }
        }
        ge-0/0/4 {
            unit 0 {
                classifiers {
                    dscp default;
                }
            }
        }
        ge-0/0/5 {
            unit 0 {
                classifiers {
                    dscp default;
                }
            }
        }
        ge-0/0/7 {
            unit 0 {
                classifiers {
                    dscp default;
                }
            }
        }
    }
}

Then, under interfaces, I specified each interface as follows (example):

    ge-0/0/1 {
        unit 0 {
            family inet {
                address 192.168.10.254/24;
                filter {
                    input mf-classifier;
                    output mf-classifier;
                }            
            }
        }
    }

This can be verified by issuing the statistics command noted above:

  Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0 BE-data                 17300925             17300925                    0
    1 Premium-data                   0                    0                    0
    2 Voice                        125                  125                    0
    3 NC                             5                    5                    0

The definitions of these forwarding classes, which are built into the default configuration, are given in the Junos documentation (source: Default Forwarding Classes).

My hope (in time I will test this) is that DSCP markings will also be picked up. These are flags that software introduces into packet headers to assist upstream routers. The story doesn't end there, though, as recent versions of Windows will overwrite these markings to zero, which is problematic (Cisco Article). Fortunately DSCP marking can be re-enabled in the registry for non-domain-joined hosts as follows:

1. Go to HKLM\System\CurrentControlSet\Services\Tcpip\QoS. If the "QoS" key doesn't exist there, create it.

2. Add a DWORD value named "Do not use NLA" and assign "1" as its value.

3. Reboot.
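Equivalently, from an elevated command prompt (mirroring the DWORD value described above):

C:\> reg add "HKLM\System\CurrentControlSet\Services\Tcpip\QoS" /v "Do not use NLA" /t REG_DWORD /d 1 /f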

I need to spend some time to see if this takes effect; results for the above are mixed in forum discussions. For now my hope is that any non-DSCP-tagged packets will be picked up by the port- and IP-based rules at the SRX level.

December 01, 2020


Enhancing Juniper untrusted interface security

Junos on the SRX has a number of additional configuration options that can enhance protection against a range of basic security threats. These are configurable under the screens configuration and cover ICMP, IP, TCP, and UDP.

Here are some example configuration options provided by Juniper:

set security screen ids-option screen-config icmp ip-sweep threshold 1000
set security screen ids-option screen-config icmp fragment
set security screen ids-option screen-config icmp large
set security screen ids-option screen-config icmp flood threshold 200
set security screen ids-option screen-config icmp ping-death
set security screen ids-option screen-config ip bad-option
set security screen ids-option screen-config ip stream-option
set security screen ids-option screen-config ip spoofing
set security screen ids-option screen-config ip strict-source-route-option
set security screen ids-option screen-config ip unknown-protocol
set security screen ids-option screen-config ip tear-drop
set security screen ids-option screen-config tcp syn-fin
set security screen ids-option screen-config tcp tcp-no-flag
set security screen ids-option screen-config tcp syn-frag
set security screen ids-option screen-config tcp port-scan threshold 1000
set security screen ids-option screen-config tcp syn-ack-ack-proxy threshold 500
set security screen ids-option screen-config tcp syn-flood alarm-threshold 500
set security screen ids-option screen-config tcp syn-flood attack-threshold 500
set security screen ids-option screen-config tcp syn-flood source-threshold 50
set security screen ids-option screen-config tcp syn-flood destination-threshold 1000
set security screen ids-option screen-config tcp syn-flood timeout 10
set security screen ids-option screen-config tcp land
set security screen ids-option screen-config tcp winnuke
set security screen ids-option screen-config tcp tcp-sweep threshold 1000
set security screen ids-option screen-config udp flood threshold 500
set security screen ids-option screen-config udp udp-sweep threshold 1000

The screen profile must then be attached to the relevant security zone:

set security zones security-zone untrust screen screen-config

While tuning the thresholds, the following setting can be used to generate an alarm but still allow the packet through:

alarm-without-drop;

Once tuned, this should be reversed so that packets meeting the defined thresholds are dropped.
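In set-command form, matching the examples above, the toggle and its later reversal are simply:

set security screen ids-option screen-config alarm-without-drop
delete security screen ids-option screen-config alarm-without-drop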

November 29, 2020


Juniper PPPoE connections that fail and stay down

One of the oddities of the Plusnet broadband infrastructure I've seen occurs around 1am to 2am. A forced reconnect/renegotiation appears to take place around this time on most or all days, leading to changes in achievable throughput and latency. Whether this is just on my line, the cabinet serving the line, or at a national level, I'm unsure. It is unnoticeable if your equipment automatically reconnects.

I've noticed over the last few days, after working on the configuration of my SRX, that the PPPoE session to the VDSL modem drops and stays down around this time.

In my case, tinkering in the Wizard led to an overwrite of parts of the configuration, removing the following lines, which are essential to keep the pp0 interface up:

            pppoe-options {
                underlying-interface ge-0/0/4.0;
                idle-timeout 0;
                auto-reconnect 10;
                client;
            }

A number of the other MTU settings and similar were also rewritten and needed to be restored.
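After restoring the configuration and committing, the session state can be confirmed from operational mode (the logical interface name may differ on other setups):

> show pppoe interfaces
> show interfaces pp0.0 terse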

November 28, 2020


Basic Juniper QoS for home working

Following on from my earlier blog posts, I’ve now implemented some basic Quality of Service using my home Juniper firewall. The idea here is to prioritise the following types of traffic:

The scheme below does not allocate all available bandwidth on a percentage basis. According to the Juniper docs, this should not be a problem: "If a queue receives offered loads in excess of the queue’s bandwidth allocation, the queue has negative bandwidth credit, and receives a share of any available leftover bandwidth. Negative bandwidth credit means the queue has used up its allocated bandwidth. If a queue’s bandwidth credit is positive, meaning it is not receiving offered loads in excess of its bandwidth configuration, then the queue does not receive a share of leftover bandwidth. If the credit is positive, then the queue does not need to use leftover bandwidth, because it can use its own allocation." [1]

This is easily done by adding firewall and class-of-service sections in the configuration editor. I've created a scheduler map called "my-sched-map" and assigned it to each of the ports on the firewall. I also define three schedulers: best effort, "class2", and network control. What I'm attempting to do below is identify certain types of traffic, assign them to the class2 scheduler, and prioritise the packets involved.

The scheme below has some limitations. First off, it matches ports unidirectionally for both TCP and UDP. Second, the reservation scheme is simplistic. However, it's simple to implement without descending into spaghetti.

Here’s the configuration:

firewall {

    family inet {

        filter classify-traffic {
            term ssh {
                from {
                    protocol [ tcp udp ];
                    port 22;
                }
                then {
                    forwarding-class class2;
                    accept;
                }
            }
            term counterstrike1 {
                from {
                    protocol [ tcp udp ];
                    port 27015;
                }
                then {
                    forwarding-class class2;
                    accept;
                }
            }            
            term counterstrike2 {
                from {
                    protocol [ tcp udp ];
                    port 27020;
                }
                then {
                    forwarding-class class2;
                    accept;
                }
            }            
            term counterstrike3 {
                from {
                    protocol [ tcp udp ];
                    port 27005;
                }
                then {
                    forwarding-class class2;
                    accept;
                }
            }            
            term counterstrike4 {
                from {
                    protocol [ tcp udp ];
                    port 51840;
                }
                then {
                    forwarding-class class2;
                    accept;
                }
            }            
            term wificalling1 {
                from {
                    protocol [ tcp udp ];
                    port 500;
                }
                then {
                    forwarding-class class2;
                    accept;
                }
            }
            term wificalling2 {
                from {
                    protocol [ tcp udp ];
                    port 4500;
                }
                then {
                    forwarding-class class2;
                    accept;
                }
            }
            term spotify {
                from {
                    protocol [ tcp udp ];
                    port 4070;
                }
                then {
                    forwarding-class class2;
                    accept;
                }
            }
            term sip {
                from {
                    protocol [ tcp udp ];
                    port 5060-5061;
                }
                then {
                    forwarding-class class2;
                    accept;
                }
            }
            term rtp {
                from {
                    protocol [ tcp udp ];
                    port 16384-32767;
                }
                then {
                    forwarding-class class2;
                    accept;
                }
            }
            term webex {
                from {
                    protocol [ tcp udp ];
                    port 9000;
                }
                then {
                    forwarding-class class2;
                    accept;
                }
            }
            term corp {
                from {
                    protocol [ tcp udp ];
                    address {
                        192.168.5.0/24;
                    }
                }
                then {
                    forwarding-class class2;
                    accept;
                }                               
            }
            term accept-all {
                then accept;
            }
        }
    }
}

class-of-service {

    forwarding-classes {
        queue 2 class2;
    }

    schedulers {
        best-effort-sched {
            transmit-rate percent 40;
            buffer-size percent 40;
            priority low;
        }
        class2-sched {
            transmit-rate percent 30;
            buffer-size percent 30;
            priority high;
        }
        network-control-sched {
            transmit-rate percent 5;
            buffer-size percent 5;          
            priority medium-high;
        }
    }

    scheduler-maps {
        my-sched-map {
            forwarding-class best-effort scheduler best-effort-sched;
            forwarding-class class2 scheduler class2-sched;
            forwarding-class network-control scheduler network-control-sched;
        }
    }

    interfaces {
        ge-0/0/0 {
            scheduler-map my-sched-map;
        }
        ge-0/0/1 {
            scheduler-map my-sched-map;
        }
        ge-0/0/2 {
            scheduler-map my-sched-map;
        }
        ge-0/0/3 {
            scheduler-map my-sched-map;
        }
        ge-0/0/4 {
            scheduler-map my-sched-map;
        }
        ge-0/0/5 {
            scheduler-map my-sched-map;
        }
        ge-0/0/7 {
            scheduler-map my-sched-map;
        }
    }
}
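One thing the listing above doesn't show is the filter actually being attached anywhere: a firewall filter classifies nothing until it is applied to an interface as an input filter. A minimal sketch of what that could look like on one of the LAN-facing ports (repeat for the other ports as required):

interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                filter {
                    input classify-traffic;
                }
            }
        }
    }
}

Once traffic is flowing, show interfaces queue ge-0/0/0 and show class-of-service interface ge-0/0/0 are useful for checking that packets are actually landing in the class2 queue.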

[1] https://www.juniper.net/documentation/en_US/junos/topics/usage-guidelines/cos-configuring-scheduler-transmission-rate.html

November 26, 2020


Home firewall optimisation for WFH

As I am currently a home worker due to Covid-19, I’ve become increasingly interested in how I can create a rock-solid networking environment for my corporate ICT equipment – free from interference, network contention, and the like. The idea of everything co-existing on a relatively flat network provided by a fairly cheap ISP broadband router quickly lost its appeal, and I started to think about how I could create a more enterprise-class network.

I've achieved this using fairly standard kit: a Juniper SRX firewall as the perimeter device, the DrayTek 130 VDSL modem, and downstream switches and 802.11 access points.

Now that my Broadband appears to be stable (it’s been stable since lunchtime using the DrayTek 130 I talked about in my previous blog post – very pleased, to say the least), my time in the Juniper firewall’s configuration pages reminded me that I wanted to write up how I’ve approached configuring my home firewall, in case it is of use to others, and also to reflect on some broader home network desirables.

In a typical home there are lots of device types, and from my point of view they can be lumped together into broad groups: IoT devices, personal (PERS) devices, corporate (CORP) equipment, and VOIP telephony.

In common with other firewalls, Junipers can isolate individual ports, creating security zones; this is easily achievable from the Juniper configuration wizard. On top of that, the inter-zone communication policy can also be defined, both between the zones defined for the internal network and for the Internet (WAN) zone.
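As a flavour of what this looks like in Junos, here is a minimal sketch – the zone names, ports and permitted direction are illustrative, not my actual policy:

security {
    zones {
        security-zone IOT1 {
            interfaces ge-0/0/1.0;
        }
        security-zone CORP {
            interfaces ge-0/0/2.0;
        }
    }
    policies {
        /* CORP may reach the Internet; no policy exists between IOT1 and
           CORP, so that traffic is dropped by default */
        from-zone CORP to-zone untrust {
            policy corp-out {
                match {
                    source-address any;
                    destination-address any;
                    application any;
                }
                then {
                    permit;
                }
            }
        }
    }
}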

My approach has been to fan out as many IoT ports as possible, then divide the other groups into ports for each type (PERS and CORP), and switch downstream of those ports. No ports on the Juniper can communicate with one another, and the only communication paths supported are between the defined zones and the WAN.

IoT devices are connected on a one-to-one basis to the corresponding IoT port. The rationale is that, as a generalised observation, IoT devices have significantly weaker security regimes, and in the event of compromise or malfunction each is siloed away from the others. Additionally, I segment VOIP telephony onto its own port to avoid contention/performance issues.

Segmentation obviously has benefits in preventing traffic from reaching other parts of the network, which can help contain malicious code or threat actors; a flat network gives an actor every opportunity to move laterally across a wide range of devices. Segmentation also brings a performance benefit. Rather than heap all the home network traffic onto a consumer-grade broadband router, with broadcast traffic spilling across all devices, it’s possible to distribute load across a variety of devices, reduce broadcast traffic, and improve performance overall.

The final benefit of segmenting ports at the firewall in this way is the opportunity to create traffic management policies that cap data rates and potentially introduce a level of Quality of Service (QoS). It is fairly easy to rate-cap ports on enterprise firewalls, and this provides a useful way of controlling contention for the Broadband connection itself.
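On the SRX, rate capping can be done with a policer referenced from a firewall filter. A minimal sketch – the 10Mbit/s limit and the interface are chosen purely for illustration:

firewall {
    policer iot-cap {
        if-exceeding {
            /* token-bucket limit: sustained rate plus a burst allowance */
            bandwidth-limit 10m;
            burst-size-limit 625k;
        }
        then discard;
    }
    family inet {
        filter cap-iot {
            term cap {
                then {
                    policer iot-cap;
                    accept;
                }
            }
        }
    }
}
interfaces {
    ge-0/0/1 {
        unit 0 {
            family inet {
                filter {
                    input cap-iot;
                }
            }
        }
    }
}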

Other benefits include the ability to segment DMZ services facing the outside world, port mirroring for IDS, and redirecting DNS requests to a security-filtering DNS server, to name a few.

I’ll try to write some more in the future about specific Junos configuration settings.

The take-home messages from this and my previous blog post are along the following lines: replacing the ISP router with an enterprise-class firewall gives far more control; segmenting the network isolates weaker devices and improves performance; and traffic management can protect important traffic on a contended Broadband connection.

The downside is that all of the above involves a not-insignificant amount of technical knowledge and interest.

November 25, 2020


Broadband survival adventures

So Covid-19 has led to many people working from home on an extended basis, and I am in that camp. For most of the time I have found my Broadband provider, Plusnet, to be reliable. However, in recent weeks this has degraded substantially, to the point of 30-60 dropouts a day. The pace of Plusnet’s and OpenReach’s support teams has, in all honesty, left a lot to be desired, and I am contemplating whether I’ll remain a long-term customer.

On the positive side, two visits from BT OpenReach engineers have provided a wealth of information about how they deliver the service, and some useful insights. Lots of work in the VDSL cabinet and line tests took place, but the source of the random dropouts continued to evade both me and the BT OpenReach engineers. Error correction has been upped to a medium level and the line has been banded by Plusnet. A setting on the VDSL cabinet green port allocated to the property was remotely changed last week from the Diagnostic Centre of Excellence, all the way over in India.

I was hopeful that with BT OpenReach involved a fix would appear quickly, before I had to start spending money on workarounds. I am still none the wiser, and have reluctantly tried to find alternative solutions, with the ensuing expense.

The first solution (the ultimate insurance policy) I cobbled together was to dispense with VDSL completely and use another bearer, the obvious and economical choice being 4G and a hotspot. Fairly standard stuff and easy to get going on Android. Yet I wanted to achieve something more than a laptop linked up to my hotspot. As the house has a mixture of Fast Ethernet cabling and several 802.11 SSIDs, I wanted to integrate the lot into the Android AP without breaking the bank – the idea being that every device in the house, from the TV to the DVD player, would use the hotspot as a bearer.

It was easy to do this because some time back in Q1 I had installed a Juniper SRX as the perimeter firewall, relegating the ISP VDSL router to a glorified VDSL modem. I bridged the network (and downstream 802.11 APs) from the SRX WAN port using a Netgear N300 WiFi extender. A simple re-IP and connection of the WAN port to the N300 switched all connectivity over to a hotspot running off my Android phone.

This worked fairly well and was easily usable for Internet traffic, my WFH VPN, and some AV applications such as WebEx. The downsides are relatively high latency, and a 4G network that is perhaps not expecting 15+ hard-wired Ethernet devices plus WiFi, streaming sticks, and Sky Q to present themselves through one Android handset. The Android OS build hardcodes the IP address space as 192.168.46.0/24, with 192.168.46.1 as the gateway; it is not easy to change, but easy enough to transfer into the SRX configuration. The configuration I opted for on the N300 was to present the Android AP hotspot under a new SSID, albeit with the same key. Moving the handset around to get the best signal became an occasional task. Finally, the days of unlimited cellular data seem to be long gone, so I rapidly started to chew through my data allowance, which was the most expensive downside.
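Transferring that addressing into the SRX amounts to very little configuration. A minimal sketch, assuming the WAN port is ge-0/0/4 and a static address is used rather than DHCP:

interfaces {
    ge-0/0/4 {
        unit 0 {
            family inet {
                /* any free host address in the hotspot's hardcoded range */
                address 192.168.46.2/24;
            }
        }
    }
}
routing-options {
    static {
        /* default route via the Android hotspot gateway */
        route 0.0.0.0/0 next-hop 192.168.46.1;
    }
}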

Watching my Plusnet router constantly disconnect and, it seemed, restart itself became frustrating. So I took on board the positive remarks about the DrayTek product line from OpenReach and sourced a second-hand DrayTek 130 from eBay (https://www.draytek.co.uk/products/business/vigor-130).

This is an ADSL and VDSL2 (FTTC/BT Infinity) Ethernet modem capable of bridging single IPs or subnets. It also supports a 1508-byte MTU (jumbo frames) and VDSL vectoring. The beauty of the DrayTek 130 solution is that there is no dreaded NAT, so one less point of failure. As a bridging modem it also lacks 802.11, switching and routing functionality, and multimedia support; it is probably the best match for my requirements, dispensing with features I had no need for.

This then makes the Juniper SRX the true perimeter device, exposed to the Internet. It requires PPP over Ethernet to be configured and active on the WAN port, as opposed to the plain Ethernet and TCP/IP configuration previously needed behind the Plusnet router. Phil Lavin’s notes on this were extremely helpful, proposing the following configuration:

interfaces {
    ge-0/0/4 {
        description "Plusnet Off-Net WAN via Zyxel Modem";
        unit 0 {
            encapsulation ppp-over-ether;
        }
    }
    pp0 {
        unit 0 {
            ppp-options {
                chap {
                    default-chap-secret "your-password";
                    local-name "yourusername@plusdsl.net";
                    no-rfc2486;
                    passive;
                }
            }
            pppoe-options {
                underlying-interface ge-0/0/4.0;
                idle-timeout 0;
                auto-reconnect 10;
                client;
            }
            family inet {
                mtu 1480;
                negotiate-address;
            }
        }
    }
}
routing-options {
    static {
        route 0.0.0.0/0 next-hop pp0.0;
    }
}
security {
    zones {
        security-zone public {
            interfaces {
                pp0.0 {
                    host-inbound-traffic {
                        system-services {
                            ping;
                            traceroute;
                            ike;
                            ssh;
                        }
                    }
                }
            }
        }
    }
    flow {
        tcp-mss {
            all-tcp {
                mss 1440;
            }
        }
    }
}

Source: https://phil.lavin.me.uk/2017/04/juniper-srx-pppoe-configuration-for-plusnet-adsl/

My approach was to use the setup wizard in the SRX to write out the configuration file and as much of the PPPoE configuration as it could, in an attempt to minimise syntax errors in a rather nested configuration file. Then I used the interactive editor to modify the configuration to match Phil’s notes, except that I removed the exposed services under "zones > security-zone public > .... > system-services" and left the static routing configuration as the SRX created it. The username and password were those required by Plusnet.

On the DrayTek 130, the indicator lights (ACT, LAN and DSL) tell you most of what you need to know about the state of the connection.

Apart from that, it is a plug-and-play unit with no requirement to access a management interface. Nevertheless you can, and I did, access the interface using a DHCP-enabled client over Ethernet to apply firmware updates. After deploying the updated configuration to the SRX, voilà – the connection came up with ACT flashing, LAN steady/flashing and DSL steady. A quick external port scan to detect open ports on the SRX and the job was finished.

Having switched everything back to VDSL, the DrayTek 130 seems very stable, without any of the difficulties encountered with the Plusnet One router. My development server on the network runs some home-grown latency scripts and a throughput tester, so I will get some results from that over the next 24 hours and update this blog post.

So how much has all this cost? More than I'd prefer, as I naturally expect Plusnet to provide kit that works! Cost-wise, the 4G option was a £20 outlay for a new Netgear N300 and a monthly contract with Lebara of around £15 pcm. The DrayTek 130 was second-hand for about £40, though they retail new for over £100.

Incidentally, I did not entirely abandon the idea of using cellular data for the house. Some searching around highlighted that Lebara offer an unlimited SIM-only deal, with no contract, for around £15 per month. Putting this into an old Android handset together with the N300 gives me some fallback in the event of further woes and provides a useful mobile hotspot.

The SRX, interestingly, has an RPM probe capability down to per-second resolution. This could conceivably allow automatic failover within the SRX itself across two WAN ports – one the VDSL DrayTek, the other the N300 uplinking to the Android AP. So it might be possible to integrate both approaches as a cheaper alternative to something like BT 4G Assure.
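As a rough illustration of the idea – the names, probe target and backup next-hop below are all assumptions for the sketch – RPM can ping a reference address every second, and ip-monitoring can install a backup default route via the hotspot when the probe fails:

services {
    rpm {
        probe wan-health {
            test ping-out {
                probe-type icmp-ping;
                target address 8.8.8.8;
                probe-count 3;
                probe-interval 1;
                test-interval 1;
                thresholds {
                    /* declare failure after three consecutive lost probes */
                    successive-loss 3;
                }
            }
        }
    }
    ip-monitoring {
        policy wan-failover {
            match {
                rpm-probe wan-health;
            }
            then {
                preferred-route {
                    /* fail over to the Android hotspot via the N300 */
                    route 0.0.0.0/0 {
                        next-hop 192.168.46.1;
                    }
                }
            }
        }
    }
}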

The nice capability in using a VDSL modem this way is that you can cater for a subnet rather than only a single public IP, and present it to the firewall. This is helpful if you need to offer Internet services and/or segment outbound traffic in some way.

November 24, 2020


The implications of Covid-19 for Information Security and IT sector

March 2020, amidst the spread of Covid-19 and the UK government’s imposition of the first "lockdown", marked the most significant challenge yet for IT departments within organisations across the country. Remote access services became the foundation for a monumental re-engineering of working practices, in many cases with entire workforces shifting to "work from home" – an arrangement that has persisted for nearly an entire year.

If ever there were tongue-in-cheek interpretations of WFH, they were quickly dispelled as IT functions took on the challenge of enabling organisations to work in this new context en masse.

It is worth pointing out the significance of this challenge, as most organisations size their resources to support, say, around 20% of the workforce accessing IT resources remotely. Achieving this scale-up in resource requirements, with all its associated implications, is a testament to the significance of IT within the UK economy. Indeed, this was a motivator for the BCS VitalWorker scheme launched over the summer of 2020 [3].

But now that many organisations in the IT sector have addressed the remote working requirement, can we take a longer-term view of what the information security function (and indeed many office-based functions) of the future might look like? According to a poll mentioned in the Independent, 7 out of 10 organisations surveyed were considering a change to rules and regulations in light of how staff have reacted to Covid-19 [1], and 57 per cent of "business owners were already looking at adapting many of their usual practices moving forward" [1]. I have seen through my professional network that some organisations have already started taking action to reduce their real estate footprint, e.g. by not renewing leases and instead offering staff passes to office space across the country so they can access meeting rooms and desks on an as-required basis.

The business opportunity presented by WFH "at scale" is obvious. Organisations can scale back often-expensive resources – prime real estate and its associated running costs – and change their organisational working practices to assume WFH as the default for many.

In the information security space, this presents some interesting ideas. Can a Security Operations Centre run on an entirely remote basis? Can your information security risk team be run remotely? In many cases this is entirely possible, with additional benefits. It’s also worth mentioning that much of the home working technology we take for granted in 2020 has only been made possible by the innovations of IT professionals and the contributions of cyber security professionals over many years prior.

There are also some sub-surface considerations: in the WFH mode, information security and IT functions could potentially benefit in several ways.

For an information security function, many of the day-to-day activities translate entirely to WFH. This offers the potential for a job market that is dynamic and flexible, with the possibility of addressing inequalities between geographical regions in the UK: if the potential were realised, it would matter far less whether a person lives in the vicinity of a workplace. This is especially important in light of the scarcity of resources, particularly in cyber security. There may also be opportunities within a WFH environment to improve diversity in the workforce, for example through an employee base that spans the regions of the UK and perhaps represents the customer outlook more accurately.

But there are new challenges that must also be considered. The impact of sometimes solitary working needs to be fully understood. While technology allows employees to communicate online using video, audio and other tools, it is too early to say how this affects the working dynamic at scale.

The information security implications of home working extend beyond the feasibility of transferring information security teams to the same way of working; they reach into the home network, the equipment used, and the controls around both.

Many businesses will see some of this as an opportunity. The cost of networking infrastructure in the home is borne by the home worker, which relieves employers of the installation and maintenance implications, and the use of personal ICT equipment could add further cost savings into the mix. However, some aspects will require further analysis, and strengthened security controls may be appropriate.

There are also personal factors in the WFH mode that can have a significant bearing on whether an employee can realistically work from home on a permanent basis. These factors relate to the "duty of care" interests of employers and mean WFH is not a viable prospect for all employees; where it is not, accommodation must be made. In any WFH scheme, thinking beyond the flatscreen display and laptop will be critical. This includes "lone working" and the risk management requirements it entails.

The digital divide, a term common in the late 1990s and early 2000s, is also highly relevant when we look at moving IT and information security functions to a remote working mode. The term refers to the gap between demographics and regions that have access to modern ICT and those that do not. A simple example is the availability of fibre broadband, but it is broader and encompasses all information and communications technology. Could WFH contracts of the future require an employee to achieve a minimum broadband speed wherever they choose to live? It is a possibility. In a sense, a WFH arrangement is at the mercy of the national communications infrastructure offered by the likes of BT and Virgin Media; this interactive map tool from the FT makes it clear how much this can vary in practice.

At a micro level, the performance of individual home connectivity solutions will be fundamental to employee performance. Here employees invariably use consumer residential-grade connectivity, which has neither the Service Level Agreements (SLAs) one would expect to see (explicitly or implicitly) in corporate environments, nor underlying technology infrastructure that supports resilient communications. Home workers have one route out of their property for Internet connectivity, which has to transit a significant number of dependencies before reaching resilient IP-based infrastructure, in turn transiting Internet backbones back into the corporate environment. This is considerably more complex than most in-office network access. A study by EY showed that 26% of users reported inconsistent broadband had caused difficulties when working from home, and also noted the implications of Covid-19 for support services within communications providers [5].

We've started to see interest in resilient communications being addressed by communications providers. While the UK VDSL and VDSL2+ broadband infrastructure is the obvious port of call for home working connectivity, the advances in cellular technology have been astonishing since the introduction of 3G (UMTS) in the UK in 2003. This was followed by HSPA, offering up to 7.2 Mbits/sec, and then up to 21 Mbits/sec with HSPA+. Long Term Evolution (aka 4G) and then LTE-Advanced have more recently transformed both the throughput and latency of Internet communications, highlighting the viability of 3G and 4G as a bearer for Internet access, which is now common in rural environments. Crucially, the arrival of hybrid 4G/VDSL routers offers the prospect of resilient connectivity that seamlessly switches to LTE carriers when broadband becomes unavailable (for example, BT's 4G Assure service). This has made the potential of relatively resilient home connectivity apparent.

Aside from the technological resilience of the communications service, there are also implications at Layer 3 as a result of mixing residential and corporate packet forwarding. Corporate applications, particularly VOIP and video conferencing (aka "inelastic applications"), are less tolerant of packet drops, delay, jitter, and out-of-order delivery than applications like web browsing and email (aka "elastic applications"). The most widely adopted strategy – over-provisioning – may not be viable in some cases, suggesting a role for Quality of Service. Evidence to date suggests broadband providers are unwilling to introduce support for QoS; this contrasts with corporate networks, where the potential for end-to-end QoS is clear. There is potential within the home environment to achieve some level of QoS, such as Wi-Fi Multimedia (WMM, 802.11e), which provides traffic prioritisation for Voice, Video, Best Effort and Background transfers, or router-specific QoS capabilities, but this depends on user expertise to identify and configure the settings appropriately.

Coupled with innovations in home connectivity, the wider implication for businesses may lie within business process engineering – ensuring business processes are themselves resilient to personnel being unavailable due to connectivity issues.

We should also be mindful of the physical environment home workers operate from. There has been some interesting discussion about house builders taking home working into account in new-build projects, such as creating communal work spaces [2], and it will be interesting to see how this unfolds both for existing housing stock in the UK and for new builds.

What has struck me in 2020 is how different this situation would have been had it taken place in the mid-1990s, when ADSL was in its infancy and many were using 56K modems (or indeed less!) to connect to the Internet. The convergence of a wide variety of technology, systems, and requirements onto Internet protocols and solutions has also been a significant enabler this year – think VOIP, audio and video conferencing, and so on – combined with vast reductions in the cost of transacting business in the form of calling rates and Internet access.

It is early days as we review the implications of Covid-19 for future working practices. There have been considerable successes – witness the adoption of Microsoft Teams and Zoom, for instance. But we need to understand more about the broader implications of fundamentally changing working practices, considering more than the technology. There is also an important role already being fulfilled by our national and international professional associations, such as BCS [4].

Covid-19, while obviously awful in its direct impact, could potentially bring about indirect changes to the way IT and information security operate across the UK economy. The consequence of bringing many workforces entirely into a WFH mode will crystallise minds as to what this could mean for the future workplace and how it could benefit organisations and the wider UK economy.

References

[1] https://www.independent.co.uk/news/business/news/working-home-cheap-save-office-business-boss-a9542781.html

[2] https://www.buyassociation.co.uk/2020/05/21/construction-new-developments-focus-on-home-working-after-coronavirus/

[3] https://www.bcs.org/more/about-us/press-office/press-releases/bcs-campaign-recognises-the-vital-role-of-it-workers-in-our-national-life/

[4] https://www.bcs.org/

[5] https://www.ey.com/en_uk/tmt/broadband-quality-and-resilience-a-key-consumer-concern-during-covid-19

November 21, 2020


Bitlocker, SSDs and full disk encryption

A concerning development for Windows 10 has been the discovery that BitLocker can sometimes opt to use unreliable hardware cryptographic capabilities over CPU-based software cryptographic capabilities. This Hardware Based Encryption (HBE) can be enabled when BitLocker is configured for an SSD.

The university research that publicised this weakness was published back in 2018, but it is only now that many corporate environments are moving off earlier versions of Windows and turning to Windows 10 in earnest. Moreover, SSDs are now commonplace across a wide range of consumer and enterprise products.

What, you might say, is the problem with HBE? Surely, by using hardware encryption we are offloading all that CPU-cycle hungry encryption and decryption, thereby getting every ounce of performance out of our systems?

It turns out that not all hardware implementations are created equal, or, indeed, capable. Some SSD chips have some alarming defects in their implementations, meaning the performance gain could easily be reversed by the increase in risk from using a flawed cryptosystem implementation.

Some SSDs have an empty master password, and it is that password (or rather, the lack of one) that protects the keys used to encrypt the data [1]. Research by Radboud University shows that a "malicious expert with direct physical access to widely sold storage devices can bypass existing protection mechanisms and access the data without knowing the user-chosen password". Admittedly, this hinges on the device concerned, and not all devices may be vulnerable.

However, if devices are used that have weak HBE, this significantly changes the risk dynamic around lost or stolen devices, as data recovery may be relatively easy.

How can I find out whether my devices are affected?

With administrative privileges, it’s easy to check whether your current device is using HBE or SBE. From a PowerShell prompt with administrative privileges:

PS C:\WINDOWS\system32> manage-bde -status

Look for the "Encryption Method" line. If this states "Hardware Encryption" then HBE is enabled.
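To pull out just that line across all volumes, the output can be filtered in the same PowerShell session (a convenience only; the command above shows the full picture):

PS C:\WINDOWS\system32> manage-bde -status | Select-String "Encryption Method"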

What can be done to avoid this?

The first step is to execute a risk assessment and treatment process, to ensure all risks are captured and managed. This sounds laborious, but these kinds of issues are always best approached from a thorough risk assessment. Your risk may be relatively low if your SSDs use a verified encryption implementation.

The next step is to review the Group Policy configuration. Windows 10 Group Policy includes settings within the BitLocker section under Administrative Templates that allow the use of HBE to be prevented, which results in BitLocker using software-based encryption – clearly preferable here. There are lots of other settings under Administrative Templates > BitLocker, and it pays dividends to research each of them; settings for secure boot, use of the TPM, PINs, and more are not configured by default and should be, particularly in corporate environments.
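As a quick check that the policy has actually landed on an endpoint, the BitLocker policy registry keys can be inspected. A minimal PowerShell sketch – the FVE path is the standard BitLocker policy location, but the value name (here assumed to be OSHardwareEncryption, for operating system drives) should be verified against your own policy documentation:

# Inspect whether policy disallows hardware-based encryption for OS drives
$fve = 'HKLM:\SOFTWARE\Policies\Microsoft\FVE'
$p = Get-ItemProperty -Path $fve -Name OSHardwareEncryption -ErrorAction SilentlyContinue
if ($null -eq $p) {
    'Not configured - BitLocker may still choose hardware encryption'
} elseif ($p.OSHardwareEncryption -eq 0) {
    'Hardware encryption disallowed - software encryption will be used'
} else {
    'Hardware encryption permitted by policy'
}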

It is also sensible to keep SSD firmware up to date using the manufacturers' firmware update services. On consumer devices, the manufacturer's SSD management tool (for example, Samsung Magician) can provide firmware updates automatically.

More broadly, when creating gold builds for new Windows 10 devices you should always follow best-practice hardening guidelines; there are many of these online, and you can leverage a lot of expertise from them. FDE settings should always be reviewed as part of a BitLocker deployment.

An alternative approach is to use an encryption product that only supports SBE. VeraCrypt is recommended by some, and some enterprise FDE products also fit the bill. This can be a better approach as it minimises the risk of a Group Policy change undermining FDE policy.

What is the cost?

Encryption always comes at a cost, particularly when we opt for SBE over HBE. Since the advent of chipset architectures with optimised crypto instructions (such as AES-NI), the CPU can execute cryptographic operations faster than the latency of SSD I/O, meaning the CPU is not the bottleneck relative to the SSD. The CPU crypto cycles still exact an overhead: Microsoft state around 5% for BitLocker, though empirical tests have shown up to 10%. Most applications, particularly EUC, rarely fully utilise the CPU, so the impact of using SBE is relatively small.

Intel processor families that contain the enhanced functionality for BitLocker performance (AES-NI) include Skylake, Goldmont, Haswell, Broadwell, Ivy Bridge, Sandy Bridge and Westmere. Equivalent AMD processor families include Puma, Jaguar and Zen (including Ryzen).

The abstract from the paper

We have analyzed the hardware full-disk encryption of several solid state drives (SSDs) by reverse engineering their firmware. These drives were produced by three manufacturers between 2014 and 2018, and are both internal models using the SATA and NVMe interfaces (in a M.2 or 2.5" traditional form factor) and external models using the USB interface. In theory, the security guarantees offered by hardware encryption are similar to or better than software implementations. In reality, we found that many models using hardware encryption have critical security weaknesses due to specification, design, and implementation issues. For many models, these security weaknesses allow for complete recovery of the data without knowledge of any secret (such as the password). BitLocker, the encryption software built into Microsoft Windows will rely exclusively on hardware full-disk encryption if the SSD advertises support for it. Thus, for these drives, data protected by BitLocker is also compromised. We conclude that, given the state of affairs affecting roughly 60% of the market, currently one should not rely solely on hardware encryption offered by SSDs and users should take additional measures to protect their data.

References

[1] Radboud University researchers discover security flaws in widely used data storage devices, Radboud University, 2018, link

June 13, 2020