One of the reasons why I ***HATE*** Windows…

A few weeks ago, I posted an entry about why I use Windows. Here is part of why I hate it, and why I use it less than willingly (to put it mildly).

For some months, I have been having a worsening problem with my browser, and Windows as a whole, not being able to do DNS lookups. If you are not sure what that means, think of it this way… you want to drive someplace, but you don’t have the address or know how to get there. So, you try to phone a friend who knows where it is… only the call does not go through. So while my browser knew the names of sites like “www.google.com”, “www.gmail.com” or “www.facebook.com”, it could not get them translated to addresses like “172.217.15.68” to allow me to do what I wanted.

Now, with a problem like this, people typically talk about changing your network settings to use a different DNS server, just like you might call a different person to get the address. One reason you see this advice is that most folks do not run their own DNS server, and do not have the ability to look at things in detail on the DNS server end… but I do, in part because my server hands out internal addresses only it knows about. This also means I have a much better view of things, enough to know where the problem is and is not… and rather than blame some DNS server, I know the problem is inside the machine which is running Windows… someplace. And here is how I know this…

As I sit here, as on any other day, besides the browser windows/tabs, I have connections open (using SSH) to terminal sessions on various Linux servers, along with a cmd.exe session, and more. Add to this that for the past month or so, I have been capturing all network traffic to and from my DNS servers. And so, here is what I know/see.

  • To start with, my browser or other applications report that they are unable to look up the remote host. If you use Chrome, you probably know this as the screen with the T-Rex, which you can also use to play a game jumping over cacti and the like, like this…

Or the just as frustrating and uninformative one like this…

  • I then pull up my cmd.exe session, and do things like the following (a sample session is sketched after this list):
    • ping 8.8.8.8 (one of Google’s DNS servers)… that works fine.
    • ping my own DNS server at 192.168.xxx.xxx, which also works fine.
    • ping www.google.com or some similar host, which fails, telling me that the host name could not be found.
    • nslookup www.google.com, which also fails.
  • I then pull up one of my terminal sessions on a Linux machine other than the one hosting my DNS, and do a ping www.google.com just like before… and that works.
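To make that concrete, here is a hedged sketch of what such a session looks like (the output is paraphrased from memory, the parenthetical annotations are mine, and 192.168.1.2 is a stand-in for my actual DNS server’s address):

    C:\> ping 8.8.8.8
    Reply from 8.8.8.8: bytes=32 time=21ms TTL=118        (works)

    C:\> ping 192.168.1.2
    Reply from 192.168.1.2: bytes=32 time=1ms TTL=64      (works)

    C:\> ping www.google.com
    Ping request could not find host www.google.com. Please check the name and try again.

    C:\> nslookup www.google.com
    DNS request timed out.
        timeout was 2 seconds.

Meanwhile, the very same lookup from one of the Linux boxes answers instantly.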

At this point, I know the problem is almost certainly inside Winblows, but I confirm this by doing the following:

  • In the cmd.exe window, ipconfig /all shows the proper DNS servers.
  • I do an ipconfig /flushdns, which reports that it succeeded, but everything still fails.
  • Even doing an ipconfig /release and then an ipconfig /renew does not fix the issue (a sample of this sequence is below).
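For reference, that sequence looks roughly like this (again paraphrased, with hypothetical addresses):

    C:\> ipconfig /all
    ...
       DNS Servers . . . . . . . . . . . : 192.168.1.2    (correct)

    C:\> ipconfig /flushdns
    Windows IP Configuration
    Successfully flushed the DNS Resolver Cache.

    C:\> ipconfig /release
    C:\> ipconfig /renew
    ...(a fresh lease comes back, the DNS servers are unchanged… and lookups still fail)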

Generally, by this time, my antivirus (AVG) refuses to respond, and I cannot even turn it off temporarily. The only solution is to reboot. And to top it off, having Winblows diagnose the problem gives me the following screens…

My response to this is…

The real kicker is when I open up my network traffic captures… when this is happening, there is absolutely no sign of any DNS requests from my Winblows box. NOT ONE FRELLING PACKET!!!
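For anyone wanting to check the same thing, this is roughly the sort of thing I do on the DNS server side (the interface name, capture file path and the Windows box’s address here are hypothetical stand-ins):

    # capture all DNS traffic arriving at the server
    tcpdump -i eth0 -w /var/tmp/dns-capture.pcap port 53

    # later, look for any queries at all from the Windows box
    tcpdump -nn -r /var/tmp/dns-capture.pcap 'port 53 and host 192.168.1.50'

When the problem is happening, that second command turns up nothing at all.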

Now, if there were log files (nothing shows up in the Event Viewer), maybe I could figure out if, for example, AVG ended up wedged or swapped out. Or perhaps it is the Winblows DNS client. But after months of looking… nada, zilch, nothing to indicate where I might find logs or enable debugging. And this leaves me wanting to do this to Winblows, AVG and the rest…

But for now, between the cost and other issues with AVG on top of this problem, I am starting by switching to a different antivirus. I may or may not come back to AVG, which I started using some years ago because it supported both my Winblows machines and my Android cell phones… but at this point, I am severely disinclined to do so.

Really Chrome??? WTF!!!

I have been using Chrome for many years now, having converted from Firefox when Chrome first came out. But it has been getting to be a persistent pain on a number of fronts, and I have almost reached my Popeye moment, where “I have had all I can stands, and I can stands no more…”

One of the first items which started becoming an issue was the memory use. Some time back, Chrome started getting really bad about its memory use, and it became even more visible when it started breaking out every… single… frelling… frame… on… every… single… tab into its own process. And this was true even if the tab had been in the background for hours, such as ones for Facebook, PHP.net, Python.org or whatever I happen to be working on that day. Routinely, Chrome is using over 2GB of RAM on my laptop with 4GB, even when I have just half a dozen tabs open (FB, two gmail, Nagios, Jenkins, and this one). Having recently restarted, I have this in the task manager…

Notice how much the browser itself is using… 625MB… roughly the capacity of an entire frelling CD-ROM. And I have not even done any real scrolling in any of these!!! But shortly before this, I had the following, from the night before, when I had been working with Jenkins builds, reading email, watching a few YouTube videos, and replying to some friends’ posts on FB… and then killed all the tabs before going to bed, rather than just restarting the browser.

Yes, your eyes are not deceiving you… over 1GB of RAM for just the browser itself.

Now, is it a memory leak, or what?? Given the 625MB it started at, I think it is more a design issue. The reason I say this is that I have seen similar design issues before, like MicroSoft wanting to load an entire 9GB database into memory when you wanted to back it up, or all the infamous “blue screen of death” crashes, or countless others from many other companies.

And why is this?? I have long been used to working with programs under resource limits which both the application and the operating system itself put into place. If I want to open a 512MB log file, depending on various things, I can either be told “No, you need to break the file down into smaller parts”, or I get asked if I really want to open that large a file. And at one point, browsers used to have controls on how much space they would use, both on disk and in memory, before they would just reload things from the network. But following in the infinitesimal wisdom of MicroSoft, Apple and others, things have gone beyond hiding settings in a screen someplace, to in some cases making them something you cannot set at all, or something which requires some poorly documented, if not undocumented, command-line option, set by changing something in the Windows registry or in the shortcut properties (if you are on one of those platforms).

But this forgets the basic idea of resource limits… if any site goes over a certain size in terms of memory usage, then just like good operating systems and well-designed applications do, Chrome should be saying “Are you really sure?”, or “This site wants to use too much memory, click here to adjust the limit for this site”. There also needs to be a better way to collect information about all the browser tasks, and a better way to pass this information back to the development team. And more fundamentally, Chrome should support putting tabs to sleep when they have not been active for more than a short time, and especially getting all the frames under control. (But let us be honest… they will not, because Chrome comes from Google, which makes obscenely massive amounts of money from ads of all forms, and I think we all have seen the news articles such as this… so putting tabs to sleep and potentially reducing their revenue? Not likely.)
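Speaking of those poorly documented options… as one concrete example of the sort of hidden knob I mean, Chromium does still accept a disk-cache-size switch on the command line, something you will never find in the settings UI. A sketch of a shortcut target using it (the install path shown is just the typical default, and the flag takes a size in bytes):

    REM cap the disk cache at ~100MB via a (sparsely documented) command-line switch
    "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --disk-cache-size=104857600

But there is more to my growing dissatisfaction than just the memory usage.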

Another issue is that with the last couple of updates, Chrome has not been filling in passwords which I have told it to save. It would be one thing if my passwords were a mix of things like “42 is the answer!”, “G0 Buckeyes!” and such, but they are not. My “simple” passwords might be like ‘<F7FZihp’ or ‘tqBfj0tf’ (if I have to type them… that last being an example of taking letters from a memorable phrase such as “the quick brown fox jumped over the fence”, then playing with numbers and letters). But more often, I am using passwords like ‘?~<$7B62NO$n$+;;LU:,’, consisting of somewhere between 16 and 24 characters (depending on the site, or perhaps more), generated and stored by a program, and also remembered and supplied by my browser… WHEN THE FRELLING BROWSER WORKS!!! And as you can guess, with the past few updates of Chrome, it has not been working. And while one might ask “Well, did they change the login form?” or similar things, I know it is broken across the board… even on sites which I have written. Even my test sites, residing behind a firewall and absolutely inaccessible from outside, use passwords like these.
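For the curious, here is one hedged way to generate passwords of that sort from a shell (any decent password manager does the equivalent internally; the character set in the second line is just an example):

    # ~24 characters of random base64
    openssl rand -base64 18

    # or 20 random characters drawn from an explicit set, punctuation included
    tr -dc 'A-Za-z0-9!$%&*+,:;<=>?~' < /dev/urandom | head -c 20; echo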

And another issue is speed. This one seems to be worse when the browser is sucking up memory, so I partly suspect that the Windows 7 box I use much of the time is in such a bruised and battered state that it is like a boxer who has gone 15+ rounds… cannot see, can barely stand, much less do what he is expected to do. Routinely, I see pages taking 20-30 seconds to load, with no clue whether it is the browser, Windows and its network stack (which would also include the antivirus software), or something else. I really should look into what the server side sees with a packet trace sometime, but most of the time, I am mainly doing something else… bringing up some documentation, responding to a message somebody sent me on FB, pulling up an email, or whatever. And I know it is not my network… my servers are all reachable and talking among themselves fine, my FTTP connection is fairly idle, etc.
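When I do get around to it, a hedged first step would be something like the following from a shell on the same box, which splits a single page load into DNS, connect, TLS and server-response time, taking the browser out of the equation entirely (the URL is just a placeholder):

    curl -o /dev/null -s -w 'dns: %{time_namelookup}s\nconnect: %{time_connect}s\ntls: %{time_appconnect}s\nfirst byte: %{time_starttransfer}s\ntotal: %{time_total}s\n' https://www.example.com/

If curl reports a small total while the browser takes 20+ seconds for the same page, the finger points right back at the browser side. And so…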

But given all this, I am seriously considering moving back to Firefox as my primary browser.

SELinux and Tuleap (part 1)

I have been looking at Tuleap as a personal Agile tool, to help me track tasks as I work on personal coding projects. For example, I might be working on a new version of a disk partitioning script to use with my kickstart installs, and come up with ideas I don’t want to forget. To keep track of them, I have been creating tasks in Eclipse Mylyn using the stand-alone task list. But that list can be less than optimal, and it does not integrate with things like Jenkins, etc. Well, I took a little bit of time today to read up on the installation and get it up and running. Unfortunately, at the bottom of the requirements is the following line:

You must disable SELinux prior to the install.

To me, this is a huge issue… not quite to the point of storing passwords, social security numbers, credit card numbers, and such in cleartext, but close. Indeed, in my book, passwords should be stored using a secure, one-way hash, except when a password is needed by one system to connect to another, and those should be stored encrypted, or at least as securely as possible. As for social security numbers, they should be treated like passwords, and only stored if ABSOLUTELY NECESSARY!! As for credit card numbers… if anyone can show me a valid reason why a server should ever have to store one, with or without the CVV, outside of a very transient submission queue… I will be absolutely shocked. But when it comes to disabling SELinux outside of a development environment, to me this is perhaps one step down from those. The reason I say this is that SELinux was created for a very good reason… to place limitations upon processes/applications, to keep them from being able to do things which they should not. To disable SELinux is just pure laziness.
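(As a quick aside for anyone unclear on the one-way hash point, this is all it takes from a shell… you store the hash, never the password. ‘hunter2’ is of course just a placeholder, and the -6 option needs a reasonably recent OpenSSL:

    openssl passwd -6 'hunter2'     # emits a salted SHA-512 crypt hash

Checking a login then means hashing what the user typed with the same salt and comparing the hashes.)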

A number of years ago, a client of mine wanted to use Zend Framework with the community edition of Zend Server, and I ran into the same thing during the install of that package. Just like Tuleap, you had to disable SELinux before installing, and leave it disabled. For a web application, this to me is about like putting up a sign pointing to the pocket where your wallet is. When I was done with the first install for that project, I had an install wrapper script which temporarily disabled SELinux, but only long enough to install the package and patch up the security modules, so that I could turn SELinux back on. And when done, I sent a polite but scolding letter to them, telling them what a huge mistake this was, and giving them the information they needed to fix things in the RPMs. Tomorrow (or should I say later today), I will be using tools like ausearch and, starting with just being able to log in, forking the repos up on GitHub, creating patches, and beginning to solve this issue with an SELinux policy. As I find more things which need fixing, I will add those as well. This is a major piece of technical debt, for which I will be opening a critical security bug as soon as I have the beginnings of a patch ready to include. Because, regardless of what they think, it is that big of an issue.
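For anyone facing the same requirement, the general shape of that workaround is below. It is a sketch, not Tuleap-specific instructions, and the module name is made up; note also that permissive mode (setenforce 0) is usually enough to satisfy an installer demanding that SELinux be “disabled”:

    setenforce 0                          # permissive, just long enough for the install
    # ... run the vendor's installer here ...
    ausearch -m avc -ts recent            # review what would have been denied
    ausearch -m avc -ts recent | audit2allow -M tuleap_local    # draft a local policy module
    semodule -i tuleap_local.pp           # load the module
    setenforce 1                          # and turn enforcement back on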

Google Chrome Frustrations

As a developer, it is not often that another developer or developer team makes me go WTF, and has me envisioning conducting a test of both electromagnetic repulsion and the Pauli Exclusion Principle using their head and an available desk or wall, but today, the Google Chrome team has done it twice. Congratulations to them for setting several new records (minimum interval between occurrences, and more than once in a single day).

The first item is a common occurrence for me, and can sometimes happen with just a handful of tabs, or when I am having a tab-crazy day going to sites, opening new tabs to read various pages of documentation, etc. Every other day or so, I pull up the menu, open the task manager, and find one of the browser tasks playing Jabba the Hutt, just sitting there big and bloated, slowly laughing at me as it consumes a GB or more of RAM (IIRC, I have seen over 2.1GB, and I only have 4GB of RAM on the machine). Sometimes it is a task handling a site such as Facebook or even gmail, and at other times it is the main browser task. Indeed, right now, my main browser task is reporting a memory footprint of just over 675MB, and a tab handling Facebook is around 570MB… which is mild. If it is a task other than the main browser task, I will often kill that task and then reload the tab, but if it is the browser task itself, I have no option but to enter chrome://restart in the URL bar and restart the entire browser. And while I can open up the developer tools and grab a memory snapshot in the former case (if it has not grown too big), there is no such option for the main task.

But the thing is, there really should be no reason for a task to grow beyond around the 500MB point, and even then, it should only happen on a site with lots of media on a very long page (e.g. Facebook). That is what disk caching is for, and going beyond it generally indicates some stupid programming error like a memory leak, or just trying to do too damned much in RAM. In most cases, one puts in place an adjustable resource limit which says “Nope… free some stuff up first!” when you try to allocate too much. Why Chrome does not have such a mechanism in place, given its nature, is beyond me.

The second item I hit while working on a script which would allow me to automatically renew the SSL certificates on a NAS appliance I have set up. I had been using CAcert for signing my certificates, given they are not charging, much less charging a mint, for signing, but there are a few issues with it. One issue is that the folks at Mozilla refuse to add CAcert’s signing certificates to the trusted list which is used by pretty much everybody. Every time the CAcert folks seem to have addressed the issues raised the last time they tried to get added to the list, there always seems to be a new issue, so using certificates signed by them requires importing their root certificates. While for an internal site that is no biggie, for an external site, that would mean you having to import those certificates just to read this page… big NOPE. The second is that while renewing a certificate is just a matter of going to the website and clicking a button or three, I then have to copy/paste the new certificate and put it where it needs to go. And having to do that every six months for multiple sites/services… Yea… But more about that at the end.

In the meantime, here is what had me once again thinking of taking some developer, PM or suit on the Chrome team, and repeating the test over and over while saying “What… the frell… were you… thinking? Or did… you even… stop to… think about… this possibility??” Google, through their Chrome team, has been driving an HTTPS Everywhere initiative, and now, regardless of how a site/program/appliance is configured, Chrome insists on switching over to HTTPS, and provides no way to use the hostname to access it via HTTP. No “Let me do this. Yes, I am sure!” type dialog of any variety, no site setting… nada… just this…

So, after taking a bit of a break today, when I came back to this to try to debug the program which uses a halfway-documented REST API, I could not use Chrome to access the WUI (Web UI), because the certificate had expired, and I use internal subdomains of my domain. Now mind you, I think that the HTTPS Everywhere initiative is the best thing since a meatloaf sandwich, and the work done by the ISRG, EFF, Google and others is great on the whole… but that is like saying someone did a great job of clearing a minefield to turn it into a school playground, when they missed at least one landmine. Worse… this appliance uses its own internal database to store its configuration, and all configuration is done through that same WUI which Chrome is not allowing me to access, so I cannot even update the expired certificate.

Now, at this point, there are a couple of options…

  1. Use the numeric IP address (as sketched after this list). Thankfully, the application for this appliance does not redirect to or rely upon the hostname, like some do.
  2. Set up and use an address under one of the TLDs (e.g. the .com, .org, .net, .test, etc. part of the name) which is not forced to HTTPS. I think .test is the one they talk about… but if the app relied on the hostname, how would you get in to configure that alternate name??
  3. Use a different browser. Forced HTTPS has not made full penetration into all the browsers yet… but what happens when it has, a few years down the line?
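For option 1, since scripting is the goal anyway, something like the following works even with the expired certificate in place. The address and API endpoint here are hypothetical stand-ins, since the appliance’s actual REST API is only halfway documented:

    # -k tolerates the expired/self-signed certificate; the address and endpoint are made up
    curl -k -u admin:password -F cert=@new-cert.pem -F key=@new-key.pem \
         https://192.168.1.20/api/v1/certificate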

All in all, it shows a critically major gap when decisions like the one the Chrome team has made do not account for situations like this. And when an application provides no means to update its configuration from a CLI, that too is a major design flaw.

Now… one last bit, about the certificate issue. To help the HTTPS Everywhere effort along, folks like the EFF, Mozilla, Google and so many others got together to address issues such as the cost of signed certificates. They have put up the Let’s Encrypt certificate authority, which uses the ACME protocol to make things happen automagically… but not everyone has managed to integrate things yet, and who knows how many appliance applications are either dragging their feet (such as arguing that a given appliance should not be accessible from the public Internet anyway), or have not managed to figure out how to make things work. And until everyone thinks things through 100%, I expect this sort of frustration to become more and more common, unless the browser folks give you the means to say “Yes, I really want to use HTTP and not HTTPS, as risky as that may be” for at least a given session/tab.
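For sites and services which can use it, the ACME side really is close to automagic already. A hedged sketch with certbot (the domain, webroot and hook script are placeholders) looks like this:

    # obtain a certificate once, proving control of the name over HTTP
    certbot certonly --webroot -w /var/www/html -d nas.example.com

    # renewals then run from cron/systemd; a deploy hook could push the fresh
    # certificate out to an appliance, if the appliance exposed any API or CLI for it
    certbot renew --deploy-hook /usr/local/sbin/push-cert-to-nas.sh

Which, of course, brings things right back around… the hook only helps if the appliance gives you some way other than the WUI to install the new certificate.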