7 Lessons Learned as Production Support or Help Desk Agent
I worked for 7 years in Production Support role. When I explained the type of work that I did, I often related it to calling customer service when your internet or electricity goes out and you call to report them problem. I was a Level 2 agent for 5 years and then Level 1.5 agent for 2 years (L1.5 is a role they made up to keep providing services for the client). During this time period, I learned a lot as well as demonstrated skills that others had not seen before. Hopefully my experiences that I share, help you be successful in your role.
1) Clients Equate A Burning Match to Wildfire
I worked several issues that required to have the client involved in fixing the issue or the issue was reported by the client. In some of those situations, when the client was asked when the issue needed to be resolved by, he or she would say that it needed to be resolved by the end of the day. Mind you, it would be like 3 PM when this question was asked and nobody, not even myself, wanted to stay at work later than necessary.
In some situations, the stakes were high and the deadline was close, so it really was a wildfire. In other situations, the stakes were high and the deadline was at least 48 to 72 hours or more away, so the situation was a house fire that was next to a dry wooded wilderness (meaning it was not serious at the moment but had the potential to be).
If the deadline that was given to you seems unrealistic to accomplish, ask the client why he or she stated that deadline. If you ask the right questions, there is a change that, you will have another day or two to get the issue addressed. However, the client will usually want the issue addressed immediately, as they did not calculate, want, or plan for the unexpected (a.k.a. Murphy's Law) happening.
2) Work Your Hours and Leave
If you do not have an on call rotation for your job, you may skip this section. During my entire time in this role, I had to be on call. There was a primary on call rotation, but they were responsible for the initial triage of tickets that were sent to our group. It was never stated, but everyone was on call all the time for the respective applications that he or she supported. Initially I would sometimes work additional hours in addition to the time being on call. After experiencing some burnout of working 50 and 60+ hour weeks, that changed quickly.
My perspective became that I would work the required 40 hours per week and on call as scheduled, but that was it. While some issues were quick to fix, other issues would take days to fix because of their complexity or effort required. Because of these reasons and more, I worked the minimum amount of hours, as you never know when that major production issue will come in. If it comes in and you already have 50+ hours clocked for the week, you will experience burnout and make more mistakes due to fatigue and lack of sleep.
There was two week span that several major issues occurred for the applications that I was the primary and secondary support for. I had to step up and work both, as the primary for the application that I was the secondary support for, was on vacation. Somewhere during that second week, I made a mistake in resolving one of the issues, which resulted in more work having to be done. My manager had to explain this mistake to the client manager and the client manager asked why I was making these kind of mistakes given that it was not the first time that I had worked on this type of issue and that I was experienced with the clients' systems. My manager relayed this question to me and I did not know why. It occurred to me in hindsight that it was exhaustion from the additional hours having to be worked and doing double the work that I would normally do.
Had a production issue come in for one of my applications while I was working my normal hours for the day. This issue required manually rolling back the data that had been loaded due to bad formatting of the data file. There were some time constraints also, because there was a monthly process that ran related to the client billing their clients. It had to be done and I was the one that had to do it.
In total, I spent over 20+ hours being awake and worked at least a good 16 hours on this single issue. I was glad when the issue was resolved and received no push back on requesting the day off on short notice.
One thing that I have learned is that documentation is an essential part of working in production support and help desk environments. If someone cannot understand what is written, then it does not have any use.
If You Can't Explain It To A Six Year Old
Einstein has been claimed to have said that "If you can't explain it to a six year old, you don't understand it yourself". One of the managers had a similar phrase about if a random person off the street cannot do the steps in the document correctly, then the document was not well written. I agree with both of these statements. The last thing that you want to do is be contacted to resolve an issue that you already wrote documentation for it being misunderstood. If you write a document, and you are the only one that understands what is being done in the document and how to properly perform the steps in the document, that document needs to be redone.
When I first started writing documentation for our Level 1 support group, I would just write the steps or commands to be executed. After a few mishaps with the Level 1 Support team performing steps, I started to include a screenshot of each step being performed. Did this help reduce the number of errors? Yes it did. In some cases, certain tickets never got escalated to me because the document covered all scenarios and was through. If the command was copied and pasted incorrectly (usually due to the documentation website adding special characters or copying text from a Word document and pasting it directly into a Unix or Linux session) or they did not see the button on the screen that they were to click, they could use the screenshot to verify what it should look like on the screen before reporting it as bad documentation.
Good clear instructions with screenshots of the step to perform will reduce, or in some cases eliminate, the human errors that occur with resolving tickets.
Find and RTFM (Read The Freaking Manual)
Larger vendors provide a manual or some kind of documentation for their products. At one point during my career, the client did a major upgrade of Autosys. It was a significant upgrade as they were going from Autosys 4 to Autosys 11. That big of an upgrade meant that there was limited backwards capabilities. From what I understand, no or limited documentation was provided to the development teams and little training was provided to the support teams.
I took it upon myself to find the documentation for the new version that was being upgraded to, as I looking to making adjustments to some jobs that were already in place. Over time, the conversion from one version to the next did cause some production issues. By having the latest Autosys manual document provided by the vendor, I was able to help resolve some of the issues that occurred with the transition. In addition, I was asked to share it with the developers and other support teams because they wanted to see what was different and to be able to get more out of the new system.
4) How to Automate
If an issue is recurring and can be automated, then I would suggest that you automate it. Some people freak-out at the thought of automating things because they feel that if you automate enough, you will automate yourself out of a job. From my experience, that was far from the truth. Automating helped reduce the amount of hours that I worked outside of normal business hours.
You need to know how to write code. Even if you are not great at it, being able to automate mundane, repetitive tasks will save you lots of time over the duration of your career. From writing code from scratch, to pasting bits and pieces of code from Stack overflow and other websites, you can improve your efficiencies and productivity by automating.
If you do not know how to code, you can learn how to do so in your free time. What language should you learn? That depends on what languages or systems that you client uses. As mentioned earlier, mine used Linux, Oracle, and other systems, so I became proficient in writing Shell Scripts and using SQL Plus, just to name a couple.
After a realignment of the team, I became the primary support for a new set of applications. One of those applications had a process that ran on the last three days of the month and the first of the month. Part of that process was that it wrote the date of the first of the month to that file. Problem was that when the process ran on the first of the month, the date was supposed to be for the previous month. For instance, if it was the end of October or the first of November, then we had to make sure that the date in the file was for the first of October. Come to find out, this was a bug that the development team had made a change to, but regression testing was not performed on it. The development team did not consider it urgent as there was a work around available and the business team had back log items that had to be completed by a particular deadline.
What was being done by the agent that previously supported the application is that he was getting up in the early hours of the morning, around 1 AM, to manually update the date in the file and start the process. Once I took over the application, I had to start doing the same. What I would do is work my normal shift and then stay up until 1 AM to make the update and start the process as I did not want to sleep through the alarm as this process not running would have a major business impact.
After about the third month, I had had enough of doing that and looked into automating it. What the previous agent and myself came up with was a shell script that would run at 1 AM, check the current date, and based on a number of if conditions, write the appropriate date to the file.
After a confirmation run of the fix, this fix was presented to the development team. As a result, the development team added it to the code base. Also I did not have to stay up late nights on the first of the month just to edit a text file.
5) Jump the Firewall
In some companies, the technology used is so specific to and designed just for that organization, that there are no resources outside of the company that can help resolve or diagnose issues. This is a difficult for individuals that provide support for these components. Other companies do use more name brand technology products, like Oracle, Java, or Linux. There is plenty of documentation on how to use these products and what the error message or exception messages mean and resolution steps.
Keep the Privates, Private
Going outside of the firewall was frowned upon in some situations as you could risk putting NPI (Non Public Information) or company secrets into the public. I am not suggesting that you ever submit private data or company secrets on a form on the public internet as you do not know where that information will end up.
Errors and How To Do
I am saying that you can look up an error message to find the answer to a problem that you are having to quickly resolve issues. As I mentioned earlier, the less issues that you have to deal with, the more time that you have to work on other things. This applies to non production issues and items.
6) Share What You Know
There is a common notion that if you share what you know, then somebody else can do what you know how to do and then you will be out of a job. While in some industries, that may be true, from my experience it is false. My experience has been that if you know, that allows others to do what you do and allows you to do more and new things.
When I worked on tickets that came in, I was sure to include SQL queries, SSH session logs, application logs, and any other information that I used to resolve an issue. Why? Because the last thing I want to have happen, is to get called for an issue when I am on vacation or to have the on-call person call me to resolve an issue that's simple as restarting a process.
In some situations, the on-call person was able to resolve the current occurrence of an issue, based on the previous occurrence of an issue that had my notes and other information attached.
One of the applications (App A) that I supported, was upgraded and set up on a new virtual machine (Server A). As part of the new VM setup, it also got a new IP address. The old server (Server B) was powered down. An interfacing system (App B) was not able to connect to App A. The individual that was the support contact for App B, said that he could not connect to App A from his computer or App B server. I checked the server for App A, and App A was online and running as it should be. After discussing the issue, we compared the IP address that App B had and the IP address that Server B currently had.
Come to find out, App B had cached the IP address for Server B for App A. Since Server B was offline as it had been replaced by Server A, App B was saying that it could not connect to App A. It was at this point, I had to tell support agent for App B that he had to clear the DNS cache for App B and then try to connect again. He was not familiar with this so he had to reach out someone else to provide him the info on how to do this. Once he cleared the cache and restarted the process, the connection was successful. He was able to document those steps so that if the issue were to happen again in the future, that he would be prepared and not have to get additional people involved.
7) Minimize Distractions
One thing that I have learned, is that modern work environment is not really conducive to getting a lot of work done. I have discovered that some people come to work extra early or stay extra late because they can get work done without being interrupted. What I had to start doing was to minimize the amount of distractions that were coming my way.
That notification that you got an email. That notification of a new instant message. That notification that there are updates for your computer. Turn them all off. If the email is that important, the sender will let you know by following up immediately. If the IM (instant message) is that important, they would have called or came directly to your desk. You can install the updates to your computer when you do not have to be actively working or after you leave for the day.
When I worked, I would listen to music or even audio books to block out the sounds of other people eating, gossiping, on meetings, or the toilet flushing. Sometimes the work required deep focus, such as analyzing the data in a spreadsheet or writing code for the automation that you are trying to build (see item 4).
Email is one of those things that can easily get on your nerves. I had to leverage a number of built in features that Outlook has to get the most of out my email and scheduling.
Rules and Categories
Set up rules and categories in outlook based on keywords or senders that you get emails from. If you get emails that you know does not require you to read or perform any action on, then create a rule that will automatically send those emails to the trash.
Create Meetings from Email
Did you know that you can create a meeting invite from an email? What's more cool, is that the invite will have all of the people that were on the email included in it. All you have to do is open the email that you want to create a meeting for, click the "Meeting" button in the toolbar. The Meeting Request screen will appear so that you can finalize the details. Then you click Send.
By grouping the email conversations by subject, it made it far easier for me to get through all of the emails that were in my inbox. If there was a email thread that I did not need to read or pay attention to, then I could delete the entire thread in a single actions since the thread was grouped together.
Working in the production support / help desk role can be difficult and very stressful at times. Hopefully these tips have provided you some strategies on how to make your help desk / support role easier to deal with.