JChan wrote:
Hail Adventurers,
(TL;DR version at the bottom.)
I'm writing to you today to apologize for all the issues that we've been experiencing with EQ these past few months. We've been moving at a breakneck speed to grow into our newly formed Darkpaw studio and ensure that Norrath is around for many more years to come. Unfortunately, in our haste our "raid wiped." We've failed ourselves and we've sadly failed you, our dear friends, but we've reviewed our parses and we have new strats.
The following are not excuses. As some of you have noticed, we're trying to be more transparent. In that spirit, I'd like to acknowledge some issues we have encountered over the past few months and give you more information about them, so we can move forward together on the path to great things.
First off, I'd like to address the unexpected downtime in the last week of May, on the day that Aradune and Rizlona launched. Soon after these servers were unlocked, a major outage hit all of our servers, our forums, and numerous other systems in our infrastructure. To accommodate our growing population and address current bugs, we have been bringing in new hardware. Unfortunately, the transition to the new hardware did not go as smoothly as planned and brought on many unforeseen difficulties; we have overcome most of them, but we are still battling a few. The outage took out most of our ability to diagnose the core issue, as well as our regular channels for communicating about it. Once the issue was found and resolved, we focused on bringing the game servers back up at the expense of fully restoring our diagnostic tools. Without those internal tools running, we missed another growing issue that later caused a second cascading crash, which took down all of our worlds for the second time that day. We have taken measures to ensure the initial issue will not happen again. We have also taken steps to prevent the second issue from recurring, but we are still investigating its ultimate underlying causes.
As for the extended downtime on the days of the server merges, you can read about the issues that we encountered at https://forums.daybreakgames.com/eq/ind ... am.266229/
You may ask why we needed to merge servers when we did. Simply put, it bought us more time to address some bottlenecks that have caused crashes on all of the servers. We have already addressed some of the issues we encountered during the merge. For the rest, we will be moving systems to newer infrastructure and identifying and addressing bottlenecks. As for the temporary loss of shared bank items and platinum, we have restored all items and platinum lost during the merge process. You can find the items in your /item overflow window. The platinum will either be in your shared bank or parceled to the first character (alphabetically) on your account.
Let me address the remaining unplanned outages that have affected all of our worlds, or sometimes a few at a time. Know that we share your disappointment and appreciate any patience you give us during these times. Over the past few months there has been a laundry list of underlying causes, and rest assured that for each one we have either implemented a definitive solution or are in the process of developing one. To give you more detail on a few of these causes and their resolutions: on May 21, both The Rathe and Tunare worlds crashed. On investigation, we discovered that character data had grown too large, and loading any character above a certain size would crash the world. Unfortunately, our normal warning systems for this did not trigger. To resolve it, we increased the maximum character size and performed an emergency update later that day as a temporary fix. We are now converting this to a more dynamic solution that can handle characters an order of magnitude larger.
In March and April, we experienced a number of crashes across different worlds. These were caused by a few different issues: 1) our client hotfixing technology, 2) in-game raid invites, and 3) infrastructure hardware failures. For the client hotfixing, we identified the rare edge case that caused world crashes and deployed a permanent fix in the April update.
We had been tracking the in-game raid invite issue for more than a year, and in that time we completely retooled our diagnostic systems and rewrote large chunks of the raid invite system to find its cause. Unfortunately, the final diagnostic code we added did worsen stability, but it also finally revealed the smoking gun: the exact situation that had been causing this crash for so long. We hotfixed affected servers as needed in late April and deployed the fix to the rest of the servers in the May update.
As for the infrastructure hardware failures, we are in the midst of refreshing a number of our systems and are working with our partners to build more regular maintenance plans so that issues like these are minimized in the future. Earlier this year we completed a number of hardware migrations, ranging from full data center moves to memory replacements on a number of our systems. These conversions and moves are critical to the future health of our infrastructure.
Overall quality has been a battle we've had to fight many times over. Thank you to the players who have so graciously volunteered their time over the years to help us on our test server, in betas, and by reporting issues on our forums and through /bug. Recently, we had to move our staff to a full-time work-from-home setup to keep people safe, and many of us are now voluntarily working notably longer hours to make sure our schedules don't slip. With the added stress, and because we were not set up for everyone to work remotely full time, our quality has suffered. These are strange times and we are human. Frankly put, we've been making silly mistakes that we wouldn't have made a few months ago. Don't fret, though: I am confident in my team, who have been very agile in doing everything they can to fix problems as they arise.
Going forward, we need time to work on the solutions I've touched on above. I'm asking you to continue to be patient with us and to keep working with us in these trying times. We're going to complete and release a few projects that are almost done, then pause for a bit to catch our breath and look at the pain points holistically. I've asked the team to double down on our current efforts in stability, performance, and quality. As a result, some of you may notice fewer new things in game for a while as we focus on these issues. This doesn't mean our normally scheduled annual plans at the end of the year are at risk; it just means there will be a short pause. In the meantime, please accept our apology and a 25% XP bonus on all servers, which we'll activate in-game starting June 9 at 12 p.m. PT and ending June 17 at the time of the monthly update. We have a great legacy to uphold, and I trust that with your help we can make it through these challenging times and all share in the raid loot together.
Never give up!
TL;DR: I apologize for the unprecedented issues that we've been experiencing. We've fixed some of them. We need more time to work on fixing the rest. Please be patient. Things will get better. We're giving you a 25% XP bonus. Never give up!