Server issues

I work with highly specialized, industrial-grade PLCs. I can get next-day hardware replacements; 4-6 weeks may be the norm for your company, but it would be highly irregular for any other (in the USA).

If you pay enough for support, you can get 24-hour part replacements on Caribbean islands (yep, I've actually seen that happen), but ordering a new system or an extra part to just about anywhere still easily takes over a week. And they very much said they are getting a new system, not replacing parts.
 
If it were, say, a major router or switch, it is quite possible that a rush replacement for a blade or two would take a week.

That stuff I know pretty well, and I suggest redundancy: buy more than you need. The company I worked for (now partly French-owned) is one of the leading suppliers, and those cards had a very high failure rate due to shoddy production (somewhat better now, but still an issue across the industry). We had an effectively unlimited supply at the time and replaced them daily in the HQ concept network building, which had only about 5k users.

I know that routers can bottleneck things to this degree quite easily if you have a major hardware failure. But these days they are generally built as chassis with replaceable parts, so an entirely new system is not always necessary.

Servers I have less experience with, but they follow the same hardware design principle: easily replaceable once you narrow down the faulty slot and get the replacement part.

"Preparation work for upgrading portions of the Entropia Universe hardware infrastructure has now begun"

I'm basing this on the statement above. "Upgrading portions" doesn't sound like a new system, but I could be wrong. And I think we would prefer portions; replacing whole systems is a daunting process.
 
I work with highly specialized, industrial-grade PLCs. I can get next-day hardware replacements; 4-6 weeks may be the norm for your company, but it would be highly irregular for any other (in the USA).

From previous experience with bespoke kit, it was far quicker; those vendors often trade on service and keep spare stock built and awaiting orders. The big suppliers (HP, Dell, IBM) like to keep inventory to a minimum, so as far as I can tell they effectively build and ship to order and don't have a warehouse full of stock for ad hoc orders. On the other hand, going through resellers can be quicker, as they do have sheds full of stock ready. There are always ways and means; my point was that an off-support purchase taking a week isn't unreasonable. Either way, I still find it odd that they are upgrading rather than fixing the hardware issue.
 
And they very much said they are getting a new system, not replacing parts.

The big suppliers (HP, Dell, IBM) like to keep inventory to a minimum, so as far as I can tell they effectively build and ship to order and don't have a warehouse full of stock for ad hoc orders.

That makes sense, thanks. :tiphat:
 
But have they saved $ for parts and labor?
 
So is MA setting up new servers now, or did a big universe-wide crash just happen?
 
So is MA setting up new servers now, or did a big universe-wide crash just happen?

30-minute timer for servers down and no message about why or for how long. :confused:
 
I think people have a right to question why it takes a week to fix hardware problems.

I used to own an internet café that got hit by lightning, which took out 30 PCs... I closed the shop, drove 4 hours to my supplier, bought $15k worth of hardware, drove home, worked all night with just the two of us, and opened the shop the next day.


Get yur shiit togetha MA!!!!!!!!!!!

rage ends ty
 
They ran out of the monthly traffic limit on that smartphone they were running the server on. :silly2:
 
30-minute timer for servers down and no message about why or for how long. :confused:

I guess Bertha Bot went on vacation too. :laugh:

I tried to log in about 50 minutes ago; the launch button on the client loader was grayed out.
 
I think people have a right to question why it takes a week to fix hardware problems.

I used to own an internet café that got hit by lightning, which took out 30 PCs... I closed the shop, drove 4 hours to my supplier, bought $15k worth of hardware, drove home, worked all night with just the two of us, and opened the shop the next day.

Guess buying PCs and buying a server park are a wee bit different, just my 2 pecs.
 
I think people have a right to question why it takes a week to fix hardware problems.

I used to own an internet café that got hit by lightning, which took out 30 PCs... I closed the shop, drove 4 hours to my supplier, bought $15k worth of hardware, drove home, worked all night with just the two of us, and opened the shop the next day.


Get yur shiit togetha MA!!!!!!!!!!!

rage ends ty

I was shocked actually, having just come back after many years. I can't believe that a company that holds millions in deposits can't afford next-day delivery on servers. Everywhere I've worked, we've gotten servers set up and replaced very quickly. I hope they are using prebuilt images and the like to set up faster! :eek:
 
I think people have a right to question why it takes a week to fix hardware problems.

I used to own an internet café that got hit by lightning, which took out 30 PCs... I closed the shop, drove 4 hours to my supplier, bought $15k worth of hardware, drove home, worked all night with just the two of us, and opened the shop the next day.


Get yur shiit togetha MA!!!!!!!!!!!

rage ends ty


Just a guess:

The problem already happened a long time ago; no one just noticed. It started about a month ago, when the lag started to become irregular.

In my opinion there was a problem with one of the hard drives that caused hardware timeouts, and that is why we started to have lag in the first place.

One of the RAID hard drives died; it was holding an SQL database. In this situation, when one hard drive dies there is another one, but in most situations the "healthy" hard drive starts to work improperly and introduces errors into the main SQL database.

After "damaged" hard drive crash, system started to run in safe mode , and system administrators loaded same database that was in Healthy hard drive, but what is nessesary to make is to compare database with log.

Explanation: there are two database components: 1. the database itself, 2. the log files.
If the database (1) crashes or some of its fields are damaged, it can be restored from the log file (2).

The log file is a (text) record that describes every operation made to the database.

In my opinion, one hour of log file should be bigger than 1 GB.

So, if the system administrators did everything properly, they should have run a database <-> log file consistency check, and if there was about a month's worth of log file, it is normal that it took about a week to check all the database tables, because the database in this case is so important. What if even one number had been changed, say someone's PED card balance becoming many times bigger?
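For illustration only (a real database engine does this with its own transaction-log machinery; the table and column names below are invented for the example): the idea is to restore the last good snapshot, replay every logged operation in order, and then compare the result against the copy taken from the surviving drive.

[CODE]
import sqlite3

# Toy sketch of "snapshot + log replay, then verify"; not MindArk's actual tooling,
# and the "accounts"/"ped_balance" names are made up for the example.

def rebuild_from_snapshot_and_log(snapshot_sql, log_statements):
    """Restore the last good snapshot, then replay every logged operation in order."""
    db = sqlite3.connect(":memory:")
    db.executescript(snapshot_sql)      # 1. load the snapshot
    for stmt in log_statements:         # 2. replay the log
        db.execute(stmt)
    db.commit()
    return db

def find_mismatches(rebuilt, suspect):
    """Compare the rebuilt database with the copy taken from the 'healthy' drive."""
    good = dict(rebuilt.execute("SELECT account_id, ped_balance FROM accounts"))
    damaged = []
    for account_id, balance in suspect.execute("SELECT account_id, ped_balance FROM accounts"):
        if good.get(account_id) != balance:
            damaged.append(account_id)  # e.g. a PED balance that no longer matches the log
    return damaged
[/CODE]

With a month of log to replay and every table to cross-check, a week of downtime for that pass wouldn't be surprising.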


I think they did everything properly; while they waited for the new hardware, they were running in safe mode and copying the database once an hour, which caused additional lag.


As I said before, all operations were done properly, but if anyone would listen to my opinion: it would be dumb to let servers run at 90% usage, because any server issue then creates a situation like this one and makes people unhappy.
MA should run servers at around 60% usage; the additional spending would be more than offset by not losing unhappy players.
 
I was shocked actually, having just come back after many years. I can't believe that a company that holds millions in deposits can't afford next-day delivery on servers. Everywhere I've worked, we've gotten servers set up and replaced very quickly. I hope they are using prebuilt images and the like to set up faster! :eek:

Restoring databases from logs is not the same as buying new servers.
I think they should verify the databases first before putting them on the new hardware.

Of course I agree that after this type of situation there should be a clear explanation that clears people's minds about MindArk's apparent inactivity.
 
Guess buying PCs and buying a server park are a wee bit different, just my 2 pecs.

It's even deeper than that. MA doesn't host their servers in-house, so it's not a matter of buying parts at the computer shop next door and bringing them home. They rent colocation space from a big NL provider, not even in the same country, so it's understandable that it's going to take some time to get there with the new hardware.
 
I think people might be confusing the bog-standard ProLiant DL or Dell PowerEdge style machines that anyone can buy off the shelf with the type of kit MA is using to run EU. A standard office server for a few hundred people might only cost a few thousand, but I know of DB servers that cost $20,000 each, and the company in question needed four; that type of kit typically isn't available off the shelf.

The database itself must be huge, and they're probably using the previous backup plus all the logs since then. Combining the two is not a five-minute job, and doing it on new hardware is the best option. The DB is going to be huge, with millions or even billions of records; look how long it took them to work out the scores for the MM in previous years, during which time they had to take EU down in order to run the query.

As for the failed-drive scenario: typically in a server RAID system you include at least one spare drive, so that when one starts to fail the spare takes over, the array is automatically rebuilt, and someone is notified by email.

Most likely they're using ESXi to host multiple virtual servers on multiple host machines. If a host server is dying, the VMs will fail over to another server, but that will cause the remaining machines to slow down due to the extra load. Most likely in this case that didn't happen, and they had to restore the VMs from a backup onto another host and then use the logs to bring them up to date.
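Just to illustrate the "remaining machines slow down" part with made-up numbers (three hypothetical hosts running at 50-60% load):

[CODE]
# Hypothetical load figures, purely to show why losing one host hurts the rest.
hosts = {"esx1": 55, "esx2": 60, "esx3": 50}   # CPU load (%) per host before the failure
failed = "esx2"

survivors = {name: load for name, load in hosts.items() if name != failed}
extra = hosts[failed] / len(survivors)         # failed host's VMs spread evenly (simplified)
after = {name: load + extra for name, load in survivors.items()}

print(after)   # {'esx1': 85.0, 'esx3': 80.0} -- survivors run much hotter until the dead host is replaced
[/CODE]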
 
Is there an ETA anywhere?
 
It's holiday time in Sweden.

The servers are colocated in the Netherlands.

Those two things combined are probably why it's taking some time to migrate to the new servers.
 
I think people might be confusing the bog-standard ProLiant DL or Dell PowerEdge style machines that anyone can buy off the shelf with the type of kit MA is using to run EU. A standard office server for a few hundred people might only cost a few thousand, but I know of DB servers that cost $20,000 each, and the company in question needed four; that type of kit typically isn't available off the shelf.

The database itself must be huge, and they're probably using the previous backup plus all the logs since then. Combining the two is not a five-minute job, and doing it on new hardware is the best option. The DB is going to be huge, with millions or even billions of records; look how long it took them to work out the scores for the MM in previous years, during which time they had to take EU down in order to run the query.

As for the failed-drive scenario: typically in a server RAID system you include at least one spare drive, so that when one starts to fail the spare takes over, the array is automatically rebuilt, and someone is notified by email.

Most likely they're using ESXi to host multiple virtual servers on multiple host machines. If a host server is dying, the VMs will fail over to another server, but that will cause the remaining machines to slow down due to the extra load. Most likely in this case that didn't happen, and they had to restore the VMs from a backup onto another host and then use the logs to bring them up to date.

Indeed, they most likely look similar to these. ;)

 
Auction opens instantly and changes pages instantly: Check
Inventory opens instantly: Check
All actions instant: Check

Thank you for upgrading, MA; it was worth it. Can we see a picture of the new hardware?
 
The database itself must be huge ...
I think it's not very large. From memory, they only hold transactions for the last six months online.

None of the "micro transactions" (you shooting a mob, dropping a bomb etc) are logged. Not even weapon or ammo decay is logged until either checkpointed (not sure they have started doing that) or until you are "logged off" an area server. Experience suggests that synchronization is performed only when you change area (8192x8192), or do an actual logoff. When the area server crashes, for whatever reason, your tools decay and ammo (bombs/probes too?) is reset to whatever it was at last synch.

Loot entering (and leaving, I'd hope) your inventory does, however, seem to be sent immediately to the "inventory server" (for lack of a better word), but even that may not be universal and always logged. For example, sweat could be collated so the backend DB only gets a transaction every n bottles, or every n minutes; otherwise sweating could generate a lot of "noise" (read: pretty much useless) DB transactions.

So the transactions would be limited to (unless I forgot something) looting, buying, selling, and explicit area-server <-> inventory-server syncing. On a totally unscientific hip shot, not even an attempt at a real calculation, I'd venture to guess just a few TB could suffice to hold both storage and the DB logs for the last six months.
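For what it's worth, the hip-shot arithmetic behind a "few TB" figure looks something like this; every number below is an assumption pulled out of thin air, not MindArk data.

[CODE]
# Back-of-envelope only; all of these figures are invented assumptions.
concurrent_players   = 10_000     # assumed average online population
logged_tx_per_hour   = 60         # assumed loot/buy/sell/sync events per player per hour
bytes_per_log_record = 500        # assumed size of one logged transaction

hours_in_six_months = 183 * 24
records   = concurrent_players * logged_tx_per_hour * hours_in_six_months
log_bytes = records * bytes_per_log_record

print(f"{records:,} records, about {log_bytes / 1e12:.1f} TB of transaction log")
# -> 2,635,200,000 records, about 1.3 TB  (same order of magnitude as the guess above)
[/CODE]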
 
In addition to Few Scars' positive post about the auction, which I agree with (the auction currently is almost instant), another thing I noticed is storage. I can't remember the last time storage was this responsive. If ever.
 
None of the "micro transactions" (you shooting a mob, dropping a bomb etc) are logged. Not even weapon or ammo decay is logged until either checkpointed (not sure they have started doing that) or until you are "logged off" an area server. Experience suggests that synchronization is performed only when you change area (8192x8192), or do an actual logoff. When the area server crashes, for whatever reason, your tools decay and ammo (bombs/probes too?) is reset to whatever it was at last synch.

Just because tool decay and ammo are sometimes returned to you after a server crash does not mean that the system does not log this info.
 
None of the "micro transactions" (you shooting a mob, dropping a bomb etc) are logged. Not even weapon or ammo decay is logged until either checkpointed (not sure they have started doing that) or until you are "logged off" an area server. Experience suggests that synchronization is performed only when you change area (8192x8192), or do an actual logoff.

I always quit the game just by closing the game window (playing in windowed mode), sometimes in the middle of a hunt right after killing the last mob, and I've never lost anything, which suggests continuous saving of all stats.

So as not to speak from casual observation alone, I just did two small tests before submitting this comment: 1) killed a mob and immediately closed the game with the [X] button; 2) killed a mob and immediately killed EU from the Windows Task Manager. In both cases the client had no time to save changes, yet upon logging back in both loot and decay were present, and the proper amount of ammo had been subtracted.
 
I always quit the game just by closing the game window (playing in windowed mode), sometimes in the middle of a hunt right after killing the last mob, and I've never lost anything, which suggests continuous saving of all stats.

So as not to speak from casual observation alone, I just did two small tests before submitting this comment: 1) killed a mob and immediately closed the game with the [X] button; 2) killed a mob and immediately killed EU from the Windows Task Manager. In both cases the client had no time to save changes, yet upon logging back in both loot and decay were present, and the proper amount of ammo had been subtracted.

I don't think it matters if you crash, only if the server you are on crashes.
If you crash or close the game window, the server still has your info, and it will be saved at the normal interval.
 
I don't think it matters if you crash, only if the server you are on crashes.
If you crash or close the game window, the server still has your info, and it will be saved at the normal interval.

All games have a cache server that sits between the game-server side and the database.

It decides what needs to be written to the database and what doesn't, and it coalesces the various fragmented SQL commands into one clean batch.

If the game crashed while your action was still in the cache server's queue, it will not be written.

That is, if the database crashes and you shot a mob that gave you no loot, your decay will probably be returned to you.
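Roughly what that looks like, as a minimal write-behind sketch and not EU's actual architecture (the class and parameter names here are made up): updates are queued in memory and flushed to the backend in batches, so anything still queued when the process dies is lost.

[CODE]
import queue
import threading
import time

# Minimal write-behind sketch; "apply_batch" stands in for whatever the real
# backend does with a batch of updates. Not EU's actual code.
class WriteBehindCache:
    def __init__(self, apply_batch, flush_every=1.0):
        self.apply_batch = apply_batch        # callable that persists a list of updates
        self.pending = queue.Queue()          # actions waiting to be written
        self.flush_every = flush_every
        threading.Thread(target=self._flusher, daemon=True).start()

    def record(self, action):
        """Called by the game server for every decay/ammo/loot update; returns immediately."""
        self.pending.put(action)

    def _flusher(self):
        while True:
            time.sleep(self.flush_every)
            batch = []
            while not self.pending.empty():
                batch.append(self.pending.get())
            if batch:
                self.apply_batch(batch)       # one coalesced write instead of many tiny ones

# Anything still sitting in self.pending when the process dies is simply lost,
# which is why a crash can "give back" decay that was never persisted.
cache = WriteBehindCache(apply_batch=print)
cache.record({"avatar": 123, "weapon_decay": -0.05})
time.sleep(2)   # give the flusher a chance to run; exiting sooner would lose the update
[/CODE]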
 
Just because tool decay and ammo are sometimes returned to you after a server crash does not mean that the system does not log this info.
I have enough data points to have verified that claim.

EDIT: Not to mention MA "support" themselves claiming they have no logs of such actions. At all.
 