Showing posts from June, 2009

SCCM 2007 install

Seems to have a lot of dependencies! Installing on a bare-bones 2008 x64 VM, and it requires:
- Schema extensions (warning)
- WSUS SDK on site server (warning) (needs WSUS, IIS (app/web), .NET 3.0, SQL 2008 SP2, Report Viewer 2005)
- MS Remote Differential Compression library registered (Features)
- SQL server sysadmin rights (needed to add the current user to the SQL installation with sysadmin permission)
- SMS provider communication
- IIS not running (Features)
- BITS installed/enabled (Features)
- WebDAV installed/enabled
Update: Finally able to finish this. After the debacle with Microsoft and AD replication, it's working great so far. I'll be looking at building the Office2007 package this afternoon.
Update: Turns out if you don't extend the schema (we were hoping to avoid that), then it just doesn't work as it should. So we've extended the schema, rebooted, and we'

Emergency migration!

Missed a few calls/emails while I was out at Transformers 2 - the Exchange 5.5 box filled up!! It reached its 16GB limit and shut down all mail functions...sheesh! So the migration is happening now, at 11:35pm EST, instead of Friday morning. Jeepers! So instead of having 2-3 days to set up SCCM 2007 at their site, I have hours. Ah well, gotta have some excitement!

Exmerge and NT

So, lesson for y'all! When you are running exmerge on an NT box with Exchange 5.5, here's how it works: You give it the list of boxes you want out, and it then exports each box to a PST file wherever you want it to go. The hidden catch is: It removes the data from the Exchange store, and creates a PST file! So say you just want to test the utility without actually changing anything: while it's running, users are losing Outlook data left, right and centre! Crazy fun times. Also, when you try to exmerge a user's mailbox that is larger than 2GB, exmerge crashes. If you restart exmerge, everything you took out prior to that will be overwritten (since you say 'sure, why not start from scratch'), but since you didn't know that exmerge removes the data from the Exchange store, you are effectively deleting all their emails. It's awesome. So then, after you've tried restarting the exmerge process a few times (deleting all users' emails earlier in line than the 2

Network timeouts for VM guests

This is SO weird. That SQL timeout issue was caused by network timeouts, and it disappeared when we moved the SQL server to another ESX host. However, the issue re-appeared, this time on the host with the vCenter guests... Further analysis reveals this interesting tidbit:
29/06/2009 12:08:29 PM -- Reply from bytes=32 time<1ms TTL=128
29/06/2009 12:08:34 PM -- Request timed out.
...
29/06/2009 12:09:24 PM -- Request timed out.
29/06/2009 12:09:26 PM -- Reply from bytes=32 time=1028ms TTL=128
29/06/2009 12:38:29 PM -- Reply from bytes=32 time<1ms TTL=128
29/06/2009 12:38:35 PM -- Request timed out.
...
29/06/2009 12:39:24 PM -- Request timed out.
29/06/2009 12:39:27 PM -- Reply from bytes=32 time=1445ms TTL=128
29/06/2009 1:08:14 PM -- Reply from bytes=32 time<1ms TTL=128
29/06/2009 1:08:20 PM -- Request timed out.
...
29/06/2009 1:09:09 PM -- Request timed out.
29/06/2009 1:09:12 PM

More to report...

While troubleshooting, I've had the RDP connection drop out on me a few times, while the VM console is still relatively active. After it happened the last time, I tried pinging it and discovered that it was unreachable - timed out. I'm not sure how to get a ping log done with a timestamp without scripting, so for now we're going to try another VMnic on a different adapter, but same network. We'll give it the same IP and disable the old one. Ok, that's done - new vmnic on a different adapter, but same IP setup. I'm running another ping log. The last one had 5 instances of 10-20 'Request timed out' replies over three hours...should know more in an hour or two. Well, it's still dropping the connection. So, we need to narrow the problem down to: VMware or the guest OS. I now have ping logs going to all the guests on that ESX server, so if I see timeouts on all the VMs, then we know it's VMware, and if no timeouts occur on the others, then we know it's SQL4 being silly.
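If a little scripting is acceptable, a timestamped ping log only takes a few lines. The following is a minimal sketch, not the method used in the post: the function names and the log file name are made up for illustration, and it assumes the Windows `ping -t` flag (ping until interrupted). The timestamp layout is chosen to resemble the logs quoted in these posts.

```python
import subprocess
from datetime import datetime


def stamp(line, now=None):
    """Prefix one line of ping output with a date/time stamp."""
    now = now or datetime.now()
    return f"{now:%d/%m/%Y %I:%M:%S %p} -- {line.strip()}"


def ping_log(host, logfile):
    """Ping `host` until interrupted, appending stamped lines to `logfile`.

    Assumption: Windows ping, where -t means "keep pinging".
    """
    proc = subprocess.Popen(
        ["ping", "-t", host], stdout=subprocess.PIPE, text=True
    )
    with open(logfile, "a") as log:
        for line in proc.stdout:
            if line.strip():          # skip the blank lines ping prints
                log.write(stamp(line) + "\n")
                log.flush()           # keep the file current if we get cut off
```

Calling `ping_log("SQL4", "sql4-ping.log")` (hypothetical file name) on a monitoring box would run until stopped with Ctrl+C; one such process per guest gives the side-by-side logs needed to tell a host problem from a guest problem.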

More netlogon.log digging

We have been having GP (on the mentioned SQL4 box) heck over the last week or so, probably caused by all the network changes all at once, but who knows, so I'm digging! Clients are fed up with massive lag, errors, and general unusability. Up until a week ago or so, things were just peachy. While troubleshooting SQL connection issues, I've come across about 1000 entries in the netlogon.log from the past three days like this:
06/23 23:30:02 [LOGON] ORTHOTIC: SamLogon: Transitive Network logon of orthotic\sqlsvc from EXCHANGE (via EXCHANGE) Returns 0xC000006A
6A, recall, means bad password, and the account shows as locked out in the adlockout tool. So a service is trying to access something on Exchange from Exchange...weird. I took a look at one of the SQL boxes, and it uses this sqlsvc account for its SQL services. The security log in SQL shows no failed logon attempts. The event log is showing W32time warnings however - might not be that important - the clock is only off by

IPv6 and dcpromo

SO. LAME. We've been having issues for the last week because dcpromo was failing across the VPN. We thought it was the firewall causing the problem, so we got a completely new firewall - no dice. We then tried a VPN within the site-to-site VPN, and it worked! Really weird. Dan checked the ports using portquery, and they matched up just fine, no errors there. But when we did a dcpromo from a remote site in, the dcpromo would fail at the replication stage - it would time out and say 'RPC call ended'. Keep in mind this machine was able to join the domain just fine. So while on hold with Microsoft, we check the error logs, and nothing shows up. Then Dan starts checking the network settings, and BOOM! Lo and behold the DCs in our site have IPv6 disabled, and the server we're trying to bring up to DC status HAS IPv6 enabled!!! Disabled it, tried the dcpromo again, worked! It was promoted in 30 seconds...used to take 5 minutes to time out. When the lady came back to i

New firewall setup

We've moved our firewall duties over to a Cisco router with security features from our old ISA2006 server. It was getting clunky, buggy, and ridiculous. While this new method of ACLs is slightly more complex, it definitely won't have the issues the ISA server had. We're still maintaining our use of ISA for internal stuff, like the web proxy, and we'll be setting it up for DMZ usage as well, eventually. Silly Cisco router can only have 3 FA ports...even though it has the physical capability to run 6 (2 onboard & 2 twin-port cards). Odd that Cisco would do that...I guess force people into the higher-end routers. (yeesh...base 2801 is $3500...then add on some WICs...up around $4500 or more!) While it'd be great to run it all off the router, we've already gone over our budget with the SAN purchase, so it'd be best not to press our luck. It's actually quite nice, this setup. I'm looking forward to getting some info from our Cisco guy on proper A

SANs and ESX guests

So, learned a valuable lesson today: If you want to restart a SAN array, FOR GOODNESS SAKE power off any VM guests using the volumes it hosts. To flesh out the details, we are moving all our volumes off the loaner PS5000 and onto our new PS6000. I'm only seeing one interface being used, so figured that was a config error. Ensured all the eth interfaces were up and had addresses, and spoke to tech support about it. They said a restart of the array might help things. Well, they didn't mention shutting down attached guests first! I knew that you shouldn't, but it didn't click that our file server was using that volume, and should have been powered off first. I restart the array, and try to move a volume again, but it's still only using one eth interface, albeit a different one this time. It turns out, from another tech support rep, that when moving volumes the PS doesn't see that as a priority, and therefore only uses one eth interface to do so. Argh! So I s

Subnetting trick

I've not been doing any true networking for quite some time, so the bits I picked up in college have been growing mental mold in my head. I've forgotten much of what I learned about subnetting, so when I had to quickly figure out what slash notation our mask was, it was Google to the rescue, or more accurately, mark-scott to the rescue! I've summed it up below. Click the link for a much better explanation. Link to original content: So you know what the mask is, but you need to quickly figure out what the slash notation is? Basically, remember these numbers (bits set = octet value): 1 = 128, 2 = 192, 3 = 224, 4 = 240, 5 = 248, 6 = 252, 7 = 254, 8 = 255. Consider each octet of the example: If you add 8 (255) + 8 (255) + 6 (252), you get 22, so the mask is /22. How cool is that??? Thanks mark-scott, wherever you are!
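The same lookup-and-add trick can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and the mask 255.255.252.0 used below is my reading of the 8 + 8 + 6 example above. The second function cross-checks the result with the standard library's `ipaddress` module.

```python
import ipaddress

# Octet value -> number of mask bits it contributes (the table above, plus 0).
BITS = {0: 0, 128: 1, 192: 2, 224: 3, 240: 4, 248: 5, 252: 6, 254: 7, 255: 8}


def mask_to_prefix(mask):
    """Convert a dotted-decimal mask such as 255.255.252.0 to its /n length."""
    octets = [int(part) for part in mask.split(".")]
    if len(octets) != 4 or any(o not in BITS for o in octets):
        raise ValueError(f"not a valid mask: {mask}")
    # The ones must be contiguous: after the first octet below 255,
    # every remaining octet has to be 0.
    partial_seen = False
    for o in octets:
        if partial_seen and o != 0:
            raise ValueError(f"non-contiguous mask: {mask}")
        if o != 255:
            partial_seen = True
    return sum(BITS[o] for o in octets)


def mask_to_prefix_stdlib(mask):
    """Same answer via the standard library, as a cross-check."""
    return ipaddress.ip_network(f"0.0.0.0/{mask}").prefixlen
```

For 255.255.252.0 both functions return 22, matching the hand calculation.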

Peter's account logout saga - Episode four

Dear me. It has happened again.
06/22 10:14:31 [LOGON] ORTHOTIC: SamLogon: Transitive Network logon of orthotic\peter from ISA2006 (via ISA2006) Entered
06/22 10:14:31 [LOGON] ORTHOTIC: SamLogon: Transitive Network logon of orthotic\peter from ISA2006 (via ISA2006) Returns 0xC000006A
Thank goodness we are replacing the old ISA server in the next few days. A few other notes: We have removed the web proxy and VPN functions from this ISA, which leaves only the firewall function as the possible cause of these logon attempts! Crickets.
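With a thousand or so of these entries, counting them per account and source machine is quicker than reading them. Below is a rough sketch that assumes every line of interest looks exactly like the two quoted above; real netlogon.log files contain other entry types, which this pattern simply ignores. The function name is made up for illustration.

```python
import re
from collections import Counter

# Matches only "Transitive Network logon ... Returns 0x........" lines,
# in the layout quoted in the post. "Entered" lines do not match.
LINE = re.compile(
    r"(?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) \[LOGON\] \S+: "
    r"SamLogon: Transitive Network logon of (?P<account>\S+) "
    r"from (?P<source>\S+) \(via (?P<via>\S+)\) "
    r"Returns (?P<status>0x[0-9A-Fa-f]{8})"
)


def failed_logons(lines, status="0xC000006A"):
    """Count (account, source) pairs whose logon returned `status`.

    The default, 0xC000006A, is the bad-password code seen in these logs.
    """
    hits = Counter()
    for line in lines:
        m = LINE.search(line)
        if m and m["status"].lower() == status.lower():
            hits[(m["account"], m["source"])] += 1
    return hits
```

Feeding it the lines of a netlogon.log file and printing `hits.most_common()` would show at a glance which machine is generating the bad-password attempts for which account.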

VPN goes down after 2-3 hours - server 2008

Weird VPN error - which is more likely linked to VMware or Server 2008. We moved our VPN access over to a new Server 2008 x64 SP2 VM. It was working fine at first, but then after a few hours all VPN access would be lost. First step was to reboot the server, and that fixed it. But again, VPN would be down after a few hours. Some troubleshooting and checking things out shows that it's not VPN going down - it's network connectivity across the board on that VM. I can't even ping anymore! So it's a failure of the entire network stack - pretty rare, in my experience. Actually...I can't ping loopback when it's in this state, so I guess there's something deeper here. I just spoke to Dan about it, he said he'd added a second nic, then removed it when things weren't working correctly. He's just going to blow away this VM and start again. Strange...well, the issue seems related to the missing nic, so not much can be done about that.

IP address scheme changeover

Today we're (I'm) changing over the rest of our servers to the new IP addressing scheme. This is noteworthy because we've only really done one so far - our SharePoint box - and it broke a few unexpected things. Reason? We are going to change all the IPs to a new subnet range to clean things up. We're halfway there, and now we can't get past ISA blocking RPC because we're trying to go across subnets. Just doesn't work properly. Kinda silly. Anyways, we had two options: make ISA disappear, or finish up the IP address change - something I was sure would break a lot of things. So far, things are going pretty well, but I've set it up so we do everything easy first! Changing the IP for the Exchange and BES servers is a little unnerving...but I think that Exchange pretty much exclusively uses DNS - I don't ever recall seeing statically set IP addresses, except in the TCP/IP settings. I've updated the static DNS records, so

Peter's lockout saga - Episode three

Well, it fixed itself. I unlocked his account at 8:30am today, and since then he is no longer getting locked out. So. Messed. Up. We've been messing about with the ISA server, so maybe that's what caused it to stop. Hopefully this will be the last we'll hear about it.

Peter's lockout saga - Episode two

Dan, our consultant, had a really good idea for temporarily helping me out with this. Move the user to a new OU. Create a new GPO with one change: account lockout policy is set to 0 (never lock out). Set the policy to enforced, and 'block inheritance'. Voila! Actually this didn't work. Shame. It seemed like a good idea. The issue with it is that the GPO is only applied on either: the computer, or the user. Since this request is coming from neither a computer nor a user, the GPO does not apply, and the lockouts continue. To continue that line of thought, what on earth is trying to use his account? Dan checked it out a bit more, and discovered (using more auditing) that it was the NETWORK SERVICE account using the Firewall PID. Really weird. As part of our network revamp process, we're going to be isolating the functions of ISA - namely just having it work as a web proxy, and move the firewall functions over to the Cisco router, with a few other bits in between. Mo

Technical version: EqualLogic PS series controller firmware update

This documents the process for updating your EqualLogic PS5000* to a newer firmware (from 4.0.6 to 4.1.4 in our case). Should work for the PS6000* as well. Note that the material here is from the document 'PS Series Storage Arrays - Updating Storage Array Firmware' by Dell EqualLogic, the tech case sent over from Carl at Dell, and my experiences from all this. It's pretty unnerving to have a key piece of hardware die on the ONE piece of equipment that stores everything in the company, but now that I've run through this once, it's not really all that bad to do. So if you have a controller update fail, it's probably just in need of the following! ------------------------------------------------------------------------------------------------- Here's the process for getting the firmware kit from the internet to the controller. 1. On your SAN monitor box (XP or whatever), download the firmware from EqualLogic's tech site (logging in is another story!!).

EqualLogic controller firmware update gone wrong

Yesterday evening we set aside three hours for: - Installation/cabling/setup of DRAC5 cards into our six ESX production machines, which involves shutting down all the ESX machines and their guests. - Firmware update of our EqualLogic PS5000X loaner SAN array to be done while all ESX machines were off. We had set aside 7pm to 10pm for this, plenty of time - or so I thought. At 7pm I shut everything down, and at 7:10pm I started pulling out servers for the DRAC install. Cabling confusion - we have two quad-nics in each ESX box, so there are eight cables going to the SAN switch from each machine. Tack on the two for the redundant LAN cabling, and you have ten CAT6 cables from each 1U machine - four total, ten from one 2950, and twelve from the other 2950. Lot of cables in a small space!! What I'm getting at is that the cables needed to be re&re'd into the same ports on the server, so they all had to be labeled. I know, we should be labeling off the bat, but time and resou

Exchange/Outlook synchronization errors

I shut down our ESX hosts last night, and after the EqualLogic debacle, I powered everything back up. Exchange came up, but refused to send emails properly. I couldn't receive any emails at all, and sending them to my gmail account gave this: ------------------------ Please note that a critical failure occurred on the sender's messaging system at the virus scanning stage. The sender should be notified and reference made to this tracking number: src53_failed_f7e789cd-9a1a-439c-a08a-ed68b4bbef39 Regards, GFI Content Security. ------------------------ While a very polite message, it failed to mention that GFI also does not provide technical support outside of business hours. They will not be renewed if that continues to be the case. With that, I googled, and checked out their Kbase - nothing. I tried stopping all the GFI services, nothing. Tried restarting the Information Store - nothing. Tried deleting my Outlook profile - nothing. Tried logging in on another computer and

Tracking down Peter's lockout

I've been driven bananas long enough by this problem. I have an entire evening with no interruptions (of course, also no sleep!) so I'm going to use the time to figure out WHERE on earth these lockouts are coming from. We are gathering info from:
1. The ADLockout tool that monitors his logon status from both DCs. It is used to unlock his account as well.
2. A full netlogon log (all options selected) pulled from the ADLockout tool from DC-1.
3. Other methods I'm sure will be used.
By this point, we've deduced:
1. It's coming from ISA (Transitive Network logon of orthotic\peter from ISA2006 (via ISA2006))
2. It's not a bad password on a mapped share, or him typing it in incorrectly.
3. His saved password cache in XP is clear.
4. He is having the problem regardless of the state of his laptop or desktop (e.g. if both are off, lockouts continue).
5. He has checked to see if his credentials were still in use for a VPN session he helped someone set up

VM project progress

Well, so far this project has been a huge learning experience. We've been through subnet changes, shared file location changes, group policy changes, user My Docs location changes, hardware upgrades & installation, SAN configuration and best practices (still fuzzy on the latter), multipathing using the SAN, and the list goes on. It's been nice to actually use some of my Cisco training at last, although after three years of inactivity, my brain cells are fuzzy on that as well. We're finally at the final stage of the SAN project - ISA. It's our last production server to be virtualized, and will be the most complex, as it's the most connected to the physical world. D has been great setting it all up, hopefully my routes will do the trick! If only the Digi PortServer TS1 had worked flawlessly off the bat, but now that it's been installed/uninstalled a few times, it doesn't want to connect anymore....must be something simple. I'm really enjo

VMconvert speeds

We've had some interesting luck with VMconverter. Seems we get good speeds off the bat, but eventually the speeds crawl. I'd be happy with 20MB/s, but we start around 17MB/s, then it drops down to 4MB/s, then 2MB/s. It's actually really funny...the timer went up to 54 minutes off the bat with 17MB/s, and two hours later it's up to 1:10 and 4MB/s. Awesome. The sunny side to all this is that, barring the 'semaphore' errors, it's worked no problem.

VMconvert and error: 'semaphore timeout period'

Well, discovered an interesting issue. When running VMConverter as the server version and P2Ving remote clients, I've run into an error: ' Error 0x80070079: The semaphore timeout period has expired ' Can't really find any reasoning behind this (possibly ISA blocking RPC), but if you install the converter on the local machine it seems to work just fine. The possible reason for it could be that our VMconverter server is running on one subnet, and the client is on another subnet. ISA has two IPs, one for each subnet, but that doesn't seem to matter. We have had RPC issues going from one subnet to the other already, so this could be a viable answer.

Remote users and their associated troubles

So we have a few remote users who only ever connect via VPN. This is usually fine, unless - group policy updates need to be done!! Dun dun dunnnn. Then it stinks. Here's how we get around it for now: (I've been told that RPC over HTTPS is the cat's meow, but we'll have to wait for 2008 R2 first.)
1. Connect to VPN.
2. Run gpupdate /force.
3. Reboot.
4. At the logon screen, choose 'Log on using dial-up connection' and pick the VPN connection.
5. It works!
Sounds easy, but it took forever to figure out that's how it's supposed to be done. There is no real way to auto-run the VPN prior to logging on. You can set up a service to run, but there are no guarantees there. Another catch is to ensure that the VPN you're using is set up so that anyone on the computer can use it - if you choose 'only for me' when installing your VPN, it won't work, since the connection has to be available at the Windows logon screen, before any particular user is logged on!